DAOS Version 2.4 Release Notes
We are pleased to announce the release of DAOS version 2.4.
DAOS Version 2.4.2 (2024-03-15)
Updates in this Release
The DAOS 2.4.2 release is mainly a bug fix release on top of DAOS 2.4.1.
Note that due to changes in the EL8 EPEL repository, the isa-l-2.30.0-2,
libisa-l-2.30.0-2, and libisa-l-devel-2.30.0-2 RPMs have been removed
from the DAOS packages repository.
Bug fixes
The DAOS 2.4.2 release includes fixes for several defects. For details, refer to the GitHub release/2.4 commit history and the associated Jira tickets referenced in the commit messages.
DAOS Version 2.4.1 (2024-01-19)
Updates in this Release
The DAOS 2.4.1 release contains the following updates on top of DAOS 2.4.0:
- Operating System support for SLES 15.5 and Leap 15.5.
- Operating System support for EL8.8 (RHEL, Rocky Linux, Alma Linux).
- MLNX_OFED Version 23.04 has been validated on InfiniBand fabrics.
- The UCX provider support on InfiniBand fabrics has been expanded to include ucx+ud_x, which is now the recommended provider for large InfiniBand fabrics.
- The following prerequisite software packages that are included in the DAOS RPM builds have been updated with DAOS 2.4.1:
    - Argobots has been updated to 1.1-3
    - DPDK has been updated to 21.11.2-2
    - Libfabric has been updated to 1.19.0-1
    - Mercury has been updated to 2.3.1-2
    - Raft has been updated to 0.10.1-2
    - SPDK has been updated to 22.01.2-5
Bug fixes
The DAOS 2.4.1 release includes fixes for several defects. For details, refer to the GitHub release/2.4 commit history and the associated Jira tickets referenced in the commit messages.
DAOS Version 2.4.0 (2023-09-22)
General Support
DAOS Version 2.4.0 supports the following environments:
Architecture Support:
- DAOS 2.4.0 supports the x86_64 architecture.
Operating System Support:
- SLES 15.4 and Leap 15.4
- EL8 (RHEL, Rocky Linux, Alma Linux):
    - EL8.6 (EUS)
    - Validation of EL8.8 is in progress.
Fabric and Network Provider Support:
- libfabric support for the following fabrics and providers:
    - ofi+tcp on all fabrics (without RXM)
    - ofi+tcp;ofi_rxm on all fabrics (with RXM)
    - ofi+verbs on InfiniBand fabrics and RoCE (with RXM)
    - ofi+cxi on Slingshot fabrics (with HPE-provided libfabric)
- UCX support on InfiniBand fabrics:
    - ucx+dc_x on InfiniBand fabrics
Storage Class Memory Support:
- DAOS Servers with 2nd gen Intel Xeon Scalable processors and Intel Optane Persistent Memory 100 Series.
- DAOS Servers with 3rd gen Intel Xeon Scalable processors and Intel Optane Persistent Memory 200 Series.
- DAOS Servers without Intel Optane Persistent Memory, using the Metadata-on-SSD (Phase1) code path (Technology Preview).
For a complete list of supported hardware and software, refer to the Support Matrix.
Key features and improvements
Software Version Currency
- See above for supported operating system levels.
- Libfabric and MLNX_OFED (including UCX) have been refreshed. Refer to the Support Matrix for details.
- The ipmctl tool to manage Intel Optane Persistent Memory has been updated to Version 3 (provided by the OS distributions).
- The following prerequisite software packages that are included in the DAOS RPM builds have been updated:
    - Argobots has been updated to 1.1-3
    - DPDK has been updated to 21.11.2-2
    - Libfabric has been updated to 1.18.1-1
    - Mercury has been updated to 2.3.1~rc1-1
    - Raft has been updated to 0.10.1-1.408
    - SPDK has been updated to 22.01.2-4
New Network Providers
- UCX support on InfiniBand fabrics is now generally available (it was a Technology Preview in DAOS 2.2). Refer to UCX for details.
- Slingshot fabrics are now supported with the ofi+cxi provider.
New Features and Usability Improvements
- The daos_server scm prepare command now supports the creation of multiple SCM namespaces per CPU socket, using the --scm-ns-per-socket option. On DAOS servers with Intel Optane Persistent Memory modules, this can be used to configure multiple DAOS engines per CPU socket (to support multiple HPC fabric links per CPU socket).
- DAOS Version 2.4 includes a Technology Preview of the Metadata-on-SSD (Phase1) code path to support DAOS servers without Intel Optane Persistent Memory.
- DAOS Version 2.4 includes initial support for excluding, draining, and reintegrating DAOS engines to/from a pool, using the dmg pool {exclude|drain|reintegrate} commands. Expanding a pool by adding DAOS engines to it is also supported, using the dmg pool extend command. Refer to Pool Modifications in the Administration Guide for more information.
- The default container redundancy level has been changed from engine to server (the rf_lvl container property now has a value of node (2)). On DAOS systems with multiple engines per server, this reduces the number of available fault domains, so wide erasure codes may no longer work. For testing purposes, the redundancy level can be changed back to engine. For production use, the new default is strongly recommended because it more accurately reflects the actual fault domains.
- The Erasure Coding implementation now uses EC parity rotation. This significantly improves EC performance, in particular for parallel I/O into a single shared file.
- In addition to the libioil.so interception library (which can be used to intercept POSIX data I/O calls but not metadata operations), DAOS Version 2.4 includes a Technology Preview of a new interception library, libpil4dfs.so, which can also intercept POSIX metadata calls. Refer to the User Guide for more information on libpil4dfs.so, including the current limitations of this Technology Preview.
- On DAOS servers with VMD enabled, the dmg storage led identify command can now be used to visually identify one or more NVMe SSDs.
- DAOS Version 2.4 supports multi-user dfuse. This feature is particularly useful on shared nodes like login nodes: a single instance of the dfuse process can be run (as root, or under a non-root service userid), and all users can access DAOS POSIX containers through that single dfuse instance instead of starting multiple per-user dfuse instances.
- Several dfuse enhancements have been implemented, including readdir caching, interception support for streaming I/O calls, and the ability to fine-tune the dfuse caching behavior through container properties and dfuse command parameters.
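As an illustration of the commands named in the list above, the following is a minimal dry-run sketch, not a verified procedure. The command and option names daos_server scm prepare, --scm-ns-per-socket, dmg pool {exclude|drain|reintegrate} and dmg pool extend come from these notes. The --rank, --ranks, --mountpoint and --multi-user options, the pool label, the rank numbers, and the mount point are assumptions made for illustration; check them against the Administration Guide and User Guide before use. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

POOL="${POOL:-tank}"            # hypothetical pool label
RANK="${RANK:-3}"               # hypothetical engine rank
MNT="${MNT:-/mnt/dfuse}"        # hypothetical mount point

# Request two SCM namespaces per CPU socket (the value 2 is an example).
run daos_server scm prepare --scm-ns-per-socket=2

# Pool membership changes; the --rank option is an assumption.
pool_member() {
    # $1 = exclude | drain | reintegrate, $2 = rank, $3 = pool
    run dmg pool "$1" --rank="$2" "$3"
}
pool_member drain "$RANK" "$POOL"
pool_member reintegrate "$RANK" "$POOL"

# Add engines to an existing pool; --ranks and its values are assumptions.
run dmg pool extend --ranks=4,5 "$POOL"

# One shared dfuse instance for all users; both options are assumptions.
run dfuse --mountpoint="$MNT" --multi-user
```

Running the script as-is prints each command prefixed with "+", which makes it easy to review the sequence before executing it on a real system.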
Other notable changes
To delete a pool that still has containers configured in it,
the dmg pool destroy command now needs the --recursive option.
In dmg pool create the -p $POOL_LABEL option is now obsolete.
Use $POOL_LABEL as a positional argument (without the -p).
The daos container create command no longer supports the
-l $CONT_LABEL option. Use the container label as a
positional argument instead (without -l).
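The changed command syntax described above can be sketched as follows. This is a dry-run illustration under stated assumptions: the positional labels and the --recursive option come from these notes, while the --size and --type options and the label values are placeholders chosen for the example. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

POOL_LABEL="${POOL_LABEL:-tank}"    # hypothetical pool label
CONT_LABEL="${CONT_LABEL:-data}"    # hypothetical container label

# Pool label is now a positional argument (no -p).
# The --size value is an illustrative assumption.
create_pool() {
    run dmg pool create --size=1TB "$1"
}

# Container label is now a positional argument after the pool (no -l).
# The --type value is an illustrative assumption.
create_container() {
    run daos container create --type=POSIX "$1" "$2"
}

# A pool that still contains containers needs --recursive to be destroyed.
destroy_pool() {
    run dmg pool destroy --recursive "$1"
}

create_pool "$POOL_LABEL"
create_container "$POOL_LABEL" "$CONT_LABEL"
destroy_pool "$POOL_LABEL"
```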
Known Issues and limitations
- DAOS-11317: Running the Mellanox-provided mlnxofedinstall script to install a new version of MLNX_OFED, while the mercury-ucx RPM is already installed, will uninstall mercury-ucx (as well as mercury-ucx-debuginfo if the debuginfo RPMs are installed). This leaves DAOS non-functional after the MOFED update. Workaround: Run {yum|dnf|zypper} install mercury-ucx [mercury-ucx-debuginfo] after the MLNX_OFED update and before starting DAOS again.
- No OPA/PSM2 support. For Omni-Path fabrics, please use the ofi+tcp provider. Please refer to the "Fabric Support" section of the Support Matrix for details. No workaround is available at this point.
- The daos-client-tests and daos-server-tests RPM packages have golang prerequisites that are newer than the version provided in EL8. To install those RPMs on EL8 systems, it is necessary to run dnf module enable go-toolset:rhel8 to satisfy the golang requirements.
- DAOS-13129: With the "Metadata-on-SSD" technology preview, sporadic checksum errors have been observed in 48-hour soak stress testing. This issue is still under investigation.
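The two workarounds described in the list above (DAOS-11317 and the golang prerequisite on EL8) can be sketched as a dry-run script. The package names and the dnf module command are taken from these notes. The PKG_MGR and WITH_DEBUGINFO variables are hypothetical helpers introduced for this example, and the script assumes it is run with sufficient privileges when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

PKG_MGR="${PKG_MGR:-dnf}"               # yum, dnf, or zypper
WITH_DEBUGINFO="${WITH_DEBUGINFO:-0}"   # 1 if debuginfo RPMs were installed

# DAOS-11317: reinstall mercury-ucx after the MLNX_OFED update
# and before starting DAOS again.
reinstall_mercury_ucx() {
    if [ "$WITH_DEBUGINFO" = "1" ]; then
        run "$PKG_MGR" install mercury-ucx mercury-ucx-debuginfo
    else
        run "$PKG_MGR" install mercury-ucx
    fi
}

# EL8 only: enable the newer Go toolset before installing the test RPMs.
install_test_rpms_el8() {
    run dnf module enable go-toolset:rhel8
    run dnf install daos-client-tests daos-server-tests
}

reinstall_mercury_ucx
install_test_rpms_el8
```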
Bug fixes
The DAOS 2.4 release includes fixes for numerous defects. For details, refer to the GitHub release/2.4 commit history and the associated Jira tickets referenced in the commit messages.
Additional resources
Visit the online documentation for more information. All DAOS project source code is maintained in the https://github.com/daos-stack/daos repository. Please visit this link for more information on the licenses.
Refer to the System Deployment section of the DAOS Administration Guide for installation details.