DAOS Version 2.4 Support¶
Community Support and Commercial Support¶
Community support for DAOS is available through the DAOS mailing list and the DAOS Slack channel. The DAOS community JIRA tickets can be searched for known issues and possible solutions. Community support is provided on a best effort basis without any guaranteed SLAs.
The Intel DAOS engineering team can also be contracted to provide Commercial Level-3 Support for DAOS. Under such a support agreement, Intel partners that offer DAOS Commercial Support to their end customers will provide the DAOS Level-1 and Level-2 support. They can then escalate Level-2 support tickets to the Intel Level-3 support team through a dedicated JIRA path with well-defined SLAs. Please refer to the intel.com landing page for DAOS for information on the DAOS partner ecosystem.
This document describes the supported environments for Intel Level-3 support at the DAOS Version 2.4 level. Information for future releases is indicative only and may change. Partner support offerings may impose further constraints, for example if they include DAOS support as part of a more general cluster support offering with its own release cycle.
Some members of the DAOS community have reported successful compilation and basic testing of DAOS in other environments (for example on ARM64 platforms, or on other Linux distributions). Those activities are highly appreciated community contributions. However, such environments are not currently supported by Intel for production use.
Hardware platforms supported for DAOS Servers¶
DAOS Version 2.4 supports the x86_64 architecture.
DAOS servers require byte-addressable Storage Class Memory (SCM) for the DAOS metadata. There are two ways to implement SCM in a DAOS server: using Persistent Memory, or using DRAM combined with logging to NVMe SSDs.
DAOS Servers with Persistent Memory¶
All DAOS versions support Intel Optane Persistent Memory (PMem) as their SCM layer. DAOS Version 2.4 has been validated with Intel Optane Persistent Memory 100 Series on 2nd gen Intel Xeon Scalable processors, and with Intel Optane Persistent Memory 200 Series on 3rd gen Intel Xeon Scalable processors.
For maximum performance, it is strongly recommended that all memory channels of a DAOS server are populated with one DRAM module and one Optane PMem module. All Optane PMem modules in a DAOS server must have the same capacity.
Note that the Intel Optane Persistent Memory 300 Series for 4th gen Intel Xeon Scalable processors has been cancelled, and is not supported by DAOS.
PMDK is used as the programming interface when using Optane Persistent Memory.
DAOS Servers without Persistent Memory¶
To support DAOS servers without Optane Persistent Memory, DAOS Version 2.4 includes a Technology Preview of the Metadata-on-SSD feature. This code path uses DRAM memory to hold the DAOS metadata, and persists the DAOS metadata on NVMe SSDs through a write-ahead log (WAL) and asynchronous metadata checkpointing.
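The interplay of DRAM, write-ahead log, and checkpoint described above can be illustrated with a toy model. This is a minimal sketch of the general technique only, not the DAOS implementation: the class `ToyMetadataStore`, its methods, and the in-memory lists that stand in for NVMe SSDs are all hypothetical names introduced for illustration, and the checkpoint here is synchronous for simplicity.

```python
import json
from typing import Dict, List


class ToyMetadataStore:
    """Toy model: metadata lives in a dict (DRAM); durability comes from a WAL plus checkpoints."""

    def __init__(self) -> None:
        self.dram: Dict[str, str] = {}        # volatile working copy of the metadata
        self.wal: List[str] = []              # stands in for the write-ahead log on NVMe
        self.checkpoint: Dict[str, str] = {}  # stands in for the last snapshot on NVMe

    def put(self, key: str, value: str) -> None:
        # Write-ahead ordering: append the log record first, then update memory.
        self.wal.append(json.dumps({"key": key, "value": value}))
        self.dram[key] = value

    def do_checkpoint(self) -> None:
        # Persist a snapshot of the in-memory state; the log can then be truncated.
        self.checkpoint = dict(self.dram)
        self.wal.clear()

    def recover(self) -> Dict[str, str]:
        # Rebuild state after a restart: last checkpoint plus replay of the log.
        state = dict(self.checkpoint)
        for record in self.wal:
            entry = json.loads(record)
            state[entry["key"]] = entry["value"]
        return state


store = ToyMetadataStore()
store.put("pool-a", "v1")
store.do_checkpoint()
store.put("pool-a", "v2")
store.put("cont-b", "v1")
store.dram.clear()           # simulate losing the DRAM contents
recovered = store.recover()
print(recovered)
```

In this sketch, updates made after the last checkpoint survive the simulated DRAM loss because they are replayed from the log.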
More details on the Metadata-on-SSD functionality can be found in the article DAOS beyond Persistent Memory in the ISC High Performance 2023 International Workshops proceedings and in the DAOS Administration Guide.
For maximum performance, it is strongly recommended that all memory channels of a DAOS server are populated.
NVMe Storage in DAOS Servers¶
While not strictly required, DAOS servers typically include NVMe disks for bulk storage, which must be supported by SPDK. (NVMe storage can be emulated by files on non-NVMe storage for development and testing purposes, but this is not supported in a production environment.) All NVMe disks managed by a single DAOS engine must have identical capacity, and it is strongly recommended to use identical drive models. It is also strongly recommended that all DAOS engines in a DAOS system have identical NVMe storage configurations. The number of targets per DAOS engine must be identical for all DAOS engines.
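The two hard requirements in this paragraph (identical NVMe capacity within an engine, identical target count across engines) lend themselves to a simple pre-deployment check. The following is a sketch under stated assumptions: `EngineConfig` and `check_engines` are hypothetical names, and the inventory is supplied by hand rather than read from any DAOS tool or configuration file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineConfig:
    """Hypothetical description of one DAOS engine's storage layout."""
    name: str
    targets: int
    nvme_capacities_gb: List[int]


def check_engines(engines: List[EngineConfig]) -> List[str]:
    """Return a list of violations of the two 'must' rules stated above."""
    problems = []
    for engine in engines:
        # All NVMe disks managed by a single engine must have identical capacity.
        if len(set(engine.nvme_capacities_gb)) > 1:
            problems.append(f"{engine.name}: NVMe disks differ in capacity")
    # The number of targets must be identical for all engines.
    if len({engine.targets for engine in engines}) > 1:
        problems.append("engines differ in number of targets")
    return problems


good = [
    EngineConfig("engine0", 16, [3840, 3840, 3840, 3840]),
    EngineConfig("engine1", 16, [3840, 3840, 3840, 3840]),
]
bad = [
    EngineConfig("engine0", 16, [3840, 1920]),
    EngineConfig("engine1", 8, [3840, 3840]),
]
print(check_engines(good))
print(check_engines(bad))
```

The "strongly recommended" items (identical drive models, identical NVMe configurations across engines) are deliberately left out of the check, since they are recommendations rather than requirements.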
DAOS Version 2.4 supports Intel Volume Management Devices (VMD) to manage the NVMe disks on the DAOS servers. Enabling VMD is platform-dependent; details are provided in the Administration Guide.
Each DAOS engine needs one high-speed network port for communication in the DAOS data plane. DAOS Version 2.4 does not support more than one high-speed network port per DAOS engine. (It is possible that two DAOS engines on a 2-socket server share a single high-speed network port for development and testing purposes, but this is not supported in a production environment.) It is strongly recommended that all DAOS engines in a DAOS system use the same model of high-speed fabric adapter. Heterogeneous adapter population across DAOS engines has not been tested, and running with such configurations may cause unexpected behavior. Please refer to "Fabric Support" below for more details.
Hardware platforms supported for DAOS Clients¶
DAOS Version 2.4 supports the x86_64 architecture.
DAOS clients have no specific hardware dependencies.
Each DAOS client needs a network port on the same high-speed interconnect that the DAOS servers are connected to. Multiple high-speed network ports per DAOS client are supported. Note that a single task on a DAOS client always uses a single network port. When multiple tasks run on a client node, the DAOS agent distributes the load by allocating different network ports to different tasks.
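To make the per-task port allocation concrete, here is a minimal sketch that spreads tasks over the available ports in round-robin order. Round-robin is an assumption made for illustration; this section does not specify which policy the DAOS agent actually applies. The function `assign_ports` and the interface names `ib0` and `ib1` are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def assign_ports(task_ids: Iterable[int], ports: List[str]) -> Dict[int, str]:
    """Give each task exactly one port, cycling through the ports in order."""
    port_cycle = cycle(ports)
    return {task: next(port_cycle) for task in task_ids}


# Four tasks on a client node with two high-speed ports (hypothetical names).
assignment = assign_ports(range(4), ["ib0", "ib1"])
print(assignment)
```

Each task keeps a single port, which matches the statement that one task never uses more than one network port.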
Operating Systems supported for DAOS Servers¶
The DAOS software stack is built and supported on Linux for the x86_64 architecture.
DAOS Version 2.4 has been primarily validated on Rocky Linux 8.6 and openSUSE Leap 15.4. The following subsections provide details on the Linux distributions which DAOS Version 2.4 supports on DAOS servers.
Note that all DAOS servers in a DAOS server cluster (also called DAOS system) must run the same Linux distribution. DAOS clients that access a DAOS server cluster can run the same or different Linux distributions.
SUSE Linux Enterprise Server 15 and openSUSE Leap 15¶
DAOS Version 2.4 is supported on SLES 15 SP4 and openSUSE Leap 15.4.
General support for SLES 15 SP3 ended on 31-Dec-2022. DAOS nodes running SLES 15 SP3 or openSUSE Leap 15.3 must be updated to 15.4 before updating DAOS to Version 2.4.
Refer to the SLES Life Cycle description on the SUSE support website for information on SLES support phases.
Enterprise Linux 8 (EL8): RHEL 8, Rocky Linux 8, AlmaLinux 8¶
DAOS Version 2.4.0 is supported on EL 8.6 with Extended Update Support (EUS). Support for the EL 8.7 release has ended, and DAOS Version 2.4 is not supported on EL 8.7. Validation of DAOS Version 2.4 on EL 8.8 is in progress.
Most validation of DAOS Version 2.4 has been done on the Rocky Linux 8.6 release.
CentOS Linux 8 is not supported by DAOS Version 2.4. Please install a supported EL8 operating system before deploying DAOS Version 2.4.
Refer to the RHEL Life Cycle description on the Red Hat support website for information on RHEL support phases.
Enterprise Linux 9 (EL9): RHEL 9, Rocky Linux 9, AlmaLinux 9¶
DAOS Version 2.4.0 has not been validated and is not supported on EL9. Support for EL 9.2 (or later) will be added in DAOS Version 2.6.
Unsupported Linux Distributions¶
With DAOS Version 2.4, CentOS 7 and RHEL 7 are no longer supported. Please update your DAOS servers to a supported EL8 level before updating to DAOS 2.4.
DAOS also does not support openSUSE Tumbleweed, Fedora, CentOS Linux, CentOS Stream, Ubuntu, or Oracle Linux.
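The operating system rules in the sections above can be condensed into a small lookup. This is a sketch under stated assumptions: the table only transcribes what this document states for DAOS Version 2.4, the distribution keys are illustrative labels chosen for this example, and the helper names `is_supported` and `same_distribution` are hypothetical. EL 8.8 is treated as unsupported here because its validation is described as still in progress.

```python
from typing import Dict, Iterable, Set

# Supported OS levels for DAOS Version 2.4, transcribed from this document.
# The keys are illustrative labels, not values read from any DAOS tool.
SUPPORTED_OS: Dict[str, Set[str]] = {
    "sles": {"15.4"},            # SLES 15 SP4
    "opensuse-leap": {"15.4"},
    "rhel": {"8.6"},
    "rocky": {"8.6"},
    "almalinux": {"8.6"},
}


def is_supported(distro: str, version: str) -> bool:
    """True if the distribution/version pair is listed as supported above."""
    return version in SUPPORTED_OS.get(distro.lower(), set())


def same_distribution(server_distros: Iterable[str]) -> bool:
    """All DAOS servers in one DAOS system must run the same distribution."""
    return len({d.lower() for d in server_distros}) <= 1


print(is_supported("rocky", "8.6"))
print(is_supported("rhel", "9.2"))
print(same_distribution(["rocky", "rocky", "sles"]))
```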
Operating Systems supported for DAOS Clients¶
The DAOS software stack is built and supported on Linux for the x86_64 architecture.
In DAOS Version 2.4, the supported Linux distributions and versions for DAOS clients are identical to those for DAOS servers. Please refer to the previous section for details.
In future DAOS releases, DAOS client support may be added for additional Linux distributions and/or versions.
Fabric Support¶
DAOS Version 2.4 supports both OFI libfabric and UCF UCX for communication in the DAOS data plane. This section describes the supported network providers and contains references to vendor-specific information for the supported networking hardware.
With the exception of UCX for InfiniBand networks, OFI libfabric is the recommended networking stack for DAOS. DAOS Version 2.4 ships with version 1.18.1 of libfabric (but see below for DAOS on HPE Slingshot). It is strongly recommended to use exactly the provided libfabric version on all DAOS servers and all DAOS clients.
The RPM distribution of DAOS includes libfabric RPM packages with the correct version.
Not all libfabric core providers listed in fi_provider(7) are supported by DAOS. The following providers are supported:
- The ofi+tcp provider is supported on all networking hardware. It does not use RDMA, so on an RDMA-capable network this provider typically does not achieve the maximum performance of the fabric.
- The ofi+verbs provider is supported for RDMA communication over InfiniBand fabrics. Note that as an alternative to libfabric, the UCX networking stack can be used on InfiniBand fabrics as described in the next subsection.
- The ofi+cxi provider is supported for RDMA communication over Slingshot.
Note: Starting with libfabric 1.18.0, libfabric has support for TCP without rxm. To support this, DAOS 2.4 no longer automatically adds rxm to the provider string. To use rxm with DAOS 2.4, it has to be explicitly added to the provider string.
- The ofi+psm2 provider for Omni-Path fabrics has known issues when used with DAOS, and it has been removed from DAOS Version 2.4.
- The ofi+psm3 provider for Ethernet fabrics has not been validated with, and is not supported by, DAOS Version 2.4.
UCF Unified Communication X (UCX)¶
For InfiniBand fabrics, DAOS 2.4 also supports UCX, which is maintained by the Unified Communication Framework (UCF) consortium.
DAOS Version 2.4 has been validated primarily with UCX Version 1.14.0-1, which is included in the MLNX_OFED 5.8 levels listed in the next section. UCX Version 1.15.0-1 (included in MLNX_OFED 5.9 and 23.04 and 23.07) has also been validated with DAOS 2.4.0.
- The ucx+dc_x provider has been validated and is supported with DAOS Version 2.4. It is the recommended fabric provider on InfiniBand fabrics.
- The ucx+ud_x providers can be used for evaluation and testing purposes, but they have not been fully validated with DAOS Version 2.4 and are not supported for use in production environments.
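The provider statements in the libfabric and UCX subsections can be summarized as a lookup table. This is a minimal sketch: the table only contains providers named in this document, the status labels are wording chosen for this example, and the function `provider_status` is a hypothetical helper, not part of any DAOS tool.

```python
from typing import Dict

# Provider status for DAOS Version 2.4, transcribed from this document.
PROVIDER_STATUS: Dict[str, str] = {
    "ofi+tcp": "supported",
    "ofi+verbs": "supported",
    "ofi+cxi": "supported",
    "ucx+dc_x": "supported",
    "ucx+ud_x": "evaluation only",
    "ofi+psm2": "not supported",   # removed from DAOS Version 2.4
    "ofi+psm3": "not supported",   # not validated with DAOS Version 2.4
}


def provider_status(provider: str) -> str:
    """Return the documented status, or 'unknown' for providers not named here."""
    return PROVIDER_STATUS.get(provider.strip().lower(), "unknown")


for name in ("ofi+verbs", "ucx+ud_x", "ofi+psm2", "ofi+sockets"):
    print(name, "->", provider_status(name))
```

Returning "unknown" rather than "not supported" for unlisted names keeps the sketch from asserting anything this document does not state.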
NVIDIA/Mellanox OFED (MLNX_OFED)¶
DAOS Version 2.4 has been primarily validated with MLNX_OFED Version 5.8 (LTS). Versions older than 5.8-1 are not supported by DAOS 2.4.
Validation of MLNX_OFED 5.9 and 23.04 is in progress.
Links to MLNX_OFED Release Notes:
- MLNX_OFED 5.8-126.96.36.199 (October 31, 2022)
- MLNX_OFED 5.8-188.8.131.52 (December 1, 2022)
- MLNX_OFED 5.8-184.108.40.206 (February 28, 2023)
- MLNX_OFED 5.8-220.127.116.11 (July 09, 2023)
- MLNX_OFED 5.9-0.5.6.0 (February 2, 2023)
- MLNX_OFED 23.04-0.5.3.3 (May 8, 2023)
- MLNX_OFED 23.04-18.104.22.168 (June 1, 2023)
- MLNX_OFED 23.07-0.5.0.0 (August 10, 2023)
It is strongly recommended that all DAOS servers and all DAOS clients run the same version of MLNX_OFED, and that the InfiniBand adapters are updated to the firmware levels that are included in that MLNX_OFED distribution. It is also strongly recommended that the same model of InfiniBand fabric adapter is used in all DAOS servers. DAOS Version 2.4 has not been tested with heterogeneous InfiniBand adapter configurations. The only exception to this recommendation is the mix of single-port and dual-port adapters of the same generation, where only one of the ports of the dual-port adapter(s) is used by DAOS.
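The minimum-version rule (nothing older than 5.8-1) and the recommendation to run one MLNX_OFED version everywhere can be checked with a few lines of code. This is a sketch under stated assumptions: the helper names are hypothetical, version strings are assumed to consist of integers separated by dots and a hyphen, and the string `5.7-1.0.2.0` is a made-up example of an older level. The check says nothing about whether a given newer version has been validated.

```python
import re
from typing import Iterable, Tuple

MIN_MLNX_OFED = (5, 8, 1)  # "older than 5.8-1 are not supported"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '5.9-0.5.6.0' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def meets_minimum(version: str) -> bool:
    """True if the version is not older than 5.8-1."""
    return parse_version(version) >= MIN_MLNX_OFED


def all_same_version(versions: Iterable[str]) -> bool:
    """Recommended: all DAOS servers and clients run the same MLNX_OFED version."""
    return len({parse_version(v) for v in versions}) <= 1


print(meets_minimum("5.9-0.5.6.0"))
print(meets_minimum("5.7-1.0.2.0"))
print(all_same_version(["23.04-0.5.3.3", "23.07-0.5.0.0"]))
```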
Customers using an HPE Slingshot fabric should contact their HPE representatives for information on the recommended HPE software stack to use with DAOS Version 2.4 and the libfabric CXI provider.
DAOS is a scale-out storage solution that is designed for extreme scale. This section summarizes the DAOS scaling targets, some DAOS architectural limits, and the current testing limits of DAOS Version 2.4.
Note: Scaling characteristics depend on the properties of the high-performance interconnect and on the libfabric provider that is used. The DAOS scaling targets below assume a non-blocking, RDMA-capable fabric. Most scaling tests so far have been performed on InfiniBand fabrics with libfabric.
DAOS scaling targets (these are order of magnitude figures that indicate what the DAOS architecture should support - see below for the scales at which DAOS 2.4 has been validated):
- DAOS client nodes in a DAOS system: 10^5 (hundreds of thousands)
- DAOS servers in a DAOS system: 10^3 (thousands)
- DAOS engines per DAOS server: 10^0 (less than ten)
- DAOS engines per CPU socket: 10^0 (1, 2 or 4)
- DAOS targets per DAOS engine: 10^1 (tens)
- SCM storage devices per DAOS engine: 10^1 (tens)
- NVMe storage devices per DAOS engine: 10^1 (tens)
- DAOS pools in a DAOS system: 10^2 (hundreds)
- DAOS containers in a DAOS pool: 10^2 (hundreds)
- DAOS objects in a DAOS container: 10^10 (tens of billions)
- Application tasks accessing a DAOS container: 10^6 (millions)
Note that DAOS has an architectural limit of 2^16 = 65536 storage targets in a DAOS system, because the number of storage targets is encoded in 16 of the 32 "DAOS internal bits" within the 128-bit DAOS Object ID.
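As a worked example of what this limit implies, the following sketch computes how many engines fit under the 16-bit target limit for a given number of targets per engine. The function name `max_engines` is hypothetical, and the sample values 4, 16 and 32 are simply picked from the targets-per-engine range mentioned in this section.

```python
TARGET_BITS = 16                  # bits of the Object ID that encode the target count
MAX_TARGETS = 2 ** TARGET_BITS    # 65536 storage targets per DAOS system


def max_engines(targets_per_engine: int) -> int:
    """Upper bound on engines in one system for a uniform targets-per-engine count."""
    if targets_per_engine <= 0:
        raise ValueError("targets_per_engine must be positive")
    return MAX_TARGETS // targets_per_engine


for targets in (4, 16, 32):
    print(f"{targets} targets/engine -> at most {max_engines(targets)} engines")
```

With 32 targets per engine the bound is 2048 engines, and with 4 targets per engine it is 16384 engines.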
DAOS Version 2.4 has been validated at the following scales:
- DAOS client nodes in a DAOS system: 256
- DAOS servers in a DAOS system: 256
- DAOS engines per DAOS server: 1, 2 and 4
- DAOS engines per CPU socket: 1 and 2
- DAOS targets per DAOS engine: 4-32
- SCM storage devices per DAOS engine: 6 (Optane PMem 100), 8 (Optane PMem 200)
- NVMe storage devices per DAOS engine: 0 (PMem-only pools), 4-12
- DAOS pools in a DAOS system: 100
- DAOS containers in a DAOS pool: 100
- DAOS objects in a DAOS container: 6 billion (in mdtest benchmarks)
- Application tasks accessing a DAOS container: 3072 (using verbs)
This test coverage will be expanded in subsequent DAOS releases.