UCX Fabric Support (DAOS 2.2 Technology Preview)
DAOS 2.2 includes a technology preview of UCX support for clusters using InfiniBand, as an alternative to the default libfabric network stack.
Note
UCX support is available on EL8 and Leap15 only. It is not supported on CentOS7.
The goal of this technology preview is to allow early evaluation and testing. DAOS over UCX has not been fully validated yet, and it is not recommended to use it in a production environment with DAOS 2.2. It is a roadmap item to fully support UCX in DAOS 2.4.
Note
The network provider is an immutable property of a DAOS system. Changing the network provider to UCX requires that the DAOS storage is reformatted.
To enable DAOS UCX support on InfiniBand fabrics, the following steps are needed:
- A supported version of MLNX_OFED must be installed before DAOS is installed. This is the same for libfabric and for UCX: DAOS only supports the NVIDIA-provided MLNX_OFED stack, not the inbox drivers. Refer to the DAOS Support Matrix for information about supported MLNX_OFED releases.
- The mercury-ucx RPM package needs to be manually selected for installation. For the technology preview, the mercury package is provided in two different versions, which are mutually exclusive:
  - The standard mercury RPM supports libfabric, but not UCX. This RPM is installed by default, and must be used in non-InfiniBand environments.
  - A new mercury-ucx RPM is also provided, which supports both libfabric and UCX. This RPM must be used in InfiniBand environments when the intention is to use UCX. It may also be used in InfiniBand environments if the intention is to use libfabric. Attempts to install this RPM in non-InfiniBand environments will fail, because it has a dependency on UCX packages.
- At DAOS installation time, to enable UCX support the new mercury-ucx RPM package must be explicitly listed in order to prevent the installation of the default mercury package (which does not include the UCX support). For example, using the yum package manager on EL8:
# on DAOS_ADMIN nodes:
yum install mercury-ucx daos-admin
# on DAOS_SERVER nodes:
yum install mercury-ucx daos-server
# on DAOS_CLIENT nodes:
yum install mercury-ucx daos-client
- To change an existing DAOS installation from libfabric to UCX, the default mercury RPM first needs to be uninstalled, and the mercury-ucx RPM must be installed instead. To prevent the removal of DAOS altogether (it has a package dependency on mercury), the rpm command with the --nodeps option should be used:
# on EL8:
rpm -e --nodeps mercury
yum install mercury-ucx
# on Leap15:
rpm -e --nodeps mercury
zypper install mercury-ucx
- To update from DAOS 2.0 (with libfabric) to DAOS 2.2 with UCX, the recommended path is to first perform a standard DAOS RPM update (which will update the default mercury package). After the update, the mercury RPM package can be replaced by mercury-ucx as described above.
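The package replacement described in the list above could be wrapped in a small guard script so that it is safe to re-run on a node. The following is a minimal sketch, not a DAOS-provided tool: the function names mercury_variant, run, and swap_to_mercury_ucx and the DRY_RUN variable are hypothetical and exist only for illustration. It assumes an RPM-based node (EL8 or Leap15) and uses only the rpm, yum, and zypper commands already shown in this section.

```shell
#!/bin/sh
# Sketch: replace the default mercury RPM with mercury-ucx on one node.
# All function and variable names are illustrative, not part of DAOS.

# Print which mercury package is installed: mercury-ucx, mercury, or none.
# `rpm -q` exits non-zero when the named package is not installed.
mercury_variant() {
    if rpm -q mercury-ucx >/dev/null 2>&1; then
        echo "mercury-ucx"
    elif rpm -q mercury >/dev/null 2>&1; then
        echo "mercury"
    else
        echo "none"
    fi
}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: swap_to_mercury_ucx <yum|zypper>   (needs root unless DRY_RUN=1)
swap_to_mercury_ucx() {
    pkg_mgr="$1"
    case "$pkg_mgr" in
        yum|zypper) ;;
        *) echo "unsupported package manager: $pkg_mgr" >&2; return 1 ;;
    esac

    case "$(mercury_variant)" in
        mercury-ucx)
            echo "mercury-ucx already installed; nothing to do"
            return 0 ;;
        mercury)
            # --nodeps leaves the DAOS packages that depend on mercury in place.
            run rpm -e --nodeps mercury || return 1 ;;
    esac
    run "$pkg_mgr" install mercury-ucx
}
```

After sourcing the script, calling swap_to_mercury_ucx zypper on a Leap15 node with DRY_RUN=1 set prints the commands it would run without changing anything, which makes it easy to review the steps before running them as root.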
After UCX support has been enabled by installing the mercury-ucx package, the network provider must be changed in the DAOS server's configuration file (/etc/daos/daos_server.yml). A sample YML file is available on GitHub. The recommended setting for UCX is provider: ucx+dc_x.
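As an illustration of the configuration change, the sketch below rewrites the provider entry of a server configuration file in place. It assumes that the file contains a single top-level line beginning with provider: (no leading whitespace). The set_provider function name is hypothetical and not part of DAOS. The sed expression uses | as its delimiter, so it is only suitable for provider strings that contain neither | nor &.

```shell
#!/bin/sh
# Sketch: set the top-level "provider:" key in a DAOS server config file.
# set_provider is an illustrative helper, not a DAOS command.
# Usage: set_provider <config-file> <provider-string>
set_provider() {
    cfg="$1"
    provider="$2"

    # Refuse to continue if there is no top-level provider line to rewrite.
    if ! grep -q '^provider:' "$cfg"; then
        echo "no top-level provider: line found in $cfg" >&2
        return 1
    fi

    tmp="$(mktemp)" || return 1
    # Rewrite only lines that start with "provider:" and keep the rest as-is.
    if sed "s|^provider:.*|provider: ${provider}|" "$cfg" > "$tmp"; then
        # Copy the content back so the original file's owner and mode are kept.
        cat "$tmp" > "$cfg"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, set_provider /etc/daos/daos_server.yml ucx+dc_x would apply the recommended UCX setting. Because the network provider is an immutable property of a DAOS system, this change only makes sense together with the storage reformat mentioned at the top of this section.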