VMD Support in DAOS
Intel VMD (Volume Management Device) is a feature introduced with the Intel Xeon Scalable processor family to help manage NVMe drives. It provides features such as surprise hot plug, LED management, error isolation and bootable RAID.
The Intel VMD functionality is provided as part of the Intel VROC (Virtual RAID on CPU) technology, and resides within the Intel Xeon CPUs. If RAID is not needed, then Intel VROC can be used in pass-through mode to turn on Intel VMD Domains only.
Note: DAOS does not use the VROC RAID functionality. DAOS erasure coding already provides data protection across servers, so VROC RAID within a single DAOS server offers no additional benefit.
Starting with DAOS 2.2, DAOS can optionally use NVMe devices that are members
of VMD domains. To use VMD, it must first be enabled in the servers' UEFI.
It can then also be enabled in the daos_server.yml configuration file,
as described below.
DAOS 2.2 allowed VMD-managed devices to be specified in the daos_server.yml
configuration file (and as arguments to some DAOS management commands), but did
not yet provide any additional functionality over non-VMD devices.
DAOS 2.4 introduces the LED management feature that requires VMD.
The following function is targeted for a future DAOS release:
* Surprise hot-plug management through VMD is a DAOS 2.6 roadmap item.
This document explains how to enable VMD in DAOS 2.4 environments. Customers who intend to utilize DAOS capabilities that depend on VMD are encouraged to enable VMD with DAOS 2.4, because changing from a non-VMD setup to VMD is not possible without reformatting the DAOS storage.
NVMe view with VMD disabled (before binding to SPDK)
The following is an example of the lspci view on a server with eight
NVMe SSDs when VMD is disabled. This is the state while the devices are
still bound to the kernel (before running daos_server storage prepare -n):
[root@nvm0806 ~]# lspci -vv | grep -i nvme
65:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
66:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
67:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
68:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
e3:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
e4:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
e5:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
e6:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
After running daos_server storage prepare -n, the NVMe SSDs are bound
to SPDK, and lspci or nvme list no longer shows them.
Enabling VMD in the server UEFI
Before DAOS can use VMD devices, VMD needs to be enabled in the servers' UEFI.
Details depend on the server vendor.
An example of this setting is DevicesandIOPorts.EnableDisableIntelVMD=Enabled.
After enabling VMD in UEFI, the servers need to be rebooted to activate it.
NVMe view with VMD enabled (before binding to SPDK)
When VMD is correctly enabled in UEFI, after the reboot the lspci output
should show new VMD controller devices in addition to the NVMe SSDs themselves.
Note that the PCIe addresses of the VMD-managed NVMe SSDs are different from
the non-VMD case, highlighting that they are now backing devices behind a VMD
controller device.
[root@nvm0806 ~]# lspci -vv | grep -i nvme
0000:64:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
0000:c9:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
0000:e2:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10000:83:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10000:84:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10001:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10002:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10002:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10002:03:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10002:04:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
In this particular example, the eight NVMe disks belong to two different VMD domains
(each VMD domain comprises 16 PCIe lanes, and each NVMe SSD uses 4 lanes).
The lspci output also shows a third VMD controller, which does not have
any NVMe backing devices. This is because this server has additional
NVMe drive slots that are not populated with NVMe SSDs.
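
To make the relationship between VMD controllers and their backing devices easier to inspect, the following minimal Python sketch splits lspci output into VMD controllers and NVMe SSDs grouped by PCI domain, and compares each group against the 16-lane / 4-lane arithmetic described above. The function name `classify_lspci`, the constants, and the abbreviated sample are illustrative assumptions, not part of DAOS. The sketch also assumes that each VMD domain appears as its own PCI domain (10000, 10001, ...) as in the listing above.

```python
import re
from collections import defaultdict

# One lspci line: optional PCI domain, then bus:device.function, then a description.
# lspci may omit the domain on hosts with a single PCI domain, so it is optional here.
_LSPCI_LINE = re.compile(
    r"^(?:(?P<domain>[0-9a-f]{4,}):)?"
    r"(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<desc>.+)$",
    re.IGNORECASE,
)

LANES_PER_VMD_DOMAIN = 16  # taken from the text above
LANES_PER_SSD = 4          # taken from the text above


def classify_lspci(text):
    """Return (vmd_controllers, ssds_by_pci_domain) parsed from lspci output."""
    controllers = []
    ssds_by_domain = defaultdict(list)
    for line in text.splitlines():
        m = _LSPCI_LINE.match(line.strip())
        if m is None:
            continue  # skip continuation lines of `lspci -vv` and blank lines
        domain = m["domain"] or "0000"
        addr = f"{domain}:{m['bdf']}"
        if "Volume Management Device" in m["desc"]:
            controllers.append(addr)
        elif "Non-Volatile memory controller" in m["desc"]:
            ssds_by_domain[domain].append(addr)
    return controllers, dict(ssds_by_domain)


# Abbreviated sample copied from the listing above (not a complete server view).
SAMPLE = """\
0000:64:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
0000:c9:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
"""

if __name__ == "__main__":
    controllers, ssds = classify_lspci(SAMPLE)
    max_ssds = LANES_PER_VMD_DOMAIN // LANES_PER_SSD  # SSDs that fit in one VMD domain
    print("VMD controllers:", controllers)
    for domain, addrs in ssds.items():
        print(f"PCI domain {domain}: {len(addrs)} of at most {max_ssds} SSDs")
```

On a real server, the output of `lspci` could be passed to `classify_lspci` instead of the sample string. A controller that has no corresponding populated domain would match the unpopulated-slot case described above.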
NVMe view with VMD enabled (after binding to SPDK)
After daos_server storage prepare -n has been run on a VMD-enabled DAOS server,
the NVMe disks are unbound from the Linux kernel and no longer show up in lspci
or nvme list (just like in the non-VMD case).
However, the VMD controller devices are still visible with lspci:
[root@nvm0806 ~]# lspci | grep -i nvme
0000:64:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 07)
0000:e2:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 07)
0000:c9:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 07)
The VMD-managed NVMe backing devices now show up in the DAOS storage scan, with their VMD IDs:
[root@nvm0806 ~]# daos_server storage scan
Scanning locally-attached storage...
NVMe PCI       Model          FW Revision Socket ID Capacity
--------       -----          ----------- --------- --------
640005:81:00.0 SSDPF2KX038T9L 2CV1L028    0         3.8 TB
640005:83:00.0 SSDPF2KX038T9L 2CV1L028    0         3.8 TB
640005:85:00.0 SSDPF2KX038T9L 2CV1L028    0         3.8 TB
640005:87:00.0 SSDPF2KX038T9L 2CV1L028    0         3.8 TB
e20005:01:00.0 SSDPF2KX038T9L 2CV1L028    1         3.8 TB
e20005:03:00.0 SSDPF2KX038T9L 2CV1L028    1         3.8 TB
e20005:05:00.0 SSDPF2KX038T9L 2CV1L028    1         3.8 TB
e20005:07:00.0 SSDPF2KX038T9L 2CV1L028    1         3.8 TB
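
In the scan output, the leading field of each backing device address (640005, e20005) appears to be built from the bus, device and function of the owning VMD controller (0000:64:00.5, 0000:e2:00.5). The following Python sketch encodes that mapping. It is inferred only from the sample output in this document, and the helper names `vmd_backing_prefix` and `backing_devices_for` are hypothetical, not DAOS APIs.

```python
import re

# Full PCI address of a VMD controller: domain:bus:device.function
_PCI_ADDR = re.compile(
    r"^(?P<domain>[0-9a-f]{4}):(?P<bus>[0-9a-f]{2}):"
    r"(?P<dev>[0-9a-f]{2})\.(?P<func>[0-7])$",
    re.IGNORECASE,
)


def vmd_backing_prefix(controller_addr):
    """Map a VMD controller address to the prefix of its backing devices.

    Assumption (inferred from the sample scan output): the prefix is the
    controller's bus, device and function, each written as two hex digits.
    """
    m = _PCI_ADDR.match(controller_addr.strip())
    if m is None:
        raise ValueError(f"not a full PCI address: {controller_addr!r}")
    return f"{m['bus']}{m['dev']}{int(m['func']):02x}".lower()


def backing_devices_for(controller_addr, scan_addrs):
    """Select the scanned addresses that sit behind the given VMD controller."""
    prefix = vmd_backing_prefix(controller_addr) + ":"
    return [addr for addr in scan_addrs if addr.lower().startswith(prefix)]


if __name__ == "__main__":
    scanned = ["640005:81:00.0", "640005:83:00.0", "e20005:01:00.0"]
    print(vmd_backing_prefix("0000:64:00.5"))
    print(backing_devices_for("0000:e2:00.5", scanned))
```

Such a mapping can help when cross-checking which SSDs a given controller entry in bdev_list covers.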
Using VMD Devices in the DAOS server configuration file
The recommended setup to use VMD devices within the DAOS server configuration file
is to list the PCIe IDs of the VMD controllers in the storage engines' bdev_list.
This ensures that all NVMe disks that are members of a VMD domain are always
managed together, and assigned to the same DAOS storage engine.
The daos_server.yml file for the above example would have one VMD domain per engine:
storage: # engine 0
-
  class: dcpm
  scm_mount: /var/daos/pmem0
  scm_list:
  - /dev/pmem0
-
  class: nvme
  bdev_list:
  - "0000:64:00.5"

storage: # engine 1
-
  class: dcpm
  scm_mount: /var/daos/pmem1
  scm_list:
  - /dev/pmem1
-
  class: nvme
  bdev_list:
  - "0000:e2:00.5"
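
As an illustration of the one-VMD-domain-per-engine layout, the following Python sketch renders the per-engine storage fragment from a list of VMD controller addresses. It is a sketch under stated assumptions rather than a DAOS tool: the function names are hypothetical, the pmem device index is assumed to equal the engine index (as in the example above), and the uniqueness check reflects the recommendation that a VMD domain be assigned to a single engine.

```python
def assign_controllers(controllers):
    """Pair each VMD controller address with an engine index (0, 1, ...).

    Assumption: a VMD controller belongs to exactly one engine, so
    duplicate addresses are rejected.
    """
    if len(set(controllers)) != len(controllers):
        raise ValueError("a VMD controller may be assigned to only one engine")
    return dict(enumerate(controllers))


def render_engine_storage(engine_idx, vmd_controller):
    """Render the storage fragment for one engine, mirroring the example layout.

    Assumption: engine N uses /dev/pmemN mounted at /var/daos/pmemN.
    """
    return "\n".join([
        f"storage: # engine {engine_idx}",
        "-",
        "  class: dcpm",
        f"  scm_mount: /var/daos/pmem{engine_idx}",
        "  scm_list:",
        f"  - /dev/pmem{engine_idx}",
        "-",
        "  class: nvme",
        "  bdev_list:",
        f'  - "{vmd_controller}"',
    ])


if __name__ == "__main__":
    # Only the two populated VMD controllers from the example are used.
    for idx, addr in assign_controllers(["0000:64:00.5", "0000:e2:00.5"]).items():
        print(render_engine_storage(idx, addr))
        print()
```

The rendered fragments are meant to be pasted into the corresponding engine sections of daos_server.yml and reviewed by hand. The unpopulated controller (0000:c9:00.5 in the example) is deliberately left out.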