DAOS Set-Up on OpenSUSE

Introduction

The following instructions detail how to install, set up and start DAOS servers and clients on two or more nodes. This document includes instructions for openSUSE/SLES-compatible distributions.

For setup instructions on RHEL and RHEL clones, refer to the RHEL setup.

For more details, including the prerequisite steps before installing DAOS, refer to the DAOS administration guide.

Requirements

The following steps require two or more hosts, which are divided into admin, client, and server roles. One node can serve multiple roles; for example, the admin role can reside on a server, on a client, or on a dedicated admin node.

All nodes must have:

  • sudo access configured

  • password-less ssh configured

  • a parallel command launcher such as pdsh or clush installed
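
The tool requirements above can be checked on a node before going further. The following is a minimal sketch, not part of the DAOS tooling: missing_tools and have_launcher are hypothetical helper names, and the check only confirms that the commands are on PATH, not that sudo or password-less ssh is configured correctly.

```shell
# Print the name of every listed tool that is not found on PATH.
# Usage: missing_tools TOOL...
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed if at least one of the listed parallel launchers is installed.
# Usage: have_launcher TOOL...
have_launcher() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 && return 0
    done
    return 1
}
```

On each node, running missing_tools sudo ssh should print nothing, and have_launcher clush pdsh should succeed.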

In addition the server nodes should also have:

The commands on this page assume that the following shell variables are defined:

  • ADMIN_NODES
  • CLIENT_NODES
  • SERVER_NODES
  • ALL_NODES

For example, if you want to use admin-1 as the admin node, client-1 and client-2 as client nodes, and server-1 and server-2 as server nodes, these variables would be defined as:

ADMIN_NODES="admin-1"

CLIENT_NODES="client-1,client-2"

SERVER_NODES="server-1,server-2"

ALL_NODES="$ADMIN_NODES,$CLIENT_NODES,$SERVER_NODES"

Note

If a client node is also serving as an admin node, exclude $ADMIN_NODES from the ALL_NODES assignment to prevent duplication. For example: ALL_NODES=$CLIENT_NODES,$SERVER_NODES
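
As an alternative to editing the assignment by hand, ALL_NODES could be built so that repeated host names are dropped automatically. This is a sketch under the assumption that each variable holds a plain comma-separated list of host names; bracketed ranges such as server-[1-2] are not expanded by it.

```shell
# Example role assignments; client-1 doubles as the admin node here.
ADMIN_NODES="client-1"
CLIENT_NODES="client-1,client-2"
SERVER_NODES="server-1,server-2"

# Join the role lists, split on commas, drop empty and repeated names
# (keeping first-seen order), then join back into one comma-separated list.
ALL_NODES=$(printf '%s\n' "$ADMIN_NODES,$CLIENT_NODES,$SERVER_NODES" \
    | tr ',' '\n' \
    | awk 'NF && !seen[$0]++' \
    | paste -sd, -)

echo "$ALL_NODES"   # prints client-1,client-2,server-1,server-2
```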

RPM Installation

In this section the required RPMs are installed on each node according to its role. Admin nodes require the daos-admin RPM, client nodes require the daos-client RPM, and server nodes require the daos-server RPM.

  1. Configure access to the DAOS package repository:

    clush -B -w $ALL_NODES 'sudo zypper ar https://packages.daos.io/v2.6/Leap15/packages/x86_64/daos_packages.repo'
    
  2. Import the GPG key on all nodes:

    clush -B -w $ALL_NODES 'sudo rpm --import https://packages.daos.io/RPM-GPG-KEY'
    
  3. Refresh zypper:

    clush -B -w $ALL_NODES 'sudo zypper --non-interactive refresh'
    
  4. Install the mercury-ucx or mercury-libfabric RPMs on all nodes. For more details, refer to Network Requirements.

    clush -B -w $ALL_NODES 'sudo zypper install -y mercury-ucx'
    
  5. Install the daos-admin RPMs on the admin nodes:

    clush -B -w $ADMIN_NODES 'sudo zypper install -y daos-admin'
    
  6. Install the daos-server RPMs on the server nodes:

    clush -B -w $SERVER_NODES 'sudo zypper install -y daos-server'
    
  7. Install the daos-client RPMs on the client nodes:

    clush -B -w $CLIENT_NODES 'sudo zypper install -y daos-client'
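
After steps 5 to 7, it can be useful to confirm that each role received its package. The sketch below only prints the query commands so they can be reviewed before being run. pkg_for_role and verify_cmd are hypothetical helpers, and the role-to-package mapping is taken from the install steps above.

```shell
# Map a node role to the DAOS RPM that the steps above install for it.
pkg_for_role() {
    case "$1" in
        admin)  echo daos-admin ;;
        client) echo daos-client ;;
        server) echo daos-server ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the package query for one role and node list.
# Usage: verify_cmd ROLE NODE_LIST
verify_cmd() {
    pkg=$(pkg_for_role "$1") || return 1
    printf "clush -B -w %s 'rpm -q %s'\n" "$2" "$pkg"
}

verify_cmd admin  "admin-1"
verify_cmd client "client-1,client-2"
verify_cmd server "server-1,server-2"
```

Each printed line is an ordinary clush command; rpm -q reports the installed version of the package or states that it is not installed.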
    

Generate certificates

In this section certificates will be generated and installed for encrypting the DAOS control plane communications.

Administrative nodes require the following certificate files:

  • CA root certificate (daosCA.crt) owned by the current user

  • Admin certificate (admin.crt) owned by the current user

  • Admin key (admin.key) owned by the current user

Client nodes require the following certificate files:

  • CA root certificate (daosCA.crt) owned by the daos_agent user

  • Agent certificate (agent.crt) owned by the daos_agent user

  • Agent key (agent.key) owned by the daos_agent user

Server nodes require the following certificate files:

  • CA root certificate (daosCA.crt) owned by the daos_server user

  • Server certificate (server.crt) owned by the daos_server user

  • Server key (server.key) owned by the daos_server user

  • A copy of the agent certificate (agent.crt), placed in the clients subdirectory and owned by the daos_server user

See Certificate Configuration for more information.

Note

The following commands are run on one of the $ADMIN_NODES.

  1. Generate a new set of certificates:

    cd /tmp
    /usr/lib64/daos/certgen/gen_certificates.sh
    

    Note

    These files should be protected from unauthorized access and preserved for future use.

  2. Copy the certificates to a common location on each node in order to move them to the final location:

    clush -B -w $ALL_NODES --copy /tmp/daosCA --dest /tmp
    
  3. Copy the certificates to their default location (/etc/daos/certs) on each admin node:

    clush -B -w $ADMIN_NODES sudo cp /tmp/daosCA/certs/daosCA.crt /etc/daos/certs/.
    clush -B -w $ADMIN_NODES sudo cp /tmp/daosCA/certs/admin.crt /etc/daos/certs/.
    clush -B -w $ADMIN_NODES sudo cp /tmp/daosCA/certs/admin.key /etc/daos/certs/.
    

    Note

    If the /etc/daos/certs directory does not exist on the admin nodes, use the following command to create it:

    clush -B -w $ADMIN_NODES sudo mkdir /etc/daos/certs
    
  4. Copy the certificates to their default location (/etc/daos/certs) on each client node:

    clush -B -w $CLIENT_NODES sudo cp /tmp/daosCA/certs/daosCA.crt /etc/daos/certs/.
    clush -B -w $CLIENT_NODES sudo cp /tmp/daosCA/certs/agent.crt /etc/daos/certs/.
    clush -B -w $CLIENT_NODES sudo cp /tmp/daosCA/certs/agent.key /etc/daos/certs/.
    

    Note

    If the /etc/daos/certs directory does not exist on the client nodes, use the following command to create it:

    clush -B -w $CLIENT_NODES sudo mkdir /etc/daos/certs
    
  5. Copy the certificates to their default location (/etc/daos/certs) on each server node:

    clush -B -w $SERVER_NODES sudo cp /tmp/daosCA/certs/daosCA.crt /etc/daos/certs/.
    clush -B -w $SERVER_NODES sudo cp /tmp/daosCA/certs/server.crt /etc/daos/certs/.
    clush -B -w $SERVER_NODES sudo cp /tmp/daosCA/certs/server.key /etc/daos/certs/.
    clush -B -w $SERVER_NODES sudo cp /tmp/daosCA/certs/agent.crt /etc/daos/certs/clients/agent.crt
    
  6. Clean up the temporary directory:

    clush -B -w $ALL_NODES sudo rm -rf /tmp/daosCA
    
  7. Set the ownership of the admin certificates on each admin node:

    clush -B -w $ADMIN_NODES sudo chown $USER: /etc/daos/certs/daosCA.crt
    clush -B -w $ADMIN_NODES sudo chown $USER: /etc/daos/certs/admin.*
    
  8. Set the ownership of the client certificates on each client node:

    clush -B -w $CLIENT_NODES sudo chown daos_agent:daos_agent /etc/daos/certs/daosCA.crt
    clush -B -w $CLIENT_NODES sudo chown daos_agent:daos_agent /etc/daos/certs/agent.*
    
  9. Set the ownership of the server certificates on each server node:

    clush -B -w $SERVER_NODES sudo chown daos_server:daos_server /etc/daos/certs/daosCA.crt
    clush -B -w $SERVER_NODES sudo chown daos_server:daos_server /etc/daos/certs/server.*
    clush -B -w $SERVER_NODES sudo chown daos_server:daos_server /etc/daos/certs/clients/agent.crt
    clush -B -w $SERVER_NODES sudo chown daos_server:daos_server /etc/daos/certs/clients
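
Once the ownership steps are complete, each file can be spot-checked against the requirements listed at the start of this section. The helper below is a sketch, not a DAOS tool: check_owner is a hypothetical name, and it relies on GNU stat (the -c '%U' format), which is an assumption about the coreutils on the target nodes.

```shell
# Succeed when FILE is owned by the expected user. Otherwise print a
# one-line report and fail.
# Usage: check_owner FILE EXPECTED_USER
check_owner() {
    file=$1
    expected=$2
    # GNU stat prints the owning user name for the %U format.
    actual=$(stat -c '%U' "$file" 2>/dev/null) || {
        echo "MISSING $file"
        return 1
    }
    [ "$actual" = "$expected" ] && return 0
    echo "WRONG OWNER $file: $actual (expected $expected)"
    return 1
}
```

For instance, on a server node, check_owner /etc/daos/certs/server.key daos_server should succeed silently.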
    

Hardware Provisioning

If the server nodes are configured with PMem (Intel(R) Optane(TM) persistent memory), it must be prepared and configured for use by DAOS. The NVMe SSDs also need to be identified. See SCM Preparation for more information.

  1. Prepare the PMem devices on the server nodes:

    daos_server scm prepare
    

    Sample output:

    Prepare locally-attached PMem...
    
    Memory allocation goals for PMem will be changed and namespaces
    modified, this may be a destructive operation. Please ensure
    namespaces are unmounted and locally attached PMem modules are
    not in use. Please be patient as it may take several minutes and
    subsequent reboot maybe required.
    
    Are you sure you want to continue? (yes/no)
    
    yes
    
    A reboot is required to process new PMem memory allocation goals.
    
  2. Reboot the server node.

  3. Run the prepare command again:

    daos_server scm prepare
    

    Sample output:

    Prepare locally-attached PMem...
    SCM namespaces:
    SCM Namespace   Socket ID   Capacity
    -------------   ---------   --------
    pmem0           0           3.2 TB
    pmem1           1           3.2 TB
    
  4. Scan the available SCM storage on the server nodes:

    daos_server scm scan
    SCM Namespace   Socket ID   Capacity
    -------------   ---------   --------
    pmem0           0           3.2 TB
    pmem1           1           3.2 TB
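
The scan output maps each namespace to a socket, which is what the scm_list and pinned_numa_node entries in the server configuration need later. The following sketch turns that table into device paths. scm_devices is a hypothetical helper, and it assumes namespaces are named pmemN as in the sample output.

```shell
# Read `daos_server scm scan` output on stdin and print one line per
# namespace: the /dev path followed by its socket ID.
scm_devices() {
    awk '$1 ~ /^pmem[0-9]+$/ { print "/dev/" $1, $2 }'
}

# Sample input copied from the scan output above.
scm_devices <<'EOF'
SCM Namespace   Socket ID   Capacity
-------------   ---------   --------
pmem0           0           3.2 TB
pmem1           1           3.2 TB
EOF
# prints:
# /dev/pmem0 0
# /dev/pmem1 1
```

On a server node the same function can read live output, for example: daos_server scm scan | scm_devices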
    

Create Configuration Files

In this section the daos_server, daos_agent, and dmg command configuration files will be defined. Examples are available on GitHub.

  1. Determine the addresses for the NVMe storage on the server nodes:

    clush -B -w $SERVER_NODES daos_server nvme scan --ignore-config
    

    Sample daos_server nvme scan output:

    Scan locally-attached NVMe storage...
    NVMe PCI     Model               FW Revision Socket Capacity Role(s) Rank
    --------     -----               ----------- ------ -------- ------- ----
    0000:83:00.0 INTEL SSDPE2MD800G4 8DV10171    0      800 GB   NA      None
    0000:84:00.0 INTEL SSDPE2MD800G4 8DV10171    1      800 GB   NA      None
    

    Note

    Save the addresses of the NVMe devices to use with each DAOS server, e.g. "0000:83:00.0", from each server node. This information will be used to populate the "bdev_list" server configuration parameter below.

  2. Create a server configuration file. See DAOS Server Setup for more details. Either modify the sample /etc/daos/daos_server.yml file or use the config generate tool to create a server configuration file in /tmp/daos_server.yml:

    An example of using the config generate command on the first server node:

    daos_server config generate --access-points=$(hostname -s) | tee /tmp/daos_server.yml
    

    An example of modifying the sample daos_server.yml:

    cp /etc/daos/daos_server.yml /tmp/daos_server.yml
    vim /tmp/daos_server.yml
    

    Note

    Use any PMem or NVMe addresses collected above when modifying the scm_list or bdev_list engine storage entries, respectively.

    An example of the /tmp/daos_server.yml:

    cat /tmp/daos_server.yml
    name: daos_server
    access_points:
    - server-1
    port: 10001
    
    transport_config:
        allow_insecure: false
        client_cert_dir: /etc/daos/certs/clients
        ca_cert: /etc/daos/certs/daosCA.crt
        cert: /etc/daos/certs/server.crt
        key: /etc/daos/certs/server.key
    provider: ofi+verbs;ofi_rxm
    control_log_mask: DEBUG
    control_log_file: /tmp/daos_server.log
    helper_log_file: /tmp/daos_server_helper.log
    engines:
    -
        pinned_numa_node: 0
        targets: 8
        nr_xs_helpers: 2
        fabric_iface: ib0
        fabric_iface_port: 31316
        log_mask: INFO
        log_file: /tmp/daos_engine_0.log
        env_vars:
            - CRT_TIMEOUT=30
        storage:
        -
            class: dcpm
            scm_mount: /mnt/daos0
            scm_list:
            - /dev/pmem0
        -
            class: nvme
            bdev_list:
            - "0000:81:00.0"
    -
        pinned_numa_node: 1
        targets: 8
        nr_xs_helpers: 2
        fabric_iface: ib1
        fabric_iface_port: 31416
        log_mask: INFO
        log_file: /tmp/daos_engine_1.log
        env_vars:
            - CRT_TIMEOUT=30
        storage:
        -
            class: dcpm
            scm_mount: /mnt/daos1
            scm_list:
            - /dev/pmem1
        -
            class: nvme
            bdev_list:
            - "0000:83:00.0"
    
  3. Copy the modified server YAML file to /etc/daos/daos_server.yml on all the server nodes:

    clush -B -w $SERVER_NODES --copy /tmp/daos_server.yml
    clush -B -w $SERVER_NODES sudo cp /tmp/daos_server.yml /etc/daos/
    
  4. Create an agent configuration file by modifying a copy of the default /etc/daos/daos_agent.yml file, then copy the modified YAML file to /etc/daos/daos_agent.yml on all the client nodes. The following is an example daos_agent.yml, followed by the copy commands:

    cat /tmp/daos_agent.yml
    name: daos_server
    access_points:
    - server-1
    
    port: 10001
    
    transport_config:
        allow_insecure: false
        ca_cert: /etc/daos/certs/daosCA.crt
        cert: /etc/daos/certs/agent.crt
        key: /etc/daos/certs/agent.key
    log_file: /tmp/daos_agent.log
    
    clush -B -w $CLIENT_NODES --copy /tmp/daos_agent.yml
    clush -B -w $CLIENT_NODES sudo cp /tmp/daos_agent.yml /etc/daos/
    
  5. Create a dmg configuration file by modifying a copy of the default /etc/daos/daos_control.yml file, then copy it to /etc/daos/daos_control.yml on the admin nodes. The following is an example daos_control.yml, followed by the copy commands:

    cat /tmp/daos_control.yml
    name: daos_server
    port: 10001
    hostlist:
    - server-1
    - server-2
    
    transport_config:
        allow_insecure: false
        ca_cert: /etc/daos/certs/daosCA.crt
        cert: /etc/daos/certs/admin.crt
        key: /etc/daos/certs/admin.key
    
    clush -B -w $ADMIN_NODES --copy /tmp/daos_control.yml
    clush -B -w $ADMIN_NODES sudo cp /tmp/daos_control.yml /etc/daos/
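
Step 1 of this section asks for the NVMe addresses to be noted by hand. As a convenience, the scan table could be turned into ready-to-paste bdev_list entries. This is a sketch under stated assumptions: nvme_addrs is a hypothetical helper, and the socket is read by counting columns from the end of the row, which assumes the Role(s) and Rank columns are each a single word as in the sample output.

```shell
# Read `daos_server nvme scan` output on stdin and print each PCI address
# as a YAML list entry, annotated with the socket it is attached to.
nvme_addrs() {
    awk '$1 ~ /^[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F]+$/ {
        # Columns from the right: Rank, Role(s), capacity unit, capacity
        # value, Socket. The Model column may contain spaces, so the
        # socket is located from the end of the row.
        printf "- \"%s\"  # socket %s\n", $1, $(NF - 4)
    }'
}

# Sample rows copied from the scan output in step 1.
nvme_addrs <<'EOF'
NVMe PCI     Model               FW Revision Socket Capacity Role(s) Rank
--------     -----               ----------- ------ -------- ------- ----
0000:83:00.0 INTEL SSDPE2MD800G4 8DV10171    0      800 GB   NA      None
0000:84:00.0 INTEL SSDPE2MD800G4 8DV10171    1      800 GB   NA      None
EOF
```

The socket annotation helps match each device to the engine whose pinned_numa_node has the same value. The function expects unprefixed rows, so it fits clush -B output or a scan run directly on the server node.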
    

Start the DAOS Servers

  1. Start the DAOS servers on the server nodes:

    clush -B -w $SERVER_NODES "sudo systemctl daemon-reload"
    clush -B -w $SERVER_NODES "sudo systemctl start daos_server"
    
  2. Check status and format storage:

    # check status
    clush -B -w $SERVER_NODES "sudo systemctl status daos_server"
    
    # if you see the following format messages (the number depends on the number of servers), proceed to storage format
    server-1: server-1.test.example.com INFO 2023/04/11 23:14:06 SCM format required on instance 1
    server-1: server-1.test.example.com INFO 2023/04/11 23:14:06 SCM format required on instance 0
    
    # format storage
    dmg storage format -l $SERVER_NODES # can use --force if needed
    
  3. Verify that all servers have started:

    # system query from ADMIN_NODES
    dmg system query -v
    
    # all the server ranks should show 'Joined' STATE
    Rank UUID                                 Control Address  Fault Domain                  State  Reason
    ---- ----                                 ---------------  ------------                  -----  ------
    0    604c4ffa-563a-49dc-b702-3c87293dbcf3 10.8.1.179:10001 /server-1.test.example.com Joined
    1    f0791f98-4379-4ace-a083-6ca3ffa65756 10.8.1.179:10001 /server-1.test.example.com Joined
    2    745d2a5b-46dd-42c5-b90a-d2e46e178b3e 10.8.1.189:10001 /server-2.test.example.com Joined
    3    ba6a7800-3952-46ce-af92-bba9daa35048 10.8.1.189:10001 /server-2.test.example.com Joined
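
Reading the State column by eye becomes error-prone as the number of ranks grows. The check could be scripted along the following lines. all_ranks_joined is a hypothetical helper that parses the table layout shown above, and it assumes the fault domain contains no spaces so that State is the fifth column.

```shell
# Read `dmg system query -v` output on stdin. Succeed only if at least
# one rank row is present and every rank row reports the Joined state.
all_ranks_joined() {
    awk '
        $1 ~ /^[0-9]+$/ {            # rank rows start with a numeric rank
            total++
            if ($5 != "Joined") {    # columns: Rank UUID Address Domain State
                printf "rank %s is %s\n", $1, $5
                bad++
            }
        }
        END {
            printf "%d of %d ranks joined\n", total - bad, total
            exit (total == 0 || bad > 0)
        }
    '
}
```

From an admin node this might be used as: dmg system query -v | all_ranks_joined. The exit status is non-zero while any rank is missing or not yet joined, which makes the function usable in a retry loop.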
    

Start the DAOS Agents

  1. Start the DAOS agents on the client nodes:

    # start agents
    clush -B -w $CLIENT_NODES "sudo systemctl start daos_agent"
    
  2. Verify daos_agent communication:

    # verify client communication to each server rank
    clush -B -w $CLIENT_NODES "daos system query --verbose"
    
    # Sample output
    connected to DAOS system:
    name: daos_server
    fabric provider: ofi+tcp
    access point ranks:
            rank[0]: ofi+tcp://10.8.1.179:10001
    rank URIs:
            rank[0]: ofi+tcp://10.8.1.179:10001
            rank[1]: ofi+tcp://10.8.1.179:10001
            rank[2]: ofi+tcp://10.8.1.189:10001
            rank[3]: ofi+tcp://10.8.1.189:10001
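
A quick consistency check is to compare the number of rank URIs seen by a client with the number of ranks reported by dmg system query. The sketch below counts them from the sample layout above. count_rank_uris is a hypothetical helper, and it assumes the section heading and rank[N] line format shown in the sample output.

```shell
# Read `daos system query --verbose` output on stdin and print the number
# of entries listed under the "rank URIs:" heading.
count_rank_uris() {
    awk '
        /^[ \t]*rank URIs:/            { in_uris = 1; next }
        in_uris && /rank\[[0-9]+\]:/   { n++; next }
        in_uris                        { in_uris = 0 }  # other lines end the section
        END                            { print n + 0 }
    '
}
```

On a client node, daos system query --verbose | count_rank_uris would be expected to print 4 for the two-server, two-engine example in this guide.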