UNITe

Installing drivers and software for using Mellanox InfiniBand with RDMA on CentOS 7 and Scientific Linux 7

Contents:

  1. Preliminary information
  2. Installing the InfiniBand software and support included in the distribution packages
  3. Installing the Mellanox modules, configuring the kernel and the network interfaces
  4. Configuring the kernel and the network interfaces for RDMA

 

1. Preliminary information

The description below applies specifically to the InfiniBand (IB) hardware purchased under the UNITe project, which is based on Mellanox devices. The configuration of the IB modules is strictly specific to Mellanox equipment and to the software components this vendor provides.

 

2. Installing the InfiniBand software and support included in the distribution packages

Before installing the specific packages, make sure the packages that are already installed are up to date:

$ sudo yum update

If the packages updated by the command above include kernel or glibc, reboot the system before continuing.
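
The kernel half of this check can be automated. The following is a minimal sketch, assuming that a difference between the running kernel release and the newest installed kernel package means a reboot is pending. It does not detect a glibc update, which still has to be read from the yum output. The function name needs_reboot is made up for this illustration.

```shell
#!/bin/sh
# Sketch: decide whether a reboot is pending after "yum update" by comparing
# the running kernel with the newest installed kernel package.
# Assumption: a differing release string means a newer kernel was installed.

needs_reboot() {
    # $1 = running kernel release, $2 = newest installed kernel release
    [ "$1" != "$2" ]
}

running=$(uname -r)
# "rpm -q --last kernel" lists the installed kernel packages, newest first.
newest=$(rpm -q --last kernel 2>/dev/null \
    | awk '/^kernel-/ { print $1; exit }' \
    | sed 's/^kernel-//')

if [ -z "$newest" ]; then
    echo "Could not determine the newest installed kernel."
elif needs_reboot "$running" "$newest"; then
    echo "Reboot required: running $running, installed $newest"
else
    echo "Running the newest installed kernel ($running)."
fi
```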

Install the packages needed to set up and configure NFS with InfiniBand support over RDMA:

$ sudo yum install nfs-utils rpcbind rdma-core lsof gcc-gfortran tcsh
$ sudo yum groupinstall "InfiniBand Support"

 

3. Installing the Mellanox modules, configuring the kernel and the network interfaces

Before installing the packages with the drivers, helper libraries and tools, remove the opa-address-resolution and opa-libopamgt packages, which were pulled in when the "InfiniBand Support" package group was installed:

$ sudo yum remove opa-address-resolution opa-libopamgt

Download the latest version of the Mellanox drivers for Linux:

http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

Make sure you choose a driver version that matches the minor version of CentOS 7 on which you will install the drivers (7.1, 7.2, 7.3, 7.4, 7.5, 7.6 or another). Keep in mind that the driver package installer checks the version of the distribution, so you cannot install an old driver version on a newer release of the distribution, or vice versa, without editing the installation script.
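
The version match can be checked before running the installer. The sketch below assumes that the archive name embeds a tag such as rhel7.6 between dashes, as in the file names used in this document, and that /etc/redhat-release contains a string of the form "release 7.6". The helper names release_to_tag and archive_matches are made up for this illustration.

```shell
#!/bin/sh
# Sketch: check that a driver archive matches the installed minor release.
# Assumption: the archive name embeds a tag such as "rhel7.6" between dashes.

release_to_tag() {
    # Turn e.g. "CentOS Linux release 7.6.1810 (Core)" into "rhel7.6".
    printf '%s\n' "$1" \
        | sed -n 's/.*release \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/rhel\1.\2/p'
}

archive_matches() {
    # $1 = archive file name, $2 = tag produced by release_to_tag
    case "$1" in
        *"-$2-"*) return 0 ;;
        *)        return 1 ;;
    esac
}

archive="MLNX_OFED_LINUX-4.6-1.0.1.1-rhel7.6-x86_64.tgz"
# On a real system: release=$(cat /etc/redhat-release)
release="CentOS Linux release 7.6.1810 (Core)"
tag=$(release_to_tag "$release")

if archive_matches "$archive" "$tag"; then
    echo "$archive matches $tag"
else
    echo "$archive does not match $tag"
fi
```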

The examples below assume that drivers for CentOS 7.6 and the x86_64 processor architecture have been downloaded. In that case the archive file is named MLNX_OFED_LINUX-4.6-1.0.1.1-rhel7.6-x86_64.tgz (by the time you follow these instructions, the version in the file name may be different). Unpack it, enter the directory with the unpacked contents of the archive, and run the mlnxofedinstall script from there:

$ sudo ./mlnxofedinstall
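
The unpack-and-run sequence described above can be sketched as follows. This assumes that the archive unpacks into a directory named after the archive without the .tgz suffix, and that the archive is in the current directory. Adjust the file name to the version you downloaded.

```shell
#!/bin/sh
# Sketch of the unpack-and-install sequence.
# Assumption: the archive unpacks into a directory named after the archive
# without the ".tgz" suffix.

archive="MLNX_OFED_LINUX-4.6-1.0.1.1-rhel7.6-x86_64.tgz"
dir="${archive%.tgz}"

if [ -f "$archive" ]; then
    # Run the installer only if unpacking succeeded.
    tar -xzf "$archive" && ( cd "$dir" && sudo ./mlnxofedinstall )
else
    echo "Archive $archive not found in $(pwd)"
fi
```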

If the script starts without problems, the package installation prints the following messages:

Logs dir: /tmp/MLNX_OFED_LINUX.21484.logs
General log file: /tmp/MLNX_OFED_LINUX.21484.logs/general.log
Verifying KMP rpms compatibility with target kernel...
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:

rpm --nosignature -e --allmatches --nodeps libibverbs libibverbs-utils libibumad ibacm librdmacm librdmacm-utils opensm-libs dapl perftest mstflint ibutils infiniband-diags qperf libibverbs libibverbs-utils libibumad ibacm librdmacm librdmacm-utils opensm-libs dapl perftest mstflint infiniband-diags qperf opensm-libs ibutils ibutils-libs srp_daemon

Starting MLNX_OFED_LINUX-4.6-1.0.1.1 installation ...

Installing mlnx-ofa_kernel RPM
Preparing...                          ########################################
Updating / installing...
mlnx-ofa_kernel-4.6-OFED.4.6.1.0.1.1.g########################################
Configured /etc/security/limits.conf
Installing kmod-mlnx-ofa_kernel 4.6 RPM
Preparing...                          ########################################
kmod-mlnx-ofa_kernel-4.6-OFED.4.6.1.0.########################################
Installing mlnx-ofa_kernel-devel RPM
Preparing...                          ########################################
Updating / installing...
mlnx-ofa_kernel-devel-4.6-OFED.4.6.1.0########################################
Installing kmod-kernel-mft-mlnx 4.12.0 RPM
Preparing...                          ########################################
kmod-kernel-mft-mlnx-4.12.0-1.rhel7u6 ########################################
Installing knem RPM
Preparing...                          ########################################
Updating / installing...
knem-1.1.3.90mlnx1-OFED.4.4.2.5.2.1.g9########################################
Installing kmod-knem 1.1.3.90mlnx1 RPM
Preparing...                          ########################################
kmod-knem-1.1.3.90mlnx1-OFED.4.4.2.5.2########################################
Installing kmod-iser 4.6 RPM
Preparing...                          ########################################
kmod-iser-4.6-OFED.4.6.1.0.1.1.ga2cfe0########################################
Installing kmod-srp 4.6 RPM
Preparing...                          ########################################
kmod-srp-4.6-OFED.4.6.1.0.1.1.ga2cfe08########################################
Installing kmod-isert 4.6 RPM
Preparing...                          ########################################
kmod-isert-4.6-OFED.4.6.1.0.1.1.ga2cfe########################################
Installing kmod-rshim 1.6 RPM
Preparing...                          ########################################
kmod-rshim-1.6-0.g6aa30c7.rhel7u6     ########################################
Installing mpi-selector RPM
Preparing...                          ########################################
Updating / installing...
mpi-selector-1.0.3-1.46101            ########################################
Installing user level RPMs:
Preparing...                          ########################################
ofed-scripts-4.6-OFED.4.6.1.0.1       ########################################
Preparing...                          ########################################
libibverbs-41mlnx1-OFED.4.6.0.4.1.4610########################################
Preparing...                          ########################################
libibverbs-devel-41mlnx1-OFED.4.6.0.4.########################################
Preparing...                          ########################################
libibverbs-devel-static-41mlnx1-OFED.4########################################
Preparing...                          ########################################
libibverbs-utils-41mlnx1-OFED.4.6.0.4.########################################
Preparing...                          ########################################
libmlx4-41mlnx1-OFED.4.5.0.0.3.46101  ########################################
Preparing...                          ########################################
libmlx4-devel-41mlnx1-OFED.4.5.0.0.3.4########################################
Preparing...                          ########################################
libmlx5-41mlnx1-OFED.4.6.0.0.4.46101  ########################################
Preparing...                          ########################################
libmlx5-devel-41mlnx1-OFED.4.6.0.0.4.4########################################
Preparing...                          ########################################
librxe-41mlnx1-OFED.4.4.2.4.6.46101   ########################################
Preparing...                          ########################################
librxe-devel-static-41mlnx1-OFED.4.4.2########################################
Preparing...                          ########################################
libibcm-41mlnx1-OFED.4.1.0.1.0.46101  ########################################
Preparing...                          ########################################
libibcm-devel-41mlnx1-OFED.4.1.0.1.0.4########################################
Preparing...                          ########################################
libibumad-43.1.1.MLNX20190422.87b4d9b-########################################
Preparing...                          ########################################
libibumad-devel-43.1.1.MLNX20190422.87########################################
Preparing...                          ########################################
libibumad-static-43.1.1.MLNX20190422.8########################################
Preparing...                          ########################################
libibmad-5.4.0.MLNX20190423.1d917ae-0.########################################
Preparing...                          ########################################
libibmad-devel-5.4.0.MLNX20190423.1d91########################################
Preparing...                          ########################################
libibmad-static-5.4.0.MLNX20190423.1d9########################################
Preparing...                          ########################################
ibsim-0.7mlnx1-0.11.g85c342b.46101    ########################################
Preparing...                          ########################################
ibacm-41mlnx1-OFED.4.3.3.0.0.46101    ########################################
Preparing...                          ########################################
librdmacm-41mlnx1-OFED.4.6.0.0.1.46101########################################
Preparing...                          ########################################
librdmacm-utils-41mlnx1-OFED.4.6.0.0.1########################################
Preparing...                          ########################################
librdmacm-devel-41mlnx1-OFED.4.6.0.0.1########################################
Preparing...                          ########################################
opensm-libs-5.4.0.MLNX20190422.ed81811########################################
Preparing...                          ########################################
opensm-5.4.0.MLNX20190422.ed81811-0.1.########################################
Preparing...                          ########################################
opensm-devel-5.4.0.MLNX20190422.ed8181########################################
Preparing...                          ########################################
opensm-static-5.4.0.MLNX20190422.ed818########################################
Preparing...                          ########################################
dapl-2.1.10mlnx-OFED.3.4.2.1.0.46101  ########################################
Preparing...                          ########################################
dapl-devel-2.1.10mlnx-OFED.3.4.2.1.0.4########################################
Preparing...                          ########################################
dapl-devel-static-2.1.10mlnx-OFED.3.4.########################################
Preparing...                          ########################################
dapl-utils-2.1.10mlnx-OFED.3.4.2.1.0.4########################################
Preparing...                          ########################################
perftest-4.4-0.5.g1ceab48.46101       ########################################
Preparing...                          ########################################
mstflint-4.11.0-1.14.g840c9c2.46101   ########################################
Preparing...                          ########################################
mft-4.12.0-105                        ########################################
Preparing...                          ########################################
srptools-41mlnx1-5.46101              ########################################
Preparing...                          ########################################
ibutils2-2.1.1-0.104.MLNX20190408.gb55########################################
Preparing...                          ########################################
ibutils-1.5.7.1-0.12.gdcaeae2.46101   ########################################
Preparing...                          ########################################
cc_mgr-1.0-0.41.g750eb1e.46101        ########################################
Preparing...                          ########################################
dump_pr-1.0-0.37.g750eb1e.46101       ########################################
Preparing...                          ########################################
ar_mgr-1.0-0.42.g750eb1e.46101        ########################################
Preparing...                          ########################################
ibdump-5.0.0-3.46101                  ########################################
Preparing...                          ########################################
infiniband-diags-5.4.0.MLNX20190422.d1########################################
Preparing...                          ########################################
infiniband-diags-compat-5.4.0.MLNX2019########################################
Preparing...                          ########################################
qperf-0.4.9-9.46101                   ########################################
Preparing...                          ########################################
mxm-3.7.3111-1.46101                  ########################################
Preparing...                          ########################################
ucx-1.6.0-1.46101                     ########################################
Preparing...                          ########################################
ucx-devel-1.6.0-1.46101               ########################################
Preparing...                          ########################################
sharp-1.8.1.MLNX20190422.6c05a05-1.461########################################
Preparing...                          ########################################
ucx-cma-1.6.0-1.46101                 ########################################
Preparing...                          ########################################
ucx-ib-1.6.0-1.46101                  ########################################
Preparing...                          ########################################
ucx-ib-cm-1.6.0-1.46101               ########################################
Preparing...                          ########################################
ucx-rdmacm-1.6.0-1.46101              ########################################
Preparing...                          ########################################
ucx-knem-1.6.0-1.46101                ########################################
Preparing...                          ########################################
hcoll-4.3.2708-1.46101                ########################################
Preparing...                          ########################################
openmpi-4.0.2a1-1.46101               ########################################
Preparing...                          ########################################
mlnx-ethtool-4.19-1.46101             ########################################
Preparing...                          ########################################
mlnx-iproute2-4.20.0-1.46101          ########################################
Preparing...                          ########################################
mlnxofed-docs-4.6-1.0.1.1             ########################################
Preparing...                          ########################################
mpitests_openmpi-3.2.20-e1a0676.46101 ########################################

Installation finished successfully.


Preparing...                          ################################# [100%]
Updating / installing...
   1:mlnx-fw-updater-4.6-1.0.1.1      ################################# [100%]

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Attempting to perform Firmware update...

To load the new driver, run:
/etc/init.d/openibd restart

The most important thing to note in the installer messages above is Attempting to perform Firmware update.... This message indicates that the firmware of the InfiniBand network adapter will be updated. You should not see any fatal error message after this text. If you do see an error message during the firmware update, contact the vendor's support. The ROM of the device may need to be reprogrammed.
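
The installer also writes its messages to the general.log file named at the top of its output, so the check can be repeated after the fact. The sketch below assumes that failures are reported with the words "error" or "fail". The exact wording used by the installer is not documented here, so treat a match only as a hint to read the log. The function name log_has_errors is made up for this illustration.

```shell
#!/bin/sh
# Sketch: look for error lines in the installer's general log.
# Assumption: failures are reported with the words "error" or "fail".

log_has_errors() {
    # $1 = path to a log file; succeeds if a suspicious line is found
    grep -Eiq 'error|fail' "$1"
}

# The log directory name contains a number that differs between runs,
# so pick the most recently modified general.log under /tmp.
latest_log=$(ls -t /tmp/MLNX_OFED_LINUX.*.logs/general.log 2>/dev/null | head -n 1)

if [ -z "$latest_log" ]; then
    echo "No installer log found."
elif log_has_errors "$latest_log"; then
    echo "Possible problems reported in $latest_log:"
    grep -Ei 'error|fail' "$latest_log"
else
    echo "No error lines found in $latest_log"
fi
```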

Before loading the modules for the first time through the openibd service, unload the ib_isert, rpcrdma and ib_srpt modules from memory:

$ sudo modprobe -rv  ib_isert rpcrdma ib_srpt
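
To confirm that the three modules are really gone before restarting openibd, the list of loaded modules can be inspected. This is a minimal sketch that reads /proc/modules, which lists one loaded module per line with the module name first. The function name module_loaded is made up for this illustration.

```shell
#!/bin/sh
# Sketch: confirm that the three modules are no longer loaded.

module_loaded() {
    # $1 = module name, $2 = module list file (defaults to /proc/modules)
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

for mod in ib_isert rpcrdma ib_srpt; do
    if module_loaded "$mod"; then
        echo "$mod is still loaded"
    else
        echo "$mod is not loaded"
    fi
done
```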

You can now test whether the newly installed modules load successfully:

$ sudo systemctl restart openibd

Reboot the system to check that the installed modules also load successfully at boot.

To find the PCI addresses (paths) of the IB devices, you can use mst. Start the service first, then list the detected devices:

$ sudo mst start
$ sudo mst status

To see the mapping between the physical and the logical IB devices, use the output of:

$ sudo ibdev2netdev
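
The output of ibdev2netdev can be filtered to find the interfaces that are ready to be configured. The sketch below assumes that each output line has the form "mlx5_0 port 1 ==> ib0 (Up)". The sample lines are illustrative and were not captured on the system described in this document. The function name up_interfaces is made up for this illustration.

```shell
#!/bin/sh
# Sketch: list the network interfaces that ibdev2netdev reports as up.
# Assumption: each output line looks like
#   mlx5_0 port 1 ==> ib0 (Up)

up_interfaces() {
    # Reads ibdev2netdev output on stdin, prints interfaces marked "(Up)".
    awk '$4 == "==>" && $6 == "(Up)" { print $5 }'
}

# Illustrative sample, not real output from this system.
sample='mlx5_0 port 1 ==> ib0 (Up)
mlx5_1 port 1 ==> ib1 (Down)'

printf '%s\n' "$sample" | up_interfaces
# On a real system: sudo ibdev2netdev | up_interfaces
```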

Once the drivers are loaded and the corresponding network devices have been created, the IP addresses can be configured with nmcli:

$ sudo nmcli con add con-name ib0 ifname ib0 type infiniband ip4 10.0.0.2/24 802-3-ethernet.mtu 9128

Note how the MTU value is set (9128 bytes in the example). When the configuration succeeds, the overall status of the interfaces can be viewed like this:

$ nmcli device status
DEVICE  TYPE        STATE        CONNECTION
enp4s0  ethernet    connected    enp4s0
ib0     infiniband  connected    ib0
ib1     infiniband  unavailable  --

The settings specific to a given interface can also be displayed with nmcli:

$ nmcli device show ib0
GENERAL.DEVICE:                         ib0
GENERAL.TYPE:                           infiniband
GENERAL.HWADDR:                         B1:00:09:51:FE:80:00:00:00:00:00:00:00:71:14:BB:00:92:B1:07
GENERAL.MTU:                            9128
GENERAL.STATE:                          100 (connected)
GENERAL.CONNECTION:                     ib0
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/ActiveConnection/25
IP4.ADDRESS[1]:                         10.0.0.2/24
IP4.GATEWAY:                            --
IP4.ROUTE[1]:                           dst = 10.0.0.2/24, nh = 0.0.0.0, mt = 150
IP6.ADDRESS[1]:                         fe80::cab4:429c:456a:8005/64
IP6.GATEWAY:                            --
IP6.ROUTE[1]:                           dst = fe80::/64, nh = ::, mt = 150
IP6.ROUTE[2]:                           dst = ff00::/8, nh = ::, mt = 256, table=255
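
The MTU can also be cross-checked against what the kernel itself reports, independently of NetworkManager. This is a minimal sketch that reads the value from sysfs and compares it with the 9128 bytes requested in the nmcli example. The function name iface_mtu is made up for this illustration.

```shell
#!/bin/sh
# Sketch: compare the MTU the kernel reports for an interface with the
# value requested through nmcli (9128 in this document's example).

iface_mtu() {
    # $1 = interface name, $2 = sysfs net directory (defaults to /sys/class/net)
    f="${2:-/sys/class/net}/$1/mtu"
    if [ -r "$f" ]; then
        cat "$f"
    fi
}

expected=9128
actual=$(iface_mtu ib0)

if [ -z "$actual" ]; then
    echo "Interface ib0 not found"
elif [ "$actual" -eq "$expected" ]; then
    echo "ib0 MTU is $actual, as configured"
else
    echo "ib0 MTU is $actual, expected $expected"
fi
```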

 

4. Configuring the kernel and the network interfaces for RDMA

To start the RDMA protocol exchange, you need to load the RDMA modules and start the corresponding server or client application.

The modules are embedded in the CentOS boot image (initramfs) with dracut as follows:

$ sudo dracut --add-drivers "mlx4_en mlx4_ib mlx5_ib" -f
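
Whether the drivers actually ended up in the boot image can be checked by listing its contents with lsinitrd, which ships with dracut. The sketch below assumes that module files appear in the listing as <name>.ko, possibly followed by a compression suffix. The sample path is made up, and so is the function name missing_drivers.

```shell
#!/bin/sh
# Sketch: verify that the drivers passed to dracut are in the initramfs.
# Assumption: module files appear in the lsinitrd listing as <name>.ko,
# possibly followed by a compression suffix.

missing_drivers() {
    # $1 = initramfs file listing; remaining arguments = driver names.
    # Prints every driver that does not appear in the listing.
    listing=$1
    shift
    for drv in "$@"; do
        printf '%s\n' "$listing" | grep -q "/$drv\.ko" || echo "$drv"
    done
}

# On a real system: listing=$(sudo lsinitrd)
# The path below is a made-up sample line.
listing='usr/lib/modules/3.10.0-957.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko'

# Prints the drivers that are absent from the sample listing.
missing_drivers "$listing" mlx4_en mlx4_ib mlx5_ib
```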

After the operation completes successfully, reboot the system to check whether the modules are loaded. If they are (you can check this with lsmod), proceed to start the RDMA service:

$ sudo systemctl restart rdma-load-modules@infiniband
$ sudo systemctl enable rdma
$ sudo systemctl restart rdma

At the same time, /var/log/messages will show the messages confirming that the modules were loaded successfully:

Aug  4 04:09:52 hpc-service-host systemd: Starting Load RDMA modules from /etc/rdma/modules/infiniband.conf...
Aug  4 04:09:52 hpc-service-host systemd: Started Load RDMA modules from /etc/rdma/modules/infiniband.conf.
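
Looking for these messages can be scripted. This is a minimal sketch that searches a syslog file for the "Started Load RDMA modules" line shown above. Reading /var/log/messages usually requires root privileges. The function name rdma_modules_started is made up for this illustration.

```shell
#!/bin/sh
# Sketch: check a syslog file for the "Started Load RDMA modules" message.

rdma_modules_started() {
    # $1 = log file (defaults to /var/log/messages)
    grep -q 'Started Load RDMA modules from' "${1:-/var/log/messages}" 2>/dev/null
}

if rdma_modules_started; then
    echo "RDMA modules were loaded by systemd"
else
    echo "No RDMA module load message found (reading the log may need root)"
fi
```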

 


Last updated: 30 July 2019

2019 UNITe, Veselin Kolev