SmartNIC study notes

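A summary of SmartNIC-related knowledge. It isn't much use right now: a lot of it is based on netdev interfaces, while the ports in our experiments seem to be virtual ports, which are a different thing. I'm noting it down anyway in case it's useful later.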

Agilio smartNIC user guide

Document: https://help.netronome.com/support/solutions/articles/36000049975-basic-firmware-user-guide

hardware installation

validation

## 19ee is Netronome's PCI vendor ID
lspci -d 19ee:
## or
lspci | grep -i Eth

driver and firmware

validating the driver

Operating systems that support the nfp driver:

Operating system    Kernel version
Ubuntu 16.04        4.11+ (PF)
RHEL 7.4+           Default
CentOS 7.4+         Default

confirm upstreamed nfp driver

Confirm that your current Operating System contains the upstreamed nfp module

modinfo nfp | head -3

confirm that the nfp driver is loaded

lsmod | grep nfp
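For scripts that need this check, it can be wrapped in a small function. This is a sketch: the name module_loaded is hypothetical, and it reads lsmod-format text on stdin, so on a real host you pipe lsmod into it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if module $1 is listed in lsmod output read from stdin.
# lsmod prints one module per line, with the module name in the first column.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on a real host (modprobe needs root):
#   lsmod | module_loaded nfp || modprobe nfp
```

An exact match on the first column avoids the false positives that a plain grep nfp can produce, for example from modules whose names merely contain "nfp".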

smartNIC netdev interfaces

# install the agilio-naming-policy package
apt-get install agilio-naming-policy

# once the nfp driver has initialized, new netdev interfaces appear; list them with:
ip link

validating the firmware

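The firmware version determines which features the SmartNIC can provide. Firmware files should be located in /lib/firmware/netronome/.

The loaded firmware version can be checked with: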

# ethtool -i <netdev interface>
ethtool -i enp6s0np0
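To use the firmware version in a script, you can extract it from the ethtool -i output. This is a minimal sketch. It assumes the standard "firmware-version:" line that ethtool -i prints, and the helper name fw_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "firmware-version:" field
# from `ethtool -i` output read on stdin.
fw_version() {
    sed -n 's/^firmware-version: *//p'
}

# Usage:
#   ethtool -i enp6s0np0 | fw_version
```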

upgrading the firmware

upgrading firmware via the netronome repo

Once the Netronome repository is configured as described in Appendix A, install the firmware with:

apt-get install agilio-nic-firmware

upgrading firmware from package installation

After downloading the package from the Netronome support site:

dpkg -i agilio-nic-firmware-*.deb

# reload driver to load new firmware
rmmod nfp; modprobe nfp

using the linux driver

configuring interface media mode

Older kernels do not support configuring the media mode this way; see Appendix C.

The example below is for the Agilio CX 25G SmartNIC:

## bring the respective interface(s) down
# ip link set <intf> down

## to set interface linkspeed to 10G
# ethtool -s <intf> speed 10000

##NB. A driver reload is needed whenever a port’s speed is changed

## reload driver for changes to take effect
# rmmod nfp; modprobe nfp

## to set interface linkspeed to 25G
# ethtool -s <intf> speed 25000

## reload driver for changes to take effect
# rmmod nfp; modprobe nfp

## older driver/firmware may require a system reboot for changes to take effect
# reboot
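The speed-change steps above can be collected into one function. This is a sketch, not a Netronome tool. The names run and set_nfp_speed and the DRY_RUN variable are hypothetical. The function only accepts the 10G and 25G values from this example. On older driver/firmware a reboot may still be needed instead of the driver reload.

```shell
#!/bin/sh
# Hypothetical wrapper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the interface down, set the speed, then reload the nfp driver.
set_nfp_speed() {
    intf=$1
    speed=$2
    # Only the two speeds used in the Agilio CX 25G example are accepted.
    case $speed in
        10000|25000) ;;
        *) echo "unsupported speed: $speed" >&2; return 1 ;;
    esac
    run ip link set "$intf" down || return 1
    run ethtool -s "$intf" speed "$speed" || return 1
    # A driver reload is needed whenever a port's speed is changed.
    run rmmod nfp || return 1
    run modprobe nfp
}

# Usage (as root):           set_nfp_speed enp6s0np0 10000
# Preview without changes:   DRY_RUN=1; set_nfp_speed enp6s0np0 10000
```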

setting interface breakout mode

The commands below only work on kernel 4.13 and later. The example is for the Agilio CX 40G/2x40G SmartNICs.

## determine card’s pci address
# lspci -kd 19ee:

# devlink dev show
pci/0000:06:00.0

# devlink port
pci/0000:06:00.0/0: type eth netdev enp6s0np0

## to put the first 40G port in breakout mode(4x10G)
# devlink port split pci/0000:06:00.0/0 count 4

## to configure the second 40G port on a beryllium(2x40G interfaces) in breakout mode(4x10G)
# devlink port split pci/0000:06:00.0/4 count 4

## to configure a port from breakout mode(4x10G) to single mode(40G)
# devlink port unsplit pci/0000:06:00.0/4

## reload driver for changes to take effect
# rmmod nfp; modprobe nfp

## older driver/firmware versions may require a system reboot for changes to take effect
# reboot

## after reboot the port should be in breakout mode e.g.
# devlink port
pci/0000:06:00.0/0: type eth netdev enp6s0np0s0 split_group 0
pci/0000:06:00.0/1: type eth netdev enp6s0np0s1 split_group 0
pci/0000:06:00.0/2: type eth netdev enp6s0np0s2 split_group 0
pci/0000:06:00.0/3: type eth netdev enp6s0np0s3 split_group 0
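After a split, a script may need the names of the new netdevs. The sketch below reads devlink port output on stdin and prints the netdev of each port that carries a split_group attribute. The function name is hypothetical, and the parsing assumes the one-line-per-port format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the netdev name of every port in a split group,
# reading `devlink port` output (one port per line) from stdin.
split_netdevs() {
    awk '/split_group/ {
        for (i = 1; i < NF; i++)
            if ($i == "netdev") print $(i + 1)
    }'
}

# Usage:
#   devlink port | split_netdevs
```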

confirm connectivity

allocating IP addresses

## assign IP address to interface
# ip address add 10.0.0.2/24 dev ens1np0
# ip link set ens1np0 up

pinging interfaces

## ping IP from host on same subnet
# ping 10.0.0.2

PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.067 ms
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.062 ms

basic performance test

Apart from the IRQ and RSS configuration, this part covers basic usage of iperf and iperf3.

set IRQ affinity

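Setting IRQ affinity spreads the NIC's interrupt load across CPU cores. The set_irq_affinity.sh script from the nfp-drv-kmods repository does this: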

# wget https://raw.githubusercontent.com/Netronome/nfp-drv-kmods/master/tools/set_irq_affinity.sh

Sample run and output:

# /nfp-drv-kmods/tools/set_irq_affinity.sh enXXXXnpX

Device 0000:02:00.0 is on node 0 with cpus 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
IRQ 181 to CPU 0 (irq: 00,00000001 xps: 03,00030003)
IRQ 182 to CPU 1 (irq: 00,00000002 xps: 00,00000000)
IRQ 183 to CPU 2 (irq: 00,00000004 xps: 0c,000c000c)
IRQ 184 to CPU 3 (irq: 00,00000008 xps: 00,00000000)
IRQ 185 to CPU 4 (irq: 00,00000010 xps: 30,00300030)
IRQ 186 to CPU 5 (irq: 00,00000020 xps: 00,00000000)
IRQ 187 to CPU 6 (irq: 00,00000040 xps: c0,00c000c0)
IRQ 188 to CPU 7 (irq: 00,00000080 xps: 00,00000000)
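The irq: column in the sample output is a CPU bitmask with bit N set for CPU N. For example, 00000010 is CPU 4 and 00000080 is CPU 7. The sketch below computes such a mask and pins a single IRQ by hand. cpu_mask and pin_irq are hypothetical helpers, not part of the Netronome script. The single-word mask only covers CPUs 0 to 31, and for higher CPUs the kernel's smp_affinity_list file, which takes a CPU number directly, is simpler.

```shell
#!/bin/sh
# Hypothetical helper: hex affinity mask selecting only CPU $1 (bit N = CPU N).
# Limited to CPUs 0-31, i.e. one 32-bit word of the smp_affinity bitmap.
cpu_mask() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 31 ]; then
        echo "cpu_mask: CPU $1 outside 0-31, use smp_affinity_list instead" >&2
        return 1
    fi
    printf '%x\n' $((1 << $1))
}

# Hypothetical helper: pin IRQ $1 to CPU $2 (needs root).
pin_irq() {
    mask=$(cpu_mask "$2") || return 1
    echo "$mask" > "/proc/irq/$1/smp_affinity"
}

# Usage (matches "IRQ 183 to CPU 2" above):
#   pin_irq 183 2
```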

install iperf

apt-get install -y iperf

using iperf

## server 
# allocate ipv4 address to SmartNIC interface
# ip address add 10.0.0.1/24 dev ens1np0

# launch iperf server
# iperf -s

## client
# iperf -c 10.0.0.1 -P 4
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 5] local 10.0.0.2 port 56938 connected with 10.0.0.1 port 5001
[ 3] local 10.0.0.2 port 56932 connected with 10.0.0.1 port 5001
[ 4] local 10.0.0.2 port 56934 connected with 10.0.0.1 port 5001
[ 6] local 10.0.0.2 port 56936 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 11.9 GBytes 10.3 Gbits/sec
[ 3] 0.0-10.0 sec 9.85 GBytes 8.46 Gbits/sec
[ 4] 0.0-10.0 sec 11.9 GBytes 10.2 Gbits/sec
[ 5] 0.0-10.0 sec 10.2 GBytes 8.75 Gbits/sec
[SUM] 0.0-10.0 sec 43.8 GBytes 37.7 Gbits/sec
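The [SUM] line is the total of the per-stream bandwidths. The sketch below recomputes it from the individual stream lines, for example to check a saved log. It assumes iperf v2 client output in which every stream is reported in Gbits/sec, so lines in other units are ignored. sum_iperf_gbits is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: add up the per-stream "Gbits/sec" figures from iperf (v2)
# client output on stdin. The [SUM] line is skipped so the total is recomputed;
# the bandwidth value is the second-to-last field of each stream line.
sum_iperf_gbits() {
    awk '/Gbits\/sec/ && !/\[SUM\]/ { total += $(NF - 1) }
         END { printf "%.2f\n", total }'
}

# Usage:
#   iperf -c 10.0.0.1 -P 4 | sum_iperf_gbits
```

On the four streams above this gives 37.71, which matches the 37.7 Gbits/sec [SUM] line.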

using iperf3

## Server:
# iperf3 -s -p 5001 & iperf3 -s -p 5002 & iperf3 -s -p 5003 & iperf3 -s -p 5004 &

## Client:
# iperf3 -c 102.0.0.6 -i 30 -p 5001 & iperf3 -c 102.0.0.6 -i 30 -p 5002 & iperf3 -c 102.0.0.6 -i 30 -p 5003 & iperf3 -c 102.0.0.6 -i 30 -p 5004 &

Example output:

[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.04 sec 9.39 GBytes 8.03 Gbits/sec receiver
[ 5] 10.00-10.04 sec 33.1 MBytes 7.77 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.04 sec 9.86 GBytes 8.44 Gbits/sec receiver
[ 5] 10.00-10.04 sec 53.6 MBytes 11.8 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.04 sec 11.9 GBytes 10.2 Gbits/sec receiver
[ 5] 10.00-10.04 sec 42.1 MBytes 9.43 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.04 sec 10.2 GBytes 8.70 Gbits/sec receiver

Total: 37.7 Gbits/sec

95.49% of 40GbE link
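The four server and client commands above differ only in the port, so a loop can generate them. An iperf3 server handles one test at a time, which is why one server per port is started. This is a sketch with hypothetical function names. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: start COUNT iperf3 servers on ports BASE..BASE+COUNT-1.
# The servers keep running in the background until they are killed.
iperf3_servers() {
    base=$1 count=$2 i=0
    while [ "$i" -lt "$count" ]; do
        port=$((base + i))
        if [ -n "${DRY_RUN:-}" ]; then
            echo "iperf3 -s -p $port"
        else
            iperf3 -s -p "$port" &
        fi
        i=$((i + 1))
    done
}

# Hypothetical helper: start one iperf3 client per port against HOST,
# reporting every 30 s as in the commands above, then wait for all of them.
iperf3_clients() {
    host=$1 base=$2 count=$3 i=0
    while [ "$i" -lt "$count" ]; do
        port=$((base + i))
        if [ -n "${DRY_RUN:-}" ]; then
            echo "iperf3 -c $host -i 30 -p $port"
        else
            iperf3 -c "$host" -i 30 -p "$port" &
        fi
        i=$((i + 1))
    done
    [ -n "${DRY_RUN:-}" ] || wait
}

# Usage:
#   server: iperf3_servers 5001 4
#   client: iperf3_clients 102.0.0.6 5001 4
```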

basic firmware features

This section covers using ethtool to view and configure SmartNIC interface parameters.

multiple queues

The SmartNIC supports multiple TX and RX queues. ethtool -l shows the current queue (channel) settings and ethtool -L changes them.

# ethtool -l ens1np0
Channel parameters for ens1np0:
Pre-set maximums:
RX: 20
TX: 20
Other: 2
Combined: 20
Current hardware settings:
RX: 0
TX: 12
Other: 2
Combined: 8

# ethtool -L <intf> rx 0 tx 0 combined 8
## rx: receive ring interrupts
## tx: transmit ring interrupts
## combined: interrupts that service both
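A script may need the current channel count, for example to decide how many IRQs to pin. You can read it from the ethtool -l output. This is a sketch that only looks at values after the "Current hardware settings:" line, so the pre-set maximums are ignored. current_channels is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of one channel type
# (RX, TX, Other or Combined) from `ethtool -l` output on stdin.
# Lines before "Current hardware settings:" are the pre-set maximums and are skipped.
current_channels() {
    awk -v key="$1:" '/^Current hardware settings:/ { cur = 1; next }
                      cur && $1 == key { print $2 }'
}

# Usage:
#   ethtool -l ens1np0 | current_channels Combined
```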

receive side scaling(RSS)

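ethtool -n shows and -N changes the fields used to compute the RSS flow hash. ethtool -x shows and -X configures the RSS indirection table and hash key.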

# ethtool -n ens1np0 rx-flow-hash tcp4
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

# ethtool -n ens1np0 rx-flow-hash udp4
UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA

# ethtool -N <intf> rx-flow-hash tcp4 sdfn
# ethtool -N <intf> rx-flow-hash udp4 sdfn

## flag details
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r...
Configures the hash options for the specified flow type.

m Hash on the Layer 2 destination address of the rx packet.
v Hash on the VLAN tag of the rx packet.
t Hash on the Layer 3 protocol field of the rx packet.
s Hash on the IP source address of the rx packet.
d Hash on the IP destination address of the rx packet.
f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
r Discard all packets of this flow type. When this option is set, all other options are ignored.

# ethtool -x <intf>
# ethtool -X <intf> hkey <key>

## flag details
-x --show-rxfh-indir
Retrieves the receive flow hash indirection table and/or RSS hash key.
-X --set-rxfh-indir
Configures the receive flow hash indirection table and/or RSS hash key.
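The udp4 output above hashes only on the IP addresses, so all UDP packets between one pair of hosts land in the same queue. Adding the L4 ports (sdfn) spreads UDP flows across queues. The sketch below applies the change only when the ports are missing. hash_uses_ports is a hypothetical name, and the check matches the "L4 bytes" lines that ethtool -n prints.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `ethtool -n <intf> rx-flow-hash <type>` output
# on stdin lists both L4 source and destination ports as hash inputs.
hash_uses_ports() {
    awk '/L4 bytes 0 & 1/ { src = 1 }
         /L4 bytes 2 & 3/ { dst = 1 }
         END { exit ((src && dst) ? 0 : 1) }'
}

# Usage: add the ports to the UDP hash only if they are not already used.
#   ethtool -n ens1np0 rx-flow-hash udp4 | hash_uses_ports ||
#       ethtool -N ens1np0 rx-flow-hash udp4 sdfn
```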

view interface parameters

ethtool -k lists the interface's offload settings:

# ethtool -k ens1np0

Some of the feature toggles (changed with ethtool -K) are listed below:

## rx-checksumming
# enable rx-checksumming
# ethtool -K ens1np0 rx on

# disable rx-checksumming
# ethtool -K ens1np0 rx off

## the same on/off syntax applies to:
## tx-checksumming: tx
## scatter and gather: sg
## tcp-segmentation offload: tso
## generic segmentation offload: gso
## generic receive offload: gro
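Scripts often need to read back the state of a single offload. ethtool -k prints long feature names such as tcp-segmentation-offload, while ethtool -K also accepts the short aliases listed above (tso, gro, ...). This is a sketch, and offload_state is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print "on" or "off" for one feature from `ethtool -k`
# output on stdin, using the long feature name ethtool -k prints.
# A trailing "[fixed]" marker, if present, is dropped (only field 2 is printed).
offload_state() {
    awk -v f="$1:" '$1 == f { print $2 }'
}

# Usage:
#   ethtool -k ens1np0 | offload_state generic-receive-offload
```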

installing, configuring and using DPDK

Skipped here.

Appendices

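  • A: Netronome repositories (for installing the packages mentioned above)
  • B: installing the out-of-tree NFP driver
  • C: using the BSP package (how to configure interfaces on older kernels, as mentioned above)
  • D: using dpdk-ns
  • E: updating flash
  • F: upgrading the kernel
  • G: set_irq_affinity.sh source (an alternative to the IRQ script used above)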

key schedule

https://github.com/goodnighthy/KeySched/tree/master/key_schedule

starting and stopping using Upstart (Ubuntu 14.04 and CentOS 6)

The RTE (NORMAL MODE) can be started/stopped with: start/stop nfp-sdk6-rte

The RTE (DEBUG MODE) can be started/stopped with: start/stop nfp-sdk6-rte-debug

The RTE (SIM MODE) can be started/stopped with: start/stop nfp-sdk6-rte-sim. Before using this Upstart configuration, set NETRODIR to the SDK6 simulator installation directory in the installed file nfp-sdk6-rte-sim.conf.

The Hardware Debug Server can be started/stopped with: start/stop nfp-hwdbg-srv

To start a job automatically when the system starts, uncomment the startup line in nfp-sdk6-rte.conf, nfp-sdk6-rte-debug.conf or nfp-hwdbg-srv.conf in /etc/init/.

To check whether the RTE job started correctly and is still running, use status nfp-sdk6-rte (add -debug for DEBUG MODE or -sim for SIM MODE). If the status shows stop/waiting, the RTE has stopped and an error probably occurred. Check either the Upstart job log in /var/log/upstart/nfp-sdk6-rte.log (replace nfp-sdk6-rte with the name of the job you started) or /var/log/nfp-sdk-rte.log for RTE-only logs. To follow either log live, open it with tail -f.

starting and stopping using systemd (Ubuntu 16.04 and CentOS/RHEL 7)

The RTE (NORMAL MODE) can be started/stopped with: systemctl start/stop nfp-sdk6-rte

The RTE (DEBUG MODE) can be started/stopped with: systemctl start/stop nfp-sdk6-rte-debug

The RTE (SIM MODE) can be started/stopped with: systemctl start/stop nfp-sdk6-rte-sim. Before using this systemd service, set NETRODIR to the SDK6 simulator installation directory in the file /usr/lib/systemd/system/nfp-sdk6-rte-sim.service.

The Hardware Debug Server can be started/stopped with: systemctl start/stop nfp-hwdbg-srv

To start the programs at system startup, run systemctl enable for the corresponding service, e.g. systemctl enable nfp-sdk6-rte.service

To check whether the RTE service started correctly and is still running, use systemctl status nfp-sdk6-rte (add -debug for DEBUG MODE or -sim for SIM MODE). If the Active status shows inactive (dead), the RTE has stopped and an error probably occurred. Check either the systemd journal or the RTE logs for details on the error.

To view the systemd journal, use journalctl -u nfp-sdk6-rte, replacing nfp-sdk6-rte with the service name you are using (e.g. nfp-sdk6-rte-debug or nfp-sdk6-rte-sim). Add the -f argument to follow the journal live.

For logs generated only by the RTE, see /var/log/nfp-sdk-rte.log. To follow it live, use tail -f /var/log/nfp-sdk-rte.log
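The mode suffixes above (-debug, -sim) make the status check easy to script. The sketch below maps a mode to its service name and shows the last journal entries if the service is not active. rte_service and rte_check are hypothetical names, and only standard systemctl and journalctl options are used.

```shell
#!/bin/sh
# Hypothetical helper: map an RTE mode to the systemd service name used above.
rte_service() {
    case $1 in
        normal) echo nfp-sdk6-rte ;;
        debug)  echo nfp-sdk6-rte-debug ;;
        sim)    echo nfp-sdk6-rte-sim ;;
        *)      echo "unknown RTE mode: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: report whether the service is active; if not,
# print its 50 most recent journal entries.
rte_check() {
    svc=$(rte_service "$1") || return 1
    if systemctl is-active --quiet "$svc"; then
        echo "$svc is running"
    else
        echo "$svc is not running; recent journal entries:"
        journalctl -u "$svc" -n 50 --no-pager
    fi
}

# Usage:
#   rte_check debug
```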