Configuring an SR-IOV network device
You specify the SR-IOV network device configuration for a node by creating an SR-IOV network node policy. The API object for the policy is part of the API group.
The following YAML describes an SR-IOV network node policy:
The following example describes the configuration for an InfiniBand device:
Example configuration for an InfiniBand device
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-ib-net-1
namespace: openshift-sriov-network-operator
spec:
resourceName: ibnic1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 4
nicSelector:
vendor: "15b3"
deviceID: "101b"
rootDevices:
- "0000:19:00.0"
linkType: ib
isRdma: true
The following example describes the configuration for an SR-IOV network device in a OpenStack virtual machine:
Example configuration for an SR-IOV device in a virtual machine
apiVersion: sriovnetwork.openshift.io/v1
metadata:
name: policy-sriov-net-openstack-1
namespace: openshift-sriov-network-operator
spec:
resourceName: sriovnic1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 1 (1)
nicSelector:
vendor: "15b3"
deviceID: "101b"
1 | The numVfs field is always set to 1 when configuring the node network policy for a virtual machine. |
2 | The netFilter field must refer to a network ID when the virtual machine is deployed on OpenStack. Valid values for netFilter are available from an SriovNetworkNodeState object. |
In some cases, you might want to split virtual functions (VFs) from the same physical function (PF) into multiple resource pools. For example, you might want some of the VFs to load with the default driver and the remaining VFs load with the vfio-pci
driver. In such a deployment, the pfNames
selector in your SriovNetworkNodePolicy custom resource (CR) can be used to specify a range of VFs for a pool using the following format: <pfname>#<first_vf>-<last_vf>
.
For example, the following YAML shows the selector for an interface named netpf0
with VF 2
through 7
:
pfNames: ["netpf0#2-7"]
netpf0
is the PF interface name.2
is the first VF index (0-based) that is included in the range.7
is the last VF index (0-based) that is included in the range.
You can select VFs from the same PF by using different policy CRs if the following requirements are met:
The
numVfs
value must be identical for policies that select the same PF.The VF index must be in the range of
0
to<numVfs>-1
. For example, if you have a policy withnumVfs
set to8
, then the<first_vf>
value must not be smaller than0
, and the<last_vf>
must not be larger than7
.The VFs ranges in different policies must not overlap.
The
<first_vf>
must not be larger than the<last_vf>
.
The following example illustrates NIC partitioning for an SR-IOV device.
The policy policy-net-1
defines a resource pool net-1
that contains the VF 0
of PF netpf0
with the default VF driver. The policy policy-net-1-dpdk
defines a resource pool net-1-dpdk
that contains the VF 8
to 15
of PF netpf0
with the vfio
VF driver.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-net-1
namespace: openshift-sriov-network-operator
spec:
resourceName: net1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 16
nicSelector:
pfNames: ["netpf0#0-0"]
deviceType: netdevice
Policy policy-net-1-dpdk
:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-net-1-dpdk
spec:
resourceName: net1dpdk
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 16
nicSelector:
pfNames: ["netpf0#8-15"]
deviceType: vfio-pci
Verifying that the interface is successfully partitioned
Confirm that the interface partitioned to virtual functions (VFs) for the SR-IOV device by running the following command.
1 | Replace <interface> with the interface that you specified when partitioning to VFs for the SR-IOV device, for example, ens3f1 . |
Example output
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:d1:bc:01 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 5a:e7:88:25:ea:a0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 2 link/ether ce:09:56:97:df:f9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 3 link/ether 5e:91:cf:88:d1:38 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 4 link/ether e6:06:a1:96:2f:de brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io
CustomResourceDefinition to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).
Prerequisites
You installed the OpenShift CLI (
oc
).You have access to the cluster as a user with the
cluster-admin
role.You have installed the SR-IOV Network Operator.
You have enough available nodes in your cluster to handle the evicted workload from drained nodes.
You have not selected any control plane nodes for SR-IOV network device configuration.
Procedure
Create an
SriovNetworkNodePolicy
object, and then save the YAML in the<name>-sriov-node-network.yaml
file. Replace<name>
with the name for this configuration.Optional: Label the SR-IOV capable cluster nodes with
SriovNetworkNodePolicy.Spec.NodeSelector
if they are not already labeled. For more information about labeling nodes, see “Understanding how to update labels on nodes”.Create the
SriovNetworkNodePolicy
object:$ oc create -f <name>-sriov-node-network.yaml
where
<name>
specifies the name for this configuration.After applying the configuration update, all the pods in
sriov-network-operator
namespace transition to theRunning
status.To verify that the SR-IOV network device is configured, enter the following command. Replace
<node_name>
with the name of a node with the SR-IOV network device that you just configured.$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Additional resources
After following the procedure to configure an SR-IOV network device, the following sections address some error conditions.
To display the state of nodes, run the following command:
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name>
where: <node_name>
specifies the name of a node with an SR-IOV network device.
Error output: Cannot allocate memory
"lastSyncError": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"
Confirm that global SR-IOV settings are enabled in the BIOS for the node.
Confirm that VT-d is enabled in the BIOS for the node.
As a cluster administrator, you can assign an SR-IOV network interface to your VRF domain by using the CNI VRF plugin.
To do this, add the VRF configuration to the optional metaPlugins
parameter of the SriovNetwork
resource.
Applications that use VRFs need to bind to a specific device. The common usage is to use the Using a VRF through the |
The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition
custom resource (CR) automatically.
Do not edit |
To create an additional SR-IOV network attachment with the CNI VRF plugin, perform the following procedure.
Prerequisites
Install the OKD CLI (oc).
Log in to the OKD cluster as a user with cluster-admin privileges.
Procedure
Create the
SriovNetwork
custom resource (CR) for the additional SR-IOV network attachment and insert themetaPlugins
configuration, as in the following example CR. Save the YAML as the filesriov-network-attachment.yaml
.Create the
SriovNetwork
resource:$ oc create -f sriov-network-attachment.yaml
Verifying that the NetworkAttachmentDefinition
CR is successfully created
Confirm that the SR-IOV Network Operator created the
NetworkAttachmentDefinition
CR by running the following command.$ oc get network-attachment-definitions -n <namespace> (1)
1 Replace <namespace>
with the namespace that you specified when configuring the network attachment, for example,additional-sriov-network-1
.Example output
NAME AGE
additional-sriov-network-1 14m
There might be a delay before the SR-IOV Network Operator creates the CR.
Verifying that the additional SR-IOV network attachment is successful
To verify that the VRF CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create an SR-IOV network that uses the VRF CNI.
Assign the network to a pod.
Verify that the pod network attachment is connected to the SR-IOV additional network. Remote shell into the pod and run the following command:
$ ip vrf show
Example output
Name Table
-----------------------
red 10
-
Example output
...
5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode