From c6b6faa3e019a6627b9994e5f66b317e6a4f9a39 Mon Sep 17 00:00:00 2001
From: Rafal Lal
Date: Fri, 17 Mar 2023 12:59:30 +0100
Subject: [PATCH] Add Intel Ethernet Operator spec

Story: 2010562
Task: 47839

Signed-off-by: Rafal Lal
Change-Id: I659aac562b07c644e6b48f26755b5c664075027b
---
 .../IEO_Starlingx_Spec_submission.rst | 420 ++++++++++++++++++
 1 file changed, 420 insertions(+)
 create mode 100644 doc/source/specs/stx-9.0/approved/IEO_Starlingx_Spec_submission.rst

diff --git a/doc/source/specs/stx-9.0/approved/IEO_Starlingx_Spec_submission.rst b/doc/source/specs/stx-9.0/approved/IEO_Starlingx_Spec_submission.rst
new file mode 100644
index 0000000..eacf21b
--- /dev/null
+++ b/doc/source/specs/stx-9.0/approved/IEO_Starlingx_Spec_submission.rst
@@ -0,0 +1,420 @@

Integration of Intel Ethernet Operator to StarlingX Platform
============================================================

Storyboard:
https://storyboard.openstack.org/#!/story/2010562

In a cloud environment, network interface adapters require a cloud-based
management system. These adapters may have advanced functionality which is
best managed under a single operator.

Problem description
===================

Firmware on network adapters may need to be managed. Network adapter
personalization may need to be managed. Activation of flow rules to ensure
that traffic reaches pod interfaces is also required.

Use Cases
---------

* Update of firmware on interface adapters
* Update of Dynamic Device Personalization on interface adapters
* Update of flow configuration on interface adapters to allow traffic
  steering

Dynamic Device Personalization (DDP) is the on-chip programmable pipeline
that allows deep and diverse protocol header processing. Flow configuration
allows the steering of traffic to particular VFs on the node.

Proposed change
===============

The Intel Ethernet Operator (IEO) will allow the firmware of Intel E810
Series NICs to be updated in a container environment. Nodes will be drained,
taken out of service and restarted as required by the update. Firmware and
DDP packages can be downloaded from a suitable HTTP server
(configurable in the EthernetNodeConfig Custom Resource).

The Intel Ethernet Operator also requires some other plugins and operators.

IEO requires the SR-IOV Device Plugin, which makes SR-IOV resources
available in Kubernetes. For ease of configuration the SR-IOV Network
Operator is also required, and it in turn requires Node Feature Discovery.
Both the SR-IOV Network Operator and Node Feature Discovery are installed
as dependencies alongside IEO in the intel-ethernet-operator namespace.

Flow rules require the inclusion of the Unified Flow Tool (UFT) server
application. UFT applies the flow rules and is called using the DPDK
rte_flow API; it supports the Switch Filter rules exposed by rte_flow.
UFT is included as part of the IEO installation.

Common features
---------------

Within the Ethernet Operator on the control node, the controller first
deploys the discovery asset (ethernet-discovery, the labeler) as a pod on
each node; the labeler marks a node if a supported device is connected.
The controller also deploys a compatibility map, a config file specifying
which FW/DDP/kernel versions can work together.

The controller deploys the Ethernet daemon (FW/DDP daemon) as a DaemonSet;
only nodes with the appropriate label get a pod deployed on them. The
Ethernet daemon checks for a Node configuration and, if one is not found,
creates it.

The daemon reconciles in a loop, gathers the status of the required
components (found devices, PCI address, MAC, FW and DDP version, etc.) and
updates the Node configuration with a status. The user can then get the
status of all Node configurations, or of a specific one, for example as
sketched below.
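
The exact schema of this per-node status is defined by the operator's
EthernetNodeConfig CRD; the following is only a minimal sketch of what such
a status might look like, with field names and device data that are
illustrative assumptions rather than values taken from this spec.

.. code-block:: yaml

    # Illustrative sketch only: field names and values are assumptions.
    # The authoritative schema is the EthernetNodeConfig CRD shipped with
    # the operator.
    apiVersion: ethernet.intel.com/v1
    kind: EthernetNodeConfig
    metadata:
      name: worker-01
      namespace: intel-ethernet-operator
    spec: {}
    status:
      devices:
        - PCIAddress: "0000:18:00.0"
          name: "Ethernet Controller E810-C"
          firmware:
            MAC: "40:a6:b7:00:00:00"
            version: "4.00"
          DDP:
            packageName: "ICE OS Default Package"
            version: "1.3.30.0"
      conditions:
        - type: Updated
          status: "True"
          reason: NotRequested
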
Firmware and DDP upgrade
------------------------

The user uploads the desired DDP package and/or nvmupdate package to an
HTTP server accessible by the cluster (the HTTP server and the mechanism
used to upload to it are out of scope of the operator). The user can then
apply a new cluster configuration with the preferred settings; the Ethernet
controller breaks it down into smaller Node configurations, and those
configurations are updated.

The Ethernet daemon reconciles in a loop, watching for an update. If the
condition (the fields in the applied EthernetClusterConfig CRD) is
unchanged, it ignores it; if new conditions are detected for other nodes,
it ignores them as well. When a condition change is detected for a
particular daemon, that daemon acts on it: it verifies the condition and
denies the change if it cannot be met. If the condition can be met, it runs
the appropriate actions to bring the node to the desired condition (i.e. a
DDP/FW update): it downloads the packages from the specified HTTP server
address, elects a leader to act as a controller, cordons off and drains the
node, proceeds with the updates, reboots the node, uncordons it and
releases the leadership. Once any update to the configuration is done, it
updates the Node configuration status. Once the update is finished, the
user is able to get the status of the update and the status of the node.

Flow Configuration
------------------

To allow the Flow Configuration feature to compose the flow rules for the
network card's traffic, the deployment must use a trusted virtual function
(VF) from each physical function (PF). Usually it is the first VF (VF0) of
each PF that has trust mode enabled and is then bound to the vfio-pci
driver. This VF pool must be created by the user and be allocatable as a
Kubernetes resource.

Rules are written in an rte_flow-like form and allow deep matching of
packet-type flows to interfaces associated with pods on a cluster. Rules
can be written for the cluster or for a pod. During pod scheduling they are
instantiated on a node to configure the flow offload hardware on the
interface so that it targets a pod attached via a particular VF.

Alternatives
============

It is possible to connect to each node and manually untar and install the
firmware and device profiles. Similarly, flow offloads could be configured
individually on each node.

Data model impact
=================

IEO introduces the following CRDs on the cluster:

- EthernetClusterConfig
- FlowConfigNodeAgentDeployment
- NodeFlowConfig
- ClusterFlowConfig
- EthernetNodeConfig (NIC configuration status, not created by the user)

EthernetClusterConfig
=====================

.. code-block:: yaml

    apiVersion: ethernet.intel.com/v1
    kind: EthernetClusterConfig
    metadata:
      name: config
    spec:
      nodeSelectors:
        kubernetes.io/hostname:
      deviceSelector:
        pciAddress: ""
      deviceConfig:
        fwURL: ""
        fwChecksum: ""
        ddpURL: ""
        ddpChecksum: ""

Parameters
----------

* ``name``: Name of the specific config
* ``kubernetes.io/hostname``: Hostname of the node containing the cards to
  be updated
* ``fwURL``: Accessible URL for the firmware file. A proxy may be needed
* ``fwChecksum``: Expected checksum of the firmware file
* ``ddpURL``: Accessible URL for the DDP file. A proxy may be needed
* ``ddpChecksum``: Expected checksum of the DDP file
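
For illustration, a filled-in EthernetClusterConfig could look like the
following. This is a hypothetical example only: the hostname, PCI address,
server URLs and checksum values are placeholders, and the HTTP server is
assumed to be reachable from the cluster as described in the Firmware and
DDP upgrade section.

.. code-block:: yaml

    # Hypothetical example: all values below are placeholders.
    apiVersion: ethernet.intel.com/v1
    kind: EthernetClusterConfig
    metadata:
      name: fw-ddp-update
    spec:
      nodeSelectors:
        kubernetes.io/hostname: worker-01
      deviceSelector:
        pciAddress: "0000:18:00.0"
      deviceConfig:
        fwURL: "http://firmware-server.example/e810/nvmupdate-package.tar.gz"
        fwChecksum: "<checksum of the firmware package>"
        ddpURL: "http://firmware-server.example/ddp/ice-comms-package.zip"
        ddpChecksum: "<checksum of the DDP package>"
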
FlowConfigNodeAgentDeployment
=============================

.. code-block:: yaml

    apiVersion: flowconfig.intel.com/v1
    kind: FlowConfigNodeAgentDeployment
    metadata:
      labels:
        control-plane: flowconfig-daemon
      name: flowconfig-daemon-deployment
      namespace: intel-ethernet-operator
    spec:
      DCFVfPoolName: openshift.io/cvl_uft_admin
      NADAnnotation: sriov-cvl-dcf

Parameters
----------

* ``name``: Name of the FlowConfigNodeAgentDeployment
* ``DCFVfPoolName``: Name of the trusted VF resource pool (created via the
  used SriovNetworkNodePolicy)
* ``NADAnnotation``: Name of the used SriovNetwork

NodeFlowConfig
==============

.. code-block:: yaml

    apiVersion: flowconfig.intel.com/v1
    kind: NodeFlowConfig
    metadata:
      name: worker-01
    spec:
      rules:
        - pattern:
            - type: RTE_FLOW_ITEM_TYPE_ETH
            - type: RTE_FLOW_ITEM_TYPE_IPV4
              spec:
                hdr:
                  src_addr: 10.56.217.9
              mask:
                hdr:
                  src_addr: 255.255.255.255
            - type: RTE_FLOW_ITEM_TYPE_END
          action:
            - type: RTE_FLOW_ACTION_TYPE_DROP
            - type: RTE_FLOW_ACTION_TYPE_END
          portId: 0
          attr:

Parameters
----------

* ``name``: Name of the config - needs to match the node name
* ``pattern: type``: Header part to match on
* ``pattern: spec & mask``: Addresses to match for the rules
* ``action``: Alters the fate of matching traffic, its contents or
  properties
* ``attr``: Flow rule priority level
* ``portId``: Information identifying the port on a node

ClusterFlowConfig
=================

.. code-block:: yaml

    apiVersion: flowconfig.intel.com/v1
    kind: ClusterFlowConfig
    metadata:
      name: pppoes-sample
    spec:
      rules:
        - pattern:
            - type: RTE_FLOW_ITEM_TYPE_ETH
            - type: RTE_FLOW_ITEM_TYPE_IPV4
              spec:
                hdr:
                  src_addr: 10.56.217.9
              mask:
                hdr:
                  src_addr: 255.255.255.255
            - type: RTE_FLOW_ITEM_TYPE_END
          action:
            - type: to-pod-interface
              conf:
                podInterface: net1
          attr:
            ingress: 1
            priority: 0
          podSelector:
            matchLabels:
              app: vagf
              role: controlplane

Parameters
----------

* ``name``: Name of the config
* ``pattern: type``: Header part to match on
* ``pattern: spec & mask``: Addresses to match for the rules
* ``action``: Alters the fate of matching traffic, its contents or
  properties
* ``attr``: Flow rule priority level
* ``podSelector``: Labels associated with the particular pod

NOTE: Most of the object parameter names are consistent with the names used
in the official DPDK rte_flow documentation. For the full description of
the generic flow API see https://doc.dpdk.org/guides/prog_guide/rte_flow.html.

During execution, ClusterFlowConfig rules are broken down into
NodeFlowConfig rules. NodeFlowConfig rules can also be written manually.
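
The flow configuration CRDs above assume that the trusted VF (DCF) pool
described in the Flow Configuration section already exists and is
allocatable; it is referenced via ``DCFVfPoolName`` and ``NADAnnotation``
in the FlowConfigNodeAgentDeployment example. A minimal sketch of how such
a pool might be defined with the SR-IOV Network Operator is shown below;
the policy name, namespace, PF name, VF count and node selector are
assumptions for illustration only, and the resource prefix seen by pods
(for example ``openshift.io/``) depends on the SR-IOV device plugin
configuration.

.. code-block:: yaml

    # Minimal sketch, not a mandated configuration: expose VF0 of a PF as a
    # trusted "DCF" pool for UFT. Names, PF selector and counts are
    # illustrative assumptions.
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: uft-admin-policy
      namespace: sriov-network-operator   # wherever the SR-IOV operator runs
    spec:
      resourceName: cvl_uft_admin         # surfaces as <prefix>/cvl_uft_admin
      deviceType: vfio-pci                # the DCF VF is bound to vfio-pci
      numVfs: 8
      priority: 99
      nicSelector:
        vendor: "8086"
        pfNames:
          - "ens785f0#0-0"                # select only VF0 of this PF
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: sriov-cvl-dcf                 # matches NADAnnotation above
      namespace: sriov-network-operator
    spec:
      resourceName: cvl_uft_admin
      networkNamespace: intel-ethernet-operator
      trust: "on"                         # VF0 needs trust mode enabled
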
REST API impact
---------------

Standard extension of the Kubernetes APIs based on the introduction of the
above CRDs.

Security impact
---------------

The current/existing Kubernetes authentication and authorization apply to
the standard extension of the Kubernetes APIs introduced by the IEO CRDs.
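
As an illustration of the point above, access to the new CRDs can be scoped
with standard Kubernetes RBAC. The following is only a sketch: the API
groups come from the apiVersion fields shown earlier, while the plural
resource names are assumptions based on the usual CRD naming convention and
should be checked against the CRDs actually installed by the operator.

.. code-block:: yaml

    # Illustrative only: a ClusterRole granting read-only access to the
    # IEO CRDs. Plural resource names are assumed, not taken from this spec.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: ieo-config-viewer
    rules:
      - apiGroups: ["ethernet.intel.com"]
        resources:
          - ethernetclusterconfigs
          - ethernetnodeconfigs
        verbs: ["get", "list", "watch"]
      - apiGroups: ["flowconfig.intel.com"]
        resources:
          - nodeflowconfigs
          - clusterflowconfigs
          - flowconfignodeagentdeployments
        verbs: ["get", "list", "watch"]
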
Other end user impact
---------------------

The end user will have the capability, on Intel Ethernet devices, to:

- control firmware and DDP packages
- configure flow rules
- display the configuration status

Performance Impact
------------------

Using the Intel Ethernet Operator, service pods will be running on master
and worker nodes at all times, consuming some CPU and memory on top of the
cluster housekeeping load, which we believe to be negligible. Periodic
reconciliation between the controller-manager and the node daemons may also
consume some network resources, which is likewise assumed to be negligible.

Other deployer impact
---------------------

None.

Developer impact
----------------

In StarlingX 8.0 and future releases the /lib/firmware directory is
read-only. This creates a problem for any customer who wants to use a DDP
profile other than the one that comes preinstalled. The Intel ice driver
looks for a DDP package named intel/ice/ddp/stx-ice.pkg in the default
firmware search paths, which are /lib/firmware and /lib/firmware/updates.
Both of these paths are immutable, so currently there is no way to change
the DDP package in use. The solution is the alternate firmware search path
that is already supported by the kernel
(https://docs.kernel.org/driver-api/firmware/fw_search_path.html). This
feature can be enabled by adding a suitable boot parameter; a contribution
that adds this to StarlingX has already been made (in the stx-puppet
repository).

Upgrade impact
--------------

None. This is an optional operator.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Rafal Lal

Other contributors:
  Kevin Clarke

Repos Impacted
--------------

A new system-application repo will be created for the definition and
building of the intel-ethernet-operator application.

Work Items
----------

* Create the intel-ethernet-operator application package.
* Integrate the intel-ethernet-operator application with FluxCD and add the
  application upload/apply/remove/delete commands.
* Update docs.starlingx.io with instructions on how to use
  intel-ethernet-operator to configure Ethernet cards.

Building images
---------------

The Intel Ethernet Operator team would like to redirect the building of the
UFT container image to StarlingX. The source code of the image is publicly
available, and build scripts would be provided. Images of the other
components would be built by Intel and made ready to pull.

Dependencies
============

None specific.

Testing
=======

Testing will be done on a multi-node cluster configuration:

* Testing of packages across several package revisions
* Validating firmware installs and DDP package installs
* Testing that traffic flow is instantiated to the correct pods
* Validating that CRDs for particular functionality effect the change on
  the cluster
* Manually deleting / changing the configuration to validate that the
  controllers make the changes
* Rebooting nodes to validate that the new configuration remains
* Reloading drivers to validate that the new configuration remains

Documentation Impact
====================

docs.starlingx.io will be updated for:

* How to use the intel-ethernet-operator application
* How to perform enhanced configuration of Ethernet devices with the CRDs
  supplied by the Ethernet Operator

References
==========

Intel® Ethernet Operator - Overview Solution Brief
https://networkbuilders.intel.com/solutionslibrary/intel-ethernet-operator-overview-solution-brief

Intel Ethernet Operator
https://github.com/intel/intel-ethernet-operator

Unified Flow Tool (UFT)
https://github.com/intel/UFT/tree/main

Intel Ethernet 810 series features
https://www.intel.com/content/www/us/en/products/details/ethernet/800-controllers/e810-controllers/docs.html

Node Feature Discovery
https://github.com/kubernetes-sigs/node-feature-discovery

SR-IOV Network Operator
https://github.com/k8snetworkplumbingwg/sriov-network-operator

SR-IOV Network Device Plugin for Kubernetes
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Date
     - Description
   * - 02-Feb-2023
     - Introducing Ethernet operator
   * - 02-Feb-2023
     - Updated with comments from StarlingX Sub-Project Meeting
   * - 03-Mar-2023
     - Submission
   * - 29-Mar-2023
     - Updated with comments from StarlingX Sub-Project Meeting
   * - 22-Jun-2023
     - Updated with comments from code reviews