From ac580e3db76f2a7c421842c48be9479e03fdb0ce Mon Sep 17 00:00:00 2001
From: Balendu Mouli Burla
Date: Mon, 11 Apr 2022 17:04:24 -0500
Subject: [PATCH] Wireless FEC Operator application for StarlingX

This specification describes the Intel Wireless FEC Operator application for
StarlingX.

Story: 2009749
Task: 44206

Change-Id: Ie84b97f81d5ae21bc2fcf1f57a8298b923a65bf8
---
 ...eless-FEC-operator-2009749-integration.rst | 375 ++++++++++++++++++
 1 file changed, 375 insertions(+)
 create mode 100644 doc/source/specs/stx-7.0/approved/wireless-FEC-operator-2009749-integration.rst

diff --git a/doc/source/specs/stx-7.0/approved/wireless-FEC-operator-2009749-integration.rst b/doc/source/specs/stx-7.0/approved/wireless-FEC-operator-2009749-integration.rst
new file mode 100644
index 0000000..e2dc43a
--- /dev/null
+++ b/doc/source/specs/stx-7.0/approved/wireless-FEC-operator-2009749-integration.rst
@@ -0,0 +1,375 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+=================================
+Wireless FEC Operator integration
+=================================
+
+Integration of Intel Wireless FEC Operator to StarlingX platform
+================================================================
+
+Storyboard:
+https://storyboard.openstack.org/#!/story/2009749
+
+In a distributed cloud environment for vRAN workloads, there may be hundreds
+of sub-clouds, with each sub-cloud having one or more worker nodes managed
+by a System Controller. Some of these sub-clouds have worker nodes with
+Intel accelerator devices to offload 4G/LTE and 5G FEC (Forward Error
+Correction) operations.
+
+These FEC devices offer the flexibility to configure hardware resources
+on a per-vRAN-workload basis for optimal performance. In a typical
+scenario, individual vRAN workload requirements may vary with the deployment
+location.
+
+For an admin to manage and/or configure these Intel FEC accelerator devices
+in a containerized environment, additional functionality is required. The
+current configuration method in StarlingX does not offer the flexibility
+to configure all the parameters of the FEC h/w accelerator and has only
+pre-defined/static configuration options for typical workloads.
+
+Problem description
+===================
+
+Today in StarlingX, configuration of FEC devices is performed through a user
+application "pf-bb-config", which in turn statically sets configurable
+parameters through a config file. The current version of StarlingX supports
+the configuration of a few parameters (only 1 or 2) of FEC devices through
+"system" commands, which in turn trigger puppet to run the "pf-bb-config"
+application when the system is unlocked.
+
+The current configuration option uses pre-defined/static config files to
+configure FEC devices for the most common vRAN workload requirements.
+Supporting other combinations of settings, or changing the configuration
+on different nodes in a cluster, requires adding and maintaining this
+configuration file in a somewhat unsupported fashion.
+
+In addition, support for next-generation FEC devices (e.g., ACC101,
+ACC200, ...) may need enhancements to the existing configuration method.
+
+The Intel-supported FEC Operator is an SRO (Special Resource Operator)
+for K8s which provides:
+
+* Detection and labeling of nodes that have FEC h/w accelerators installed
+* Configuration of FEC devices through standard K8s APIs (in JSON format)
+* Validation of FEC device configuration parameters
+* Configuration at cluster level, node level and device level
+* Deployment through Kustomize/Helm deployment models
+* Seamless support for next-generation FEC devices
+
+Use Cases
+---------
+
+The FEC Operator is an optional system application for vRAN deployments
+where there is a need to fine-tune Intel FEC h/w accelerator resources
+(e.g., number of VFs, queues, queue groups) based on deployment workloads.
+
+The parameters that can be configured through the FEC Operator are:
+
+* Number of VF interfaces (VF bundles)
+* PF/VF mode
+* Enabling 4G only, 5G only, or both 4G and 5G
+* For each direction (uplink/downlink), configuration of
+  the number of queue groups, aqsPerGroup and aqDepth
+
+Users can apply these configurations per device and per node
+in a cluster using the native kubectl interface.
+
+Proposed change
+===============
+
+The current method of configuring FEC devices remains the default
+for existing vRAN deployments and will not be changed.
+
+The FEC Operator will be added as an optional system application
+(sriov-fec-operator), which by default will be disabled (i.e. not applied or
+uploaded). The FEC Operator is deployed through helm charts packaged in the
+new system application manifest. Users can enable, deploy and configure
+the FEC Operator on demand by updating and applying helm overrides for the
+new system application.
+
+FEC Operator functionality is distributed across several PODs:
+
+* sriov-fec-controller-manager
+
+  * Runs on all master nodes in the cluster and provides K8s Custom Resource
+    API services for FEC device configuration.
+  * Communicates with the FEC Operator service running on each node
+    to configure and reconcile the FEC devices.
+
+* sriov-fec-daemonset
+
+  * Runs on each cluster node, receives configuration from controller-manager
+  * Detects the FEC devices on the platform/node
+  * Based on the data configured in the SriovFecClusterConfig CRD:
+  * Binds the PF (Physical Function) interface to the required driver,
+    i.e., igb_uio or pci-pf-stub.
+  * Creates the required number of VF interfaces
+  * Binds the VF interfaces to the driver (igb_uio, vfio-pci)
+  * Configures the FEC device using the pf-bb-config tool
+
+* sriov-device-plugin
+
+  * Runs on each node to manage the FEC device SR-IOV VF (Virtual Function)
+    resources allocated to user application PODs.
+
+* accelerator-discovery
+
+  * Runs on each node to detect the FEC devices on that node
+  * Labels the nodes which have FEC devices
+
+There are two methods of FEC device configuration:
+  method-1: Default, existing method
+  method-2: Using the FEC Operator
+
+  Method-1 (existing method) is the default method applied on node startup.
+  If a SriovFecClusterConfig CRD is applied, the sriov-fec-daemonset on the
+  node will overwrite the existing configuration for that particular device
+  on the node.
+
+  If the admin wants to switch back to the default static method, the admin
+  deletes the SriovFecClusterConfig CRD and reconfigures the device
+  through method-1.
+
+  NOTE:
+
+    Reconfiguration and/or switching between configuration methods will
+    impact FEC device usage by the vRAN application PODs. The steps listed
+    below are recommended during reconfiguration and/or when switching
+    configuration methods.
+
+    - vRAN application PODs should stop using FEC devices and be terminated.
+    - Reconfigure the device, or switch the method and reconfigure.
+    - Redeploy the vRAN application PODs to use the FEC device.
+
+FEC devices supported through the FEC Operator in STX 7 are:
+  ACC100 (Mt. Bryce), N3000 FPGA
+
+
+Alternatives
+------------
+
+The current method of configuring FEC devices is the default method of
+configuration and is enabled by default.
+
+Configuration through the FEC Operator is an optional alternative method.
+
+Data model impact
+-----------------
+
+The sriov-fec-operator application introduces the new
+SriovFecClusterConfig CRD to the cluster.
+
+
+Sample Cluster configuration:
+-----------------------------
+
+.. code-block:: none
+
+    apiVersion: sriovfec.intel.com/v2
+    kind: SriovFecClusterConfig
+    metadata:
+      name: config
+      namespace: sriov-fec-system
+    spec:
+      priority: 1
+      nodeSelector:
+        kubernetes.io/hostname:
+      acceleratorSelector:
+        pciAddress: 0000:17:00.0
+      physicalFunction:
+        pfDriver: "pci-pf-stub"
+        vfDriver: "vfio-pci"
+        vfAmount: 16
+        bbDevConfig:
+          acc100:
+            # Programming mode: 0 = VF Programming, 1 = PF Programming
+            pfMode: false
+            numVfBundles: 16
+            maxQueueSize: 1024
+            uplink4G:
+              numQueueGroups: 0
+              numAqsPerGroups: 16
+              aqDepthLog2: 4
+            downlink4G:
+              numQueueGroups: 0
+              numAqsPerGroups: 16
+              aqDepthLog2: 4
+            uplink5G:
+              numQueueGroups: 4
+              numAqsPerGroups: 16
+              aqDepthLog2: 4
+            downlink5G:
+              numQueueGroups: 4
+              numAqsPerGroups: 16
+              aqDepthLog2: 4
+
+sriov_fec_cluster_config parameters description:
+------------------------------------------------
+
+* ``name``: Name of the specific config.
+* ``cluster_config_name``: Name of the cluster config.
+* ``priority``: Priority of deployment (lower number higher priority).
+* ``drainskip``: Allows skipping the draining of the node after
+  the config is applied.
+* ``selected_node``: (Optional) Field that can be used to target only a
+  specific node.
+* ``pf_driver``: The PF driver to be used: igb_uio or pci-pf-stub.
+* ``vf_driver``: The VF driver to be used: vfio-pci or igb_uio.
+* ``vf_amount``: The number of VFs to be created for the device.
+* ``bbdevconfig``:
+
+  * ``pf_mode``: The mode in which the accelerator is programmed;
+    VFs are expected to be used, so this is set to false.
+  * ``num_vf_bundles``: Number of VF bundles; this should correspond
+    to the vf_amount field.
+  * ``max_queue_size``: Max queue size; this field is not expected to
+    change in most deployments.
+  * ``ul4g_num_queue_groups``: Number of 4G Uplink queue groups;
+    there are 8 queue groups in total that can be distributed between
+    4G/5G Uplink/Downlink.
+  * ``ul4g_num_aqs_per_groups``: Number of aqs per group, not expected
+    to change for most deployments.
+  * ``ul4g_aq_depth_log2``: Log2 of the aq depth.
+  * ``dl4g_num_queue_groups``: Number of 4G Downlink queue groups;
+    there are 8 queue groups in total that can be distributed between
+    4G/5G Uplink/Downlink.
+  * ``dl4g_num_aqs_per_groups``: Number of aqs per group,
+    not expected to change for most deployments.
+  * ``dl4g_aq_depth_log2``: Log2 of the aq depth.
+  * ``ul5g_num_queue_groups``: Number of 5G Uplink queue groups;
+    there are 8 queue groups in total that can be distributed between 4G/5G
+    Uplink/Downlink - here 4 queue groups are used for 5G Uplink.
+  * ``ul5g_num_aqs_per_groups``: Number of aqs per group,
+    not expected to change for most deployments.
+  * ``ul5g_aq_depth_log2``: Log2 of the aq depth.
+  * ``dl5g_num_queue_groups``: Number of 5G Downlink queue groups;
+    there are 8 queue groups in total that can be distributed between
+    4G/5G Uplink/Downlink - here 4 queue groups are used for 5G Downlink.
+  * ``dl5g_num_aqs_per_groups``: Number of aqs per group,
+    not expected to change for most deployments.
+  * ``dl5g_aq_depth_log2``: Log2 of the aq depth.
+
+REST API impact
+---------------
+
+Standard extension of the K8s APIs through the introduction of the
+SriovFecClusterConfig CRD.
+
+
+Security impact
+---------------
+
+Existing K8s authentication and authorization apply to the standard
+extension of the K8s APIs introduced by the SriovFecClusterConfig CRD.
+
+Other end user impact
+---------------------
+
+End users will be able to configure FEC devices in more detail.
+
+
+Performance Impact
+------------------
+
+* With the existing method (method-1), resources (CPU and memory)
+  are consumed only during configuration.
+
+* With the FEC Operator method, service PODs run on master and
+  worker nodes all the time and consume some CPU and memory
+  resources from cluster housekeeping, which we believe to be negligible.
+
+* For periodic reconciliation, communication between controller-manager and
+  fec-daemon may consume network resources as well, assumed to be negligible.
+
+Other deployer impact
+---------------------
+None.
+
+Upgrade impact
+--------------
+None. The sriov-fec-operator application is optional.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+
+* Balendu Mouli Burla (balendu)
+
+Other contributors:
+
+* Nidhi Shivashankara Belur (nshivash)
+
+Repos Impacted
+--------------
+
+A new system-application repo will be created for the definition and building
+of the new sriov-fec-operator application.
+
+Work Items
+----------
+
+* Create sriov-fec-operator application package
+* Integrate sriov-fec-operator application to FluxCD. Add application
+  upload/apply/remove/delete commands.
+* Update docs.starlingx.io with a how-to for configuring FEC devices using
+  the FEC Operator application.
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+* Testing will be performed on both Simplex and Duplex mode deployment
+  configurations.
+* The following functional validations will be performed:
+
+  * Check that the FEC Operator is disabled by default on first node startup.
+  * Check the static configuration of FEC devices and make sure existing
+    functionality still works.
+  * Check enable/disable functionality of the FEC Operator in the cluster.
+  * Configure the FEC device with the FEC Operator, make sure it overrides
+    the default configuration and verify the FEC functionality.
+  * Delete the CRD configuration, re-configure the device through static
+    configuration and verify the FEC functionality.
+  * Configure the device through the FEC Operator, reboot the node and check
+    that the node comes up with the configuration applied through fec-operator.
+
+Documentation Impact
+====================
+
+docs.starlingx.io will be updated for:
+
+* How to upload and apply the sriov-fec-operator application
+* How to perform enhanced configuration of FEC devices with the
+  SriovFecClusterConfig CRD.
+
+References
+==========
+
+Intel FEC Operator:
+https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md
+
+Acronyms
+--------
+
+- FEC : Forward Error Correction
+- LTE : Long Term Evolution
+- vRAN : Virtual Radio Access Network
+- SR-IOV : Single Root Input/Output Virtualization
+- PF : Physical Function
+- VF : Virtual Function
+- CRD : Custom Resource Definition
+
+History
+=======
+
+Initial Version.