Wireless FEC Operator application for StarlingX
This specification describes the Intel Wireless FEC Operator application for StarlingX. Story: 2009749 Task: 44206 Change-Id: Ie84b97f81d5ae21bc2fcf1f57a8298b923a65bf8
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

=================================
Wireless FEC Operator integration
=================================

Integration of the Intel Wireless FEC Operator into the StarlingX platform
==========================================================================

Storyboard:
https://storyboard.openstack.org/#!/story/2009749

In a distributed cloud environment for vRAN workloads, there may be hundreds
of sub-clouds, each with one or more worker nodes managed by a System
Controller. Some of these sub-clouds have worker nodes with Intel accelerator
devices that offload 4G/LTE and 5G FEC (Forward Error Correction) operations.

These FEC devices allow the hardware resources to be configured per vRAN
workload to achieve optimal performance. In a typical scenario, individual
vRAN workload requirements may vary depending on the deployment location.

Additional functionality is required for an administrator to manage and/or
configure these Intel FEC accelerator devices in a containerized environment.
The current configuration method in StarlingX cannot configure all of the
parameters of the FEC hardware accelerator; it offers only predefined, static
configuration options for typical workloads.

Problem description
===================

Today in StarlingX, FEC devices are configured through the user application
"pf-bb-config", which statically sets the configurable parameters from a
config file. The current version of StarlingX supports configuring only a few
FEC device parameters (one or two) through "system" commands, which in turn
trigger Puppet to run the "pf-bb-config" application when the system is
unlocked.

The current method uses predefined, static config files that configure FEC
devices for the most common vRAN workload requirements. Supporting other
combinations of configurations, or applying different configurations to
different nodes in a cluster, requires adding and maintaining these
configuration files in a somewhat unsupported fashion.

In addition, support for next-generation FEC devices (e.g., ACC101, ACC200)
may require enhancements to the existing configuration method.

The Intel-supported FEC Operator is an SRO (Special Resource Operator) for
K8s that provides:

* Detection and labeling of the nodes that have FEC hardware accelerators
  installed
* Configuration of FEC devices through standard K8s APIs (in JSON format)
* Validation of FEC device configuration parameters
* Configuration at the cluster, node, or device level
* Deployment through the Kustomize or Helm deployment models
* Seamless support for next-generation FEC devices
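
The detection and labeling step can be pictured with a minimal Python sketch. It scans the Linux PCI sysfs tree for known FEC physical functions. The vendor/device IDs are those used by DPDK's baseband drivers for the ACC100 PF (``8086:0d5c``) and the N3000 5GNR FEC PF (``8086:0d8f``). The function names and the label key ``example.com/fec-accelerator-present`` are hypothetical. This is not the operator's own discovery code.

.. code-block:: python

   from pathlib import Path

   # (vendor, device) IDs of FEC physical functions, as defined by DPDK's
   # baseband drivers: ACC100 PF and N3000 5GNR FEC PF.
   KNOWN_FEC_PFS = {
       ("0x8086", "0x0d5c"): "ACC100",
       ("0x8086", "0x0d8f"): "N3000",
   }

   # Hypothetical label key, for illustration only.
   FEC_LABEL = "example.com/fec-accelerator-present"


   def find_fec_devices(sysfs_root="/sys/bus/pci/devices"):
       """Return {pci_address: model} for each known FEC PF under sysfs_root."""
       root = Path(sysfs_root)
       found = {}
       if not root.is_dir():
           return found
       for dev in sorted(root.iterdir()):
           try:
               vendor = (dev / "vendor").read_text().strip().lower()
               device = (dev / "device").read_text().strip().lower()
           except OSError:
               continue  # skip entries without readable ID files
           model = KNOWN_FEC_PFS.get((vendor, device))
           if model is not None:
               found[dev.name] = model
       return found


   def node_labels(fec_devices):
       """Labels a discovery agent might add to a node (hypothetical scheme)."""
       return {FEC_LABEL: "true"} if fec_devices else {}

On a node, ``node_labels(find_fec_devices())`` would return the single hypothetical label when an ACC100 or N3000 FEC PF is present, and an empty dict otherwise.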

Use Cases
---------

The FEC Operator is an optional system application for vRAN deployments that
need to fine-tune Intel FEC hardware accelerator resources (i.e., the number
of VFs, queues, queue groups, etc.) based on deployment workloads.

The parameters that can be configured through the FEC Operator are:

* Number of VF interfaces (VF bundles)
* PF/VF mode
* Enabling 4G only, 5G only, or both 4G and 5G
* For each direction (uplink/downlink), the configuration of:

  * number of queue groups, aqsPerGroup and aqDepth

Users can apply these configurations per device and per node in a cluster
using the native kubectl interface.

Proposed change
===============

The current method of configuring FEC devices remains the default
configuration for existing vRAN deployments and will not be changed.

The FEC Operator will be added as an optional system application
(sriov-fec-operator), which is disabled by default (i.e., not uploaded or
applied). The FEC Operator is deployed through Helm charts packaged in the
new system application manifest. Users can, on demand, enable, deploy, and
configure the FEC Operator by updating and applying Helm overrides for the
new system application.

FEC Operator functionality is distributed across several PODs:

* sriov-fec-controller-manager

  * Runs on all master nodes in the cluster and provides the K8s Custom
    Resource API services for FEC device configuration.
  * Communicates with the FEC Operator service running on each node to
    configure the FEC devices and reconcile their state.

* sriov-fec-daemonset

  * Runs on each node in the cluster and receives configuration from the
    controller-manager.
  * Detects the FEC devices on the platform/node.
  * Based on the data configured in the SriovFecClusterConfig CRD:

    * Binds the PF (Physical Function) interface to the required driver,
      i.e., igb_uio or pci-pf-stub.
    * Creates the required number of VF interfaces.
    * Binds the VF interfaces to a driver (igb_uio or vfio-pci).
    * Configures the FEC device using the pf-bb-config tool.

* sriov-device-plugin

  * Runs on each node to manage the FEC device SR-IOV VF (Virtual Function)
    resources allocated to user application PODs.

* accelerator-discovery

  * Runs on each node to detect the FEC devices on that node.
  * Labels the nodes that have FEC devices.
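
The PF/VF steps performed by sriov-fec-daemonset map onto standard Linux sysfs operations. The following minimal Python sketch only builds a dry-run plan of ``(path, value)`` writes for one device. It assumes the kernel's generic ``driver_override``, ``drivers_probe``, ``sriov_drivers_autoprobe`` and ``sriov_numvfs`` attributes, and that the pci-pf-stub and vfio-pci modules are already loaded. It leaves out the pf-bb-config step. The function names are hypothetical, and this is not the daemon's actual implementation.

.. code-block:: python

   def pf_plan(pf_addr, pf_driver, vf_amount, current_driver=None,
               bus="/sys/bus/pci"):
       """Ordered (path, value) writes that rebind the PF and create VFs."""
       dev = f"{bus}/devices/{pf_addr}"
       plan = [(f"{dev}/driver_override", pf_driver)]
       if current_driver:
           # Release the PF from the driver it is currently bound to.
           plan.append((f"{bus}/drivers/{current_driver}/unbind", pf_addr))
       plan += [
           (f"{bus}/drivers_probe", pf_addr),        # bind PF to pf_driver
           (f"{dev}/sriov_drivers_autoprobe", "0"),  # keep new VFs unbound
           (f"{dev}/sriov_numvfs", "0"),             # reset count before resizing
           (f"{dev}/sriov_numvfs", str(vf_amount)),
       ]
       return plan


   def vf_plan(vf_addrs, vf_driver, bus="/sys/bus/pci"):
       """Writes that bind each VF (addresses read from the PF's virtfnN links)."""
       plan = []
       for vf in vf_addrs:
           plan.append((f"{bus}/devices/{vf}/driver_override", vf_driver))
           plan.append((f"{bus}/drivers_probe", vf))
       return plan


   def apply_plan(plan):
       """Perform the writes in order (requires root on a real host)."""
       for path, value in plan:
           with open(path, "w") as f:
               f.write(value)

For a device like the one in this specification's sample configuration, ``pf_plan("0000:17:00.0", "pci-pf-stub", 16)`` yields the PF writes. The VF addresses can then be read from the ``virtfn0`` to ``virtfn15`` symlinks under the PF's sysfs directory and passed to ``vf_plan`` with ``"vfio-pci"``. Keeping planning separate from ``apply_plan`` lets the steps be inspected before anything touches the hardware.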

There are two methods of FEC device configuration:

* method-1: the default, existing method
* method-2: using the FEC Operator

Method-1 (the existing method) is the default method applied at node startup.
If a SriovFecClusterConfig is applied, sriov-fec-daemonset on the node
overwrites the existing configuration of that particular device on the node.
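
Whether method-2 takes over for a given device depends on whether an applied SriovFecClusterConfig selects that device. The following minimal Python sketch models that selection. It assumes that every ``nodeSelector`` label must match the node's labels and that ``acceleratorSelector.pciAddress``, when present, must equal the device address, following the fields of the sample configuration in this specification. The function is hypothetical. It ignores ``priority``, so the first matching configuration wins, and it is not the operator's matching code.

.. code-block:: python

   def config_source(node_labels, pci_address, cluster_configs):
       """Return ("fec-operator", cfg) if a config selects this device,
       otherwise ("default", None), meaning method-1 stays in effect."""
       for cfg in cluster_configs:
           spec = cfg.get("spec", {})
           # Every nodeSelector label must be present on the node.
           node_sel = spec.get("nodeSelector", {})
           if any(node_labels.get(k) != v for k, v in node_sel.items()):
               continue
           # An explicit PCI address restricts the config to one device.
           wanted = spec.get("acceleratorSelector", {}).get("pciAddress")
           if wanted is not None and wanted != pci_address:
               continue
           return "fec-operator", cfg
       return "default", None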

If an administrator wants to switch back to the default static method, they
delete the SriovFecClusterConfig resource and reconfigure the device through
method-1.

NOTE:

Reconfiguring a device and/or switching between configuration methods
affects the vRAN application PODs that use the FEC device. The following
steps are recommended when reconfiguring and/or switching configuration
methods:

- Stop the vRAN application PODs from using the FEC devices and terminate
  them.
- Reconfigure the device, or switch the method and reconfigure.
- Redeploy the vRAN application PODs to use the FEC device.

The FEC devices supported through the FEC Operator in STX 7 are ACC100
(Mt. Bryce) and N3000 FPGA.

Alternatives
------------

The current method of configuring FEC devices is the default method and is
enabled by default.

Configuration through the FEC Operator is an optional alternative method.

Data model impact
-----------------

The sriov-fec-operator application introduces the new SriovFecClusterConfig
CRD to the cluster.

Sample Cluster configuration:
-----------------------------

.. code-block:: none

   apiVersion: sriovfec.intel.com/v2
   kind: SriovFecClusterConfig
   metadata:
     name: config
     namespace: sriov-fec-system
   spec:
     priority: 1
     nodeSelector:
       kubernetes.io/hostname: <node-label>
     acceleratorSelector:
       pciAddress: "0000:17:00.0"
     physicalFunction:
       pfDriver: "pci-pf-stub"
       vfDriver: "vfio-pci"
       vfAmount: 16
       bbDevConfig:
         acc100:
           # Programming mode: 0 = VF Programming, 1 = PF Programming
           pfMode: false
           numVfBundles: 16
           maxQueueSize: 1024
           uplink4G:
             numQueueGroups: 0
             numAqsPerGroups: 16
             aqDepthLog2: 4
           downlink4G:
             numQueueGroups: 0
             numAqsPerGroups: 16
             aqDepthLog2: 4
           uplink5G:
             numQueueGroups: 4
             numAqsPerGroups: 16
             aqDepthLog2: 4
           downlink5G:
             numQueueGroups: 4
             numAqsPerGroups: 16
             aqDepthLog2: 4
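
sriov-fec-daemonset ultimately hands the ``bbDevConfig`` values to the pf-bb-config tool, which reads an INI-style configuration file. The following Python sketch shows one possible translation of the ``acc100`` section of the sample configuration into such a file. The section and key names (``MODE``, ``VFBUNDLES``, ``MAXQSIZE``, ``QUL4G`` and so on) are modelled on the ACC100 sample files distributed with pf-bb-config. Treat them as assumptions to check against the tool version in use. The function name is hypothetical, and this is not the operator's code.

.. code-block:: python

   import configparser
   import io

   # Assumed mapping from CR direction keys to pf-bb-config section names.
   DIRECTION_SECTIONS = {
       "uplink4G": "QUL4G",
       "downlink4G": "QDL4G",
       "uplink5G": "QUL5G",
       "downlink5G": "QDL5G",
   }


   def acc100_to_ini(acc100):
       """Render an acc100 bbDevConfig dict as pf-bb-config-style INI text."""
       cfg = configparser.ConfigParser()
       cfg["MODE"] = {"pf_mode_en": "1" if acc100["pfMode"] else "0"}
       cfg["VFBUNDLES"] = {"num_vf_bundles": str(acc100["numVfBundles"])}
       cfg["MAXQSIZE"] = {"max_queue_size": str(acc100["maxQueueSize"])}
       for key, section in DIRECTION_SECTIONS.items():
           d = acc100[key]
           cfg[section] = {
               "num_qgroups": str(d["numQueueGroups"]),
               "num_aqs_per_groups": str(d["numAqsPerGroups"]),
               "aq_depth_log2": str(d["aqDepthLog2"]),
           }
       out = io.StringIO()
       cfg.write(out)
       return out.getvalue()

Given the parsed CR as a dict ``cr``, ``acc100_to_ini(cr["spec"]["physicalFunction"]["bbDevConfig"]["acc100"])`` returns the file contents. The boolean ``pfMode`` becomes ``0`` or ``1``, matching the comment in the sample.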

sriov_fec_cluster_config parameter descriptions:
------------------------------------------------

* ``name``: Name of the specific config.
* ``cluster_config_name``: Name of the cluster config.
* ``priority``: Priority of the deployment (a lower number means a higher
  priority).
* ``drainskip``: Allows skipping the draining of the node after the config is
  applied.
* ``selected_node``: (Optional) Field that can be used to target only a
  specific node.
* ``pf_driver``: The PF driver to be used: igb_uio or pci-pf-stub.
* ``vf_driver``: The VF driver to be used: vfio-pci or igb_uio.
* ``vf_amount``: The number of VFs to be created for the device.
* ``bbdevconfig``:

  * ``pf_mode``: The mode in which the accelerator will be programmed. VFs
    are expected to be used, so this is set to false.
  * ``num_vf_bundles``: Number of VF bundles; this should correspond to the
    vf_amount field.
  * ``max_queue_size``: Maximum queue size; this field is not expected to
    change in most deployments.
  * ``ul4g_num_queue_groups``: Number of 4G Uplink queue groups. A total of 8
    queue groups can be distributed between 4G/5G Uplink/Downlink.
  * ``ul4g_num_aqs_per_groups``: Number of atomic queues (AQs) per group; not
    expected to change in most deployments.
  * ``ul4g_aq_depth_log2``: Base-2 logarithm of the AQ depth (for example, 4
    means a depth of 16).
  * ``dl4g_num_queue_groups``: Number of 4G Downlink queue groups. A total of
    8 queue groups can be distributed between 4G/5G Uplink/Downlink.
  * ``dl4g_num_aqs_per_groups``: Number of AQs per group; not expected to
    change in most deployments.
  * ``dl4g_aq_depth_log2``: Base-2 logarithm of the AQ depth.
  * ``ul5g_num_queue_groups``: Number of 5G Uplink queue groups. A total of 8
    queue groups can be distributed between 4G/5G Uplink/Downlink; here, 4
    queue groups are used for 5G Uplink.
  * ``ul5g_num_aqs_per_groups``: Number of AQs per group; not expected to
    change in most deployments.
  * ``ul5g_aq_depth_log2``: Base-2 logarithm of the AQ depth.
  * ``dl5g_num_queue_groups``: Number of 5G Downlink queue groups. A total of
    8 queue groups can be distributed between 4G/5G Uplink/Downlink; here, 4
    queue groups are used for 5G Downlink.
  * ``dl5g_num_aqs_per_groups``: Number of AQs per group; not expected to
    change in most deployments.
  * ``dl5g_aq_depth_log2``: Base-2 logarithm of the AQ depth.
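
The constraints stated in this parameter list can be checked before a configuration is applied. The following Python sketch is a hypothetical helper, not the operator's own validation. It computes the number of queues and their depth per direction. It also flags a configuration that requests more than the 8 available queue groups, or whose ``numVfBundles`` differs from ``vfAmount``. It assumes that each queue group contributes ``numAqsPerGroups`` queues of depth ``2**aqDepthLog2``.

.. code-block:: python

   TOTAL_QUEUE_GROUPS = 8  # shared by 4G/5G uplink/downlink
   DIRECTIONS = ("uplink4G", "downlink4G", "uplink5G", "downlink5G")


   def queue_summary(acc100):
       """Return {direction: (number_of_queues, queue_depth)}."""
       summary = {}
       for name in DIRECTIONS:
           d = acc100[name]
           queues = d["numQueueGroups"] * d["numAqsPerGroups"]
           depth = 2 ** d["aqDepthLog2"]
           summary[name] = (queues, depth)
       return summary


   def validate(physical_function):
       """Return a list of human-readable problems (empty if none found)."""
       acc100 = physical_function["bbDevConfig"]["acc100"]
       problems = []
       used = sum(acc100[name]["numQueueGroups"] for name in DIRECTIONS)
       if used > TOTAL_QUEUE_GROUPS:
           problems.append(
               f"{used} queue groups requested, only {TOTAL_QUEUE_GROUPS} available")
       if acc100["numVfBundles"] != physical_function["vfAmount"]:
           problems.append("numVfBundles should correspond to vfAmount")
       return problems

Under this assumption, the sample configuration gives 5G uplink and downlink 4 × 16 = 64 queues each, every queue with a depth of 2^4 = 16. All 8 queue groups are in use, so ``validate`` reports no problems.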

REST API impact
---------------

Standard extension of the K8s APIs through the introduction of the
SriovFecClusterConfig CRD.

Security impact
---------------

The existing K8s authentication and authorization apply to the standard
extension of the K8s APIs introduced by the SriovFecClusterConfig CRD.

Other end user impact
---------------------

End users will be able to configure FEC devices in more detail.

Performance Impact
------------------

* In the existing method (method-1), resources (CPU and memory) are consumed
  only during configuration.

* With the FEC Operator method, service PODs run on master and worker nodes
  at all times and consume some CPU and memory from the cluster housekeeping
  resources; we believe this to be negligible.

* Periodic reconciliation between the controller-manager and the FEC daemon
  may also consume network resources; this is also assumed to be negligible.

Other deployer impact
---------------------

None.

Upgrade impact
--------------

None. The sriov-fec-operator application is optional.

Implementation
==============

Assignee(s)
-----------

Primary assignee:

* Balendu Mouli Burla (balendu)

Other contributors:

* Nidhi Shivashankara Belur (nshivash)

Repos Impacted
--------------

A new system-application repo will be created for the definition and build of
the new sriov-fec-operator application.

Work Items
----------

* Create the sriov-fec-operator application package.
* Integrate the sriov-fec-operator application into FlexCD. Add the
  application upload/apply/remove/delete commands.
* Update docs.starlingx.io with how to configure FEC devices using the FEC
  Operator application.

Dependencies
============

None

Testing
=======

* Testing will be performed on both Simplex and Duplex deployment
  configurations.
* The following functional validations will be performed:

  * Check that the FEC Operator is disabled by default when the node starts
    up for the first time.
  * Check the static configuration of FEC devices and make sure the existing
    functionality still works.
  * Check the enable/disable functionality of the FEC Operator in the
    cluster.
  * Configure the FEC device with the FEC Operator, make sure it overrides the
    default configuration, and verify the FEC functionality.
  * Delete the CRD configuration, reconfigure the device through the static
    configuration, and verify the FEC functionality.
  * Configure the device through the FEC Operator, reboot the node, and check
    that the node comes up with the new configuration applied through the FEC
    Operator.

Documentation Impact
====================

docs.starlingx.io will be updated with:

* How to upload and apply the sriov-fec-operator application.
* How to perform enhanced configuration of FEC devices with the
  SriovFecClusterConfig CRD.

References
==========

Intel FEC Operator:
https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md

Acronyms
--------

- FEC : Forward Error Correction
- LTE : Long Term Evolution
- vRAN : Virtual Radio Access Network
- SR-IOV : Single Root I/O Virtualization
- PF : Physical Function
- VF : Virtual Function
- CRD : Custom Resource Definition

History
=======

Initial Version.