OVS-DPDK containerization
==========================================

Storyboard: https://storyboard.openstack.org/#!/story/2005496

As StarlingX moves to containerization, most OpenStack components have been
containerized. That includes OVS, but OVS-DPDK still runs on the host. This
story implements OVS-DPDK containerization.

Problem description
===================

Currently, StarlingX supports OVS and OVS-DPDK. OVS is managed by
openstack-helm and runs in a container, but OVS-DPDK is managed by puppet and
runs directly on the host. Considering the benefits of containerization, we
would like to containerize OVS-DPDK as well. In addition, maintaining two
implementations and keeping them consistent costs more resources than
maintaining a single implementation.

Use Cases
---------

Without OVS-DPDK containerization:

* If we want to change OVS (upgrade the OVS version, enable some features),
  we need to make the change in two places.
* If we want to support another host OS distribution (e.g. Ubuntu), we need
  to build the OVS/DPDK package for Ubuntu, because OVS-DPDK runs on the host.

Proposed change
===============

This story includes changes to StarlingX and to upstream openstack-helm. The
upstream openstack-helm patches are already in review.

'ovs-dpdk' and 'none' are the vswitch types we support for now. 'ovs-dpdk'
means running OVS-DPDK on the host; 'none' means running OVS (without DPDK) in
a container. For containerized OVS-DPDK we do not create a new vswitch type.
Instead, we enhance the 'none' type to support DPDK, so the 'none' type will
cover both OVS and containerized OVS-DPDK. A new kubernetes node label
(openvswitch-dpdk=enabled) will be used to control whether DPDK is enabled.
Once this story is completed, we will no longer maintain the 'ovs-dpdk' type.

Hugepages need to be reserved for DPDK. Currently, the reservation is done by
sysinv/puppet. In this story, the hugepages reservation will still be covered
by sysinv/puppet; openstack-helm just uses the hugepages.

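
OVS-DPDK consumes the reserved hugepages through the 'dpdk-socket-mem'
option, a comma-separated list of MiB to pre-allocate per NUMA node. As a
minimal sketch of how a reservation could be turned into that value (the
helper name ``dpdk_socket_mem`` and its input shape are hypothetical, not
sysinv code):

.. code-block:: python

    def dpdk_socket_mem(pages_per_node, page_size_mib):
        """Return a dpdk-socket-mem string: MiB per NUMA node, comma-separated.

        pages_per_node: hugepage count reserved on each NUMA node, in node order.
        page_size_mib:  size of one hugepage in MiB (a single size for all nodes).
        """
        return ",".join(str(count * page_size_mib) for count in pages_per_node)


    # 512 pages of 2 MiB on a single NUMA node
    print(dpdk_socket_mem([512], 2))       # 1024
    # the same reservation on each of two NUMA nodes
    print(dpdk_socket_mem([512, 512], 2))  # 1024,1024

A single page size is passed for all nodes because only one hugepage size can
be reserved per node in this design.
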
StarlingX reserves hugepages for DPDK and nova-compute. We can run
'system host-memory-show controller-0' to show the hugepages info. StarlingX
has a default policy for hugepages allocation; users can override the default
with 'system host-memory-modify'. As k8s doesn't support multiple hugepage
sizes, we can only reserve hugepages of a single size.

::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-show controller-0 0
  +-------------------------------------+--------------------------------------+
  | Property                            | Value                                |
  +-------------------------------------+--------------------------------------+
  | Memory: Usable Total (MiB)          | 9181                                 |
  |         Platform     (MiB)          | 7600                                 |
  |         Available    (MiB)          | 9181                                 |
  | Huge Pages Configured               | True                                 |
  | vSwitch Huge Pages: Size (MiB)      | 2                                    |
  |                     Total           | 512                                  |
  |                     Available       | 0                                    |
  |                     Required        | None                                 |
  | Application Pages (4K): Total       | 1826048                              |
  | Application Huge Pages (2M): Total  | 1024                                 |
  |                     Available       | 1024                                 |
  | Application Huge Pages (1G): Total  | 0                                    |
  |                     Available       | None                                 |
  | uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
  | ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
  | inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
  | created_at                          | 2019-04-28T06:08:42.884178+00:00     |
  | updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
  +-------------------------------------+--------------------------------------+

From the above output, we can see that 512 hugepages of 2 MiB each are
reserved for OVS-DPDK. In this story, `openvswitch helm plugin`_ will be
updated to generate the memory configuration (dpdk-socket-mem) for the
openvswitch chart according to the reserved hugepages info. If multiple NUMA
nodes exist on the compute node, we should allocate hugepages on every NUMA
node.

To run OVS-DPDK in a container, we need to enable the kubernetes hugepages
feature. Currently kubernetes doesn't support multiple hugepage sizes on a
single node. I have opened `the multiple size issue`_ to track it.

The OVS-DPDK process contains two types of threads: control path threads and
data path threads.
The control path threads run on Platform cores just like all other pods, but
the data path threads, known as PMD threads, need to run on one or more
dedicated cores. StarlingX needs to reserve CPU cores for the OVS-DPDK data
path threads. Currently StarlingX reserves CPU cores for non-containerized
OVS-DPDK through sysinv, which generates the kernel parameter 'isolcpus'. For
containerized OVS-DPDK, CPU cores are going to be reserved in the same way. We
can run 'system host-cpu-list controller-0' to show the CPU info. StarlingX
has a default policy for CPU allocation; users can override the default with
'system host-cpu-modify'.

::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
  |                                      | ore   |           | ore   |        |                                           |                   |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
  | 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+

From the above output, we can see that core 2 is allocated for OVS-DPDK PMD
threads.
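
OVS takes CPU sets such as 'pmd-cpu-mask' as a hexadecimal bitmask in which
bit N selects logical core N. As a minimal sketch of how a list of vSwitch
cores could be turned into such a mask (the helper name ``cores_to_mask`` is
hypothetical and is not part of sysinv or openstack-helm):

.. code-block:: python

    def cores_to_mask(cores):
        """Return a hex bitmask string with bit N set for every core N in cores."""
        mask = 0
        for core in cores:
            mask |= 1 << core
        return hex(mask)


    # core 2 reserved for the vSwitch function
    print(cores_to_mask([2]))     # 0x4
    # cores 2 and 3 reserved
    print(cores_to_mask([2, 3]))  # 0xc

The same conversion applies to any option that takes a core bitmask; which
cores feed which option is a deployment decision and is not fixed by this
sketch.
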
In this story, `openvswitch helm plugin`_ will be updated to generate the CPU
configurations (dpdk-lcore-mask, pmd-cpu-mask). 'pmd-cpu-mask' is the OVS
parameter that specifies which CPU cores the PMD threads run on. The cores
named in 'pmd-cpu-mask' must also be permitted by the cpuset cgroup of the OVS
container. By default, all pods can only see the platform cores, so we need to
change the cgroup of OVS at launch time.

StarlingX also reserves CPU cores for nova-compute (assigned_function
'Applications'); these are finally rendered as 'vcpu_pin_set' in nova.conf.

When a compute node is unlocked, vswitch.pp does some OVS-related work:

1) Bind datanetwork NICs to a Linux kernel module (vfio-pci by default in
   StarlingX).
2) Create OVS bridges.
3) Add the NICs to the bridges.

In this story, the first item can be covered by puppet, by openstack-helm, or
by using NetworkDeviceAttachment, which leverages the existing SRIOV CNI. The
second and third items will be covered by openstack-helm. To create OVS
bridges and add NICs to bridges, openstack-helm needs to know the bridge names
and the NIC pci_id. These parameters will be generated by
`neutron helm plugin`_ according to the info in sysinv.

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Other end user impact
---------------------

As the k8s hugepage feature doesn't support multiple hugepage sizes for now,
we can allocate hugepages of only a single size. That means we can only create
VMs that use a single hugepage size. The limitation is described in the
`hugepage spec commit`_.

Performance Impact
------------------

No impact is expected.

For networking, the OVS-DPDK container uses the host's native network. For
CPU/memory, although container resources are limited, the resources used by
OVS are configured by OVS parameters rather than by container limits.

Other deployer impact
---------------------

The 'openvswitch-dpdk=enabled' label is required on compute nodes to enable
OVS-DPDK.


Developer impact
----------------

Once this feature is implemented, we no longer run OVS-DPDK on the host. The
vswitch.pp file will therefore be removed, and openstack-helm takes over its
job of configuring OVS-DPDK.

Upgrade impact
--------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  chengli3

Other contributors:

Repos Impacted
--------------

starlingx/config, starlingx/integ

Work Items
----------

* Improve the OVS docker image to support DPDK (starlingx/integ).
  To support DPDK, DPDK should be installed in the OVS image and OVS should be
  built and installed with the `dpdk install option`_ (--with-dpdk). The
  community OVS image already supports DPDK through `image patch`_. To build
  our own OVS image, we can author our OVS docker file in the starlingx/integ
  project. The OVS/DPDK version will be the same as on the host. The docker
  image OS may need to be CentOS as well, because the OVS container mounts the
  host's /lib/modules.
* Make the OVS chart support DPDK (openstack-helm-infra).
  To support DPDK, OVS needs to be set up with the `dpdk setup options`_.
  `ovs patch`_ is in review.
* Make the neutron chart support DPDK (openstack-helm).

  * `Extra neutron configurations`_ are needed for DPDK support.
  * In openstack-helm, the neutron chart is responsible for adding NICs to
    the OVS bridge, so the neutron chart handles
    `dpdk interface initialization`_ as well. `neutron patch`_ is already in
    review.

* Reserve huge pages for OVS-DPDK and enable the k8s hugepage feature
  (starlingx/config).
  `huge pages`_ should be reserved for containerized OVS-DPDK, in the same way
  as we reserve huge pages for vswitch_type 'ovs-dpdk'.
* Generate DPDK-related configurations for the OpenStack deployment
  (starlingx/config).
  `openvswitch helm plugin`_ needs to be updated to add DPDK configurations.
  `neutron helm plugin`_ should be updated as well.
* Docs update (starlingx/docs).
  Update the installation guide.

.. _dpdk install option: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#install-ovs
.. _image patch: https://review.opendev.org/#/c/665310/
.. _dpdk setup options: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-ovs
.. _ovs patch: https://review.openstack.org/#/c/626894/
.. _Extra neutron configurations: https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.html
.. _dpdk interface initialization: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-dpdk-devices-using-vfio
.. _neutron patch: https://review.openstack.org/#/c/643284/
.. _huge pages: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-hugepages
.. _openvswitch helm plugin: https://github.com/openstack/stx-config/tree/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/openvswitch.py
.. _neutron helm plugin: https://github.com/openstack/stx-config/blob/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
.. _ovs.py: https://opendev.org/starlingx/config/src/commit/e0d453a98b72606ec9a0b90a3acb5bbda546d2ff/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L318-L365
.. _the multiple size issue: https://github.com/kubernetes/kubernetes/issues/77251
.. _hugepage spec commit: https://github.com/kubernetes/community/pull/837/files#r133337110

Dependencies
============

* Needs OVS version >=2.6 to support vhost-user reconnect.

Testing
=======

The host NICs that are planned for data networks must support DPDK. Multiple
hosts are needed to test connections across hosts.

The following cases are needed:

* Create VMs and test the network connection between VMs as well as the
  external connection.
* Check whether a host reboot causes any issue.

Documentation Impact
====================

The installation guides on the wiki need to be updated. There will be a small
difference for deployers in the vswitch type setting.


References
==========

* http://docs.openvswitch.org/en/latest/intro/install/dpdk/
* https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch
* https://opendev.org/openstack/openstack-helm/src/branch/master/neutron

History
=======

Optional section intended to be used each time the spec is updated to describe
new design, API or any database schema updated. Useful to let reader understand
what's happened along the time.

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced