diff --git a/specs/2019.03/approved/containerization-2005496-OVS-DPDK-containerization.rst b/specs/2019.03/approved/containerization-2005496-OVS-DPDK-containerization.rst
new file mode 100644
index 0000000..f3a394d
--- /dev/null
+++ b/specs/2019.03/approved/containerization-2005496-OVS-DPDK-containerization.rst
@@ -0,0 +1,301 @@
+OVS-DPDK containerization
+==========================================
+
+Storyboard:
+https://storyboard.openstack.org/#!/story/2005496
+
+As StarlingX moves to containerization, most OpenStack components have been
+containerized. That includes OVS, but OVS-DPDK is still
+running on the host. This story implements OVS-DPDK containerization.
+
+Problem description
+===================
+
+Currently, StarlingX supports OVS and OVS-DPDK. OVS is managed by
+openstack-helm and runs in a container, but OVS-DPDK is managed by puppet
+and runs directly on the host. Considering the benefits of containerization,
+we would like to containerize OVS-DPDK. In addition, maintaining two
+implementations and keeping them consistent costs more resources than
+maintaining just one implementation.
+
+Use Cases
+---------
+
+Without OVS-DPDK containerization:
+
+* If we want to make some changes to OVS (upgrade the OVS version, enable
+  some features), we need to make the changes in two places.
+* If we want to support another host OS distribution (e.g. Ubuntu), we need to
+  build the OVS/DPDK package for Ubuntu, as we run OVS-DPDK on the host.
+
+Proposed change
+===============
+
+This story includes changes to StarlingX and to upstream openstack-helm.
+The upstream openstack-helm patches are already in review.
+
+'ovs-dpdk' and 'none' are the vswitch types we support for now.
+'ovs-dpdk' means running OVS-DPDK on the host; 'none' means running
+OVS (without DPDK) in a container. For containerized OVS-DPDK we don't create
+a new vswitch type; we enhance the 'none' type to support DPDK. This means the
+'none' type will support both OVS and containerized OVS-DPDK.
+A new Kubernetes node label (openvswitch-dpdk=enabled) will be used to control
+whether DPDK is enabled.
+Once this story is completed, we will no longer maintain the 'ovs-dpdk' type.
+
+Hugepages need to be reserved for DPDK. Currently, the reservation is done by
+sysinv/puppet. In this story, the hugepage reservation will still be handled
+by sysinv/puppet; openstack-helm just uses the hugepages. StarlingX reserves
+hugepages for DPDK and nova-compute; we can run 'system host-memory-show
+controller-0' to show the hugepage info. StarlingX has a default policy for
+hugepage allocation, which users can override with
+'system host-memory-modify'. As k8s doesn't support multiple hugepage sizes,
+we can only reserve hugepages of a single size.
+
+::
+
+    [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-show controller-0 0
+    +-------------------------------------+--------------------------------------+
+    | Property                            | Value                                |
+    +-------------------------------------+--------------------------------------+
+    | Memory: Usable Total (MiB)          | 9181                                 |
+    | Platform (MiB)                      | 7600                                 |
+    | Available (MiB)                     | 9181                                 |
+    | Huge Pages Configured               | True                                 |
+    | vSwitch Huge Pages: Size (MiB)      | 2                                    |
+    | Total                               | 512                                  |
+    | Available                           | 0                                    |
+    | Required                            | None                                 |
+    | Application Pages (4K): Total       | 1826048                              |
+    | Application Huge Pages (2M): Total  | 1024                                 |
+    | Available                           | 1024                                 |
+    | Application Huge Pages (1G): Total  | 0                                    |
+    | Available                           | None                                 |
+    | uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
+    | ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
+    | inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
+    | created_at                          | 2019-04-28T06:08:42.884178+00:00     |
+    | updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
+    +-------------------------------------+--------------------------------------+
+
+From the output above, 512 hugepages of 2M are reserved for OVS-DPDK.
+In this story, `openvswitch helm plugin`_ will be updated to generate memory
+configuration (dpdk-socket-mem) for the openvswitch chart according to the
+reserved hugepage info. If multiple NUMA nodes exist on the compute node, we
+should allocate hugepages on every NUMA node.
+
+To run OVS-DPDK in a container, we need to enable the Kubernetes hugepages
+feature. Currently Kubernetes doesn't support multiple hugepage sizes on a
+single node. I have opened `the multiple size issue`_ to track it.
+
+The OVS-DPDK process contains 2 types of threads: the control path threads and
+data path threads. The control path threads run on Platform cores just like all
+other pods. But the data path threads, known as pmd threads, need to run on one
+or more dedicated cores.
+StarlingX needs to reserve CPU cores for OVS-DPDK data path threads. Currently
+StarlingX reserves CPU cores for non-containerized OVS-DPDK through sysinv,
+which generates the kernel parameter
+'isolcpus'. For containerized OVS-DPDK, CPU cores are going to be reserved in
+the same way. We can run 'system host-cpu-list controller-0' to
+show the CPU info. StarlingX has a default policy for CPU allocation, which
+users can override with 'system host-cpu-modify'.
+
+::
+
+    [wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
+    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
+    | uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
+    |                                      | ore   |           | ore   |        |                                           |                   |
+    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
+    | a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
+    | c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
+    | 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
+    | 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
+    | a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
+    | a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
+    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
+
+From the output above, we can see core 2 is allocated to OVS-DPDK pmd threads.
+In this story, `openvswitch helm plugin`_ will be updated to generate CPU
+configurations (dpdk-lcore-mask, pmd-cpu-mask). 'pmd-cpu-mask' is the OVS
+parameter that specifies which CPU cores the PMD threads run on.
+The mechanism behind 'pmd-cpu-mask' is the cpuset cgroup. By default, all pods
+can only see the platform cores, so we need to change the cgroup of OVS at
+launch time.
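
As an illustration of the mask generation described above, here is a minimal
Python sketch. It assumes the plugin is handed the logical core ids whose
assigned_function is vSwitch; the helper name ``cores_to_cpu_mask`` is
hypothetical and is not taken from sysinv.

```python
def cores_to_cpu_mask(cores):
    """Return a hex CPU mask string with one bit set per logical core id.

    Hypothetical helper for illustration only; not part of sysinv.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core id must be non-negative: %d" % core)
        # Bit N of the mask corresponds to logical core N.
        mask |= 1 << core
    return hex(mask)


# Logical core 2 is the vSwitch core in the host-cpu-list output above.
pmd_cpu_mask = cores_to_cpu_mask([2])
print(pmd_cpu_mask)
```

For the example host, where only logical core 2 is assigned to vSwitch, the
resulting mask is 0x4.
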
+Actually, StarlingX also
+reserves CPU cores for nova-compute (assigned_function of Applications),
+which are finally rendered as 'vcpu_pin_set' in nova.conf.
+
+When a compute node is unlocked, vswitch.pp does some OVS-related work:
+1) Bind datanetwork NICs to a Linux module (vfio-pci by default in StarlingX).
+2) Create OVS bridges. 3) Add the NICs to the bridges. In this story, the first
+item can be covered by puppet or openstack-helm or by using
+NetworkDeviceAttachment which leverages existing SRIOV CNI. The second and
+the third items will be covered by openstack-helm. To create OVS bridges and
+add NICs to bridges, openstack-helm needs to know the bridge names and the
+NIC pci_id. These parameters will be generated by `neutron helm plugin`_
+according to the info in sysinv.
+
+Alternatives
+------------
+
+None
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Other end user impact
+---------------------
+
+As the k8s hugepage feature doesn't support multiple hugepage sizes for now,
+we can allocate hugepages of only a single size. That means we can only create
+VMs with a single hugepage size. The limitation is described in the
+`hugepage spec commit`_.
+
+Performance Impact
+------------------
+
+No impact is expected.
+
+For networking, the OVS-DPDK container uses the host's native network.
+
+For CPU/memory, although container resources are limited, the resources used
+by OVS are configured by OVS parameters rather than by container limits.
+
+Other deployer impact
+---------------------
+
+The 'openvswitch-dpdk=enabled' label is required on compute nodes to enable
+OVS-DPDK.
+
+Developer impact
+----------------
+
+Once this feature is implemented, we won't run OVS-DPDK on the host. So the
+vswitch.pp file will be removed, and openstack-helm takes over its job of
+OVS-DPDK configuration.
+
+Upgrade impact
+--------------
+
+None
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  chengli3
+
+Other contributors:
+
+
+Repos Impacted
+--------------
+
+starlingx/config, starlingx/integ
+
+Work Items
+----------
+
+* Improve the OVS docker image to support DPDK (starlingx/integ).
+  To support DPDK, DPDK should be installed in the OVS image and OVS should be
+  built/installed with the `dpdk install option`_ (--with-dpdk). The community
+  OVS image already supports DPDK via `image patch`_. To build our own OVS
+  image, we can author our OVS Dockerfile in the starlingx/integ project. The
+  OVS/DPDK version will be the same as on the host. The docker image
+  OS may need to be CentOS as well, as the OVS container mounts the host
+  /lib/modules.
+* Make the OVS chart support DPDK (openstack-helm-infra).
+  To support DPDK, OVS needs to be set up with the `dpdk setup options`_.
+  `ovs patch`_ is in review.
+* Make the neutron chart support DPDK (openstack-helm).
+
+  * `Extra neutron configurations`_ are needed for DPDK support.
+  * In openstack-helm, the neutron chart is responsible for adding NICs to the
+    OVS bridge, so the neutron chart handles
+    `dpdk interface initialization`_ as well. `neutron patch`_ is already in
+    review.
+
+* Reserve huge pages for OVS-DPDK and enable the k8s hugepage feature
+  (starlingx/config).
+  `huge pages`_ should be reserved for containerized OVS-DPDK, in the same way
+  as we reserve huge pages for vswitch_type 'ovs-dpdk'.
+* Generate DPDK-related configurations for the openstack deployment
+  (starlingx/config).
+  `openvswitch helm plugin`_ needs to be updated to add DPDK configurations.
+  `neutron helm plugin`_ should be updated as well.
+* Docs update (starlingx/docs).
+  Update the installation guide.
+
+.. _dpdk install option: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#install-ovs
+.. _image patch: https://review.opendev.org/#/c/665310/
+.. _dpdk setup options: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-ovs
+.. _ovs patch: https://review.openstack.org/#/c/626894/
+.. _Extra neutron configurations: https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.html
+.. _dpdk interface initialization: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-dpdk-devices-using-vfio
+.. _neutron patch: https://review.openstack.org/#/c/643284/
+.. _huge pages: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-hugepages
+.. _openvswitch helm plugin: https://github.com/openstack/stx-config/tree/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/openvswitch.py
+.. _neutron helm plugin: https://github.com/openstack/stx-config/blob/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
+.. _ovs.py: https://opendev.org/starlingx/config/src/commit/e0d453a98b72606ec9a0b90a3acb5bbda546d2ff/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L318-L365
+.. _the multiple size issue: https://github.com/kubernetes/kubernetes/issues/77251
+.. _hugepage spec commit: https://github.com/kubernetes/community/pull/837/files#r133337110
+
+Dependencies
+============
+
+* Needs OVS version >=2.6 to support vhost-user reconnect.
+
+
+Testing
+=======
+
+The host NICs that are planned for data networks must support DPDK.
+Multiple hosts are needed to test connections across hosts.
+
+The following cases are needed:
+
+* Create VMs and test the network connection between VMs and the external
+  connection.
+* Check for any issues with host reboot.
+
+Documentation Impact
+====================
+
+The installation guides on the wiki need to be updated. There will be a small
+difference for deployers in the vswitch type setting.
+
+References
+==========
+
+* http://docs.openvswitch.org/en/latest/intro/install/dpdk/
+
+* https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch
+
+* https://opendev.org/openstack/openstack-helm/src/branch/master/neutron
+
+History
+=======
+
+Optional section intended to be used each time the spec is updated to describe
+new design, API or any database schema updated. Useful to let reader understand
+what's happened along the time.
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Stein
+     - Introduced