From d9005f6ff5729cac6111519ee7a29ce31cd709a8 Mon Sep 17 00:00:00 2001
From: Mingyuan Qi
Date: Mon, 14 Sep 2020 11:02:26 +0800
Subject: [PATCH] EdgeWorker management phase one

Introduce edgeworker personality

Story: 2008129
Change-Id: If74fb3d3863b05df9875a13e414f02bbfae4842e
Signed-off-by: Mingyuan Qi
---
 ...008129-edgeworker-management-phase-one.rst | 393 ++++++++++++++++++
 1 file changed, 393 insertions(+)
 create mode 100644 doc/source/specs/stx-5.0/approved/Management-2008129-edgeworker-management-phase-one.rst

diff --git a/doc/source/specs/stx-5.0/approved/Management-2008129-edgeworker-management-phase-one.rst b/doc/source/specs/stx-5.0/approved/Management-2008129-edgeworker-management-phase-one.rst
new file mode 100644
index 0000000..b95de10
--- /dev/null
+++ b/doc/source/specs/stx-5.0/approved/Management-2008129-edgeworker-management-phase-one.rst
@@ -0,0 +1,393 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+===============================
+EdgeWorker Management Phase One
+===============================
+
+Storyboard:
+https://storyboard.openstack.org/#!/story/2008129
+
+This story introduces a new node personality, 'edgeworker', to StarlingX.
+
+The biggest difference between an 'edgeworker' node and a 'worker' node is
+that the OS of an 'edgeworker' node is not installed or configured by the
+StarlingX controller and may vary from case to case, for example Ubuntu,
+Debian or Fedora. The basic idea is to deploy the containerd and kubelet
+services to the 'edgeworker' nodes, so that the StarlingX Kubernetes platform
+is extended to 'edgeworker' nodes.
+
+The second difference is that 'edgeworker' nodes are usually deployed close
+to edge devices, while 'worker' nodes are usually servers in a server room.
+The 'edgeworker' personality is suitable for nodes on which users want to
+install their own customized OS and which may need to be deployed physically
+close to the data producer or consumer devices.
+
+The way to bring StarlingX functionality to edgeworker nodes is to
+containerize most flock agents and enable them on those nodes. This also
+aligns with the long-term strategy of flock service containerization.
+
+The whole topic is broken down into roughly four phases:
+
+* Phase One
+
+  * Add edgeworker personality
+  * Add ansible-playbook to join edgeworker node to STX K8S cluster
+  * Support Ubuntu and CentOS as target OS
+
+* Phase Two
+
+  * Containerize a set of flock agents to get edgeworker node inventoried
+  * Enhance multiple Ceph cluster operation
+
+* Phase Three
+
+  * Support OpenStack running on edgeworker nodes
+  * Support L3/Tunnel mgmt. network
+  * Containerize the rest of the flock agents
+
+* Phase Four
+
+  * Enable software management on edgeworker nodes
+  * Enable optional authentication for new nodes
+  * Extend target OS support
+
+This spec focuses on *Phase One*.
+
+Problem description
+===================
+
+In a typical IoT or industrial use case, StarlingX is used to facilitate the
+setup and management of the whole edge cluster. However, such a cluster
+contains types of nodes that are outside the current StarlingX management
+scope. Various reasons, from software to hardware, prevent the administrator
+from deploying these nodes as 'worker' nodes.
+In particular, the common setbacks are:
+
+* The OS of the nodes cannot, or should not, be installed by StarlingX.
+* The nodes are running a Type I hypervisor.
+* The hardware resources do not meet the minimum requirements of a StarlingX
+  worker node.
+* The nodes are connected to StarlingX controllers over an L3 network.
+
+In this story, these nodes are categorized into a new personality to
+distinguish them from 'worker' nodes.
+The new personality is called 'edgeworker'
+since these nodes are usually deployed close to the edge device side. An edge
+device could be an I/O device, a camera, a servo motor or a sensor.
+
+The first three setbacks will be addressed in phase one, while the network
+requirement and manageability enhancements will be addressed in the next few
+phases. Separate specs for the later phases will be submitted in subsequent
+releases.
+
+Use Cases
+---------
+
+* Administrator wants to have all the 'edgeworker' nodes managed by StarlingX
+
+  * List 'edgeworker' nodes in the host list (Phase one)
+  * Check/Lock/Unlock 'edgeworker' node state (Phase two)
+  * Query 'edgeworker' hardware resources info (Phase two)
+  * Configure 'edgeworker' resources for specific usage (Phase two and later)
+  * Manage alarms generated by 'edgeworker' (Phase three)
+  * Update 'edgeworker' packages (Phase four)
+
+* Administrator does not want StarlingX to install the OS on the 'edgeworker'
+  nodes
+* User wants to orchestrate container workloads to 'edgeworker' nodes
+* User wants to orchestrate VM workloads to 'edgeworker' nodes as an option
+
+
+Proposed change
+===============
+
+**Edgeworker personality**
+
+Adding a new personality will require changes in sysinv db, sysinv api and
+sysinv conductor, as well as cgts-client.
+
+#. *sysinv db*
+
+   To get 'edgeworker' nodes into sysinv, the 'edgeworker' value will be added
+   to the enum type invPersonalityEnum in the sysinv db. Accordingly,
+   'edgeworker' must be added to the db models as well. After this change, a
+   host can be assigned the edgeworker personality in the sysinv db.
+
+#. *sysinv api*
+
+   The changes mainly concern the host api: checks are added during host add
+   for 'edgeworker' hosts.
+   Possible checks:
+
+   * mgmt ip, if the mgmt network is not dynamic
+   * host name validation
+   * personality check
+
+#. *sysinv conductor*
+
+   The sysinv conductor is responsible for mgmt ip allocation when the mgmt
+   network is dynamic.
+
+#. 
+   *cgts client*
+
+   Add an 'edgeworker' choice to the 'personality' argument of the host-add
+   and host-update commands.
+
+After the underlying changes are applied, the administrator can use
+
+  ::
+
+    # system host-add -n -p edgeworker
+    or
+    # system host-update hostname= personality=edgeworker
+
+to add an edgeworker node to the inventory.
+
+When an edgeworker node is added to the inventory, sysinv can provide the
+following services:
+
+* DHCP service (Phase one)
+* Host lock/unlock (Phase two)
+* Host interface modification and assignment (Phase two)
+* Host hardware resource query (Phase two)
+* Label assignment (Phase two)
+
+The following functions will not be supported on edgeworker nodes:
+
+* host-upgrade
+* bmc integration
+
+  An edgeworker node is not a server but an ordinary PC, such as an
+  industrial PC, NUC or workstation. BMC is not a required feature for those
+  nodes. Node life cycle management is done in-band or manually by the
+  maintainer. Use cases that involve edgeworker nodes do not expect
+  out-of-band management for these nodes.
+
+Additional semantic checks will be added for these functions.
+
+Other functions will be described in detail in each phase's spec.
+
+**ansible playbook for provisioning edgeworker nodes**
+
+The main steps for provisioning an edgeworker node are installing the kubelet,
+kubeadm and containerd packages on the node, in the way its Linux
+distribution requires, and joining the node to the StarlingX Kubernetes
+platform. Besides these steps, system configuration such as ntp setup,
+interface configuration and dns setup is needed as well.
+
+The first two Linux distributions we propose to support for edgeworker are
+*Ubuntu* and *CentOS*.
+
+The versions of all Kubernetes packages on edgeworker nodes must exactly
+match those on the controllers. If they do not, the playbook will reinstall
+the packages at the proper version.
+
+The playbook sequence to provision an edgeworker node:
+
+#. 
+   Preparations on controller
+
+   * Send containerd config and cert to edgeworker
+   * Generate K8S bootstrap token and calculate certificate hash
+
+#. Preparations on edgeworker
+
+   * Configure network (interface and dns)
+   * Set up proxy if needed
+   * Install essential packages
+   * Set up ntp
+
+#. Add edgeworker node to STX Kubernetes
+
+   * Install containerd, kubelet, kubeadm packages (based on OS)
+   * Configure sysctl and swap
+   * Join k8s cluster
+   * Download images
+
+There will be one playbook with different roles included.
+
+Alternatives
+------------
+
+There are several open source projects that can provision a Kubernetes node.
+
+* Kubespray
+
+  Kubespray [1]_ is a composition of Ansible playbooks, inventory, provisioning
+  tools, and domain knowledge for generic OS/Kubernetes cluster configuration
+  management tasks. Kubespray performs generic OS configuration as well as
+  Kubernetes cluster bootstrapping.
+
+  Kubespray provides the whole functionality of provisioning a Kubernetes
+  node, just like the edgeworker provisioning playbook does. However, it also
+  supports multiple container runtimes, multiple CNI plugins and control plane
+  bootstrap, which is more than provisioning an edgeworker requires.
+
+  What an edgeworker needs is a playbook that targets one specific container
+  runtime and specific CNI plugins, and that provisions a Kubernetes node only.
+
+* KubeEdge
+
+  KubeEdge [2]_ is an open source system for extending native containerized
+  application orchestration capabilities to hosts at the edge. KubeEdge can
+  run on top of an existing Kubernetes cluster and deploys a customized kubelet
+  service called 'edged' to the edge node. Between the apiserver and edged,
+  the EdgeController is the bridge that manages edge node and pod metadata
+  so that the data can be targeted to a specific edge node.
+
+  KubeEdge is able to provision edge nodes from the cloud.
+  However, its kubelet service
+  is customized to fulfill the specific requirement that the administrator can
+  manage the pods running on edge nodes from a public cloud platform.
+  The customized kubelet (edged) brings compatibility issues when Kubernetes
+  is upgraded to a newer release. Since edgeworker provisioning is a key step
+  in enabling these nodes, this means extra effort to test and upgrade
+  KubeEdge during each Kubernetes upgrade.
+
+  Besides, KubeEdge includes complete edge device management logic that is
+  outside the current StarlingX platform scope.
+
+Data model impact
+-----------------
+
+The only data model change is to add 'edgeworker' to 'invPersonalityEnum'
+in the sysinv db model.
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+The potential security threats and their mitigations are:
+
+* Malicious node
+
+  The administrator must guarantee that no unauthorized node can physically
+  connect to the management network.
+  Authentication of edgeworker nodes during onboarding will be introduced in
+  later phases.
+
+* Malicious packages in edgeworker node
+
+  The administrator must guarantee that the packages running on edgeworker
+  nodes are secure, since the OS is managed by the administrator.
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+The deployer is required to run the edgeworker provisioning playbook after
+adding a node with, or updating a node to, the edgeworker personality.
+
+Developer impact
+----------------
+
+None
+
+Upgrade impact
+--------------
+
+The kubelet needs to be upgraded during the Kubernetes upgrade process. The
+upgrade process will trigger an additional script/playbook to check the
+versions of the packages on edgeworker nodes and upgrade them by the means of
+each node's distribution.
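
As a minimal sketch of the version check mentioned above, the comparison
between the kubelet version on an edgeworker node and the apiserver version
could look like the following. It assumes version strings such as ``v1.18.1``
or ``1.18.1-00`` and a limit of two minor versions, the figure this spec takes
from the Kubernetes version skew policy [3]_. The function names and the
``max_skew`` parameter are hypothetical and are not part of sysinv or of the
playbook.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.18.1' or '1.18.1-00'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def kubelet_version_acceptable(kubelet: str, apiserver: str,
                               max_skew: int = 2) -> bool:
    """Return True if kubelet is no newer than the apiserver and at most
    max_skew minor versions older (hypothetical helper, not a sysinv API)."""
    k_major, k_minor = parse_major_minor(kubelet)
    a_major, a_minor = parse_major_minor(apiserver)
    if k_major != a_major:
        return False
    skew = a_minor - k_minor
    # A negative skew means the kubelet is newer than the apiserver.
    return 0 <= skew <= max_skew
```

A node whose kubelet fails this check would be a candidate for a package
upgrade through its own distribution's package manager.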
+
+The distribution's repo may not provide the newest version of the
+corresponding packages. This is tolerable: under the Kubernetes version skew
+support policy [3]_, kubelet and kube-proxy versions up to two minor versions
+older than the apiserver are acceptable.
+
+SW patching/updating will be addressed in phase four. It could be either a
+3rd party solution or plugins for the current SW management, because the
+current SW management cannot patch/update packages other than RPMs, while the
+OS of edgeworker nodes may use other package types.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Mingyuan Qi
+
+
+Repos Impacted
+--------------
+
+* config
+* ansible-playbook
+
+Work Items
+----------
+
+The work items are already introduced in section `Proposed change`_ above.
+
+
+Dependencies
+============
+
+None
+
+
+Testing
+=======
+
+* Sysinv unit test
+
+* Sysinv host operation test
+
+* Test of adding edgeworker nodes in different deployment modes
+
+  * Simplex
+  * Duplex
+  * Standard
+
+* Ansible-playbook test for each target OS
+
+  * Host configuration
+  * Package installation
+  * Edgeworker node joining the Kubernetes cluster
+
+
+Documentation Impact
+====================
+
+* Add a new page to describe the requirements, limitations and use cases of edgeworker nodes.
+* Add a new page to describe the following deployments:
+
+  * Duplex + edgeworker
+  * Standard + edgeworker
+
+* Modify all deployment docs to add an option to deploy edgeworker nodes and link it
+  to the corresponding deployment with edgeworker nodes.
+
+
+References
+==========
+
+.. [1] Kubespray https://github.com/kubernetes-sigs/kubespray
+.. [2] KubeEdge https://kubeedge.io
+.. [3] Kubernetes version skew policy https://kubernetes.io/docs/setup/release/version-skew-policy/
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - stx.5.0
+     - Edgeworker management phase one introduced