EdgeWorker management phase one
Introduce edgeworker personality Story: 2008129 Change-Id: If74fb3d3863b05df9875a13e414f02bbfae4842e Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

===============================
EdgeWorker Management Phase One
===============================

Storyboard:
https://storyboard.openstack.org/#!/story/2008129

This story introduces a new node personality, 'edgeworker', to StarlingX.

The biggest difference between an 'edgeworker' node and a 'worker' node is
that the OS of an 'edgeworker' node is not installed or configured by the
StarlingX controller and may vary from case to case, for example Ubuntu,
Debian or Fedora. The basic idea is to deploy the containerd and kubelet
services to the 'edgeworker' nodes, so that the StarlingX Kubernetes platform
is extended to them.

The second difference is that 'edgeworker' nodes are usually deployed close to
edge devices, while 'worker' nodes are usually servers deployed in a server
room. The 'edgeworker' personality is suitable for nodes on which users want
to install their own customized OS and which may need to be deployed
physically close to the data producer or consumer devices.

The way to leverage StarlingX functionality on these nodes is to get most
flock agents containerized and enabled on edgeworker nodes. This is also
aligned with the long-term strategy of flock service containerization.

The whole topic is broken down into roughly four phases:

* Phase One

  * Add edgeworker personality
  * Add ansible-playbook to join edgeworker node to STX K8S cluster
  * Support Ubuntu and CentOS as target OS

* Phase Two

  * Containerize a set of flock agents to get edgeworker node inventoried
  * Enhance multiple Ceph cluster operation

* Phase Three

  * Support OpenStack running on edgeworker nodes
  * Support L3/tunnel management network
  * Containerize the rest of the flock agents

* Phase Four

  * Enable software management on edgeworker nodes
  * Enable optional authentication for new nodes
  * Extend target OS support

This spec focuses on *Phase One*.

Problem description
===================

In a typical IoT or industrial use case, StarlingX is usually used to
facilitate the whole edge cluster setup and management. However, such a
cluster often contains other types of nodes that are outside the current
StarlingX management scope. Various reasons, from software to hardware,
prevent administrators from deploying these nodes as 'worker' nodes.
In particular, the common setbacks are:

* The OS of the nodes cannot be installed by StarlingX, or the administrator
  does not want it to be.
* The nodes are running a Type I hypervisor.
* The hardware resources do not meet the minimum requirements of a StarlingX
  worker node.
* The nodes are connected to StarlingX controllers over an L3 network.

In this story, these nodes are categorized into a new personality to
distinguish them from 'worker' nodes. The new personality is called
'edgeworker' since these nodes are usually deployed close to the edge devices.
An edge device could be, for example, an I/O device, a camera, a servo motor
or a sensor.

The first three setbacks will be addressed in phase one, while the network
requirement and manageability enhancements will be addressed in later phases.
Separate specs for the later phases will be submitted in subsequent releases.

Use Cases
---------

* Administrator wants to have all the 'edgeworker' nodes managed by StarlingX

  * Show 'edgeworker' nodes in the host list (Phase one)
  * Check/Lock/Unlock 'edgeworker' node state (Phase two)
  * Query 'edgeworker' hardware resources info (Phase two)
  * Configure 'edgeworker' resources for specific usage (Phase two and later)
  * Manage alarms generated by 'edgeworker' (Phase three)
  * Update 'edgeworker' packages (Phase four)

* Administrator does not want StarlingX to install the OS on the 'edgeworker'
  nodes
* User wants to orchestrate container workloads to 'edgeworker' nodes
* User wants to orchestrate VM workloads to 'edgeworker' nodes as an option


Proposed change
===============

**Edgeworker personality**

Adding a new personality requires changes in the sysinv db, sysinv api and
sysinv conductor, as well as cgts-client.

#. *sysinv db*

   To get 'edgeworker' nodes into sysinv, the 'edgeworker' value will be
   added to the enum type invPersonalityEnum in the sysinv db. Accordingly,
   'edgeworker' must be added to the db models as well. After this change, a
   host can be assigned the edgeworker personality from the sysinv db
   perspective.

#. *sysinv api*

   The changes mainly focus on the host api, adding checks during host add
   for 'edgeworker' hosts.
   Possible checks:

   * mgmt ip if mgmt network is not dynamic
   * host name validation
   * personality check

#. *sysinv conductor*

   The sysinv conductor is responsible for mgmt ip allocation when the mgmt
   network is of the dynamic type.

#. *cgts client*

   Add an 'edgeworker' choice for the 'personality' argument of the
   host-add/host-update commands.

After the underlying changes are applied, the administrator is able to use

::

  # system host-add -n <hostname> -p edgeworker
  or
  # system host-update <id> hostname=<hostname> personality=edgeworker

to add an edgeworker node to the inventory.

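As an illustration only, the possible host-add checks listed for the sysinv
api could be sketched as below. This is not sysinv code: the function name,
the list of personalities other than 'edgeworker', the hostname pattern and
the error strings are all assumptions made for this example.

.. code-block:: python

  import ipaddress
  import re

  # Assumed personality list; only 'edgeworker' is introduced by this spec.
  PERSONALITIES = ("controller", "worker", "storage", "edgeworker")

  # Assumed hostname rule: lowercase letters, digits and hyphens, 1-63
  # characters, not starting or ending with a hyphen.
  HOSTNAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


  def check_edgeworker_host_add(hostname, personality, mgmt_ip=None,
                                mgmt_network_dynamic=True):
      """Return a list of error strings; an empty list means no check failed."""
      errors = []
      # Personality check.
      if personality not in PERSONALITIES:
          errors.append("unknown personality: %s" % personality)
      # Host name validation.
      if not HOSTNAME_RE.fullmatch(hostname or ""):
          errors.append("invalid hostname: %r" % hostname)
      # A mgmt ip is only needed from the user when it is not allocated
      # dynamically.
      if not mgmt_network_dynamic:
          if mgmt_ip is None:
              errors.append("mgmt ip is required for a static mgmt network")
          else:
              try:
                  ipaddress.ip_address(mgmt_ip)
              except ValueError:
                  errors.append("invalid mgmt ip: %s" % mgmt_ip)
      return errors

With a dynamic mgmt network, ``check_edgeworker_host_add("edge-0",
"edgeworker")`` returns an empty list. With a static mgmt network and no
address given, it returns a single error asking for the mgmt ip.
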
When an edgeworker node is added to the inventory, sysinv can provide the
following services:

* DHCP service (Phase one)
* Host lock/unlock (Phase two)
* Host interface modification and assignment (Phase two)
* Host hardware resource query (Phase two)
* Label assignment (Phase two)

The functions that will not be supported on edgeworker nodes:

* host-upgrade
* bmc integration

An edgeworker node is typically not a server but an ordinary PC, such as an
industrial PC, NUC or workstation. BMC is not a required feature for those
nodes. The node life cycle management is done in-band or manually by the
maintainer. The use cases for edgeworker nodes do not expect out-of-band
management of these nodes.

Additional semantic checks will be added for these functions.

Other functions will be described in detail in each phase's spec.

**Ansible playbook for provisioning edgeworker nodes**

The main steps for provisioning an edgeworker node are installing the kubelet,
kubeadm and containerd packages on the node, in a way that depends on its
Linux distribution, and joining the node to the StarlingX Kubernetes platform.
Besides these steps, system configuration such as NTP setup, interface
configuration and DNS setup is needed as well.

The first two Linux distributions we propose to support for edgeworker are
*Ubuntu* and *CentOS*.

The version of all the Kubernetes packages on edgeworker nodes must be exactly
the same as the packages on controllers. If they are not, the playbook will
reinstall the packages at the proper version.

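The exact-match rule above can be expressed as a small comparison. The sketch
below is not part of the playbook; the function name is hypothetical and the
version strings are illustrative values, not versions taken from this spec.

.. code-block:: python

  def packages_to_reinstall(controller_versions, node_versions):
      """Return {package: target_version} for every package that is missing
      on the node or whose version differs from the controller's version."""
      return {
          name: wanted
          for name, wanted in controller_versions.items()
          if node_versions.get(name) != wanted
      }


  # Illustrative versions only.
  controller = {"kubelet": "1.18.1", "kubeadm": "1.18.1"}
  node = {"kubelet": "1.18.1", "kubeadm": "1.17.4"}
  mismatched = packages_to_reinstall(controller, node)

Here ``mismatched`` contains only ``kubeadm``, mapped to the controller's
version, which is the package the playbook would reinstall.
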
The playbook sequence to provision an edgeworker node:

#. Preparations on controller

   * Send containerd config and cert to edgeworker
   * Generate K8S bootstrap token and calculate certificate hash

#. Preparations on edgeworker

   * Configure network (interface and DNS)
   * Set up proxy if needed
   * Install essential packages
   * Set up NTP

#. Add edgeworker node to STX Kubernetes

   * Install containerd, kubelet, kubeadm packages (based on OS)
   * Configure sysctl and swap
   * Join k8s cluster
   * Download images

There will be one playbook with different roles included.

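The bootstrap token and certificate hash generated on the controller are the
inputs to the "Join k8s cluster" step. A minimal sketch of how they could be
assembled into a ``kubeadm join`` invocation is shown below. The helper name
and the endpoint address are assumptions for this example; the spec does not
state how the playbook builds the command.

.. code-block:: python

  import re

  # kubeadm bootstrap tokens have the form "<6 chars>.<16 chars>" over [a-z0-9].
  TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")
  # The CA certificate hash is passed as "sha256:<64 hex digits>".
  CA_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}")


  def build_join_command(api_endpoint, token, ca_cert_hash):
      """Validate the inputs and return the kubeadm join argument list."""
      if not TOKEN_RE.fullmatch(token):
          raise ValueError("malformed bootstrap token")
      if not CA_HASH_RE.fullmatch(ca_cert_hash):
          raise ValueError("malformed CA certificate hash")
      return [
          "kubeadm", "join", api_endpoint,
          "--token", token,
          "--discovery-token-ca-cert-hash", ca_cert_hash,
      ]


  # Hypothetical endpoint and placeholder credentials, for illustration only.
  join_cmd = build_join_command(
      "192.168.206.1:6443",
      "abcdef.0123456789abcdef",
      "sha256:" + "0" * 64,
  )

Validating the token and hash format before running the command lets a role
fail early with a clear message instead of failing inside ``kubeadm``.
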
Alternatives
------------

There are several open source projects that can provision a Kubernetes node.

* Kubespray

  Kubespray [1]_ is a composition of Ansible playbooks, inventory, provisioning
  tools, and domain knowledge for generic OS/Kubernetes clusters configuration
  management tasks. Kubespray performs generic OS configuration as well as
  Kubernetes cluster bootstrapping.

  Kubespray provides the full functionality needed to provision a Kubernetes
  node, just like the edgeworker provisioning playbook does. However,
  Kubespray also supports multiple container runtimes, multiple CNI plugins
  and control plane bootstrap, which is more functionality than is needed to
  provision an edgeworker.

  What edgeworker needs is a playbook that targets a specific container
  runtime and specific CNI plugins, and provisions a Kubernetes node only.

* KubeEdge

  KubeEdge [2]_ is an open source system for extending native containerized
  application orchestration capabilities to hosts at the edge. KubeEdge can
  run on top of an existing Kubernetes cluster and deploys a customized
  kubelet service called 'edged' to the edge node. Between the apiserver and
  edged, the EdgeController is the bridge that manages edge node and pod
  metadata so that the data can be targeted to a specific edge node.

  KubeEdge is able to provision edge nodes from the cloud. However, its
  kubelet service is customized to fulfill the specific requirement that the
  administrator can manage pods running on edge nodes from a public cloud
  platform. The customized kubelet (edged) brings compatibility issues when
  Kubernetes is upgraded to a newer release. Since edgeworker provisioning is
  a key step in enabling these nodes, this would require extra effort to test
  and upgrade KubeEdge during each Kubernetes upgrade.

  In addition, KubeEdge includes a complete edge device management layer that
  is outside the current StarlingX platform scope.

Data model impact
-----------------

The only data model change is to add 'edgeworker' to 'invPersonalityEnum'
in the sysinv db model.

REST API impact
---------------

None

Security impact
---------------

The potential security threats and mitigations are:

* Malicious node

  The administrator must guarantee that no unauthorized node can physically
  connect to the management network.
  Authentication of edgeworker nodes at onboarding will be introduced in
  later phases.

* Malicious packages in edgeworker node

  The administrator must guarantee that the packages running on edgeworker
  nodes are secure, since the OS is managed by the administrator.

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The deployer is required to run the edgeworker provisioning playbook after
adding a node with, or updating a node to, the edgeworker personality.

Developer impact
----------------

None

Upgrade impact
--------------

The kubelet needs to be upgraded during the Kubernetes upgrade process. The
upgrade process will trigger an additional script/playbook to check the version
of the packages on edgeworker nodes and upgrade them according to each node's
distribution.

The distribution's repo may not yet provide the newest version of the
corresponding packages. According to the Kubernetes version skew support
policy [3]_, it is acceptable for kubelet and kube-proxy to be up to two minor
versions older than the apiserver.

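The skew rule stated above can be checked with a few lines of code. This is
only a sketch of the comparison, not the upgrade script itself; the function
names are hypothetical, and the default of two minor versions is the value
quoted in this spec.

.. code-block:: python

  def major_minor(version):
      """Extract (major, minor) from a string such as 'v1.18.1' or '1.18'."""
      major, minor = version.lstrip("v").split(".")[:2]
      return int(major), int(minor)


  def kubelet_skew_ok(apiserver_version, kubelet_version, max_minor_skew=2):
      """True if kubelet is not newer than the apiserver and is at most
      max_minor_skew minor versions older."""
      api_major, api_minor = major_minor(apiserver_version)
      node_major, node_minor = major_minor(kubelet_version)
      if api_major != node_major:
          return False
      return 0 <= api_minor - node_minor <= max_minor_skew

For example, with an apiserver at 1.18.1, a kubelet at 1.16.9 passes the
check, while a kubelet at 1.15.0 or at 1.19.0 does not.
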
SW patching/updating will be addressed in phase four. It could be either a
third-party solution or plugins for the current SW management, because the
current SW management cannot patch or update packages other than RPMs, while
the OS of an edgeworker node may use a different package type.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Mingyuan Qi


Repos Impacted
--------------

* config
* ansible-playbook

Work Items
----------

The work items are already introduced in section `Proposed change`_ above.


Dependencies
============

None


Testing
=======

* Sysinv unit test

* Sysinv host operation test

* Adding edgeworker nodes in different deploy mode test

  * Simplex
  * Duplex
  * Standard

* Ansible-playbook test for each target OS

  * Host configuration
  * Package installation
  * Edgeworker node join to the Kubernetes cluster


Documentation Impact
====================

* Add a new page to describe the edgeworker node requirements, limitations
  and use cases.
* Add a new page to describe the following deployments:

  * Duplex + edgeworker
  * Standard + edgeworker

* Modify all deployment docs to insert an option to deploy edgeworker nodes
  and link it to the corresponding deployment with edgeworker nodes.


References
==========

.. [1] Kubespray https://github.com/kubernetes-sigs/kubespray
.. [2] KubeEdge https://kubeedge.io
.. [3] Kubernetes version skew policy https://kubernetes.io/docs/setup/release/version-skew-policy/


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx.5.0
     - Edgeworker management phase one introduced