..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

===============================
EdgeWorker Management Phase One
===============================

Storyboard:
https://storyboard.openstack.org/#!/story/2008129

This story introduces a new node personality, 'edgeworker', to StarlingX.

The biggest difference between an 'edgeworker' node and a 'worker' node is
that the OS of an 'edgeworker' node is not installed or configured by the
StarlingX controller and may vary from case to case, for example Ubuntu,
Debian or Fedora. The basic idea is to deploy the containerd and kubelet
services to the 'edgeworker' nodes, so that the StarlingX Kubernetes platform
is extended to them.

The second difference is that 'edgeworker' nodes are usually deployed close to
edge devices, while 'worker' nodes are usually servers deployed in a server
room. The 'edgeworker' personality suits nodes on which users want to install
their own customized OS and which may need to be deployed physically close to
the data producer or consumer devices.

The way to leverage StarlingX functionality on edgeworker nodes is to get most
flock agents containerized and enabled on them. This is also aligned with the
long-term strategy of flock service containerization.

The whole topic is broken down into roughly four phases:

* Phase One

  * Add edgeworker personality
  * Add ansible-playbook to join edgeworker node to STX K8S cluster
  * Support Ubuntu and CentOS as target OS

* Phase Two

  * Containerize a set of flock agents to get edgeworker node inventoried
  * Enhance multiple Ceph cluster operation

* Phase Three

  * Support OpenStack running on edgeworker nodes
  * Support L3/Tunnel mgmt network
  * Containerize the rest of the flock agents

* Phase Four

  * Enable software management on edgeworker nodes
  * Enable optional authentication for new nodes
  * Extend target OS support

This spec focuses on *Phase One*.

Problem description
===================

In a typical IoT or industrial use case, StarlingX is used to facilitate the
setup and management of the whole edge cluster. However, such clusters contain
several types of nodes that are outside the current StarlingX management
scope. Various reasons, ranging from software to hardware, prevent the
administrator from deploying these nodes as 'worker' nodes.
In particular, the common setbacks are:

* The OS of the nodes cannot be installed by StarlingX, or the administrator
  does not want it to be.
* The nodes are running a Type I hypervisor.
* The hardware resources do not meet the minimum requirements of a StarlingX
  worker node.
* The nodes are connected to StarlingX controllers over an L3 network.

In this story, these nodes are categorized into a new personality to
distinguish them from 'worker' nodes. The new personality is called
'edgeworker' since these nodes are usually deployed close to the edge devices.
An edge device could be, for example, an I/O device, a camera, a servo motor
or a sensor.

The first three setbacks are addressed in phase one, while the network
requirement and manageability enhancements will be addressed in later phases.
Separate specs for the later phases will be submitted in subsequent releases.

Use Cases
---------

* Administrator wants to have all the 'edgeworker' nodes managed by StarlingX

  * Show 'edgeworker' nodes in the host list (Phase one)
  * Check/Lock/Unlock 'edgeworker' node state (Phase two)
  * Query 'edgeworker' hardware resources info (Phase two)
  * Configure 'edgeworker' resources for specific usage (Phase two and later)
  * Manage alarms generated by 'edgeworker' (Phase three)
  * Update 'edgeworker' packages (Phase four)

* Administrator does not want StarlingX to install OS on the 'edgeworker'
  nodes
* User wants to orchestrate container workloads to 'edgeworker' nodes
* User wants to orchestrate VM workloads to 'edgeworker' nodes as an option

Proposed change
===============

**Edgeworker personality**

Adding a new personality requires changes in sysinv db, sysinv api and
sysinv conductor, as well as cgts-client.

#. *sysinv db*

   To get 'edgeworker' nodes into sysinv, the 'edgeworker' value will be
   added to the enum type invPersonalityEnum in sysinv db. Accordingly,
   'edgeworker' must be added to the db models as well. After this change, a
   host can be assigned the edgeworker personality from the sysinv db
   perspective.

#. *sysinv api*

   The changes mainly focus on the host api, adding checks during host add
   for 'edgeworker' hosts.
   Possible checks:

   * mgmt ip if mgmt network is not dynamic
   * host name validation
   * personality check

#. *sysinv conductor*

   sysinv conductor is responsible for mgmt ip allocation when the mgmt
   network is of the dynamic type.

#. *cgts client*

   Add an 'edgeworker' choice for the 'personality' argument of the
   host-add/host-update commands.

After the underlying changes are applied, the administrator is able to use

::

  # system host-add -n <hostname> -p edgeworker
  or
  # system host-update <id> hostname=<hostname> personality=edgeworker

to add an edgeworker node to the inventory.

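As a rough illustration of the api-side checks listed above, the following
Python sketch validates a host-add request for an edgeworker host. It is not
the sysinv implementation: ``PERSONALITIES``, ``check_host_add`` and the
hostname rule are hypothetical names and assumptions made for this example
only.

.. code-block:: python

   import ipaddress
   import re

   # Hypothetical stand-in for invPersonalityEnum after adding 'edgeworker'.
   PERSONALITIES = ("controller", "worker", "storage", "edgeworker")

   # Assumed hostname rule (RFC 1123 style label); the real rule may differ.
   HOSTNAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


   def check_host_add(hostname, personality, mgmt_ip=None, dynamic_mgmt=True):
       """Return a list of error strings; an empty list means it passes."""
       errors = []
       if personality not in PERSONALITIES:
           errors.append("unknown personality: %s" % personality)
       if not HOSTNAME_RE.fullmatch(hostname):
           errors.append("invalid hostname: %s" % hostname)
       if not dynamic_mgmt:
           # With a static mgmt network the caller must supply an address.
           if mgmt_ip is None:
               errors.append(
                   "mgmt ip is required when mgmt network is static")
           else:
               try:
                   ipaddress.ip_address(mgmt_ip)
               except ValueError:
                   errors.append("invalid mgmt ip: %s" % mgmt_ip)
       return errors

Returning a list of errors instead of raising on the first failure is a design
choice of this sketch; it lets a caller report every problem with a request at
once.
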
When an edgeworker node is added to the inventory, sysinv can provide the
following services:

* DHCP service (Phase one)
* Host lock/unlock (Phase two)
* Host interface modification and assignment (Phase two)
* Host hardware resource query (Phase two)
* Label assignment (Phase two)

The functions that will not be supported on edgeworker nodes:

* host-upgrade
* bmc integration

An edgeworker node is typically not a server but an ordinary PC, such as an
industrial PC, NUC or workstation, so BMC is not a required feature for these
nodes. The node life cycle management is done in-band or manually by the
maintainer. The use cases that involve edgeworker nodes do not expect
out-of-band management for these nodes.

Additional semantic checks will be added for these functions.

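A semantic check of this kind could look roughly like the sketch below. The
operation labels, ``SemanticCheckError`` and ``check_operation`` are
hypothetical and only illustrate the idea of rejecting unsupported operations
by personality; they are not taken from sysinv.

.. code-block:: python

   # Operations the spec lists as unsupported on edgeworker nodes.
   # The string labels used here are hypothetical.
   UNSUPPORTED_ON_EDGEWORKER = frozenset({"host-upgrade", "bmc-integration"})


   class SemanticCheckError(Exception):
       """Raised when an operation is not allowed for a host personality."""


   def check_operation(personality, operation):
       """Reject operations that the spec excludes for edgeworker hosts."""
       if (personality == "edgeworker"
               and operation in UNSUPPORTED_ON_EDGEWORKER):
           raise SemanticCheckError(
               "%s is not supported on edgeworker hosts" % operation)
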
Other functions will be described in detail in each phase's spec.

**Ansible playbook for provisioning edgeworker nodes**

The main steps for provisioning an edgeworker node are installing the kubelet,
kubeadm and containerd packages on the node, in the way required by its Linux
distribution, and joining the node to the StarlingX Kubernetes platform.
Besides these steps, system configuration such as ntp setup, interface
configuration and dns setup is needed as well.

The first two Linux distributions we propose to support for edgeworker are
*Ubuntu* and *CentOS*.

The versions of all the Kubernetes packages on edgeworker nodes must be
exactly the same as those of the packages on the controllers. If they are not,
the playbook will reinstall the packages at the proper version.

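The version rule above amounts to a simple comparison. The following Python
sketch shows one way to decide which packages need reinstalling;
``packages_to_reinstall`` is a hypothetical helper, and the version strings in
the usage example are purely illustrative.

.. code-block:: python

   def packages_to_reinstall(controller_versions, node_versions):
       """Return {package: wanted_version} for every package that is missing
       on the node or whose version differs from the controller's."""
       return {
           name: wanted
           for name, wanted in controller_versions.items()
           if node_versions.get(name) != wanted
       }


   controller = {"kubelet": "1.18.1", "kubeadm": "1.18.1"}
   node = {"kubelet": "1.18.1", "kubeadm": "1.17.4"}
   print(packages_to_reinstall(controller, node))  # {'kubeadm': '1.18.1'}

An exact string comparison is used on purpose, because the text requires
identical versions rather than merely compatible ones.
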
The playbook sequence to provision an edgeworker node:

#. Preparations on controller

   * Send containerd config and cert to edgeworker
   * Generate K8S bootstrap token and calculate certificate hash

#. Preparations on edgeworker

   * Configure network (interface and dns)
   * Set up proxy if needed
   * Install essential packages
   * Set up ntp

#. Add edgeworker node to STX Kubernetes

   * Install containerd, kubelet, kubeadm packages (based on OS)
   * Configure sysctl and swap
   * Join k8s cluster
   * Download images

There will be one playbook with different roles included.

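To make the "bootstrap token and certificate hash" step more concrete, the
sketch below assembles a ``kubeadm join`` command line in Python. It only
illustrates the data involved and is not the playbook itself: the helper names
are hypothetical, the endpoint is supplied by the caller, and in practice the
token would be created on the controller (for example with
``kubeadm token create``) so that it is registered in the cluster. The
generator here merely shows the token format.

.. code-block:: python

   import hashlib
   import re
   import secrets
   import string

   _ALPHABET = string.ascii_lowercase + string.digits
   # Format that kubeadm requires for bootstrap tokens: <id>.<secret>
   TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")


   def new_bootstrap_token():
       """Generate a random string in the bootstrap token format."""
       token_id = "".join(secrets.choice(_ALPHABET) for _ in range(6))
       token_secret = "".join(secrets.choice(_ALPHABET) for _ in range(16))
       return "%s.%s" % (token_id, token_secret)


   def ca_cert_hash(public_key_der):
       """Hash the DER-encoded public key (SubjectPublicKeyInfo) of the
       cluster CA, in the 'sha256:<hex>' form that kubeadm expects."""
       return "sha256:" + hashlib.sha256(public_key_der).hexdigest()


   def join_command(endpoint, token, cert_hash):
       """Build the kubeadm join command line to run on the edgeworker."""
       if not TOKEN_RE.fullmatch(token):
           raise ValueError("malformed bootstrap token")
       return ["kubeadm", "join", endpoint, "--token", token,
               "--discovery-token-ca-cert-hash", cert_hash]

Extracting the public key bytes from the CA certificate is left out, since it
needs a certificate parsing library that this sketch does not assume.
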
Alternatives
------------

There are several open source projects that can provision a Kubernetes node.

* Kubespray

  Kubespray [1]_ is a composition of Ansible playbooks, inventory, provisioning
  tools, and domain knowledge for generic OS/Kubernetes clusters configuration
  management tasks. Kubespray performs generic OS configuration as well as
  Kubernetes cluster bootstrapping.

  Kubespray provides the full functionality needed to provision a Kubernetes
  node, just like the edgeworker provisioning playbook does. However,
  Kubespray also supports multiple container runtimes, multiple CNI plugins
  and control plane bootstrap, which is more functionality than provisioning
  an edgeworker requires.

  What edgeworker needs is a playbook that targets one specific container
  runtime and specific CNI plugins, and that provisions a Kubernetes node
  only.

* KubeEdge

  KubeEdge [2]_ is an open source system for extending native containerized
  application orchestration capabilities to hosts at the edge. KubeEdge can
  run on top of an existing Kubernetes cluster and deploy a customized kubelet
  service called 'edged' to the edge node. Between the apiserver and edged,
  the EdgeController is the bridge that manages edge node and pod metadata
  so that the data can be targeted to a specific edge node.

  KubeEdge is able to provision edge nodes from the cloud. However, its
  kubelet service is customized to fulfill the specific requirement that the
  administrator can manage the pods running on edge nodes from a public cloud
  platform. The customized kubelet (edged) brings compatibility issues when
  Kubernetes is upgraded to a newer release. Since edgeworker provisioning is
  a key step to enable these nodes, this leads to extra effort to test and
  upgrade KubeEdge during each Kubernetes upgrade.

  Besides, KubeEdge has a whole edge device management logic that is not in
  the current StarlingX platform scope.

Data model impact
-----------------

The only data model change is to add 'edgeworker' to 'invPersonalityEnum'
in the sysinv db model.

REST API impact
---------------

None

Security impact
---------------

The potential security threats and their mitigations are:

* Malicious node

  The administrator must guarantee that no unauthorized node can physically
  connect to the management network.
  Authentication of edgeworker nodes at onboarding will be introduced in
  later phases.

* Malicious packages in edgeworker node

  The administrator must guarantee that the packages running on edgeworker
  nodes are secure, since the OS is managed by the administrator.

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The deployer is required to run the edgeworker provisioning playbook after
adding a node with, or updating a node to, the edgeworker personality.

Developer impact
----------------

None

Upgrade impact
--------------

The kubelet needs to be upgraded during the Kubernetes upgrade process. The
upgrade process will trigger an additional script/playbook that checks the
versions of the packages on edgeworker nodes and upgrades them according to
each node's distribution.

The distribution's repo may not provide the newest version of the
corresponding packages. According to the Kubernetes version skew support
policy [3]_, it is acceptable for kubelet and kube-proxy to be up to two minor
versions older than the apiserver.

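The skew rule can be expressed as a small check, sketched here in Python. The
function names are hypothetical, and the default of two minor versions simply
follows the text above; the allowed skew should be confirmed against the
policy for the Kubernetes release in use.

.. code-block:: python

   def parse_minor(version):
       """Return (major, minor) from a version like 'v1.18.1' or '1.18'."""
       major, minor = version.lstrip("v").split(".")[:2]
       return int(major), int(minor)


   def kubelet_skew_ok(apiserver_version, kubelet_version, max_minor_skew=2):
       """True if kubelet is not newer than the apiserver and is at most
       max_minor_skew minor versions older."""
       api_major, api_minor = parse_minor(apiserver_version)
       kl_major, kl_minor = parse_minor(kubelet_version)
       if api_major != kl_major:
           return False
       return 0 <= api_minor - kl_minor <= max_minor_skew

For example, with an apiserver at ``v1.18.1`` this check accepts a kubelet at
``v1.16.9`` and rejects one at ``v1.15.0`` or ``v1.19.0``.
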
The SW patching/updating will be addressed in phase four. It could be either a
3rd party solution or plugins for the current SW management, because the
current SW management cannot patch/update packages other than RPMs, while the
OSes of edgeworker nodes may use other package types.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Mingyuan Qi

Repos Impacted
--------------

* config
* ansible-playbook

Work Items
----------

The work items are already introduced in section `Proposed change`_ above.

Dependencies
============

None

Testing
=======

* Sysinv unit test

* Sysinv host operation test

* Test of adding edgeworker nodes in different deployment modes

  * Simplex
  * Duplex
  * Standard

* Ansible-playbook test for each target OS

  * Host configuration
  * Package installation
  * Edgeworker node joining the Kubernetes cluster

Documentation Impact
====================

* Add a new page to describe the edgeworker node requirements, limitations
  and use cases.
* Add a new page to describe the following deployments:

  * Duplex + edgeworker
  * Standard + edgeworker

* Modify all deployment docs to insert an option to deploy edgeworker nodes
  and link it to the underlying deployment with edgeworker nodes.

References
==========

.. [1] Kubespray https://github.com/kubernetes-sigs/kubespray
.. [2] KubeEdge https://kubeedge.io
.. [3] Kubernetes version skew policy https://kubernetes.io/docs/setup/release/version-skew-policy/

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx.5.0
     - Edgeworker management phase one introduced