stx-7.0: initial spec for Debian builds on K8s

Initial spec for adding support for full K8s to debian
build tools.

Story: 2009812
Task: 44374

Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
Change-Id: I3e640b8c9a14592db8924e893488a908770a7bdd
..
   This work is licensed under a Creative Commons Attribution 3.0 Unported
   License. http://creativecommons.org/licenses/by/3.0/legalcode

======================================
StarlingX: Debian Builds on Kubernetes
======================================
Storyboard Story:
https://storyboard.openstack.org/#!/story/2009812
The new Debian build system [1] in conjunction with Minikube lacks support for
multiple projects, branches & users within the same environment. We propose a
Kubernetes infrastructure to remedy these shortcomings: a dedicated multi-node
build cluster with shared services, as well as the necessary tooling changes.
Problem Description
===================
The current implementation relies on Minikube -- a version of Kubernetes
optimized for single-node, single-user operation -- making it difficult to
share computing resources between multiple projects, branches, and users
within the same environment, particularly on a dedicated “daily” build server.
The Debian package repository service cannot be shared, which results in
excessive download times and disk usage.
There is no explicit support for CI environments, requiring additional
scripting in Jenkins or similar tools. Jenkins's approach to k8s integration
is not compatible with the current tooling, as it requires that top-level
scripts be written in the “pipeline” domain-specific language. The best we can
do in Jenkins is call the StarlingX build scripts, bypassing Jenkins's POD &
node scheduling & management mechanisms.
Use Cases
---------
This change would support infrastructure configurations covering the common
use cases described below.
Isolated single-user builds
^^^^^^^^^^^^^^^^^^^^^^^^^^^
An individual contributor wants to build individual packages, the
installation ISO, or docker images in an isolated, autonomous environment.
This use case is already supported by the build tool using Minikube; any
further changes must remain compatible with this type of environment.
Daily build server
^^^^^^^^^^^^^^^^^^
An organization wishes to maintain a server cluster for building multiple
projects or branches daily or on demand (this is the case with the current
StarlingX official build system). Tooling must support:
* Kubernetes clusters. Motivation: some organizations already have
Kubernetes clusters.
* StarlingX clusters. Motivation: “eat our own dog food”.
* Multiple worker nodes. Motivation: allow for expanding computing resources
available to the build system.
* Ideally, clusters without a shared file system. Motivation: shared redundant
file systems are slow and difficult to implement and may not be available in
the target environment.
Build server open to individuals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is a variation of the above, but with the option for individual
contributors to generate private builds based on their patches before pushing
them to source control. Motivation: this allows users to benefit from the more
powerful, centralized build server.
This use case is not addressed by the current spec. We believe the proposed
changes are sufficient to add this functionality in the future.
Proposed changes
================
We propose a build system that can run in any environment based on Kubernetes,
and a matching installation to drive daily builds on CENGN.
Change the build scripts to support vanilla k8s multi-user environments. This
includes making sure POD and directory names do not clash between users or
multiple projects/branches. Motivation: allow multiple users & projects in the
same environment.
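
As an illustration, POD and directory names could carry a prefix derived from
the (user, project, branch) tuple. This is only a sketch of the idea; the
helper name and truncation rules below are placeholders, not settled design::

   import hashlib
   import re

   def build_scope(user: str, project: str, branch: str) -> str:
       """Return a DNS-1123-safe name prefix unique to this build context."""
       raw = f"{user}-{project}-{branch}".lower()
       # Kubernetes object names must be lowercase alphanumerics and '-'.
       safe = re.sub(r"[^a-z0-9-]", "-", raw)[:40]
       # A short hash guards against collisions introduced by truncation.
       digest = hashlib.sha256(raw.encode()).hexdigest()[:8]
       return f"{safe}-{digest}"

   # e.g. POD name: f"{build_scope('jdoe', 'stx-tools', 'master')}-builder"
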
Update helm charts to isolate the common parts between minikube and other k8s
environments.
Update the ``stx`` tool as it may be of limited or no use in full k8s
environments.
Replace Aptly with Pulp (package repository service). Motivation: Pulp
supports file types other than Debian packages, such as source archives used
as build inputs.
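
For illustration, a build step could push a locally built package into Pulp
over its REST API. The endpoint path and field names below are assumptions
based on the pulp_deb plugin and would need to be verified against the
deployed Pulp version; the service URL and credentials are placeholders::

   import requests

   PULP_URL = "http://pulp.builds.svc.cluster.local"  # hypothetical in-cluster service
   AUTH = ("builder", "not-a-real-password")          # placeholder credentials

   def upload_deb(path: str) -> str:
       """Upload one .deb; Pulp answers with an async task reference."""
       with open(path, "rb") as pkg:
           resp = requests.post(
               f"{PULP_URL}/pulp/api/v3/content/deb/packages/",  # assumed pulp_deb endpoint
               files={"file": pkg},
               auth=AUTH,
           )
       resp.raise_for_status()
       return resp.json()["task"]
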
Update the package repository service container so that it can be shared among
multiple builds. Motivation: avoid unnecessary duplication of package files
that can be shared among different users on the same system.
Update other build containers to allow transient use (single command
execution). Motivation: efficient memory/CPU sharing among multiple builds.
::
  xxxxxx Kubernetes or Minikube xxxxxxxxxxxxxxxxxxxxxx
  x                                ┌──────────────┐  x
  x                                │ User builder │  x
  x  ┌──────────────┐           ┌──┤ PODs         │  x  User 1
  x  │ Pulp         │◄─────┐    │  └──────────────┘  x
  x  └──────────────┘      │    │                    x
  x                        │    │  ┌──────────────┐  x
  x  ┌──────────────┐      │    │  │ User builder │  x
  x  │ Other repos  │◄─────┼────┼──┤ PODs         │  x  User 2
  x  └──────────────┘      │    │  └──────────────┘  x
  x                        │    │                    x
  x  ┌──────────────┐      │    │  ┌──────────────┐  x
  x  │ Docker reg   │◄─────┘    │  │ User builder │  x
  x  └──────────────┘           └──┤ PODs         │  x  User 2
  x                                └──────────────┘  x
  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
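
As a sketch of the transient-use model above, a single build command could
run in a short-lived POD created through the Kubernetes Python client; the
image, namespace, and command below are placeholders::

   from kubernetes import client, config

   def run_once(name: str, command: list) -> None:
       """Run one command in a throwaway POD that frees CPU/RAM on exit."""
       config.load_kube_config()
       pod = client.V1Pod(
           metadata=client.V1ObjectMeta(name=name),
           spec=client.V1PodSpec(
               restart_policy="Never",  # transient: run the command and stop
               containers=[client.V1Container(
                   name="builder",
                   image="starlingx/debian-builder:latest",  # placeholder image
                   command=command,
               )],
           ),
       )
       client.CoreV1Api().create_namespaced_pod(namespace="builds", body=pod)

   run_once("deb-build-once", ["build-pkgs", "--parallel"])
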
Additional repository services may be deployed in the cluster to support
specific types of data. Whether the build will require additional repository
services remains to be seen.
A docker registry may be deployed for managing intermediate containers used by
the build. Some environments may have a docker registry available outside of
the cluster, so this is optional. In particular, CENGN already has this service
(Docker Hub) available.
We propose installing Kubernetes on a single server to drive daily builds
(CENGN), configured so that more worker nodes can be added later. Jenkins
will be installed to trigger Tekton [2] builds and for reporting.
Tekton is a CI pipeline engine designed specifically for Kubernetes. It is
command-line driven and may be used by the build tools directly to schedule
build jobs within the k8s cluster. Whether such direct usage is feasible or
useful is unclear at this point.
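
For example, the tooling could submit a Tekton ``PipelineRun`` custom
resource directly through the Kubernetes API; the pipeline name and
parameters below are hypothetical::

   from kubernetes import client, config

   config.load_kube_config()
   pipeline_run = {
       "apiVersion": "tekton.dev/v1beta1",
       "kind": "PipelineRun",
       "metadata": {"generateName": "stx-build-"},
       "spec": {
           "pipelineRef": {"name": "stx-debian-build"},       # hypothetical pipeline
           "params": [{"name": "branch", "value": "master"}],
       },
   }
   client.CustomObjectsApi().create_namespaced_custom_object(
       group="tekton.dev", version="v1beta1",
       namespace="builds", plural="pipelineruns",
       body=pipeline_run,
   )
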
Outputs of released or otherwise important builds would need to be saved
indefinitely and backed up in case of hardware failures. On CENGN, the
availability of backup storage is to be determined.
Outputs of old non-released builds would be deleted regularly (builds older
than 2 weeks or similar). This includes all artifacts (log files, deb files,
ISOs).
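
A retention job along these lines could run on a schedule; the output root
and the released-build marker are placeholders, not existing tooling::

   import shutil
   import time
   from pathlib import Path

   BUILD_ROOT = Path("/localdisk/builds")  # hypothetical output location
   MAX_AGE = 14 * 24 * 3600                # two weeks, in seconds

   for build_dir in BUILD_ROOT.iterdir():
       if not build_dir.is_dir():
           continue
       released = (build_dir / "RELEASED").exists()  # assumed marker file
       age = time.time() - build_dir.stat().st_mtime
       if not released and age > MAX_AGE:
           shutil.rmtree(build_dir)  # drops logs, debs, and ISOs together
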
Mirrors of 3rd-party files (tars, deb files) would be saved indefinitely.
Docker images would be built using kaniko [4] -- a tool to build container
images from a Dockerfile, inside a container or Kubernetes cluster. It allows
one to run "docker build" inside a docker container. This method is
appropriate for building Debian build tools images.
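
A kaniko build reduces to a single POD running the executor image. In this
sketch the build context and destination are placeholders, while
``--dockerfile``, ``--context``, and ``--destination`` are kaniko's standard
flags::

   from kubernetes import client

   kaniko_pod = client.V1Pod(
       metadata=client.V1ObjectMeta(generate_name="kaniko-build-"),
       spec=client.V1PodSpec(
           restart_policy="Never",
           containers=[client.V1Container(
               name="kaniko",
               image="gcr.io/kaniko-project/executor:latest",
               args=[
                   "--dockerfile=Dockerfile",
                   "--context=git://opendev.org/starlingx/tools.git",  # placeholder context
                   "--destination=registry.local/stx/builder:latest",  # placeholder tag
               ],
           )],
       ),
   )
   # client.CoreV1Api().create_namespaced_pod(namespace="builds", body=kaniko_pod)
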
For the more complicated cases that need to access docker in other ways, we
would use sysbox [5] -- a tool for running system software, including Docker,
inside docker containers. This method is appropriate for building application
images, such as Openstack containers.
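
Such a POD would select the sysbox runtime through its runtime class. This
sketch assumes sysbox is installed on the worker nodes and registered as the
``sysbox-runc`` RuntimeClass; the image and names are placeholders::

   from kubernetes import client

   dind_pod = client.V1Pod(
       metadata=client.V1ObjectMeta(name="stx-app-image-build"),
       spec=client.V1PodSpec(
           runtime_class_name="sysbox-runc",  # assumed RuntimeClass name
           restart_policy="Never",
           containers=[client.V1Container(
               name="dockerd",
               image="docker:dind",  # inner Docker daemon; no privileged mode needed
               command=["dockerd"],
           )],
       ),
   )
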
Alternatives
------------
Tekton
^^^^^^
We do not have to use Tekton; we could simply run build commands directly in
k8s PODs controlled by the build scripts (Python), with Jenkins on top to
manage build schedules and artifact archiving. This would require us to
maintain a sizable chunk of the pipeline logic in Jenkins. Jenkins is hard to
install and automate, making the testing of updates to the pipelines a
challenge. The Jenkins automation API is somewhat unstable and uses an obscure
pipeline definition language. We expect a Tekton-based approach to be largely
free of these shortcomings.
On the other hand, Tekton is not as mature as Jenkins.
Docker image builds
^^^^^^^^^^^^^^^^^^^
Instead of kaniko & sysbox, we could build docker images in k8s using
docker-in-docker [6]. This method has multiple problems related to kernel
security and I/O performance [7].
We could also mount the host's Docker daemon socket inside any
containers/PODs that need to interact with docker. This would leave container
instances behind on the host and would require additional scripting to clean
them up.
Impact on build tools installations
-----------------------------------
Individual contributors will be able to continue using Minikube, as they do
now.
Installing and configuring Kubernetes itself is beyond the scope of this
document. The services & POD definitions used by the build tools shall be
reusable (as Helm charts) no matter what the surrounding infrastructure looks
like.
Open questions
--------------
Persistent storage
^^^^^^^^^^^^^^^^^^
The builds would need to persist these types of files:
* Debian packages and other files (tarballs, etc) used as build inputs. This
will be handled by Pulp, whose underlying storage facility is to be
determined.
* Debian packages produced by the build. This will be handled by Pulp as well.
* Debian package mirror. This may be handled by Pulp as well. It is
currently implemented as a custom script on CENGN, outside of k8s.
* Other files produced by the build (ISO files, docker image list files). We
expect to use Pulp for this as well.
* Log files are normally stored within k8s itself, as well as in individual
POD containers. We would probably need to export them for ease of access
(see the sketch after this list). CENGN users would expect log files as
simple downloadable files, since we are not proposing to make any k8s GUIs
available to the public at this point. ElasticSearch may be helpful
(searchable database of logs, among other things), but it needs a lot of
CPU & RAM.
* Docker images. Official images (ie build outputs) are to be published to
an external Docker registry (Docker Hub).
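
For the log export mentioned above, a minimal sketch using the Kubernetes
Python client; the namespace and download area are placeholders::

   from pathlib import Path
   from kubernetes import client, config

   config.load_kube_config()
   api = client.CoreV1Api()
   out = Path("/localdisk/builds/logs")  # hypothetical download area
   out.mkdir(parents=True, exist_ok=True)
   # Dump each POD's log as a plain downloadable file.
   for pod in api.list_namespaced_pod(namespace="builds").items:
       text = api.read_namespaced_pod_log(name=pod.metadata.name, namespace="builds")
       (out / f"{pod.metadata.name}.log").write_text(text)
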
It is not clear whether the build would require a shared persistent file
system (e.g. for passing build artifacts between build steps). Such a file
system is difficult to implement, and target k8s installations may not have
one available for our use. Without a shared file system, builds will take
longer to complete due to having to download and copy many files.
Contrast this with the older CentOS build system, which relies on a shared
file system and uses symbolic links for file sharing.
If a file system can't be shared then, as a workaround, all of a build's PODs
will have to be scheduled to run on the same node (sketched below).
Downside: can't schedule PODs on different nodes.
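
One way to express this is POD affinity keyed on a per-build label, so every
POD of a build lands on the node of the first one scheduled; the label key
and value are placeholders::

   # Fragment merged into each of the build's POD specs; every POD of the
   # build also carries the matching "stx-build-id" label.
   affinity = {
       "podAffinity": {
           "requiredDuringSchedulingIgnoredDuringExecution": [{
               "labelSelector": {"matchLabels": {"stx-build-id": "build-1234"}},
               "topologyKey": "kubernetes.io/hostname",  # co-locate on one node
           }],
       },
   }
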
An object storage service such as MinIO [3] (non-shared; artifacts must be
copied; no symlinks) may be used for artifact archiving, as well as for
passing artifacts between build stages -- see the sketch at the end of this
subsection.
Downside: slow.
NFS could be used as a shared file system.
Downside: slow.
Ceph could also provide a shared file system.
Downside: seems complicated.
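
For the MinIO option above, artifact archiving would look roughly like this
with the ``minio`` Python client; the endpoint, credentials, and bucket are
placeholders::

   from minio import Minio

   mc = Minio(
       "minio.builds.svc.cluster.local:9000",  # hypothetical in-cluster service
       access_key="builder", secret_key="not-a-real-secret", secure=False,
   )
   if not mc.bucket_exists("artifacts"):
       mc.make_bucket("artifacts")
   # Copy (not link) the artifact into object storage.
   mc.fput_object("artifacts", "stx/master/starlingx.iso",
                  "/localdisk/deploy/starlingx.iso")
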
Artifact retention & backups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CENGN: it is not clear whether independent, isolated backup storage is
available. We could save backups on one of the 2 build servers, making sure
important files (released builds etc) are saved on 2 physical machines.
StarlingX vs Kubernetes
^^^^^^^^^^^^^^^^^^^^^^^
* Once StarlingX switches to Debian, the build server would have to be
re-imaged, which will disrupt daily builds.
* We do not need many of the functions that StarlingX provides, k8s is
sufficient.
* StarlingX is not optimized for running build jobs.
* If we use k8s we should pick a stable base OS with a long shelf life, so
that OS upgrades are infrequent while k8s can be upgraded at will.
* If we use StarlingX we should pick the latest official release (6.0).
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None for StarlingX deployments. Kubernetes clusters used for builds have
security implications that will have to be considered.
Other end user impact
---------------------
None
Performance impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
The current Minikube-based workflow will continue to be supported.
Organizations will gain the ability to take advantage of full Kubernetes
installations for centralized builds.
Upgrade impact
--------------
None for StarlingX. Kubernetes upgrades are covered in [8].
Implementation
==============
Assignee(s)
-----------
* Davlet Panech - dpanech
* Luis Sampaio - lsampaio
Repos impacted
--------------
starlingx/tools
Work Items
----------
See storyboard.
Dependencies
============
None
Testing
=======
As the scope of this spec is restricted to building StarlingX, it does not
introduce any additional runtime testing requirements. As this change is
proposed to take place alongside the move to Debian, full runtime testing is
expected in relation to that spec.
Building under full Kubernetes will require validation to ensure outcomes
similar to those expected when building in a Minikube environment.
Documentation Impact
====================
StarlingX Build Guide
https://docs.starlingx.io/developer_resources/build_guide.html -
add instructions for full Kubernetes environments.
References
==========
[1] StarlingX: Debian Build Spec --
https://docs.starlingx.io/specs/specs/stx-6.0/approved/starlingx_2008846_debian_build.html
[2] Tekton, a CI pipeline engine for k8s --
https://tekton.dev/
[3] MinIO, an Amazon S3-compatible object storage system --
https://min.io/
[4] Kaniko, a tool to build container images from a Dockerfile, inside a
container or Kubernetes cluster --
https://github.com/GoogleContainerTools/kaniko
[5] Sysbox, a container runtime that sits below Docker --
https://github.com/nestybox/sysbox
[6] Docker in Docker --
https://hub.docker.com/_/docker/
[7] Using Docker-in-Docker for your CI or testing environment? Think twice. --
https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
[8] Kubernetes - https://kubernetes.io/docs/home/
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - STX-7.0
     - Introduced