stx-7.0: initial spec for Debian builds on K8s

Initial spec for adding support for full K8s to debian build tools.

Story: 2009812
Task: 44374
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
Change-Id: I3e640b8c9a14592db8924e893488a908770a7bdd
..
   This work is licensed under a Creative Commons Attribution 3.0 Unported
   License. http://creativecommons.org/licenses/by/3.0/legalcode

======================================
StarlingX: Debian Builds on Kubernetes
======================================

Storyboard Story:
https://storyboard.openstack.org/#!/story/2009812

The new Debian build system [1] in conjunction with Minikube lacks support
for multiple projects, branches, and users within the same environment. We
propose a Kubernetes infrastructure to remedy these shortcomings: a dedicated
multi-node build cluster with shared services, as well as the necessary
tooling changes.

Problem Description
===================

The current implementation relies on Minikube -- a version of Kubernetes
optimized for single-node, single-user operation -- making it difficult to
share computing resources between multiple projects, branches, and users
within the same environment, particularly on a dedicated "daily" build
server. The Debian package repository service cannot be shared, which results
in excessive download times and disk usage.

There is no explicit support for CI environments, requiring additional
scripting in Jenkins or similar tools. Jenkins's approach to k8s integration
is not compatible with the current tooling, as it requires top-level scripts
written in the "pipeline" domain-specific language. The best we can do in
Jenkins is to call the StarlingX build scripts, bypassing Jenkins's POD and
node scheduling and management mechanisms.

Use Cases
---------

This change would support infrastructure configurations covering the common
use cases described below.

Isolated single-user builds
^^^^^^^^^^^^^^^^^^^^^^^^^^^

An individual contributor wants to build individual packages, the
installation ISO, or docker images in an isolated, autonomous environment.
This use case is already supported by the build tool using Minikube; any
further changes must remain compatible with this type of environment.

Daily build server
^^^^^^^^^^^^^^^^^^

An organization wishes to maintain a server cluster for building multiple
projects or branches daily or on demand (this is the case with the current
StarlingX official build system). Tooling must support:

* Kubernetes clusters. Motivation: some organizations already have
  Kubernetes clusters.
* StarlingX clusters. Motivation: "eat our own dog food".
* Multiple worker nodes. Motivation: allow for expanding the computing
  resources available to the build system.
* Ideally, clusters without a shared file system. Motivation: shared
  redundant file systems are slow and difficult to implement, and may not be
  available in the target environment.

Build server open to individuals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a variation of the above, but with the option for individual
contributors to generate private builds based on their patches before pushing
them to source control. Motivation: this allows users to benefit from the
more powerful, centralized build server.

This use case is not addressed by the current spec. We believe the proposed
changes provide a sufficient basis for adding this functionality in the
future.

Proposed changes
================

We propose a build system that can run in any environment based on
Kubernetes, and a matching installation to drive daily builds on CENGN.

Change the build scripts to support vanilla k8s multi-user environments. This
includes making sure POD and directory names do not clash between users or
multiple projects/branches. Motivation: allow multiple users & projects in the
same environment.

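As an illustration of one way to keep names from clashing (a sketch only;
the actual naming scheme is up to the implementation), each POD name could be
derived from the user/project/branch triple, sanitized to Kubernetes'
DNS-label rules and suffixed with a short digest:

```python
import hashlib
import re

def build_pod_name(user: str, project: str, branch: str) -> str:
    """Derive a unique, DNS-label-safe POD name from user/project/branch.

    Hypothetical helper -- the real build tools may use another scheme.
    """
    raw = f"{user}/{project}/{branch}"
    # A short digest keeps names unique even when sanitization collides
    # (e.g. "feature/x" and "feature-x" both sanitize to "feature-x").
    digest = hashlib.sha256(raw.encode()).hexdigest()[:8]
    # Kubernetes names: lowercase alphanumerics and '-', at most 63 chars.
    sanitized = re.sub(r"[^a-z0-9-]+", "-", raw.lower()).strip("-")
    return f"builder-{sanitized}"[: 63 - 9] + f"-{digest}"

print(build_pod_name("jdoe", "starlingx", "master"))
```

The digest suffix is what makes the scheme safe: two distinct inputs always
yield distinct names, even when truncation or sanitization would otherwise
collide.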
Update helm charts to isolate the common parts between minikube and other k8s
environments.

Update the ``stx`` tool as it may be of limited or no use in full k8s
environments.

Replace Aptly with Pulp (package repository service). Motivation: Pulp
supports file types other than Debian packages, such as source archives used
as build inputs.

Update the package repository service container so that it can be shared among
multiple builds. Motivation: avoid unnecessary duplication of package files
that can be shared among different users on the same system.

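To illustrate why a shared repository service avoids duplication, here is a
toy content-addressed store (an illustration only, not Pulp's actual storage
model): identical files added from different builds are stored once, keyed by
their digest.

```python
import hashlib

class SharedPackageStore:
    """Toy content-addressed store: one copy per unique file content.

    Illustrative sketch only -- not Pulp's actual storage backend.
    """

    def __init__(self):
        self._blobs = {}   # digest -> file content
        self._repos = {}   # repo name -> {package name: digest}

    def add(self, repo: str, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Identical content uploaded by another user/build is stored once.
        self._blobs.setdefault(digest, content)
        self._repos.setdefault(repo, {})[name] = digest
        return digest

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())

store = SharedPackageStore()
deb = b"fake .deb payload"
store.add("user1-build", "bash_5.1.deb", deb)
store.add("user2-build", "bash_5.1.deb", deb)   # deduplicated
print(store.stored_bytes())                     # size of one copy, not two
```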
Update other build containers to allow transient use (single-command
execution). Motivation: efficient memory/CPU sharing among multiple builds.

::

   xxxxxx Kubernetes or Minikube xxxxxxxxxxxxxxxxxxxxxxxx
   x                             ┌──────────────┐       x
   x                             │ User builder │       x
   x  ┌──────────────┐        ┌──┤ PODs         │       x  User 1
   x  │ Pulp         │◄─────┐ │  └──────────────┘       x
   x                        │ │                         x
   x                        │ │  ┌──────────────┐       x
   x  ┌──────────────┐      │ │  │ User builder │       x
   x  │ Other repos  │◄─────┼─┼──┤ PODs         │       x  User 2
   x  └──────────────┘      │ │  └──────────────┘       x
   x                        │ │                         x
   x  ┌──────────────┐      │ │  ┌──────────────┐       x
   x  │ Docker reg   │◄─────┘ │  │ User builder │       x
   x  └──────────────┘        └──┤ PODs         │       x  User 3
   x                             └──────────────┘       x
   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

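The transient (single-command) container use mentioned above could look
roughly like the following sketch, where each build command runs as a
Kubernetes Job that frees its memory/CPU once the command exits (the image
and command names are placeholders, not the actual chart contents):

```python
def transient_build_job(name: str, image: str, command: list) -> dict:
    """Build a Kubernetes Job manifest for one-shot command execution.

    Hypothetical sketch: image name and TTL value are illustrative only.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Clean up finished PODs so they stop holding resources.
            "ttlSecondsAfterFinished": 300,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "builder",
                        "image": image,
                        "command": command,
                    }],
                },
            },
        },
    }

job = transient_build_job("build-pkg-bash", "stx-builder:latest",
                          ["build-pkgs", "bash"])
```

``ttlSecondsAfterFinished`` is the standard Job field that lets the cluster
reclaim finished PODs automatically, which is what makes this usage
"transient".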
Additional repository services may be deployed in the cluster to support
specific types of data. Whether the build will require additional repository
services remains to be seen.

A docker registry may be deployed for managing intermediate containers used
by the build. Some environments may have a docker registry available outside
of the cluster, so this is optional. In particular, CENGN already has such a
service (Docker Hub) available.

We propose installing Kubernetes on a single server to drive daily builds
(CENGN). Kubernetes will be configured to allow the addition of more nodes.
Jenkins will be installed to trigger Tekton [2] builds and for reporting.

Tekton is a CI pipeline engine designed specifically for Kubernetes. It is
command-line driven and may be used by the build tools directly to schedule
build jobs within the k8s cluster. Whether such direct usage is feasible or
useful is unclear at this point.

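For instance, the build scripts could compose ``tkn`` invocations like the
sketch below (the pipeline and parameter names are hypothetical):

```python
import shlex

def tekton_start_cmd(pipeline: str, params: dict) -> str:
    """Compose a `tkn pipeline start` invocation for a build job.

    Sketch only: the pipeline name and parameters are hypothetical.
    """
    cmd = ["tkn", "pipeline", "start", pipeline, "--showlog"]
    for key, value in sorted(params.items()):
        cmd += ["--param", f"{key}={value}"]
    return shlex.join(cmd)

print(tekton_start_cmd("stx-debian-build",
                       {"branch": "master", "project": "starlingx"}))
```

Because the interface is a plain command line, the same invocation works from
a developer shell, from Jenkins, or from the build scripts themselves.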
Outputs of released or otherwise important builds would need to be saved
indefinitely and backed up in case of hardware failures. On CENGN, the
availability of backup storage is to be determined.

Outputs of old non-released builds would be deleted regularly (builds older
than 2 weeks or similar). This includes all artifacts (log files, deb files,
ISOs).

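The retention policy above could be sketched as follows (illustrative only;
the actual cleanup job and build metadata layout are to be defined):

```python
from datetime import datetime, timedelta

def builds_to_delete(builds: dict, now: datetime,
                     max_age: timedelta = timedelta(weeks=2)) -> list:
    """Select non-released builds older than the retention window.

    `builds` maps a build ID to (timestamp, released) -- a hypothetical
    layout; released builds are always kept.
    """
    return sorted(
        build_id
        for build_id, (timestamp, released) in builds.items()
        if not released and now - timestamp > max_age
    )

now = datetime(2022, 3, 1)
builds = {
    "20220101": (datetime(2022, 1, 1), False),   # old, delete
    "20220110": (datetime(2022, 1, 10), True),   # old but released, keep
    "20220228": (datetime(2022, 2, 28), False),  # recent, keep
}
print(builds_to_delete(builds, now))             # ['20220101']
```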
Mirrors of 3rd-party files (tars, deb files) would be saved indefinitely.

Docker images would be built using kaniko [4] -- a tool to build container
images from a Dockerfile, inside a container or Kubernetes cluster. It allows
one to run "docker build" inside a docker container. This method is
appropriate for building Debian build tools images.

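A kaniko build is just a POD running the kaniko executor image; the manifest
sketch below shows the shape (the context URL and destination registry are
placeholders):

```python
def kaniko_build_pod(name: str, context: str, destination: str) -> dict:
    """Compose a POD manifest that builds an image with kaniko.

    Sketch only: context and destination values are placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "kaniko",
                "image": "gcr.io/kaniko-project/executor:latest",
                "args": [
                    f"--context={context}",
                    "--dockerfile=Dockerfile",
                    f"--destination={destination}",
                ],
            }],
        },
    }

pod = kaniko_build_pod("build-tools-image",
                       "git://opendev.org/starlingx/tools.git",
                       "registry.example.com/stx-builder:latest")
```

No privileged mode or host Docker socket is involved, which is exactly the
property that makes kaniko attractive inside a shared cluster.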
For the more complicated cases that need to access docker in other ways, we
would use sysbox [5] -- a tool for running system software, including Docker,
inside docker containers. This method is appropriate for building application
images, such as Openstack containers.

Alternatives
------------

Tekton
^^^^^^

We do not have to use Tekton -- we could simply run build commands directly
in k8s PODs controlled by the build scripts (Python), with Jenkins on top to
manage build schedules and artifact archiving. This would require us to
maintain a sizable chunk of the pipeline logic in Jenkins. Jenkins is hard to
install and automate, making the testing of updates to the pipelines a
challenge. Jenkins's automation API is somewhat unstable and uses an obscure
pipeline definition language. We expect a Tekton-based approach to be largely
free of these shortcomings.

On the other hand, Tekton is not as mature as Jenkins.

Docker image builds
^^^^^^^^^^^^^^^^^^^

To build docker images in k8s, instead of kaniko and sysbox we could use
docker-in-docker [6]. This method has multiple problems linked to kernel
security and I/O performance [7].

We could also mount the host's docker daemon socket inside any
containers/PODs that need to interact with docker. This would leave container
instances behind on the host and would require additional scripting to clean
them up.

Impact on build tools installations
-----------------------------------

Individual contributors will be able to continue using Minikube, as they do
now.

Installing and configuring Kubernetes itself is beyond the scope of this
document. The services & POD definitions used by the build tools shall be
reusable (as Helm charts) no matter what the surrounding infrastructure looks
like.

Open questions
--------------

Persistent storage
^^^^^^^^^^^^^^^^^^

The builds would need to persist these types of files:

* Debian packages and other files (tarballs, etc.) used as build inputs. This
  will be handled by Pulp, whose underlying storage facility is to be
  determined.
* Debian packages produced by the build. This will be handled by Pulp as
  well.
* Debian package mirror. This may be handled by Pulp as well. It is
  currently implemented as a custom script on CENGN, outside of k8s.
* Other files produced by the build (ISO files, docker image list files). We
  expect to use Pulp for this as well.
* Log files are normally stored within k8s itself, as well as in individual
  POD containers. We would probably need to export them for ease of access.
  CENGN users would expect log files as simple downloadable files, since we
  are not proposing to make any k8s GUIs available to the public at this
  point. ElasticSearch may be helpful (searchable database of logs, among
  other things), but it needs a lot of CPU & RAM.
* Docker images. Official images (i.e. build outputs) are to be published to
  an external Docker registry (Docker Hub).

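Exporting POD logs as plain downloadable files could be as simple as the
sketch below (the namespace/POD naming convention and output directory are
hypothetical; the real export job is yet to be designed):

```python
import shlex

def log_export_cmd(namespace: str, pod: str, out_dir: str) -> str:
    """Compose a `kubectl logs` command that saves a POD's log to a file.

    Sketch only: the naming convention here is hypothetical.
    """
    out_file = f"{out_dir}/{namespace}_{pod}.log"
    cmd = ["kubectl", "logs", "-n", namespace, pod]
    return f"{shlex.join(cmd)} > {shlex.quote(out_file)}"

print(log_export_cmd("stx-builds", "builder-jdoe-master",
                     "/var/www/logs"))
```

The resulting files can then be served by any plain HTTP file server, which
matches what CENGN users already expect.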
It is not clear whether the build would require a shared persistent file
system (e.g. for passing build artifacts between build steps). Such a file
system is difficult to implement, and target k8s installations may not have
one available for our use. Without a shared file system, builds will take
longer to complete due to having to download and copy many files. Contrast
this with the older CentOS build system, which relies on a shared file
system and uses symbolic links for file sharing.

If a file system can't be shared, as a workaround all of a build's PODs will
have to be scheduled to run on the same node, so that they can share
node-local storage. Downside: PODs can't be spread across different nodes.

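Pinning every POD of one build to the same node can be expressed with a
shared node selector, sketched below using the standard
``kubernetes.io/hostname`` node label (the node name is a placeholder):

```python
def pin_to_node(pod_spec: dict, node_name: str) -> dict:
    """Constrain a POD spec to one node so PODs can share local storage.

    Sketch: uses the well-known `kubernetes.io/hostname` node label.
    """
    pinned = dict(pod_spec)
    pinned["nodeSelector"] = {"kubernetes.io/hostname": node_name}
    return pinned

spec = {"containers": [{"name": "builder", "image": "stx-builder:latest"}]}
print(pin_to_node(spec, "build-node-1")["nodeSelector"])
```

This implements exactly the downside noted above: with the selector set, the
scheduler can no longer place the build's PODs on other nodes.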
An object storage service (non-shared, artifacts to be copied, no symlinks,
etc., such as MinIO [3]) may be used for artifact archiving, as well as for
passing artifacts between build stages. Downside: slow.

NFS could be used as a shared file system. Downside: slow.

Ceph could provide a shared file system as well. Downside: seems complicated.

Artifact retention & backups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On CENGN, it is not clear whether independent, isolated backup storage is
available. We could save backups on one of the 2 build servers, making sure
important files (released builds, etc.) are saved on 2 physical machines.

StarlingX vs Kubernetes
^^^^^^^^^^^^^^^^^^^^^^^

* Once StarlingX switches to Debian, the build server would have to be
  re-imaged; this will cause disruption in daily builds.
* We do not need many of the functions that StarlingX provides; k8s is
  sufficient.
* StarlingX is not optimized for running build jobs.
* If we use k8s, we should pick a stable base OS with a long shelf life, so
  that OS upgrades are needed less often while k8s can be upgraded at will.
* If we use StarlingX, we should pick the latest official release (6.0).

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None for StarlingX deployments. Kubernetes clusters used for builds have
security implications that will have to be considered.

Other end user impact
---------------------

None

Performance impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

The current workflow based on Minikube will continue to be supported.
Organizations will gain the ability to take advantage of full Kubernetes
installations for centralized builds.

Upgrade impact
--------------

None for StarlingX. Kubernetes upgrades are covered in [8].

Implementation
==============

Assignee(s)
-----------

* Davlet Panech - dpanech
* Luis Sampaio - lsampaio

Repos impacted
--------------

starlingx/tools

Work Items
----------

See storyboard.

Dependencies
============

None

Testing
=======

As the scope of this spec is restricted to the building of StarlingX, it
does not introduce any additional runtime testing requirements. As this
change is proposed to take place alongside the move to Debian, full runtime
testing is expected as part of that spec.

Building under full Kubernetes will require validation to ensure outcomes
equivalent to those of builds in a Minikube environment.

Documentation Impact
====================

StarlingX Build Guide
(https://docs.starlingx.io/developer_resources/build_guide.html) --
add instructions for full Kubernetes environments.

References
==========

[1] StarlingX: Debian Build Spec --
https://docs.starlingx.io/specs/specs/stx-6.0/approved/starlingx_2008846_debian_build.html

[2] Tekton, a CI pipeline engine for k8s --
https://tekton.dev/

[3] MinIO, an Amazon S3-compatible object storage system --
https://min.io/

[4] Kaniko, a tool to build container images from a Dockerfile, inside a
container or Kubernetes cluster --
https://github.com/GoogleContainerTools/kaniko

[5] Sysbox, a container runtime that sits below Docker --
https://github.com/nestybox/sysbox

[6] Docker in Docker --
https://hub.docker.com/_/docker/

[7] Using Docker-in-Docker for your CI or testing environment? Think twice. --
https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

[8] Kubernetes -- https://kubernetes.io/docs/home/

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - STX-7.0
     - Introduced