diff --git a/doc/source/specs/stx-6.0/approved/starlingx_2009812_debian_build_k8s.rst b/doc/source/specs/stx-6.0/approved/starlingx_2009812_debian_build_k8s.rst
new file mode 100644
index 0000000..4100a0e
--- /dev/null
+++ b/doc/source/specs/stx-6.0/approved/starlingx_2009812_debian_build_k8s.rst
@@ -0,0 +1,390 @@
+..
+  This work is licensed under a Creative Commons Attribution 3.0 Unported
+  License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+======================================
+StarlingX: Debian Builds on Kubernetes
+======================================
+
+Storyboard Story:
+https://storyboard.openstack.org/#!/story/2009812
+
+The new Debian build system [1], used in conjunction with Minikube, lacks
+support for multiple projects, branches & users within the same environment.
+We propose a Kubernetes infrastructure to remedy these shortcomings: a
+dedicated multi-node build cluster with shared services, as well as the
+necessary tooling changes.
+
+Problem Description
+===================
+
+The current implementation relies on Minikube -- a version of Kubernetes
+optimized for single-node, single-user operation -- making it difficult to
+share computing resources between multiple projects, branches, and users
+within the same environment, particularly on a dedicated "daily" build
+server. The Debian package repository service cannot be shared, which
+results in excessive download times and disk usage.
+
+There is no explicit support for CI environments, requiring additional
+scripting in Jenkins or similar tools. Jenkins's approach to k8s integration
+is not compatible with the current tooling, as it requires the top-level
+scripts to be written in the "pipeline" domain-specific language. The best
+we can do in Jenkins is call the StarlingX build scripts, bypassing
+Jenkins's POD & node scheduling & management mechanisms.
+
+Use Cases
+---------
+
+This change would support infrastructure configurations covering the common
+use cases described below.
+
+Isolated single-user builds
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An individual contributor wants to build individual packages, the
+installation ISO, or Docker images in an isolated, autonomous environment.
+This use case is already supported by the build tool using Minikube; any
+further changes must remain compatible with this type of environment.
+
+Daily build server
+^^^^^^^^^^^^^^^^^^
+
+An organization wishes to maintain a server cluster for building multiple
+projects or branches daily or on demand (this is the case with the current
+official StarlingX build system). The tooling must support:
+
+* Kubernetes clusters. Motivation: some organizations already have
+  Kubernetes clusters.
+* StarlingX clusters. Motivation: "eat our own dog food".
+* Multiple worker nodes. Motivation: allow for expanding the computing
+  resources available to the build system.
+* Ideally, clusters without a shared file system. Motivation: shared
+  redundant file systems are slow and difficult to implement, and may not
+  be available in the target environment.
+
+Build server open to individuals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a variation of the above, but with the option for individual
+contributors to generate private builds based on their patches before
+pushing them to source control. Motivation: this allows users to benefit
+from the more powerful, centralized build server.
+
+This use case is not addressed by the current spec. We believe the proposed
+changes are sufficient to add this functionality in the future.
+
+Proposed changes
+================
+
+We propose a build system that can run in any environment based on
+Kubernetes, and a matching installation to drive daily builds on CENGN.
+
+Change the build scripts to support vanilla multi-user k8s environments.
+This includes making sure POD and directory names do not clash between
+users or between multiple projects/branches. Motivation: allow multiple
+users & projects in the same environment.
+
+Update the Helm charts to factor out the parts common to Minikube and other
+k8s environments.
+
+Update the ``stx`` tool, as it may be of limited or no use in full k8s
+environments.
+
+Replace Aptly with Pulp (package repository service). Motivation: Pulp
+supports file types other than Debian packages, such as source archives
+used as build inputs.
+
+Update the package repository service container so that it can be shared
+among multiple builds. Motivation: avoid unnecessary duplication of package
+files that can be shared among different users on the same system.
+
+Update other build containers to allow transient use (single command
+execution). Motivation: efficient memory/CPU sharing among multiple builds.
+::
+
+   xxxxxx Kubernetes or Minikube xxxxxxxxxxxxxxxxxxxxxxxxx
+   x                                 ┌──────────────┐    x
+   x                                 │ User builder │    x
+   x  ┌──────────────┐            ┌──┤ PODs         │    x  User 1
+   x  │ Pulp         │◄─────┐     │  └──────────────┘    x
+   x  └──────────────┘      │     │                      x
+   x                        │     │  ┌──────────────┐    x
+   x  ┌──────────────┐      │     │  │ User builder │    x
+   x  │ Other repos  │◄─────┼─────┼──┤ PODs         │    x  User 2
+   x  └──────────────┘      │     │  └──────────────┘    x
+   x                        │     │                      x
+   x  ┌──────────────┐      │     │  ┌──────────────┐    x
+   x  │ Docker reg   │◄─────┘     │  │ User builder │    x
+   x  └──────────────┘            └──┤ PODs         │    x  User 2
+   x                                 └──────────────┘    x
+   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+
+Additional repository services may be deployed in the cluster to support
+specific types of data. Whether the build will require additional
+repository services remains to be seen.
+
+A Docker registry may be deployed for managing intermediate containers used
+by the build. Some environments may have a Docker registry available
+outside of the cluster, so this is optional. In particular, CENGN already
+has such a service (Docker Hub) available.
+
+We propose installing Kubernetes on a single server to drive daily builds
+(CENGN). Kubernetes will be configured to allow the addition of more nodes.
+Jenkins will be installed to trigger Tekton [2] builds and for reporting.
+
+Tekton is a CI pipeline engine designed specifically for Kubernetes. It is
+command-line driven and may be used by the build tools directly to schedule
+build jobs within the k8s cluster. Whether such direct usage is feasible or
+useful is unclear at this point.
+
+Outputs of released or otherwise important builds would need to be saved
+indefinitely and backed up in case of hardware failures. On CENGN, the
+availability of backup storage is to be determined.
+
+Outputs of old non-released builds would be deleted regularly (builds older
+than 2 weeks or similar). This includes all artifacts (log files, deb
+files, ISOs).
+
+Mirrors of 3rd-party files (tars, deb files) would be saved indefinitely.
+
+Docker images would be built using kaniko [4] -- a tool that builds
+container images from a Dockerfile, inside a container or Kubernetes
+cluster. It performs the equivalent of ``docker build`` inside an ordinary
+container, without requiring a Docker daemon. This method is appropriate
+for building Debian build tools images.
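+
+As an illustration, the build scripts could schedule a transient kaniko POD
+roughly as follows. This is a minimal sketch using the ``kubernetes``
+Python client; the namespace, build context URL, destination registry, and
+POD naming convention are placeholder assumptions, and a real build would
+also mount registry credentials::
+
+   from kubernetes import client, config
+
+   def run_kaniko_build(namespace: str, user: str, project: str) -> None:
+       """Run a one-shot kaniko POD that builds and pushes an image."""
+       config.load_kube_config()  # load_incluster_config() in-cluster
+
+       pod = client.V1Pod(
+           # Embed user and project in the POD name so that concurrent
+           # builds do not clash (hypothetical naming convention).
+           metadata=client.V1ObjectMeta(name=f"kaniko-{user}-{project}"),
+           spec=client.V1PodSpec(
+               restart_policy="Never",  # transient: run once, then exit
+               containers=[
+                   client.V1Container(
+                       name="kaniko",
+                       image="gcr.io/kaniko-project/executor:latest",
+                       args=[
+                           "--dockerfile=Dockerfile",
+                           # Placeholder build context; a real build would
+                           # point at the sources for this user/project.
+                           "--context=git://example.com/stx-tools.git",
+                           f"--destination=registry.local/{project}:latest",
+                       ],
+                   )
+               ],
+           ),
+       )
+       client.CoreV1Api().create_namespaced_pod(namespace=namespace,
+                                                body=pod)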
+
+For the more complicated cases that need to access Docker in other ways, we
+would use sysbox [5] -- a tool for running system software, including
+Docker, inside Docker containers. This method is appropriate for building
+application images, such as OpenStack containers.
+
+Alternatives
+------------
+
+Tekton
+^^^^^^
+
+We do not have to use Tekton -- we could simply run build commands directly
+in k8s PODs controlled by the build scripts (Python), with Jenkins on top
+to manage build schedules and artifact archiving. This would require us to
+maintain a sizable chunk of the pipeline logic in Jenkins. Jenkins is hard
+to install and automate, making the testing of updates to the pipelines a
+challenge. Jenkins's automation API is somewhat unstable and uses an
+obscure pipeline definition language. We expect a Tekton-based approach to
+be largely free of these shortcomings.
+
+On the other hand, Tekton is not as mature as Jenkins.
+
+Docker image builds
+^^^^^^^^^^^^^^^^^^^
+
+To build Docker images in k8s, instead of kaniko & sysbox we could use
+docker-in-docker [6]. This method has multiple problems linked to kernel
+security and I/O performance [7].
+
+We could also mount the host's Docker daemon socket inside any
+containers/PODs that need to interact with Docker. This would leave
+container instances behind on the host and would require additional
+scripting to clean them up.
+
+Impact on build tools installations
+-----------------------------------
+
+Individual contributors will be able to continue using Minikube, as they do
+now.
+
+Installing and configuring Kubernetes itself is beyond the scope of this
+document. The services & POD definitions used by the build tools shall be
+reusable (as Helm charts) no matter what the surrounding infrastructure
+looks like.
+
+Open questions
+--------------
+
+Persistent storage
+^^^^^^^^^^^^^^^^^^
+
+The builds would need to persist these types of files:
+
+* Debian packages and other files (tarballs, etc.) used as build inputs.
+  This will be handled by Pulp, whose underlying storage facility is to be
+  determined.
+* Debian packages produced by the build. This will be handled by Pulp as
+  well.
+* Debian package mirror. This may be handled by Pulp as well. It is
+  currently implemented as a custom script on CENGN, outside of k8s.
+* Other files produced by the build (ISO files, Docker image list files).
+  We expect to use Pulp for this as well.
+* Log files are normally stored within k8s itself, as well as in individual
+  POD containers. We would probably need to export them for ease of access.
+  CENGN users would expect log files as simple downloadable files, since we
+  are not proposing to make any k8s GUIs available to the public at this
+  point. ElasticSearch may be helpful (searchable database of logs, among
+  other things), but it needs a lot of CPU & RAM.
+* Docker images. Official images (i.e. build outputs) are to be published
+  to an external Docker registry (Docker Hub).
+
+It is not clear whether the build would require a shared persistent file
+system (e.g. for passing build artifacts between build steps). Such a file
+system is difficult to implement, and target k8s installations may not have
+one available for our use. Without a shared file system, builds will take
+longer to complete due to having to download and copy many files. Contrast
+this with the older CentOS build system, which relies on a shared file
+system and uses symbolic links for file sharing.
+
+If a file system can't be shared, the following workarounds are possible:
+
+* Schedule all of a build's PODs on the same node. Downside: PODs can't be
+  spread across different nodes.
+* Use an object storage service such as MinIO [3] (non-shared; artifacts
+  must be copied, no symlinks, etc.) for artifact archiving, as well as for
+  passing artifacts between build stages; see the sketch after this list.
+  Downside: slow.
+* Use NFS as a shared file system. Downside: slow.
+* Use Ceph. Downside: seems complicated.
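+
+As an illustration, passing an artifact between two build stages through an
+object store could look roughly like this. This is a minimal sketch using
+the ``minio`` Python client; the endpoint, credentials, bucket name, and
+object layout are placeholder assumptions -- a real deployment would read
+its credentials from a k8s secret::
+
+   from minio import Minio
+
+   # Placeholder in-cluster service endpoint and credentials.
+   store = Minio("minio.build-infra.svc:9000",
+                 access_key="builder", secret_key="not-a-real-secret",
+                 secure=False)
+
+   BUCKET = "build-artifacts"  # hypothetical bucket shared by all stages
+   if not store.bucket_exists(BUCKET):
+       store.make_bucket(BUCKET)
+
+   # Stage 1 (e.g. the package build) uploads its output ...
+   store.fput_object(BUCKET, "build-123/packages.tar", "packages.tar")
+
+   # ... and stage 2 (e.g. the ISO build), possibly running on a
+   # different node, downloads it before starting.
+   store.fget_object(BUCKET, "build-123/packages.tar", "packages.tar")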
+
+Artifact retention & backups
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+On CENGN it is not clear whether independent, isolated backup storage is
+available. We could save backups on one of the two build servers, making
+sure important files (released builds, etc.) are stored on two physical
+machines.
+
+StarlingX vs Kubernetes
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* Once StarlingX switches to Debian, the build server would have to be
+  re-imaged; this would disrupt daily builds.
+* We do not need many of the functions that StarlingX provides; k8s is
+  sufficient.
+* StarlingX is not optimized for running build jobs.
+* If we use k8s, we should pick a stable base OS with a long support life,
+  so that OS upgrades can be deferred while k8s itself is upgraded at will.
+* If we use StarlingX, we should pick the latest official release (6.0).
+
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None for StarlingX deployments. Kubernetes clusters used for builds have
+security implications that will have to be considered.
+
+
+Other end user impact
+---------------------
+
+None
+
+Performance impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+The current Minikube-based workflow will continue to be supported.
+Organizations will gain the ability to take advantage of full Kubernetes
+installations for centralized builds.
+
+Upgrade impact
+--------------
+
+None for StarlingX. Kubernetes upgrades are covered in [8].
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+* Davlet Panech - dpanech
+* Luis Sampaio - lsampaio
+
+
+Repos impacted
+--------------
+
+starlingx/tools
+
+Work Items
+----------
+
+See storyboard.
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+As the scope of this spec is restricted to the building of StarlingX, it
+does not introduce any additional runtime testing requirements. As this
+change is proposed to take place alongside the move to Debian, full runtime
+testing is expected in connection with that spec.
+
+Building under full Kubernetes will require validation to ensure outcomes
+equivalent to those of builds in a Minikube environment.
+
+Documentation Impact
+====================
+
+StarlingX Build Guide
+https://docs.starlingx.io/developer_resources/build_guide.html --
+add instructions for full Kubernetes environments.
+
+
+References
+==========
+
+[1] StarlingX: Debian Build Spec --
+https://docs.starlingx.io/specs/specs/stx-6.0/approved/starlingx_2008846_debian_build.html
+
+[2] Tekton, a CI pipeline engine for k8s --
+https://tekton.dev/
+
+[3] MinIO, an Amazon S3-compatible object storage system --
+https://min.io/
+
+[4] Kaniko, a tool to build container images from a Dockerfile, inside a
+container or Kubernetes cluster --
+https://github.com/GoogleContainerTools/kaniko
+
+[5] Sysbox, a container runtime that sits below Docker --
+https://github.com/nestybox/sysbox
+
+[6] Docker in Docker --
+https://hub.docker.com/_/docker/
+
+[7] Using Docker-in-Docker for your CI or testing environment? Think
+twice. --
+https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
+
+[8] Kubernetes -- https://kubernetes.io/docs/home/
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - STX-7.0
+     - Introduced