Commit Graph

7 Commits

Author SHA1 Message Date
Scott Little 6f3d1e5dd2 cengn reference removal - centos
mirror.starlingx.cengn.ca no longer exists. CENGN is kindly forwarding
requests to the new location mirror.starlingx.windriver.com for now, but
that will only last a few months. We need to replace all the references
with the new URL.

I will also remove as many 'cengn' references as possible, replacing
them with 'stx_mirror'.
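The host rename is a plain text substitution. A minimal sketch, assuming GNU sed (for `-i`) and that only the host name changes while the mirror paths stay the same; `replace_cengn_host` is a hypothetical name, not a function from the actual change:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the actual change): rewrite references to the
# retired CENGN mirror host to the new Wind River host, in place.
# Assumes GNU sed (for -i) and that the path part of each URL is unchanged.
replace_cengn_host() {
    local f
    for f in "$@"; do
        # Dots are escaped so they match literally rather than any character.
        sed -i 's#mirror\.starlingx\.cengn\.ca#mirror.starlingx.windriver.com#g' "$f" || return 1
    done
}
```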

Partial-Bug: 2033555
Signed-off-by: Scott Little <scott.little@windriver.com>
Change-Id: I09e3f564edef2049786c965a86dbcaacac359801
2023-11-07 11:23:12 -05:00
Davlet Panech ac49ff342c use curl + avoid partial downloads
Mirror scripts sometimes leave corrupted/partial files behind.

Problems
========

1) wget is called with the -O flag and the server returns an HTTP
error (404, etc.) for the requested URL. Wget leaves a zero-length file
behind. This doesn't seem to happen without the -O flag.

2) wget starts the download, which stalls and times out half-way; wget
gives up and requests the same file with a byte offset of the form
"Range: bytes=1234-", but the web server doesn't support open-ended
ranges. In this case wget prints a warning, leaves a partial file
behind and returns success.

3) Sites like GitHub generate repo tarballs on the fly, e.g.:
https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.19.3.tar.gz
Since tags can move, downloading such a file twice may result in a
different file. Therefore HTTP "resume download" may corrupt files in
this case.

4) The Git "keyword expansion" feature may cause the same source file
to differ between downloads. For example, this file:

  https://github.com/kubernetes/kubernetes/blob/v1.19.3/staging/src/k8s.io/component-base/version/base.go

contains lines similar to:

  gitVersion  = "v0.0.0-master+$Format:%h$"

where %h is replaced with an abbreviated commit SHA when the tar file
is exported/downloaded. The length of the abbreviated SHA depends on
the git history, so it can differ from one export to the next, and
downloading that file twice may yield different contents.

Therefore resuming via the HTTP "Range" header may corrupt files in
this case as well.

5) Curl is invoked with the "--retry" option and starts the download;
the connection stalls; curl gives up, reconnects, skips the first N
bytes and appends the rest to the partial file. If the file changes on
the server while this happens, the result is a corrupted file. This is
very unlikely to happen and I haven't been able to reproduce this case.

Problems with HTTP Range header
===============================
The curl/wget "resume/continue download" feature has no way of verifying
whether the partial file on disk and the one being re-requested are in
fact the same file. If the file changes on the server between
downloads, "resume download" will corrupt it.

Some web servers don't support this at all, which triggers case (2)
with wget.

Some web servers support the Range header, but require the end byte
position to be present. This is not compatible with wget and curl,
which send open-ended headers such as "Range: bytes=1234-", meaning
"give me the file from offset 1234 to EOF". This also triggers
case (2).

This patch
==========

* Always download the file to a temporary name, then rename into place

* Use curl instead of wget (better error handling). The only exception
is "recursive downloads", which curl doesn't support.
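Both points can be combined in one helper. This is a sketch under stated assumptions, not the code from the patch: `download_file` is a hypothetical name, and the temporary name `<dest>.tmp.<pid>` is chosen so that the final `mv` is a rename within a single directory. Resume (`-C -`) and `--retry` are deliberately not used, so curl never sends an open-ended Range request.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating "download to a temporary name, then rename
# into place". Not the actual function from the patch.
download_file() {
    local url="$1" dest="$2"
    local tmp="${dest}.tmp.$$"
    # --fail:     HTTP errors (404 etc.) make curl exit non-zero instead of
    #             saving the error page (problem 1).
    # --location: follow redirects (e.g. GitHub archive links).
    # No "-C -" and no "--retry": never resume, so bytes from two different
    # versions of a remote file are never stitched together (problems 2-5).
    if curl --fail --silent --show-error --location \
            --output "$tmp" "$url"; then
        # Same-directory mv is a rename: readers see either no file or the
        # complete file, never a partial one.
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A failed transfer leaves nothing behind, so the next run of the mirror script simply downloads the file again from scratch.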

Bug: https://bugs.launchpad.net/starlingx/+bug/1950017
Change-Id: Iaa89009ce23efe5b73ecb8163556ce6db932028b
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2021-11-10 14:25:47 -05:00
Scott Little 9507d97d2a Parallel downloads
download_mirror.sh takes 15 hours to download all the rpms and
tarballs required to build StarlingX into a fresh workspace.
It should be much faster than that.

Replace the current serial download algorithm with a parallel one.
I'll cap it at 8 parallel downloads for now, since I'm a little
worried about overwhelming CENGN. This is sufficient to drop download
times from 15 to 3 hours for a fresh workspace, and from 30 to 5
minutes when refreshing an existing workspace.
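The commit does not include the new algorithm itself. One way to cap concurrency in a shell script is a sliding window of background jobs; the sketch below is a hypothetical helper, `run_parallel`, and the per-item command it runs is whatever function the caller supplies (e.g. `run_parallel 8 fetch_one "${urls[@]}"`, where `fetch_one` is also hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$cmd <item>" for every item, with at most $max
# jobs in flight. Returns 0 only if every job succeeded.
run_parallel() {
    local max="$1" cmd="$2"
    shift 2
    [ "$max" -ge 1 ] || return 2
    local pids="" running=0 failed=0 item oldest pid
    for item in "$@"; do
        if [ "$running" -ge "$max" ]; then
            # Window is full: wait for the oldest job before starting another.
            oldest=${pids%% *}
            pids=${pids#* }
            wait "$oldest" || failed=$((failed + 1))
            running=$((running - 1))
        fi
        "$cmd" "$item" &
        pids="$pids$! "
        running=$((running + 1))
    done
    # Reap the jobs that are still outstanding.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}
```

Waiting on the oldest job first keeps the sketch independent of `wait -n`; with bash 4.3 or later, `wait -n` could start the next download as soon as any job finishes, which helps when file sizes vary widely.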

Closes-Bug: 1918477
Signed-off-by: Scott Little <scott.little@windriver.com>
Change-Id: I469b4fee3cb304fe2984aa697ce2dc6cec52e79e
2021-03-24 19:36:12 -04:00
Scott Little d51f8050cd Build layering, script and lst update
Script changes to download content by layer.
Valid options are 'all', 'compiler', 'distro', 'flock'.

The current .lst and yum files under the centos-mirror-tools directory
are relocated. Generic package dependencies are relocated to
centos-mirror-tools/config/<os>/<layer>/.
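The valid layer names and the config/<os>/<layer>/ layout described above can be captured in a small lookup. A hypothetical sketch; the name `layer_config_dir` and the example `os` value `centos` are assumptions, not taken from the scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a layer name and print the matching config
# directory, following the config/<os>/<layer>/ layout.
layer_config_dir() {
    local config_root="$1" os="$2" layer="$3"
    case "$layer" in
        all|compiler|distro|flock) ;;
        *)
            echo "invalid layer '$layer' (expected all, compiler, distro or flock)" >&2
            return 1
            ;;
    esac
    printf '%s/%s/%s\n' "${config_root%/}" "$os" "$layer"
}
```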

Prior updates have already relocated the .lst entries for compilable
content (i.e. entries listing tarballs or srpms to be compiled) to the
git repos that compile them.

The original .lst files are deleted to make it easier
to identify new content during development.

Layer 'all' builds all layers in a single workspace.  The
lst files are identical to current content, minus the src.rpm and
tarball entries.

Other layers download only a subset of packages: the minimum
required to build the layer. The 'flock' layer will have additional
content to satisfy its run-time requirements as well as its build-time
requirements.

An upper layer does not need to list rpms known to be provided by
a lower layer.  Instead the config file 'required_layer_pkgs.cfg'
lists urls for lst files for lower layer build outputs.
These build outputs are generated and published by cengn for
each layer.

A second layer config file, 'required_layer_iso_inc.cfg', lists
image.inc files for lower layer builds. These build outputs are
generated and published by cengn for each layer, summarizing
the image.inc files found in individual git repos.

Image.inc files inform the build-iso process, listing rpms that
provide services and commands that need to be included in
the iso. Rpms that are only required transitively need not be listed.

Finally, the layer config should include a yum.repos.d
directory in which supplementary yum repos are defined to
pick up CENGN-built content from lower layers.

To allow a designer to do cross-layer building using local sources
rather than those provided by CENGN, there are several options.

The designer can modify the urls for lower layer build outputs,
as found in the .cfg and .repo files within the config directory
'stx-tools/centos-mirror-tools/config/<distro>/<layer-to-build>',
directly within the git repo. Substitute urls can use the file:///
syntax. Just be sure to remove these changes before submitting.

Alternatively, new arguments that override the normal config have
been added to download_mirror.sh, generate-cgcs-centos-repo.sh and
populate_downloads.sh.

The easiest option to use is a command argument that substitutes a new
config directory, replacing stx-tools/centos-mirror-tools/config.
The intent is for the designer to recursively copy that directory
to a side location, make their changes there, outside of git, and
provide the path to that directory as an extra argument to
download_mirror.sh, generate-cgcs-centos-repo.sh and populate_downloads.sh.

e.g. For simplicity I'll only list the 'extra' arguments:

download_mirror.sh -C <my-config-dir> \
                   -l <layer> \
                   ...

generate-cgcs-centos-repo.sh --config-dir=<my-config-dir> \
                             --layer=<layer> \
                             ...

populate_downloads.sh --config-dir=<my-config-dir> \
                      --layer=<layer> \
                      ...

These arguments can also be supplied via the environment.
For the purpose of containerized builds, these arguments
should be defined in your localrc.

e.g.
export STX_CONFIG_DIR=<my-config-dir>
export LAYER=<layer>
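How the environment and the command line might combine can be sketched as follows. The precedence (command line over environment), the fallbacks `./config` and `all`, and the name `parse_layer_args` are all assumptions, not taken from download_mirror.sh; only the `-C`/`-l` flags and the `STX_CONFIG_DIR`/`LAYER` variables come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual option handling of download_mirror.sh:
# STX_CONFIG_DIR and LAYER supply defaults, and -C / -l on the command line
# override them. The fallbacks "./config" and "all" are assumptions.
parse_layer_args() {
    CONFIG_DIR="${STX_CONFIG_DIR:-./config}"
    BUILD_LAYER="${LAYER:-all}"
    # A local OPTIND lets the function be called more than once.
    local opt OPTIND=1
    while getopts "C:l:" opt; do
        case "$opt" in
            C) CONFIG_DIR="$OPTARG" ;;
            l) BUILD_LAYER="$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```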

The final alternative is to override things at a more granular level,
replacing a single lst file or image.inc file. Here you can replace
a single line found in a required_layer_pkgs.cfg or
required_layer_iso_inc.cfg file.

e.g. Suppose we are doing a flock build and want to modify the content
picked up from the distro layer's rt build, and that content delivers a
service we want in the iso. For simplicity I'll only list the 'extra'
arguments:

./download_mirror.sh -l flock \
    -L distro,rt,file:///<my-distro-workspace>/rt/rpmbuild/RPMS/rpm.lst \
    -I distro,std,file:///<my-distro-workspace>/rt/image.inc \
    ...

generate-cgcs-centos-repo.sh --layer=flock \
    --layer-pkg-url=distro,rt,file:///<my-distro-workspace>/rt/rpmbuild/RPMS/rpm.lst \
    --layer-inc-url=distro,std,file:///<my-distro-workspace>/rt/image.inc \
    ...

NOTE: The triplet syntax for a package list url is
    <lower-layer>,<build-type>,<url-to-rpm.lst>

    lower-layer: 'compiler', 'distro'
    build-type: 'std', 'rt', 'installer'

    Also if 'file:///' syntax is used, a matching change is made to
    the yum *.repo file.  This assumes that the rpm.lst is co-resident with
    repodata directory, as is the norm for our build outputs.

NOTE: The triplet syntax for an image.inc url is
    <lower-layer>,<include-type>,<url-to-image.inc>

    lower-layer: 'compiler', 'distro'
    include-type: 'std', 'dev'
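The two triplet formats in the NOTEs above can be split and validated with a few lines of bash. A hypothetical sketch, not the parsing code in download_mirror.sh or generate-cgcs-centos-repo.sh; the names `parse_layer_triplet` and `pkg_url_to_baseurl`, and the `pkg`/`inc` kind argument, are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the actual parsing code in the scripts.

# Split "<lower-layer>,<type>,<url>" into TRIPLET_LAYER, TRIPLET_TYPE and
# TRIPLET_URL. kind is "pkg" (package list url) or "inc" (image.inc url).
parse_layer_triplet() {
    local kind="$1"
    # With IFS=",", read puts everything after the second comma into the last
    # variable, so a URL that itself contains commas is kept intact.
    IFS=, read -r TRIPLET_LAYER TRIPLET_TYPE TRIPLET_URL <<< "$2"
    case "$TRIPLET_LAYER" in compiler|distro) ;; *) return 1 ;; esac
    case "$kind:$TRIPLET_TYPE" in
        pkg:std|pkg:rt|pkg:installer|inc:std|inc:dev) ;;
        *) return 1 ;;
    esac
    [ -n "$TRIPLET_URL" ]
}

# Directory containing rpm.lst, usable as a yum baseurl on the assumption
# stated above that rpm.lst sits next to the repodata directory.
pkg_url_to_baseurl() {
    printf '%s\n' "${1%/*}"
}
```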

A typical user is likely only working in the flock layer on the master
branch, and can simply use the compiler and distro layer outputs
published by CENGN.

The workflow looks like this:

1, sync code for flock layer
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m flock.xml
$ repo sync

2, download rpms for flock layer, and populate a local mirror
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -l flock
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/

3, Prepare a virtual repo and downloads directory for building
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --layer=flock $LOCAL_MIRROR
$ ./populate_downloads.sh --layer=flock $LOCAL_MIRROR

4, rpm package and iso building
$ build-pkgs && build-iso && build-helm-charts.sh

Building all layers in a single workspace is still supported, and
looks identical to the previous workflow.

1, sync code
$ repo init -u https://opendev.org/starlingx/manifest.git -b master
$ repo sync

2, download rpms for all layers, and populate a local mirror
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/

3, create repo named "StxCentos7Distro" for building
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh $LOCAL_MIRROR
$ ./populate_downloads.sh $LOCAL_MIRROR

4, rpm package and iso building
$ build-pkgs && build-iso && build-helm-charts.sh

Only a cross-layer developer needs to set up two or three copies of the
build environment, one per layer. We suggest you use separate shells
for each layer, as the various paths (MY_REPO, MY_WORKSPACE ...) need to
be unique.

Shell 1, compiler layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-compiler
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...

$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m compiler.xml
$ repo sync
$ cd stx-tools/centos-mirror-tools
$ mkdir -p $LOCAL_CONFIG
$ cp -r config/* $LOCAL_CONFIG
 ... edit urls in *.cfg and *.repo files under $LOCAL_CONFIG ...

$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l compiler
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=compiler $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=compiler $LOCAL_MIRROR
$ build-pkgs

Shell 2, distro layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-distro
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...
$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m distro.xml
$ repo sync
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l distro
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=distro $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=distro $LOCAL_MIRROR
$ build-pkgs

Shell 3, flock layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-flock
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...
$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m flock.xml
$ repo sync
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l flock
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ build-pkgs && build-iso && build-helm-charts.sh

Story: 2006166
Task: 37103

Depends-On: https://review.opendev.org/698756
Depends-On: https://review.opendev.org/700819
Depends-On: https://review.opendev.org/700821
Change-Id: I088020b81f08656e50aa29b5584bbc1dd1378f12
Signed-off-by: Scott Little <scott.little@windriver.com>
2020-02-10 10:45:40 -05:00
Scott Little 06171f9f81 download_mirrors.sh: cengn urls are not validated before substitution
Problem
=======
download_mirror.sh by default substitutes urls with their cengn
equivalent. If the user is trying to introduce a new repo, a CENGN
mirror of the repo will not yet exist. The substituted url is still
added to the yum configuration despite being invalid, so all subsequent
yumdownloader attempts fail with a 404 on the repodata of the
non-existent cengn repo.

Solution
========
Only substitute yum repo urls with the cengn equivalent if
cengn actually has the repo.
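The check can be sketched as a probe before substitution. The commit does not say how "cengn actually has the repo" is determined; probing the mirror's `repodata/repomd.xml` (the metadata index that createrepo publishes for every yum repo) is an assumption, as is the name `pick_repo_baseurl`:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: use the mirror URL for a yum repo only when the mirror
# actually serves repodata for it, otherwise keep the upstream URL.
pick_repo_baseurl() {
    local upstream="$1" mirror="$2"
    # --fail turns a 404 into a non-zero exit status; the body is discarded.
    if curl --fail --silent --location --output /dev/null \
            "${mirror%/}/repodata/repomd.xml"; then
        printf '%s\n' "$mirror"
    else
        printf '%s\n' "$upstream"
    fi
}
```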

Closes-bug: 1824877
Change-Id: Ifa262212d67e096cc29131e5738aec0365ed9893
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-04-17 18:42:00 +00:00
Saul Wold adfce3ef13 make_stx_mirror_yum_conf: Fix sed statement
The sed expression that appends to repodir used a \1 back-reference on
the RHS without the matching \( \) group on the LHS, which gave the
following error:

sed: -e expression #1, char 56: invalid reference \1 on `s' command's RHS
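The actual expression in make_stx_mirror_yum_conf is not shown in this commit, so the following only reproduces the class of bug: in a basic regular expression, `\1` on the right-hand side needs a `\(...\)` group on the left. The function name `append_repodir`, the `reposdir=` key and the space separator are assumptions for illustration; a directory containing `#`, `&` or `\` would need escaping first.

```bash
#!/usr/bin/env bash
# Illustrative only, not the real make_stx_mirror_yum_conf code.
# Broken form (no group on the LHS), rejected by sed:
#   sed "s#^reposdir=.*#\1 $repodir#"
# Fixed form: capture the whole line in \( \) and re-emit it via \1.
append_repodir() {
    local repodir="$1"
    sed "s#^\(reposdir=.*\)#\1 $repodir#"
}
```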

Change-Id: Ib7f21a45a9debf42ae8d91be3e8288c7d0b97463
Signed-off-by: Saul Wold <sgw@linux.intel.com>
2019-03-14 20:55:27 -07:00
Scott Little ae68691bf0 Enable starlingx mirror
StarlingX needs to download a variety of rpms and tarballs
from various upstream sources. Unfortunately the upstream sources
are not always dependable: servers go down, become unreachable,
or drop older content that we still depend on.

Our proposed solution is to run our own mirror to capture an
independent copy of the content needed by StarlingX.
For this purpose, a server has been set up at
http://mirror.starlingx.cengn.ca/mirror/centos

The mirror will use deterministic paths derived from the upstream
urls.  Scripts will be used to convert an upstream url to
the mirror's equivalent url (see function url_to_stx_mirror_url in
url_utils.sh)
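To illustrate what "deterministic paths derived from the upstream urls" can mean, here is a hypothetical mapping. It is not the rule implemented by url_to_stx_mirror_url in url_utils.sh, which may differ; it simply drops the scheme and places host and path under a mirror base, so the same upstream URL always maps to the same mirror URL. The function name and the base URL are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical mapping for illustration only; the real rule lives in
# url_to_stx_mirror_url (url_utils.sh) and may differ.
example_url_to_mirror_url() {
    local base="$1" url="$2"
    # Drop the scheme ("http://", "https://", ...) and keep host + path.
    local rest="${url#*://}"
    printf '%s/%s\n' "${base%/}" "$rest"
}
```

One consequence of dropping the scheme is that http:// and https:// variants of a URL share one mirror path, which is usually what a mirror wants.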

The mirror will be refreshed daily.  New .lst entries will be
processed at that time.  Processing of changes under yum.repos.d
is not automated by this update.  Expect a follow-up update to
address this issue soon.  These scripts are found under the
'stx_mirror_scripts' subdirectory.

Changes are made to the download_mirror.sh script and its
supporting scripts. New arguments have been added to each
script to select the download source:
   -s  StarlingX mirror only
   -S  StarlingX mirror, with upstream source as backup
   -u  Upstream source only
   -U  Upstream source, with StarlingX mirror as backup
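The four flags differ only in which sources are tried and in what order. A hypothetical sketch, not code from download_mirror.sh; `candidate_urls` is an assumed name, and the mirror URL is passed in already converted:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the URLs to try in order for a
# given download-source flag.
candidate_urls() {
    local mode="$1" upstream="$2" mirror="$3"
    case "$mode" in
        -s) printf '%s\n' "$mirror" ;;
        -S) printf '%s\n' "$mirror" "$upstream" ;;
        -u) printf '%s\n' "$upstream" ;;
        -U) printf '%s\n' "$upstream" "$mirror" ;;
        *) echo "unknown source mode: $mode" >&2; return 1 ;;
    esac
}
```

A caller would try each printed URL in turn and stop at the first successful download.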

You do not need to provide any of these flags. Continue to
use download_mirror.sh as you always have. The default
behavior is currently set to '-S', i.e. first try the
StarlingX mirror, with upstream source as backup.
If this proves to place too heavy a load on the existing
server, we might switch the default to '-U', i.e. first
try the upstream source, with StarlingX mirror as backup.
If poor download performance is seen, you might want to try
explicitly adding -U as an argument.

The remaining two options are not recommended for regular use.
Upstream only, i.e. '-u', restores the original behaviour, but
you may once again encounter rpms that have aged out and
been removed from their original repos. StarlingX only,
i.e. '-s', is vulnerable if a .lst file has been updated
but the mirror has not yet processed it.

Change-Id: I7e0f3d9fb99253662f9f4bf12457d39250408c0b
Story: 2003906
Task: 26785
Signed-off-by: Scott Little <scott.little@windriver.com>
2018-11-02 13:33:00 -04:00