Commit Graph

7 Commits

Author SHA1 Message Date
Davlet Panech ac49ff342c use curl + avoid partial downloads
Mirror scripts sometimes leave corrupted/partial files behind.

Problems
========

1) wget is called with the -O flag, and the server returns an HTTP
error for the requested URL (404 etc). Wget leaves a zero-length file
behind. This doesn't seem to happen without the -O flag.

2) wget starts the download which stalls & times out half-way; wget
gives up and requests the same file with a byte offset of the form
"Range: bytes=1234-", and the web server doesn't support open-ended
ranges. In this case wget prints out a warning, leaves a partial file
behind and returns success.

3) Sites like GitHub generate repo tarballs on the fly, eg:
https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.19.3.tar.gz
Since tags can move, downloading such a file twice may result in a
different file. Therefore HTTP "resume download" may corrupt files in
this case.

4) Git "keyword expansion" feature may result in differences in source
files being downloaded. For example, this file:

  https://github.com/kubernetes/kubernetes/blob/v1.19.3/staging/src/k8s.io/component-base/version/base.go

contains lines similar to:

  gitVersion  = "v0.0.0-master+$Format:%h$"

where %h is replaced with a short SHA when the tar file is
exported/downloaded.  How short the SHA is depends on git history and
sometimes results in shortened SHAs of different lengths. So
downloading that file may result in different files.

Therefore HTTP "Range" header may corrupt files in this case as
well.

5) Curl is invoked with the "--retry" option and starts the download;
connection stalls; curl gives up, connects again, skips the 1st N
bytes and appends to the partial file. If the file changes while we
are doing this, it will end up corrupting the file. This is very
unlikely to happen and I haven't been able to reproduce this case.

Problems with HTTP Range header
===============================
Curl/wget "resume/continue download" feature has no way of verifying
whether the partial file on disk, and the one being re-requested, are in
fact the same file.  If the file changes on the server between
downloads, "resume download" will corrupt it.

Some web servers don't support this at all, which triggers case (2)
with wget.

Some web servers support the Range header, but require that the end
byte position is present. This is not compatible with wget & curl.
For example curl & wget add headers similar to: "Range: bytes=1234-"
means give me the file starting at offset 1234 and till EOF. This also
triggers case (2).

This patch
==========

* Always download the file to a temporary name, then rename into place

* Use curl instead wget (better error handling). The only exception is
"recursive downloads", which curl doesn't support.

Bug: https://bugs.launchpad.net/starlingx/+bug/1950017
Change-Id: Iaa89009ce23efe5b73ecb8163556ce6db932028b
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2021-11-10 14:25:47 -05:00
Davlet Panech 1f3f3e08bc Update CentOS vault URL to HTTPS
CentOS vault HTTP url redirects to HTTPS, but yum displays misleading
error messages when the secondary HTTPS request fails.
Use HTTPS directly.

Change-Id: I687c1de2378de11abb5ad981bc73b66d8c40ba2a
Closes-Bug: 1908088
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2020-12-14 11:32:12 -05:00
Scott Little d51f8050cd Build layering, script and lst update
Script changes to download content by layer.
Valid options are 'all', 'compiler', 'distro', 'flock'.

Current .lst and yum files under directory centos-mirror-tools
are relocated.  Generic package dependencies are relocated to
centos-mirror-tools/config/<os>/<layer>/ .

Lst entries for compilable content have been relocated to other
git repos by prior updates.  i.e.  those that list tarballs or
srpms to be compiled within that repo.

The original .lst files are deleted to make it easier
to identify new content during development.

Layer 'all' builds all layers in a single workspace.  The
lst files are identical to current content, minus the src.rpm and
tarball entries.

Other layers get only a subset of packages download. The minimum
required to build the layer.  The 'flock' layer will have additional
content to satisfy the run time requirements as well as the build
time requirements.

An upper layer does not need to list rpms known to be provided by
a lower layer.  Instead the config file 'required_layer_pkgs.cfg'
lists urls for lst files for lower layer build outputs.
These build outputs are generated and published by cengn for
each layer.

A second layer config file, 'required_layer_iso_inc.cfg' lists
image.inc files for lower layer builds.  These build outputs are
generated and published by cengn for each layer, summarizing
the image.inc files found in individual git repos.

Image.inc files inform the build-iso process, listing rpms that
that provide services and commands that need to be included in
the iso.  The transitive list of required rpms need not be listed.

Finally the layer config should include a yum.repos.d
directory in which supplementary yum repos are defined to
pick up cengn built content from lower layers.

To allow a designer to do cross-layer building using local sources
rather than those provided by CENGN, there are several options.

The designer can modify the urls for lower layer build outputs,
as found in the .cfg and .repo files within the config directory
'stx-tools/centos-mirror-tools/config/<distro>/<layer-to-build>'
directly within the git.  Substitute urls can use the file:///
syntax.  Just be sure to remove these changes before submitting.

Alternatively new args have been added to download_mirror.sh,
generate-cgcs-centos-repo.sh and commands that override the
normal config.

The easiest to use is a command argurement that substitutes a new
config directory, replacing stx-tools/centos-mirror-tools/config.
The intent is for the designer to do a recursive copy of that
directory into a side location. make his changes there, outside of
git, and provide the path to that directory as an extra arguement
to download_mirror.sh and generate-cgcs-centos-repo.sh.

e.g. For simplicity I'll only list the 'extra' arguements

download_mirror.sh -C <my-config-dir> \
                   -l <layer> \
                   ...

generate-cgcs-centos-repo.sh --config-dir=<my-config-dir> \
                             --layer=<layer> \
                             ...

populate_downloads.sh --config-dir=<my-config-dir> \
                      --layer=<layer> \
                      ...

These arguements can also be suplied via the environment.
For the purpose of containerized builds, these arguements
should be defined in your localrc.

e.g.
export STX_CONFIG_DIR=<my-config-dir>
export LAYER=<layer>

The final alternative is to override things at a more granular level,
replacing a single lst file of image.inc file.  Here you can replace
a single line found in a required_layer_pkgs.cfg or
required_layer_iso_inc.cfg file.

e.g. We are doing a flock build and want to modify the content picked up
from the distro layer's rt build, and that content delivers a service
we want in the iso.  For simplicity I'll only list the 'extra' arguments

./download_mirror.sh -l flock \
    -L distro,rt,file:///<my-distro-workspace>/rt/rpmbuild/RPMS/rpm.lst \
    -I distro,std,file:///<my-distro-workspace>/rt/image.inc \
    ...

generate-cgcs-centos-repo.sh --layer=flock \
    --layer-pkg-url=distro,rt,file:///<my-distro-workspace>/rt/rpmbuild/RPMS/rpm.lst \
    --layer-inc-url=distro,std,file:////<my-distro-workspace>/rt/image.inc \
    ...

NOTE: The triplet syntax for a package list url is
    <lower-layer>,<build-type>,<url-to-rpm.lst>

    lower-layer: 'compiler', 'distro'
    build-type: 'std', 'rt', 'installer'

    Also if 'file:///' syntax is used, a matching change is made to
    the yum *.repo file.  This assumes that the rpm.lst is co-resident with
    repodata directory, as is the norm for our build outputs.

NOTE: The triplet syntax for a image inc url is
    <lower-layer>,<include-type>,<url-to-image.inc>

    lower-layer: 'compiler', 'distro'
    build-type: 'std', 'dev'

A typical user is likely only working in the flock layer on the master
branch.  He should be content to use the compiler and distro layer
outputs from cengn.

His workflow looks like ...

1, sync code for flock layer
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m flock.xml
$ repo sync

2, download rpms for flock layer, and populate a local mirror
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -l flock
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/

3, Prepare a virtual repo and downloads directory for building
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --layer=flock $LOCAL_MIRROR
$ ./populate_downloads.sh --layer=flock $LOCAL_MIRROR

4, rpm package and iso building
$ build-pkgs && build-iso && build-helm-charts.sh

Building all layers in a single workspace is still supported, and
looks identical to the previous workflow.

1, sync code
$ repo init -u https://opendev.org/starlingx/manifest.git -b master
$ repo sync

2, download rpms for flock layer, and populate a local mirror
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/

3, create repo named "StxCentos7Distro" for building
$ cd ../toCOPY
$ generate-cgcs-centos-repo.sh $LOCAL_MIRROR
$ populate_downloads.sh $LOCAL_MIRROR

4, rpm package and iso building
$ build-pkgs && build-iso && build-helm-charts.sh

Only a cross-layer developer should setup two or three copies of the
building environment, one per layer.  We suggest you use seperate shells
for each layer, as the various paths (MY_REPO, MY_WORKSPACE ...) need to
be unique,

Shell 1, compiler layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-compiler
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...

$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m compiler.xml
$ cd stx-tools/centos-mirror-tools
$ cp -r config/* $LOCAL_CONFIG
 ... edit urls in *.cfg and *.repo files under $LOCAL_CONFIG ...

$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l compiler
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=compiler $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ build-pkgs

Shell 2, distro layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-distro
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...
$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m distro.xml
$ repo sync
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l distro
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=distro $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ build-pkgs

Shell 3, flock layer
$ LOCAL_MIRROR=/import/mirrors/starlingx
$ LOCAL_CONFIG=<some-dir>/config
$ MY_REPO_ROOT_DIR=<some-dir>/layer-flock
$ MY_REPO=$MY_REPO_ROOT_DIR/cgcs-root
$ MY_WORKSPACE=$MY_REPO_ROOT_DIR/workspace
  ...
$ mkdir -p $MY_REPO_ROOT_DIR
$ cd $MY_REPO_ROOT_DIR
$ repo init -u https://opendev.org/starlingx/manifest.git -b master -m flock.xml
$ repo sync
$ cd stx-tools/centos-mirror-tools
$ ./download_mirror.sh -n -g -c yum.conf.sample -S -C $LOCAL_CONFIG -l flock
$ cp -r output/stx-r1/CentOS/pike/* $LOCAL_MIRROR/
$ cd ../toCOPY
$ ./generate-cgcs-centos-repo.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ ./populate_downloads.sh --config-dir=$LOCAL_CONFIG --layer=flock $LOCAL_MIRROR
$ build-pkgs && build-iso && build-helm-charts.sh

Story: 2006166
Task: 37103

Depends-On: https://review.opendev.org/698756
Depends-On: https://review.opendev.org/700819
Depends-On: https://review.opendev.org/700821
Change-Id: I088020b81f08656e50aa29b5584bbc1dd1378f12
Signed-off-by: Scott Little <scott.little@windriver.com>
2020-02-10 10:45:40 -05:00
Martin, Chen 819a39aac8 upgrade anaconda to CentOS 7.6 version
Story: 2004522
Task: 28413

Change-Id: I6ff86bc8fabb6af2a69b74e57b9f5de554832bba
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
2019-01-07 10:58:41 +08:00
Scott Little ae68691bf0 Enable starlingx mirror
StarlingX needs to download a variety of rpms and tarballs
from various upstream sources.  Unfortunately the upstream sources
are not always dependable. Either servers go down, are unreachable,
or drop older content that we still depend on.

Our proposed solution is to run our own mirror to capture an
independent copy of the content needed by StarlingX.
For this purpose, a server has been set up at
http://mirror.starlingx.cengn.ca/mirror/centos

The mirror will use deterministic paths derived from the upstream
urls.  Scripts will be used to convert an upstream url to
the mirror's equivalent url (see function url_to_stx_mirror_url in
url_utils.sh)

The mirror will be refreshed daily.  New .lst entries will be
processed at that time.  Processing of changes under yum.repos.d
is not automated by this update.  Expect a follow-up update to
address this issue soon.  These scripts are found under the
'stx_mirror_scripts' subdirectory.

Changes are made to the download_mirror.sh script, and it's
supporting scripts.  New arguments have been added to each
script to select the download source.
   -s  StarlingX mirror only
   -S  StarlingX mirror, with upstream source as backup
   -u  Upstream source only
   -U  Upstream source, with StarlingX mirror as backup

You do not need to provide any of these flags.  Continue to
us download_mirror.sh as you always have.  The default
behavior is currently set to '-S', i.e. first try the
StarlingX mirror, with upstream source as backup.
If this proves to place to heavy a load on the existing
server, we might switch the default to '-U', i.e. first
try the upstream source, with StarlingX mirror as backup.
If poor download performance is seen, you might want to try
explicitly adding -U as an argument.

The remaining two options are not recommended for regular use.
Upstream only, i.e. '-u', restores original behaviour, but
you may once again encounter rpms that have aged out, and
been removed from their original repos.  StarlingX only,
i.e. '-s', is vulnerable if a .lst file has been updated,
but the mirror has not yet processed it.

Change-Id: I7e0f3d9fb99253662f9f4bf12457d39250408c0b
Story: 2003906
Task: 26785
Signed-off-by: Scott Little <scott.little@windriver.com>
2018-11-02 13:33:00 -04:00
Scott Little fa995d6981 Use kojipkgs.fedoraproject.org as a backup rpm source.
EPEL7 rpms builds are visible at kojipkgs.fedoraproject.org
and often remain available there long after they age out
of the official repo.

Use kojipkgs.fedoraproject.org as a backup source for
packages when not found in the repo.  Log seperately
so that these dropouts can be identified and corrected
in the lst files.

Also improved the tests for already downloaded packages
in several places to accelerate the test cycle.

Change-Id: Iaaa9fe05ac605a3604acf0048c571c6e88303692
Story: 2003157
Task: 23292
Signed-off-by: Scott Little <scott.little@windriver.com>
2018-08-17 12:57:36 -04:00
Dean Troyer 62bd0253f0 Add build tools
This includes Docker containers to perform the StarlingX build
and a set of scripts to maintain a local mirror of binary CentOS
and other packages required to populate the final ISO file.

Change-Id: I8140fd8fa2d00e7aa98c895a8e4962ab3748669d
2018-06-08 17:01:43 -05:00