This commit fixes the sw-patch reporting bug.
Bug description: Applied patches go to a Partial-Apply state
after applying a new reboot-required patch, until the system reboots.
Note that this bug only happens when applying a new patch;
when removing patches, the previously applied patches don't have
their state changed.
sw-patch query result before applying a new patch:
Patch ID RR Release Patch State
=========================== == ======= ===========
PATCH_0001 Y 9.0 Applied
PATCH_0002 Y 9.0 Applied
PATCH_0003 Y 9.0 Applied
PATCH_0004 Y 9.0 Applied
PATCH_0005 Y 9.0 Available
Then, after applying the latest patch, the previously applied patches
go to Partial-Apply, as follows:
Patch ID RR Release Patch State
=========================== == ======= =============
PATCH_0001 Y 9.0 Partial-Apply
PATCH_0002 Y 9.0 Partial-Apply
PATCH_0003 Y 9.0 Partial-Apply
PATCH_0004 Y 9.0 Applied
PATCH_0005 Y 9.0 Partial-Apply
This happens because the commits of the previously applied patches
don't match latest_sysroot_commit, while the latest applied patch
(PATCH_0004 in our example) does have a matching commit, so the
patches it depends on need to be marked as Applied as well.
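The corrected reporting rule can be sketched as follows. This is a minimal illustration, not the patch_controller.py code: the dictionary layout (`repostate`, `commit`, `requires`), the helper name `compute_patch_states`, and the assumption that the rule applies transitively along the `requires` chain are all hypothetical.

```python
from typing import Dict, List


def compute_patch_states(patches: Dict[str, dict], sysroot_commit: str) -> Dict[str, str]:
    """Hypothetical sketch: report patch states from the sysroot commit.

    Each entry is assumed to look like:
        {"repostate": "Applied", "commit": "<ostree commit>", "requires": [...]}
    """
    states: Dict[str, str] = {}
    for patch_id, info in patches.items():
        if info["repostate"] != "Applied":
            states[patch_id] = info["repostate"]
        elif info["commit"] == sysroot_commit:
            states[patch_id] = "Applied"
        else:
            # Applied in the repo but not yet running (reboot pending).
            states[patch_id] = "Partial-Apply"

    # The fix: the prerequisites of a patch whose commit matches the
    # running sysroot are running too, so mark them Applied (transitively).
    pending: List[str] = [pid for pid, state in states.items() if state == "Applied"]
    while pending:
        patch_id = pending.pop()
        for required in patches[patch_id].get("requires", []):
            if states.get(required) == "Partial-Apply":
                states[required] = "Applied"
                pending.append(required)
    return states
```

With the example above (PATCH_0004 matching the sysroot commit and PATCH_0005 still pending a reboot), PATCH_0001 to PATCH_0004 are reported as Applied and PATCH_0005 as Partial-Apply.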
Test Plan:
PASS: For in-service (non-reboot-required) patches the issue doesn't happen, and the behavior is unchanged with the bugfix in place.
PASS: patch_controller.py unit tests pass, and a new test case
was created to cover this condition
After this bugfix, on a virtual machine running the patched code, the patch report is as follows:
Patch ID RR Release Patch State
=========================== == ======= =============
PATCH_0001 Y 9.0 Applied
PATCH_0002 Y 9.0 Applied
PATCH_0003 Y 9.0 Applied
PATCH_0004 Y 9.0 Applied
PATCH_0005 Y 9.0 Partial-Apply
Closes-Bug: 2063374
Change-Id: Ibceeecbf025535b73886517b6ce02e6013d99aea
Signed-off-by: caio-volpato <caio.volpato@windriver.com>
When a patch is uploaded, its precheck script is copied to a versioned
folder. This precheck script requires the upgrade utilities module to
run, so this change also copies upgrade_utils.py after copying the
precheck script.
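A minimal sketch of the copy step, assuming the extracted patch holds the precheck script under the hypothetical name `deploy-precheck` next to `upgrade_utils.py`; the helper name and directory arguments are also illustrative.

```python
import os
import shutil
from typing import List


def copy_precheck_files(extract_dir: str, versioned_dir: str) -> List[str]:
    """Hypothetical sketch: copy the precheck script, then its helper module.

    Files that are absent are skipped, which covers patches uploaded
    without a precheck script.
    """
    os.makedirs(versioned_dir, exist_ok=True)
    copied: List[str] = []
    # Order matters only for readability: the precheck script first,
    # then the module it imports at run time.
    for name in ("deploy-precheck", "upgrade_utils.py"):
        src = os.path.join(extract_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(versioned_dir, name))
            copied.append(name)
    return copied
```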
Test plan:
PASS: Upload patch with precheck and upgrade utilities.
Check if file is in versioned folder.
Execute `software deploy precheck` on the patch.
PASS: Upload patch without precheck and upgrade utilities.
Execute `software deploy precheck` on the patch.
Story: 2010676
Task: 49982
Change-Id: Ied05fb52e10943e3717462a68685887da68cd1ec
Signed-off-by: Dostoievski Batista <dostoievski.albinobatista@windriver.com>
Currently the deploy host status is moved to failed only when the
host is rejected, i.e. when there is a reject reason. However, there
are scenarios where deploy host can fail without a reject reason,
and these scenarios are not covered.
This commit fixes this issue, along with some minor tox issues,
and converts the db api lock logs to debug level, since they were
generating log lines that were bloating software.log.
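The intended status rule and the log-level change can be sketched as follows. The function names and the "deployed" state name are assumptions for illustration only.

```python
import logging
from typing import Optional

LOG = logging.getLogger(__name__)


def final_host_state(succeeded: bool, reject_reason: Optional[str] = None) -> str:
    """Hypothetical sketch: a rejection is one kind of failure, not the only one."""
    # Any unsuccessful deploy host moves the host to "failed",
    # whether or not a reject reason was provided.
    if reject_reason is not None or not succeeded:
        return "failed"
    return "deployed"  # assumed name of the success state


def log_db_lock(action: str) -> None:
    # Lock messages go to debug level so they no longer bloat software.log.
    LOG.debug("db api lock %s", action)
```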
Test Plan
PASS: force deploy host failure, verify the deploy host status
is updated accordingly
Regression
PASS: deploy host is rejected, verify the behavior remains the
same as before (deploy host status failed)
PASS: deploy host with success, verify the behavior remains the
same as before (deploy host status done)
Story: 2010676
Task: 49936
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: Ibcc2246ee3bf4598ae3e21bdec59247d4e754855
Now that the state machine introduced by commit [1] is in place, the
host state validation previously done by [2] is no longer needed, and
in fact it was incorrectly preventing the "deploy host" command from
being reentrant.
This commit fixes this issue.
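With a state machine, the transition table alone decides whether "deploy host" may run, so a separate host-state check only duplicates it and can contradict it. The sketch below uses assumed state names; it shows how allowing "failed" to go back to "deploying" makes the command reentrant.

```python
from typing import Dict, Set

# Hypothetical deploy host transitions (state names are assumptions).
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"deploying"},
    "deploying": {"deployed", "failed"},
    "failed": {"deploying"},  # retry after a failure: "deploy host" is reentrant
    "deployed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the state machine allows current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```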
[1] https://review.opendev.org/c/starlingx/update/+/914929
[2] https://review.opendev.org/c/starlingx/update/+/914825
Test Plan
PASS: force "deploy host" to fail, then once the host is in "failed"
state, run "deploy host" again and verify the system does not
block it from proceeding
Story: 2010676
Task: 49938
Change-Id: I0d2a8a4ab9ea98f83fbd7253cf4f174a257ee070
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
We have been using bind mounts to select K8s versions, but they are not
well supported by Puppet and suffer from fragility since you cannot
remove a bind mount while an executable is still running from it. They
also need to be re-created when creating an OSTree hotfix or when
applying software patches.
Symlinks suffer from none of these issues; they just need to be
created in a filesystem that is not managed by OSTree.
Accordingly, make the current bindmount-related code conditional on
the bindmount directory actually being present. That way the code
will not complain when we switch to using symlinks.
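The conditional can be sketched as follows. This is an illustration only: the real snippet is not shown here, so the source layout under `/usr/local/kubernetes/<version>` and the `stage1`/`stage2` names are assumptions. The function returns the mount commands instead of running them, which keeps the decision easy to test.

```python
import os
from typing import List


def bindmount_commands(version: str,
                       bindmount_dir: str = "/usr/local/kubernetes/current") -> List[List[str]]:
    """Hypothetical sketch: build the two bind mount commands only if needed."""
    # When the bind mount directory is absent (e.g. after switching to
    # symlinks), there is nothing to mount and nothing to complain about.
    if not os.path.isdir(bindmount_dir):
        return []
    src_root = os.path.join("/usr/local/kubernetes", version)  # hypothetical layout
    return [
        ["mount", "--bind", os.path.join(src_root, stage), os.path.join(bindmount_dir, stage)]
        for stage in ("stage1", "stage2")  # hypothetical stage names
    ]
```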
Story: 2011047
Task: 49917
TEST PLAN:
PASS: Run the modified code snippet standalone on a system where
/usr/local/kubernetes/current/ exists, ensure it attempts to
run the two mount commands.
PASS: Run the modified code snippet standalone on a system where
/usr/local/kubernetes/current/ does not exist, ensure it does not
attempt to run the two mount commands.
Change-Id: I1dfea974ae9532cf316bb1fac701ae93f5507681
The upload process should not require the release id to be part of
the patch filename. This change gets the release id directly from the
metadata inside the patch file.
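A minimal sketch of reading the release id from inside the patch rather than from its filename. It assumes that the patch file is a tar archive containing `metadata.tar`, which in turn contains a `metadata.xml` with a top-level `<id>` element; treat that layout and the helper name as assumptions.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def read_release_id(patch_path: str) -> str:
    """Hypothetical sketch: return the release id stored in the patch metadata."""
    with tarfile.open(patch_path) as outer:
        member = outer.extractfile("metadata.tar")  # raises KeyError if missing
        if member is None:
            raise ValueError("metadata.tar is not a regular file")
        with tarfile.open(fileobj=io.BytesIO(member.read())) as inner:
            xml_file = inner.extractfile("metadata.xml")
            if xml_file is None:
                raise ValueError("metadata.xml is not a regular file")
            root = ET.fromstring(xml_file.read())
    release_id = root.findtext("id")
    if not release_id:
        raise ValueError("no <id> element in metadata.xml")
    return release_id.strip()
```

Because the filename is never parsed, a patch renamed to anything (for example `any-name.patch`) still reports the id from its metadata.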
Test Plan:
PASS: Successfully upload patch with different filename
from release id using software upload command.
PASS: Successfully upload patch with filename same as release id
using software upload command.
PASS: Successfully upload multiple patches with different filename
from release id using software upload-dir command.
PASS: Successfully upload multiple patches with filename same as
release id using software upload-dir command.
Story: 2010676
Task: 49868
Change-Id: Ibd5bcf9b8797b5de0eef3e46313055cc141da0b2
Signed-off-by: Dostoievski Batista <dostoievski.albinobatista@windriver.com>
This commit validates pre-conditions when software
deploy-host <hostname> is issued.
The pre-conditions are:
The host state is pending.
The specified target host is locked and online.
In a DX system, nodes are deployed to a major release in the order below:
Controller-1 -> Controller-0 -> Storage nodes -> Compute nodes.
In a DX system, nodes are deployed to a patch release in the order below:
Controllers -> Storage nodes -> Compute nodes.
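The ordering rule can be sketched as a rank per host, where a target may be deployed only once every lower-ranked host is done. The helper names and the use of sysinv-style personalities ("controller", "storage", "worker" for compute nodes) are assumptions.

```python
from typing import Dict, Set


def deploy_rank(hostname: str, personality: str, major_release: bool) -> int:
    """Hypothetical rank: lower ranks must be deployed first."""
    if personality == "controller":
        # Major release: controller-1 first, then controller-0.
        if major_release and hostname == "controller-0":
            return 1
        return 0
    # "worker" is assumed to be the personality of compute nodes.
    return {"storage": 2, "worker": 3}[personality]


def can_deploy(target: str, hosts: Dict[str, str], deployed: Set[str],
               major_release: bool) -> bool:
    """Return True if every host that must precede target is already deployed."""
    target_rank = deploy_rank(target, hosts[target], major_release)
    return all(
        name in deployed
        for name, personality in hosts.items()
        if deploy_rank(name, personality, major_release) < target_rank
    )
```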
Test Plan:
PASS: Deploy host to controller-1 validation.
PASS: Validated target locked and online checker.
PASS: Validated start-done deploy state checker.
Story: 2010676
Task: 49795
Change-Id: I8fe8faa85c594472bb6c8c021416205bf4a61fbb
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
This change introduces state machines for release state, deploy state
and deploy host state.
This change removes the direct references to the software metadata from
software-controller and other modules, replacing them with the
encapsulated release_data module.
Also includes the following changes:
1. removed the required parameter for the software deploy activate and
software deploy complete REST APIs.
2. ensure metadata is reloaded for each request
3. added feed_repo and commit-id to the deploy entity, to be
subsequently passed to deploy operations.
4. miscellaneous fixes
TCs:
passed: software upload major and patching releases
passed: software deploy start major and patching releases
passed: software deploy host (mock) major and patching release
passed: software activate major and patching release
passed: software complete major release and patching release
passed: redeploy host after host deploy failed both major and
patching release
Story: 2010676
Task: 49849
Change-Id: I4b1894560eccb8ef4f613633a73bf3887b2b93fb
Signed-off-by: Bin Qian <bin.qian@windriver.com>
During management network reconfiguration, the system is restarted so
that the controller_config script runs the puppet code and updates
all services to use the new mgmt IP address.
But the sw-patch services start before controller_config.
When they start, they get the mgmt_ip using the Python socket library,
which resolves the IP address from /etc/hosts.
But /etc/hosts has not been updated yet at that point, so they get the
old management network IP.
To fix this issue, the sw-patch services will wait for the puppet
code to be applied, to make sure /etc/hosts and the new management
network IPs are in place in the system.
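The wait can be sketched as a bounded poll for a marker that signals the puppet manifest has been applied. The marker path and the helper name are hypothetical. The real services may use a different signal, and only after it returns should the mgmt IP be resolved from /etc/hosts.

```python
import os
import time

# Hypothetical marker written once puppet has applied the configuration.
CONFIG_APPLIED_FLAG = "/var/run/.hypothetical_puppet_applied"


def wait_for_config(flag_file: str = CONFIG_APPLIED_FLAG,
                    timeout: float = 600.0, poll_interval: float = 1.0) -> bool:
    """Poll for flag_file; return True once it exists, False on timeout."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(flag_file):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    # At this point /etc/hosts is expected to hold the new mgmt IP.
    return True
```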
Tests done:
IPv4 AIO-SX fresh install
IPv4 AIO-DX fresh install
IPv4 DC with subcloud AIO-SX fresh install
IPv4 AIO-SX mgmt reconfig and apply a non-reboot-required patch
IPv4 AIO-SX mgmt reconfig and apply a reboot-required patch
IPv4 subcloud AIO-SX mgmt reconfig and apply a non-reboot-required patch
IPv4 subcloud AIO-SX mgmt reconfig and apply a reboot-required patch
For this test, sw-patch was in a failed state after the reboot;
this happens even without the mgmt reconfig and without this fix.
Partial-Bug: #2060066
Story: 2010722
Task: 49827
Depends-On: https://review.opendev.org/c/starlingx/config/+/914710
Change-Id: Ie544425513ef4fede73b4b55770ad6857cdf7eed
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
This commit adds the capability to run deploy host
for a major release deployment (release upgrade).
To achieve this, this commit adapts some code that is
already used by patching to:
1. Create a new remote pointing to the to_release feed ostree
2. Pull the to_release ostree commit to sysroot ostree
3. Deploy the to_release ostree commit
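The three steps map onto standard ostree commands. The sketch below only builds the argument lists. The remote name, branch, and feed URL are hypothetical, and passing `--no-gpg-verify` is an assumption about how the feed is served.

```python
from typing import List


def major_release_ostree_cmds(feed_url: str, branch: str,
                              remote: str = "to-release",
                              repo: str = "/sysroot/ostree/repo") -> List[List[str]]:
    """Hypothetical sketch of the remote/pull/deploy sequence."""
    return [
        # 1. create a remote pointing to the to_release feed ostree repo
        ["ostree", "remote", "add", f"--repo={repo}", "--no-gpg-verify",
         remote, feed_url, branch],
        # 2. pull the to_release commit into the sysroot repo
        ["ostree", "pull", f"--repo={repo}", remote, branch],
        # 3. deploy the pulled commit (takes effect on next boot)
        ["ostree", "admin", "deploy", f"{remote}:{branch}"],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order.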
This commit also includes some additional steps for
sysinv/puppet integration with USM, and fixes minor
flake8 issues on the files that are being changed.
Test Plan
PASS: run "deploy host" for major release deployment
successfully on AIO-SX
PASS: run "deploy host" for major release deployment
successfully on AIO-DX
PASS: (regression) run "deploy host" successfully for a
patch release
Story: 2010676
Task: 49787
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: Ib8b08d1cd85dcad7d6fc858e2fae623b5900cffc
This commit raises the deploy state out of sync alarm
when the deploy state in the software.json files on the two controllers
differs.
The deploy state is checked every 30 seconds during the
deploying stage. If the states are in sync, the alarm is cleared.
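The check can be sketched as a comparison of the two software.json payloads plus a small raise/clear decision, run every 30 seconds while deploying. The top-level `deploy` key and the helper names are assumptions about the file layout.

```python
import json
from typing import Optional

CHECK_INTERVAL = 30  # seconds, during the deploying stage


def deploy_states_in_sync(local_json: str, peer_json: str) -> bool:
    """Compare the deploy state in two software.json payloads."""
    return json.loads(local_json).get("deploy") == json.loads(peer_json).get("deploy")


def alarm_action(in_sync: bool, alarm_raised: bool) -> Optional[str]:
    """Return "raise", "clear", or None if the alarm is already correct."""
    if not in_sync and not alarm_raised:
        return "raise"
    if in_sync and alarm_raised:
        return "clear"
    return None
```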
Depends-on: https://review.opendev.org/c/starlingx/fault/+/913581
Test Plan:
PASS: the alarm is raised when the state is out of sync
in both DX and SX
PASS: the alarm is cleared when the state is in sync in
both DX and SX
Task: 49737
Story: 2010676
Change-Id: Ic31c7166135d03591fa4696445783895254dfc95
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
The commit ID changes after the apt-ostree command runs
in software deploy start. The new commit ID is updated in the
metadata dictionary, which is accessed by the software show
REST API. This change also updates the metadata dictionary with
the new commit ID for the remove case.
Test Plan:
PASS: Check ostree commit id after apply case
of software deploy start in software show
PASS: Check ostree commit id after remove case
of software deploy start in software show
PASS: Check commit id for a prepatched iso
Story: 2010676
Task: 49818
Change-Id: I247c35f2e95bf013e0af0c684e17c4c38d0c2802
Signed-off-by: sshathee <shunmugam.shatheesh@windriver.com>
This commit replaces the legacy sysinv endpoints for platform
upgrade with new USM endpoints.
New endpoints:
get_software_upgrade: get from/to versions and deploy state
for upgrade.
get_software_host_upgrade: get current/target versions and
deploy state for the given host.
Test Plan:
PASS: call the endpoints using curl before deploy start
PASS: call the endpoints using curl after deploy start.
Task: 49766
Story: 2010676
Change-Id: I574da8f60e1cf9fc046f5c6a727f7e17fe8c55f7
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
Fix an issue where software deploy show and software deploy host-list
did not display information.
Also default the deploy host state to pending when a deploy host entity
is created.
Story: 2010676
Task: 49645
TCs:
passed: software deploy show and software deploy host-list show
deploy data after deploy start command accepted.
passed: display "No deploy in progress" for software deploy show
when there is no deploy.
Change-Id: I9dc50804c66d5cb07df7717fd6623c23d0fca522
Signed-off-by: Bin Qian <bin.qian@windriver.com>
This commit fixes the deploy precheck <release> command output, which
was returning a line with a blank "Error:" even when the output was correct.
Test Plan:
PASS: Deploy precheck <release> returning without error.
PASS: Deploy precheck <release> returning expected error.
PASS: Deploy start failed with expected error.
PASS: Deploy start
Note: Exit code 3 was chosen based on the exit codes with special
meaning, where 1 is for catch-all general errors and 2 is for misuse
of shell builtins. The number 3 has no special meaning allocated, so
it was chosen for an unhealthy precheck.
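A minimal sketch of the reporting rule: print "Error:" only when there is an actual error message, and map the outcome to the exit codes described in the note. The function and constant names are hypothetical.

```python
import sys

EXIT_OK = 0
EXIT_ERROR = 1      # catch-all general error
EXIT_UNHEALTHY = 3  # unhealthy precheck (1 and 2 already have special meanings)


def report_precheck(output: str, healthy: bool, error: str = "") -> int:
    """Print the precheck output and return the CLI exit code."""
    print(output)
    if error:
        # Only emit an "Error:" line when there is something to report.
        print(f"Error: {error}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_OK if healthy else EXIT_UNHEALTHY
```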
Closes-Bug: 2056106
Change-Id: Ifed48157f7810eec2881d8a9e011eae8941f3427
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
When using Keystone auth for the software CLI, only users with the
'admin' role are allowed to run commands. When using the software CLI
without 'sudo', all software commands require a user with the 'admin'
role.
This review also updates the exception handling and error reporting.
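The authorization rule can be sketched as a single predicate. This is an illustration under the assumption that the caller already knows the user's Keystone roles and whether the command runs as local root (sudo). The function name is hypothetical.

```python
from typing import Iterable


def is_authorized(roles: Iterable[str], local_root: bool) -> bool:
    """Hypothetical check: sudo runs are allowed; Keystone users need 'admin'."""
    if local_root:
        return True
    # 'member' and/or 'reader' alone are not enough to run any command.
    return "admin" in set(roles)
```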
Test Plan:
PASS: A Keystone user in the 'admin' project with 'admin' role should
be able to run ALL 'software' commands WITHOUT SUDO
PASS: A Keystone user in the 'admin' project with only 'member' and/or
'reader' role should NOT be able to run ANY 'software' commands
WITHOUT SUDO
Story: 2010676
Task: 49754
Change-Id: I46653021b1a82bccded5eb870dc0907cd5c2351b
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
There are additional steps needed to enable "deploy host"
operation for major release upgrades. This commit adds these
additional steps, which are:
1. Create TO release platform config directory, done during
upgrade-start on legacy upgrade procedure
2. Create TO release rabbitmq directory, done by upgrade
manifest after controller-1 reinstall on legacy upgrade
The commit also fixes some issues:
1. shell-utils logging functions logging nowhere after
being sourced by other scripts
2. sync-controllers-feed script was only syncing the
ostree_repo directory instead of the full feed content
3. major release deployment scripts ran from different
places; now all scripts are executed from the TO release
feed directory, or from the checked out TO release ostree
repo in case of chroot
4. chroot_mounts now uses a lazy umount
Note: Since the "deploy host" endpoint for major release
deployment is not yet implemented, the test plan will have
test cases that simulate the "deploy host" operation.
Test Plan
PASS: simulate "deploy host" successfully for AIO-SX
PASS: simulate "deploy host" successfully for AIO-DX
Depends-on: https://review.opendev.org/c/starlingx/config/+/913715
Story: 2010676
Task: 49703
Change-Id: Ib6ae49b3590a1e50acb305ac7482e28bcc4de403
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Insert the ostree commit ID into the metadata file after apt-ostree
runs. The "previous_commit" tag holds the commit ID before apt-ostree
is executed, and the "commit" tag holds the latest commit ID after it
is executed.
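A minimal sketch of writing both tags with the standard library XML module. It assumes, for illustration, that the tags are direct children of the metadata root element. The real metadata may nest them elsewhere.

```python
import xml.etree.ElementTree as ET


def set_commit_ids(metadata_xml: str, previous_commit: str, commit: str) -> str:
    """Insert or update <previous_commit> and <commit> under the root element."""
    root = ET.fromstring(metadata_xml)
    for tag, value in (("previous_commit", previous_commit), ("commit", commit)):
        elem = root.find(tag)
        if elem is None:
            elem = ET.SubElement(root, tag)
        elem.text = value
    return ET.tostring(root, encoding="unicode")
```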
Test Plan:
PASS: Run software deploy start for patch and check commit id
is inserted in metadata file
Story: 2010676
Task: 49753
Change-Id: I8fc3c33430188449d852770824e3ecd765583dd7
Signed-off-by: Charles Short <charles.short@windriver.com>
Signed-off-by: sshathee <shunmugam.shatheesh@windriver.com>
This commit fixes 'software upload-dir' not returning a
response that contains the uploaded release info.
Test Plan:
PASS: upload files using 'software upload-dir'
Task: 49634
Story: 2010676
Change-Id: I635554fdbdb80fe31a38d1170202405fe6f32d3a
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
This change adds checks before deleting software releases:
1. the software release is in the available or unavailable state
2. on a system controller, the release is not being used by a
subcloud
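The two checks can be sketched as a function that returns a rejection reason or None. The input shapes (a mapping of release id to state, and a set of release ids used by subclouds, passed only on a system controller) and the messages are assumptions.

```python
from typing import Dict, Optional, Set

DELETABLE_STATES = {"available", "unavailable"}


def check_deletable(release_id: str, releases: Dict[str, str],
                    subcloud_releases: Optional[Set[str]] = None) -> Optional[str]:
    """Return a rejection reason, or None if the release may be deleted."""
    if release_id not in releases:
        return f"Release {release_id} not found"
    state = releases[release_id]
    if state not in DELETABLE_STATES:
        return f"Release {release_id} is {state}, not available or unavailable"
    # Only relevant on a system controller.
    if subcloud_releases and release_id in subcloud_releases:
        return f"Release {release_id} is in use by a subcloud"
    return None
```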
This change also updates the following:
1. removed the exception handling at the controller level, moved to
the exception hook
2. CLI code to display HTTP errors: only the 500 status code is handled
by displaying the message from the API; all other 4xx and 5xx status
codes display the HTTP error directly.
3. ensure the CLI returns 1 for unsuccessful requests (status code 500)
4. fixed some minor issues
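The CLI error-display rule can be sketched as follows. The `error` key in the response body is hypothetical. Returning 1 for non-500 errors is also an assumption, because the text only states the exit code for status 500.

```python
from typing import Any, Dict


def handle_response(status_code: int, body: Dict[str, Any]) -> int:
    """Print an error per the rules above and return the CLI exit code."""
    if status_code < 400:
        return 0
    if status_code == 500:
        # 500: display the message returned by the API.
        print(body.get("error", "Internal server error"))
    else:
        # Other 4xx/5xx: display the HTTP error directly.
        print(f"HTTP error {status_code}")
    return 1  # assumption: every unsuccessful request exits with 1
```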
Story: 2010676
Task: 49657
TCs:
passed: observe deletion rejected because the release is not found, or
the release is not in the available or unavailable state.
passed: delete an available release
passed: on system controller, successfully delete scenarios
passed: (simulated) on system controller with subcloud, delete
release used by subcloud is rejected
Change-Id: I306b1d8604113b92d907384844e8e8107835a463
Signed-off-by: Bin Qian <bin.qian@windriver.com>
Previously, the upload-dir logic uploaded only the files from
the last directory passed as an argument to the command, and its
help text and output diverged from the similar "software
upload" command.
This commit fixes the logic, allowing uploading from multiple
directories correctly, and fixes the help text and output to
align with "software upload".
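The symptom (only the last directory's files uploaded) is typical of a list being reassigned inside the per-directory loop. The fix is to accumulate across all directories, as in this sketch. The accepted extensions and the helper name are assumptions.

```python
import os
from typing import Iterable, List

UPLOAD_EXTENSIONS = (".iso", ".sig", ".patch")  # assumed accepted file types


def collect_upload_files(directories: Iterable[str]) -> List[str]:
    """Gather uploadable files from every directory, not just the last one."""
    files: List[str] = []  # created once, outside the loop
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path) and name.endswith(UPLOAD_EXTENSIONS):
                files.append(path)
    return files
```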
Test Plan
PASS: show help text for "software upload-dir"
PASS: upload one directory containing patches
PASS: upload one directory containing iso + patches
PASS: upload multiple directories containing patches
PASS: upload multiple directories containing iso + patches
Story: 2010676
Task: 49693
Change-Id: I5886e8c5c55355e24ec471c4ae47e91ec3c84dfd
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
During a load-import operation, the /var/www/pages/feed/ files must
be replicated on both controllers. The files were being created on
controller-0, but they were not copied to controller-1.
This fix adds a way to copy all the folders from one controller to
the other and sync them with ostree; the explanation is in the
following commit: I42c274079631a3c197015e636e03de1bc96de28b
Test-plan:
PASS: After a load-import, the inactive controller should have the
feed repo created.
Closes-bug: 2045321
Change-Id: I260951461d2c19550e9f57ad7ab9ec66a25de5bb
Signed-off-by: Lindley Werner <Lindley.Vieira@windriver.com>
'software deploy host <in-service-patch-release>' should NOT fail if
there is no restart-script (post-deploy script).
If rsync does not find the scripts, no exception is raised.
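One way to get this behavior is sketched below. Note that it swaps in an existence check before calling rsync rather than interpreting rsync's error output. The helper name and paths are hypothetical.

```python
import logging
import os
import subprocess

LOG = logging.getLogger(__name__)


def sync_restart_script(src: str, dest: str) -> bool:
    """Sync the restart script if it exists; a missing script is not an error."""
    if not os.path.exists(src):
        # In-service patch without a restart (post-deploy) script.
        LOG.info("No restart script at %s, nothing to sync", src)
        return False
    subprocess.run(["rsync", "-a", src, dest], check=True)
    return True
```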
Test-plan:
PASS: software deploy host <in-service-patch-release> successful
Closes-bug: 2058393
Change-Id: I1b8cac9e0401c3f64c7334139c62bf272e9aeb56
Signed-off-by: Lindley Werner <Lindley.Vieira@windriver.com>
This commit includes a versioned deploy precheck script in
/opt/software/scripts/rel-<sw_version> so that the correct
precheck code is run for a specific release.
Along with this, the precheck API is changed to use the new
location of the script, and the precheck script is changed to add
support for patch-only prechecks; as a consequence, minor wording
changes were made to return more accurate messages to the user.
1. For the iso upload scenario:
The upload process will copy all scripts under
<iso_root>/upgrades/software-deploy to /opt/software/rel-<ver>/scripts
2. For the patch upload scenario:
The upload process will check if the patch contains the deploy-precheck
script. If it does, the script is copied to
/opt/software/rel-<ver>/scripts; if not, a symlink is created
pointing to the versioned precheck script of the patch's 'required patch'.
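The copy-or-symlink decision for the patch upload scenario can be sketched as follows, following the rel-<ver>/scripts layout described above. The function name and arguments are hypothetical, and the required patch's versioned directory is assumed to be known at upload time.

```python
import os
import shutil
from typing import Optional


def install_precheck(patch_script: Optional[str], rel_dir: str,
                     required_rel_dir: str) -> str:
    """Copy the patch's precheck script, or symlink the required patch's one."""
    scripts_dir = os.path.join(rel_dir, "scripts")
    os.makedirs(scripts_dir, exist_ok=True)
    dest = os.path.join(scripts_dir, "deploy-precheck")
    if patch_script and os.path.isfile(patch_script):
        shutil.copy2(patch_script, dest)
    else:
        # No precheck in this patch: reuse the required patch's versioned script.
        target = os.path.join(required_rel_dir, "scripts", "deploy-precheck")
        os.symlink(target, dest)
    return dest
```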
Notes:
- iso (prepatched or not) will always come with deploy-precheck script
- <ver> assumes the format MM.mm.pp
Test Plan:
PASS: Upload multiple patches, both with and without precheck
scripts, and verify the versioned directories are created
and the precheck script is created as expected
PASS: Run deploy precheck for an iso release and verify the upgrade
precheck output is returned as expected
PASS: Run deploy precheck for a patch release and verify the patch
precheck output is returned as expected
Depends-on: https://review.opendev.org/c/starlingx/metal/+/911595
Story: 2010676
Task: 49263
Change-Id: I04ff89d43579fd71592f7ec534db57a1ead79483
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
Co-signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
This change adds support for HTTPS with SSL protocol and certificates.
The USM client can work either insecurely (SSL/TLS certificate
verification disabled) or with an SSL certificate. The client is
also modified to support sessions and versions. These changes are
adapted from cgtsclient.
This adds three authorization modes, [token, keystone & local-root].
In token mode, a keystone token and software-url is used for auth.
Eg: $ software \
--software-url "http://192.168.204.1:5497" \
--os-auth-token "${TOKEN}" list
In keystone mode, sourced keystone configs in env is used for auth.
Eg: $ source /etc/platform/openrc; software list
In local-root mode, authorization is by privileged user (root/sudo)
of the controller where software application is running.
Eg: $ sudo software list
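The three modes can be sketched as a selection function. The precedence order shown is an assumption, and so is the use of `OS_AUTH_URL` (exported by a typical openrc) as the Keystone signal. The function name is hypothetical. A caller would pass `os.environ` and `os.geteuid()`.

```python
from typing import Mapping, Optional


def select_auth_mode(software_url: Optional[str], os_auth_token: Optional[str],
                     env: Mapping[str, str], euid: int) -> str:
    """Pick token, keystone or local-root auth (precedence is assumed)."""
    if software_url and os_auth_token:
        return "token"
    if env.get("OS_AUTH_URL"):
        # Keystone configs sourced into the environment.
        return "keystone"
    if euid == 0:
        # Privileged user (root/sudo) on the controller.
        return "local-root"
    raise PermissionError("no credentials: use a token, source openrc, or run with sudo")
```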
Optional arguments specific to https:
-k, --insecure
--cert-file CERT_FILE
--key-file KEY_FILE
--ca-file CA_FILE
Example usage for insecure connection:
software -k list
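The HTTPS options map naturally onto a TLS context. The following sketch uses the standard library `ssl` module to show what each option does. It is not the client's implementation, which is adapted from cgtsclient and may build its session differently.

```python
import ssl
from typing import Optional


def build_ssl_context(insecure: bool = False, ca_file: Optional[str] = None,
                      cert_file: Optional[str] = None,
                      key_file: Optional[str] = None) -> ssl.SSLContext:
    """Map -k/--insecure, --ca-file, --cert-file and --key-file onto a context."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if insecure:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if cert_file:
        # Client certificate (and key, if separate) for mutual TLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```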
Story: 2010676
Task: 49666
Test Plan:
PASS: Verify software cli output for http endpoints
PASS: Verify software cli output for https endpoints
Change-Id: I2e2ff115b8d03cddb02e026da84f389918238dab
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
This commit allows the active controller to periodically send a deploy
state message to the software agent on its peer controller. The
interval is set to 30 seconds.
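A minimal sketch of the periodic sender as a daemon thread that can be stopped promptly through an event. The `send` callable stands in for the real message to the peer's software agent and is hypothetical.

```python
import threading
from typing import Callable

DEPLOY_STATE_SYNC_INTERVAL = 30  # seconds


def start_periodic_sender(send: Callable[[], None], stop: threading.Event,
                          interval: float = DEPLOY_STATE_SYNC_INTERVAL) -> threading.Thread:
    """Call send() every interval seconds until stop is set."""
    def loop() -> None:
        while not stop.is_set():
            send()
            # wait() returns early when stop is set, so shutdown is prompt.
            stop.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```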
Test Plan:
PASS: build and deploy the iso
PASS: start new deployment, file is synced in both controllers
Task: 49655
Story: 2010676
Change-Id: Ie95c5a7d45b3d88331569ca52d64d40a4f39d6c3
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
This commit fixes an issue that was preventing the load import
script from copying the TO release pxe files correctly from
the load, and also gives the pxe-update-*.sh script execution
permission to avoid permission denied errors during host-unlock.
Test Plan
PASS: upload TO release load, verify the TO release pxe.cfg.linux
files are copied to /var and that /etc/pxe-update script is
updated to 755
Story: 2010676
Task: 49698
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: I222b484d3a28c603ed8d7c42d0405481086735f0
This commit makes some changes to deploy host-list.
Adds a function to query the hostnames from sysinv to create the
deploy host-list entities during deploy start.
Changes the endpoint to the GET verb. When no deployment is in
progress, the endpoint returns an empty list and the CLI
prints "No deploy in progress." When a deployment is in progress,
the endpoint returns the data below:
[{'hostname': '<hostname>',
'software_release': '<sw_version>',
'target_release': '<sw_version>',
'reboot_required': 'str<true/false>',
'host_state': '<host_deploy_state>'}]
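The CLI side can be sketched as a small formatter over that list. The column order follows the keys above, while the plain space-aligned layout and the function name are assumptions.

```python
from typing import Dict, List

COLUMNS = ["hostname", "software_release", "target_release",
           "reboot_required", "host_state"]


def format_host_list(hosts: List[Dict[str, str]]) -> str:
    """Render the deploy host-list response as an aligned text table."""
    if not hosts:
        return "No deploy in progress."
    rows = [COLUMNS] + [[str(h.get(c, "")) for c in COLUMNS] for h in hosts]
    widths = [max(len(row[i]) for row in rows) for i in range(len(COLUMNS))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in rows]
    return "\n".join(lines)
```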
This commit also changes the wait_for_install_complete function
to follow the new state logic.
Note: Software deploy host is affected by these state-related changes
and will need a future commit to handle state changes
during deploy start and deploy host itself.
Test Plan:
PASS: Software deploy host-list with/without deployment in progress.
PASS: Deploy_host creation/update/get/delete.
PASS: Collect hostnames to deploy host entities during deploy start.
Story: 2010676
Task: 49586
Change-Id: I7b03df30fd8e326637a3ffc031e0fdf543cb6356
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
This commit makes some changes to the deploy show endpoint: it was
renamed to just "deploy" with the GET verb, and the deploy is now
saved as a list of dicts to meet the API requirements. The API now
accepts from_release and to_release as optional params; if they are
provided, the endpoint returns a dict, otherwise it returns a list
of dicts.
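The selection behavior can be sketched as follows. That both parameters must be given for the dict form is an assumption, and so is returning an empty dict when nothing matches. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Union


def get_deploy(deploys: List[Dict], from_release: Optional[str] = None,
               to_release: Optional[str] = None) -> Union[Dict, List[Dict]]:
    """Return one deploy dict when both releases are given, else the full list."""
    if from_release and to_release:
        for deploy in deploys:
            if (deploy.get("from_release") == from_release
                    and deploy.get("to_release") == to_release):
                return deploy
        return {}  # assumption: no matching deploy
    return deploys
```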
Test Plan:
PASS: Create deploy
PASS: Update deploy
PASS: Software deploy start
PASS: Software deploy show
Story: 2010676
Task: 49645
Change-Id: I68d243c05da88c7eecf2d866c7202c3c7be51a2b
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
This change removes the unnecessary routing based on request
method, whose response can't be decoded properly by
the sm-api.
Test Plan:
PASS: run system host-swact
Task: 49661
Story: 2010676
Change-Id: Ife8862b7d5666de3a3dafff582dbb4c27e1adafa
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
The constraints file used by tox.ini was removed, so tox.ini is
updated to use the StarlingX Debian constraints file.
Test Plan:
PASS - Run tox command
Closes-bug: 2055734
Change-Id: I306be11f6edc4538cbb3f7a164bac9e1ad08501f
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
This change creates a second thread to provide concurrent service. In
a different commit [1], haproxy is configured to distribute the slow
requests to the second thread, and the fast requests to the primary
thread.
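The idea can be sketched with two single-threaded HTTP servers, each running in its own thread on its own port. A slow request on one server then cannot block the other, and a front end such as haproxy can route by request type. The labels, handler, and ports (0, meaning any free port) are hypothetical, and the haproxy routing itself is out of scope here.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class LabelHandler(BaseHTTPRequestHandler):
    """Reply with the label of the server that handled the request."""

    def do_GET(self) -> None:
        body = self.server.label.encode()  # attribute set in start_server
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args) -> None:
        pass  # keep the sketch quiet


def start_server(label: str, port: int = 0) -> HTTPServer:
    """Start one server in its own daemon thread and return it."""
    server = HTTPServer(("127.0.0.1", port), LabelHandler)
    server.label = label
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start_server("primary")` and `start_server("secondary")` give two independent listeners whose ports are available from `server.server_address[1]`.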
TCs:
passed: concurrent keystone requests of "software upload/
deploy precheck/deploy start" and "software list/deploy show/
deploy host-list"
passed: keystone authenticated "software deploy precheck"
request completed.
Story: 2010676
Task: 49647
[1] https://review.opendev.org/c/starlingx/stx-puppet/+/910644
Change-Id: I0e8e8ac1b5177f1bbf40e047335c075b0a471fc1
Signed-off-by: Bin Qian <bin.qian@windriver.com>