nfv/nfv
albailey 60738b4cdb Improving kube rootca orchestration recovery
When hosts are in 'updating' state, the kube rootca
update orchestration should consider those hosts to be
in-progress, rather than attempt the same command, which
immediately fails the strategy.

A second scenario is that if a host is rebooted while
initiating a kube rootca updating action, the action never
completes and blocks orchestration. The only way to resume is
to abort that rootca update.

The compatibility changes in this review are:
- The removal of the 'force' option for rootca 'patch' REST
API calls pertaining to the 'complete' and 'abort' operations.

The logic changes in this review are:

- If a host is already 'updated' when the step is invoked,
it will be considered successful.

- If a host is 'updating', orchestration will wait for it
to complete. If the host is stalled, the step will fail based
on the step timeout. If the host has been 'updating' for longer
than a specific duration, it will immediately be considered timed out.

- If a host step fails, the rootca update is aborted,
not just the orchestration.
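
The per-host decision described by the logic changes above can be
sketched roughly as follows. This is a minimal illustration, not the
VIM implementation: the names evaluate_host_step, StepAction and
max_updating_secs are hypothetical, and the caller is assumed to know
how long the host has been in its current state. Only the 'updating'
and 'updated' state names come from this commit message.

```python
import enum


class StepAction(enum.Enum):
    """Hypothetical outcomes for a single host step."""
    SUCCESS = "success"        # host is already 'updated'
    WAIT = "wait"              # host is 'updating'; poll until step timeout
    TIMED_OUT = "timed-out"    # host has been 'updating' too long
    SEND_UPDATE = "send"       # host not started; issue the update request


def evaluate_host_step(host_state, secs_in_state, max_updating_secs):
    """Decide what the orchestration step should do for one host.

    host_state: state string reported for the host (assumed input).
    secs_in_state: seconds the host has spent in that state (assumed input).
    max_updating_secs: placeholder for the "specific duration" limit.
    """
    if host_state == "updated":
        # Already done (for example, updated manually): treat as success.
        return StepAction.SUCCESS
    if host_state == "updating":
        if secs_in_state > max_updating_secs:
            # Stale 'updating' state: fail now instead of waiting.
            return StepAction.TIMED_OUT
        # In progress: wait rather than re-send the same command.
        return StepAction.WAIT
    # Any other state: the host still needs the update request.
    return StepAction.SEND_UPDATE


def on_step_failed(abort_update):
    """On a failed host step, abort the rootca update itself.

    abort_update is a hypothetical callable standing in for whatever
    performs the abort; the orchestration is failed by the caller.
    """
    abort_update()
    return "update-aborted"
```

A TIMED_OUT result would be handled like any other host step failure,
so the rootca update is aborted along with the orchestration.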

Known limitation:
VIM orchestration updates the hosts sorted by their name.
If a system has been manually updated, and then VIM orchestration
is used to resume it, it will fail if the hosts cannot be updated
in the expected order.

TEST PLAN:
 PASS: Run a kube-rootca-update orchestration (2 controllers)
 PASS: Abort a kube-rootca-update orchestration after first step and
  verify update is not aborted.
 PASS: Create a valid kube-rootca-update orchestration in a system
   with an existing 'started' update.
 PASS: Reboot second host before it is sent its 'updating' request.
   The host state moves to 'updating' but never completes.
   Verify orchestration fails (step times out after 10 minutes) and
   the update is aborted.
 PASS: Run kube-rootca-update orchestration over an aborted update.
   This will create a new update which can be run.
 PASS: Abort orchestration during the 'pods' step.
   This completes the pods step, and aborts the orchestration but will
   not abort the update.
 PASS: Manually start an update and generate a cert.
   Create an orchestration.
   Manually update controller-1 and let it complete.
   Start the orchestration.
   Orchestration should succeed (skips cleanly over controller-1)

Story: 2009665
Task: 44058
Depends-On: https://review.opendev.org/c/starlingx/config/+/819020
Signed-off-by: albailey <Al.Bailey@windriver.com>
Change-Id: If5da6fb7648a6c07438b41449e7acc724f71e0f7
2021-11-25 09:11:04 -06:00
centos Kube rootca update orchestration integration 2021-09-02 12:53:36 -05:00
debian debian: Fix nfv build 2021-11-10 16:01:27 +02:00
nfv-client Propagate unexpected errors from nfv client 2021-11-09 11:19:26 -06:00
nfv-common Re-enable important py3k checks for nfv 2021-10-28 14:11:13 -03:00
nfv-debug-tools/histogram_analysis Not require recreate of tox env when running tox 2021-04-06 09:48:36 -05:00
nfv-plugins Improving kube rootca orchestration recovery 2021-11-25 09:11:04 -06:00
nfv-tests Upgrade orchestration handle host-unlock after HostUpgrade 2021-10-06 16:31:32 -05:00
nfv-tools small cleanup required by OBS badness check - exec rights on non executable not allowed 2019-09-17 08:54:22 +02:00
nfv-vim Improving kube rootca orchestration recovery 2021-11-25 09:11:04 -06:00
opensuse Add opensuse specfiles to nfv 2019-10-02 10:34:02 -05:00
.coveragerc Convert NFV unit tests from nose to stestr 2018-09-18 12:56:44 -05:00
.gitignore Add bugbear to flake8 and cleanup some errors 2018-09-13 14:12:48 -05:00
.stestr.conf Convert NFV unit tests from nose to stestr 2018-09-18 12:56:44 -05:00
PKG-INFO StarlingX open source release updates 2018-05-31 07:36:51 -07:00
pylint.rc Re-enable important py3k checks for nfv 2021-10-28 14:11:13 -03:00
test-requirements.txt Not require recreate of tox env when running tox 2021-04-06 09:48:36 -05:00
tox.ini Fix unit tests unable to find fm-api 2021-09-29 17:01:25 -05:00