config/sysinv/sysinv/sysinv/sysinv/tests/conductor
Saba Touheed Mujawar 4c42927040 Add retry robustness for Kubernetes upgrade control plane
In rare cases the upgrading control plane step fails intermittently:
either puppet hits its timeout before the upgrade is completed, or
kubeadm hits its own Upgrade Manifest timeout (at 5m).

This change retries the step by reporting the failure to the
conductor when the puppet manifest apply fails. Because the request
is sent over RPC with options, the return code is not available
directly, so a retry decorator cannot be used. Instead, the sysinv
report callback feature handles the success/failure path.
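
The callback-driven retry described above can be sketched roughly as
follows. This is a minimal illustration, not the sysinv implementation:
the class `ControlPlaneUpgradeTracker`, the `apply_manifest` callable,
the `report` method and the state strings are all hypothetical names,
and the limit of 2 retries is taken from the test plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_RETRIES = 2  # assumption: matches "retries 2 times" in the test plan


@dataclass
class ControlPlaneUpgradeTracker:
    """Hypothetical stand-in for the conductor-side bookkeeping."""

    # Fire-and-forget request (like an RPC cast): no return code comes back,
    # so the result can only arrive later through report().
    apply_manifest: Callable[[str], None]
    max_retries: int = MAX_RETRIES
    attempts: Dict[str, int] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)

    def start(self, host: str) -> None:
        """Begin the upgrade step and send the first apply request."""
        self.attempts[host] = 0
        self.state[host] = "upgrading"
        self.apply_manifest(host)

    def report(self, host: str, success: bool) -> None:
        """Callback invoked when the result of the apply is reported."""
        if success:
            self.state[host] = "upgraded"
            return
        if self.attempts[host] < self.max_retries:
            # Failure reported and retries remain: re-send the request.
            self.attempts[host] += 1
            self.apply_manifest(host)
        else:
            # Retries exhausted: record the failure state.
            self.state[host] = "failed"


# Usage: one failure followed by a success results in two apply requests.
sent = []
tracker = ControlPlaneUpgradeTracker(apply_manifest=sent.append)
tracker.start("controller-0")
tracker.report("controller-0", success=False)
tracker.report("controller-0", success=True)
```

Because the retry decision lives in the callback rather than around the
call site, it works even though the caller never sees a return value.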

TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send STOP signal to pause the process so that the
      puppet manifest times out, and check that the retry code works
      and that the upgrade completes in a retry attempt.
PASS: Manually decrease the puppet timeout to a very low value
      and verify that the code retries 2 times and updates the
      failure state.
PASS: Perform orchestrated k8s upgrade, manually send STOP
      signal to pause the kubeadm process during step
      upgrading-first-master, and perform system kube-upgrade-abort.
      Verify that the upgrade aborts successfully (upgrade-aborted)
      and that the code does not attempt the retry mechanism for
      k8s upgrade control-plane, as the upgrade is not in the required
      KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER
      state.
PASS: Perform manual k8s upgrade; on k8s upgrade control-plane
      failure, perform manual upgrade-abort successfully.
      Perform orchestrated k8s upgrade; on k8s upgrade control-plane
      failure after retries, nfv aborts automatically.
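
The state check exercised by the abort test case above might look
something like this. It is a sketch under stated assumptions: the two
constant names come from the test plan, but their string values, the
helper `should_retry_control_plane` and its parameters are hypothetical
and are not taken from the sysinv code.

```python
# Constant names appear in the test plan; the string values are assumed.
KUBE_UPGRADING_FIRST_MASTER = "upgrading-first-master"
KUBE_UPGRADING_SECOND_MASTER = "upgrading-second-master"

# A retry is only considered while the upgrade is in one of these states.
RETRYABLE_STATES = frozenset({
    KUBE_UPGRADING_FIRST_MASTER,
    KUBE_UPGRADING_SECOND_MASTER,
})


def should_retry_control_plane(upgrade_state: str, attempts: int,
                               max_retries: int = 2) -> bool:
    """Hypothetical guard: retry only in a retryable state with retries left."""
    return upgrade_state in RETRYABLE_STATES and attempts < max_retries


# Usage: after an abort the state has moved on, so no retry is attempted.
retry_while_upgrading = should_retry_control_plane(KUBE_UPGRADING_FIRST_MASTER, 0)
retry_after_abort = should_retry_control_plane("upgrade-aborted", 0)
```

Checking the state before each retry keeps a late failure report from
restarting a control plane upgrade that the user has already aborted.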

Closes-Bug: 2056326

Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
            https://review.opendev.org/c/starlingx/stx-puppet/+/911945
            https://review.opendev.org/c/starlingx/integ/+/913422

Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 08:49:36 -04:00
data Upgrade sts-silicom app 2023-11-21 17:00:12 -03:00
__init__.py StarlingX open source release updates 2018-05-31 07:35:52 -07:00
test_ceph.py Merge "Removal of K8S Ansible Pb from Conductor init" 2024-01-19 13:33:26 +00:00
test_keystone_listener.py New RESTful API and DB schema for network to address-pools. 2024-03-06 07:34:14 -03:00
test_kube_app_app_operator.py Replace openstack/context library by oslo_context 2023-02-24 16:17:30 -03:00
test_kube_app_image_parser.py Upgrade sts-silicom app 2023-11-21 17:00:12 -03:00
test_kube_app_metadata.py Specify encoding of the file for yaml load 2021-08-13 16:31:52 +00:00
test_manager.py Add retry robustness for Kubernetes upgrade control plane 2024-03-19 08:49:36 -04:00
test_restore.py Add alarm for Restore in progress 2023-04-21 18:07:04 +00:00
test_rpcapi.py Fix LDAP issue for DC subcloud 2024-03-13 14:27:13 -04:00