A rare intermittent failure can occur during the control-plane
upgrade step: either puppet times out before the upgrade completes,
or kubeadm hits its own Upgrade Manifest timeout (5m).
This change retries the step by reporting failure to the conductor
when the puppet manifest apply fails.
Because the request is sent as an RPC message with options, the
return code is not available directly, so a retry decorator cannot
be used. Instead, the sysinv report callback feature handles the
success and failure paths.
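The callback-driven retry described above can be sketched roughly as
follows. This is an illustrative outline only, not the sysinv code:
the class, method names, and the "upgraded" / "upgrade-failed" /
"upgrading-second-master" strings are hypothetical placeholders, and
the retry limit of 2 is taken from the test plan.

```python
# Hypothetical sketch of a callback-driven retry; not the sysinv API.
KUBE_UPGRADING_FIRST_MASTER = "upgrading-first-master"
KUBE_UPGRADING_SECOND_MASTER = "upgrading-second-master"  # assumed value
RETRYABLE_STATES = (KUBE_UPGRADING_FIRST_MASTER,
                    KUBE_UPGRADING_SECOND_MASTER)
MAX_RETRIES = 2  # assumption based on the test plan


class ControlPlaneUpgrade:
    def __init__(self, apply_manifest, state=KUBE_UPGRADING_FIRST_MASTER):
        # apply_manifest is fire-and-forget (like an RPC cast): it
        # returns nothing, so the result arrives via report_callback.
        self.apply_manifest = apply_manifest
        self.state = state
        self.retries = 0

    def start(self):
        self.apply_manifest()

    def report_callback(self, success):
        """Called when the agent reports the manifest apply result."""
        if success:
            self.state = "upgraded"  # placeholder state name
            return
        if self.state not in RETRYABLE_STATES:
            # For example, the upgrade was aborted: do not retry.
            return
        if self.retries < MAX_RETRIES:
            self.retries += 1
            self.apply_manifest()
        else:
            self.state = "upgrade-failed"  # placeholder state name
```

A plain retry decorator would need the return value of the call it
wraps; here the outcome is only known when the callback fires, so the
retry counter lives in the object that receives the report.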
TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send a STOP signal to pause the process so that the
puppet manifest times out; verify that the retry code runs
and that the upgrade completes on a retry attempt.
PASS: Manually decrease the puppet timeout to a very low value
and verify that the code retries 2 times and then updates the
failure state.
PASS: Perform an orchestrated k8s upgrade, manually send a STOP
signal to pause the kubeadm process during the
upgrading-first-master step, and run system kube-upgrade-abort.
Verify that the upgrade is aborted successfully and that the
code does not retry the k8s control-plane upgrade, because the
upgrade is not in the required KUBE_UPGRADING_FIRST_MASTER or
KUBE_UPGRADING_SECOND_MASTER state.
PASS: Perform a manual k8s upgrade; on a k8s control-plane upgrade
failure, perform a manual upgrade-abort successfully.
Perform an orchestrated k8s upgrade; on a k8s control-plane
upgrade failure after retries, nfv aborts automatically.
Closes-Bug: 2056326
Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/911945
Depends-on: https://review.opendev.org/c/starlingx/integ/+/913422
Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>