config/sysinv/sysinv/sysinv/sysinv/puppet
Saba Touheed Mujawar 4c42927040 Add retry robustness for Kubernetes upgrade control plane
A rare, intermittent failure can occur during the upgrading control
plane step: either puppet hits its timeout before the upgrade
completes, or kubeadm hits its own Upgrade Manifest timeout (5m).

This change retries the process by reporting a failure to the
conductor when the puppet manifest apply fails. Because the manifest
apply is triggered through RPC messages with options, the return code
is not available directly, so a retry decorator cannot be used.
Instead, the sysinv report callback feature handles the success and
failure paths.
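The retry flow described above can be sketched as a small state machine driven by the report callback. This is a minimal sketch under stated assumptions, not the sysinv implementation. The class `ControlPlaneUpgrade`, the method `on_manifest_report`, the action strings, the `upgrading-second-master` value and the `-failed` state suffix are all hypothetical. Only the `upgrading-first-master` step name and the limit of 2 retries come from the test plan.

```python
from dataclasses import dataclass

# Hypothetical state strings for illustration. "upgrading-first-master"
# appears in the test plan; the second-master value is an assumption.
UPGRADING_FIRST_MASTER = "upgrading-first-master"
UPGRADING_SECOND_MASTER = "upgrading-second-master"
RETRYABLE_STATES = {UPGRADING_FIRST_MASTER, UPGRADING_SECOND_MASTER}

# The test plan reports that the code retries 2 times before failing.
MAX_RETRIES = 2


@dataclass
class ControlPlaneUpgrade:
    """Hypothetical conductor-side view of one control-plane upgrade."""

    state: str
    retries: int = 0

    def on_manifest_report(self, success: bool) -> str:
        """Handle a success/failure report from the puppet manifest apply.

        Returns the action the conductor would take:
        "done", "retry", "failed" or "ignore".
        """
        if success:
            self.retries = 0
            return "done"
        if self.state not in RETRYABLE_STATES:
            # e.g. the upgrade was aborted meanwhile: do not retry.
            return "ignore"
        if self.retries < MAX_RETRIES:
            # Re-send the manifest apply request (asynchronously, via RPC).
            self.retries += 1
            return "retry"
        # Retries exhausted: record the failure state (assumed naming).
        self.state = self.state + "-failed"
        return "failed"


if __name__ == "__main__":
    upgrade = ControlPlaneUpgrade(UPGRADING_FIRST_MASTER)
    for _ in range(3):
        print(upgrade.on_manifest_report(False), upgrade.state)
```

For example, three consecutive failure reports in `upgrading-first-master` yield `retry`, `retry`, then `failed`, while a failure report in any other state (for example after an abort) yields `ignore`. Because the callback receives the result after the fact, the retry counter has to live with the upgrade state rather than in a wrapper around the call.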

TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send a STOP signal to pause the process so that the
      puppet manifest times out, and check that the retry code works
      and that the upgrade completes in a retry attempt.
PASS: Manually decrease the puppet timeout to a very low value
      and verify that the code retries 2 times and then updates the
      failure state.
PASS: Perform an orchestrated k8s upgrade, manually send a STOP
      signal to pause the kubeadm process during step
      upgrading-first-master, and perform system kube-upgrade-abort.
      Verify that the upgrade is aborted successfully, and that the
      code does not attempt the retry mechanism for k8s upgrade
      control-plane because it is not in the desired
      KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER
      state.
PASS: Perform a manual k8s upgrade; on k8s upgrade control-plane
      failure, perform a manual upgrade-abort successfully.
      Perform an orchestrated k8s upgrade; on k8s upgrade control-plane
      failure after retries, nfv aborts automatically.

Closes-Bug: 2056326

Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
            https://review.opendev.org/c/starlingx/stx-puppet/+/911945
            https://review.opendev.org/c/starlingx/integ/+/913422

Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 08:49:36 -04:00
__init__.py Open vSwitch integration with host and configuration framework 2018-06-14 16:03:52 -05:00
barbican.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
base.py Introduce Puppet variables for primary and secondary pool addresses. 2024-03-12 07:25:46 -03:00
ceph.py System mode modify fails for duplex systems 2023-02-08 11:02:45 -03:00
certalarm.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
certmon.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
common.py Add retry robustness for Kubernetes upgrade control plane 2024-03-19 08:49:36 -04:00
dcdbsync.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
dcmanager.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
dcorch.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
device.py Revert "Remove Nova prefix from constants" 2023-09-22 05:20:10 +00:00
dockerdistribution.py Support authenticated registries 2019-10-02 11:30:43 -04:00
fm.py Remove the use of the mgmt_ip field in host table 2023-11-01 10:30:21 -04:00
horizon.py Fix: "import" issue for Python 2/3 compatible code 2018-12-25 08:58:03 +08:00
interface.py New RESTful API and DB schema for network to address-pools. 2024-03-06 07:34:14 -03:00
inventory.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
keystone.py Remove the use of the mgmt_ip field in host table 2023-11-01 10:30:21 -04:00
kubernetes.py Remove support for ignoring k8s isolated CPUs in sysinv 2024-02-14 17:47:15 +00:00
ldap.py Upgrade changes to support MGMT FQDN 2024-03-05 12:42:21 -03:00
mtce.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
networking.py Merge "Introduce Puppet variables for primary and secondary pool addresses." 2024-03-13 13:33:01 +00:00
nfv.py Initial integration of DC with admin network 2023-01-10 16:47:02 +00:00
openstack.py Upgrade changes to support MGMT FQDN 2024-03-05 12:42:21 -03:00
ovs.py Customize sysinv dpdk_elf_file for OVS-DPDK 2023-06-29 20:16:43 -03:00
patching.py Use FQDN for MGMT network 2023-10-31 20:45:40 -04:00
platform.py Upgrade changes to support MGMT FQDN 2024-03-05 12:42:21 -03:00
platform_firewall.py New RESTful API and DB schema for network to address-pools. 2024-03-06 07:34:14 -03:00
puppet.py Use correct hiera file for downgrade 2024-03-11 13:19:37 +00:00
rook.py Introduce rook ceph 2021-01-27 06:46:02 +08:00
service_parameter.py Kubernetes custom configuration support: runtime. 2022-09-13 15:50:22 -04:00
smapi.py Remove the use of the mgmt_ip field in host table 2023-11-01 10:30:21 -04:00
sssd.py Add SSSD ldap password expiration control 2023-08-14 13:04:01 +00:00
storage.py Add resize to / 2023-02-09 20:12:10 -05:00
usm.py Enable puppet hiera setup for USM 2023-07-19 18:48:53 +00:00