nfv/nfv
Saba Touheed Mujawar 471d1001e0 Set timeout for KubeHostUpgradeControlPlaneStep to 420s
The KubeHostUpgradeControlPlaneStep timeout was previously 600s to give
significant headroom for the control-plane upgrade. This step was known
to run long, but we had limited data, so we chose a large value.
However, the underlying kubeadm UpgradeManifestTimeout was 5 minutes,
so any timeout larger than 300s was ineffective.

This change sets the KubeHostUpgradeControlPlaneStep timeout to 420s.
The value is intentionally engineered to be larger than the worst-case
time sysinv needs to complete the Kubernetes upgrade control-plane
step, including retries and attempts that fail.

The timeout is engineered using the following equation, which accounts
for retries, each try hitting the kubeadm upgrade timeout, and some
buffer for the sysinv report callback mechanism.

nfv_timeout = ImageDownloadTime + retries * (UpgradeControlPlaneTimeout + buffer)

The engineered parameters are:

ImageDownloadTime = 0s (images are pre-pulled before this step)
UpgradeControlPlaneTimeout = 3 minutes (kubeadm UpgradeManifestTimeout)
buffer = 30s
retries = 2

Result:
Engineered puppet timeout for upgrade control-plane:
= UpgradeControlPlaneTimeout + buffer = 3*60s + 30s = 210s

Engineered NFV timeout:
= 0s + 2 * (180s + 30s) = 420s
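The arithmetic above can be checked with a short Python sketch. The
function and constant names are hypothetical and are not taken from the
nfv-vim or sysinv code; only the parameter values come from this commit
message, and "retries" is used as the multiplier exactly as in the
equation.

```python
# Hypothetical names; only the values come from the commit message.
IMAGE_DOWNLOAD_TIME_S = 0                 # images are pre-pulled before this step
UPGRADE_CONTROL_PLANE_TIMEOUT_S = 3 * 60  # kubeadm upgrade timeout per try
BUFFER_S = 30                             # headroom for the sysinv report callback
RETRIES = 2                               # multiplier used in the equation


def puppet_timeout(upgrade_timeout_s: int, buffer_s: int) -> int:
    """Per-try puppet timeout: kubeadm upgrade timeout plus callback buffer."""
    return upgrade_timeout_s + buffer_s


def nfv_timeout(image_download_s: int, retries: int,
                upgrade_timeout_s: int, buffer_s: int) -> int:
    """NFV step timeout: image download plus every try hitting its puppet timeout."""
    return image_download_s + retries * puppet_timeout(upgrade_timeout_s, buffer_s)


if __name__ == "__main__":
    per_try = puppet_timeout(UPGRADE_CONTROL_PLANE_TIMEOUT_S, BUFFER_S)
    total = nfv_timeout(IMAGE_DOWNLOAD_TIME_S, RETRIES,
                        UPGRADE_CONTROL_PLANE_TIMEOUT_S, BUFFER_S)
    print(f"puppet timeout per try: {per_try}s")  # 210s
    print(f"NFV step timeout: {total}s")          # 420s
```

Keeping the per-try puppet timeout as its own function makes it clear
that the NFV timeout is simply that value multiplied by the number of
tries, plus any image download time.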

Test Plan:
PASS: Perform an orchestrated k8s upgrade and manually STOP the kubeadm
      process during the k8s upgrade control-plane step. Check the logs
      to verify the puppet timeout, and verify that sysinv exercises its
      retry mechanism before the NFV timeout expires.

Partial-Bug: 2056326

Change-Id: I73ab8ea7cd7fc3816372260983c4b54a02cdcc4c
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 13:44:09 -04:00
centos Kube rootca update orchestration integration 2021-09-02 12:53:36 -05:00
debian Update debian package versions to use git commits 2023-02-09 17:00:49 +00:00
nfv-client NFV API to list current strategy type and state. 2024-03-11 12:03:49 +00:00
nfv-common Implement system_config_update orchestration 2023-07-17 17:36:44 -04:00
nfv-debug-tools/histogram_analysis Not require recreate of tox env when running tox 2021-04-06 09:48:36 -05:00
nfv-plugins Alarm 900.701 raised on failing to remove node taint. 2024-02-15 12:12:32 -05:00
nfv-tests Merge "Alarm 900.701 raised on failing to remove node taint." 2024-02-27 17:41:22 +00:00
nfv-tools small cleanup required by OBS badness check - exec rights on non executable not allowed 2019-09-17 08:54:22 +02:00
nfv-vim Set timeout for KubeHostUpgradeControlPlaneStep to 420s 2024-03-19 13:44:09 -04:00
opensuse Add opensuse specfiles to nfv 2019-10-02 10:34:02 -05:00
.coveragerc Convert NFV unit tests from nose to stestr 2018-09-18 12:56:44 -05:00
.gitignore Add bugbear to flake8 and cleanup some errors 2018-09-13 14:12:48 -05:00
.stestr.conf Fix relative imports in nfv 2023-01-24 22:16:39 +00:00
PKG-INFO StarlingX open source release updates 2018-05-31 07:36:51 -07:00
pylint.rc pylint cleanup for nfv to use standard modules 2023-03-15 15:28:54 +00:00
test-requirements.txt Replace mock with unittest.mock 2023-01-24 22:13:59 +00:00
tox.ini Cleanup pep8 un-used variable warnings 2023-03-08 15:18:00 +00:00