nfv/nfv/nfv-vim/nfv_vim/strategy
Saba Touheed Mujawar 471d1001e0 Set timeout for KubeHostUpgradeControlPlaneStep to 420s
The KubeHostUpgradeControlPlaneStep timeout was historically set to
600s to give significant headroom for the control-plane upgrade.
This step was known to run long, but with limited data available
the value was set large. The underlying kubeadm
UpgradeManifestTimeout was 5 minutes, so any timeout larger than
300s was ineffective.

This change sets the KubeHostUpgradeControlPlaneStep timeout
to 420s. The value is intentionally engineered to be larger than
the time sysinv needs to complete the Kubernetes upgrade
control-plane step, including retries and failed attempts.

The timeout is engineered using the following equation, which
accounts for retries, the kubeadm upgrade timeout being hit on
each try, and some buffer for the sysinv report callback
mechanism.

nfv_timeout = ImageDownloadTime
              + retries * (UpgradeControlPlaneTimeout + buffer)

The engineered parameters are:

ImageDownloadTime = 0s (images are pre-pulled before this step)
UpgradeControlPlaneTimeout (kubeadm UpgradeManifestTimeout) = 3 minutes
buffer = 30s
retries = 2

Result:
Engineered puppet timeout for the upgrade control-plane step:
= UpgradeControlPlaneTimeout + buffer = 3 * 60s + 30s = 210s

Engineered NFV timeout:
= 0s + 2 * (180s + 30s) = 420s
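
The arithmetic above can be checked with a small sketch. The helper
names (puppet_timeout, nfv_timeout) are hypothetical and are not
nfv_vim or sysinv APIs. The sketch assumes, as the equation does,
that "retries" is the total number of control-plane attempts.

```python
# Hypothetical helpers reproducing the timeout equation from the
# commit message; names are illustrative, not nfv_vim APIs.


def puppet_timeout(upgrade_control_plane_timeout: int, buffer: int) -> int:
    """Per-attempt budget: kubeadm control-plane timeout plus report buffer."""
    return upgrade_control_plane_timeout + buffer


def nfv_timeout(image_download_time: int, retries: int,
                upgrade_control_plane_timeout: int, buffer: int) -> int:
    """NFV step timeout = image download time + retries * per-attempt budget."""
    return image_download_time + retries * puppet_timeout(
        upgrade_control_plane_timeout, buffer)


# Engineered parameters in seconds, as listed in the commit message.
IMAGE_DOWNLOAD_TIME = 0                 # images are pre-pulled
UPGRADE_CONTROL_PLANE_TIMEOUT = 3 * 60  # kubeadm UpgradeManifestTimeout
BUFFER = 30                             # sysinv report callback headroom
RETRIES = 2                             # total attempts assumed by the formula

if __name__ == "__main__":
    print(puppet_timeout(UPGRADE_CONTROL_PLANE_TIMEOUT, BUFFER))  # 210
    print(nfv_timeout(IMAGE_DOWNLOAD_TIME, RETRIES,
                      UPGRADE_CONTROL_PLANE_TIMEOUT, BUFFER))     # 420
```

Keeping the per-attempt budget as its own function makes it explicit
that the NFV timeout is a multiple of the puppet timeout, so changing
the kubeadm timeout or buffer scales the NFV value consistently.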

Test Plan:
PASS: Perform an orchestrated k8s upgrade and manually STOP the
      kubeadm process during the k8s upgrade control-plane step.
      Check the logs to verify the puppet timeout, and verify that
      sysinv exercises its retry mechanism before the NFV timeout.
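
The expectation in this test can be modelled with a rough sketch.
It assumes that, with kubeadm stopped, every attempt runs for the
full 210s puppet timeout, and that an attempt finishing exactly at
the NFV timeout still counts as completed. The function name is
hypothetical and does not reflect how sysinv or nfv_vim track time.

```python
from typing import List

# Hypothetical model of the test scenario: kubeadm is stopped, so each
# control-plane attempt only ends when the puppet timeout expires.


def attempts_before_nfv_timeout(nfv_timeout: int, per_attempt: int,
                                max_attempts: int) -> List[int]:
    """Return the finish time (seconds) of each failed attempt that
    completes no later than the NFV step timeout."""
    finish_times: List[int] = []
    elapsed = 0
    for _ in range(max_attempts):
        elapsed += per_attempt
        if elapsed > nfv_timeout:
            break  # NFV would abort the step before this attempt ends
        finish_times.append(elapsed)
    return finish_times


if __name__ == "__main__":
    # Both engineered attempts finish within the 420s NFV timeout.
    print(attempts_before_nfv_timeout(420, 210, 2))  # [210, 420]
```

Under these assumptions the 420s value lets sysinv run and report
both attempts before NFV gives up on the step.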

Partial-Bug: 2056326

Change-Id: I73ab8ea7cd7fc3816372260983c4b54a02cdcc4c
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 13:44:09 -04:00
__init__.py Nfv upgrade orchestration for kube-upgrade-storage 2023-12-18 09:08:44 -03:00
_strategy.py Merge "Alarm 900.701 raised on failing to remove node taint." 2024-02-27 17:41:22 +00:00
_strategy_defs.py Add retry at nfv orchestration level 2024-03-08 06:05:04 -05:00
_strategy_phases.py Clean up imports based on flake8 2018-09-20 16:43:28 -05:00
_strategy_stages.py Nfv upgrade orchestration for kube-upgrade-storage 2023-12-18 09:08:44 -03:00
_strategy_steps.py Set timeout for KubeHostUpgradeControlPlaneStep to 420s 2024-03-19 13:44:09 -04:00