From 471d1001e0eba0adeba8bdfed020df3f3a0b83f9 Mon Sep 17 00:00:00 2001
From: Saba Touheed Mujawar
Date: Wed, 13 Mar 2024 12:54:40 -0400
Subject: [PATCH] Set timeout for KubeHostUpgradeControlPlaneStep to 420s

The KubeHostUpgradeControlPlaneStep timeout was originally set to 600s
to give significant headroom for the control-plane upgrade. This step
was known to run long, but we had limited data, so we set the value
large. The underlying kubeadm UpgradeManifestTimeout was 5 minutes, so
a timeout larger than 300s was ineffective.

This updates the KubeHostUpgradeControlPlaneStep timeout to 420s. The
value is intentionally engineered to be larger than the time the sysinv
code needs to complete the Kubernetes upgrade control-plane step,
including retries and failure handling.

The timeout is engineered using the following equation. It accounts
for retries, hitting the kubeadm upgrade timeout on each try, and some
buffer for the sysinv report callback mechanism.

nfv_timeout = ImageDownloadTime +
              retries * (UpgradeControlPlaneTimeout + buffer)

Following are the engineered parameters:
ImageDownloadTime = 0s (images are pre-pulled before this step)
UpgradeManifestTimeout = 3 minutes
buffer = 30s
retries = 2

Result:
Engineered puppet timeout for upgrade control-plane:
  = UpgradeControlPlaneTimeout + buffer
  = 3*60s + 30s
  = 210s
Engineered NFV timeout:
  = 0s + 2 * (180s + 30s)
  = 420s

Test Plan:
PASS: Perform an orchestrated k8s upgrade and manually STOP the kubeadm
      process during the k8s upgrade control-plane step. Check the logs
      to verify the puppet timeout, and verify that sysinv attempts its
      retry mechanism before the nfv timeout.

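The arithmetic above can be checked with a minimal sketch. The function
and parameter names below are illustrative only and do not exist in the
nfv or sysinv code; the 180s value is the 3-minute upgrade timeout that
the message refers to as both UpgradeManifestTimeout and
UpgradeControlPlaneTimeout.

```python
def engineered_timeouts(image_download_secs=0, retries=2,
                        upgrade_timeout_secs=180, buffer_secs=30):
    """Return (puppet_timeout, nfv_timeout) in seconds.

    Hypothetical helper mirroring the equation in the commit message:
    nfv_timeout = ImageDownloadTime + retries * (upgrade timeout + buffer)
    """
    # Per-attempt budget: kubeadm upgrade timeout plus callback buffer.
    puppet_timeout = upgrade_timeout_secs + buffer_secs
    # Overall budget: every attempt is assumed to hit the full timeout.
    nfv_timeout = image_download_secs + retries * puppet_timeout
    return puppet_timeout, nfv_timeout


puppet_timeout, nfv_timeout = engineered_timeouts()
print(puppet_timeout, nfv_timeout)  # 210 420
```
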
Partial-Bug: 2056326
Change-Id: I73ab8ea7cd7fc3816372260983c4b54a02cdcc4c
Signed-off-by: Saba Touheed Mujawar
---
 nfv/nfv-vim/nfv_vim/strategy/_strategy_steps.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nfv/nfv-vim/nfv_vim/strategy/_strategy_steps.py b/nfv/nfv-vim/nfv_vim/strategy/_strategy_steps.py
index 07890856..245c4052 100755
--- a/nfv/nfv-vim/nfv_vim/strategy/_strategy_steps.py
+++ b/nfv/nfv-vim/nfv_vim/strategy/_strategy_steps.py
@@ -4711,7 +4711,7 @@ class KubeHostUpgradeControlPlaneStep(AbstractKubeHostUpgradeStep):
     """
     def __init__(self, host, to_version, force, target_state,
                  target_failure_state,
-                 timeout_in_secs=600):
+                 timeout_in_secs=420):
         super(KubeHostUpgradeControlPlaneStep, self).__init__(
             host, to_version,