The KubeHostUpgradeControlPlaneStep timeout was originally set to
600s to give significant headroom for the control-plane upgrade.
This step was known to run long, but we had limited data, so
we set the value large. However, the underlying kubeadm
UpgradeManifestTimeout was 5 minutes, so any timeout larger than
300s was ineffective.
This change updates the KubeHostUpgradeControlPlaneStep timeout
to 420s. The value is intentionally engineered to be larger than
the time the sysinv code needs to complete the Kubernetes upgrade
control-plane step, including retries and accounting for failure.
The timeout is engineered using the following equation, which
accounts for retries, hitting the kubeadm upgrade timeout on
each try, and some buffer for the sysinv report callback
mechanism.
  nfv_timeout = ImageDownloadTime +
                retries * (UpgradeControlPlaneTimeout + buffer)
The engineered parameters are as follows:
  ImageDownloadTime = 0s (images are pre-pulled before this step)
  UpgradeManifestTimeout = 3 minutes
  buffer = 30s
  retries = 2
Result:
Engineered puppet timeout for upgrade control-plane:
  = UpgradeControlPlaneTimeout + buffer = 3 * 60s + 30s = 210s
Engineered NFV timeout:
  = 0s + 2 * (180s + 30s) = 420s
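The arithmetic above can be checked with a short sketch. This is only
an illustration of the equation: the function and constant names are
hypothetical and are not taken from the nfv or sysinv code.

```python
# Illustrative check of the engineered timeouts. All names here are
# hypothetical; they are not identifiers from nfv or sysinv.

IMAGE_DOWNLOAD_TIME_S = 0                 # images are pre-pulled before this step
UPGRADE_CONTROL_PLANE_TIMEOUT_S = 3 * 60  # engineered kubeadm upgrade timeout
BUFFER_S = 30                             # allowance for the sysinv report callback
RETRIES = 2


def puppet_timeout(upgrade_timeout_s, buffer_s):
    """Time allowed for a single upgrade control-plane attempt."""
    return upgrade_timeout_s + buffer_s


def nfv_timeout(image_download_s, retries, upgrade_timeout_s, buffer_s):
    """Total time allowed, assuming every attempt hits the upgrade timeout."""
    per_attempt = puppet_timeout(upgrade_timeout_s, buffer_s)
    return image_download_s + retries * per_attempt


print(puppet_timeout(UPGRADE_CONTROL_PLANE_TIMEOUT_S, BUFFER_S))  # 210
print(nfv_timeout(IMAGE_DOWNLOAD_TIME_S, RETRIES,
                  UPGRADE_CONTROL_PLANE_TIMEOUT_S, BUFFER_S))      # 420
```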
Test Plan:
PASS: Perform an orchestrated k8s upgrade and manually STOP the
      kubeadm process during the k8s upgrade control-plane step.
      Check the logs to verify the puppet timeout, and verify that
      sysinv attempts its retry mechanism before the nfv timeout.
Partial-Bug: 2056326
Change-Id: I73ab8ea7cd7fc3816372260983c4b54a02cdcc4c
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>