From d863aea1729f0aa6cb17bbf8aec5f9fd3e9cc371 Mon Sep 17 00:00:00 2001
From: Eric MacDonald
Date: Fri, 16 Jun 2023 18:11:53 +0000
Subject: [PATCH] Increase mtce host offline threshold to handle slow host shutdown

Mtce polls/queries the remote host for mtcAlive messages for
46 x 100 ms intervals over unlock or host failed cases. Absence
of mtcAlive during this (~5 sec) period indicates the node is
offline. However, in the rare case where shutdown is slow,
5 seconds is not long enough. Rare cases have been seen where
7 or 8 second wait time is required to properly declare offline.

To avoid the rare transient 200.004 host alarm over an unlock
operation, this update increases the mtce host offline window
from 5 to 10 seconds (approx) by modifying the mtce configuration
file offline threshold from 46 to 90.

Test Plan:

PASS: Verify unchallenged failed to offline period to be ~10 secs
PASS: Verify algorithm restarts if there is mtcAlive received
      anytime during the polls/queries (challenge) window.
PASS: Verify challenge handling leads to a longer but successful
      offline declaration.
PASS: Verify above handling for both unlock and spontaneous
      failure handling cases.

Closes-Bug: 2024249
Change-Id: Ice41ed611b4ba71d9cf8edbfe98da4b65dcd05cf
Signed-off-by: Eric MacDonald
---
 mtce/src/scripts/mtc.conf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mtce/src/scripts/mtc.conf b/mtce/src/scripts/mtc.conf
index 461766b0..df4db4fa 100644
--- a/mtce/src/scripts/mtc.conf
+++ b/mtce/src/scripts/mtc.conf
@@ -8,8 +8,8 @@ hbs_minor_threshold = 4 ; Heartbeat minor threshold count.
                            ; minor notification to maintenance.

 offline_period = 100       ; number of msecs to wait for each offline audit
-offline_threshold = 46     ; number of back to back mtcAlive requests missed
-                           ; 100:46 will yield a typical 5 sec holdoff from
+offline_threshold = 90     ; number of back to back mtcAlive requests missed
+                           ; 100:90 will yield a typical 10 sec holdoff from
                            ; failed to offline

 inventory_port = 6385      ; The Inventory Port Number
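
The offline audit described in the commit message can be sketched roughly as follows. This is an illustrative model only, not the mtce implementation: the function names (`offline_window_ms`, `polls_until_offline`) and the exact comparison used to declare offline are assumptions made for the example. It shows how `offline_period` and `offline_threshold` combine into a nominal polling window (100 ms x 46 = 4.6 s before, 100 ms x 90 = 9.0 s after, which the patch describes as approximately 5 and 10 seconds) and how a received mtcAlive restarts the count.

```python
from typing import Iterable, Optional


def offline_window_ms(period_ms: int, threshold: int) -> int:
    """Nominal polling time before offline is declared (no overhead modeled)."""
    return period_ms * threshold


def polls_until_offline(alive_seen: Iterable[bool], threshold: int) -> Optional[int]:
    """Return the 1-based poll at which offline would be declared, else None.

    alive_seen yields one bool per audit poll: True if mtcAlive was received.
    Assumption: offline is declared after `threshold` back-to-back misses,
    and any received mtcAlive (a "challenge") restarts the count.
    """
    missed = 0
    for poll, alive in enumerate(alive_seen, start=1):
        if alive:
            missed = 0  # challenge: restart the back-to-back miss count
        else:
            missed += 1
            if missed >= threshold:
                return poll
    return None


if __name__ == "__main__":
    print(offline_window_ms(100, 46))  # old setting, nominal window in ms
    print(offline_window_ms(100, 90))  # new setting, nominal window in ms
    # One mtcAlive at poll 50 delays, but does not prevent, the declaration.
    print(polls_until_offline([False] * 49 + [True] + [False] * 90, 90))
```

In this model a challenged audit takes longer but still ends in an offline declaration once the host stays silent for a full threshold of polls, which is the behavior the test plan verifies.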