Increase mtce host offline threshold to handle slow host shutdown

Mtce polls the remote host for mtcAlive messages over
46 x 100 ms intervals in the unlock and host failed cases.
Absence of mtcAlive throughout this (~5 sec) period
indicates the node is offline.

However, when shutdown is slow, 5 seconds is not long
enough. Rare cases have been seen where a 7 or 8 second
wait is required to properly declare the host offline.

To avoid the rare transient 200.004 host alarm over an
unlock operation, this update increases the mtce host
offline window from 5 to 10 seconds (approx) by changing
the offline threshold in the mtce configuration file from
46 to 90.
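
The window arithmetic can be sketched as follows. This is an
illustrative calculation only, not mtce code: the function name
is hypothetical, and the inputs are the offline_period and the
old and new offline_threshold values shown in the configuration
diff. The result is the nominal polling time; the config comments
round it to "typical" 5 and 10 sec holdoffs.

```python
def offline_window_secs(offline_period_ms: int, offline_threshold: int) -> float:
    """Nominal time a host must stay silent before it is declared offline.

    Hypothetical helper: multiplies the audit period by the number of
    back-to-back missed mtcAlive requests needed to declare offline.
    """
    return offline_period_ms * offline_threshold / 1000.0


# Values taken from the mtce configuration diff in this commit.
old_window = offline_window_secs(100, 46)
new_window = offline_window_secs(100, 90)

print(f"old window: {old_window} s, new window: {new_window} s")
```

The slow shutdowns described above needed 7 or 8 seconds, which
falls outside the old nominal window but inside the new one.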

Test Plan:

PASS: Verify the unchallenged failed-to-offline period is ~10 secs
PASS: Verify algorithm restarts if there is mtcAlive received
      anytime during the polls/queries (challenge) window.
PASS: Verify challenge handling leads to a longer but
      successful offline declaration.
PASS: Verify above handling for both unlock and spontaneous
      failure handling cases.
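
The challenge behavior exercised by the test plan can be modeled
with a short simulation. This is a minimal sketch under stated
assumptions, not the actual mtce implementation: the function
name and the list-of-booleans input are hypothetical, and it
assumes that any mtcAlive received during the window resets the
missed count to zero.

```python
def polls_until_offline(mtcalive_seen, offline_threshold=90):
    """Return the 1-based poll at which the host is declared offline.

    mtcalive_seen: iterable of booleans, one per offline_period poll,
    True when an mtcAlive message arrived during that poll.
    Returns None if the threshold is never reached.
    """
    missed = 0
    for poll, seen in enumerate(mtcalive_seen, start=1):
        if seen:
            missed = 0  # challenge: the count of misses restarts
        else:
            missed += 1
            if missed >= offline_threshold:
                return poll
    return None


# Unchallenged: no mtcAlive at all, offline after 90 polls (~9 s at 100 ms).
unchallenged = [False] * 200
# Challenged: one mtcAlive on poll 51 restarts the count.
challenged = [False] * 50 + [True] + [False] * 200

print(polls_until_offline(unchallenged))  # 90
print(polls_until_offline(challenged))    # 141
```

In this model a challenged host is still declared offline, just
later: 51 polls are spent before the restart, then a full 90
consecutive misses follow.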

Closes-Bug: 2024249
Change-Id: Ice41ed611b4ba71d9cf8edbfe98da4b65dcd05cf
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Eric MacDonald 2023-06-16 18:11:53 +00:00
parent e8bbc8c6d3
commit d863aea172
1 changed files with 2 additions and 2 deletions

@@ -8,8 +8,8 @@ hbs_minor_threshold = 4 ; Heartbeat minor threshold count.
 ; minor notification to maintenance.
 offline_period = 100 ; number of msecs to wait for each offline audit
-offline_threshold = 46 ; number of back to back mtcAlive requests missed
-; 100:46 will yield a typical 5 sec holdoff from
+offline_threshold = 90 ; number of back to back mtcAlive requests missed
+; 100:90 will yield a typical 10 sec holdoff from
 ; failed to offline
 inventory_port = 6385 ; The Inventory Port Number