metal/mtce/src/common
Eric MacDonald 2210c71216 Fix Mtce Heartbeat period recovery on MNFA Exit
When Multi-Node Failure Avoidance (MNFA) occurs,
maintenance commands the Heartbeat Agent to slow
down by a factor of 4.

The heartbeat rate is not being recovered following an MNFA.

Update https://review.opendev.org/#/c/701057 made
a condition check change that introduced this issue
by requiring mnfa_timeout to be non-zero before an
attempt is made to recover heartbeat period following
MNFA recovery.

This update switches that condition check to use the more
specific mnfa_backoff state tracker. Because MNFA is a
global maintenance mode feature rather than a node-specific
feature, it also moves the recovery check code from the
node-level fsm into a mnfa_recovery_handler called in the
main select loop.
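
The change described above can be pictured with the minimal sketch below.
It is not the actual mtcAgent code. Only mnfa_backoff, mnfa_timeout and
mnfa_recovery_handler are names taken from this commit message. The struct,
the helper functions, the 100 msec base period and the function signatures
are assumptions made purely for illustration.

```cpp
// Illustrative sketch only; everything except the names mnfa_backoff,
// mnfa_timeout and mnfa_recovery_handler is hypothetical.

// Assumed normal heartbeat period; the real value is not given here.
constexpr int BASE_PERIOD_MSEC = 100;
// "slow down by a factor of 4" from the commit message.
constexpr int MNFA_BACKOFF_FACTOR = 4;

struct MtceState {
    bool mnfa_active     = false;            // MNFA mode currently in effect
    bool mnfa_backoff    = false;            // heartbeat period currently slowed
    int  mnfa_timeout    = 0;                // 0 means no timeout configured
    int  hbs_period_msec = BASE_PERIOD_MSEC; // last period sent to the Heartbeat Agent
};

// Stand-in for the message that tells the Heartbeat Agent its period.
void send_hbs_period(MtceState& s, int period_msec) {
    s.hbs_period_msec = period_msec;
}

// Entering MNFA slows the heartbeat and records that fact in mnfa_backoff.
void mnfa_enter(MtceState& s) {
    s.mnfa_active  = true;
    s.mnfa_backoff = true;
    send_hbs_period(s, BASE_PERIOD_MSEC * MNFA_BACKOFF_FACTOR);
}

// Leaving MNFA only clears the mode; rate recovery is left to the handler.
void mnfa_exit(MtceState& s) {
    s.mnfa_active = false;
}

// Called once per pass of the main loop, not per node.
// The check keys off mnfa_backoff, so it works whether or not
// mnfa_timeout is zero. Returns true when the rate was restored.
bool mnfa_recovery_handler(MtceState& s) {
    if (!s.mnfa_active && s.mnfa_backoff) {
        send_hbs_period(s, BASE_PERIOD_MSEC);
        s.mnfa_backoff = false;
        return true;
    }
    return false;
}
```

In this sketch the recovery decision never reads mnfa_timeout, which is the
point of the fix: a zero timeout no longer prevents the period from being
restored.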

Test Plan:

PASS: Verify MNFA handling/recovery with mnfa_timeout!=0
             that expires.
PASS: Verify MNFA handling/recovery when mnfa_timeout!=0
             but before the timeout expires.
PASS: Verify MNFA handling/recovery when mnfa_timeout=0
PASS: Verify MNFA backoff rate recovery over mtcAgent
             process restart.
PASS: Verify MNFA backoff rate is sent to hbsAgent if
             hbsAgent restarts while MNFA is active.

Change-Id: I8da5a000ab503692c7cfa620233ed8aa772c50f8
Closes-Bug: #1893212
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-09-18 01:34:11 +00:00
Makefile Set SHELL in Makefiles that use bash constructs 2018-12-07 14:09:48 -06:00
nodeClass.cpp Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
nodeClass.h Fix Mtce Heartbeat period recovery on MNFA Exit 2020-09-18 01:34:11 +00:00
nodeCmds.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00