metal/mtce-common/cgts-mtce-common-1.0/common
Eric MacDonald acd2d684f6 Mtce: Debouce heartbeat recovery
When heartbeat to a host fails, the Mtce Heartbeat Agent declares
heartbeat recovery upon the first successful heartbeat reply after the
loss is declared; that is, recovery is edge triggered.

In cases where a networking issue causes heartbeat loss for a group of
hosts, Maintenance tracks the group of hosts that experienced heartbeat
loss and puts the system into 'Multi Node Failure Avoidance' (MNFA) mode.
Maintenance then simply waits up to a configured timeout period for
hosts to regain heartbeat.
As heartbeat is regained for each host, Maintenance attempts to
'Gracefully Recover' that host.

However, if the networking issue persists in a way that lets an
occasional transient heartbeat pulse through, the maintenance system can
prematurely take hosts, and then 'the system', out of MNFA mode. It then
finds that heartbeat has not actually recovered, fails, and force
reboots/resets each node that is still experiencing heartbeat loss.

This update changes the heartbeat service from 'edge' to 'level'
sensitive recovery by requiring a number of back-to-back heartbeat pulses
following a failure before that host is declared as recovered and pulled
out of the MNFA pool.
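
The debounce described above can be sketched as a small per-host counter.
This is an illustration only, not the code from this change: the class
name, method names and the required pulse count are all hypothetical, and
the real threshold used by Mtce is not stated here.

```cpp
#include <cassert>

// Hypothetical sketch of 'level' sensitive heartbeat recovery.
// None of these names come from the Mtce source tree.
class HeartbeatDebouncer {
public:
    // required_pulses: assumed number of back-to-back replies needed
    // before a failed host is treated as recovered.
    explicit HeartbeatDebouncer(int required_pulses)
        : required_(required_pulses) {
        assert(required_pulses > 0);
    }

    // Call when heartbeat loss is declared for the host.
    void on_failure() {
        failed_ = true;
        streak_ = 0;
    }

    // Call once per heartbeat period; 'replied' is true if the host
    // answered. Returns true only on the pulse that completes recovery.
    bool on_pulse(bool replied) {
        if (!failed_) return false;      // nothing to recover from
        if (!replied) {                  // a miss resets the streak
            streak_ = 0;
            return false;
        }
        if (++streak_ < required_) return false;
        failed_ = false;                 // enough consecutive replies
        streak_ = 0;
        return true;
    }

    bool failed() const { return failed_; }

private:
    int  required_;
    int  streak_ = 0;
    bool failed_ = false;
};
```

With a required count of 1 this reduces to the old edge triggered
behaviour; with a larger count a single stray pulse during a network
fault no longer pulls the host out of the MNFA pool.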

Basically, this update makes the system's MNFA recovery algorithm more
robust in the face of transient heartbeat loss for a group of hosts.

Story: 2002882
Task: 22845

Change-Id: Ie36b73a14cfad317d900e3a3a9ddb434326737a1
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-07-20 11:12:19 -04:00
..
Makefile StarlingX open source release updates 2018-05-31 07:36:43 -07:00
alarmUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
alarmUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
fitCodes.h Mtce: Implement all token fetches as non-blocking operations. 2018-06-27 15:00:23 -04:00
fsync.c StarlingX open source release updates 2018-05-31 07:36:43 -07:00
hostClass.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
hostClass.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
hostUtil.cpp Controller Services swact/failover time reduction 2018-06-28 15:51:50 -04:00
hostUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
httpUtil.cpp Mtce: Implement all token fetches as non-blocking operations. 2018-06-27 15:00:23 -04:00
httpUtil.h Mtce: Implement all token fetches as non-blocking operations. 2018-06-27 15:00:23 -04:00
ipmiUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
ipmiUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
jsonUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
jsonUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
keyClass.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
keyClass.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
logMacros.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
msgClass.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
msgClass.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nlEvent.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nlEvent.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeBase.cpp Collectd+InfluxDb-RMON Replacement(ALL METRICS) P1 2018-07-03 11:04:27 -04:00
nodeBase.h Mtce: Debouce heartbeat recovery 2018-07-20 11:12:19 -04:00
nodeClass.cpp Mtce: Debouce heartbeat recovery 2018-07-20 11:12:19 -04:00
nodeClass.h Mtce: Debouce heartbeat recovery 2018-07-20 11:12:19 -04:00
nodeCmds.h Add 90s delay before locking storage node for upgrade 2018-07-06 09:18:21 -04:00
nodeEvent.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeEvent.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeMacro.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeTimers.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeTimers.h Add 90s delay before locking storage node for upgrade 2018-07-06 09:18:21 -04:00
nodeUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
nodeUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pgdbClass.cpp.OBS StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pgdbClass.h.OBS StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pgdbUtil.cpp.OBS StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pingUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pingUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
regexUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
regexUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
returnCodes.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
threadUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
threadUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
timeUtil.cpp StarlingX open source release updates 2018-05-31 07:36:43 -07:00
timeUtil.h StarlingX open source release updates 2018-05-31 07:36:43 -07:00
tokenUtil.cpp Mtce: Implement all token fetches as non-blocking operations. 2018-06-27 15:00:23 -04:00
tokenUtil.h Mtce: Implement all token fetches as non-blocking operations. 2018-06-27 15:00:23 -04:00