metal/mtce/src/alarm
Eric MacDonald f2fedc0446 Add alarm retry support to maintenance alarm handling daemon
The maintenance alarm handling daemon (mtcalarmd) should not
drop alarm requests simply because FM process is not running.
Instead it should retry for that and other FM error cases that
are likely to succeed in time if they are retried.

Some error cases, however, do need to be dropped, such as those
that are unlikely to succeed with retries.

Reviewed FM return codes with FM designer, which led to a list
of errors that should drop and others that should retry.

This update implements that handling with a posting and
servicing of a first-in / first-out alarm queue.

Typical retry case is the NOCONNECT error code which occurs
when FM is not running.

Alarm ordering and first-try timestamp are maintained.
Retries and logs are throttled to avoid flooding.
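
The queue behaviour described above can be sketched roughly as follows.
This is an illustrative sketch only, not the code in alarmHdlr.cpp or
alarmMgr.cpp. The names FmStatus, Action, classify, AlarmRequest and
AlarmQueue are hypothetical, and the status values stand in for the real
FM return codes, which are not reproduced here. The only classification
taken from the text is that a "no connection" result is retried; treating
every other failure as a drop is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <deque>
#include <string>

// Hypothetical stand-ins for FM return codes (assumption, not the FM API).
enum class FmStatus { Ok, NoConnect, InvalidRequest };

enum class Action { Done, Retry, Drop };

// Assumed classification: NoConnect is retried, other failures are dropped.
inline Action classify(FmStatus s) {
    switch (s) {
        case FmStatus::Ok:        return Action::Done;
        case FmStatus::NoConnect: return Action::Retry;
        default:                  return Action::Drop;
    }
}

struct AlarmRequest {
    std::string id;
    std::time_t first_try;  // set once when posted, kept across retries
    unsigned    retries;    // number of failed attempts so far
};

class AlarmQueue {
public:
    // Post a new request at the tail of the first-in / first-out queue.
    void post(const std::string& id, std::time_t now) {
        queue_.push_back(AlarmRequest{id, now, 0});
    }

    // Service the queue from the head. Stops at the first request that
    // needs a retry so later alarms never overtake an earlier one.
    // Returns how many requests left the queue (sent or dropped).
    template <typename Sender>
    std::size_t service(Sender send) {
        std::size_t removed = 0;
        while (!queue_.empty()) {
            AlarmRequest& req = queue_.front();
            if (classify(send(req)) == Action::Retry) {
                ++req.retries;  // leave at head; try again on the next pass
                break;
            }
            queue_.pop_front();
            ++removed;
        }
        return removed;
    }

    std::size_t size() const { return queue_.size(); }

    const AlarmRequest& front() const {
        assert(!queue_.empty());
        return queue_.front();
    }

private:
    std::deque<AlarmRequest> queue_;
};
```

In this sketch a caller would invoke service() once per periodic pass,
so one failed attempt at the head ends the pass. That is one simple way
to keep retries from flooding, and may differ from how the daemon
actually throttles.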

Test Plan:

PASS: Verify success path alarm handling End-to-End.
PASS: Verify retry handling while FM is not running.
PASS: Verify handling of all FM error codes (fit tool).
PASS: Verify alarm handling under stress (inject-alarm script) soak.
PASS: Verify no memory leak over stress soak.
PASS: Verify logging (success, retry, failure).
PASS: Verify alarm posted date is maintained over retry success.

Change-Id: Icd1e75583ef660b767e0788dd4af7f184bdb9e86
Closes-Bug: 1841653
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-10-07 09:07:49 -04:00
..
scripts Add LSB headers to mtce service scripts 2019-08-29 11:20:14 -05:00
Makefile Add EXTRALDFLAGS to linker in a number of Makefiles 2019-02-28 22:34:54 -06:00
alarm.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
alarm.h Add alarm retry support to maintenance alarm handling daemon 2019-10-07 09:07:49 -04:00
alarmData.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
alarmHdlr.cpp Add alarm retry support to maintenance alarm handling daemon 2019-10-07 09:07:49 -04:00
alarmInit.cpp Add alarm retry support to maintenance alarm handling daemon 2019-10-07 09:07:49 -04:00
alarmMgr.cpp Add alarm retry support to maintenance alarm handling daemon 2019-10-07 09:07:49 -04:00
alarmUtil.cpp Add alarm retry support to maintenance alarm handling daemon 2019-10-07 09:07:49 -04:00