metal/mtce-common/src/daemon
Eric MacDonald 2fc05673d1 Add SysRq crash dump support for pmon quorum health messaging loss
The hostwd process supports failure handling for two pmon
quorum failure modes.
 1. persistent pmon quorum process failure
 2. persistent absence of pmon's quorum health report

This update adds a new configuration option and associated
implementation required to force a crash dump action for
failure mode 2 above.

This means that if the Process Monitor (pmon) itself stalls or stops
running for 3 minutes (the default configuration), hostwd triggers
a SysRq to force a crash dump.
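
The behaviour described above can be illustrated with a minimal sketch. It is not the actual hostwd code: `StallConfig`, `StallAction`, `decide_stall_action` and `trigger_sysrq_crash` are hypothetical names, and the default value of `kdump_on_stall` is an assumption. The only grounded details are the option name (from the test plan), the 3 minute default, and the standard Linux interface of writing `c` to `/proc/sysrq-trigger` to force a kernel crash.

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Hypothetical types for illustration only; not taken from hostwd.
enum class StallAction { None, CrashDump };

struct StallConfig {
    bool kdump_on_stall = true;                 // option name from the test plan; default is assumed
    std::chrono::seconds stall_threshold{180};  // 3 minutes, the default stated in the commit message
};

// Decide what to do given how long pmon's quorum health report has been absent.
StallAction decide_stall_action(const StallConfig& cfg,
                                std::chrono::seconds since_last_report) {
    if (!cfg.kdump_on_stall) {
        return StallAction::None;  // crash dump on stall is disabled
    }
    return since_last_report >= cfg.stall_threshold ? StallAction::CrashDump
                                                    : StallAction::None;
}

// Write the SysRq 'c' command to the trigger file. With the default path this
// crashes the kernel immediately, so kdump can capture a dump if it is active.
// The path is a parameter so the function can be exercised against a plain file.
bool trigger_sysrq_crash(const std::string& trigger_path = "/proc/sysrq-trigger") {
    std::ofstream out(trigger_path);
    if (!out) {
        return false;  // e.g. insufficient privileges or file not present
    }
    out << 'c';
    out.flush();
    return out.good();
}
```

A caller would check `decide_stall_action` on each audit interval and invoke `trigger_sysrq_crash()` only when it returns `StallAction::CrashDump`. Keeping the decision separate from the side effect makes the threshold logic testable without crashing the host.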

Test Plan:

PASS: Verify kdump for pmon quorum health report message loss
PASS: Verify no kdump when kdump_on_stall is disabled
PASS: Verify handling when kdump service is not active
PASS: Verify SIGHUP config change detection and handling

Regression:

PASS: Verify softdog timeout handling and logs
PASS: Verify quorum threshold config change and handling
PASS: Verify handling with reboot/reset recovery methods disabled
PASS: Verify enable reboot_on_err config change handling
PASS: Verify reboot/reset actions are ignored while host is locked
PASS: Verify pmon failure recovery handling before threshold reached

Change-Id: Id926447574e02013f83c0170784e2a8f9a46bac1
Partial-Bug: 1894889
Depends-On: https://review.opendev.org/#/c/750806
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-11-13 12:38:16 -05:00
File               | Last commit | Date
Makefile           | Set SHELL in Makefiles that use bash constructs | 2018-12-07 14:09:48 -06:00
daemon_common.h    | Fix maintenance cluster-host messaging | 2019-07-18 14:54:45 -04:00
daemon_config.cpp  | Add SysRq crash dump support for pmon quorum health messaging loss | 2020-11-13 12:38:16 -05:00
daemon_debug.cpp   | Refactor infrastructure network in mtce code | 2019-04-18 09:32:41 -04:00
daemon_files.cpp   | Make daemon_get_file_str return first line in specified file | 2020-09-22 18:19:24 -04:00
daemon_ini.cpp     | fix compilation warnings in c/cpp files | 2018-10-23 07:38:33 +00:00
daemon_ini.h       | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
daemon_main.cpp    | Update the init parameters for opts | 2019-05-30 11:00:41 +08:00
daemon_option.h    | Implement Active-Active Heartbeat as HA Improvement | 2018-11-20 19:57:18 +00:00
daemon_signal.cpp  | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00