2fc05673d1
The hostwd process supports failure handling for two pmon quorum failure modes.

1. persistent pmon quorum process failure
2. persistent absence of pmon's quorum health report

This update adds a new configuration option and associated implementation required to force a crash dump action for failure mode 2 above.

This means that if the Process Monitor itself gets stalled or stops running for 3 (default config) minutes then the hostwd will trigger a SysRq to force a crash dump.

Test Plan:

PASS: Verify kdump for pmon quorum health report message loss
PASS: Verify no kdump when kdump_on_stall is disabled
PASS: Verify handling when kdump service is not active
PASS: Verify sighup config change detection and handling

Regression:

PASS: Verify softdog timeout handling and logs
PASS: Verify quorum threshold config change and handling
PASS: Verify handling with reboot/reset recovery methods disabled
PASS: Verify enable reboot_on_err config change handling
PASS: Verify reboot/reset actions are ignored while host is locked
PASS: Verify pmon failure recovery handling before threshold reached

Change-Id: Id926447574e02013f83c0170784e2a8f9a46bac1
Partial-Bug: 1894889
Depends-On: https://review.opendev.org/#/c/750806
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
||
---|---|---|
.. | ||
hostw | ||
hostw.logrotate | ||
hostw.service | ||
hostwd.conf |
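
The handling of failure mode 2 described in the commit message can be illustrated with a minimal sketch. This is not the hostwd implementation, and the actual daemon may be structured quite differently. The function names, the `threshold_secs` parameter, and the way the `kdump_on_stall` flag is passed in are all assumptions made for illustration. Only the 3 minute default, the `kdump_on_stall` option name, and the use of SysRq come from the commit message. Writing `c` to `/proc/sysrq-trigger` is the standard Linux way to force a crash through SysRq.

```python
from pathlib import Path

# Default stall window taken from the commit message (3 minutes).
DEFAULT_STALL_THRESHOLD_SECS = 3 * 60


def should_force_kdump(last_report_time, now, kdump_on_stall,
                       threshold_secs=DEFAULT_STALL_THRESHOLD_SECS):
    """Decide whether a missing pmon health report warrants a crash dump.

    Hypothetical helper: returns True only when the kdump_on_stall option
    is enabled and no health report has arrived for threshold_secs or more.
    """
    if not kdump_on_stall:
        return False
    return (now - last_report_time) >= threshold_secs


def force_crash_dump(trigger_path="/proc/sysrq-trigger"):
    """Request a kernel crash by writing 'c' to the SysRq trigger file.

    Hypothetical helper: trigger_path is a parameter so the function can be
    exercised against a scratch file. Against the real trigger file this
    crashes the host immediately and requires root.
    """
    Path(trigger_path).write_text("c")
```

A watchdog loop would record the arrival time of each health report, call `should_force_kdump` periodically, and call `force_crash_dump` when it returns `True`. Keeping the decision separate from the action lets the decision logic be tested without touching the real SysRq trigger.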