metal/mtce/src/hostw/scripts
Eric MacDonald 3a6fec50c1 Reduce Maintenance Host Watchdog timeout for controllers
This update makes changes to the maintenance host watchdog
and reduces the timeout from 5 to 3 minutes for controllers.

This update also decouples the pmon quorum monitoring
feature handling from the host watchdog timeout. Both were
driven off the same select timer which prevented watchdog
timeout value to be independently changed without affecting
quorum monitoring.

A new config label 'kernwd_update_period_stall_detect' is
added and value loaded for hosts that need more rigid
process stall detection.

This new lower timeout value label is loaded and applied to
hosts that run the system controller function.

A few logging improvements were made.

Test Plan:

PASS: Verify pmon quorum failure handling while unlocked.
             Was and remains at 3 misses, 60 seconds each.
PASS: Verify watchdog TO at 12 seconds on controllers.
             Was 300 secs.
PASS: Verify kernel watchdog is not enabled when loaded
             kernwd_update_period is less than 5 seconds.
             Was 60 secs.
PASS: Verify process logging ; startup, failure, transient
PASS: Verify all config values loaded by hostwd process

Regression:

PASS: Verify watchdog TO at 300 seconds on non-controllers
PASS: Verify handling of failed quorum process while locked
PASS: Verify handling of failed quorum process while unlocked
PASS: Verify handling of transient quorum messaging loss while
             unlocked
PASS: Verify hostwd process patching ; locked and unlocked
             cases

PASS: Verify AIO DX System Install
PASS: Verify Standard System Install

Note: There is no kernel WD TO log.
      The log is output to the console.

Change-Id: Iad726436e28dfa48a06743aa166318969eb6915d
Closes-Bug: #1894889
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-11-13 07:52:59 -05:00
..
hostw Add LSB headers to mtce service scripts 2019-08-29 11:20:14 -05:00
hostw.logrotate Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
hostw.service De-branding in starlingx/metal: Titanium Cloud -> StarlingX 2020-04-03 07:58:25 +02:00
hostwd.conf Reduce Maintenance Host Watchdog timeout for controllers 2020-11-13 07:52:59 -05:00