StarlingX Bare Metal and Node Management, Hardware Maintenance
Go to file
Eric MacDonald 3a6fec50c1 Reduce Maintenance Host Watchdog timeout for controllers
This update makes changes to the maintenance host watchdog
and reduces the timeout from 5 to 3 minutes for controllers.

This update also decouples the pmon quorum monitoring
feature handling from the host watchdog timeout. Both were
driven off the same select timer which prevented watchdog
timeout value to be independently changed without affecting
quorum monitoring.

A new config label 'kernwd_update_period_stall_detect' is
added and value loaded for hosts that need more rigid
process stall detection.

This new lower timeout value label is loaded and applied to
hosts that run the system controller function.

A few logging improvements were made.

Test Plan:

PASS: Verify pmon quorum failure handling while unlocked.
             Was and remains at 3 misses, 60 seconds each.
PASS: Verify watchdog TO at 12 seconds on controllers.
             Was 300 secs.
PASS: Verify kernel watchdog is not enabled when loaded
             kernwd_update_period is less than 5 seconds.
             Was 60 secs.
PASS: Verify process logging ; startup, failure, transient
PASS: Verify all config values loaded by hostwd process

Regression:

PASS: Verify watchdog TO at 300 seconds on non-controllers
PASS: Verify handling of failed quorum process while locked
PASS: Verify handling of failed quorum process while unlocked
PASS: Verify handling of transient quorum messaging loss while
             unlocked
PASS: Verify hostwd process patching ; locked and unlocked
             cases

PASS: Verify AIO DX System Install
PASS: Verify Standard System Install

Note: There is no kernel WD TO log.
      The log is output to the console.

Change-Id: Iad726436e28dfa48a06743aa166318969eb6915d
Closes-Bug: #1894889
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-11-13 07:52:59 -05:00
api-ref/source Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
bsp-files Revert "Enable 'rcu_nocb_poll' kernel config option" 2020-10-27 13:53:04 +00:00
devstack Security: Handle nospectre_v1 in the bootargs 2020-01-28 18:21:13 -05:00
doc Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
installer Remove reference to cgcs-centos-repo 2020-09-18 16:04:44 -04:00
kickstart Drop isolcpu from AIO/worker kickstarts 2020-06-19 02:08:28 -04:00
mtce Reduce Maintenance Host Watchdog timeout for controllers 2020-11-13 07:52:59 -05:00
mtce-common Reduce Maintenance Host Watchdog timeout for controllers 2020-11-13 07:52:59 -05:00
mtce-compute Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
mtce-control Fix heartbeat messaging when interface is set to 'lo' 2020-06-26 14:16:41 +00:00
mtce-storage Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
tools/rvmc/centos Redfish Virtual Media Controller enhancements 2020-08-17 21:14:50 +00:00
.gitignore Update tox.ini files to use stein constraints 2019-06-25 13:20:35 -04:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
README.rst Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
centos_build_layer.cfg Build layering, add layer build config file 2019-10-15 19:19:45 +08:00
centos_iso_image.inc Remove unused inventory and python-inventoryclient 2020-01-08 14:12:05 -06:00
centos_pkg_dirs rvmc: remove un-used build data 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc Utility to install a server via Redfish 2019-12-31 15:34:54 +00:00
pylint.rc Add pylint checks for python files in metal 2020-01-03 13:27:00 -06:00
test-requirements.txt Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
tox.ini Use newer flake8 to run on ubuntu-focal Zuul machines 2020-09-09 17:59:49 -04:00

README.rst

metal

StarlingX Bare Metal Management