StarlingX Bare Metal and Node Management, Hardware Maintenance
Go to file
Eric MacDonald da398e0c5f Debian: Make Mtce offline handler more resilient to slow shutdowns
The current offline handler assumes the node is offline after
'offline_search_count' reaches 'offline_threshold' count
regardless of whether mtcAlive messages were received during
the search window.

The offline algorithm requires that no mtcAlive messages
be seen for the full offline_threshold count.

During a slow shutdown the mtcClient runs for longer than
it should and as a result can lead to maintenance seeing
the node as recovered before it should.

This update manages the offline search counter to ensure that
it only reached the count threshold after seeing no mtcAlive
messages for the full search count. Any mtcAlive message seen
during the count triggers a count reset.

This update also
1. Adjusts the reset retry cadence from 7 to 12 secs
   to prevent unnecessary reboot thrash during
   the current shutdown.
2. Clears the hbsClient ready event at the start of the
   subfunction handler so the heartbeat soak is only
   started after seeing heartbeat client ready events
   that follow the main config.

Test Plan:

PASS: Debian and CentOS Build and DX install
PASS: Verify search count management
PASS: Verify issue does not occur over lock/unlock soak (100+)
      - where the same test without update did show issue.
PASS: Monitor alive logs for behavioral correctness
PASS: Verify recovery reset occurs after expected extended time.

Closes-Bug: 1993656
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: If10bb75a1fb01d0ecd3f88524d74c232658ca29e
2022-10-24 15:57:43 +00:00
api-ref/source Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
bsp-files sysinv_fpga_agent merge with sysinv_agent 2022-10-03 18:45:03 +00:00
devstack Security: Handle nospectre_v1 in the bootargs 2020-01-28 18:21:13 -05:00
doc Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
installer Debian: UEFI RT pxeboot installs need efi=runtime option 2022-09-26 17:15:24 -05:00
kickstart Debian: Disable LAT's automatic rollback image selection 2022-10-14 23:46:48 +00:00
mtce Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
mtce-common Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
mtce-compute debian: Remove package preset install for metal 2022-09-27 08:23:09 +00:00
mtce-control Merge "Debian: Remove conf files from etc-pmon.d" 2022-09-30 19:41:16 +00:00
mtce-storage debian: Remove package preset install for metal 2022-09-27 08:23:09 +00:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
tools/rvmc debian: port rvmc docker image to Debian 2022-08-12 16:30:01 +00:00
.gitignore Update tox.ini files to use stein constraints 2019-06-25 13:20:35 -04:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
README.rst Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
centos_build_layer.cfg Build layering, add layer build config file 2019-10-15 19:19:45 +08:00
centos_iso_image.inc Remove unused inventory and python-inventoryclient 2020-01-08 14:12:05 -06:00
centos_pkg_dirs rvmc: remove un-used build data 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc Utility to install a server via Redfish 2019-12-31 15:34:54 +00:00
debian_build_layer.cfg Add debian_build_layer.cfg file 2021-10-05 14:08:23 -04:00
debian_iso_image.inc Include upgrades meta files to Debian ISO 2022-08-02 21:01:58 +00:00
debian_pkg_dirs Include upgrades meta files to Debian ISO 2022-08-02 21:01:58 +00:00
debian_stable_docker_images.inc debian: port rvmc docker image to Debian 2022-08-12 16:30:01 +00:00
pylint.rc Add pylint py3 portability checks for the metal repo 2021-09-13 11:57:42 -03:00
test-requirements.txt Removed wait_for_worker_config_init in AIO systems 2021-07-08 18:48:28 -04:00
tox.ini Fix bashate failure in zuul 2022-10-06 17:22:12 +00:00

README.rst

metal

StarlingX Bare Metal Management