StarlingX Bare Metal and Node Management, Hardware Maintenance
Eric MacDonald f00de2a311 Add controller-0 to Mtce Heartbeat Service in AIO SX
All system types with the exception of AIO SX
add controller-0 to the heartbeat service.

There is no enabled heartbeating in AIO SX, so
controller-0 was never added. However, without
it being added, the alarms the hbsAgent raises
are not cleared over a process startup.

The local hbsClient was designed to monitor
pmon, effectively monitoring the process monitor,
and to report its ongoing health state to the
hbsAgent. This way, if pmond stops functioning,
maintenance is able to alarm that condition.

However, because controller-0 is never added to
the heartbeat service in AIO SX, the current method
of looping over the internal heartbeat service
inventory and clearing all the hbsAgent owned alarms
for each host over a process restart is bypassed.

So, in the failure mode where pmond is failing,
the hbsAgent has raised an alarm against it, and
a restart of the hbsAgent then coincides with
'pmond' process recovery, the pmond alarm gets
stuck asserted.

This update adds controller-0 to the heartbeat
service inventory list for all system types so
the hbsAgent managed alarms are cleared over a
process restart regardless of the system type.
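
The behaviour described above can be illustrated with a small model. This
is only a sketch written for this description, not the mtce C++ code: the
class, method, and parameter names (HeartbeatAgent, add_host,
clear_alarms_on_startup, legacy_behaviour) and the "AIO-SX" string are all
hypothetical, and it assumes alarms are stored outside the agent process
so that they outlive a restart.

```python
# Illustrative model only; every name here is hypothetical and none of it
# is taken from the mtce sources.
class HeartbeatAgent:
    def __init__(self, system_type, surviving_alarms):
        self.system_type = system_type
        self.inventory = []  # hosts known to the heartbeat service
        # (hostname, alarm) pairs assumed to have outlived the restart
        self.alarms = set(surviving_alarms)

    def add_host(self, hostname, legacy_behaviour):
        # Old behaviour: AIO SX never adds controller-0 to the inventory.
        if legacy_behaviour and self.system_type == "AIO-SX":
            return
        self.inventory.append(hostname)

    def clear_alarms_on_startup(self):
        # Alarms are cleared per host by looping over the inventory, so an
        # empty inventory means nothing is cleared.
        for host in self.inventory:
            self.alarms = {a for a in self.alarms if a[0] != host}


def restart(system_type, legacy_behaviour):
    """Model an agent restart while a pmond alarm is still asserted."""
    agent = HeartbeatAgent(system_type, {("controller-0", "pmond")})
    agent.add_host("controller-0", legacy_behaviour)
    agent.clear_alarms_on_startup()
    return agent.alarms
```

In this model, restart("AIO-SX", legacy_behaviour=True) leaves the pmond
alarm in place because the loop has no hosts to visit, while the same call
with legacy_behaviour=False returns an empty set.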

Additionally, the following logging improvements
were made:

 - add the network name to the heartbeat start log.
 - avoid heartbeat stop log when already stopped.

Test Plan:

PASS: Verify pmond alarm clears over hbsAgent process
      restart in AIO SX, AIO DX, Standard and Storage
      Systems.

Regression:

PASS: Verify Storage System Install and heartbeat
PASS: Verify Standard System install and heartbeat
PASS: Verify AIO DX install and heartbeat
PASS: Verify AIO SX install and heartbeat
PASS: Verify heartbeat logs and failure handling
PEND: Verify update as a patch

Change-Id: I9afd92a0b54296ef1f87ce7d912510649ae7560c
Closes-Bug: 1904918
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-12-23 00:18:49 +00:00
api-ref/source Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
bsp-files swtpm rpms cleanup in metal/bsp-files 2020-12-16 12:25:27 -05:00
devstack Security: Handle nospectre_v1 in the bootargs 2020-01-28 18:21:13 -05:00
doc Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
installer Add auto-version for remaining stx/metal packages 2020-12-17 13:26:24 -05:00
kickstart Drop isolcpu from AIO/worker kickstarts 2020-06-19 02:08:28 -04:00
mtce Add controller-0 to Mtce Heartbeat Service in AIO SX 2020-12-23 00:18:49 +00:00
mtce-common Add SM process heartbeat and status to the hbs cluster 2020-12-10 11:13:13 -05:00
mtce-compute Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
mtce-control Fix heartbeat messaging when interface is set to 'lo' 2020-06-26 14:16:41 +00:00
mtce-storage Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
tools/rvmc/centos Redfish Virtual Media Controller enhancements 2020-08-17 21:14:50 +00:00
.gitignore Update tox.ini files to use stein constraints 2019-06-25 13:20:35 -04:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
README.rst Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
centos_build_layer.cfg Build layering, add layer build config file 2019-10-15 19:19:45 +08:00
centos_iso_image.inc Remove unused inventory and python-inventoryclient 2020-01-08 14:12:05 -06:00
centos_pkg_dirs rvmc: remove un-used build data 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc Utility to install a server via Redfish 2019-12-31 15:34:54 +00:00
pylint.rc Add pylint checks for python files in metal 2020-01-03 13:27:00 -06:00
test-requirements.txt Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
tox.ini Use newer flake8 to run on ubuntu-focal Zuul machines 2020-09-09 17:59:49 -04:00

README.rst

metal

StarlingX Bare Metal Management