StarlingX Bare Metal and Node Management, Hardware Maintenance
Eric MacDonald 5c043f7ca9 Make Mtce ignore heartbeat events from in-active controller.
There is the potential for a race condition that can lead to
mtce incorrectly failing hosts due to heartbeat failure event
messages sourced from the in-active controller.

During a split brain recovery action scenario there was a swact
which left the hbsAgent on the new stand-by controller thinking
it was still on the active controller.

In this specific split brain failure mode, the controller that was
active and then (after the swact) stand-by was failing heartbeat
to its peer and to other nodes in the system, even though the new
active controller saw heartbeat working fine.

The problem was that the in-active controller detected and sent
a heartbeat loss message to mtce before mtce was able to update
the in-active controller's heartbeat activity status, which would
have gated sending the loss event.

This update adds an additional layer of protection by intentionally
ignoring heartbeat events from the in-active controller that might
slip through due to this activity state change race condition.
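
The gating described above can be illustrated with a minimal sketch. This is
not the mtce implementation (which is not shown here); the class, method, and
host names are hypothetical and chosen only to show the idea: the receiver
drops any heartbeat event whose sender is not the controller it currently
considers active, so a stale loss report sent during the race window has no
effect.

```python
from dataclasses import dataclass, field
from enum import Enum


class EventKind(Enum):
    HEARTBEAT_LOSS = "loss"
    HEARTBEAT_CLEAR = "clear"


@dataclass
class HeartbeatEvent:
    source: str      # controller whose heartbeat agent sent the event
    target: str      # host the event is about
    kind: EventKind


@dataclass
class Maintenance:
    """Hypothetical stand-in for the maintenance service that consumes events."""
    active_controller: str
    failed_hosts: set = field(default_factory=set)
    ignored: list = field(default_factory=list)

    def swact(self, new_active: str) -> None:
        # After a swact, only the new active controller is trusted.
        self.active_controller = new_active

    def handle_event(self, event: HeartbeatEvent) -> bool:
        # Extra layer of protection: ignore events from the in-active
        # controller, even if its agent still believes it is active.
        if event.source != self.active_controller:
            self.ignored.append(event)
            return False
        if event.kind is EventKind.HEARTBEAT_LOSS:
            self.failed_hosts.add(event.target)
        else:
            self.failed_hosts.discard(event.target)
        return True


# Usage: a stale loss event from the old active controller is dropped.
mtce = Maintenance(active_controller="controller-0")
mtce.swact("controller-1")
stale = HeartbeatEvent("controller-0", "compute-3", EventKind.HEARTBEAT_LOSS)
assert mtce.handle_event(stale) is False
assert "compute-3" not in mtce.failed_hosts
```

The check sits on the receiving side on purpose: the sender's own activity
state is exactly what is out of date during the race, so it cannot be relied
on to suppress the event.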

Also fixed a log flood in the hbsAgent on large systems.

Change-Id: I825a801166b3e80cbf67945c7f587851f4e0d90b
Closes-Bug: 1813976
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-05-09 14:42:01 +00:00
api-ref/source Clean up and standardize landing pages 2019-01-09 09:34:38 -08:00
bsp-files Implement Pci Interrupt Affinity Agent 2019-05-04 01:13:44 +00:00
devstack Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
doc Clean up and standardize landing pages 2019-01-09 09:34:38 -08:00
installer Configurable Host HTTP/HTTPS Port Binding 2019-02-06 16:04:07 -06:00
inventory Remove api/v0.1 to access to ceph mgr RESTful plugin 2019-04-24 15:16:02 +00:00
kickstart Configurable Host HTTP/HTTPS Port Binding 2019-02-06 16:04:07 -06:00
mtce Make Mtce ignore heartbeat events from in-active controller. 2019-05-09 14:42:01 +00:00
mtce-common Make Mtce system mode scan case in-sensitive 2019-05-06 19:14:14 +00:00
mtce-compute Remove all nova and libvirt files from mtce-common 2019-03-19 15:23:36 -05:00
mtce-control Implement Active-Active Heartbeat as HA Improvement 2018-11-20 19:57:18 +00:00
mtce-storage get rid of duplicate LICENSE files in 3 packages 2018-10-30 02:55:34 +00:00
python-inventoryclient Remove remote-clients SDK Module from StarlingX 2019-03-22 16:19:06 -04:00
releasenotes Update config for release notes to include project name 2019-02-05 14:14:17 -08:00
.gitignore [Doc] OpenStack API Reference Guide 2018-09-05 19:59:26 -05:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
README.rst Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
centos_iso_image.inc Remove Resource Monitor ; aka rmon, from the load 2019-03-19 16:12:38 -04:00
centos_pkg_dirs SysInv Decoupling: Create Inventory Service 2018-12-06 13:17:35 -05:00
test-requirements.txt pep8 job enable and fix pep8 reported issue 2018-09-06 09:45:51 +08:00
tox.ini Merge "Update tox.ini files to adapt to repo renaming" 2019-04-22 21:18:53 +00:00

README.rst

metal

StarlingX Bare Metal Management