StarlingX Bare Metal and Node Management, Hardware Maintenance


Eric MacDonald f01fd85470 Fix MNFA recovery race condition that leads to stuck degrade. Between 0 and 10% of hosts were seen to remain stuck in the degrade state after MNFA recovery. On Multi-Node Failure Avoidance (MNFA) recovery, the host degrade clear path does not send a degrade clear, but it does clear the hbs control states; it instead relies on explicit per host/network events from hbsAgent to clear the degrade. If the MNFA Recovery (exit) event occurs before all hbsAgent clear messages arrive, the hbs control clear tricks the mtcAgent into thinking that no degrade event is active when one may still be. This fix enables the clear option in the mon_host MNFA Recovery call so that the host's degrade condition is cleared. It also removes the unnecessary heartbeat disable call. Test Plan: PASS: Soak MNFA in a large system repeatedly to verify that the 0-10% stuck degrade occurrence rate drops to 0 over many (more than 20) occurrences. Regression: PASS: Verify heartbeat. PASS: Verify single node graceful recovery. Change-Id: I699a376af5a95cc8dcc6ea5cc8266dc14fbacd09 Closes-Bug: 1845344 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>		2019-10-03 09:28:32 -04:00
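The race described in the commit above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the mtce implementation: the class `HostDegradeModel`, its methods, and the `clear_on_recovery` flag are hypothetical stand-ins for the hbs control state, the per-network hbsAgent clear events, and the clear option on the mon_host MNFA Recovery call. The network names are illustrative only.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified model of per-host degrade bookkeeping.
// Names are illustrative and do not come from the mtce sources.
class HostDegradeModel {
public:
    // hbsAgent reports heartbeat loss on one network: record it and degrade.
    void hbs_fail(const std::string& network) {
        hbs_degrade_.insert(network);
        degraded_ = true;
    }

    // hbsAgent reports heartbeat recovery on one network. The host degrade
    // is cleared only if this event matches recorded control state and it
    // was the last active cause.
    void hbs_clear(const std::string& network) {
        if (hbs_degrade_.erase(network) > 0 && hbs_degrade_.empty()) {
            degraded_ = false;
        }
    }

    // MNFA recovery (exit): always wipes the hbs control state. With
    // clear_on_recovery == false (the old behavior) the host degrade flag
    // is left for later per-network clear events to resolve.
    void mnfa_recover(bool clear_on_recovery) {
        hbs_degrade_.clear();
        if (clear_on_recovery) {
            degraded_ = false;
        }
    }

    bool degraded() const { return degraded_; }

private:
    std::set<std::string> hbs_degrade_;  // networks with active heartbeat degrade
    bool degraded_ = false;              // host degrade state seen by the operator
};

// Replays the race: the MNFA recovery event is processed before the
// hbsAgent per-network clear messages arrive.
bool degraded_after_race(bool clear_on_recovery) {
    HostDegradeModel host;
    host.hbs_fail("mgmt");
    host.hbs_fail("cluster-host");
    host.mnfa_recover(clear_on_recovery);  // recovery wins the race
    host.hbs_clear("mgmt");                // late clears find no recorded state
    host.hbs_clear("cluster-host");
    return host.degraded();
}
```

In this model, `degraded_after_race(false)` leaves the host stuck degraded because the late clear events no longer match any recorded control state, while `degraded_after_race(true)` clears the degrade at recovery time regardless of message ordering.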
api-ref/source	Clean up and standardize landing pages	2019-01-09 09:34:38 -08:00
bsp-files	Support custom kickstart addon for install from USB	2019-09-20 12:42:22 -04:00
devstack	Add redfish support detection to maintenance	2019-08-19 14:03:37 +00:00
doc	Fix the error links for metal docs	2019-07-03 09:20:25 -04:00
installer	Configurable Host HTTP/HTTPS Port Binding	2019-02-06 16:04:07 -06:00
inventory	Merge "Add inventory specfile for opensuse"	2019-09-20 14:23:16 +00:00
kickstart	Add openSUSE OBS Artifacts for Maintenance services	2019-09-20 09:18:54 -05:00
mtce	Fix MNFA recovery race condition that leads to stuck degrade	2019-10-03 09:28:32 -04:00
mtce-common	Add redfish power/reset/reinstall bmc support to maintenance	2019-09-26 15:59:35 -04:00
mtce-compute	Add openSUSE OBS Artifacts for Maintenance services	2019-09-20 09:18:54 -05:00
mtce-control	Add openSUSE OBS Artifacts for Maintenance services	2019-09-20 09:18:54 -05:00
mtce-storage	Add openSUSE OBS Artifacts for Maintenance services	2019-09-20 09:18:54 -05:00
python-inventoryclient	Add openSUSE OBS Artifacts for Maintenance services	2019-09-20 09:18:54 -05:00
releasenotes	Update config for release notes to include project name	2019-02-05 14:14:17 -08:00
.gitignore	Update tox.ini files to use stein constraints	2019-06-25 13:20:35 -04:00
.gitreview	OpenDev Migration Patch	2019-04-19 19:52:33 +00:00
.zuul.yaml	Minor zuul and tox cleanup related to package re-org	2019-09-09 10:35:11 -05:00
CONTRIBUTORS.wrs	StarlingX open source release updates	2018-05-31 07:36:43 -07:00
LICENSE	StarlingX open source release updates	2018-05-31 07:36:43 -07:00
README.rst	Followup opendev cleanup and test jobs	2019-04-22 16:42:03 +00:00
centos_iso_image.inc	Remove Resource Monitor ; aka rmon, from the load	2019-03-19 16:12:38 -04:00
centos_pkg_dirs	SysInv Decoupling: Create Inventory Service	2018-12-06 13:17:35 -05:00
test-requirements.txt	pep8 job enable and fix pep8 reported issue	2018-09-06 09:45:51 +08:00
tox.ini	Update tox.ini files to use stein constraints	2019-06-25 13:20:35 -04:00

README.rst

metal

StarlingX Bare Metal Management