f01fd85470
Seeing from 0 to 10% of hosts get stuck in the degrade state after MNFA recovery. Clearing host degrade on Multi-Node Failure Avoidance (MNFA) recovery does not send a degrade clear but does clear the hbs control states. Instead, it relies on explicit events from hbsAgent per host/network to do so. If the MNFA Recovery (exit) event occurs before all hbsAgent clear messages arrive, then the hbs control clear tricks the mtcAgent into thinking that there was no degrade event active when one actually may still be. This fix enables the clear option in the mon_host MNFA Recovery call so that the host's degrade condition is cleared. It also removes the unnecessary heartbeat disable call. Test Plan: PASS: soak MNFA in a large system over and over to verify that a 0-10% stuck degrade occurrence rate drops to 0 after many (more than 20) occurrences. Regression: PASS: Verify heartbeat. PASS: Verify single node graceful recovery. Change-Id: I699a376af5a95cc8dcc6ea5cc8266dc14fbacd09 Closes-Bug: 1845344 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com> |
||
---|---|---|
api-ref/source | ||
bsp-files | ||
devstack | ||
doc | ||
installer | ||
inventory | ||
kickstart | ||
mtce | ||
mtce-common | ||
mtce-compute | ||
mtce-control | ||
mtce-storage | ||
python-inventoryclient | ||
releasenotes | ||
.gitignore | ||
.gitreview | ||
.zuul.yaml | ||
CONTRIBUTORS.wrs | ||
LICENSE | ||
README.rst | ||
centos_iso_image.inc | ||
centos_pkg_dirs | ||
test-requirements.txt | ||
tox.ini |
README.rst
metal
StarlingX Bare Metal Management