630a777cbb
Service Management (SM) monitors connectivity and health of its peer controller over the OAM, Mgmt and (if provisioned) Cluster-Host networks. If SM sees all the links to its peer go 'carrier down' at virtually the same time, both controllers may declare themselves unhealthy and both go disabled; that is, shut down all services with no automatic recovery. This update adds an 'Unhealthy State Recovery Audit' to SM that forces a self-restart once all of its monitored links recover, covering the cases where both controllers go unhealthy-shutdown or both controllers remain active in split-brain. Test Plan: PASS: Verify AIO SX install PASS: Verify Standard system install and unhealthy state recovery PASS: Verify single link failure end-to-end behavior PASS: Verify 2 of 3 link failure end-to-end behavior PASS: Verify all link failure end-to-end behavior PASS: Verify SM and Mtce heartbeat recovery over unhealthy state recovery PASS: Verify swact back and forth following a recovery PASS: Verify process restart as part of unhealthy state recovery PASS: Verify AIO DX install and unhealthy state recovery Change-Id: Ie906eaf04bec607328b7e0af09b37fa0558e3bbe Closes-Bug: 1883004 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com> |
||
---|---|---|
.. | ||
centos | ||
opensuse | ||
scripts | ||
src | ||
LICENSE | ||
Makefile |
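
The recovery behavior described in the commit message above can be illustrated with a minimal sketch. This is not SM's actual code (the commit does not show the implementation); the class `RecoveryAudit`, its `run` method, the link names, and the `restart` callback are all hypothetical names chosen for illustration. It assumes the caller already knows whether the controller is in an unhealthy state (disabled or suspected split-brain) and simply requests a self-restart once every monitored link reports carrier up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RecoveryAudit:
    """Hypothetical sketch of an unhealthy-state recovery audit (not SM's real API)."""

    restart: Callable[[], None]             # assumed hook that restarts the service
    cluster_host_provisioned: bool = False  # Cluster-Host is monitored only if provisioned
    restarts_requested: int = 0

    def monitored_links(self) -> List[str]:
        # OAM and Mgmt are always monitored; Cluster-Host only when provisioned.
        links = ["oam", "mgmt"]
        if self.cluster_host_provisioned:
            links.append("cluster-host")
        return links

    def run(self, carrier_up: Dict[str, bool], unhealthy: bool) -> bool:
        """Run one audit pass; return True if a self-restart was requested."""
        if not unhealthy:
            # A healthy controller never needs this recovery path.
            return False
        # Recover only when every monitored link is back; a missing entry counts as down.
        if all(carrier_up.get(link, False) for link in self.monitored_links()):
            self.restarts_requested += 1
            self.restart()
            return True
        return False
```

A caller would invoke `run` periodically with the latest carrier states, for example `RecoveryAudit(restart=my_restart_hook).run({"oam": True, "mgmt": True}, unhealthy=True)`, where `my_restart_hook` is likewise a placeholder. Requiring all links to recover before restarting reflects the commit's wording ("when all of its monitored links recover"); how the real audit is scheduled or rate-limited is not stated in the source.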