metal/mtce/src/maintenance
Eric MacDonald 1196056612 Disable Redfish BMC audit and improve reinstall failure handling
The Mtce Reinstall Handler can collide with the BMC Redfish
audit, resulting in reinstall failure. The BMC handler's 2 minute
connection audit can collide with other BMC commands.

The reinstall handler, with its 4 bmc command operations, is
particularly susceptible.

Two additional bmc communication improvements are implemented:

1. Add 'retry' handling to all BMC requests in the Maintenance
   Reinstall Handler FSM to handle transient command failures.

   Note: Retries already exist for all but the power status
   query and the netboot requests in that handler, as well as
   in other administrative commands that involve bmc requests.

2. Switch BMC power control command management from 'static' to
   'learned' lists. Some BMCs don't support both the graceful and
   immediate forms of a power command, such as Graceful Restart
   and Force Restart. To remove the possibility of using an
   unsupported BMC command, this update switches from static to
   learned power command lists, with a log produced if a server
   is missing command support.

   Power commands escalate from graceful to immediate when a
   request is retried.
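
The learned-list and escalation behavior described above can be sketched
as follows. This is a minimal illustration, not the code from this
change: the type PowerCommands and the functions learn_reset_commands
and select_reset_command are hypothetical names, the strings are the
Redfish ResetType values for the two restart forms, and falling back to
the graceful command when no immediate one was learned is an assumption.
The sketch needs C++17 for std::optional.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical model of a per-host "learned" reset command list.
// Names are illustrative only and are not taken from the mtce sources.
struct PowerCommands {
    std::optional<std::string> graceful;   // e.g. "GracefulRestart"
    std::optional<std::string> immediate;  // e.g. "ForceRestart"
};

// Build the learned list from the reset actions a BMC advertises,
// instead of assuming a static set that every BMC must support.
PowerCommands learn_reset_commands(const std::set<std::string>& advertised) {
    PowerCommands cmds;
    if (advertised.count("GracefulRestart")) cmds.graceful = "GracefulRestart";
    if (advertised.count("ForceRestart"))    cmds.immediate = "ForceRestart";
    return cmds;
}

// Pick the command for a given attempt: graceful on the first try,
// immediate on retries. If the preferred form was not learned, use
// whichever form exists (assumed fallback). An empty result means the
// server supports neither form and the caller should log that.
std::optional<std::string> select_reset_command(const PowerCommands& cmds,
                                                int retry_count) {
    if (retry_count == 0 && cmds.graceful) return cmds.graceful;
    if (cmds.immediate) return cmds.immediate;
    return cmds.graceful;
}
```

With both forms learned, the first attempt selects the graceful command
and any retry selects the immediate one. A BMC that advertises only one
form always gets that form, so an unsupported command is never issued.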

Test Cases:

PASS: Verify bmc handler redfish audit is disabled
PASS: Verify reinstall soak using redfish
PASS: Verify reinstall netboot and power status retry handling
PASS: Verify all power control commands using redfish
PASS: Verify graceful operations are used if available
PASS: Verify immediate operations are used for retries

Regression:

PASS: Verify bmc ping audit success and failure handling

PASS: Verify Reset        Handling soak (redfish and ipmi)
PASS: Verify Power-Off/On Handling soak (redfish and ipmi)
PASS: Verify Reinstall    Handling soak (redfish and ipmi)
PASS: Verify Standard System Install    (redfish and ipmi)
PASS: Verify AIO DX   System Install    (redfish and ipmi)

PASS: Verify this update as a patch

Change-Id: Idb484512ccb1b16e2d0ea9aff4ab7965347b1322
Closes-Bug: 1880578
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-11-16 15:15:22 +00:00
File | Last commit | Date
Makefile | Add redfish support detection to maintenance | 2019-08-19 14:03:37 +00:00
mtcAlarm.cpp | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcAlarm.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcBmcUtil.cpp | Disable Redfish BMC audit and improve reinstall failure handling | 2020-11-16 15:15:22 +00:00
mtcBmcUtil.h | Add redfish support detection to maintenance | 2019-08-19 14:03:37 +00:00
mtcCmdHdlr.cpp | Add redfish power/reset/reinstall bmc support to maintenance | 2019-09-26 15:59:35 -04:00
mtcCompMsg.cpp | Fix AIO SX Lazy Reboot race condition | 2020-10-22 17:12:48 -04:00
mtcCtrlMsg.cpp | Fix Mtce Heartbeat period recovery on MNFA Exit | 2020-09-18 01:34:11 +00:00
mtcHttpSvr.cpp | Fix Mtce's VIM systems query handling | 2019-10-09 09:44:35 -04:00
mtcHttpSvr.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcHttpUtil.cpp | MTCE: reading BMC passwords from Barbican secret storage. | 2019-02-14 09:04:46 -05:00
mtcHttpUtil.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcInvApi.cpp | Refactor BMC provisioning in Maintenance | 2019-12-09 09:39:49 -05:00
mtcInvApi.h | Fix format-overflow warning in mtcInvApi | 2019-08-27 10:33:44 -05:00
mtcNodeComp.cpp | Fix AIO SX Lazy Reboot race condition | 2020-10-22 17:12:48 -04:00
mtcNodeComp.h | Fix maintenance cluster-host messaging | 2019-07-18 14:54:45 -04:00
mtcNodeCtrl.cpp | Fix Mtce Heartbeat period recovery on MNFA Exit | 2020-09-18 01:34:11 +00:00
mtcNodeFsm.cpp | Fix Mtce Heartbeat period recovery on MNFA Exit | 2020-09-18 01:34:11 +00:00
mtcNodeFsm.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcNodeHdlrs.cpp | Disable Redfish BMC audit and improve reinstall failure handling | 2020-11-16 15:15:22 +00:00
mtcNodeHdlrs.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcNodeMnfa.cpp | Fix Mtce Heartbeat period recovery on MNFA Exit | 2020-09-18 01:34:11 +00:00
mtcNodeMsg.h | Fix maintenance cluster-host messaging | 2019-07-18 14:54:45 -04:00
mtcSmgrApi.cpp | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcSmgrApi.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcStubs.cpp | Implement Active-Active Heartbeat as HA Improvement Fix | 2018-12-10 09:57:34 -05:00
mtcSubfHdlrs.cpp | Refactor BMC provisioning in Maintenance | 2019-12-09 09:39:49 -05:00
mtcThreads.cpp | Refactor BMC provisioning in Maintenance | 2019-12-09 09:39:49 -05:00
mtcThreads.h | Add redfish power/reset/reinstall bmc support to maintenance | 2019-09-26 15:59:35 -04:00
mtcVimApi.cpp | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcVimApi.h | Decouple Guest-server/agent from stx-metal | 2018-09-18 17:15:08 -04:00
mtcWorkQueue.cpp | [Trivial Fix] fix typos in docstrings | 2019-02-21 14:46:06 +08:00