metal/mtce/src/maintenance
Eric MacDonald c4b8171ddd Refactor BMC provisioning in Maintenance
The current mechanism used to preserve the learned bmc protocol in
the filesystem on the active controller is problematic over swact.

This update removes the file storage method in favor of preserving
the learned protocol in the system inventory database as a key/value
pair at the host level, in the already existing mtce_info database field.
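A minimal sketch of storing the learned protocol as one key/value pair inside mtce_info while preserving any other keys already there. It assumes mtce_info holds a flat JSON object of plain string values with no embedded quotes or escapes. The key name "bmc_protocol" and all helper names are hypothetical and are not taken from mtcInvApi.cpp.

```cpp
#include <map>
#include <string>

// Hypothetical helpers. Assumption: mtce_info is a flat JSON object such as
// {"bmc_protocol":"redfish"} whose values contain no quotes or escapes.
using KvMap = std::map<std::string, std::string>;

// Parse "key":"value" pairs. Parsing stops at the first incomplete pair.
KvMap mtce_info_parse(const std::string& s)
{
    KvMap kv;
    std::string::size_type pos = 0;
    for (;;)
    {
        // Each pair is four quote characters: "key" : "value"
        std::string::size_type k0 = s.find('"', pos);
        if (k0 == std::string::npos) break;
        std::string::size_type k1 = s.find('"', k0 + 1);
        if (k1 == std::string::npos) break;
        std::string::size_type v0 = s.find('"', k1 + 1);
        if (v0 == std::string::npos) break;
        std::string::size_type v1 = s.find('"', v0 + 1);
        if (v1 == std::string::npos) break;
        kv[s.substr(k0 + 1, k1 - k0 - 1)] = s.substr(v0 + 1, v1 - v0 - 1);
        pos = v1 + 1;
    }
    return kv;
}

// Serialize the map back into a flat JSON object (keys in sorted order).
std::string mtce_info_build(const KvMap& kv)
{
    std::string out = "{";
    bool first = true;
    for (const auto& entry : kv)
    {
        if (!first) out += ",";
        out += "\"" + entry.first + "\":\"" + entry.second + "\"";
        first = false;
    }
    return out + "}";
}

// Set one key (e.g. the learned protocol) without dropping other keys.
std::string mtce_info_set(const std::string& mtce_info,
                          const std::string& key,
                          const std::string& value)
{
    KvMap kv = mtce_info_parse(mtce_info);
    kv[key] = value;
    return mtce_info_build(kv);
}
```

Read-modify-write is used so that whatever else may live in mtce_info survives a protocol update; the stored value would then be fetched on process restart or swact instead of being read from a controller-local file.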

The specified or learned bmc access protocol is then shared with the
hardware monitor through inter-daemon maintenance messaging.
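To illustrate the kind of payload mtcAgent might send to hwmond, here is a sketch of a per-host record carrying the specified or learned protocol and the 'http security mode', which defaults to https. Every type, field, and the key=value encoding are assumptions for illustration; the actual inter-daemon message layout is not described here. The password is deliberately absent because hwmond re-fetches it on provisioning changes.

```cpp
#include <string>

// Hypothetical types; the real mtcAgent -> hwmond message layout may differ.
enum class BmcProtocol { Unknown, Ipmi, Redfish };
enum class HttpSecurityMode { Https, Http };

struct HwmonBmcInfo
{
    std::string hostname;
    std::string bm_ip;
    BmcProtocol protocol = BmcProtocol::Unknown;          // specified or learned protocol
    HttpSecurityMode http_mode = HttpSecurityMode::Https; // only https is supported today
};

const char* protocol_name(BmcProtocol p)
{
    switch (p)
    {
        case BmcProtocol::Ipmi:    return "ipmi";
        case BmcProtocol::Redfish: return "redfish";
        default:                   return "unknown";
    }
}

// Flatten the record into a key=value line (encoding format is an assumption).
std::string encode(const HwmonBmcInfo& m)
{
    return "hostname=" + m.hostname +
           " bm_ip=" + m.bm_ip +
           " protocol=" + protocol_name(m.protocol) +
           " http_mode=" + (m.http_mode == HttpSecurityMode::Https ? "https" : "http");
}
```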

This update refactors bmc provisioning to accommodate bmc protocol
selection at the host rather than the system level. To that end, this
update removes system level bmc_access_method selection in favor of
host level selection through bm_type. A bm_type of 'bmc' specifies
that the bmc access protocol for that host be learned. This makes the
behavior the same as what is delivered today, but without support for
changing it at the system level.

A system inventory update that enables bmc access protocol selection
at the host level will be delivered shortly. That update allows the
customer to specify the bmc access protocol at the host level as
either dynamic (aka learned) or to only use 'redfish' or 'ipmi'.
That system inventory update delivers that information to maintenance
through bm_type via bmc provisioning. Until that update is delivered,
bm_type always comes in as 'bmc', which gets interpreted as 'dynamic'
to maintain the existing configuration.
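The bm_type handling described above can be sketched as a small mapping: 'redfish' and 'ipmi' force a protocol, while 'bmc' is treated like 'dynamic' and uses the learned protocol if one has been persisted. The literal string "dynamic" for the future inventory value, and all function and enum names, are assumptions for illustration.

```cpp
#include <string>

// Hypothetical names; bm_type string values other than 'bmc', 'redfish'
// and 'ipmi' (notably "dynamic") are assumptions.
enum class BmcAccessMode { Dynamic, Redfish, Ipmi, Unknown };

BmcAccessMode bm_type_to_mode(const std::string& bm_type)
{
    // 'bmc' is interpreted as dynamic to preserve existing configurations.
    if (bm_type == "bmc" || bm_type == "dynamic") return BmcAccessMode::Dynamic;
    if (bm_type == "redfish") return BmcAccessMode::Redfish;
    if (bm_type == "ipmi")    return BmcAccessMode::Ipmi;
    return BmcAccessMode::Unknown;
}

// Forced modes win. Dynamic returns the learned protocol, or "" when it has
// not been learned yet. "" is also returned for an unrecognized bm_type;
// callers can tell the two apart with bm_type_to_mode().
std::string select_protocol(const std::string& bm_type, const std::string& learned)
{
    switch (bm_type_to_mode(bm_type))
    {
        case BmcAccessMode::Redfish: return "redfish";
        case BmcAccessMode::Ipmi:    return "ipmi";
        case BmcAccessMode::Dynamic: return learned;
        default:                     return "";
    }
}
```

For example, a host with bm_type 'bmc' and a persisted learned protocol of 'redfish' keeps using redfish, whereas bm_type 'ipmi' ignores whatever was learned.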

The following additional issues were also fixed in this update.

1. The nodeTimers module defaults the 'ring' member of timers that are
   not running to false but should be true.

2. Added a pingUtil_restart function to facilitate quicker sensor
   monitoring following provisioning changes and bmc access failures.

3. Enhanced the hardware monitor sensor grouping filter to accommodate
   non-standard Redfish readout labelling so that more sensors fall
   into the existing canned groups, which results in more monitored
   sensors.

4. Added an 'http security mode' to hardware monitor messaging. This
   defaults to https as that is all that is supported by the Redfish
   implementation today. This field can be used to specify non-secure
   'http' mode in the future once that is implemented.

5. Ensure the hardware monitor performs a bmc password re-fetch on every
   provisioning change.
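Item 3 can be illustrated with a case-insensitive keyword match that places loosely labelled Redfish readouts into canned groups. This is one possible approach, not necessarily the one hwmond uses; the keywords and group names below are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase a copy of the label so matching ignores vendor capitalization.
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the canned group for a sensor label, or "" if nothing matches.
// Keywords and group names are hypothetical; rules are checked in order.
std::string sensor_group_for(const std::string& label)
{
    struct Rule { const char* keyword; const char* group; };
    static const Rule rules[] = {
        { "temp",  "server temperature" },
        { "fan",   "server fans" },
        { "volt",  "server voltage" },
        { "power", "server power" },
        { "psu",   "server power" },
    };
    const std::string l = to_lower(label);
    for (const Rule& r : rules)
        if (l.find(r.keyword) != std::string::npos)
            return r.group;
    return "";
}
```

Matching substrings rather than exact labels is what lets variants such as "CPU1 Temp" and "Inlet TEMP" land in the same group.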

Test Plan:

PASS: Verify bmc access protocol stored in and fetched from the database (mtce_info)
PASS: Verify inventory push from mtcAgent to hwmond over mtcAgent restart
PASS: Verify inventory push from mtcAgent to hwmond over hwmon restart
PASS: Verify bmc provisioning of ipmi and redfish servers
PASS: Verify learned bmc protocol persists over process restart and swact
PASS: Verify process startup with protocol already learned

Hardware Monitor:

PASS: Verify bmc_type=ipmi handling ; protocol forced to ipmi ; (re)prov
PASS: Verify bmc_type=redfish handling ; protocol forced to redfish ; (re)prov
PASS: Verify bmc_type=dynamic handling ; protocol is learned then persisted
PASS: Verify sensor model delete and relearn over ip address change
PASS: Verify sensor model delete and relearn over bm_type change
PASS: Verify sensor model not relearned over username change
PASS: Verify bm pw is re-fetched over any (re)provisioning change
PASS: Verify bmc re-provisioning soak (test-bmc-reprovisioning.sh 50 loops)
PASS: Verify protocol change handling, file cleanup, model recreation
PASS: Verify End-2-End behavior for bm_type change from redfish to ipmi
PASS: Verify End-2-End behavior for bm_type change from ipmi to redfish
PASS: Verify End-2-End behavior for bm_type change from redfish to dynamic
PASS: Verify End-2-End behavior for bm_type change from ipmi to dynamic
PASS: Verify End-2-End behavior for bm_type change from dynamic to ipmi
PASS: Verify End-2-End behavior for bm_type change from dynamic to redfish
PASS: Verify sensor model creation waits for server power to be on
PASS: Verify sensor relearn by provisioning change during model creation. (soak)
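The provisioning-change rules exercised above (password re-fetch on any change, sensor model relearn only on an ip address or bm_type change, no relearn on a username change) can be summarized as a small classifier. This is a sketch under those stated rules; the struct and function names are hypothetical.

```cpp
#include <string>

// Hypothetical snapshot of a host's bmc provisioning.
struct BmcProvisioning
{
    std::string bm_ip;
    std::string bm_type;
    std::string bm_username;
};

struct ProvisioningActions
{
    bool refetch_password = false; // re-fetch the bm password
    bool relearn_model    = false; // delete and relearn the sensor model
};

ProvisioningActions classify_change(const BmcProvisioning& before,
                                    const BmcProvisioning& after)
{
    const bool ip_changed   = before.bm_ip       != after.bm_ip;
    const bool type_changed = before.bm_type     != after.bm_type;
    const bool user_changed = before.bm_username != after.bm_username;

    ProvisioningActions a;
    // Any provisioning change triggers a password re-fetch.
    a.refetch_password = ip_changed || type_changed || user_changed;
    // Only an address or bm_type change invalidates the sensor model.
    a.relearn_model = ip_changed || type_changed;
    return a;
}
```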

Regression:

PASS: Verify host power off and on.
PASS: Verify BMC access alarm handling (assert and clear)
PASS: Verify mtcAgent and hwmond logs add value
PASS: Verify no core dumps / seg faults.
PASS: Verify no mtcAgent and hwmond memory leak.
PASS: Verify delete of BMC provisioned host
PASS: Verify sensor monitoring, alarming, degrade and then clear cycle
PASS: Verify static analysis report of changed modules.
PASS: Verify host level bm_type=bmc functions the same as dynamic selection
PASS: Verify batch provisioning and deprovisioning (7 nodes)
PASS: Verify batch provisioning to different protocol (5 nodes)
PASS: Verify handling of flaky Redfish responses

PEND: Verify System Install

Change-Id: Ic224a9c33e0283a611725b33c90009132cab3382
Closes-Bug: #1853471
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-12-09 09:39:49 -05:00
Makefile Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
mtcAlarm.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcAlarm.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcBmcUtil.cpp Add redfish power/reset/reinstall bmc support to maintenance 2019-09-26 15:59:35 -04:00
mtcBmcUtil.h Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
mtcCmdHdlr.cpp Add redfish power/reset/reinstall bmc support to maintenance 2019-09-26 15:59:35 -04:00
mtcCompMsg.cpp Fix maintenance cluster-host messaging 2019-07-18 14:54:45 -04:00
mtcCtrlMsg.cpp Add bmc protocol select to maintenance 2019-09-08 14:14:15 -04:00
mtcHttpSvr.cpp Fix Mtce's VIM systems query handling 2019-10-09 09:44:35 -04:00
mtcHttpSvr.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcHttpUtil.cpp MTCE: reading BMC passwords from Barbican secret storage. 2019-02-14 09:04:46 -05:00
mtcHttpUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcInvApi.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
mtcInvApi.h Fix format-overflow warning in mtcInvApi 2019-08-27 10:33:44 -05:00
mtcNodeComp.cpp Fix maintenance cluster-host messaging 2019-07-18 14:54:45 -04:00
mtcNodeComp.h Fix maintenance cluster-host messaging 2019-07-18 14:54:45 -04:00
mtcNodeCtrl.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
mtcNodeFsm.cpp Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
mtcNodeFsm.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeHdlrs.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
mtcNodeHdlrs.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeMnfa.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeMsg.h Fix maintenance cluster-host messaging 2019-07-18 14:54:45 -04:00
mtcSmgrApi.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcSmgrApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcStubs.cpp Implement Active-Active Heartbeat as HA Improvement Fix 2018-12-10 09:57:34 -05:00
mtcSubfHdlrs.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
mtcThreads.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
mtcThreads.h Add redfish power/reset/reinstall bmc support to maintenance 2019-09-26 15:59:35 -04:00
mtcVimApi.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcVimApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcWorkQueue.cpp [Trivial Fix] fix typos in docstrings 2019-02-21 14:46:06 +08:00