metal/mtce-common/src/common
Eric MacDonald aaf9d08028 Mtce: Fix bmc password fetch error handling
The mtcAgent process sometimes segfaults while trying to fetch
the bmc password from a failing barbican process.

With that issue fixed the mtcAgent sends the bmc access
credentials to the hardware monitor (hwmond) process which
then segfaults for a similar reason.

In cases where the process does not segfault but also does not
get a bmc password, the mtcAgent will flood its log file.

This update:

 1. Prevents the segfault case by properly managing acquired
    json-c object releases. There was one in the mtcAgent and
    another in the hardware monitor (hwmond).

    The json_object_put object release api should only be called
    against objects that were created with very specific apis.
    See new comments in the code.

 2. Avoids the log flooding error case by performing a password
    size check rather than assuming the password is valid
    following the secret payload receive stage.

 3. Simplifies the secret fsm and error and retry handling.

 4. Deletes useless creation and release of a few unused json
    objects in the common jsonUtil and hwmonJson modules.
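
The password size check in item 2 and the simplified retry handling in item 3 can be pictured with a minimal sketch. It is not the code from secretUtil.cpp: the stage names, the SecretFetch struct and the three functions below are hypothetical, chosen only to show the idea that an empty payload is treated as a failed fetch that waits for a retry instead of being accepted as a valid password.

```cpp
#include <cassert>
#include <string>

// Hypothetical stages; the real secret fsm is not shown in this commit message.
enum class SecretStage { Receive, RetryWait, Done };

// Hypothetical per-host fetch state.
struct SecretFetch {
    SecretStage stage = SecretStage::Receive;
    std::string password;  // last accepted password, empty until Done
    int retries = 0;       // number of failed fetch attempts so far
};

// Called when the secret payload receive stage completes.
// The size check comes first: an empty payload counts as a failure and
// parks the fsm in RetryWait, so callers never act on a blank password.
void on_payload_received(SecretFetch& f, const std::string& payload)
{
    assert(f.stage == SecretStage::Receive);  // only valid while receiving
    if (payload.empty()) {
        f.password.clear();
        f.retries++;
        f.stage = SecretStage::RetryWait;
        return;
    }
    f.password = payload;
    f.stage = SecretStage::Done;
}

// Called when the (assumed) retry timer expires; restarts the fetch.
void on_retry_timer(SecretFetch& f)
{
    if (f.stage == SecretStage::RetryWait)
        f.stage = SecretStage::Receive;
}

// Callers use the password only once the fetch is Done.
bool password_ready(const SecretFetch& f)
{
    return f.stage == SecretStage::Done;
}
```

In this sketch a failed fetch changes state exactly once per attempt, so any error log tied to that transition is emitted once per retry rather than on every pass through the handler.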

Note: This update temporarily disables sensor and sensorgroup
      suppression support for the debian hardware monitor while
      a suppression type fix in sysinv is being investigated.

Test Plan:

PASS: Verify success path bmc password secret fetch
PASS: Verify secret reference get error handling
PASS: Verify secret password read error handling
PASS: Verify 24 hr provision/deprov success path soak
PASS: Verify 24 hr provision/deprov error path soak
PASS: Verify no memory leak over success and failure path soaking
PASS: Verify failure handling stress soak; reduced retry delay
PASS: Verify blocking secret fetch success and error handling
PASS: Verify non-blocking secret fetch success and error handling
PASS: Verify secret fetch is set non-blocking
PASS: Verify success and failure path logging
PASS: Verify all of jsonUtil module manages object release properly
PASS: Verify hardware monitor sensor model creation, monitoring,
             alarming and relearning. This test requires suppress
             disable in order to create sensor groups in debian.
PASS: Verify both ipmi and redfish and switch between them with
             just bm_type change.
PASS: Verify all above tests in CentOS
PASS: Verify over 4000 provision/deprovision cycles across both
             failure and success path handling with no process
             failures

Closes-Bug: 1975520
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Ibbfdaa1de662290f641d845d3261457904b218ff
2022-06-01 15:21:05 +00:00
..
Makefile Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
alarmUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
alarmUtil.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
bmcUtil.cpp Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
bmcUtil.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
fitCodes.h Add mtcAgent socket initialization failure retry handling. 2020-04-01 19:24:22 +00:00
hostClass.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
hostClass.h Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
hostUtil.cpp Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
hostUtil.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
httpUtil.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
httpUtil.h Remove all nova and libvirt files from mtce-common 2019-03-19 15:23:36 -05:00
ipmiUtil.cpp Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
ipmiUtil.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
jsonUtil.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
jsonUtil.h Remove all nova and libvirt files from mtce-common 2019-03-19 15:23:36 -05:00
keyClass.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
keyClass.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
logMacros.h Disable Redfish BMC audit and improve reinstall failure handling 2020-11-16 15:15:22 +00:00
msgClass.cpp Fix mtce-common build error with gcc-8.2.1 2020-04-03 14:49:09 +08:00
msgClass.h Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
nlEvent.cpp Fix heartbeat messaging when interface is set to 'lo' 2020-06-26 14:16:41 +00:00
nlEvent.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
nodeBase.cpp Add Debian packaging for mtce packages 2021-10-29 09:17:00 -05:00
nodeBase.h Improved maintenance handling of spontaneous active controller reboot 2021-04-30 15:35:53 +00:00
nodeEvent.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeEvent.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeMacro.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeTimers.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
nodeTimers.h Make Mtce Power-Off FSM verify power-off 2020-11-22 13:38:33 +00:00
nodeUtil.cpp Add Debian packaging for mtce packages 2021-10-29 09:17:00 -05:00
nodeUtil.h Prevent pmond process recovery when system is not running 2020-06-15 11:09:47 -04:00
pingUtil.cpp Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
pingUtil.h Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
redfishUtil.cpp Add Debian packaging for mtce packages 2021-10-29 09:17:00 -05:00
redfishUtil.h Add redfish power/reset/reinstall bmc support to maintenance 2019-09-26 15:59:35 -04:00
regexUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
regexUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
returnCodes.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
secretUtil.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
secretUtil.h Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
threadUtil.cpp Improve mtcAgent interrupted thread cleanup 2021-03-15 10:51:16 -04:00
threadUtil.h Improve mtcAgent interrupted thread cleanup 2021-03-15 10:51:16 -04:00
timeUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
timeUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
tokenUtil.cpp Remove references to ceilometer in maintenance 2019-04-30 14:28:12 -04:00
tokenUtil.h MTCE: reading BMC passwords from Barbican secret storage. 2019-02-14 09:04:46 -05:00