metal/mtce/src
Eric MacDonald 9bf231a286 Fix BMC access loss handling
Recent refactoring of the BMC handler FSM introduced a code change that
prevents the BMC Access alarm from being raised when BMC access is
initially established and then lost.

This update restores BMC access alarm management so that the alarm is
raised when established BMC access is lost.

This update also implements ping failure debounce so that a single ping
failure no longer triggers full reconnection handling. That now requires
3 consecutive ping failures. This adds a minute to ping failure action
handling before the usual 2 minute BMC access failure alarm is raised.
Ping failure logging is also reduced and improved.
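
The debounce described above can be sketched roughly as follows. This is a
minimal illustration only, not the code in this change: the PingDebounce
class, its record() method and the threshold parameter are hypothetical
names, and the only detail taken from the commit text is that 3 ping
failures in a row are required before reconnection handling starts.

```cpp
#include <cassert>

// Hypothetical helper, not taken from the mtce source.
// Counts consecutive ping failures and reports when the
// configured threshold is reached.
class PingDebounce {
public:
    // threshold: consecutive failures required before acting
    // (3 according to the commit message).
    explicit PingDebounce(int threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Record one ping result.
    // Returns true only on the failure that reaches the threshold,
    // so the caller starts reconnection handling once per outage.
    bool record(bool ping_ok) {
        if (ping_ok) {
            failures_ = 0;      // any success clears the streak
            return false;
        }
        ++failures_;
        return failures_ == threshold_;
    }

    // Current length of the failure streak.
    int failures() const { return failures_; }

private:
    int threshold_;
    int failures_ = 0;
};
```

With PingDebounce d(3), an isolated failure followed by a success returns
false both times and leaves the streak at zero, while three failures in a
row make the third call return true. Later failures in the same outage
return false, so reconnection handling is not restarted on every miss.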

Test Plan: for both hwmond and mtcAgent

PASS: Verify BMC access alarm due to bad provisioning (username, password, IP, type)
PASS: Verify BMC ping failure debounce handling, recovery and logging
PASS: Verify BMC ping persistent failure handling
PASS: Verify BMC ping periodic miss handling
PASS: Verify BMC ping and access failure recovery timing
PASS: Verify BMC ping failure and recovery handling over BMC link pull/plug
PASS: Verify BMC sensor monitoring stops/resumes over ping failure/recovery

Regression:

PASS: Verify IPv6 System Install using provisioned BMCs (wp8-12)
PASS: Verify BMC power-off request handling with BMC ping failing & recovering
PASS: Verify BMC power-on request handling with BMC ping failing & recovering
PASS: Verify BMC reset request handling with BMC ping failing & recovering
PASS: Verify BMC sensor group read failure handling & recovery
PASS: Verify sensor monitoring after ping failure handling & recovery

Change-Id: I74870816930ef6cdb11f987424ffed300ff8affe
Closes-Bug: 1858110
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-01-03 09:34:37 -05:00
Name         Last commit                                                   Date
alarm        Add alarm retry support to maintenance alarm handling daemon  2019-10-07 09:07:49 -04:00
common       Fix BMC access loss handling                                  2020-01-03 09:34:37 -05:00
fsmon        Add LSB headers to mtce service scripts                       2019-08-29 11:20:14 -05:00
fsync        Decouple Guest-server/agent from stx-metal                    2018-09-18 17:15:08 -04:00
heartbeat    Refactor BMC provisioning in Maintenance                      2019-12-09 09:39:49 -05:00
hostw        Update host watchdog CONFIG_MASK                              2019-10-30 16:40:56 -04:00
hwmon        Fix BMC access loss handling                                  2020-01-03 09:34:37 -05:00
lmon         Monitor the datanetwork for non-OpenStack work node           2019-12-26 04:00:47 +00:00
maintenance  Fix BMC access loss handling                                  2020-01-03 09:34:37 -05:00
mtclog       Set restricted permissions for mtce logfiles                  2019-07-17 18:19:52 -04:00
pmon         Make successful pmon-restart clear failed restarts count      2019-11-21 14:58:28 +00:00
public       Set SHELL in Makefiles that use bash constructs               2018-12-07 14:09:48 -06:00
scripts      Add redfish power/reset/reinstall bmc support to maintenance  2019-09-26 15:59:35 -04:00
LICENSE      Decouple Guest-server/agent from stx-metal                    2018-09-18 17:15:08 -04:00
Makefile     Remove Resource Monitor ; aka rmon, from the load             2019-03-19 16:12:38 -04:00