metal/mtce/src
Eric MacDonald 4e62e3ac9f Prevent process coredump due to missing token in response header
Both Maintenance and the Hardware Monitor use a common token refresh
utility that has been seen to crash the calling process when a token
'get' request is missing the token in its response header.

This update avoids the crash by exiting the token handler at the
point of error detection rather than continuing to handle the
response with invalid data.
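
The early-exit idea can be sketched as follows. This is a minimal
illustration, not the mtce code: the names TokenStatus, TokenResult and
handle_token_response are hypothetical, the header map stands in for
whatever HTTP library the processes use, and the header name
X-Subject-Token is an assumption (it is the header Keystone v3 uses to
return a token).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical status codes; the real mtce return codes are not shown here.
enum class TokenStatus { Ok, MissingHeader, EmptyToken };

struct TokenResult {
    TokenStatus status;
    std::string token;  // only meaningful when status == Ok
};

// Stand-in for the response headers of a token 'get' request.
using Headers = std::map<std::string, std::string>;

// Return as soon as an error is detected so that no later step
// touches token data that was never received.
TokenResult handle_token_response(const Headers& headers)
{
    // Assumed header name (Keystone v3 style).
    auto it = headers.find("X-Subject-Token");
    if (it == headers.end())
        return {TokenStatus::MissingHeader, ""};  // exit at detection point

    if (it->second.empty())
        return {TokenStatus::EmptyToken, ""};     // exit at detection point

    return {TokenStatus::Ok, it->second};
}
```

A caller would check the status first and only read the token field on
Ok. The lookup here is case-sensitive for brevity; real HTTP header
names are case-insensitive.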

Significant fault insertion testing was performed on the update
which led to some additional improvements in token request error
handling that both processes benefit from.

Additional specific fixes include:
- fixed race condition memory leak around authentication error handling
- differentiated token refresh from failure recovery renewal
- fixed a few missing event status / rc updates

Test Plan:
 - used mtce fault insertion tools to create failure modes
 - 24+ hr memory leak test run for both success & token error handling
 - all tests were done with both hwmond and mtcAgent

PASS: Verify build and AIO DX install.
PASS: Verify reported hwmon coredump issue is avoided/resolved.
PASS: Verify issue also exists in the mtcAgent and is also
      avoided/resolved by this update.

Regression:

PASS: Verify token get failure retry handling:
PASS: - get first token inline - retry cadence: 5 seconds
PASS: - refresh token by http  - retry cadence: 10, 30 and 1200 secs
PASS: Verify recovery handling cases:
PASS: - corrupt token
PASS: - no token present
PASS: - no token in header
PASS: Verify token renewal stress soak: every 10 seconds for 24+ hrs
PASS: - repeat over token get failure cases
PASS: - in each success and failure case verify no memory leaks.
PASS: Verify authentication error handling soak
      - every 10-60 secs for 24+ hrs
      - token is corrupted followed by a sysinv request to
        exercise authentication error handling and renewal process.
PASS: Verify no coredumps.
PASS: Verify logging and token retry.
PASS: Verify process continues to use the previous token until a new
      one is acquired.
      - Token Refresh is on time.
      - Token Renew is on event.
PASS: Verify soak of persistent authentication error / token
      renewal cycle. No memory leak or coredumps.
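
The "continue to use the previous token until a new one is acquired"
behaviour verified above, together with the split between a timed
refresh and an event-driven renewal, might look like the sketch below.
All names (TokenRequestReason, TokenHolder, on_request_done) are
hypothetical, and an empty string is assumed to stand in for any failed
token request.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Why a new token is being requested (hypothetical names).
enum class TokenRequestReason {
    Refresh,  // scheduled, timer driven
    Renew     // triggered by an event such as an authentication error
};

class TokenHolder {
public:
    explicit TokenHolder(std::string initial) : token_(std::move(initial)) {}

    // Called when a token request completes. Returns true only when
    // the stored token was replaced.
    bool on_request_done(TokenRequestReason reason, const std::string& new_token)
    {
        last_reason_ = reason;
        if (new_token.empty()) {
            ++failures_;
            return false;  // failure: keep using the previous token
        }
        token_ = new_token;
        failures_ = 0;     // success resets the failure count
        return true;
    }

    const std::string& token() const { return token_; }
    int failures() const { return failures_; }
    TokenRequestReason last_reason() const { return last_reason_; }

private:
    std::string token_;
    int failures_ = 0;
    TokenRequestReason last_reason_ = TokenRequestReason::Refresh;
};
```

Keeping the reason separate from the outcome lets logs and retry
handling distinguish a routine refresh from a recovery renewal without
changing how the stored token is protected.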

Closes-Bug: 2063475
Change-Id: I5eef62518ac606e6b54323b46fbb6f9475b5c1ef
2024-04-29 13:11:26 +00:00
alarm Add pxeboot mtcAlive messaging alarm handling 2024-04-09 14:13:23 +00:00
common Add pxeboot mtcAlive messaging alarm handling 2024-04-09 14:13:23 +00:00
fsmon Replace a file test from fsmond 2023-11-17 08:15:28 -03:00
fsync Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
heartbeat Remove swerr log in hbsAgent cluster delete 2021-06-14 19:04:33 -04:00
hostw Change hostwd emergency log to write to /dev/kmsg 2023-02-01 23:41:14 +00:00
hwmon Prevent process coredump due to missing token in response header 2024-04-29 13:11:26 +00:00
lmon Add pxeboot network mtcAlive messaging to Maintenance 2024-03-28 15:28:27 +00:00
maintenance Prevent process coredump due to missing token in response header 2024-04-29 13:11:26 +00:00
mtclog Set restricted permissions for mtce logfiles 2019-07-17 18:19:52 -04:00
pmon Fix bashate failure in zuul 2022-10-06 17:22:12 +00:00
public Fix mtce build error with gcc-8.2.1 2020-04-03 14:44:21 +08:00
scripts Add pxeboot network mtcAlive messaging to Maintenance 2024-03-28 15:28:27 +00:00
LICENSE Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
Makefile Remove Resource Monitor ; aka rmon, from the load 2019-03-19 16:12:38 -04:00