From aef702fb01f0eccb49bf6ee5aaebd8fd2659a5fd Mon Sep 17 00:00:00 2001
From: Eric MacDonald
Date: Thu, 28 Mar 2024 16:26:26 +0000
Subject: [PATCH] Add pxeboot mtcAlive alarm to fault management

This update introduces a new maintenance group alarm: 200.003

This new alarm is minor and management affecting if asserted.
It is considered management affecting for the upgrades case
because the pxeboot network is needed to upgrade a node.

The alarm represents a communication/messaging failure between
the active controller mtcAgent process and the mtcClient that
runs on each node.

Test Plan:

PASS: Verify alarm attributes
PASS: - code of 200.003
PASS: - assertion cause text
PASS: - proposed repair action text
PASS: - suppression option
PASS: - does not inhibit other alarms
PASS: - effect of assertion on upgrade healthcheck
PASS: Verify ability to assert and clear
PASS: Verify fm logging for the above assertion and clear

Story: 2010940
Task: 49789

Change-Id: I507d30213674c5b1e24fcfebe15c6a87bad74358
Signed-off-by: Eric MacDonald
---
 fm-doc/fm_doc/events.yaml | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fm-doc/fm_doc/events.yaml b/fm-doc/fm_doc/events.yaml
index 7b978afa..4b8fecff 100755
--- a/fm-doc/fm_doc/events.yaml
+++ b/fm-doc/fm_doc/events.yaml
@@ -566,6 +566,22 @@
     Degrade_Affecting_Severity: none
     Context: starlingx
 
+200.003:
+    Type: Alarm
+    Description: pxeboot network communication failure.
+    Entity_Instance_ID: host=
+    Severity: minor
+    Proposed_Repair_Action: Administratively Lock and Unlock host to recover. If problem persists, contact next level of support.
+    Maintenance_Action: none
+    Inhibit_Alarms: False
+    Alarm_Type: communication
+    Probable_Cause: unknown
+    Service_Affecting: False
+    Suppression: False
+    Management_Affecting_Severity: warning
+    Degrade_Affecting_Severity: none
+    Context: starlingx
+
 200.004:
     Type: Alarm
     Description: |-