Add alarm when kube-apiserver pod fails

It is possible to put the system into a state where
Kubernetes does not work but no alarm is raised. A new
alarm is added to indicate that the k8s API server is down.

Test Plan:
PASS: The kube-apiserver was interrupted/stopped by changing
      its configuration files and the alarm was raised.
PASS: The alarm was cleared when the configuration files were
      restored and the kube-apiserver was restarted.

Change-Id: I335179ea98ef63d7c35c89d82328a52ab2391f5c
Signed-off-by: rakshith mr <rakshith.mr@windriver.com>

@@ -309,6 +309,9 @@ FM_ALARM_ID_KUBE_ROOTCA_UPDATE_IN_PROGRESS = ALARM_GROUP_SW_MGMT + ".008"
 # Kubernetes RootCA Update abort alarm id
 FM_ALARM_ID_KUBE_ROOTCA_UPDATE_ABORTED = ALARM_GROUP_SW_MGMT + ".009"
 
+# Kubernetes Node Down alarm id
+FM_ALARM_ID_KUBE_DOWN = ALARM_GROUP_K8S + ".002"
+
 # The SYSTEM_CONFIG_UPDATE alarms are originated by vim strategy which is the
 # same as the other sw-mgmt alarms, put them in the same group
 # System Config Update alarm id
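
For context only (not part of this change): below is a minimal sketch of how a
monitoring task might raise and clear the new 850.002 alarm through the fm_api
Python client, using the constant added above. The Fault/FaultAPIs keyword
arguments and the literal field values shown are assumptions for illustration,
not code from this commit.

# Sketch only: not from this commit. Field values mirror the events.yaml
# entry below; the fm_api.Fault/FaultAPIs interface shown here is assumed.
from fm_api import fm_api
from fm_api import constants as fm_constants

ENTITY_INSTANCE_ID = "kubernetes=k8s-nodes"


def raise_kube_down_alarm():
    """Raise 850.002 when the kube-apiserver is detected as down."""
    fault = fm_api.Fault(
        alarm_id=fm_constants.FM_ALARM_ID_KUBE_DOWN,   # 850.002, added above
        alarm_state=fm_constants.FM_ALARM_STATE_SET,
        entity_type_id="kubernetes",                   # assumed entity type
        entity_instance_id=ENTITY_INSTANCE_ID,
        severity=fm_constants.FM_ALARM_SEVERITY_MAJOR,
        reason_text="K8s nodes unreachable",
        alarm_type="communication",
        probable_cause="communication-subsystem-failure",
        proposed_repair_action="Restart kubernetes service.",
        service_affecting=False,
        suppression=False)
    fm_api.FaultAPIs().set_fault(fault)


def clear_kube_down_alarm():
    """Clear 850.002 once the kube-apiserver responds again."""
    fm_api.FaultAPIs().clear_fault(
        fm_constants.FM_ALARM_ID_KUBE_DOWN, ENTITY_INSTANCE_ID)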

@@ -3431,6 +3431,25 @@
     Degrade_Affecting_Severity: none
     Context: none
 
+850.002:
+    Type: Alarm
+    Description: K8s nodes unreachable
+    Entity_Instance_ID: kubernetes=k8s-nodes
+    Severity: major
+    Proposed_Repair_Action: "Restart kubernetes service.
+                            Consult the System Administration Manual
+                            for more details. If problem persists
+                            contact next level of support."
+    Maintenance_Action:
+    Inhibit_Alarms:
+    Alarm_Type: communication
+    Probable_Cause: communication-subsystem-failure
+    Service_Affecting: False
+    Suppression: False
+    Management_Affecting_Severity: none
+    Degrade_Affecting_Severity: none
+    Context: none
+
 #---------------------------------------------------------------------------
 # SOFTWARE
 #---------------------------------------------------------------------------
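
Also for context (not part of this change): the events.yaml entry above only
describes the alarm; something still has to detect that the kube-apiserver is
unreachable. Below is a minimal sketch of such a check, assuming the apiserver
serves its /readyz probe on the default secure port 6443 and that unauthenticated
access to that endpoint is allowed (the usual default RBAC). A caller would raise
850.002 when this returns False and clear it once it returns True again.

# Sketch only: not from this commit. Endpoint, port and TLS handling are
# assumptions; adjust to the actual apiserver configuration.
import requests


def kube_apiserver_healthy(timeout=5):
    """Return True if the kube-apiserver answers its readiness probe."""
    try:
        resp = requests.get("https://localhost:6443/readyz",
                            verify=False, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        return False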