Merge "Add ceph commands in the 800 series alarm document"

commit 1a61473b14
Author: Zuul (2023-03-17 13:32:25 +00:00)
Committer: Gerrit Code Review
1 changed file with 17 additions and 3 deletions


@@ -3109,11 +3109,12 @@
 800.001:
     Type: Alarm
     Description: |-
-        Storage Alarm Condition:
-        1 mons down, quorum 1,2 controller-1,storage-0
+        Possible data loss. Any mds, mon or osd is unavailable in storage replication group.
     Entity_Instance_ID: cluster=<dist-fs-uuid>
     Severity: [critical, major]
-    Proposed_Repair_Action: "If problem persists, contact next level of support."
+    Proposed_Repair_Action: "Manually restart Ceph processes and check the state of the Ceph cluster with
+        'ceph -s'
+        If problem persists, contact next level of support."
     Maintenance_Action:
     Inhibit_Alarms:
     Alarm_Type: equipment
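
For reference, a minimal shell sketch of the repair flow this hunk documents for 800.001. The extra 'ceph mon stat' check and the comments are illustrative additions, not part of the commit, and the exact restart mechanism depends on how the Ceph daemons are supervised on the node, so no restart command is shown:

    # Check overall cluster health; HEALTH_WARN/HEALTH_ERR detail lines name
    # the unavailable mds, mon or osd daemons.
    ceph -s

    # Optional narrower check: which monitors are in or out of quorum.
    ceph mon stat

    # After manually restarting the affected Ceph processes on the reported
    # host, confirm the cluster returns to HEALTH_OK.
    ceph -s
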
@@ -3133,7 +3134,10 @@
     Entity_Instance_ID: cluster=<dist-fs-uuid>.peergroup=<group-x>
     Severity: [critical]
     Proposed_Repair_Action: "Ensure storage hosts from replication group are unlocked and available.
+        Check replication group state with 'system host-list'
         Check if OSDs of each storage host are up and running.
+        Manually restart Ceph processes and check the state of the Ceph OSDs with
+        'ceph osd stat' OR 'ceph osd tree'
         If problem persists, contact next level of support."
     Maintenance_Action:
     Inhibit_Alarms:
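
A hedged sketch of the checks added above; the same commands apply to the major-severity variant in the next hunk. The comments describe the intended use and are not output captured from a real cluster:

    # Confirm the storage hosts of the replication group are unlocked,
    # enabled and available.
    system host-list

    # Summary of OSD counts: how many OSDs exist, are up, and are in.
    ceph osd stat

    # Per-host tree view showing exactly which OSDs are down, useful for
    # picking the host whose Ceph processes need a manual restart.
    ceph osd tree
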
@@ -3153,7 +3157,10 @@
     Entity_Instance_ID: cluster=<dist-fs-uuid>.peergroup=<group-x>
     Severity: [major]
     Proposed_Repair_Action: "Ensure storage hosts from replication group are unlocked and available.
+        Check replication group state with 'system host-list'
         Check if OSDs of each storage host are up and running.
+        Manually restart Ceph processes and check the state of the Ceph OSDs with
+        'ceph osd stat' AND/OR 'ceph osd tree'
         If problem persists, contact next level of support."
     Maintenance_Action:
     Inhibit_Alarms:
@ -3282,6 +3289,9 @@
Entity_Instance_ID: <hostname>.lvmthinpool=<VG name>/<Pool name> Entity_Instance_ID: <hostname>.lvmthinpool=<VG name>/<Pool name>
Severity: critical Severity: critical
Proposed_Repair_Action: "Increase Storage Space Allotment for Cinder on the 'lvm' backend. Proposed_Repair_Action: "Increase Storage Space Allotment for Cinder on the 'lvm' backend.
Try the following commands:
'vgextend <VG name> <PV name>' or 'vgextend -L +<size extension> <PV name>
Check status with 'vgdisplay'
Consult the System Administration Manual for more details. Consult the System Administration Manual for more details.
If problem persists, contact next level of support." If problem persists, contact next level of support."
Maintenance_Action: Maintenance_Action:
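
A sketch of how the LVM commands quoted above might be used. The volume group, physical volume and thin-pool names are assumptions for illustration; note that size-based extension is done with 'lvextend -L' on a logical volume, while 'vgextend' grows a volume group by adding a physical volume:

    # Add a new physical volume (assumed here to be /dev/sdc) to the
    # cinder-volumes volume group to increase the available space.
    vgextend cinder-volumes /dev/sdc

    # Or grow the thin pool itself by size (pool name is an assumption).
    lvextend -L +50G cinder-volumes/cinder-volumes-pool

    # Verify the free space now reported for the volume group.
    vgdisplay cinder-volumes
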
@@ -3302,6 +3312,10 @@
     Entity_Instance_ID: storage_backend=<storage-backend-name>
     Severity: critical
     Proposed_Repair_Action: "Update backend setting to reapply configuration.
+        Use the following commands to try again:
+        'system storage-backend-delete <storage-backend-name>'
+        AND
+        'system storage-backend-add <storage-backend-name>'
         Consult the System Administration Manual for more details.
         If problem persists, contact next level of support."
     Maintenance_Action:
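
A sketch of the reapply sequence quoted above, using the same placeholder backend name. The 'system storage-backend-list' checks are an added suggestion rather than part of the commit, and whether the add step needs extra arguments (services, capabilities) depends on the backend:

    # List configured backends and their state to confirm which one failed.
    system storage-backend-list

    # Delete and re-add the failed backend so its configuration is reapplied.
    system storage-backend-delete <storage-backend-name>
    system storage-backend-add <storage-backend-name>

    # Re-check the backend state once the add operation completes.
    system storage-backend-list
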