Fault Management Guide

Removed OSX and MS Visual Code metadata files and directories.

Change-Id: I4e6024767cc072fcb3fc8d88f0724f02819cebc8
Signed-off-by: Stone <ronald.stone@windriver.com>
Stone 2020-10-29 11:46:58 -04:00
parent f6f21a4056
commit 10e4b9ac86
54 changed files with 4825 additions and 10 deletions

.. This file must exist to satisfy build requirements.

The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
For more information, see :ref:`Overview <openstack-fault-management-overview>`.
In the following tables, the severity of the alarms is represented by one or
more letters, as follows:
.. _alarm-messages-300s-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
A slash-separated list of letters is used when the alarm can be triggered with
one of several severity levels.
An asterisk \(\*\) indicates the management-affecting severity, if any. A
management-affecting alarm is one that cannot be ignored at the indicated
severity level or higher by using relaxed alarm rules during an orchestrated
patch or upgrade operation.
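Read mechanically, a code such as C/M\* expands to the levels Critical and Major, with Major being the management-affecting severity. A hypothetical helper (names are illustrative only, not part of the product) can decode this notation:

```python
# Severity letters used in the alarm tables (see the list above).
LEVELS = {"C": "Critical", "M": "Major", "m": "Minor", "W": "Warning"}

def decode_severity(code):
    """Decode a table severity code such as "C/M*".

    Returns (levels, management_affecting): the list of possible severity
    levels, and the level marked management-affecting with "*", if any.
    """
    levels = []
    mgmt = None
    for part in code.split("/"):
        letter = part.rstrip("*")
        levels.append(LEVELS[letter])
        if part.endswith("*"):
            mgmt = LEVELS[letter]
    return levels, mgmt
```

For example, decode_severity("C/M*") yields (["Critical", "Major"], "Major").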
Differences exist between the terminology emitted by some alarms and that
used in the CLI, GUI, and elsewhere in the documentation:
- References to provider networks in alarms refer to data networks.
- References to data networks in alarms refer to physical networks.
- References to tenant networks in alarms refer to project networks.
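Since these are one-for-one term substitutions, a script post-processing alarm text could apply them as a single lookup. A hypothetical sketch; note that each term is translated exactly once, never chained, because "data network" appears on both sides of the mapping:

```python
# Mapping from terms emitted in alarm text to the terms used in the CLI,
# GUI, and documentation (from the list above). Apply one lookup per term;
# chaining lookups would wrongly turn "provider network" into
# "physical network".
ALARM_TERM_TO_DOC_TERM = {
    "provider network": "data network",
    "data network": "physical network",
    "tenant network": "project network",
}

def translate_term(term):
    """Return the documentation-facing term for a term found in alarm text."""
    return ALARM_TERM_TO_DOC_TERM.get(term, term)
```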

The Customer Logs include events that do not require immediate user action.
The following types of events are included in the Customer Logs. The severity of the events is represented in the table by one or more letters, as follows:
- C: Critical
- M: Major
- m: Minor
- W: Warning
- NA: Not applicable

The Customer Logs include events that do not require immediate user action.
The following types of events are included in the Customer Logs. The severity of the events is represented in the table by one or more letters, as follows:
.. _customer-log-messages-401s-services-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
- NA: Not applicable

.. This file must exist to satisfy build requirements.

.. rsg1586183719424
.. _alarm-messages-overview:
Alarm messages are numerically coded by the type of alarm.
For more information, see
:ref:`Fault Management Overview <fault-management-overview>`.
In the alarm description tables, the severity of the alarms is represented by
one or more letters, as follows:
.. _alarm-messages-overview-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
A slash-separated list of letters is used when the alarm can be triggered with
one of several severity levels.
An asterisk \(\*\) indicates the management-affecting severity, if any. A
management-affecting alarm is one that cannot be ignored at the indicated
severity level or higher by using relaxed alarm rules during an orchestrated
patch or upgrade operation.
.. note::
**Degrade Affecting Severity: Critical** indicates a node will be
degraded if the alarm reaches a Critical level.

.. jsy1579701868527
.. _100-series-alarm-messages:
=========================
100 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _100-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.101**
- Platform CPU threshold exceeded; threshold x%, actual y%.
CRITICAL @ 95%
MAJOR @ 90%
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- Critical
* - Severity:
- C/M\*
* - Proposed Repair Action
- Monitor and if condition persists, contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.103**
- Memory threshold exceeded; threshold x%, actual y%.
CRITICAL @ 90%
MAJOR @ 80%
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- Critical
* - Severity:
- C/M
* - Proposed Repair Action
- Monitor and if condition persists, contact next level of support; may
require additional memory on Host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.104**
- File System threshold exceeded; threshold x%, actual y%
CRITICAL @ 90%
MAJOR @ 80%
* - Entity Instance
- host=<hostname>.filesystem=<mount-dir>
* - Degrade Affecting Severity:
- Critical
* - Severity:
- C\*/M
* - Proposed Repair Action
- Monitor and if condition persists, consider adding additional physical
volumes to the volume group.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.105**
- <fs\_name> filesystem is not added on both controllers and/or does not
have the same size: <hostname>.
* - Entity Instance
- fs\_name=<image-conversion>
* - Degrade Affecting Severity:
- None
* - Severity:
- C/M\*
* - Proposed Repair Action
- Add image-conversion filesystem on both controllers.
Consult the System Administration Manual for more details.
If problem persists, contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.106**
- 'OAM' Port failed.
* - Entity Instance
- host=<hostname>.port=<port-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.107**
- 'OAM' Interface degraded.
or
'OAM' Interface failed.
* - Entity Instance
- host=<hostname>.interface=<if-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C or M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.108**
- 'MGMT' Port failed.
* - Entity Instance
- host=<hostname>.port=<port-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.109**
- 'MGMT' Interface degraded.
or
'MGMT' Interface failed.
* - Entity Instance
- host=<hostname>.interface=<if-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C or M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.110**
- 'CLUSTER-HOST' Port failed.
* - Entity Instance
- host=<hostname>.port=<port-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C or M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.111**
- 'CLUSTER-HOST' Interface degraded.
or
'CLUSTER-HOST' Interface failed.
* - Entity Instance
- host=<hostname>.interface=<if-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C or M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.112**
- 'DATA-VRS' Port down.
* - Entity Instance
- host=<hostname>.port=<port-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- M
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.113**
- 'DATA-VRS' Interface degraded.
or
'DATA-VRS' Interface down.
* - Entity Instance
- host=<hostname>.interface=<if-name>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C or M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.114**
- NTP configuration does not contain any valid or reachable NTP servers.
The alarm is raised regardless of NTP enabled/disabled status.
NTP address <IP address> is not a valid or a reachable NTP server.
Connectivity to external PTP Clock Synchronization is lost.
* - Entity Instance
- host=<hostname>.ntp
host=<hostname>.ntp=<IP address>
* - Degrade Affecting Severity:
- None
* - Severity:
- M or m
* - Proposed Repair Action
- Monitor and if condition persists, contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.118**
- Controller cannot establish connection with remote logging server.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- m
* - Proposed Repair Action
- Ensure Remote Log Server IP is reachable from Controller through OAM
interface; otherwise contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 100.119**
- Major: PTP configuration or out-of-tolerance time-stamping conditions.
Minor: PTP out-of-tolerance time-stamping condition.
* - Entity Instance
- host=<hostname>.ptp OR host=<hostname>.ptp=no-lock
OR
host=<hostname>.ptp=<interface>.unsupported=hardware-timestamping
OR
host=<hostname>.ptp=<interface>.unsupported=software-timestamping
OR
host=<hostname>.ptp=<interface>.unsupported=legacy-timestamping
OR
host=<hostname>.ptp=out-of-tolerance
* - Degrade Affecting Severity:
- None
* - Severity:
- M or m
* - Proposed Repair Action
- Monitor and, if condition persists, contact next level of support.
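The Entity Instance values in these tables follow a dotted key=value pattern, for example host=<hostname>.ntp=<IP address>. A hypothetical parsing sketch, which assumes each new field starts at '.<key>=' so that dots inside values (such as IPv4 addresses or mount paths) survive intact:

```python
import re

def parse_entity_instance(entity):
    """Split an entity instance ID such as "host=controller-0.ntp=10.0.0.1"
    into ordered (key, value) pairs.

    Splits only at a '.' that introduces a new '<key>=' field, so dots
    inside values are preserved.
    """
    fields = re.split(r"\.(?=[\w-]+=)", entity)
    return [tuple(field.split("=", 1)) for field in fields]
```

For example, parse_entity_instance("host=controller-0.ntp=10.0.0.1") yields [("host", "controller-0"), ("ntp", "10.0.0.1")].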

.. uof1579701912856
.. _200-series-alarm-messages:
=========================
200 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _200-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.001**
- <hostname> was administratively locked to take it out-of-service.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W\*
* - Proposed Repair Action
- Administratively unlock Host to bring it back in-service.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.004**
- <hostname> experienced a service-affecting failure.
Host is being auto recovered by Reboot.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- If auto-recovery is consistently unable to recover host to the
unlocked-enabled state contact next level of support or lock and replace
failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.005**
- Degrade:
<hostname> is experiencing intermittent 'Management Network'
communication failures that have exceeded its lower alarming threshold.
Failure:
<hostname> is experiencing a persistent Critical 'Management Network'
communication failure.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\* (Degrade) or C\* (Failure)
* - Proposed Repair Action
- Check 'Management Network' connectivity and support for multicast
messaging. If problem consistently occurs after that and Host is reset,
then contact next level of support or lock and replace failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.006**
- Main Process Monitor Daemon Failure \(Major\)
<hostname> 'Process Monitor' \(pmond\) process is not running or
functioning properly. The system is trying to recover this process.
Monitored Process Failure \(Critical/Major/Minor\)
Critical: <hostname> Critical '<processname>' process has failed and
could not be auto-recovered gracefully. Auto-recovery progression by
host reboot is required and in progress.
Major: <hostname> is degraded due to the failure of its '<processname>'
process. Auto recovery of this Major process is in progress.
Minor:
<hostname> '<processname>' process has failed. Auto recovery of this
Minor process is in progress.
<hostname> '<processname>' process has failed. Manual recovery is required.
ptp4l/phc2sys process failure. Manual recovery is required.
* - Entity Instance
- host=<hostname>.process=<processname>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C/M/m\*
* - Proposed Repair Action
- If this alarm does not automatically clear after some time and continues
to be asserted after Host is locked and unlocked then contact next level
of support for root cause analysis and recovery.
If problem consistently occurs after Host is locked and unlocked then
contact next level of support for root cause analysis and recovery.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.007**
- Critical: \(with host degrade\):
Host is degraded due to a 'Critical' out-of-tolerance reading from the
'<sensorname>' sensor
Major: \(with host degrade\)
Host is degraded due to a 'Major' out-of-tolerance reading from the
'<sensorname>' sensor
Minor:
Host is reporting a 'Minor' out-of-tolerance reading from the
'<sensorname>' sensor
* - Entity Instance
- host=<hostname>.sensor=<sensorname>
* - Degrade Affecting Severity:
- Critical
* - Severity:
- C/M/m
* - Proposed Repair Action
- If problem consistently occurs after Host is power cycled and or reset,
contact next level of support or lock and replace failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.009**
- Degrade:
<hostname> is experiencing intermittent 'Cluster-host Network'
communication failures that have exceeded its lower alarming threshold.
Failure:
<hostname> is experiencing a persistent Critical 'Cluster-host Network'
communication failure.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\* (Degrade) or C\* (Failure)
* - Proposed Repair Action
- Check 'Cluster-host Network' connectivity and support for multicast
messaging. If problem consistently occurs after that and Host is reset,
then contact next level of support or lock and replace failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.010**
- <hostname> access to board management module has failed.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W
* - Proposed Repair Action
- Check Host's board management configuration and connectivity.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.011**
- <hostname> experienced a configuration failure during initialization.
Host is being re-configured by Reboot.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- If auto-recovery is consistently unable to recover host to the
unlocked-enabled state contact next level of support or lock and
replace failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.012**
- <hostname> controller function has an in-service failure while compute
services remain healthy.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- Major
* - Severity:
- C\*
* - Proposed Repair Action
- Lock and then Unlock host to recover. Avoid using 'Force Lock' action
as that will impact compute services running on this host. If lock action
fails then contact next level of support to investigate and recover.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.013**
- <hostname> compute service of the only available controller is not
operational. Auto-recovery is disabled. Degrading host instead.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- Major
* - Severity:
- M\*
* - Proposed Repair Action
- Enable second controller and Switch Activity \(Swact\) over to it as
soon as possible. Then Lock and Unlock host to recover its local compute
service.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.014**
- The Hardware Monitor was unable to load, configure and monitor one
or more hardware sensors.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- m
* - Proposed Repair Action
- Check Board Management Controller provisioning. Try reprovisioning the
BMC. If problem persists try power cycling the host and then the entire
server including the BMC power. If problem persists then contact next
level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 200.015**
- Unable to read one or more sensor groups from this host's board
management controller.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M
* - Proposed Repair Action
- Check board management connectivity and try rebooting the board
management controller. If problem persists contact next level of
support or lock and replace failing host.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 210.001**
- System Backup in progress.
* - Entity Instance
- host=controller
* - Degrade Affecting Severity:
- None
* - Severity:
- m\*
* - Proposed Repair Action
- No action required.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 250.001**
- <hostname> Configuration is out-of-date.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Administratively lock and unlock <hostname> to update config.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 250.003**
- Kubernetes certificates rotation failed on host <hostname>.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M/W
* - Proposed Repair Action
- Rotate Kubernetes certificates manually.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 270.001**
- Host <host\_name> compute services failure\[, reason = <reason\_text>\]
* - Entity Instance
- host=<host\_name>.services=compute
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for host services recovery to complete; if problem persists contact
next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 280.001**
- <subcloud> is offline.
* - Entity Instance
- subcloud=<subcloud>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for subcloud to become online; if problem persists contact next
level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 280.002**
- <subcloud><resource> sync status is out-of-sync.
* - Entity Instance
- \[subcloud=<subcloud>.resource=<compute> \| <network> \| <platform>
\| <volumev2>\]
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- If problem persists contact next level of support.

.. lzz1579291773073
.. _200-series-maintenance-customer-log-messages:
============================================
200 Series Maintenance Customer Log Messages
============================================
The Customer Logs include events that do not require immediate user action.
The following types of events are included in the Customer Logs. The severity
of the events is represented in the table by one or more letters, as follows:
.. _200-series-maintenance-customer-log-messages-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
- NA: Not applicable
.. _200-series-maintenance-customer-log-messages-table-zgf-jvw-v5:
.. table:: Table 1. Customer Log Messages
:widths: auto
+-----------------+------------------------------------------------------------------+----------+
| Log ID | Description | Severity |
+ +------------------------------------------------------------------+----------+
| | Entity Instance ID | |
+=================+==================================================================+==========+
| 200.020 | <hostname> has been 'discovered' on the network | NA |
| | | |
| | host=<hostname>.event=discovered | |
+-----------------+------------------------------------------------------------------+----------+
| 200.020 | <hostname> has been 'added' to the system | NA |
| | | |
| | host=<hostname>.event=add | |
+-----------------+------------------------------------------------------------------+----------+
| 200.020 | <hostname> has 'entered' multi-node failure avoidance | NA |
| | | |
| | host=<hostname>.event=mnfa\_enter | |
+-----------------+------------------------------------------------------------------+----------+
| 200.020 | <hostname> has 'exited' multi-node failure avoidance | NA |
| | | |
| | host=<hostname>.event=mnfa\_exit | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> board management controller has been 'provisioned' | NA |
| | | |
| | host=<hostname>.command=provision | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> board management controller has been 're-provisioned' | NA |
| | | |
| | host=<hostname>.command=reprovision | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> board management controller has been 'de-provisioned' | NA |
| | | |
| | host=<hostname>.command=deprovision | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'unlock' request | NA |
| | | |
| | host=<hostname>.command=unlock | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'reboot' request | NA |
| | | |
| | host=<hostname>.command=reboot | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'reset' request | NA |
| | | |
| | host=<hostname>.command=reset | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'power-off' request | NA |
| | | |
| | host=<hostname>.command=power-off | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'power-on' request | NA |
| | | |
| | host=<hostname>.command=power-on | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'reinstall' request | NA |
| | | |
| | host=<hostname>.command=reinstall | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'force-lock' request | NA |
| | | |
| | host=<hostname>.command=force-lock | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'delete' request | NA |
| | | |
| | host=<hostname>.command=delete | |
+-----------------+------------------------------------------------------------------+----------+
| 200.021 | <hostname> manual 'controller switchover' request | NA |
| | | |
| | host=<hostname>.command=swact | |
+-----------------+------------------------------------------------------------------+----------+
| 200.022 | <hostname> is now 'disabled' | NA |
| | | |
| | host=<hostname>.state=disabled | |
+-----------------+------------------------------------------------------------------+----------+
| 200.022 | <hostname> is now 'enabled' | NA |
| | | |
| | host=<hostname>.state=enabled | |
+-----------------+------------------------------------------------------------------+----------+
| 200.022 | <hostname> is now 'online' | NA |
| | | |
| | host=<hostname>.status=online | |
+-----------------+------------------------------------------------------------------+----------+
| 200.022 | <hostname> is now 'offline' | NA |
| | | |
| | host=<hostname>.status=offline | |
+-----------------+------------------------------------------------------------------+----------+
| 200.022 | <hostname> is 'disabled-failed' to the system | NA |
| | | |
| | host=<hostname>.status=failed | |
+-----------------+------------------------------------------------------------------+----------+

.. zwe1579701930425
.. _300-series-alarm-messages:
=========================
300 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the
overall health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _300-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.001**
- 'Data' Port failed.
* - Entity Instance
- host=<hostname>.port=<port-uuid>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.002**
- 'Data' Interface degraded.
or
'Data' Interface failed.
* - Entity Instance
- host=<hostname>.interface=<if-uuid>
* - Degrade Affecting Severity:
- Critical
* - Severity:
- C/M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.

.. ots1579702138430
.. _400-series-alarm-messages:
=========================
400 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _400-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.003**
- License key is not installed; a valid license key is required for
operation.
or
License key has expired or is invalid; a valid license key is required
for operation.
or
Evaluation license key will expire on <date>; there are <num\_days> days
remaining in this evaluation.
or
Evaluation license key will expire on <date>; there is only 1 day
remaining in this evaluation.
* - Entity Instance:
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Contact next level of support to obtain a new license key.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.005**
- Communication failure detected with peer over port <linux-ifname>.
or
Communication failure detected with peer over port <linux-ifname>
within the last 30 seconds.
* - Entity Instance:
- host=<hostname>.network=<mgmt \| oam \| cluster-host>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent
equipment.

.. pgb1579292662158
.. _400-series-customer-log-messages:
================================
400 Series Customer Log Messages
================================
The Customer Logs include events that do not require immediate user action.
The following types of events are included in the Customer Logs. The severity
of the events is represented in the table by one or more letters, as follows:
.. _400-series-customer-log-messages-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
- NA: Not applicable
.. _400-series-customer-log-messages-table-zgf-jvw-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.003**
- License key has expired or is invalid
or
Evaluation license key will expire on <date>
or
License key is valid
* - Entity Instance
- host=<host\_name>
* - Severity:
- C
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.005**
- Communication failure detected with peer over port <port> on host
<host name>
or
Communication failure detected with peer over port <port> on host
<host name> within the last <X> seconds
or
Communication established with peer over port <port> on host <host name>
* - Entity Instance
- host=<host\_name>.network=<network>
* - Severity:
- C
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.007**
- Swact or swact-force
* - Entity Instance
- host=<host\_name>
* - Severity:
- C

.. xpx1579702157578
.. _500-series-alarm-messages:
=========================
500 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _500-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 500.100**
- TPM initialization failed on host.
* - Entity Instance
- tenant=<tenant-uuid>
* - Degrade Affecting Severity:
- None
* - Severity:
- M
* - Proposed Repair Action
- Reinstall HTTPS certificate; if problem persists contact next level of
support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 500.101**
- Developer patch certificate enabled.
* - Entity Instance
- host=controller
* - Degrade Affecting Severity:
- None
* - Severity:
- M
* - Proposed Repair Action
- Reinstall system to disable developer certificate and remove untrusted
patches.

.. cta1579702173704
.. _750-series-alarm-messages:
=========================
750 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _750-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.001**
- Application upload failure.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W
* - Proposed Repair Action
- Check the system inventory log for the cause.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.002**
- Application apply failure.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M
* - Proposed Repair Action
- Retry applying the application. If the issue persists, check the
system inventory log for the cause.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.003**
- Application remove failure.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M
* - Proposed Repair Action
- Retry removing the application. If the issue persists, check the
system inventory log for the cause.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.004**
- Application apply in progress.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W
* - Proposed Repair Action
- No action is required.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.005**
- Application update in progress.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W
* - Proposed Repair Action
- No action is required.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 750.006**
- Automatic application re-apply is pending.
* - Entity Instance
- k8s\_application=<appname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W
* - Proposed Repair Action
- Ensure all hosts are either locked or unlocked. When the system is
stable the application will automatically be reapplied.
.. rww1579702317136
.. _800-series-alarm-messages:
=========================
800 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _800-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.001**
- Storage Alarm Condition:
1 mons down, quorum 1,2 controller-1,storage-0
* - Entity Instance
- cluster=<dist-fs-uuid>
* - Degrade Affecting Severity:
- None
* - Severity:
- C/M\*
* - Proposed Repair Action
- If problem persists, contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.003**
- Storage Alarm Condition: Quota/Space mismatch for the <tiername> tier.
The sum of Ceph pool quotas does not match the tier size.
* - Entity Instance
- cluster=<dist-fs-uuid>.tier=<tiername>
* - Degrade Affecting Severity:
- None
* - Severity:
- m
* - Proposed Repair Action
- Update ceph storage pool quotas to use all available tier space.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.010**
- Potential data loss. No available OSDs in storage replication group.
* - Entity Instance
- cluster=<dist-fs-uuid>.peergroup=<group-x>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Ensure storage hosts from replication group are unlocked and available.
Check if OSDs of each storage host are up and running. If problem
persists contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.011**
- Loss of replication in peergroup.
* - Entity Instance
- cluster=<dist-fs-uuid>.peergroup=<group-x>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Ensure storage hosts from replication group are unlocked and available.
Check if OSDs of each storage host are up and running. If problem
persists contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.102**
- Storage Alarm Condition:
PV configuration <error/failed to apply\> on <hostname>.
Reason: <detailed reason\>.
* - Entity Instance
- pv=<pv\_uuid>
* - Degrade Affecting Severity:
- None
* - Severity:
- C/M\*
* - Proposed Repair Action
- Remove failed PV and associated Storage Device then recreate them.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.103**
- Storage Alarm Condition:
\[ Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded
threshold and automatic extension failed
Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded
threshold \]; threshold x%, actual y%.
* - Entity Instance
- <hostname>.lvmthinpool=<VG name>/<Pool name>
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Increase Storage Space Allotment for Cinder on the 'lvm' backend.
Consult the user documentation for more details. If problem persists,
contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.104**
- Storage Alarm Condition:
<storage-backend-name> configuration failed to apply on host: <host-uuid>.
* - Degrade Affecting Severity:
- None
* - Severity:
- C\*
* - Proposed Repair Action
- Update backend setting to reapply configuration. Consult the user
documentation for more details. If problem persists, contact next level
of support.
.. pti1579702342696
.. _900-series-alarm-messages:
=========================
900 Series Alarm Messages
=========================
The system inventory and maintenance service reports system changes with
different degrees of severity. Use the reported alarms to monitor the overall
health of the system.
.. include:: ../_includes/x00-series-alarm-messages.rest
.. _900-series-alarm-messages-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.001**
- Patching operation in progress.
* - Entity Instance
- host=controller
* - Degrade Affecting Severity:
- None
* - Severity:
- m\*
* - Proposed Repair Action
- Complete reboots of affected hosts.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.002**
- Obsolete patch in system.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- W\*
* - Proposed Repair Action
- Remove and delete obsolete patches.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.003**
- Patch host install failure.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Undo patching operation.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.004**
- Host version mismatch.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Reinstall host to update applied load.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.005**
- System Upgrade in progress.
* - Entity Instance
- host=controller
* - Degrade Affecting Severity:
- None
* - Severity:
- m\*
* - Proposed Repair Action
- No action required.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.101**
- Software update auto-apply in progress.
* - Entity Instance
- sw-update
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for software update auto-apply to complete; if problem persists
contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.102**
- Software update auto-apply aborting.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for software update auto-apply abort to complete; if problem
persists contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.103**
- Software update auto-apply failed.
* - Entity Instance
- host=<hostname>
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Attempt to apply software updates manually; if problem persists contact
next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.201**
- Software upgrade auto-apply in progress.
* - Entity Instance
- orchestration=sw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for software upgrade auto-apply to complete; if problem persists
contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.202**
- Software upgrade auto-apply aborting
* - Entity Instance
- orchestration=sw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for software upgrade auto-apply abort to complete; if problem
persists contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.203**
- Software upgrade auto-apply failed.
* - Entity Instance
- orchestration=sw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Attempt to apply software upgrade manually; if problem persists contact
next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.301**
- Firmware Update auto-apply in progress.
* - Entity Instance
- orchestration=fw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for firmware update auto-apply to complete; if problem persists
contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.302**
- Firmware Update auto-apply aborting.
* - Entity Instance
- orchestration=fw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Wait for firmware update auto-apply abort to complete; if problem
persists contact next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 900.303**
- Firmware Update auto-apply failed.
* - Entity Instance
- orchestration=fw-upgrade
* - Degrade Affecting Severity:
- None
* - Severity:
- M\*
* - Proposed Repair Action
- Attempt to apply firmware update manually; if problem persists
contact next level of support.
.. bdq1579700719122
.. _900-series-orchestration-customer-log-messages:
==============================================
900 Series Orchestration Customer Log Messages
==============================================
The Customer Logs include events that do not require immediate user action.
The following types of events are included in the Customer Logs. The severity
of the events is represented in the table by one or more letters, as follows:
.. _900-series-orchestration-customer-log-messages-ul-jsd-jkg-vp:
- C: Critical
- M: Major
- m: Minor
- W: Warning
- NA: Not applicable
.. _900-series-orchestration-customer-log-messages-table-zgf-jvw-v5:
.. table:: Table 1. Customer Log Messages
:widths: auto
+-------------------+--------------------------------------------+----------+
| Log ID            | Description                                | Severity |
+                   +--------------------------------------------+----------+
|                   | Entity Instance ID                                    |
+===================+============================================+==========+
| 900.111           | Software update auto-apply start           | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.112           | Software update auto-apply inprogress      | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.113           | Software update auto-apply rejected        | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.114           | Software update auto-apply canceled        | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.115           | Software update auto-apply failed          | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.116           | Software update auto-apply completed       | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.117           | Software update auto-apply abort           | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.118           | Software update auto-apply aborting        | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.119           | Software update auto-apply abort rejected  | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.120           | Software update auto-apply abort failed    | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.121           | Software update auto-apply aborted         | C        |
|                   |                                            |          |
|                   | orchestration=sw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.211           | Software upgrade auto-apply start          | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.212           | Software upgrade auto-apply inprogress     | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.213           | Software upgrade auto-apply rejected       | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.214           | Software upgrade auto-apply canceled       | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.215           | Software upgrade auto-apply failed         | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.216           | Software upgrade auto-apply completed      | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.217           | Software upgrade auto-apply abort          | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.218           | Software upgrade auto-apply aborting       | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.219           | Software upgrade auto-apply abort rejected | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.220           | Software upgrade auto-apply abort failed   | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.221           | Software upgrade auto-apply aborted        | C        |
|                   |                                            |          |
|                   | orchestration=sw-upgrade                   |          |
+-------------------+--------------------------------------------+----------+
| 900.311           | Firmware update auto-apply start           | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.312           | Firmware update auto-apply in progress     | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.313           | Firmware update auto-apply rejected        | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.314           | Firmware update auto-apply canceled        | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.315           | Firmware update auto-apply failed          | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.316           | Firmware update auto-apply completed       | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.317           | Firmware update auto-apply abort           | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.318           | Firmware update auto-apply aborting        | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.319           | Firmware update auto-apply abort rejected  | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.320           | Firmware update auto-apply abort failed    | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
| 900.321           | Firmware update auto-apply aborted         | C        |
|                   |                                            |          |
|                   | orchestration=fw-update                    |          |
+-------------------+--------------------------------------------+----------+
.. xti1552680491532
.. _adding-an-snmp-community-string-using-the-cli:
==========================================
Add an SNMP Community String Using the CLI
==========================================
To enable SNMP services, you must define one or more SNMP community strings
using the command line interface.
.. rubric:: |context|
No default community strings are defined on |prod| after the initial
commissioning of the cluster. This means that no SNMP operations are enabled
by default.
The following exercise illustrates the system commands available to manage and
query SNMP community strings. It uses the string **commstr1** as an example.
.. caution::
For security, do not use the string **public**, or other community strings
that could easily be guessed.
.. rubric:: |prereq|
All commands must be executed on the active controller's console, which can be
accessed using the OAM floating IP address. You must acquire Keystone **admin**
credentials in order to execute the commands.
.. rubric:: |proc|
#. Add the SNMP community string commstr1 to the system.
.. code-block:: none
~(keystone_admin)$ system snmp-comm-add -c commstr1
+-----------+--------------------------------------+
| Property  | Value                                |
+-----------+--------------------------------------+
| access    | ro                                   |
| uuid      | eccf5729-e400-4305-82e2-bdf344eb868d |
| community | commstr1                             |
| view      | .1                                   |
+-----------+--------------------------------------+
The following are attributes associated with the new community string:
**access**
The SNMP access type. In |prod| all community strings provide read-only
access.
**uuid**
The UUID associated with the community string.
**community**
The community string value.
**view**
The view is always the full MIB tree.
#. List available community strings.
.. code-block:: none
~(keystone_admin)$ system snmp-comm-list
+----------------+--------------------+--------+
| SNMP community | View               | Access |
+----------------+--------------------+--------+
| commstr1       | .1                 | ro     |
+----------------+--------------------+--------+
#. Query details of a specific community string.
.. code-block:: none
~(keystone_admin)$ system snmp-comm-show commstr1
+------------+--------------------------------------+
| Property   | Value                                |
+------------+--------------------------------------+
| access     | ro                                   |
| created_at | 2014-08-14T21:12:10.037637+00:00     |
| uuid       | eccf5729-e400-4305-82e2-bdf344eb868d |
| community  | commstr1                             |
| view       | .1                                   |
+------------+--------------------------------------+
#. Delete a community string.
.. code-block:: none
~(keystone_admin)$ system snmp-comm-delete commstr1
Deleted community commstr1
.. rubric:: |result|
Community strings in |prod| provide query access to any SNMP monitor
workstation that can reach the controller's OAM address on UDP port 161.
You can verify SNMP access using any monitor tool. For example, the freely
available command :command:`snmpwalk` can be issued from any host to list
the state of all SNMP Object Identifiers \(OIDs\):
.. code-block:: none
$ snmpwalk -v 2c -c commstr1 10.10.10.100 > oids.txt
In this example, 10.10.10.100 is the |prod| OAM floating IP address. The output,
which is a large file, is redirected to the file oids.txt.
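The redirected :command:`snmpwalk` output can also be post-processed by a
script. The following sketch is illustrative only; it assumes the typical
``OID = TYPE: value`` line format produced by the net-snmp tools and is not
part of |prod|:

.. code-block:: python

   # Illustrative sketch: parse net-snmp "snmpwalk" output lines of the
   # assumed form "OID = TYPE: value" into a dictionary.
   def parse_snmpwalk(text):
       oids = {}
       for line in text.splitlines():
           if " = " not in line:
               continue
           oid, _, rhs = line.partition(" = ")
           # The right-hand side usually looks like "STRING: controller-0".
           _, _, value = rhs.partition(": ")
           oids[oid.strip()] = value.strip()
       return oids

   # For example:
   print(parse_snmpwalk("SNMPv2-MIB::sysName.0 = STRING: controller-0"))

This makes it straightforward to count OIDs or extract a single value from
the redirected file.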
.. idb1552680603462
.. _cli-commands-and-paged-output:
=============================
CLI Commands and Paged Output
=============================
There are some CLI commands that perform paging, and you can use options to
limit the paging or to disable it, which is useful for scripts.
CLI fault management commands that perform paging include:
.. _cli-commands-and-paged-output-ul-wjz-y4q-bw:
- :command:`fm event-list`
- :command:`fm event-suppress`
- :command:`fm event-suppress-list`
- :command:`fm event-unsuppress`
- :command:`fm event-unsuppress-all`
To turn paging off, use the ``--nopaging`` option with any of the above
commands. The ``--nopaging`` option is especially useful for bash script
writers.
.. _cli-commands-and-paged-output-section-N10074-N1001C-N10001:
--------
Examples
--------
The following examples demonstrate the resulting behavior from the use and
non-use of the paging options.
This produces a paged list of events.
.. code-block:: none
~(keystone_admin)$ fm event-list
This produces a list of events without paging.
.. code-block:: none
~(keystone_admin)$ fm event-list --nopaging
This produces a paged list of 50 events.
.. code-block:: none
~(keystone_admin)$ fm event-list --limit 50
This produces a list of 50 events without paging.
.. code-block:: none
~(keystone_admin)$ fm event-list --limit 50 --nopaging
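The unpaged output is then easy to consume from a script. The following
sketch is illustrative only; it assumes the usual ASCII-table layout of
:command:`fm` command output and is not part of |prod|:

.. code-block:: python

   # Illustrative sketch: split the assumed ASCII-table layout of
   # "fm event-list --nopaging" output into rows of cell values.
   def table_rows(output):
       rows = []
       for raw in output.splitlines():
           line = raw.strip()
           if line.startswith("|"):
               cells = [c.strip() for c in line.strip("|").split("|")]
               rows.append(cells)
       return rows  # the first row holds the column headers

   sample = """
   +----------+-------+
   | Event ID | State |
   +----------+-------+
   | 100.104  | set   |
   +----------+-------+
   """
   print(table_rows(sample))

Border lines are skipped, so only the header row and the data rows remain.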
.. sjb1552680530874
.. _configuring-snmp-trap-destinations:
================================
Configure SNMP Trap Destinations
================================
SNMP trap destinations are hosts configured in |prod| to receive unsolicited
SNMP notifications.
.. rubric:: |context|
Destination hosts are specified by IP address, or by host name if it can be
properly resolved by |prod|. Notifications are sent to the hosts using a
designated community string so that they can be validated.
.. rubric:: |proc|
#. Configure IP address 10.10.10.1 to receive SNMP notifications using the
community string commstr1.
.. code-block:: none
~(keystone_admin)$ system snmp-trapdest-add -c commstr1 --ip_address 10.10.10.1
+------------+--------------------------------------+
| Property   | Value                                |
+------------+--------------------------------------+
| uuid       | c7b6774e-7f45-40f5-bcca-3668de2a186f |
| ip_address | 10.10.10.1                           |
| community  | commstr1                             |
| type       | snmpv2c_trap                         |
| port       | 162                                  |
| transport  | udp                                  |
+------------+--------------------------------------+
The following are attributes associated with the new trap destination:
**uuid**
The UUID associated with the trap destination object.
**ip\_address**
The trap destination IP address.
**community**
The community string value to be associated with the notifications.
**type**
snmpv2c\_trap, the only supported message type for SNMP traps.
**port**
The destination UDP port that SNMP notifications are sent to.
**transport**
The transport protocol used to send notifications.
#. List defined trap destinations.
.. code-block:: none
~(keystone_admin)$ system snmp-trapdest-list
+------------+----------------+------+--------------+-----------+
| IP Address | SNMP Community | Port | Type         | Transport |
+------------+----------------+------+--------------+-----------+
| 10.10.10.1 | commstr1       | 162  | snmpv2c_trap | udp       |
+------------+----------------+------+--------------+-----------+
#. Query access details of a specific trap destination.
.. code-block:: none
~(keystone_admin)$ system snmp-trapdest-show 10.10.10.1
+------------+--------------------------------------+
| Property   | Value                                |
+------------+--------------------------------------+
| uuid       | c7b6774e-7f45-40f5-bcca-3668de2a186f |
| ip_address | 10.10.10.1                           |
| community  | commstr1                             |
| type       | snmpv2c_trap                         |
| port       | 162                                  |
| transport  | udp                                  |
+------------+--------------------------------------+
#. Disable the sending of SNMP notifications to a specific IP address.
.. code-block:: none
~(keystone_admin)$ system snmp-trapdest-delete 10.10.10.1
Deleted ip 10.10.10.1
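On the receiving side, the trap destination host must be configured to
accept notifications sent with this community string. A minimal
receiver-side sketch, assuming the monitoring host runs the net-snmp
:command:`snmptrapd` daemon \(this configuration lives on the monitoring
host and is not part of |prod|\):

.. code-block:: none

   # /etc/snmp/snmptrapd.conf on the monitoring host (assumed example)
   # Log and process SNMPv2c notifications sent with community "commstr1".
   authCommunity log,execute,net commstr1

With this in place, notifications sent to UDP port 162 on the monitoring
host are accepted and logged.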
.. cpy1552680695138
.. _deleting-an-alarm-using-the-cli:
=============================
Delete an Alarm Using the CLI
=============================
You can manually delete an alarm that is not automatically cleared by the
system.
.. rubric:: |context|
Do not manually delete an alarm unless it is absolutely clear that there is
no reason for the alarm to be active.
You can use the command :command:`fm alarm-delete` to manually delete an alarm
that remains active/set for no apparent reason, which may happen in rare
conditions. Alarms usually clear automatically when the related trigger or
fault condition is corrected.
.. rubric:: |proc|
.. _deleting-an-alarm-using-the-cli-steps-clp-fzw-nkb:
- To delete an alarm, use the :command:`fm alarm-delete` command.
For example:
.. code-block:: none
~(keystone_admin)$ fm alarm-delete 4ab5698a-19cb-4c17-bd63-302173fef62c
Substitute the UUID of the alarm you wish to delete.
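When several stale alarms need to be removed, the deletion can be scripted.
The following sketch is illustrative only; it assumes the alarm UUIDs have
already been collected from the alarm list, and simply invokes the same
:command:`fm alarm-delete` command for each one:

.. code-block:: python

   # Illustrative sketch: delete a collected list of stale alarm UUIDs by
   # running "fm alarm-delete <uuid>" once per alarm.
   import subprocess

   def delete_alarms(uuids, runner=subprocess.run):
       for uuid in uuids:
           runner(["fm", "alarm-delete", uuid], check=True)

   # Example with a stand-in runner, so nothing is actually deleted here:
   calls = []
   delete_alarms(["4ab5698a-19cb-4c17-bd63-302173fef62c"],
                 runner=lambda cmd, check: calls.append(cmd))
   print(calls)

The ``runner`` parameter is only there so the sketch can be exercised
without touching a live system.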
.. nat1580220934509
.. _enabling-snmp-support:
===================
Enable SNMP Support
===================
SNMP support must be enabled before you can begin using it to monitor a system.
.. rubric:: |context|
To have a workable SNMP configuration, you must complete the following steps
using the command line interface on the active controller.
.. rubric:: |proc|
#. Define at least one SNMP community string.
See |fault-doc|: :ref:`Adding an SNMP Community String Using the CLI <adding-an-snmp-community-string-using-the-cli>` for details.
#. Configure at least one SNMP trap destination.
This will allow alarms and logs to be reported as they happen.
For more information, see :ref:`Configuring SNMP Trap Destinations <configuring-snmp-trap-destinations>`.
.. pmt1552680681730
.. _events-suppression-overview:
===========================
Events Suppression Overview
===========================
All alarms are unsuppressed by default. A suppressed alarm is excluded from
the Active Alarm and Events displays \(subject to the **Suppression Status**
filter setting\) in the Horizon Web interface, the CLI, and the REST APIs,
and is not included in the Active Alarm Counts.
.. warning::
Suppressing an alarm will result in the system NOT notifying the operator
of this particular fault.
The Events Suppression page, available from **Admin** \> **Fault Management**
\> **Events Suppression** in the left-hand pane, shows the suppression
status of each event type and provides controls for suppressing or
unsuppressing it.
As shown below, the Events Suppression page lists each event type by ID, and
provides a description of the event and a current status indicator. Each event
can be suppressed using the **Suppress Event** button.
You can sort events by clicking the **Event ID**, **Description**, and
**Status** column headers. You can also use these as filtering criteria
from the **Search** field.
.. figure:: figures/uty1463514747661.png
:scale: 70 %
:alt: Event Suppression
.. yrq1552337051689
.. _fault-management-overview:
=========================
Fault Management Overview
=========================
An admin user can view |prod-long| fault management alarms and logs in order
to monitor and respond to fault conditions.
See :ref:`Alarm Messages <100-series-alarm-messages>` for the list of
alarms and :ref:`Customer Log Messages
<200-series-maintenance-customer-log-messages>`
for the list of customer logs reported by |prod|.
You can access active and historical alarms, and customer logs using the CLI,
GUI, REST APIs and SNMP.
To use the CLI, see
:ref:`Viewing Active Alarms Using the CLI
<viewing-active-alarms-using-the-cli>`
and :ref:`Viewing the Event Log Using the CLI
<viewing-the-event-log-using-the-cli>`.
Using the GUI, you can obtain fault management information in a number of
places.
.. _fault-management-overview-ul-nqw-hbp-mx:
- The Fault Management pages, available from
**Admin** \> **Fault Management** in the left-hand pane, provide access to
the following:
- The Global Alarm Banner in the page header of all screens provides the
active alarm counts for all alarm severities. For more information, see
:ref:`The Global Alarm Banner <the-global-alarm-banner>`.
- **Admin** \> **Fault Management** \> **Active Alarms**—Alarms that are
currently set, and require user action to clear them. For more
information about active alarms, see
:ref:`Viewing Active Alarms Using the CLI
<viewing-active-alarms-using-the-cli>`
and :ref:`Deleting an Alarm Using the CLI
<deleting-an-alarm-using-the-cli>`.
- **Admin** \> **Fault Management** \> **Events**—The event log
consolidates historical alarms, that is, both the set and clear events of
alarms, as well as customer logs.
For more about the event log, which includes historical alarms and
customer logs, see
:ref:`Viewing the Event Log Using Horizon
<viewing-the-event-log-using-horizon>`.
- **Admin** \> **Fault Management** \> **Events Suppression**—Individual
events can be put into a suppressed state or an unsuppressed state. A
suppressed alarm is excluded from the Active Alarm and Events displays.
All alarms are unsuppressed by default. An event can be suppressed or
unsuppressed using the Horizon Web interface, the CLI, or REST APIs.
- The Data Network Topology view provides real-time alarm information for
data networks and associated worker hosts and data/pci-passthru/pci-sriov
interfaces.
.. xreflink For more information, see |datanet-doc|: :ref:`The Data Network Topology View <the-data-network-topology-view>`.
To use SNMP, see :ref:`SNMP Overview <snmp-overview>`.
============================
|prod-long| Fault Management
============================
- Fault Management Overview
- :ref:`Fault Management Overview <fault-management-overview>`
- The Global Alarm Banner
- :ref:`The Global Alarm Banner <the-global-alarm-banner>`
- Viewing Active Alarms
- :ref:`Viewing Active Alarms Using Horizon <viewing-active-alarms-using-horizon>`
- :ref:`Viewing Active Alarms Using the CLI <viewing-active-alarms-using-the-cli>`
- :ref:`Viewing Alarm Details Using the CLI <viewing-alarm-details-using-the-cli>`
- Viewing the Event Log
- :ref:`Viewing the Event Log Using Horizon <viewing-the-event-log-using-horizon>`
- :ref:`Viewing the Event Log Using the CLI <viewing-the-event-log-using-the-cli>`
- Deleting an Alarm
- :ref:`Deleting an Alarm Using the CLI <deleting-an-alarm-using-the-cli>`
- Events Suppression
- :ref:`Events Suppression Overview <events-suppression-overview>`
- :ref:`Suppressing and Unsuppressing Events <suppressing-and-unsuppressing-events>`
- :ref:`Viewing Suppressed Alarms Using the CLI <viewing-suppressed-alarms-using-the-cli>`
- :ref:`Suppressing an Alarm Using the CLI <suppressing-an-alarm-using-the-cli>`
- :ref:`Unsuppressing an Alarm Using the CLI <unsuppressing-an-alarm-using-the-cli>`
- CLI Commands and Paged Output
- :ref:`CLI Commands and Paged Output <cli-commands-and-paged-output>`
- SNMP
- :ref:`SNMP Overview <snmp-overview>`
- :ref:`Enabling SNMP Support <enabling-snmp-support>`
- :ref:`Traps <traps>`
- :ref:`Configuring SNMP Trap Destinations <configuring-snmp-trap-destinations>`
- :ref:`SNMP Event Table <snmp-event-table>`
- :ref:`Adding an SNMP Community String Using the CLI <adding-an-snmp-community-string-using-the-cli>`
- :ref:`Setting SNMP Identifying Information <setting-snmp-identifying-information>`
- :ref:`Troubleshooting Log Collection <troubleshooting-log-collection>`
- Cloud Platform Alarm Messages
- :ref:`Alarm Messages Overview <alarm-messages-overview>`
- :ref:`100 Series Alarm Messages <100-series-alarm-messages>`
- :ref:`200 Series Alarm Messages <200-series-alarm-messages>`
- :ref:`300 Series Alarm Messages <300-series-alarm-messages>`
- :ref:`400 Series Alarm Messages <400-series-alarm-messages>`
- :ref:`500 Series Alarm Messages <500-series-alarm-messages>`
- :ref:`750 Series Alarm Messages <750-series-alarm-messages>`
- :ref:`800 Series Alarm Messages <800-series-alarm-messages>`
- :ref:`900 Series Alarm Messages <900-series-alarm-messages>`
- Cloud Platform Customer Log Messages
- :ref:`200 Series Maintenance Customer Log Messages <200-series-maintenance-customer-log-messages>`
- :ref:`400 Series Customer Log Messages <400-series-customer-log-messages>`
- :ref:`900 Series Orchestration Customer Log Messages <900-series-orchestration-customer-log-messages>`
.. Fault Management file, created by
sphinx-quickstart on Thu Sep 3 15:14:59 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
================
Fault Management
================
--------------------
StarlingX Kubernetes
--------------------
.. toctree::
:maxdepth: 1
fault-management-overview
*****************
The global banner
*****************
.. toctree::
:maxdepth: 1
the-global-alarm-banner
*********************
Viewing active alarms
*********************
.. toctree::
:maxdepth: 1
viewing-active-alarms-using-horizon
viewing-active-alarms-using-the-cli
viewing-alarm-details-using-the-cli
*********************
Viewing the event log
*********************
.. toctree::
:maxdepth: 1
viewing-the-event-log-using-horizon
viewing-the-event-log-using-the-cli
*****************
Deleting an alarm
*****************
.. toctree::
:maxdepth: 1
deleting-an-alarm-using-the-cli
*****************
Event suppression
*****************
.. toctree::
:maxdepth: 1
events-suppression-overview
suppressing-and-unsuppressing-events
viewing-suppressed-alarms-using-the-cli
suppressing-an-alarm-using-the-cli
unsuppressing-an-alarm-using-the-cli
*****************************
CLI commands and paged output
*****************************
.. toctree::
:maxdepth: 1
cli-commands-and-paged-output
****
SNMP
****
.. toctree::
:maxdepth: 1
snmp-overview
enabling-snmp-support
traps
configuring-snmp-trap-destinations
snmp-event-table
adding-an-snmp-community-string-using-the-cli
setting-snmp-identifying-information
******************************
Troubleshooting log collection
******************************
.. toctree::
:maxdepth: 1
troubleshooting-log-collection
**************
Alarm messages
**************
.. toctree::
:maxdepth: 1
100-series-alarm-messages
200-series-alarm-messages
300-series-alarm-messages
400-series-alarm-messages
500-series-alarm-messages
750-series-alarm-messages
800-series-alarm-messages
900-series-alarm-messages
************
Log messages
************
.. toctree::
:maxdepth: 1
200-series-maintenance-customer-log-messages
400-series-customer-log-messages
900-series-orchestration-customer-log-messages
-------------------
StarlingX OpenStack
-------------------
.. toctree::
:maxdepth: 1
openstack-fault-management-overview
************************
OpenStack alarm messages
************************
.. toctree::
:maxdepth: 1
openstack-alarm-messages-300s
openstack-alarm-messages-400s
openstack-alarm-messages-700s
openstack-alarm-messages-800s
*******************************
OpenStack customer log messages
*******************************
.. toctree::
:maxdepth: 1
openstack-customer-log-messages-270s-virtual-machines
openstack-customer-log-messages-401s-services
openstack-customer-log-messages-700s-virtual-machines

.. slf1579788051430
.. _alarm-messages-300s:
=====================
Alarm Messages - 300s
=====================
.. include:: ../_includes/openstack-alarm-messages-xxxs.rest
.. _alarm-messages-300s-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.003**
- Networking Agent not responding.
* - Entity Instance
- host=<hostname>.agent=<agent-uuid>
* - Severity:
- M\*
* - Proposed Repair Action
- If condition persists, attempt to clear issue by administratively locking and unlocking the Host.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.004**
- No enabled compute host with connectivity to provider network.
* - Entity Instance
- host=<hostname>.providernet=<pnet-uuid>
* - Severity:
- M\*
* - Proposed Repair Action
- Enable compute hosts with required provider network connectivity.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.005**
- Communication failure detected over provider network x% for ranges y% on host z%.
or
Communication failure detected over provider network x% on host z%.
* - Entity Instance
- providernet=<pnet-uuid>.host=<hostname>
* - Severity:
- M\*
* - Proposed Repair Action
- Check neighbor switch port VLAN assignments.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.010**
- ML2 Driver Agent non-reachable
or
ML2 Driver Agent reachable but non-responsive
or
ML2 Driver Agent authentication failure
or
ML2 Driver Agent is unable to sync Neutron database
* - Entity Instance
- host=<hostname>.ml2driver=<driver>
* - Severity:
- M\*
* - Proposed Repair Action
- Monitor and if condition persists, contact next level of support.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.012**
- Openflow Controller connection failed.
* - Entity Instance
- host=<hostname>.openflow-controller=<uri>
* - Severity:
- M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent equipment.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.013**
- No active Openflow controller connections found for this network.
or
One or more Openflow controller connections in disconnected state for this network.
* - Entity Instance
- host=<hostname>.openflow-network=<name>
* - Severity:
- C/M\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent equipment.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.015**
- No active OVSDB connections found.
* - Entity Instance
- host=<hostname>
* - Severity:
- C\*
* - Proposed Repair Action
- Check cabling and far-end port configuration and status on adjacent equipment.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 300.016**
- Dynamic routing agent x% lost connectivity to peer y%
* - Entity Instance
- host=<hostname>,agent=<agent-uuid>,bgp-peer=<bgp-peer>
* - Severity:
- M\*
* - Proposed Repair Action
- If condition persists, fix connectivity to peer.
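The Entity Instance IDs in the tables above follow a dot-separated ``key=value`` convention (for example, ``host=<hostname>.agent=<agent-uuid>``). As a rough illustration only (this helper is not part of StarlingX, and it assumes values never contain a ``.`` immediately followed by another ``key=`` token), such an ID can be split into its components like this:

```python
import re

# Hypothetical helper: split an Entity Instance ID such as
# "host=compute-0.agent=<agent-uuid>" into its key=value parts.
# Assumption: a value never contains a '.' that is immediately
# followed by another "key=" token.
def parse_entity_instance(entity_id):
    # Split on '.' only when it is followed by the next key name.
    parts = re.split(r'\.(?=[a-z_-]+=)', entity_id)
    return dict(part.split('=', 1) for part in parts)

print(parse_entity_instance('service_domain=d1.service_group=g1.host=controller-0'))
```

This keeps hostnames containing hyphens intact while still separating the individual ``key=value`` pairs.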

.. msm1579788069384
.. _alarm-messages-400s:
=====================
Alarm Messages - 400s
=====================
.. include:: ../_includes/openstack-alarm-messages-xxxs.rest
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.001**
- Service group failure; <list\_of\_affected\_services>.
or
Service group degraded; <list\_of\_affected\_services>
or
Service group Warning; <list\_of\_affected\_services>.
* - Entity Instance
- service\_domain=<domain\_name>.service\_group=<group\_name>.host=<hostname>
* - Severity:
- C/M/m\*
* - Proposed Repair Action
- Contact next level of support.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 400.002**
- Service group loss of redundancy; expected <num> standby member<s> but no standby members available.
or
Service group loss of redundancy; expected <num> standby member<s> but only <num> standby member<s> available.
or
Service group loss of redundancy; expected <num> active member<s> but no active members available.
or
Service group loss of redundancy; expected <num> active member<s> but only <num> active member<s> available.
* - Entity Instance
- service\_domain=<domain\_name>.service\_group=<group\_name>
* - Severity:
- M\*
* - Proposed Repair Action
- Bring a controller node back in to service, otherwise contact next level of support.

.. uxo1579788086872
.. _alarm-messages-700s:
=====================
Alarm Messages - 700s
=====================
.. include:: ../_includes/openstack-alarm-messages-xxxs.rest
.. _alarm-messages-700s-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.001**
- Instance <instance\_name> owned by <tenant\_name> has failed on host <host\_name>.
or
Instance <instance\_name> owned by <tenant\_name> has failed to schedule.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- The system will attempt recovery; no repair action required.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.002**
- Instance <instance\_name> owned by <tenant\_name> is paused on host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Unpause the instance.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.003**
- Instance <instance\_name> owned by <tenant\_name> is suspended on host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Resume the instance.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.004**
- Instance <instance\_name> owned by <tenant\_name> is stopped on host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Start the instance.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.005**
- Instance <instance\_name> owned by <tenant\_name> is rebooting on host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for reboot to complete; if problem persists contact next level of
support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.006**
- Instance <instance\_name> owned by <tenant\_name> is rebuilding on host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for rebuild to complete; if problem persists contact next level of
support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.007**
- Instance <instance\_name> owned by <tenant\_name> is evacuating from host
<host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for evacuate to complete; if problem persists contact next level of
support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.008**
- Instance <instance\_name> owned by <tenant\_name> is live migrating from
host <host\_name>
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- W\*
* - Proposed Repair Action
- Wait for live migration to complete; if problem persists contact next
level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.009**
- Instance <instance\_name> owned by <tenant\_name> is cold migrating from
host <host\_name>
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for cold migration to complete; if problem persists contact next
level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.010**
- Instance <instance\_name> owned by <tenant\_name> has been cold-migrated
to host <host\_name> waiting for confirmation.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Confirm or revert cold-migrate of instance.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.011**
- Instance <instance\_name> owned by <tenant\_name> is reverting cold
migrate to host <host\_name>
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for cold migration revert to complete; if problem persists contact
next level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.012**
- Instance <instance\_name> owned by <tenant\_name> is resizing on host
<host\_name>
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for resize to complete; if problem persists contact next level of
support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.013**
- Instance <instance\_name> owned by <tenant\_name> has been resized on
host <host\_name> waiting for confirmation.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Confirm or revert resize of instance.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.014**
- Instance <instance\_name> owned by <tenant\_name> is reverting resize
on host <host\_name>.
* - Entity Instance
- tenant=<tenant-uuid>.instance=<instance-uuid>
* - Severity:
- C\*
* - Proposed Repair Action
- Wait for resize revert to complete; if problem persists contact next
level of support.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.016**
- Multi-Node Recovery Mode
* - Entity Instance
- subsystem=vim
* - Severity:
- m\*
* - Proposed Repair Action
- Wait for the system to exit out of this mode.
-----
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 700.017**
- Server group <server\_group\_name> <policy> policy was not satisfied.
* - Entity Instance
- server-group<server-group-uuid>
* - Severity:
- M
* - Proposed Repair Action
- Migrate instances in an attempt to satisfy the policy; if problem
persists contact next level of support.
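The severity codes used throughout these tables follow the convention described in the overview: the letters C/M/m/W list the possible severities, a slash separates alternatives, and a trailing asterisk marks the management-affecting severity. A minimal sketch of decoding that shorthand (an illustration under those assumptions, not part of StarlingX):

```python
# Map the single-letter severity codes used in these tables to names.
SEVERITY_NAMES = {'C': 'critical', 'M': 'major', 'm': 'minor', 'W': 'warning'}

def parse_severity(code):
    # Drop RST escapes such as the backslash in "M\*".
    code = code.replace('\\', '')
    # A trailing '*' flags the alarm as management-affecting.
    mgmt_affecting = code.endswith('*')
    letters = code.rstrip('*').split('/')
    return [SEVERITY_NAMES[letter] for letter in letters], mgmt_affecting

print(parse_severity('C/M/m*'))  # → (['critical', 'major', 'minor'], True)
```

Entries such as a bare ``M`` (no asterisk) decode to a single severity with no management-affecting level.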

.. tsh1579788106505
.. _alarm-messages-800s:
=====================
Alarm Messages - 800s
=====================
.. include:: ../_includes/openstack-alarm-messages-xxxs.rest
.. _alarm-messages-800s-table-zrd-tg5-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.002**
- Image storage media is full: There is not enough disk space on the image storage media.
or
Instance <instance name\> snapshot failed: There is not enough disk space on the image storage media.
or
Supplied <attrs\> \(<supplied\>\) and <attrs\> generated from uploaded image \(<actual\>\) did not match. Setting image status to 'killed'.
or
Error in store configuration. Adding images to store is disabled.
or
Forbidden upload attempt: <exception\>
or
Insufficient permissions on image storage media: <exception\>
or
Denying attempt to upload image larger than <size\> bytes.
or
Denying attempt to upload image because it exceeds the quota: <exception\>
or
Received HTTP error while uploading image <image\_id\>
or
Client disconnected before sending all data to backend
or
Failed to upload image <image\_id\>
* - Entity Instance
- image=<image-uuid>, instance=<instance-uuid>
or
- tenant=<tenant-uuid>, instance=<instance-uuid>
* - Severity:
- W\*
* - Proposed Repair Action
- If problem persists, contact next level of support.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.100**
- Storage Alarm Condition:
Cinder I/O Congestion is above normal range and is building
* - Entity Instance
- cinder\_io\_monitor
* - Severity:
- M
* - Proposed Repair Action
- Reduce the I/O load on the Cinder LVM backend. Use Cinder QoS mechanisms on high usage volumes.
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Alarm ID: 800.101**
- Storage Alarm Condition:
Cinder I/O Congestion is high and impacting guest performance
* - Entity Instance
- cinder\_io\_monitor
* - Severity:
- C\*
* - Proposed Repair Action
- Reduce the I/O load on the Cinder LVM backend. Cinder actions may fail until congestion is reduced. Use Cinder QoS mechanisms on high usage volumes.

.. ftb1579789103703
.. _customer-log-messages-270s-virtual-machines:
=============================================
Customer Log Messages 270s - Virtual Machines
=============================================
.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest
.. _customer-log-messages-270s-virtual-machines-table-zgf-jvw-v5:
.. table:: Table 1. Customer Log Messages - Virtual Machines
:widths: auto
+-----------+----------------------------------------------------------------------------------+----------+
| Log ID | Description | Severity |
+ +----------------------------------------------------------------------------------+----------+
| | Entity Instance ID | |
+===========+==================================================================================+==========+
| 270.101 | Host <host\_name> compute services failure\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+-----------+----------------------------------------------------------------------------------+----------+
| 270.102 | Host <host\_name> compute services enabled | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+-----------+----------------------------------------------------------------------------------+----------+
| 270.103 | Host <host\_name> compute services disabled | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+-----------+----------------------------------------------------------------------------------+----------+
| 275.001 | Host <host\_name> hypervisor is now <administrative\_state>-<operational\_state> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+-----------+----------------------------------------------------------------------------------+----------+
See also :ref:`Customer Log Messages 700s - Virtual Machines <customer-log-messages-700s-virtual-machines>`

.. hwr1579789203684
.. _customer-log-messages-401s-services:
=====================================
Customer Log Messages 401s - Services
=====================================
.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest
.. _customer-log-messages-401s-services-table-zgf-jvw-v5:
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Log Message: 401.001**
- Service group <group> state change from <state> to <state> on host <host\_name>
* - Entity Instance
- service\_domain=<domain>.service\_group=<group>.host=<host\_name>
* - Severity:
- C
.. list-table::
:widths: 6 15
:header-rows: 0
* - **Log Message: 401.002**
- Service group <group> loss of redundancy; expected <X> standby member but no standby members available.
or
Service group <group> loss of redundancy; expected <X> standby member but only <Y> standby member\(s\) available.
or
Service group <group> has no active members available; expected <X> active member\(s\)
or
Service group <group> loss of redundancy; expected <X> active member\(s\) but only <Y> active member\(s\) available.
* - Entity Instance
- service\_domain=<domain>.service\_group=<group>
* - Severity:
- C

.. qfy1579789227230
.. _customer-log-messages-700s-virtual-machines:
=============================================
Customer Log Messages 700s - Virtual Machines
=============================================
.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest
.. _customer-log-messages-700s-virtual-machines-table-zgf-jvw-v5:
.. table:: Table 1. Customer Log Messages
:widths: auto
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Log ID | Description | Severity |
+ +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| | Entity Instance ID | |
+==========+====================================================================================================================================================================================+==========+
| 700.101 | Instance <instance\_name> is enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.102 | Instance <instance\_name> owned by <tenant\_name> has failed\[, reason = <reason\_text>\]. | C |
| | Instance <instance\_name> owned by <tenant\_name> has failed to schedule\[, reason = <reason\_text>\] | |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.103 | Create issued by <tenant\_name\> or by the system against <instance\_name> owned by <tenant\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.104 | Creating instance <instance\_name> owned by <tenant\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.105 | Create rejected for instance <instance\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.106 | Create canceled for instance <instance\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.107 | Create failed for instance <instance\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.108 | Instance <instance\_name> owned by <tenant\_name> has been created | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.109 | Delete issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.110 | Deleting instance <instance\_name> owned by <tenant\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.111 | Delete rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.112 | Delete canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.113 | Delete failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.114 | Deleted instance <instance\_name> owned by <tenant\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.115 | Pause issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.116 | Pause inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.117 | Pause rejected for instance <instance\_name> enabled on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.118 | Pause canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.119 | Pause failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.120 | Pause complete for instance <instance\_name> now paused on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.121 | Unpause issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.122 | Unpause inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.123 | Unpause rejected for instance <instance\_name> paused on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.124 | Unpause canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.125 | Unpause failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.126 | Unpause complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.127 | Suspend issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.128 | Suspend inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.129 | Suspend rejected for instance <instance\_name> enabled on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.130 | Suspend canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.131 | Suspend failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.132 | Suspend complete for instance <instance\_name> now suspended on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.133 | Resume issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.134 | Resume inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.135 | Resume rejected for instance <instance\_name> suspended on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.136 | Resume canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.137 | Resume failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.138 | Resume complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.139 | Start issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.140 | Start inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.141 | Start rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.142 | Start canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.143 | Start failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.144 | Start complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.145 | Stop issued by <tenant\_name>\ or by the system or by the instance against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.146 | Stop inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.147 | Stop rejected for instance <instance\_name> enabled on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.148 | Stop canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.149 | Stop failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.150 | Stop complete for instance <instance\_name> now disabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.151 | Live-Migrate issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> from host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.152 | Live-Migrate inprogress for instance <instance\_name> from host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.153 | Live-Migrate rejected for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.154 | Live-Migrate canceled for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.155 | Live-Migrate failed for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.156 | Live-Migrate complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.157 | Cold-Migrate issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> from host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.158 | Cold-Migrate inprogress for instance <instance\_name> from host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.159 | Cold-Migrate rejected for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.160 | Cold-Migrate canceled for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.161 | Cold-Migrate failed for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.162 | Cold-Migrate complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.163 | Cold-Migrate-Confirm issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.164 | Cold-Migrate-Confirm inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.165 | Cold-Migrate-Confirm rejected for instance <instance\_name> now enabled on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.166 | Cold-Migrate-Confirm canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.167 | Cold-Migrate-Confirm failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.168 | Cold-Migrate-Confirm complete for instance <instance\_name> enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.169 | Cold-Migrate-Revert issued by <tenant\_name> or by the system\> against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.170 | Cold-Migrate-Revert inprogress for instance <instance\_name> from host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.171 | Cold-Migrate-Revert rejected for instance <instance\_name> now on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.172 | Cold-Migrate-Revert canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.173 | Cold-Migrate-Revert failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.174 | Cold-Migrate-Revert complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.175 | Evacuate issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.176 | Evacuating instance <instance\_name> owned by <tenant\_name> from host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.177 | Evacuate rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.178 | Evacuate canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.179 | Evacuate failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.180 | Evacuate complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.181 | Reboot <\(soft-reboot\) or \(hard-reboot\)> issued by <tenant\_name> or by the system or by the instance against instance <instance\_name> owned by | C |
| | <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.182 | Reboot inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.183 | Reboot rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.184 | Reboot canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.185 | Reboot failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.186 | Reboot complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.187 | Rebuild issued by <tenant\_name> or by the system against instance <instance\_name> using image <image\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.188 | Rebuild inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.189 | Rebuild rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.190 | Rebuild canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.191 | Rebuild failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.192 | Rebuild complete for instance <instance\_name> now enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.193 | Resize issued by <tenant\_name\> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.194 | Resize inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.195 | Resize rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.196 | Resize canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.197 | Resize failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.198 | Resize complete for instance <instance\_name> enabled on host <host\_name> waiting for confirmation | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.199 | Resize-Confirm issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.200 | Resize-Confirm inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.201 | Resize-Confirm rejected for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.202 | Resize-Confirm canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.203 | Resize-Confirm failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.204 | Resize-Confirm complete for instance <instance\_name> enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.205 | Resize-Revert issued by <tenant\_name> or by the system against instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.206 | Resize-Revert inprogress for instance <instance\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.207 | Resize-Revert rejected for instance <instance\_name> owned by <tenant\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.208 | Resize-Revert canceled for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.209 | Resize-Revert failed for instance <instance\_name> on host <host\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.210 | Resize-Revert complete for instance <instance\_name> enabled on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.214 | Instance <instance\_name> has been renamed to <new\_instance\_name> owned by <tenant\_name> on host <host\_name> | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.215 | Guest Health Check failed for instance <instance\_name>\[, reason = <reason\_text>\] | C |
| | | |
| | tenant=<tenant-uuid>.instance=<instance-uuid> | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.216 | Entered Multi-Node Recovery Mode | C |
| | | |
| | subsystem-vim | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 700.217 | Exited Multi-Node Recovery Mode | C |
| | | |
| | subsystem-vim | |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
See also :ref:`Customer Log Messages 270s - Virtual Machines <customer-log-messages-270s-virtual-machines>`
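
Customer logs such as those listed above can be reviewed from the CLI using
the fault management client. The following is a sketch only; the exact query
syntax of the fm client may vary by release, and the event ID shown is just
an example taken from the table above:

.. code-block:: none

    ~(keystone_admin)$ fm event-list --query event_log_id=700.152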
@ -0,0 +1,19 @@

.. ekn1458933172232

.. _openstack-fault-management-overview:

========
Overview
========

|prod-os| is a containerized application running on top of |prod|.

All fault management interfaces for displaying alarms and logs, suppressing
and unsuppressing events, enabling SNMP, and enabling remote log collection
are available through the |prod| REST APIs, CLIs, and GUIs.

.. xreflink See :ref:`Fault Management Overview <platform-fault-management-overview>` for details on these interfaces.

This section lists the OpenStack-related alarms and customer logs that are
monitored and reported for the |prod-os| application through the |prod|
fault management interfaces.
@ -0,0 +1,30 @@

.. tie1580219717420

.. _setting-snmp-identifying-information:

================================
Set SNMP Identifying Information
================================

You can set SNMP system information, including the system name, location,
and contact details.

.. rubric:: |proc|

-   Use the following command syntax to set the **sysContact** attribute.

    .. code-block:: none

        ~(keystone_admin)$ system modify --contact <site-contact>

-   Use the following command syntax to set the **sysLocation** attribute.

    .. code-block:: none

        ~(keystone_admin)$ system modify --location <location>

-   Use the following command syntax to set the **sysName** attribute.

    .. code-block:: none

        ~(keystone_admin)$ system modify --name <system-name>
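
You can confirm the values afterward by inspecting the system attributes.
This is a sketch assuming the standard system CLI; the exact field names
shown in the output may vary by release:

.. code-block:: none

    ~(keystone_admin)$ system show

The contact, location, and name fields in the output should reflect the
values set above.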
@ -0,0 +1,44 @@

.. rdr1552680506097

.. _snmp-event-table:

================
SNMP Event Table
================

|prod| provides SNMP access to active and historical alarms, and to customer
logs, through an event table.

The event table contains historical alarms \(sets and clears\) and customer
logs. It does not contain active alarms. Each entry in the table includes
the following variables:

.. _snmp-event-table-ul-y1w-4lk-qq:

- <UUID>

- <EventID>

- <State>

- <EntityInstanceID>

- <DateAndTime>

- <EventSeverity>

- <ReasonText>

- <EventType>

- <ProbableCause>

- <ProposedRepairAction>

- <ServiceAffecting>

- <SuppressionAllowed>

.. note::
    The previous SNMP Historical Alarm Table and the SNMP Customer Log Table
    are still supported, but are marked as deprecated in the MIB.
.. gzl1552680561274
.. _snmp-overview:
=============
SNMP Overview
=============
|prod| can generate SNMP traps for |prod| Alarm Events and Customer Log Events.
This includes alarms based on hardware sensors monitored by board management
controllers.
.. xreflink For more information, see |node-doc|: :ref:`Sensors Tab <sensors-tab>`.
.. contents::
:local:
:depth: 1
.. _snmp-overview-section-N10027-N1001F-N10001:
------------------
About SNMP Support
------------------
Support for Simple Network Management Protocol \(SNMP\) is implemented as follows:
.. _snmp-overview-ul-bjv-cjd-cp:
- access is disabled by default and must be enabled manually from the
command-line interface
- available using the controller's node floating OAM IP address, over the
standard SNMP UDP port 161
- supported version is SNMPv2c
- access is read-only for all SNMP communities
- all SNMP communities have access to the entire OID tree; there is no
support for VIEWS
- supported SNMP operations are GET, GETNEXT, GETBULK, and SNMPv2C-TRAP2
- the SNMP SET operation is not supported
For information on enabling SNMP support, see
:ref:`Enabling SNMP Support <enabling-snmp-support>`.
.. _snmp-overview-section-N10099-N1001F-N10001:
-----------------------
SNMPv2-MIB \(RFC 3418\)
-----------------------
Support for the basic standard MIB for SNMP entities is limited to the System
and SNMP groups, as follows:
.. _snmp-overview-ul-ulb-ypl-hp:
- System Group, **.iso.org.dod.internet.mgmt.mib-2.system**
- SNMP Group, **.iso.org.dod.internet.mgmt.mib-2.snmp**
- coldStart and warmStart Traps
The following system attributes are used in support of the SNMP implementation.
They can be displayed using the :command:`system show` command.
**contact**
A read-write system attribute used to populate the **sysContact** attribute
of the SNMP System group.
**location**
A read-write system attribute used to populate the **sysLocation** attribute
of the SNMP System group.
**name**
A read-write system attribute used to populate the **sysName** attribute of
the SNMP System group.
**software\_version**
A read-only system attribute set automatically by the system. Its value is
used to populate the **sysDescr** attribute of the SNMP System group.
For information on setting the **sysContact**, **sysLocation**, and **sysName**
attributes, see
:ref:`Setting SNMP Identifying Information <setting-snmp-identifying-information>`.
The following SNMP attributes are used as follows:
**sysObjectId**
Set to **iso.org.dod.internet.private.enterprise.wrs.titanium** \(1.3.6.1.4.1.1.2\).
**sysUpTime**
Set to the up time of the active controller.
**sysServices**
Set to the nominal value of 72 to indicate that the host provides services at layers 1 to 7.
.. _snmp-overview-section-N100C9-N1001F-N10001:
--------------------------
Wind River Enterprise MIBs
--------------------------
|prod| supports the Wind River Enterprise Registration and Alarm MIBs.
**Enterprise Registration MIB, wrsEnterpriseReg.mib**
Defines the Wind River Systems \(WRS\) hierarchy underneath the
**iso\(1\).org\(3\).dod\(6\).internet\(1\).private\(4\).enterprise\(1\)**.
This hierarchy is administered as follows:
- **.wrs\(731\)**, the IANA-registered enterprise code for Wind River
Systems
- **.wrs\(731\).wrsCommon\(1\).wrs<Module\>\(1-...\)**,
defined in wrsCommon<Module\>.mib.
- **.wrs\(731\).wrsProduct\(2-...\)**, defined in wrs<Product\>.mib.
**Alarm MIB, wrsAlarmMib.mib**
Defines the common TRAP and ALARM MIBs for |org| products.
The definition includes textual conventions, an active alarm table, a
historical alarm table, a customer log table, and traps.
**Textual Conventions**
Semantic statements used to simplify definitions in the active alarm
table and traps components of the MIB.
**Tables**
See :ref:`SNMP Event Table <snmp-event-table>` for detailed
descriptions.
**Traps**
See :ref:`Traps <traps>` for detailed descriptions.
.. ani1552680633324
.. _suppressing-an-alarm-using-the-cli:
===============================
Suppress an Alarm Using the CLI
===============================
You can use the CLI to prevent a monitored system parameter from generating
unnecessary alarms.
.. rubric:: |proc|
#. Use the :command:`fm event-suppress` to suppress a single alarm or
multiple alarms by ID.
.. code-block:: none
~(keystone_admin)$ fm event-suppress [--nowrap] --alarm_id <alarm-id>[,<alarm-id>] \
[--nopaging] [--uuid]
where
**<alarm-id>**
is a comma-separated list of the Alarm IDs of alarms to suppress.
**--nowrap**
disables output wrapping
**--nopaging**
disables paged output
**--uuid**
includes the alarm type UUIDs in the output
An error message is generated in the case of an invalid
<alarm-id>: **Alarm ID not found: <alarm-id\>**.
If more than one Alarm ID is specified and at least one is invalid, the
suppress command is not applied \(none of the specified Alarm IDs are
suppressed\).
.. note::
Suppressing an Alarm will result in the system NOT notifying the
operator of this particular fault.
.. sla1552680666298
.. _suppressing-and-unsuppressing-events:
==============================
Suppress and Unsuppress Events
==============================
You can set events to a suppressed state and toggle them back to unsuppressed.
.. rubric:: |proc|
#. Open the Events Suppression page, available from **Admin** \>
**Fault Management** \> **Events Suppression** in the left-hand pane.
The Events Suppression page appears. It provides the suppression status of
each event type and functionality for suppressing or unsuppressing each
event, depending on the current status of the event.
#. Locate the event ID that you want to suppress.
#. Click the **Suppress Event** button for that event.
You are prompted to confirm that you want to suppress the event.
.. caution::
Suppressing an Alarm will result in the system *not* notifying the
operator of this particular fault.
#. Click **Suppress Event** in the Confirm Suppress Event dialog box.
The Events Suppression tab is refreshed to show the selected event ID with
a status of Suppressed, as shown below. The **Suppress Event** button is
replaced by **Unsuppress Event**, providing a way to toggle the event back
to unsuppressed.
.. image:: figures/nlc1463584178366.png
.. wtg1552680748451
.. _the-global-alarm-banner:
=======================
The Global Alarm Banner
=======================
The |prod| Horizon Web interface provides an active alarm counts banner in the
page header of all screens.
The global alarm banner provides a high-level indicator of faults on the system
that is always visible, regardless of which page you are on in the GUI. The
banner provides a color-coded snapshot of current active alarm counts for each
alarm severity.
.. image:: figures/xyj1558447807645.png
.. note::
Suppressed alarms are not shown. For more about suppressed alarms, see
:ref:`Events Suppression Overview <events-suppression-overview>`.
Clicking on the alarm banner opens the Fault Management page, where more
detailed information about the alarms is provided.
.. lmy1552680547012
.. _traps:
=====
Traps
=====
|prod| supports SNMP traps. Traps send unsolicited information to monitoring
software when significant events occur.
The following traps are defined.
.. _traps-ul-p1j-tvn-c5:
- **wrsAlarmCritical**
- **wrsAlarmMajor**
- **wrsAlarmMinor**
- **wrsAlarmWarning**
- **wrsAlarmMessage**
- **wrsAlarmClear**
- **wrsAlarmHierarchicalClear**
.. note::
Customer Logs always result in **wrsAlarmMessage** traps.
For Critical, Major, Minor, Warning, and Message traps, all variables in the
active alarm table are included as varbinds \(variable bindings\), where each
varbind is a pair of fields consisting of an object identifier and a value
for the object.
For the Clear trap, varbinds include only the following variables:
.. _traps-ul-uks-byn-nkb:
- <AlarmID>
- <EntityInstanceID>
- <DateAndTime>
- <ReasonText>
For the HierarchicalClear trap, varbinds include only the following variables:
.. _traps-ul-isn-fyn-nkb:
- <EntityInstanceID>
- <DateAndTime>
- <ReasonText>
For all alarms, the Notification Type is based on the severity of the trap or
alarm. This is done to facilitate the interaction with most SNMP trap viewers
which typically use the Notification Type to drive the coloring of traps, that
is, red for critical, yellow for minor, and so on.
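A trap viewer's use of the Notification Type can be sketched as a simple
lookup. The red \(critical\) and yellow \(minor\) colors below come from the
description above; the remaining color choices are illustrative assumptions:

.. code-block:: none

   # Illustrative trap-name-to-color lookup, as an SNMP trap viewer might
   # implement it. Only red/critical and yellow/minor are stated in this
   # guide; the other colors are assumptions for this sketch.
   color_for_trap() {
       case "$1" in
           wrsAlarmCritical) echo "red" ;;
           wrsAlarmMajor)    echo "orange" ;;
           wrsAlarmMinor)    echo "yellow" ;;
           wrsAlarmWarning)  echo "blue" ;;
           wrsAlarmMessage)  echo "white" ;;
           *)                echo "green" ;;  # clear traps and anything else
       esac
   }

   color_for_trap wrsAlarmCritical   # prints: red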
.. ley1552581824091
.. _troubleshooting-log-collection:
===========================
Troubleshoot Log Collection
===========================
The |prod| log collection tool gathers detailed information.
.. contents::
:local:
:depth: 1
.. _troubleshooting-log-collection-section-N10061-N1001C-N10001:
------------------------------
Collect Tool Caveats and Usage
------------------------------
.. _troubleshooting-log-collection-ul-dpj-bxp-jdb:
- Log in as **sysadmin**, NOT as root, on the active controller and use the
:command:`collect` command.
- All usage options can be found by using the following command:
.. code-block:: none
(keystone_admin)$ collect --help
- For |prod| Simplex or Duplex systems, use the following command:
.. code-block:: none
(keystone_admin)$ collect --all
- For |prod| Standard systems, use the following commands:
- For a small deployment \(less than two worker nodes\):
.. code-block:: none
(keystone_admin)$ collect --all
- For large deployments:
.. code-block:: none
(keystone_admin)$ collect --list host1 host2 host3
- For systems with an up-time of more than 2 months, use the date range options.
Use --start-date for the collection of logs on and after a given date:
.. code-block:: none
(keystone_admin)$ collect [--start-date | -s] <YYYYMMDD>
Use --end-date for the collection of logs on and before a given date:
.. code-block:: none
(keystone_admin)$ collect [--end-date | -e] <YYYYMMDD>
- To prefix the collect tar ball name and easily identify the
:command:`collect` when several are present, use the following command.
.. code-block:: none
(keystone_admin)$ collect [--name | -n] <prefix>
For example, the following prepends **TEST1** to the name of the tarball:
.. code-block:: none
(keystone_admin)$ collect --name TEST1
[sudo] password for sysadmin:
collecting data from 1 host(s): controller-0
collecting controller-0_20200316.155805 ... done (00:01:39 56M)
creating user-named tarball /scratch/TEST1_20200316.155805.tar ... done (00:01:39 56M)
- Prior to using the :command:`collect` command, nodes must be in the
unlocked-enabled or disabled-online state, and must have been unlocked
at least once.
- For a node that is rebooting indefinitely, lock the node and wait for it
to reach the disabled-online state before collecting logs.
- You may be required to run the local :command:`collect` command if the
collect tool running from the active controller node fails to collect
logs from one of the system nodes. Execute the :command:`collect` command
using the console or BMC connection on the node that displays the failure.
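The <YYYYMMDD> values for the date-range options above can be generated
rather than typed by hand. A small sketch, assuming GNU :command:`date` and
an example 60-day window; the command is echoed rather than executed:

.. code-block:: none

   # Compute <YYYYMMDD> values for --start-date/--end-date
   # (GNU date assumed; the 60-day window is an example).
   start=$(date --date="60 days ago" +%Y%m%d)
   end=$(date +%Y%m%d)
   echo "collect --start-date ${start} --end-date ${end}"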
.. only:: partner
.. include:: ../_includes/troubleshooting-log-collection.rest
.. maj1552680619436
.. _unsuppressing-an-alarm-using-the-cli:
=================================
Unsuppress an Alarm Using the CLI
=================================
If you need to reactivate a suppressed alarm, you can do so using the CLI.
.. rubric:: |proc|
- Use the :command:`fm event-unsuppress` CLI command to unsuppress a
currently suppressed alarm.
.. code-block:: none
~(keystone_admin)$ fm event-unsuppress [--nowrap] --alarm_id <alarm-id>[,<alarm-id>] \
[--nopaging] [--uuid]
where
**<alarm-id>**
is a comma-separated list of the Alarm IDs of alarms to unsuppress.
**--nowrap**
disables output wrapping.
**--nopaging**
disables paged output.
**--uuid**
includes the alarm type UUIDs in the output.
Alarm type\(s\) with the specified <alarm-id\(s\)> will be unsuppressed.
You can unsuppress all currently suppressed alarms using the following command:
.. code-block:: none
~(keystone_admin)$ fm event-unsuppress --all [--nopaging] [--uuid]
.. sqv1552680735693
.. _viewing-active-alarms-using-horizon:
================================
View Active Alarms Using Horizon
================================
The |prod| Horizon Web interface provides a page for viewing active alarms.
Alarms are fault conditions that have a state; they are set and cleared by the
system as a result of monitoring and detecting a change in a fault condition.
Active alarms are alarms that are in the set condition. Active alarms typically
require user action to be cleared, for example, replacing a faulty cable or
removing files from a nearly full filesystem.
.. note::
For data networks and worker host data interfaces, you can also use the
Data Network Topology view to monitor active alarms.
.. xreflink For more information, see |datanet-doc|: :ref:`The Data Network Topology View <the-data-network-topology-view>`.
.. rubric:: |proc|
.. _viewing-active-alarms-using-horizon-steps-n43-ssf-pkb:
#. Select **Admin** \> **Fault Management** \> **Active Alarms** in the left pane.
The currently Active Alarms are displayed in a table, by default sorted by
severity with the most critical alarms at the top. A color-coded summary
count of active alarms is shown at the top of the active alarm tab as well.
You can change the sorting of entries by clicking on the column titles.
For example, to sort the table by timestamp, click
**Timestamp**. The entries are re-sorted by timestamp.
Suppressed alarms are excluded by default from the table. Suppressed alarms
can be included or excluded in the table with the **Show Suppressed** and
**Hide Suppressed** filter buttons at the top right of the table. The
suppression filter buttons are only shown when one or more alarms are
suppressed.
The **Suppression Status** column is only shown in the table when the
**Show Suppressed** filter button is selected.
#. Click the Alarm ID of an alarm entry in the table to display the details
of the alarm.
.. pdd1551804388161
.. _viewing-active-alarms-using-the-cli:
================================
View Active Alarms Using the CLI
================================
You can use the CLI to find information about currently active system alarms.
.. rubric:: |context|
.. note::
You can also use the command :command:`fm alarm-summary` to view the count
of alarms and warnings for the system.
To review detailed information about a specific alarm instance, see
:ref:`Viewing Alarm Details Using the CLI <viewing-alarm-details-using-the-cli>`.
.. rubric:: |proc|
.. _viewing-active-alarms-using-the-cli-steps-gsj-prg-pkb:
#. Log in with administrative privileges.
.. code-block:: none
$ source /etc/platform/openrc
#. Run the :command:`fm alarm-list` command to view alarms.
The command syntax is:
.. code-block:: none
fm alarm-list [--nowrap] [-q <QUERY>] [--uuid] [--include_suppress] [--mgmt_affecting] [--degrade_affecting]
**--nowrap**
Prevent word-wrapping of output. This option is useful when output will
be piped to another process.
**-q**
<QUERY> is a query string used to filter the list output, written in the
typical OpenStack CLI query syntax: a combination of attribute, operator,
and value. For example, severity=warning filters for alarms with a
severity of warning. More complex queries can be built; see the upstream
OpenStack CLI documentation for more details on <QUERY> string syntax,
and the additional query examples below.
You can use one of the following --query command filters to view
specific subsets of alarms, or a particular alarm:
.. table::
:widths: auto
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| Query Filter | Comment |
+============================================================================+============================================================================+
| :command:`uuid=<uuid\>` | Query alarms by UUID, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query uuid=4ab5698a-19cb... |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| :command:`alarm\_id=<alarm id\>` | Query alarms by alarm ID, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query alarm_id=100.104 |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| :command:`alarm\_type=<type\>` | Query alarms by type, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query \ |
| | alarm_type=operational-violation |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| :command:`entity\_type\_id=<type id\>` | Query alarms by entity type ID, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query \ |
| | entity_type_id=system.host |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| :command:`entity\_instance\_id=<instance id\>` | Query alarms by entity instance id, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query \ |
| | entity_instance_id=host=worker-0 |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
| :command:`severity=<severity\>` | Query alarms by severity type, for example: |
| | |
| | .. code-block:: none |
| | |
| | ~(keystone_admin)$ fm alarm-list --query severity=warning |
| | |
| | The valid severity types are critical, major, minor, and warning. |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
Query command filters can be combined into a single expression
separated by semicolons, as illustrated in the following example:
.. code-block:: none
~(keystone_admin)$ fm alarm-list -q 'alarm_id=400.002;entity_instance_id=service_domain=controller.service_group=directory-services'
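The combined -q argument is simply the individual filters joined with
semicolons, so long queries can be assembled from variables before being
passed to the command. A sketch using the values from the example above;
the final command is echoed rather than executed:

.. code-block:: none

   # Compose a combined query string from individual filters.
   alarm_id="400.002"
   entity="service_domain=controller.service_group=directory-services"
   query="alarm_id=${alarm_id};entity_instance_id=${entity}"
   echo "fm alarm-list -q '${query}'"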
**--uuid**
Lists each active alarm with its unique UUID. The UUID can be used to
display alarm details with the :command:`fm alarm-show` <UUID> command.
**--include\_suppress**
Use this option to display all active alarms, including suppressed
alarms. Suppressed alarms are displayed with their Alarm ID set to
S<\(alarm-id\)>.
**--mgmt\_affecting**
Management affecting alarms prevent some critical administrative
actions from being performed. For example, software upgrades. Using the
--mgmt\_affecting option will list an additional column in the output,
'Management Affecting', which indicates whether the alarm is management
affecting or not.
**--degrade\_affecting**
Include degrade affecting status in output.
The following example shows alarm UUIDs.
.. code-block:: none
~(keystone_admin)$ fm alarm-list --uuid
+--------------+-------+------------------+---------------+----------+-----------+
| UUID | Alarm | Reason Text | Entity ID | Severity | Time |
| | ID | | | | Stamp |
+--------------+-------+------------------+---------------+----------+-----------+
| 6056e290- | 200. | compute-0 was | host= | warning | 2019 |
| 2e56- | 001 | administratively | compute-0 | | -08-29T |
| 4e22-b07a- | | locked to take | | | 17:00:16. |
| ff9cf4fbd81a | | it out-of | | | 363072 |
| | | -service. | | | |
| | | | | | |
| | | | | | |
| 0a8a4aec- | 100. | NTP address | host= | minor | 2019 |
| a2cb- | 114 | 2607:5300:201:3 | controller-1. | | -08-29T |
| 46aa-8498- | | is not a valid | ntp= | | 15:44:44. |
| 9ed9b6448e0c | | or a reachable | 2607:5300: | | 773704 |
| | | NTP server. | 201:3 | | |
| | | | | | |
| | | | | | |
+--------------+-------+------------------+---------------+----------+-----------+
This command shows a column to track the management affecting severity of each alarm type.
.. code-block:: none
~(keystone_admin)$ fm alarm-list --mgmt_affecting
+-------+-------------------+---------------+----------+------------+-------------+
| Alarm | Reason Text | Entity ID | Severity | Management | Time Stamp |
| ID | | | | Affecting | |
+-------+-------------------+---------------+----------+------------+-------------+
| 100. | Platform Memory | host= | major | False | 2019-05-21T |
| 103 | threshold | controller-0. | | | 13:15:26. |
| | exceeded ; | numa=node0 | | | 464231 |
| | threshold 80%, | | | | |
| | actual 80% | | | | |
| | | | | | |
| 100. | Platform Memory | host= | major | False | 2019-05-21T |
| 103 | threshold | controller-0 | | | 13:15:26. |
| | exceeded ; | | | | 456738 |
| | threshold 80%, | | | | |
| | actual 80% | | | | |
| | | | | | |
| 200. | controller-0 is | host= | major | True | 2019-05-20T |
| 006 | degraded due to | controller-0. | | | 23:56:51. |
| | the failure of | process=ceph | | | 557509 |
| | its 'ceph (osd.0, | (osd.0, ) | | | |
| | )' process. Auto | | | | |
| | recovery of this | | | | |
| | major process is | | | | |
| | in progress. | | | | |
| | | | | | |
| 200. | controller-0 was | host= | warning | True | 2019-05-17T |
| 001 | administratively | controller-0 | | | 14:17:32. |
| | locked to take it | | | | 794640 |
| | out-of-service. | | | | |
| | | | | | |
+-------+-------------------+---------------+----------+------------+-------------+
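When triaging long listings offline, saved :command:`fm alarm-list` output
can be tallied with standard tools. A rough sketch, using illustrative
sample rows \(real output wraps long fields across lines\):

.. code-block:: none

   # Tally alarms by severity from saved 'fm alarm-list' output.
   # The sample rows below are illustrative.
   printf '%s\n' \
     '| 100.103 | Platform Memory threshold exceeded | major |' \
     '| 200.006 | controller-0 degraded (ceph) | major |' \
     '| 200.001 | controller-0 administratively locked | warning |' \
     > /tmp/alarms.txt
   awk -F'|' 'NF > 3 { gsub(/ /, "", $4); if ($4 != "") count[$4]++ }
              END { for (s in count) print s, count[s] }' /tmp/alarms.txt | sort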
.. kfs1580755127017
.. _viewing-alarm-details-using-the-cli:
================================
View Alarm Details Using the CLI
================================
You can view detailed information to help troubleshoot an alarm.
.. rubric:: |proc|
- Use the following command to view details about an alarm.
.. code-block:: none
fm alarm-show <uuid>
<uuid> is the ID of the alarm to query. Use the :command:`fm alarm-list`
command to obtain UUIDs, as described in
:ref:`Viewing Active Alarms Using the CLI <viewing-active-alarms-using-the-cli>`.
.. code-block:: none
~(keystone_admin)$ fm alarm-show 4ab5698a-19cb-4c17-bd63-302173fef62c
+------------------------+-------------------------------------------------+
| Property | Value |
+------------------------+-------------------------------------------------+
| alarm_id | 100.104 |
| alarm_state | set |
| alarm_type | operational-violation |
| entity_instance_id | system=hp380-1_4.host=controller-0 |
| entity_type_id | system.host |
| probable_cause | threshold-crossed |
| proposed_repair_action | /dev/sda3 check usage |
| reason_text | /dev/sda3 critical threshold set (0.00 MB left) |
| service_affecting | False |
| severity | critical |
| suppression | True |
| timestamp | 2014-06-25T16:58:57.324613 |
| uuid | 4ab5698a-19cb-4c17-bd63-302173fef62c |
+------------------------+-------------------------------------------------+
The pair of attributes **\(alarm\_id, entity\_instance\_id\)** uniquely
identifies an active alarm:
**alarm\_id**
An ID identifying the particular alarm condition. Note that there are
some alarm conditions, such as *administratively locked*, that can be
raised by more than one entity-instance-id.
**entity\_instance\_id**
Type and instance information of the object raising the alarm. A
period-separated list of \(key, value\) pairs, representing the
containment structure of the overall entity instance. This structure
is used for processing hierarchical clearing of alarms.
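Because the entity instance ID is a period-separated list of key=value
pairs, the containment levels can be split out mechanically. A small
sketch, using the instance from the example output above:

.. code-block:: none

   # Split an entity_instance_id into its containment levels.
   # The instance below is the one from the alarm-show example.
   entity_instance_id="system=hp380-1_4.host=controller-0"
   echo "${entity_instance_id}" | tr '.' '\n'

Here the outermost pair \(system=hp380-1_4\) contains the innermost one
\(host=controller-0\), which is the ordering used for hierarchical clearing.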
.. ohs1552680649558
.. _viewing-suppressed-alarms-using-the-cli:
====================================
View Suppressed Alarms Using the CLI
====================================
Alarms may be suppressed. List them to determine if any need to be unsuppressed
or otherwise managed.
.. rubric:: |proc|
.. _viewing-suppressed-alarms-using-the-cli-steps-hyn-g1x-nkb:
- Use the :command:`fm event-suppress-list` CLI command to view a list of
all currently suppressed alarms.
This command shows all alarm IDs along with their suppression status.
.. code-block:: none
~(keystone_admin)$ fm event-suppress-list [--nopaging] [--uuid] [--include-unsuppressed]
where
**--nopaging**
disables paged output; see :ref:`CLI Commands and Paged Output <cli-commands-and-paged-output>`
**--uuid**
includes the alarm type UUIDs in the output
**--include-unsuppressed**
includes unsuppressed alarm types in the output. By default only
suppressed alarm types are shown.
For example:
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ fm event-suppress-list
+----------+-------------+
| Event ID | Status |
+----------+-------------+
| 100.101 | suppressed |
| 100.103 | suppressed |
| 100.105 | suppressed |
| ... | ... |
+----------+-------------+
.. ubf1552680722858
.. _viewing-the-event-log-using-horizon:
================================
View the Event Log Using Horizon
================================
The |prod| Horizon Web interface provides a convenient way to work with
historical alarms, and customer logs.
.. rubric:: |context|
The event log consolidates historical alarm events, that is, the sets and
clears of alarms that have occurred in the past, and customer logs.
Customer logs capture important system events and provide useful information
to the administrator for the purposes of overall fault management. Customer
log events do not have a state and do not typically require administrator
actions, for example, they may be reporting a failed login attempt or the fact
that a container was evacuated to another host.
Customer logs and historical alarms' set and clear actions are held in a
buffer, with older entries discarded as needed to release logging space.
.. rubric:: |proc|
#. Select **Admin** \> **Fault Management** \> **Events** in the left pane.
The Events window appears. By default, the Events screen shows all events,
including both historical set/clear alarms and logs, with the most recent
events at the top.
#. Use the filter selections from the search field to select the information
you want to view.
Use the **All Events**, **Alarm Events** and **Log Events** filter buttons
to select all events, only historical alarms set/clear events or only
customer log events to be displayed. By default, all events are displayed.
Suppressed events are excluded from the table by default. Suppressed events
can be included or excluded in the table with the **Show Suppressed** and
**Hide Suppressed** filter buttons at the top right of the table. The
suppression filter buttons are only shown when one or more events are suppressed.
The **Suppression Status** column is only shown in the table when the
**Show Suppressed** filter button is selected.
.. image:: figures/psa1567524091300.png
You can sort the entries by clicking on the column titles. For example, to
sort the view of the entries by severity, click **Severity**; the entries
are resorted and grouped by severity.
#. Click the arrow to the left of an event entry in the table for an expanded
view of event details.
.. fcv1552680708686
.. _viewing-the-event-log-using-the-cli:
================================
View the Event Log Using the CLI
================================
You can use CLI commands to work with historical alarms and logs in the event log.
.. rubric:: |proc|
.. _viewing-the-event-log-using-the-cli-steps-v3r-stf-pkb:
#. Log in with administrative privileges.
.. code-block:: none
$ source /etc/platform/openrc
#. Use the :command:`fm event-list` command to view historical alarms'
sets/clears and logs. By default, only unsuppressed events are shown.
For more about event suppression, see
:ref:`Events Suppression Overview <events-suppression-overview>`.
The syntax of the command is:
.. code-block:: none
fm event-list [-q <QUERY>] [-l <NUMBER>] [--alarms] [--logs] [--include_suppress]
Optional arguments:
**-q QUERY, --query QUERY**
\- key\[op\]data\_type::value; list. data\_type is optional, but if
supplied must be string, integer, float, or boolean.
**-l NUMBER, --limit NUMBER**
Maximum number of event logs to return.
**--alarms**
Show historical alarms set/clears only.
**--logs**
Show customer logs only.
**--include\_suppress**
Show suppressed alarms as well as unsuppressed alarms.
**--uuid**
Include the unique event UUID in the listing such that it can be used
in displaying event details with :command:`fm event-show` <uuid>.
**--nopaging**
Disable output paging.
For details on CLI paging, see
:ref:`CLI Commands and Paged Output <cli-commands-and-paged-output>`.
For example:
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5
+-----------+-----+-----+--------------------+-----------------+---------+
|Time Stamp |State|Event|Reason Text |Entity Instance |Severity |
| | |Log | |ID | |
| | |ID | | | |
+-----------+-----+-----+--------------------+-----------------+---------+
|2019-05-21T| set |100. |Platform Memory |host=controller-0|major |
| 13:15:26. | |103 |threshold exceeded ;|numa=node0 | |
| 464231 | | |threshold 80%,actual| | |
| | | |80% | | |
| | | | | | |
|2019-05-21T| set | 100.|Platform Memory |host=controller-0|major |
| 13:15:26. | | 103 |threshold exceeded; | | |
| 456738 | | |threshold 80%,actual| | |
| | | |80% | | |
| | | | | | |
|2019-05-21T|clear| 100.|Platform Memory |host=controller-0|major |
| 13:07:26. | | 103 |threshold exceeded; |numa=node0 | |
| 658374 | | |threshold 80%,actual| | |
| | | |79% | | |
| | | | | | |
|2019-05-21T|clear| 100.|Platform Memory |host=controller-0|major |
| 13:07:26. | | 103 |threshold exceeded; | | |
| 656608 | | |threshold 80%,actual| | |
| | | |79% | | |
| | | | | | |
|2019-05-21T| set | 100 |Platform Memory |host=controller-0|major |
| 13:05:26. | | 103 |threshold exceeded; |numa=node0 | |
| 481240 | | |threshold 80%,actual| | |
| | | |79% | | |
| | | | | | |
+-----------+-----+-----+--------------------+-----------------+---------+
.. note::
You can also use the --nopaging option to avoid paging long event
lists.
In the following example, the :command:`fm event-list` command shows
alarms only; the **State** column indicates either **set** or **clear**.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5 --alarms
+-------------+-------+-------+--------------------+---------------+----------+
| Time Stamp | State | Event | Reason Text | Entity | Severity |
| | | Log | | Instance ID | |
| | | ID | | | |
+-------------+-------+-------+--------------------+---------------+----------+
| 2019-05-21T | set | 100. | Platform Memory | host= | major |
| 13:15:26. | | 103 | threshold exceeded | controller-0. | |
| 464231 | | | ; threshold 80%, | numa=node0 | |
| | | | actual 80% | | |
| | | | | | |
| 2019-05-21T | set | 100. | Platform Memory | host= | |
| 13:15:26. | | 103 | threshold exceeded | controller-0 | major |
| 456738 | | | ; threshold 80%, | | |
| | | | actual 80% | | |
| | | | | | |
| 2019-05-21T | clear | 100. | Platform Memory | host= | |
| 13:07:26. | | 103 | threshold exceeded | controller-0. | major |
| 658374 | | | ; threshold 80%, | numa=node0 | |
| | | | actual 79% | | |
| | | | | | |
| 2019-05-21T | clear | 100. | Platform Memory | host= | |
| 13:07:26. | | 103 | threshold exceeded | controller-0 | major |
| 656608 | | | ; threshold 80%, | | |
| | | | actual 79% | | |
| | | | | | |
| 2019-05-21T | set | 100. | Platform Memory | host= | |
| 13:05:26. | | 103 | threshold exceeded | controller-0. | major |
| 481240 | | | ; threshold 80%, | numa=node0 | |
| | | | actual 79% | | |
| | | | | | |
+-------------+-------+-------+--------------------+---------------+----------+
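The **set**/**clear** pairs above can be replayed to recover which alarms are currently active: a ``set`` raises an alarm keyed by its Event Log ID and Entity Instance ID, and a matching ``clear`` retires it. The sketch below illustrates this reconciliation; ``active_alarms`` is a hypothetical helper written for this guide, not part of the :command:`fm` CLI.

```python
def active_alarms(events):
    """Replay (state, event_id, entity) tuples oldest-first.

    A 'set' raises an alarm keyed by (event_id, entity); a 'clear'
    retires that same key. Whatever remains is currently active.
    """
    active = set()
    for state, event_id, entity in events:
        key = (event_id, entity)
        if state == "set":
            active.add(key)
        elif state == "clear":
            active.discard(key)
    return active


# Events from the example table, reordered oldest-first
# (fm event-list prints newest-first).
events = [
    ("set",   "100.103", "host=controller-0.numa=node0"),
    ("clear", "100.103", "host=controller-0"),
    ("clear", "100.103", "host=controller-0.numa=node0"),
    ("set",   "100.103", "host=controller-0"),
    ("set",   "100.103", "host=controller-0.numa=node0"),
]

for event_id, entity in sorted(active_alarms(events)):
    print(event_id, entity)
```

Replaying the five example events leaves both memory-threshold alarms active, matching the two most recent ``set`` rows in the table.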
In the following example, the :command:`fm event-list` command shows logs
only; the **State** column indicates **log**.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5 --logs
+-------------+-------+-------+---------------------+---------------+----------+
| Time Stamp | State | Event | Reason Text | Entity | Severity |
| | | Log | | Instance ID | |
| | | ID | | | |
+-------------+-------+-------+---------------------+---------------+----------+
| 2019-05-21T | log | 700. | Exited Multi-Node | subsystem=vim | critical |
| 00:50:29. | | 217 | Recovery Mode | | |
| 525068 | | | | | |
| | | | | | |
| 2019-05-21T | log | 700. | Entered Multi-Node | subsystem=vim | critical |
| 00:49:49. | | 216 | Recovery Mode | | |
| 979021 | | | | | |
| | | | | | |
| 2019-05-21T | log | 401. | Service group vim- | service | |
| 00:49:31. | | 002 | services redundancy | _domain= | critical |
| 205116 | | | restored | controller. | |
| | | | | service_group | |
| | | | | =vim- | |
| | | | | services | |
| | | | | | |
| 2019-05-21T | log | 401. | Service group vim- | service | |
| 00:49:30. | | 001 | services state | _domain= | critical |
| 003221 | | | change from go- | controller. | |
| | | | active to active on | service_group | |
| | | | host controller-0 | =vim-services | |
| | | | | .host= | |
| | | | | controller-0 | |
| | | | | | |
| 2019-05-21T | log | 401. | Service group | service | |
| 00:49:29. | | 002 | controller-services | _domain= | critical |
| 950524 | | | redundancy restored | controller. | |
| | | | | service | |
| | | | | _group= | |
| | | | | controller | |
| | | | | -services | |
| | | | | | |
+-------------+-------+-------+---------------------+---------------+----------+
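The ``--alarms`` and ``--logs`` filters shown above partition events by the **State** column: alarms carry ``set`` or ``clear``, while customer logs carry ``log``. A minimal sketch of that split, assuming events are held as dictionaries keyed by the table's column names (a convention invented here for illustration):

```python
def split_events(rows):
    """Partition event rows into alarms (state 'set'/'clear') and
    customer logs (state 'log'), mirroring --alarms / --logs."""
    alarms = [r for r in rows if r["state"] in ("set", "clear")]
    logs = [r for r in rows if r["state"] == "log"]
    return alarms, logs


rows = [
    {"state": "set",   "event_id": "100.103", "severity": "major"},
    {"state": "log",   "event_id": "700.217", "severity": "critical"},
    {"state": "clear", "event_id": "100.103", "severity": "major"},
    {"state": "log",   "event_id": "401.002", "severity": "critical"},
]
alarms, logs = split_events(rows)
print(len(alarms), len(logs))
```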
@@ -57,6 +57,15 @@ Configuration
configuration/index
----------------
Fault Management
----------------
.. toctree::
:maxdepth: 2
fault/index
----------------
Operation guides
----------------
@@ -91,18 +100,13 @@ General information
Governance
----------
StarlingX is a top-level Open Infrastructure Foundation confirmed project that
is governed by two separate bodies: The `Open Infrastructure Foundation Board of
Directors`_ and the `StarlingX Technical Steering Committee`_.
StarlingX is a top-level OpenStack Foundation pilot project that is governed by
two separate bodies: The `OpenStack Foundation Board of Directors`_ and the
`StarlingX Technical Steering Committee`_.
See `StarlingX Governance`_ for additional information about StarlingX project
governance.
.. _`Open Infrastructure Foundation Board of Directors`: https://openinfra.dev/about/board/
.. _`OpenStack Foundation Board of Directors`: https://wiki.openstack.org/wiki/Governance/Foundation
.. _`StarlingX Technical Steering Committee`: https://docs.starlingx.io/governance/reference/tsc/
.. _`StarlingX Governance`: https://docs.starlingx.io/governance/