StarlingX Bare Metal and Node Management, Hardware Maintenance
Go to file
Eric Macdonald 50dc29f6c0 Improve maintenance power/reset control command retry handling
This update improves on and drives consistency into the
maintenance power on/off and reset handling in terms of
retries and use of graceful and immediate commands.

This update maintains the 10 retries for both power-on
and power-off commands and increases the number of retries
for the reset command from 5 to 10 to line up with the
power operation commands.

This update also ensures that the first 5 retries are done
with the graceful action command while the last 5 are with
the immediate.

This update also removed a power on handling case that could
have lead to a stuck state. This case was virtually impossible
to hit based on the required sequence of intermittent command
failures but that scenario handling was fixed up anyway.

Issues have been seen with the power-off handling on some servers.
Suspect that those servers need more time to power-off. So, this
introduced a 30 seconds delay following a power-off command before
issuing the power status query to give the server some time to
power-off before retrying the power-off command.

Test Plan: Both IPMI and Redfish

PASS: Verify power on/off and reset handling support up to 10 retries
PASS: Verify graceful command is used for the first power on/off
      or reset try and the first 5 retries
PASS: Verify immediate command is used for the final 5 retries
PASS: Verify reset handling with/without retries (none/mid/max)
PASS: Verify power-on  handling with/without retries (none/mid/max)
PASS: Verify power-off handling  with/without retries (none/mid/max)
PASS: Verify power status command failure handling for power on/off
NOTE: FIT (fault insertion testing) was used to create retry scenarios

PASS: Verify power-off inter retry delay feature
PASS: Verify 30 second power-off to power query delay
PASS: Verify redfish power/reset commands used are logged by default
PASS: Verify power-off/on and reset logging

Regression:

PASS: verify power-on/off and reset handling without retries
PASS: Verify power-off handling when power is already off
PASS: Verify power-on handling when power is already on

Closes-Bug: 2031945
Signed-off-by: Eric Macdonald <eric.macdonald@windriver.com>
Change-Id: Ie39326bcb205702df48ff9dd090f461c7110dd36
2024-01-25 22:42:26 +00:00
api-ref/source Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
bsp-files Add patch extract from load 2023-08-08 12:15:00 -03:00
devstack Security: Handle nospectre_v1 in the bootargs 2020-01-28 18:21:13 -05:00
doc Fix tox-docs failing sphinx 2023-08-29 16:50:22 -04:00
installer Fix kickstarts patching 2023-10-11 14:40:38 +00:00
kickstart Copy luks.conf to '/etc/pmon.d' 2023-12-14 23:01:13 -05:00
mtce Improve maintenance power/reset control command retry handling 2024-01-25 22:42:26 +00:00
mtce-common Improve maintenance power/reset control command retry handling 2024-01-25 22:42:26 +00:00
mtce-compute Remove qemu dependency from mtce-compute and mtce-control 2023-12-04 14:19:28 +00:00
mtce-control Remove qemu dependency from mtce-compute and mtce-control 2023-12-04 14:19:28 +00:00
mtce-storage Update mtce debian package ver based on git 2023-03-02 14:50:35 +00:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
tools Set longer shutdown time and fix power state error log 2023-10-05 17:12:19 -04:00
.gitignore Update tox.ini files to use stein constraints 2019-06-25 13:20:35 -04:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Fix github mirroring for this repo 2023-04-28 12:38:51 -04:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
README.rst starlingx/metal README improvement 2023-07-19 12:32:13 -03:00
centos_build_layer.cfg Build layering, add layer build config file 2019-10-15 19:19:45 +08:00
centos_iso_image.inc Remove unused inventory and python-inventoryclient 2020-01-08 14:12:05 -06:00
centos_pkg_dirs rvmc: remove un-used build data 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc Utility to install a server via Redfish 2019-12-31 15:34:54 +00:00
debian_build_layer.cfg Add debian_build_layer.cfg file 2021-10-05 14:08:23 -04:00
debian_iso_image.inc Debian: metal: update debian_iso_image.inc 2022-11-16 12:06:51 +08:00
debian_pkg_dirs Include upgrades meta files to Debian ISO 2022-08-02 21:01:58 +00:00
debian_stable_docker_images.inc debian: port rvmc docker image to Debian 2022-08-12 16:30:01 +00:00
pylint.rc Add pylint py3 portability checks for the metal repo 2021-09-13 11:57:42 -03:00
test-requirements.txt Removed wait_for_worker_config_init in AIO systems 2021-07-08 18:48:28 -04:00
tox.ini Update tox.ini to work with tox 4 2022-12-26 23:26:54 +00:00

README.rst

metal

The starlingx/metal repository handles StarlingX Bare Metal Management1.

This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest2.

References


  1. https://docs.starlingx.io/api-ref/metal↩︎

  2. https://opendev.org/starlingx/manifest.git↩︎