StarlingX System Configuration Management


Eric MacDonald f19dd0498f Mtce: Make Multi-Node Failure Avoidance Configurable	2018-08-31 10:43:25 -04:00

The maintenance system implements a high availability (HA) feature designed to detect the simultaneous heartbeat failure of a group of hosts and to avoid failing all of those hosts until heartbeat resumes or a set period of time elapses. This feature is called Multi-Node Failure Avoidance (MNFA). It currently has the host threshold set to 3 and the timeout set to 100 seconds.

This update enhances the existing feature by making the number-of-hosts threshold and the timeout period customer-configurable service parameters. The new service parameters are listed under platform:maintenance and are displayed with the following command:

> system service-parameter-list

mnfa_threshold: This new label and value is added to the Puppet-managed /etc/mtc.ini. It is the number of hosts that must fail heartbeat as a group, within the heartbeat failure window (heartbeat_failure_threshold), before maintenance activates MNFA mode. This update changes the default number of failing hosts from 3 to 2 and allows a configurable range of 2 to 100.

mnfa_timeout: This new label and value is added to the Puppet-managed /etc/mtc.ini. Once MNFA mode is active, it remains active until the number of failing hosts drops below mnfa_threshold or this timer expires; MNFA mode deactivates on the first occurrence of either. Upon deactivation, the remaining failed hosts are no longer treated as a failure group and are instead each Gracefully Recovered individually. A value of zero imposes no timeout, which makes the deactivation criteria solely host based. This update changes the default timer from 100 seconds to 0 (no timeout) and permits a valid timeout range of 100 to 86400 seconds (1 day).

DocImpact
Story: 2003576
Task: 24903
Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
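The activation and deactivation rules described in the commit above can be sketched as a small state machine. This is a minimal illustration, not the maintenance service's actual code: the names `MnfaTracker`, `validate_mnfa_threshold`, and `validate_mnfa_timeout` are hypothetical, and the sketch assumes that every host currently failing heartbeat counts toward the threshold (the heartbeat_failure_threshold window is not modeled). Only the parameter ranges, defaults, and deactivation criteria come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Ranges and defaults taken from the commit message; all names are hypothetical.
MNFA_THRESHOLD_MIN, MNFA_THRESHOLD_MAX = 2, 100
MNFA_TIMEOUT_MIN, MNFA_TIMEOUT_MAX = 100, 86400  # seconds; 0 means "no timeout"


def validate_mnfa_threshold(value: int) -> int:
    """Reject an mnfa_threshold outside the documented 2..100 range."""
    if not MNFA_THRESHOLD_MIN <= value <= MNFA_THRESHOLD_MAX:
        raise ValueError(f"mnfa_threshold must be {MNFA_THRESHOLD_MIN}..{MNFA_THRESHOLD_MAX}")
    return value


def validate_mnfa_timeout(value: int) -> int:
    """Accept 0 (no timeout) or a value in the documented 100..86400 range."""
    if value != 0 and not MNFA_TIMEOUT_MIN <= value <= MNFA_TIMEOUT_MAX:
        raise ValueError(f"mnfa_timeout must be 0 or {MNFA_TIMEOUT_MIN}..{MNFA_TIMEOUT_MAX}")
    return value


@dataclass
class MnfaTracker:
    """Hypothetical model of MNFA mode; not the real maintenance implementation."""
    threshold: int = 2   # new default from the commit
    timeout: int = 0     # new default: no timeout
    failing: Set[str] = field(default_factory=set)
    active: bool = False
    activated_at: Optional[float] = None

    def __post_init__(self) -> None:
        validate_mnfa_threshold(self.threshold)
        validate_mnfa_timeout(self.timeout)

    def heartbeat_failed(self, host: str, now: float) -> None:
        """Record a heartbeat failure; activate MNFA mode once the threshold is met."""
        self.failing.add(host)
        if not self.active and len(self.failing) >= self.threshold:
            self.active = True
            self.activated_at = now

    def heartbeat_resumed(self, host: str, now: float) -> Set[str]:
        """Record a recovered host; may deactivate MNFA mode."""
        self.failing.discard(host)
        return self._maybe_deactivate(now)

    def tick(self, now: float) -> Set[str]:
        """Periodic check so that the timeout can expire without other events."""
        return self._maybe_deactivate(now)

    def _maybe_deactivate(self, now: float) -> Set[str]:
        """Deactivate on the first of: failing count below threshold, or timeout expiry.

        Returns the hosts that must now be gracefully recovered individually.
        """
        if not self.active:
            return set()
        below_threshold = len(self.failing) < self.threshold
        timed_out = self.timeout > 0 and now - self.activated_at >= self.timeout
        if not (below_threshold or timed_out):
            return set()
        self.active = False
        self.activated_at = None
        remaining = set(self.failing)
        self.failing.clear()  # no longer treated as a failure group
        return remaining
```

With the new defaults (threshold 2, timeout 0), two simultaneous failures activate MNFA mode, and the mode ends only when enough hosts recover; the host still failing at that point is handed back for individual graceful recovery. A `tick` call is included because, with a nonzero mnfa_timeout, deactivation can occur with no heartbeat event at all.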
compute-huge	Integrate host configuration into configuration framework	2018-06-14 16:03:52 -05:00
computeconfig	Add support for external Ceph	2018-07-31 15:48:43 -04:00
config-gate	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
configutilities	Merge "Decouple Fault Management from stx-config"	2018-08-17 14:21:36 +00:00
controllerconfig	fix dependency path and requirement to enable tox	2018-08-28 09:54:45 +08:00
devstack	Remove installation of stx-utils	2018-08-23 00:06:31 -05:00
puppet-manifests	Mtce: Make Multi-Node Failure Avoidance Configurable	2018-08-31 10:43:25 -04:00
puppet-modules-wrs	Mtce: Make Multi-Node Failure Avoidance Configurable	2018-08-31 10:43:25 -04:00
storageconfig	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
sysinv	Mtce: Make Multi-Node Failure Avoidance Configurable	2018-08-31 10:43:25 -04:00
tmp/patch-scripts/EXAMPLE_SYSINV/scripts	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
.gitignore	Add default test framework	2018-06-08 20:06:21 -05:00
.gitreview	[Feature] adding support devstack for stx-config sysinv	2018-08-20 13:17:55 +08:00
.zuul.yaml	Add a zuul job for sysinv tox unittest	2018-08-13 16:34:06 +08:00
CONTRIBUTORS.wrs	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
LICENSE	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
README.rst	StarlingX open source release updates	2018-05-31 07:35:52 -07:00
centos_iso_image.inc	Split image.inc across git repos	2018-08-16 10:08:08 -04:00
centos_pkg_dirs	Decouple Fault Management from stx-config	2018-08-16 17:24:19 -04:00
test-requirements.txt	Add default test framework	2018-06-08 20:06:21 -05:00
tox.ini	Add default test framework	2018-06-08 20:06:21 -05:00

README.rst

stx-config

StarlingX Configuration Management