config

Commit Graph

Author	SHA1	Message	Date
Scott Little	3077d0c656	Relocated some packages to repo 'stx-puppet' List of relocated subdirectories: puppet-manifests puppet-modules-wrs/puppet-dcdbsync puppet-modules-wrs/puppet-dcmanager puppet-modules-wrs/puppet-dcorch puppet-modules-wrs/puppet-fm puppet-modules-wrs/puppet-mtce puppet-modules-wrs/puppet-nfv puppet-modules-wrs/puppet-patching puppet-modules-wrs/puppet-smapi puppet-modules-wrs/puppet-sshd puppet-modules-wrs/puppet-sysinv Story: 2006166 Task: 35687 Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3 Change-Id: Ibc468b9d97d6dbc7ac09652dcd979c0e68a85672 Signed-off-by: Scott Little <scott.little@windriver.com> Depends-On: I00f54876e7872cf0d3e4f5e8f986cb7e3b23c86f Signed-off-by: Scott Little <scott.little@windriver.com>	2019-09-05 16:18:03 -04:00
Tee Ngo	56275fb5b0	Ansible Bootstrap Deployment This commit is initial submission of bootstrap playbook which enables the bootstrap of initial controller. The playbook defaults are meant for configuring the localhost in vbox development environment. Custom hosts file and user overrides are required for configuring multiple hosts and lab specific setup. Secret file and SSH keys are required for production test environment. Tests performed: - installation - config_controller complete to ensure the current method of configuring the first controller is intact - localhost bootstrap with default hosts file - multiple remote hosts bootstrap with custom hosts file - reconfigurations with user overrides - stx-application applied in AIOSX and AIODX - Failure & skip play cases (invalid config inputs, incorrect load, connection failure, no changes replay, etc...) TODO: - Support for standard & storage configurations - Docker proxy/custom registry related tests - Package bootstrap playbook in SDK - Config_controller cleanup Change-Id: If553f1eeed32606bacc690ef277e60606e9d93ea Story: 200476 Task: 29686 Task: 29687 Co-Authored-By: Ovidiu Poncea <ovidiu.poncea@windriver.com> Signed-off-by: Tee Ngo <tee.ngo@windriver.com>	2019-04-11 08:40:34 -04:00
Eric MacDonald	b00c4dd415	Remove Resource Monitor ; aka rmon, from the load All rmon resource monitoring has been moved to collectd. This update removes rmon from mtce and the load. Story: 2002823 Task: 30045 Test Plan: PASS: Build and install a standard system. PASS: Inspect mtce rpm list PASS: Inspect logs PASS: Check pmon.d Depends-On: https://review.openstack.org/#/c/643739 Change-Id: I7572a1d0a9cf746abfba3d67352534d96f60c5a7 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2019-03-19 13:27:23 -04:00
Al Bailey	cbecbf7f0b	Update manifests to remove unused openstack components Cleanup unwanted openstack setup on bare metal. Preparing the manifests to have the services removed from SM. Bypass setting up openstack services on controller, worker and storage. Cleanup haproxy ports for services that will not be running on bare metal. Cleanup upgrade, remote logging, postgres, and anything else related to openstack services that no longer run on bare metal. Remove all manifests and templates that are no longer being used. Strip out any static hiera data that is no longer needed. Story: 2004764 Task: 29850 Depends-On: Ice10fe6da6b34f1d9206f26e112eb555e2088932 Depends-On: I3c1cc8673be5cf6ab15f9158199bc24fccb44f17 Depends-On: Ie43cf11ebf1edcf3a8bb357205c4c59d2962b4fa Change-Id: I2be8e9ab418835125ff433d06d2930df37534501 Signed-off-by: Al Bailey <Al.Bailey@windriver.com>	2019-03-08 18:43:22 -06:00
Eric MacDonald	7dd943fe46	Fix mtce.pp to handle missing /etc/rmonfiles.d directory https://review.openstack.org/#/c/628687/ stopped packaging the query_ntp_servers.sh script. However, since there were no other files being packaged into that directory the spec file chose not to create an empty directory. When config controller called the mtce.pp manifest to install dynamic files into /etc/rmonfiles.d it could not. So it failed. This update adds a directory check block to the mtce.pp file to create the directory if it's not present. Testing: Install AIO SX in SM1 Change-Id: Ib2dfadb261be6f9ebbaa7213eb6669b25158c779 Closes-Bug: 1811693 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2019-01-14 20:40:15 +00:00
Don Penney	e6c0e0af8c	Fix puppet-lint warnings and errors This update addresses the following errors and warnings from puppet-lint, with most corrections done automatically using puppet-lint --fix: - 2sp_soft_tabs - arrow_alignment - arrow_on_right_operand_line - double_quoted_strings - hard_tabs - only_variable_string - quoted_booleans - star_comments - trailing_whitespace - variables_not_enclosed Change-Id: I7a2b0109534dd4715d459635fa33b09e7fd0a6a6 Story: 2004515 Task: 28683 Signed-off-by: Don Penney <don.penney@windriver.com>	2018-12-27 15:08:37 -06:00
Tao Liu	6256b0d106	Change compute node to worker node personality This update replaced the compute personality & subfunction to worker, and updated internal and customer visible references. In addition, the compute-huge package has been renamed to worker-utils as it contains various scripts/services that used to affine running tasks or interface IRQ to specific CPUs. The worker_reserved.conf is now installed to /etc/platform. The cpu function 'VM' has also been renamed to 'Application'. Tests Performed: Non-containerized deployment AIO-SX: Sanity and Nightly automated test suite AIO-DX: Sanity and Nightly automated test suite 2+2 System: Sanity and Nightly automated test suite 2+2 System: Horizon Patch Orchestration Kubernetes deployment: AIO-SX: Create, delete, reboot and rebuild instances 2+2+2 System: worker nodes are unlock enable and no alarms Story: 2004022 Task: 27013 Change-Id: I0e0be6b3a6f25f7fb8edf64ea4326854513aa396 Signed-off-by: Tao Liu <tao.liu@windriver.com>	2018-12-13 14:15:55 -05:00
Eric MacDonald	1813918cf4	Mtce: Change SM Port scope to handle AIO config. The mtc.ini file is updated a second time in AIO config. Due to the scope of the SM ports being for controller only and no defaults we see the sm port assignments missing in AIO configs. This update defaults the SM port numbers and changes the scope of the parameters so that they get set on all node types for all system types. Testing included provisioning an AIO system. Change-Id: Ib53921c4b59a9e67ed136a03504bdf0775de6dff Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2018-11-05 20:44:03 -05:00
Kevin Smith	1e63b2e45a	Generate openrc file in /etc/platform Create the platform openrc file in /etc/platform, while leaving existing /etc/nova/openrc file alone for now. New platform/client.pp file is created and most of the contents of openstack/client.pp moved there. openstack/client.pp can be removed once kubernetes is the default. Change-Id: Ib6de59da6dfc9f34a24054405b6cda30d0b74ac1 Story: 2002876 Task: 27499 Signed-off-by: Kevin Smith <kevin.smith@windriver.com>	2018-10-17 13:11:56 -04:00
Eric MacDonald	f5d212010b	Mtce: Add two new port definitions to mtc.ini for SM communications In support of the HA Improvements feature maintenance is required to, upon request, send SM a summary of maintenance's heartbeat responsiveness during the last 20 heartbeat periods. This update adds the required port assignments to the mtc.ini file in support of said communications. With this update the mtc.ini file will be updated to contain the following entries. ; Communication ports between SM and maintenance sm_server_port = 2124 ; port sm receives mtce commands from sm_client_port = 2224 ; port mtce receives sm commands from Change-Id: I05c022f7e4dcdeaea71bc0020641baa331daae57 Story: 2003576 Task: 26837 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2018-10-02 20:29:37 +00:00
Eric MacDonald	5f232f6486	Mtce: Make Heartbeat Failure Action Configurable The current maintenance heartbeat failure action handling is to Fail and Gracefully Recover the host. This means that maintenance will ensure that a heartbeat failed host is rebooted/reset before it is recovered but will avoid rebooting it a second time if its recovered uptime indicates that it has already rebooted. This update expands that single action handling behavior to support three new actions. In doing so it adds a new configuration service parameter called heartbeat_failure_action. The customer can configure this new parameter with any one of the following 4 actions in order of decreasing impact. fail - Host is failed and gracefully recovered. - Current Network specific alarms continue to be raised/cleared. Note: Prior to this update this was standard system behavior. degrade - Host is only degraded while it is failing heartbeat. - Current Network specific alarms continue to be raised/cleared. - heartbeat degrade reason is cleared as are the alarms when heartbeat responses resume. alarm - The only indication of a heartbeat failure is by alarm. - Same set of alarms as in above action cases - Only in this case no degrade, no failure, no reboot/reset none - Heartbeat is disabled ; no multicast heartbeat message is sent. - All existing heartbeat alarms are cleared. - The heartbeat soak as part of the enable sequence is bypassed. The selected action is a system wide setting. The selected setting also applies to Multi-Node Failure Avoidance. The default action is the legacy action Fail. This update also 1. Removes redundant inservice failure alarm for MNFA case in support of degrade only action. Keeping it would make that alarm handling case unnecessarily complicated. 2. No longer used 'hbs calibration' code is removed (cleanup). 3. Small amount of heartbeat logging cleanup. Test Plan: PASS: fail: Verify MNFA and recovery PASS: fail: Verify Single Host heartbeat failure and recovery PASS: fail: Verify Single Host heartbeat failure and recovery (from none) PASS: degrade: Verify MNFA and recovery PASS: degrade: Verify Single Host heartbeat failure and recovery PASS: degrade: Verify Single Host heartbeat failure and recovery (from alarm) PASS: alarm: Verify MNFA and recovery PASS: alarm: Verify Single Host heartbeat failure and recovery PASS: alarm: Verify Single Host heartbeat failure and recovery (from degrade) PASS: none: Verify heartbeat disable, fail ignore and no recovery PASS: none: Verify Single Host heartbeat ignore and no recovery PASS: none: Verify Single Host heartbeat ignore and no recovery (from fail) PASS: Verify action change behavior from none to alarm with active MNFA PASS: Verify action change behavior from alarm to degrade with active MNFA PASS: Verify action change behavior from degrade to none with active MNFA PASS: Verify action change behavior from none to fail with active MNFA PASS: Verify action change behavior from fail to none with active MNFA PASS: Verify action change behavior from degrade to fail then MNFA timeout PASS: Verify all heartbeat action change customer logs PASS: verify heartbeat stats clear over action change PASS: Verify LO DOR (several large labs - compute and storage systems) PASS: Verify recovery from failure of active controller PASS: Verify 3 host failure behavior with MNFA threshold at 3 (action:fail) PASS: Verify 2 host failure behavior with MNFA threshold at 3 (action:fail) Change-Id: I198505fb7a923cc760b12082acff1e5bac929ef2 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2018-09-10 08:55:16 -04:00
Eric MacDonald	f19dd0498f	Mtce: Make Multi-Node Failure Avoidance Configurable The maintenance system implements a high availability (HA) feature designed to detect the simultaneous heartbeat failure of a group of hosts and avoid failing all those hosts until heartbeat resumes or after a set period of time. This feature is called Multi-Node Failure Avoidance, aka MNFA, and currently has the hosts threshold set to 3 and timeout set to 100 secs. This update implements enhancements to that existing feature by making the 'number-of-hosts threshold' and 'timeout period' customer configurable service parameters. The new service parameters are listed under platform:maintenance which display with the following command > system service-parameter-list mnfa_threshold: This new label and value is added to the puppet managed /etc/mtc.ini and represents the number of hosts that are required to fail heartbeat as a group; within the heartbeat failure window (heartbeat_failure_threshold) after which maintenance activates MNFA Mode. This update changes the default number of failing hosts from 3 to 2 while allowing a configurable range from 2 to 100. mnfa_timeout: This new label and value is added to the puppet managed /etc/mtc.ini. While MNFA mode is active, it will remain active until the number of failing hosts drops below the mnfa_threshold or this timer expires. The MNFA mode deactivates on the first occurrence of either case. Upon deactivation the remaining failed hosts are no longer treated as a failure group but instead are all Gracefully Recovered individually. A value of zero imposes no timeout making the deactivation criteria solely host based. This update changes the default 100 second timer to 0; no-timeout while permitting a valid time range from 100 to 86400 secs or 1 day. DocImpact Story: 2003576 Task: 24903 Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2018-08-31 10:43:25 -04:00
Bart Wensley	4d70f23c65	Initial changes to enable new upgrades Making initial changes to enable new upgrades. Most of the changes are related to removing older upgrade code that is no longer necessary (i.e. all the packstack to mattstack conversion code). Change-Id: I8fe4c8c0d3f12fd7b4fc45b226bf969ffda72dc7 Story: 2002886 Task: 22847 Signed-off-by: Jack Ding <jack.ding@windriver.com>	2018-07-06 09:10:22 -04:00
Dean Troyer	9b95aa0a35	StarlingX open source release updates Signed-off-by: Dean Troyer <dtroyer@gmail.com>	2018-05-31 07:35:52 -07:00

14 Commits