14 Commits

Author SHA1 Message Date
Scott Little 3077d0c656 Relocated some packages to repo 'stx-puppet'
List of relocated subdirectories:

puppet-manifests
puppet-modules-wrs/puppet-dcdbsync
puppet-modules-wrs/puppet-dcmanager
puppet-modules-wrs/puppet-dcorch
puppet-modules-wrs/puppet-fm
puppet-modules-wrs/puppet-mtce
puppet-modules-wrs/puppet-nfv
puppet-modules-wrs/puppet-patching
puppet-modules-wrs/puppet-smapi
puppet-modules-wrs/puppet-sshd
puppet-modules-wrs/puppet-sysinv

Story: 2006166
Task: 35687
Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3
Change-Id: Ibc468b9d97d6dbc7ac09652dcd979c0e68a85672
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: I00f54876e7872cf0d3e4f5e8f986cb7e3b23c86f
2019-09-05 16:18:03 -04:00
Tee Ngo 56275fb5b0 Ansible Bootstrap Deployment
This commit is the initial submission of the bootstrap playbook, which
enables the bootstrap of the initial controller. The playbook
defaults are meant for configuring the localhost in a vbox
development environment. A custom hosts file and user overrides
are required for configuring multiple hosts and lab-specific setups.
A secret file and SSH keys are required for the production test environment.

Tests performed:
 - installation
 - config_controller complete to ensure the current method of
   configuring the first controller is intact
 - localhost bootstrap with default hosts file
 - multiple remote hosts bootstrap with custom hosts file
 - reconfigurations with user overrides
 - stx-application applied in AIOSX and AIODX
 - Failure & skip play cases (invalid config inputs, incorrect load,
   connection failure, no changes replay, etc...)

TODO:
 - Support for standard & storage configurations
 - Docker proxy/custom registry related tests
 - Package bootstrap playbook in SDK
 - Config_controller cleanup

Change-Id: If553f1eeed32606bacc690ef277e60606e9d93ea
Story: 200476
Task: 29686
Task: 29687
Co-Authored-By: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
2019-04-11 08:40:34 -04:00
Eric MacDonald b00c4dd415 Remove Resource Monitor ; aka rmon, from the load
All rmon resource monitoring has been moved to collectd.

This update removes rmon from mtce and the load.

Story: 2002823
Task: 30045

Test Plan:
PASS: Build and install a standard system.
PASS: Inspect mtce rpm list
PASS: Inspect logs
PASS: Check pmon.d

Depends-On: https://review.openstack.org/#/c/643739
Change-Id: I7572a1d0a9cf746abfba3d67352534d96f60c5a7
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-03-19 13:27:23 -04:00
Al Bailey cbecbf7f0b Update manifests to remove unused openstack components
Cleanup unwanted openstack setup on bare metal.

Preparing the manifests to have the services removed from SM.

Bypass setting up openstack services on controller, worker and
storage.

Cleanup haproxy ports for services that will not be running
on bare metal.

Cleanup upgrade, remote logging, postgres, and anything else
related to openstack services that no longer run on bare
metal.

Remove all manifests and templates that are no longer being used.
Strip out any static hiera data that is no longer needed.

Story: 2004764
Task: 29850
Depends-On: Ice10fe6da6b34f1d9206f26e112eb555e2088932
Depends-On: I3c1cc8673be5cf6ab15f9158199bc24fccb44f17
Depends-On: Ie43cf11ebf1edcf3a8bb357205c4c59d2962b4fa
Change-Id: I2be8e9ab418835125ff433d06d2930df37534501
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-03-08 18:43:22 -06:00
Eric MacDonald 7dd943fe46 Fix mtce.pp to handle missing /etc/rmonfiles.d directory
https://review.openstack.org/#/c/628687/ stopped packaging the
query_ntp_servers.sh script. However, since no other files were
being packaged into that directory, the spec file chose
not to create an empty directory.

When config controller called the mtce.pp manifest to install
dynamic files into /etc/rmonfiles.d, it could not, so it failed.

This update adds a directory check block to the mtce.pp file
to create the directory if it is not present.
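
The directory check described above can be illustrated outside Puppet. The following is a minimal Python sketch of the same idea, not the mtce.pp change itself; the helper name `ensure_directory` and the `0o755` mode are assumptions made for illustration.

```python
import os
import tempfile


def ensure_directory(path, mode=0o755):
    """Create path if it is missing; return True when it was created.

    The default mode is an assumption for this sketch, not taken from mtce.pp.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path, mode=mode, exist_ok=True)
    return True


if __name__ == "__main__":
    # Use a throwaway root so the sketch never touches the real /etc.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "etc", "rmonfiles.d")
        print(ensure_directory(target))  # first call creates the directory
        print(ensure_directory(target))  # second call finds it present
```

Making the check idempotent means the caller can run it on every configuration pass, whether or not the package created the directory.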

Testing: Install AIO SX in SM1

Change-Id: Ib2dfadb261be6f9ebbaa7213eb6669b25158c779
Closes-Bug: 1811693
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-14 20:40:15 +00:00
Don Penney e6c0e0af8c Fix puppet-lint warnings and errors
This update addresses the following errors and warnings
from puppet-lint, with most corrections done automatically
using puppet-lint --fix:
- 2sp_soft_tabs
- arrow_alignment
- arrow_on_right_operand_line
- double_quoted_strings
- hard_tabs
- only_variable_string
- quoted_booleans
- star_comments
- trailing_whitespace
- variables_not_enclosed

Change-Id: I7a2b0109534dd4715d459635fa33b09e7fd0a6a6
Story: 2004515
Task: 28683
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-12-27 15:08:37 -06:00
Tao Liu 6256b0d106 Change compute node to worker node personality
This update replaces the compute personality & subfunction
with worker, and updates internal and customer visible
references.

In addition, the compute-huge package has been renamed to
worker-utils as it contains various scripts/services that are
used to affine running tasks or interface IRQs to specific CPUs.
The worker_reserved.conf is now installed to /etc/platform.

The cpu function 'VM' has also been renamed to 'Application'.

Tests Performed:
Non-containerized deployment
AIO-SX: Sanity and Nightly automated test suite
AIO-DX: Sanity and Nightly automated test suite
2+2 System: Sanity and Nightly automated test suite
2+2 System: Horizon Patch Orchestration
Kubernetes deployment:
AIO-SX: Create, delete, reboot and rebuild instances
2+2+2 System: worker nodes are unlocked and enabled with no alarms

Story: 2004022
Task: 27013

Change-Id: I0e0be6b3a6f25f7fb8edf64ea4326854513aa396
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-12-13 14:15:55 -05:00
Eric MacDonald 1813918cf4 Mtce: Change SM Port scope to handle AIO config.
The mtc.ini file is updated a second time in AIO config.
Because the SM ports were scoped to controllers only and had
no defaults, the SM port assignments were missing in
AIO configs.

This update defaults the SM port numbers and changes the scope
of the parameters so that they get set on all node types for
all system types.

Testing included provisioning an AIO system.

Change-Id: Ib53921c4b59a9e67ed136a03504bdf0775de6dff
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-11-05 20:44:03 -05:00
Kevin Smith 1e63b2e45a Generate openrc file in /etc/platform
Create the platform openrc file in /etc/platform, while
leaving existing /etc/nova/openrc file alone for now.
A new platform/client.pp file is created and most of the
contents of openstack/client.pp are moved there.
openstack/client.pp can be removed once kubernetes is the
default.

Change-Id: Ib6de59da6dfc9f34a24054405b6cda30d0b74ac1
Story: 2002876
Task: 27499
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
2018-10-17 13:11:56 -04:00
Eric MacDonald f5d212010b Mtce: Add two new port definitions to mtc.ini for SM communications
In support of the HA Improvements feature maintenance is required to,
upon request, send SM a summary of maintenance's heartbeat responsiveness
during the last 20 heartbeat periods.

This update adds the required port assignments to the mtc.ini file
in support of said communications.

With this update the mtc.ini file will be updated to contain the
following entries.

  ; Communication ports between SM and maintenance
  sm_server_port = 2124 ; port sm receives mtce commands from
  sm_client_port = 2224 ; port mtce receives sm commands from
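
As an illustration of how entries in this form can be consumed, here is a minimal Python sketch that parses them with the standard `configparser` module. The section name `agent` and the helper `read_sm_ports` are assumptions made for this example; the commit message does not say which section of mtc.ini holds the entries.

```python
import configparser

# Hypothetical section name; only the two port entries come from the commit.
SAMPLE_MTC_INI = """\
[agent]
; Communication ports between SM and maintenance
sm_server_port = 2124 ; port sm receives mtce commands from
sm_client_port = 2224 ; port mtce receives sm commands from
"""


def read_sm_ports(ini_text, section="agent"):
    """Return the two SM port numbers as integers."""
    # ';' must be enabled as an inline comment prefix, otherwise the
    # trailing comment text would be parsed as part of the value.
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(ini_text)
    return {
        "sm_server_port": parser.getint(section, "sm_server_port"),
        "sm_client_port": parser.getint(section, "sm_client_port"),
    }


if __name__ == "__main__":
    print(read_sm_ports(SAMPLE_MTC_INI))
```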

Change-Id: I05c022f7e4dcdeaea71bc0020641baa331daae57
Story: 2003576
Task: 26837
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-10-02 20:29:37 +00:00
Eric MacDonald 5f232f6486 Mtce: Make Heartbeat Failure Action Configurable
The current maintenance heartbeat failure action handling is to Fail
and Gracefully Recover the host. This means that maintenance will
ensure that a heartbeat failed host is rebooted/reset before it is
recovered but will avoid rebooting it a second time if its recovered
uptime indicates that it has already rebooted.

This update expands that single action handling behavior to support
three new actions. In doing so it adds a new configuration service
parameter called heartbeat_failure_action. The customer can configure
this new parameter with any one of the following 4 actions in order of
decreasing impact.

   fail - Host is failed and gracefully recovered.
        - Current Network specific alarms continue to be raised/cleared.
          Note: Prior to this update this was standard system behavior.
degrade - Host is only degraded while it is failing heartbeat.
        - Current Network specific alarms continue to be raised/cleared.
        - heartbeat degrade reason is cleared as are the alarms when
          heartbeat responses resume.
  alarm - The only indication of a heartbeat failure is by alarm.
        - Same set of alarms as in above action cases
        - Only in this case no degrade, no failure, no reboot/reset
   none - Heartbeat is disabled; no multicast heartbeat message is sent.
        - All existing heartbeat alarms are cleared.
        - The heartbeat soak as part of the enable sequence is bypassed.

The selected action is a system wide setting.
The selected setting also applies to Multi-Node Failure Avoidance.
The default action is the legacy action Fail.
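
The four actions can be summarized as a small lookup table. The following Python sketch is one way to model them under stated assumptions: the names `HeartbeatFailureAction`, `Effects`, `EFFECTS` and `parse_action` are hypothetical and do not come from the maintenance code, and the table records only what the descriptions above state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HeartbeatFailureAction(Enum):
    FAIL = "fail"
    DEGRADE = "degrade"
    ALARM = "alarm"
    NONE = "none"


@dataclass(frozen=True)
class Effects:
    heartbeat_enabled: bool     # is the heartbeat message still sent?
    alarms_raised: bool         # are heartbeat alarms raised and cleared?
    host_effect: Optional[str]  # state change applied to the host, if any


# Ordered by decreasing impact, as in the description.
EFFECTS = {
    HeartbeatFailureAction.FAIL: Effects(True, True, "failed"),
    HeartbeatFailureAction.DEGRADE: Effects(True, True, "degraded"),
    HeartbeatFailureAction.ALARM: Effects(True, True, None),
    HeartbeatFailureAction.NONE: Effects(False, False, None),
}


def parse_action(value=None):
    """Map a service parameter string to an action; default is the legacy 'fail'."""
    if value is None:
        return HeartbeatFailureAction.FAIL
    try:
        return HeartbeatFailureAction(value.strip().lower())
    except ValueError:
        valid = ", ".join(a.value for a in HeartbeatFailureAction)
        raise ValueError(f"invalid heartbeat_failure_action {value!r}; expected one of: {valid}")


if __name__ == "__main__":
    for action in HeartbeatFailureAction:
        print(action.value, EFFECTS[action])
```

Keeping the behavior in one table makes it easy to see that only `none` stops the heartbeat and only `fail` and `degrade` change host state.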

This update also:

 1. Removes redundant inservice failure alarm for MNFA case in support
    of degrade only action. Keeping it would make that alarm handling
    case unnecessarily complicated.
 2. Removes 'hbs calibration' code that is no longer used (cleanup).
 3. Cleans up a small amount of heartbeat logging.

Test Plan:
PASS:    fail: Verify MNFA and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery (from none)
PASS: degrade: Verify MNFA and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery (from alarm)
PASS:   alarm: Verify MNFA and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery (from degrade)
PASS:    none: Verify heartbeat disable, fail ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery (from fail)
PASS: Verify action change behavior from none to alarm with active MNFA
PASS: Verify action change behavior from alarm to degrade with active MNFA
PASS: Verify action change behavior from degrade to none with active MNFA
PASS: Verify action change behavior from none to fail with active MNFA
PASS: Verify action change behavior from fail to none with active MNFA
PASS: Verify action change behavior from degrade to fail then MNFA timeout
PASS: Verify all heartbeat action change customer logs
PASS: verify heartbeat stats clear over action change
PASS: Verify LO DOR (several large labs - compute and storage systems)
PASS: Verify recovery from failure of active controller
PASS: Verify 3 host failure behavior with MNFA threshold at 3 (action:fail)
PASS: Verify 2 host failure behavior with MNFA threshold at 3 (action:fail)

Change-Id: I198505fb7a923cc760b12082acff1e5bac929ef2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-09-10 08:55:16 -04:00
Eric MacDonald f19dd0498f Mtce: Make Multi-Node Failure Avoidance Configurable
The maintenance system implements a high availability (HA) feature
designed to detect the simultaneous heartbeat failure of a group
of hosts and avoid failing all those hosts until heartbeat resumes
or after a set period of time.

This feature is called Multi-Node Failure Avoidance, aka MNFA, and
currently has the hosts threshold set to 3 and timeout set to 100 secs.

This update implements enhancements to that existing feature by
making the 'number-of-hosts threshold' and 'timeout period'
customer configurable service parameters.

The new service parameters are listed under platform:maintenance and
are displayed with the following command:

> system service-parameter-list

mnfa_threshold: This new label and value is added to the puppet
managed /etc/mtc.ini and represents the number of hosts that
must fail heartbeat as a group, within the heartbeat
failure window (heartbeat_failure_threshold), before maintenance
activates MNFA Mode.

This update changes the default number of failing hosts from
3 to 2 while allowing a configurable range from 2 to 100.

mnfa_timeout: This new label and value is added to the puppet
managed /etc/mtc.ini. While MNFA mode is active, it will remain active
until the number of failing hosts drops below the mnfa_threshold or this
timer expires. The MNFA mode deactivates on the first occurrence of
either case. Upon deactivation the remaining failed hosts are no
longer treated as a failure group but instead are all Gracefully
Recovered individually. A value of zero imposes no timeout making the
deactivation criteria solely host based.

This update changes the default from a 100 second timer to 0 (no timeout)
while permitting a valid timeout range from 100 to 86400 secs (1 day).
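
The parameter ranges and the deactivation rule described above can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and treating the timer as expired when elapsed time is greater than or equal to the timeout is an assumption, since the message does not specify the boundary.

```python
MNFA_THRESHOLD_MIN = 2
MNFA_THRESHOLD_MAX = 100
MNFA_TIMEOUT_NONE = 0        # zero means no timeout
MNFA_TIMEOUT_MIN = 100       # seconds
MNFA_TIMEOUT_MAX = 86400     # seconds (1 day)


def validate_mnfa(threshold=2, timeout=MNFA_TIMEOUT_NONE):
    """Raise ValueError if either parameter is outside the described ranges."""
    if not MNFA_THRESHOLD_MIN <= threshold <= MNFA_THRESHOLD_MAX:
        raise ValueError(f"mnfa_threshold {threshold} not in 2..100")
    if timeout != MNFA_TIMEOUT_NONE and not (
        MNFA_TIMEOUT_MIN <= timeout <= MNFA_TIMEOUT_MAX
    ):
        raise ValueError(f"mnfa_timeout {timeout} must be 0 or in 100..86400")


def mnfa_should_deactivate(failing_hosts, elapsed_secs, threshold=2,
                           timeout=MNFA_TIMEOUT_NONE):
    """Return True when an active MNFA mode should end.

    MNFA ends on whichever comes first: the failing host count drops
    below the threshold, or the timer (if one is set) expires.
    """
    if failing_hosts < threshold:
        return True
    # Assumption: '>=' marks expiry; a timeout of 0 never expires.
    if timeout != MNFA_TIMEOUT_NONE and elapsed_secs >= timeout:
        return True
    return False


if __name__ == "__main__":
    validate_mnfa()  # the new defaults (2 hosts, no timeout) are valid
    print(mnfa_should_deactivate(failing_hosts=3, elapsed_secs=500))
    print(mnfa_should_deactivate(failing_hosts=1, elapsed_secs=500))
```

With the default timeout of 0, deactivation depends only on the host count, which matches the "solely host based" criteria in the description.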

DocImpact
Story: 2003576
Task: 24903

Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-08-31 10:43:25 -04:00
Bart Wensley 4d70f23c65 Initial changes to enable new upgrades
Making initial changes to enable new upgrades. Most
of the changes are related to removing older upgrade code that
is no longer necessary (i.e. all the packstack to mattstack
conversion code).

Change-Id: I8fe4c8c0d3f12fd7b4fc45b226bf969ffda72dc7
Story: 2002886
Task: 22847
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-07-06 09:10:22 -04:00
Dean Troyer 9b95aa0a35 StarlingX open source release updates
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-05-31 07:35:52 -07:00