Commit Graph

57 Commits

Author SHA1 Message Date
Scott Little 3077d0c656 Relocated some packages to repo 'stx-puppet'
List of relocated subdirectories:

puppet-manifests
puppet-modules-wrs/puppet-dcdbsync
puppet-modules-wrs/puppet-dcmanager
puppet-modules-wrs/puppet-dcorch
puppet-modules-wrs/puppet-fm
puppet-modules-wrs/puppet-mtce
puppet-modules-wrs/puppet-nfv
puppet-modules-wrs/puppet-patching
puppet-modules-wrs/puppet-smapi
puppet-modules-wrs/puppet-sshd
puppet-modules-wrs/puppet-sysinv

Story: 2006166
Task: 35687
Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3
Change-Id: Ibc468b9d97d6dbc7ac09652dcd979c0e68a85672
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: I00f54876e7872cf0d3e4f5e8f986cb7e3b23c86f
2019-09-05 16:18:03 -04:00
Robert Church 38abbef079 Rebase Armada to latest master
Rebasing Armada to use the latest docker image tag
8a1638098f88d92bf799ef4934abe569789b885e-ubuntu_bionic.

Change-Id: Ic48a2e053d0de7dacfd6a07d817947e11dc8d596
Story: 2006347
Task: 36105
Signed-off-by: Robert Church <robert.church@windriver.com>
2019-08-15 16:54:51 -04:00
Zuul 2c04825fc9 Merge "ANSIBLE Bootstrap changes for System Controller" 2019-07-11 17:29:48 +00:00
Tao Liu bbac058ade ANSIBLE Bootstrap changes for System Controller
This update contains the following misc. config changes to
support ansible bootstrap for system controller.

Creates deps model for dcmanager and dcorch puppet modules.
Creates a system controller postgres runtime manifest, which is
applied when the initial controller host is created or replayed after
the distributed cloud role has been changed.
The patch_vault file system is created when the first controller is
unlocked.
Also allows the dc role to be modified during bootstrap.

Change-Id: Id7b416274b2a854c469bfdca7448bf1ddea639d7
Story: 2004766
Task: 35650
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2019-07-11 12:08:06 -04:00
Zuul d86899bd7a Merge "Change the default value of sysinv-api bind host" 2019-07-09 12:44:58 +00:00
Yi Wang 89728b15ec Change the default value of sysinv-api bind host
The default sysinv-api bind host value was changed from
"0.0.0.0" to "::" to support both IPV4 and IPV6.

Change-Id: I072cda6df02f49a94d94871a9f19800f106f49dc
Closes-Bug: 1833459
Signed-off-by: Yi Wang <yi.c.wang@intel.com>
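
The bind host change from "0.0.0.0" to "::" described in this commit can be illustrated with a minimal sketch that uses only the Python standard library. The helper names `bind_family` and `open_listener` are hypothetical and are not taken from sysinv. Whether a socket bound to "::" also accepts IPv4 clients depends on the platform's dual-stack support, which the sketch checks before enabling it.

```python
import ipaddress
import socket


def bind_family(host: str) -> socket.AddressFamily:
    """Return the socket family implied by a literal bind address."""
    version = ipaddress.ip_address(host).version
    return socket.AF_INET6 if version == 6 else socket.AF_INET


def open_listener(host: str, port: int) -> socket.socket:
    """Open a listening TCP socket on host:port.

    For the IPv6 wildcard "::", dual-stack mode is requested when the
    platform supports it, so IPv4 clients can connect as well.
    """
    family = bind_family(host)
    dualstack = (
        family == socket.AF_INET6
        and host == "::"
        and socket.has_dualstack_ipv6()
    )
    return socket.create_server(
        (host, port), family=family, dualstack_ipv6=dualstack
    )
```

With "0.0.0.0" the listener is IPv4 only; with "::" it is an IPv6 listener that may also serve IPv4 clients on dual-stack hosts.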
2019-06-28 15:03:18 +08:00
Al Bailey 609d84d846 Remove magnum from baremetal.
Magnum is no longer packaged on bare metal.

The sysinv and upgrades code related to magnum has been removed.

The helm configuration for magnum remains, although it is not currently
supported in containers either. The magnum-ui is not installed in
platform or containerized horizon so the code to enable it is removed.

Some upgrade code remains because that utility is in the process
of being rewritten.

Story: 2004764
Task: 34333
Change-Id: I56873b4e04aac2e7d0cd57909beea00ecc2c1b9a
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-06-27 11:57:09 -05:00
Jerry Sun 4809c9f489 Upversion armada image
Upversion armada image from existing
af8a9ffd0873c2fbc915794e235dbd357f2adab1
to
dd2e56c473549fd16f94212b553ed58c48d1f51b-ubuntu_bionic

The specific image was chosen because it contained upstream
armada commit df68a90e057c2e1e3427d6b8497b437c8a4c3b7e, which
is a fix for keystone kubernetes auth. The ubuntu bionic image
was chosen because the old image was an ubuntu bionic based image.

Testing done by applying stx-openstack on standard, simplex,
and duplex systems.

Story: 2005860
Task: 33693

Change-Id: Ifd8a66d46e2dfd47ca7c5ab9807076ef43e67027
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2019-06-21 09:47:40 -04:00
Al Bailey 99077bad0e Cleanup ceilometer from bare metal code
Ceilometer is being set up through helm charts in containers
so the references to ceilometer in bare metal can be cleaned up.

 - Removing the sysinv puppet code for ceilometer
 - Removing the bare metal ceilometer pipeline upgrade script
 - Cleaning up unused variables from templates

Story: 2004764
Task: 33690
Change-Id: I2efe7aed7a4570121c1376c132e157c6f47e9f29
2019-06-13 10:29:18 -05:00
Al Bailey 6c3afad3e7 Remove references to pacemaker from sysinv
Sysinv use of pacemaker was replaced by SM a long time ago
and the code that referenced it is being removed.

Change-Id: Ic2a55698f64757bffeb9b53f4a105ea6ccb3dd2f
Story: 2004764
Task: 30665
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-06-06 11:56:45 -05:00
Sun Austin 384854568c Disable raise/get/clear NFV alarm to container fm-rest-api
Add puppet parameters and write them to the nfv config file to
disable raising/getting/clearing NFV alarms to the container fm-rest-api service

Story: 2004008
Task: 33573
Depends-On: https://review.opendev.org/#/c/658972/
Change-Id: I3ab37fe476ad083b5c8acca2684973eec30b8005
Signed-off-by: Sun Austin <austin.sun@intel.com>
2019-06-05 09:14:58 +08:00
SidneyAn 4f406285e4 update nfv-vim puppet runtime manifests and config files
nfvi will raise openstack alarms/logs to the fm in pods
when it is available. The following configuration changes are
required:
1. Add the "fault_mgmt_plugin_disabled" parameter to the vim config
file. Set it to "True" when the openstack application is not
implemented, and to "False" when it is.

2. Add the "fault_mgmt_endpoint_disabled" parameter to the alarm
and event_log config files. The rules are the same as for
"fault_mgmt_plugin_disabled".

3. Add "openstack" and "fm" info to the alarms and event_logs
config files.

Story: 2004008
Task: 30954
Depends-On: https://review.opendev.org/#/c/661548/
Change-Id: Iee2a4515336f4ce9b6373d56d4f7a5779664233d
Signed-off-by: SidneyAn <ran1.an@intel.com>
2019-06-04 09:02:50 +08:00
Zuul 87339ef708 Merge "Keystone DB sync - add service puppet module" 2019-05-07 20:25:11 +00:00
Zuul 2e966860bc Merge "Provide env settings to allow zuul and developers to both run tox" 2019-05-02 15:52:49 +00:00
Andy Ning aa61bc58ea Keystone DB sync - add service puppet module
This update adds the puppet package for keystone DB synchronization
service. This puppet package will be used by controller puppet manifest
to deploy and configure the synchronization service.

Story: 2002842
Task: 22787

Signed-off-by: Andy Ning <andy.ning@windriver.com>
(cherry picked from commit 51b20e03ea)

Conflicts:
	centos_pkg_dirs

Depends-On: https://review.opendev.org/#/c/655727
Change-Id: I7059800daa053eaf975ad7f02200247d77653926
2019-04-30 14:20:37 -04:00
Al Bailey 65eaf645f4 Provide env settings to allow zuul and developers to both run tox
Zuul checks out the dependent projects by their repo names.
Repo checks out the project directory structure based on the
labels in the manifest.

Currently these directories have different names and so tox
passes when run by zuul, but fails when run in a developer env.

This submission uses an env variable: "STX_PREFIX" to make
both envs able to run tox.
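
One way to picture the idea is the sketch below, which builds the checkout path of a sibling project from an optional prefix read from the environment. Only the variable name STX_PREFIX comes from this commit message; the function name, the directory layout, and the example values are assumptions made for illustration and do not reproduce the actual tox.ini.

```python
import os


def dependency_path(base_dir: str, project: str) -> str:
    """Build the checkout path of a sibling project.

    STX_PREFIX is read from the environment; when it is unset, no
    prefix is applied. The layout is hypothetical.
    """
    prefix = os.environ.get("STX_PREFIX", "")
    return os.path.join(base_dir, prefix + project)
```

The same lookup then resolves to different directory names depending on whether the environment sets the prefix.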

Story: 2004515
Task: 30664
Change-Id: I06cefab7422f53ccc0b8af30ca06945311cec70e
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-04-30 09:18:46 -05:00
Al Bailey 0704edb6cc Remove AODH and Gnocchi service parameters
Removes the aodh service parameter alarm_history_time_to_live
Removes references to aodh and gnocchi from puppet and upgrade code.
Removes old gnocchi references from remote logging.

Story: 2004764
Task: 30537
Change-Id: I3a03dd4a2afd47f1cc3f677f02d348eabf11a653
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-04-30 08:00:19 -05:00
Tyler Smith 43381a8748 Renaming deprecated options and updating spec requirements
- Renaming idle_timeout to connection_recycle_time since it was
  deprecated in Stein
- Explicitly including required packages in the sysinv spec file to fix
  DC
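
As a small illustration of the option rename, the sketch below rewrites a dictionary of settings so that a deprecated key is replaced by its successor. Only the idle_timeout to connection_recycle_time rename comes from this commit; the helper name and the rule that an explicitly set new key wins are assumptions.

```python
# Deprecated option names mapped to their replacements.
# Only the idle_timeout rename is taken from the commit message.
RENAMED_OPTIONS = {"idle_timeout": "connection_recycle_time"}


def rename_deprecated(options: dict) -> dict:
    """Return a copy of options with deprecated keys renamed.

    If both the old and the new key are present, the value of the
    new key is kept and the deprecated entry is dropped.
    """
    result = {}
    for key, value in options.items():
        new_key = RENAMED_OPTIONS.get(key, key)
        if new_key != key and new_key in options:
            continue  # new name already set explicitly; drop the old one
        result[new_key] = value
    return result
```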

Change-Id: Ief055d26f3a1eb43b8cf144952a49e7e0f3ff939
Story: 2004765
Task: 28883
Depends-On: https://review.openstack.org/#/c/653086
Signed-off-by: Tyler Smith <tyler.smith@windriver.com>
2019-04-16 20:21:36 +00:00
Al Bailey b899cf351e Upversion Armada SHA to be a newer image
Using SHA: af8a9ffd0873c2fbc915794e235dbd357f2adab1
which was built and tagged on April 9, 2019.

The previous Armada SHA was from Sept 2018.

The manifest.xml is updated to not generate armada warnings
for libvirt, openvswitch, nova and neutron.
The warning was:
  "label_selector" not specified,
  waiting with no labels may cause unintended consequences.

Story: 2005198
Task: 30436
Change-Id: I97b633d9e6e1e4574e25dc8b69500faae4b4a809
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-04-11 15:13:41 -05:00
Erich Cordoba 05a26e9061 Add notices on Intel authored files.
Story: 2005265
Task: 30083

Change-Id: Ibcae6539747beb9d641e7d5eef4c4ff7574a8b13
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-03-20 10:03:44 -06:00
Al Bailey 37b041a04c Remove unused puppet modules
* Remove the nova api proxy puppet module.
* Remove openstack::swift puppet manifest.
* Refactor openstack::nova::storage as platform::worker::storage.
  This requires the nova puppet code in sysinv to write to a
  different hiera target, and creation of /var/lib/nova.
* Remove puppet modules from spec file for modules that are no
  longer being used.

Story: 2004764
Task: 29840
Change-Id: Ifa0171b06e23fd77d373983d644df3f56ae4e2de
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-03-20 08:03:07 -05:00
Tao Liu 2c3e5963f3 Enable Distributed Cloud configuration
The following changes are required to enable system
controller and sub cloud configuration in a distributed
cloud environment:

* Remove references to os-keystone-region-name as the
openstack patches that support it, have been removed.
* Change the iptables rule for the NAT entry, to only
apply, if the selected outgoing interface is the
OAM interface.
* Configure keystone endpoints, before configuring
openrc on subclouds
* Remove all openstack services, and users from the region
config and update the tox
* Disable nova, cinder and neutron api proxy

Only tested distributed cloud configuration as multi-region
configuration is not supported in the current release.

Story: 2004766
Task: 30017

Change-Id: I5c43e2112f34225aa9e23ff777c5333ae77efcdc
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2019-03-14 17:48:44 -04:00
Mingyuan Qi 611a68a96a Allow user specified registries for config_controller
Currently docker images are pulled from public registries during
config_controller. For some users, the connection to the public
docker registry may be so slow that installing the containerized
services images times out, or the system simply does not have
access to the public internet.

This change allows users to specify alternative public/private
registries to replace k8s.gcr.io, gcr.io, quay.io and docker.io.
An insecure registry is supported if all default registries are
replaced by one unified registry. This lowers the complexity for
those who build their own registry without internet access.

Docker doesn't support an ipv6 address as a registry name; a
hostname or domain name in an ipv6 network is allowed instead.

Test:
AIO-SX/AIO-DX/Standard(2+2):
  Alternative public registry (ipv4/domain) with proxy
    - config_controller pass
  Private registry (ipv4/ipv6/domain) without internet
    - config_controller pass
  Default registry with/without proxy
    - config_controller pass

Story: 2004711
Task: 28742

Change-Id: I4fee3f4e0637863b9b5ef4ef556082ac75f62a1d
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
2019-02-23 10:10:07 +08:00
Erich Cordoba ea0b33e950 Standardize makefiles for puppet-modules-wrs
The puppet-modules-wrs is formed by several subcomponents, in all
of them the same changes were applied:

  - Create a makefile with an install target.
  - Remove license file from build_srpm.data as it is not needed.
  - Update target in specfile
  - Change autosetup to setup in specfile; this was a bug in the spec
    files.

Testing:
  - Verification on correct install paths.
  - config_controller complete on simplex configuration.

Change-Id: I1512eb0c3034ffa2d57d098dab9800bdaba5b48d
Story: 2004043
Task: 27552
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-02-11 13:30:42 -06:00
Alex Kozyrev f44717154a Add Barbican bootstrap and runtime manifests
Barbican service is needed during bootstrap phase for StarlingX.
Implement bootstrap and runtime manifests to achieve that.

Change-Id: I6c22ebddacf8aec3a731f7f6d7a762f79f511c78
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-01-11 13:33:00 -05:00
Don Penney 6fee40bd23 Update puppet module tox.ini files for puppet-lint
When running "tox -e puppetlint" manually, the tox.ini will
install puppet-lint via gem, but does not automatically install
the json module upon which puppet-lint depends.

This commit adds json to the gem install command.

Change-Id: Ib8b6133395bf76748a8bcac0cb7bd718a89d6d5a
Story: 2004515
Task: 28704
Signed-off-by: Don Penney <don.penney@windriver.com>
2019-01-02 15:27:09 -05:00
Don Penney 9a3264acaa Fix additional puppet-lint warnings and errors
This update addresses the following errors and warnings
from puppet-lint:
- 140chars
- case_without_default
- ensure_first_param
- inherits_across_namespaces
- parameter_order
- single_quote_string_with_variables
- variable_is_lowercase
- variable_scope

In the case of variable_is_lowercase, the compute.pp manifest
has variables with sizes like 2M in the name. These have been
left as-is, with lint:ignore comments for the check, due to
the semantics of the name.

For the 140chars check, certain long lines have been left as-is,
with lint:ignore comments, due to long commands being executed.
These can be revisited in a future update to try to break up
the lines and remove the lint:ignore directives.

Change-Id: I37809bacb43818e0956f9f434c30c48e05017325
Story: 2004515
Task: 28685
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-12-27 16:23:13 -06:00
Don Penney e6c0e0af8c Fix puppet-lint warnings and errors
This update addresses the following errors and warnings
from puppet-lint, with most corrections done automatically
using puppet-lint --fix:
- 2sp_soft_tabs
- arrow_alignment
- arrow_on_right_operand_line
- double_quoted_strings
- hard_tabs
- only_variable_string
- quoted_booleans
- star_comments
- trailing_whitespace
- variables_not_enclosed

Change-Id: I7a2b0109534dd4715d459635fa33b09e7fd0a6a6
Story: 2004515
Task: 28683
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-12-27 15:08:37 -06:00
Don Penney a91160daa2 Add puppet-lint support
This update adds the tox and zuul configuration to
run puppet-lint against the puppet manifests. The
initial update ignores all existing errors, which
will be cleaned up later.

Change-Id: I293abc2eac6bc6216cbbf6d939c1ba3474fb9384
Story: 2004515
Task: 28665
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-12-24 13:50:20 -06:00
Tao Liu 6256b0d106 Change compute node to worker node personality
This update replaces the compute personality & subfunction
with worker, and updates internal and customer visible
references.

In addition, the compute-huge package has been renamed to
worker-utils as it contains various scripts/services that are
used to affine running tasks or interface IRQs to specific CPUs.
The worker_reserved.conf is now installed to /etc/platform.

The cpu function 'VM' has also been renamed to 'Application'.

Tests Performed:
Non-containerized deployment
AIO-SX: Sanity and Nightly automated test suite
AIO-DX: Sanity and Nightly automated test suite
2+2 System: Sanity and Nightly automated test suite
2+2 System: Horizon Patch Orchestration
Kubernetes deployment:
AIO-SX: Create, delete, reboot and rebuild instances
2+2+2 System: worker nodes are unlocked-enabled and have no alarms

Story: 2004022
Task: 27013

Change-Id: I0e0be6b3a6f25f7fb8edf64ea4326854513aa396
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-12-13 14:15:55 -05:00
Bart Wensley 4a43480f6b Configure VIM to use pod based OpenStack services
When kubernetes is configured and the OpenStack application has
been installed, the VIM will be configured to access the OpenStack
services running in pods (keystone, nova, rabbitmq, etc...).

In order to support this, some extensions were done to the sysinv
helm code to allow parts of the OpenStack application
configuration to be retrieved (e.g. endpoint info). Changes
were also required to dnsmasq configuration to get resolution of
pod based names (e.g. keystone.openstack.svc.cluster.local)
working properly.

This commit is just the first step and has limitations. There is
no trigger to reconfigure the VIM after the OpenStack application
has been installed - a controller lock/unlock is required.

Story: 2003910
Task: 27852

Change-Id: I1c6dcdecd1365104457009196bbcf06b19c95489
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
2018-11-15 14:39:39 -06:00
Zuul e237e94b80 Merge "Fix word and statement errors in comments" 2018-11-14 14:55:37 +00:00
zhangkunpeng 48699351c8 Fix word and statement errors in comments
fix some typos in comments, such as the duplicated word 'the the'.

Change-Id: I28ffde825fd95186bc3a0bd077dea7c20287fc1f
Story: 2004164
Task: 27641
Signed-off-by: zhangkunpeng <zhang.kunpeng@99cloud.net>
2018-11-14 10:04:51 +08:00
Tee Ngo d8d8851fa2 Armada-Sysinv integration
Initial implementation of Armada integration with sysinv which
entails:

- Basic application upload via system application-upload command
- Application install via system application-apply command
- Application remove via system application-remove command
- Application delete via system application-delete command
- Application list and detail viewing via system
  application-list and application-show commands.

This implementation does not cover the following functionalities
that are either still under discussion or in planning:
a) support for remote CLI where application tarball resides in
   the client machine
b) support for air-gapped scenario/embedded private images
c) support for custom apps' user overrides

Tests conducted:
- config controller
- tox
- functional tests (both Openstack and simple test app):
    - upload
    - apply
    - remove
    - delete
    - show
    - list
    - release group upgrade with user overrides
- failure tests:
    - no tar file supplied
    - corrupted tar file
    - app already exists/does not exist
    - upload failure (missing manifest, multi manifests,
      no image tags, checksum test failure, etc...)
    - apply failure (nodes are not labeled, image download
      failure, etc...)
    - operation not permitted

Change-Id: Iec27f356bd0047b2c7ef860ab3a2528f5a371868
Story: 2003908
Task: 26792
Signed-off-by: Tee Ngo <Tee.Ngo@windriver.com>
2018-11-07 07:52:35 -05:00
Tao Liu 485445def0 Fernet key synchronization
This update contains the following changes for Distributed
Cloud Fernet Key Synching & Management:

1. Disable key rotation cron job for distributed cloud
2. Add a fernet key repo config option in puppet sysinv
3. Add fernet repo sysinv APIs for create/update/retrieve keys
4. Add a fernet operator to create/update/retrieve the keys

Story: 2002842
Task: 22786

Change-Id: Ia14caeef067fa481e3a4159c1658289250632779
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-10-26 14:56:42 -05:00
Lachlan Plant 99323d74a9 Add logging configuration to nova-api-proxy
Puppet manifests now include logging data to push to
the conf file. This is needed for a subsequent code
change to change the logging backend to oslo_log

Change-Id: I303e199fd3c984af20564c43bdb98c460cbed0f1
Story: 2004007
Task: 27608
Signed-off-by: Lachlan Plant <lachlan.plant@windriver.com>
2018-10-22 12:50:44 -05:00
Kevin Smith 3a91cbae4d Containerization, support 2 keystones in sysinv
Support bare metal and pod based keystone in sysinv.  The existing
keystone_authtoken section of sysinv.conf remains and is used for
platform service authentication, while openstack service authentication
parameters are moved to a new openstack_keystone_authtoken section.
Admin credentials are used in the new openstack_keystone_authtoken
section and the region name parameters are also moved to this new
section.
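
The split into two sections can be pictured with the sketch below, which reads an INI-style sample and picks the section for a given target. The two section names come from this commit message; the option names, URLs, user names, and the helper `auth_settings` are hypothetical placeholders rather than the real contents of sysinv.conf.

```python
import configparser

# Sample content; every option and value here is a placeholder.
SAMPLE = """
[keystone_authtoken]
auth_url = http://platform.example:5000
username = sysinv

[openstack_keystone_authtoken]
auth_url = http://openstack.example:5000
username = admin
region_name = RegionOne
"""

# Which section serves which kind of authentication.
SECTION_FOR = {
    "platform": "keystone_authtoken",
    "openstack": "openstack_keystone_authtoken",
}


def auth_settings(conf: configparser.ConfigParser, target: str) -> dict:
    """Return the auth options for 'platform' or 'openstack' services."""
    return dict(conf[SECTION_FOR[target]])


conf = configparser.ConfigParser()
conf.read_string(SAMPLE)
```

Calling `auth_settings(conf, "openstack")` then yields the options of the new section, while `"platform"` continues to use the existing one.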

Change-Id: I7a53dd5a2dc52213e0f1e0cc748649a33f0f9f40
Story: 2002876
Task: 26926
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
2018-10-11 14:26:48 -04:00
Zuul 2575c43911 Merge "Add configuration for containerized keystone to VIM" 2018-10-03 16:49:55 +00:00
Bart Wensley e3c1fbed88 Add configuration for containerized keystone to VIM
Adding configuration to the VIM for containerized keystone. The
VIM will now support two keystone instances:
- platform: bare metal keystone used to authenticate with
  platform services (e.g. sysinv, patching)
- openstack: containerized keystone used to authenticate with
  openstack services (e.g. nova, neutron, cinder)

For now, the same configuration will be used for both, as we
still only deploy with the baremetal keystone.

Story: 2002876
Task: 26872

Change-Id: If4bd46a4c14cc65978774001cb2887e5d3e3607b
2018-10-03 06:55:58 -05:00
Eric MacDonald f5d212010b Mtce: Add two new port definitions to mtc.ini for SM communications
In support of the HA Improvements feature, maintenance is required to
send SM, upon request, a summary of maintenance's heartbeat responsiveness
during the last 20 heartbeat periods.

This update adds the required port assignments to the mtc.ini file
in support of said communications.

With this update the mtc.ini file will be updated to contain the
following entries.

  ; Communication ports between SM and maintenance
  sm_server_port = 2124 ; port sm receives mtce commands from
  sm_client_port = 2224 ; port mtce receives sm commands from
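
A minimal sketch of reading these two entries with Python's standard configparser follows. The `[agent]` section header and the helper name `read_sm_ports` are assumptions, since the commit message does not say which section of mtc.ini holds the entries. The `;` inline comments are stripped by passing `inline_comment_prefixes`.

```python
import configparser

# The [agent] section name is a placeholder, not taken from the commit.
SAMPLE = """\
[agent]
; Communication ports between SM and maintenance
sm_server_port = 2124 ; port sm receives mtce commands from
sm_client_port = 2224 ; port mtce receives sm commands from
"""


def read_sm_ports(text: str, section: str = "agent") -> tuple:
    """Return (sm_server_port, sm_client_port) parsed from INI text."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    return (
        parser.getint(section, "sm_server_port"),
        parser.getint(section, "sm_client_port"),
    )
```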

Change-Id: I05c022f7e4dcdeaea71bc0020641baa331daae57
Story: 2003576
Task: 26837
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-10-02 20:29:37 +00:00
Zuul 38027c134f Merge "LLDP OVS enablement: puppet configuration" 2018-09-27 00:56:51 +00:00
Steven Webster da1110a3d8 LLDP OVS enablement: puppet configuration
This commit introduces puppet configuration enabling LLDP to operate over
OVS.  Specifically, separate ports flows are configured to handle LLDP
traffic.

In addition, we restrict the lldpd daemon from
operating over bridge, tap, and ovs-netdev devices.

Story: 2002946
Task: 22940

Change-Id: Ibadc9c082425412b5b68b02a55e8c02692de0e17
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2018-09-26 11:11:42 -04:00
Kevin Smith 987e372465 Disable VIM plugins for Kubernetes deployment
Do not load vim plugins and disable vim audits instead of
just disabling the endpoints as was previously done in
Change 599741.  Leave setting of (new) Nova and (pre-existing)
Neutron endpoint disabled flags for infrastructure host services usage.

Story: 2002876
Task: 26573

Change-Id: Id3af829562e5765b99dbab23d913d65a4e6ec4a7
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
2018-09-24 09:29:35 -04:00
Tao Liu a8acc56242 Sysinv healthy query API request failed
The health query API request triggers sysinv to query the alarm list.
The alarm query is attempted via a sysinv database API which is no
longer supported. This causes the REST API request to fail.

This update contains the following changes to address the issue:
1. Add FM catalog info to sysinv puppet class and manifest
2. Add service catalog to the user request context
3. Add a FM client interface to communicate with FM API
4. Update the health query to retrieve the alarm list via FM client

Closes-Bug: #1789983

Change-Id: I31b256f6de22fe70cba59b08bf927c8b0ac119ee
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-09-13 13:36:15 -04:00
Zuul 1b95285b16 Merge "Mtce: Make Heartbeat Failure Action Configurable" 2018-09-11 13:41:12 +00:00
Eric MacDonald 5f232f6486 Mtce: Make Heartbeat Failure Action Configurable
The current maintenance heartbeat failure action handling is to Fail
and Gracefully Recover the host. This means that maintenance will
ensure that a heartbeat failed host is rebooted/reset before it is
recovered but will avoid rebooting it a second time if its recovered
uptime indicates that it has already rebooted.

This update expands that single action handling behavior to support
three new actions. In doing so it adds a new configuration service
parameter called heartbeat_failure_action. The customer can configure
this new parameter with any one of the following 4 actions in order of
decreasing impact.

   fail - Host is failed and gracefully recovered.
        - Current Network specific alarms continue to be raised/cleared.
          Note: Prior to this update this was standard system behavior.
degrade - Host is only degraded while it is failing heartbeat.
        - Current Network specific alarms continue to be raised/cleared.
        - heartbeat degrade reason is cleared as are the alarms when
          heartbeat responses resume.
  alarm - The only indication of a heartbeat failure is by alarm.
        - Same set of alarms as in above action cases
        - Only in this case no degrade, no failure, no reboot/reset
   none - Heartbeat is disabled; no multicast heartbeat message is sent.
        - All existing heartbeat alarms are cleared.
        - The heartbeat soak as part of the enable sequence is bypassed.

The selected action is a system wide setting.
The selected setting also applies to Multi-Node Failure Avoidance.
The default action is the legacy action Fail.
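
The four actions and the fallback to the legacy default can be summarized in a short sketch. The class, field, and function names are hypothetical and are not taken from the maintenance code; the table only restates what the list above says about each action.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HeartbeatFailureAction(Enum):
    """Values of heartbeat_failure_action, highest impact first."""
    FAIL = "fail"
    DEGRADE = "degrade"
    ALARM = "alarm"
    NONE = "none"


@dataclass(frozen=True)
class ActionEffect:
    heartbeat_enabled: bool  # are heartbeat messages sent at all?
    alarms_raised: bool      # are heartbeat alarms raised and cleared?
    host_impact: str         # what happens to the host itself


EFFECTS = {
    HeartbeatFailureAction.FAIL: ActionEffect(
        True, True, "failed and gracefully recovered"),
    HeartbeatFailureAction.DEGRADE: ActionEffect(
        True, True, "degraded while failing heartbeat"),
    HeartbeatFailureAction.ALARM: ActionEffect(True, True, "none"),
    HeartbeatFailureAction.NONE: ActionEffect(False, False, "none"),
}


def parse_action(value: Optional[str]) -> HeartbeatFailureAction:
    """Map a configured string to an action; unset falls back to 'fail'."""
    if value is None:
        return HeartbeatFailureAction.FAIL
    # Raises ValueError for anything other than the four known actions.
    return HeartbeatFailureAction(value.strip().lower())
```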

This update also

 1. Removes redundant inservice failure alarm for MNFA case in support
    of degrade only action. Keeping it would make that alarm handling
    case unnecessarily complicated.
 2. Removes 'hbs calibration' code that is no longer used (cleanup).
 3. Performs a small amount of heartbeat logging cleanup.

Test Plan:
PASS:    fail: Verify MNFA and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery (from none)
PASS: degrade: Verify MNFA and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery (from alarm)
PASS:   alarm: Verify MNFA and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery (from degrade)
PASS:    none: Verify heartbeat disable, fail ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery (from fail)
PASS: Verify action change behavior from none to alarm with active MNFA
PASS: Verify action change behavior from alarm to degrade with active MNFA
PASS: Verify action change behavior from degrade to none with active MNFA
PASS: Verify action change behavior from none to fail with active MNFA
PASS: Verify action change behavior from fail to none with active MNFA
PASS: Verify action change behavior from degrade to fail then MNFA timeout
PASS: Verify all heartbeat action change customer logs
PASS: Verify heartbeat stats clear over action change
PASS: Verify LO DOR (several large labs - compute and storage systems)
PASS: Verify recovery from failure of active controller
PASS: Verify 3 host failure behavior with MNFA threshold at 3 (action:fail)
PASS: Verify 2 host failure behavior with MNFA threshold at 3 (action:fail)

Change-Id: I198505fb7a923cc760b12082acff1e5bac929ef2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-09-10 08:55:16 -04:00
Kevin Smith e8f97939b0 Disable VIM monitoring of Openstack services for Kubernetes deployment
Story: 2002843
Task: 22790

Change-Id: I0a2b1c2a3799ab6e5c4a5a78cf60895e22469782
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
2018-09-04 14:27:52 -04:00
Eric MacDonald f19dd0498f Mtce: Make Multi-Node Failure Avoidance Configurable
The maintenance system implements a high availability (HA) feature
designed to detect the simultaneous heartbeat failure of a group
of hosts and avoid failing all those hosts until heartbeat resumes
or after a set period of time.

This feature is called Multi-Node Failure Avoidance, aka MNFA, and
currently has the hosts threshold set to 3 and timeout set to 100 secs.

This update implements enhancements to that existing feature by
making the 'number-of-hosts threshold' and 'timeout period'
customer configurable service parameters.

The new service parameters are listed under platform:maintenance and
can be displayed with the following command:

> system service-parameter-list

mnfa_threshold: This new label and value is added to the puppet
managed /etc/mtc.ini and represents the number of hosts that are
required to fail heartbeat as a group; within the heartbeat
failure window (heartbeat_failure_threshold) after which maintenance
activates MNFA Mode.

This update changes the default number of failing hosts from
3 to 2 while allowing a configurable range from 2 to 100.

mnfa_timeout: This new label and value is added to the puppet
managed /etc/mtc.ini. While MNFA mode is active, it will remain active
until the number of failing hosts drops below the mnfa_threshold or this
timer expires. The MNFA mode deactivates on the first occurrence of
either case. Upon deactivation the remaining failed hosts are no
longer treated as a failure group but instead are all Gracefully
Recovered individually. A value of zero imposes no timeout making the
deactivation criteria solely host based.

This update changes the default 100 second timer to 0 (no timeout)
while permitting a valid time range from 100 to 86400 secs, or 1 day.
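
The ranges and the deactivation rule described above can be sketched as follows. The constant and function names are hypothetical and are not taken from the maintenance code. The sketch assumes that a timeout of 0 is accepted in addition to the 100 to 86400 range, since 0 is the stated default.

```python
MNFA_THRESHOLD_RANGE = (2, 100)    # number of hosts
MNFA_TIMEOUT_RANGE = (100, 86400)  # seconds; 0 means no timeout


def valid_mnfa_threshold(value: int) -> bool:
    """True if value is within the configurable host threshold range."""
    low, high = MNFA_THRESHOLD_RANGE
    return low <= value <= high


def valid_mnfa_timeout(value: int) -> bool:
    """True for 0 (no timeout) or a value within the timeout range."""
    low, high = MNFA_TIMEOUT_RANGE
    return value == 0 or low <= value <= high


def mnfa_should_deactivate(failing_hosts: int, threshold: int,
                           elapsed_secs: int, timeout_secs: int) -> bool:
    """Deactivate on the first of: hosts below threshold, or timer expiry.

    A timeout of 0 disables the timer, so only the host count matters.
    """
    if failing_hosts < threshold:
        return True
    return timeout_secs > 0 and elapsed_secs >= timeout_secs
```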

DocImpact
Story: 2003576
Task: 24903

Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-08-31 10:43:25 -04:00
Tao Liu 5421df7098 Decouple Fault Management from stx-config
List of changes:
1. Remove all fault management (FM) database tables from sysinv DB
2. Remove all FM commands from sysinv REST API service
3. Remove all FM CLI commands from cgts client
4. Add FM user to config controller to support region config
5. Update backup restore to reference the new alarm database table
6. Update controller config test files and add the new FM user
7. Add a FM puppet module in order to manage configuration data and
   database; to configure user, service and endpoint in Keystone
8. Add a FM puppet operator to populate FM and SNMP configuration data
9. Update NFV puppet to support FM endpoint configuration
10. Update haproxy manifest to support active-active FM API service

Story: 2002828

Task: 22747

Change-Id: I96d22a18d5872c2e5398f2e9e26a7056fe9b4e82
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-08-16 17:24:19 -04:00
Kevin Smith 38be5431c4 Change permission and ownership on dcorch files
Change puppet to set appropriate dcorch ownerships and privileges
for /etc/dcorch/api-paste.ini and /etc/dcorch/dcorch.conf

Story: 2002992
Task: 23006

Change-Id: I5be797de8bb9d8a7e73b9b7888e155f9f103e7fd
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>
2018-08-13 16:59:47 -04:00