Rebasing Armada to use the latest docker image tag
8a1638098f88d92bf799ef4934abe569789b885e-ubuntu_bionic.
Change-Id: Ic48a2e053d0de7dacfd6a07d817947e11dc8d596
Story: 2006347
Task: 36105
Signed-off-by: Robert Church <robert.church@windriver.com>
This update contains the following misc. config changes to
support ansible bootstrap for system controller.
Creates deps model for dcmanager and dcorch puppet modules.
Creates a system controller postgres runtime manifest which is
applied upon the creation of the initial controller host, or on replay
after the distributed cloud role has been changed.
The patch_vault filesystem is created when the first controller is
unlocked.
Also allows the dc role to be modified during bootstrap.
Change-Id: Id7b416274b2a854c469bfdca7448bf1ddea639d7
Story: 2004766
Task: 35650
Signed-off-by: Tao Liu <tao.liu@windriver.com>
The default sysinv-api bind host value was changed from
"0.0.0.0" to "::" to support both IPv4 and IPv6.
Change-Id: I072cda6df02f49a94d94871a9f19800f106f49dc
Closes-Bug: 1833459
Signed-off-by: Yi Wang <yi.c.wang@intel.com>
Magnum is no longer packaged on bare metal.
The sysinv and upgrades code related to magnum has been removed.
The helm configuration for magnum remains, although it is not currently
supported in containers either. The magnum-ui is not installed in
platform or containerized horizon so the code to enable it is removed.
Some upgrade code remains, because that utility is in the
process of being re-written.
Story: 2004764
Task: 34333
Change-Id: I56873b4e04aac2e7d0cd57909beea00ecc2c1b9a
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
Upversion armada image from existing
af8a9ffd0873c2fbc915794e235dbd357f2adab1
to
dd2e56c473549fd16f94212b553ed58c48d1f51b-ubuntu_bionic
The specific image was chosen because it contains upstream
armada commit df68a90e057c2e1e3427d6b8497b437c8a4c3b7e, which
fixes keystone kubernetes auth. The Ubuntu Bionic image was
chosen because the old image was also Ubuntu Bionic based.
Testing done by applying stx-openstack on standard, simplex,
and duplex systems.
Story: 2005860
Task: 33693
Change-Id: Ifd8a66d46e2dfd47ca7c5ab9807076ef43e67027
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
Ceilometer is being setup through helm charts in containers
so the references to ceilometer in bare metal can be cleaned up.
- Removing the sysinv puppet code for ceilometer
- Removing the bare metal ceilometer pipeline upgrade script
- Cleaning up unused variables from templates
Story: 2004764
Task: 33690
Change-Id: I2efe7aed7a4570121c1376c132e157c6f47e9f29
Sysinv use of pacemaker was replaced by SM a long time ago
and the code that referenced it is being removed.
Change-Id: Ic2a55698f64757bffeb9b53f4a105ea6ccb3dd2f
Story: 2004764
Task: 30665
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
Add puppet parameters and write them to the nfv config file to
disable raise/get/clear of NFV alarms against the containerized
fm-rest-api service.
Story: 2004008
Task: 33573
Depends-On: https://review.opendev.org/#/c/658972/
Change-Id: I3ab37fe476ad083b5c8acca2684973eec30b8005
Signed-off-by: Sun Austin <austin.sun@intel.com>
nfvi raises openstack alarms/logs to the fm service in pods
when it is available. The following configuration changes are
required:
1. Add a "fault_mgmt_plugin_disabled" parameter in the vim config
file. Set it to "True" when the openstack application is not
deployed, and "False" when it is.
2. Add a "fault_mgmt_endpoint_disabled" parameter in the alarm
and event_log config files. The rules are the same as for
"fault_mgmt_plugin_disabled".
3. Add "openstack" and "fm" info to the alarms and event_logs
config files.
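For illustration, the flag rule above can be sketched in a short Python fragment. The function name, the "vim-config" section name, and the rendering approach are assumptions; only the parameter name and its True/False rule come from this change.

```python
import configparser
import io

def render_vim_config(openstack_applied: bool) -> str:
    """Render the fault-management flag described above.

    The section name "vim-config" is an assumption for illustration;
    the parameter name comes from this commit.
    """
    cfg = configparser.ConfigParser()
    cfg.add_section("vim-config")
    # "True" while the openstack application is not deployed,
    # "False" once it is.
    cfg.set("vim-config", "fault_mgmt_plugin_disabled",
            str(not openstack_applied))
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

print(render_vim_config(openstack_applied=False))
```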
Story: 2004008
Task: 30954
Depends-On: https://review.opendev.org/#/c/661548/
Change-Id: Iee2a4515336f4ce9b6373d56d4f7a5779664233d
Signed-off-by: SidneyAn <ran1.an@intel.com>
This update adds the puppet package for keystone DB synchronization
service. This puppet package will be used by controller puppet manifest
to deploy and configure the synchronization service.
Story: 2002842
Task: 22787
Signed-off-by: Andy Ning <andy.ning@windriver.com>
(cherry picked from commit 51b20e03ea)
Conflicts:
centos_pkg_dirs
Depends-On: https://review.opendev.org/#/c/655727
Change-Id: I7059800daa053eaf975ad7f02200247d77653926
Zuul checks out the dependent projects by their repo names.
Repo checks out the project directory structure based on the
labels in the manifest.
Currently these directories have different names, so tox
passes when run by zuul but fails when run in a developer env.
This submission uses an env variable, "STX_PREFIX", to make
tox runnable in both environments.
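The idea can be sketched in Python with hypothetical paths and a hypothetical default prefix; the actual change lives in the tox configuration.

```python
import os

def project_dir(base: str, name: str) -> str:
    """Resolve a sibling project's checkout directory.

    Zuul checks projects out under their repo names (e.g. "config"),
    while a developer "repo" tree may use a prefixed label
    (e.g. "stx-config"). An STX_PREFIX environment variable bridges
    the two layouts; the default here is an assumption.
    """
    prefix = os.environ.get("STX_PREFIX", "stx-")
    return os.path.join(base, prefix + name)

os.environ["STX_PREFIX"] = ""        # zuul-style layout
print(project_dir("/opt/src", "config"))   # /opt/src/config
os.environ["STX_PREFIX"] = "stx-"    # developer repo layout
print(project_dir("/opt/src", "config"))   # /opt/src/stx-config
```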
Story: 2004515
Task: 30664
Change-Id: I06cefab7422f53ccc0b8af30ca06945311cec70e
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
Removes the aodh service parameter alarm_history_time_to_live
Removes references to aodh and gnocchi from puppet and upgrade code.
Removes old gnocchi references from remote logging.
Story: 2004764
Task: 30537
Change-Id: I3a03dd4a2afd47f1cc3f677f02d348eabf11a653
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
- Renaming idle_timeout to connection_recycle_time since it was
deprecated in Stein
- Explicitly including required packages in the sysinv spec file to fix
DC
Change-Id: Ief055d26f3a1eb43b8cf144952a49e7e0f3ff939
Story: 2004765
Task: 28883
Depends-On: https://review.openstack.org/#/c/653086
Signed-off-by: Tyler Smith <tyler.smith@windriver.com>
Using SHA: af8a9ffd0873c2fbc915794e235dbd357f2adab1
which was built and tagged on April 9, 2019.
The previous Armada SHA was from Sept 2018.
The manifest.xml is updated to not generate armada warnings
for libvirt, openvswitch, nova and neutron.
The warning was:
"label_selector" not specified,
waiting with no labels may cause unintended consequences.
Story: 2005198
Task: 30436
Change-Id: I97b633d9e6e1e4574e25dc8b69500faae4b4a809
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
* Remove the nova api proxy puppet module.
* Remove openstack::swift puppet manifest.
* Refactor openstack::nova::storage as platform::worker::storage.
This requires the nova puppet code in sysinv to write to a
different hiera target, and creation of /var/lib/nova.
* Remove puppet modules from spec file for modules that are no
longer being used.
Story: 2004764
Task: 29840
Change-Id: Ifa0171b06e23fd77d373983d644df3f56ae4e2de
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
The following changes are required to enable system
controller and sub cloud configuration in a distributed
cloud environment:
* Remove references to os-keystone-region-name, as the
  openstack patches that supported it have been removed.
* Change the iptables rule for the NAT entry to only
  apply if the selected outgoing interface is the
  OAM interface.
* Configure keystone endpoints before configuring
  openrc on subclouds.
* Remove all openstack services and users from the region
  config and update tox.
* Disable the nova, cinder and neutron api proxies.
Only tested distributed cloud configuration as multi-region
configuration is not supported in the current release.
Story: 2004766
Task: 30017
Change-Id: I5c43e2112f34225aa9e23ff777c5333ae77efcdc
Signed-off-by: Tao Liu <tao.liu@windriver.com>
Previously, docker images were pulled from public registries during
config_controller. For some users, the connection to the public
docker registry may be slow, so installing the containerized
service images may time out, or the system simply may not have
access to the public internet.
This change allows users to specify alternative public/private
registries to replace k8s.gcr.io, gcr.io, quay.io and docker.io.
An insecure registry is supported if all default registries are
replaced by one unified registry. This lowers the complexity for
those who build their own registry without internet access.
Docker doesn't support an ipv6 address as a registry name; a
hostname or domain name on an ipv6 network is allowed instead.
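The registry substitution described above can be sketched as follows. The function name and the unified-registry value are illustrative; only the four default registries come from this change, and the real config_controller logic also handles per-registry overrides and insecure registries.

```python
DEFAULT_REGISTRIES = ("k8s.gcr.io", "gcr.io", "quay.io", "docker.io")

def rewrite_image(image: str, unified_registry: str) -> str:
    """Replace a default public registry prefix with a user-supplied one.

    Illustrative sketch only, not the actual config_controller code.
    """
    for reg in DEFAULT_REGISTRIES:
        if image.startswith(reg + "/"):
            return unified_registry + image[len(reg):]
    return image  # non-default registries pass through unchanged

print(rewrite_image("k8s.gcr.io/pause:3.1", "registry.local:9001"))
# -> registry.local:9001/pause:3.1
```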
Test:
AIO-SX/AIO-DX/Standard(2+2):
Alternative public registry (ipv4/domain) with proxy
- config_controller pass
Private registry (ipv4/ipv6/domain) without internet
- config_controller pass
Default registry with/without proxy
- config_controller pass
Story: 2004711
Task: 28742
Change-Id: I4fee3f4e0637863b9b5ef4ef556082ac75f62a1d
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
The puppet-modules-wrs package is formed from several subcomponents;
the same changes were applied to all of them:
- Create a makefile with an install target.
- Remove the license file from build_srpm.data as it is not needed.
- Update the target in the specfile.
- Change autosetup to setup in the specfile; this was a bug in the
spec files.
Testing:
- Verification on correct install paths.
- config_controller complete on simplex configuration.
Change-Id: I1512eb0c3034ffa2d57d098dab9800bdaba5b48d
Story: 2004043
Task: 27552
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
Barbican service is needed during bootstrap phase for StarlingX.
Implement bootstrap and runtime manifests to achieve that.
Change-Id: I6c22ebddacf8aec3a731f7f6d7a762f79f511c78
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
When running "tox -e puppetlint" manually, tox.ini will
install puppet-lint via gem, but does not automatically install
the json module on which puppet-lint depends.
This commit adds json to the gem install command.
Change-Id: Ib8b6133395bf76748a8bcac0cb7bd718a89d6d5a
Story: 2004515
Task: 28704
Signed-off-by: Don Penney <don.penney@windriver.com>
This update addresses the following errors and warnings
from puppet-lint:
- 140chars
- case_without_default
- ensure_first_param
- inherits_across_namespaces
- parameter_order
- single_quote_string_with_variables
- variable_is_lowercase
- variable_scope
In the case of variable_is_lowercase, the compute.pp manifest
has variables with sizes like 2M in the name. These have been
left as-is, with lint:ignore comments for the check, due to
the semantics of the name.
For the 140chars check, certain long lines have been left as-is,
with lint:ignore comments, due to long commands being executed.
These can be revisited in a future update to try to break up
the lines and remove the lint:ignore directives.
Change-Id: I37809bacb43818e0956f9f434c30c48e05017325
Story: 2004515
Task: 28685
Signed-off-by: Don Penney <don.penney@windriver.com>
This update adds the tox and zuul configuration to
run puppet-lint against the puppet manifests. The
initial update ignores all existing errors, which
will be cleaned up later.
Change-Id: I293abc2eac6bc6216cbbf6d939c1ba3474fb9384
Story: 2004515
Task: 28665
Signed-off-by: Don Penney <don.penney@windriver.com>
This update replaces the compute personality & subfunction
with worker, and updates internal and customer-visible
references.
In addition, the compute-huge package has been renamed to
worker-utils, as it contains various scripts/services that
affine running tasks or interface IRQs to specific CPUs.
The worker_reserved.conf is now installed to /etc/platform.
The cpu function 'VM' has also been renamed to 'Application'.
Tests Performed:
Non-containerized deployment
AIO-SX: Sanity and Nightly automated test suite
AIO-DX: Sanity and Nightly automated test suite
2+2 System: Sanity and Nightly automated test suite
2+2 System: Horizon Patch Orchestration
Kubernetes deployment:
AIO-SX: Create, delete, reboot and rebuild instances
2+2+2 System: worker nodes unlock/enable with no alarms
Story: 2004022
Task: 27013
Change-Id: I0e0be6b3a6f25f7fb8edf64ea4326854513aa396
Signed-off-by: Tao Liu <tao.liu@windriver.com>
When kubernetes is configured and the OpenStack application has
been installed, the VIM will be configured to access the OpenStack
services running in pods (keystone, nova, rabbitmq, etc...).
In order to support this, some extensions were done to the sysinv
helm code to allow parts of the OpenStack application
configuration to be retrieved (e.g. endpoint info). Changes
were also required to dnsmasq configuration to get resolution of
pod based names (e.g. keystone.openstack.svc.cluster.local)
working properly.
This commit is just the first step and has limitations. There is
no trigger to reconfigure the VIM after the OpenStack application
has been installed - a controller lock/unlock is required.
Story: 2003910
Task: 27852
Change-Id: I1c6dcdecd1365104457009196bbcf06b19c95489
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
Fix some typos in comments, such as the duplicated word 'the the'.
Change-Id: I28ffde825fd95186bc3a0bd077dea7c20287fc1f
Story: 2004164
Task: 27641
Signed-off-by: zhangkunpeng <zhang.kunpeng@99cloud.net>
Initial implementation of Armada integration with sysinv which
entails:
- Basic application upload via system application-upload command
- Application install via system application-apply command
- Application remove via system application-remove command
- Application delete via system application-delete command
- Application list and detail viewing via system
application-list and application-show commands.
This implementation does not cover the following functionalities
that are either still under discussion or in planning:
a) support for remote CLI where application tarball resides in
the client machine
b) support for air-gapped scenario/embedded private images
c) support for custom apps' user overrides
Tests conducted:
- config controller
- tox
- functional tests (both Openstack and simple test app):
- upload
- apply
- remove
- delete
- show
- list
- release group upgrade with user overrides
- failure tests:
- no tar file supplied
- corrupted tar file
- app already exists/does not exist
- upload failure (missing manifest, multi manifests,
no image tags, checksum test failure, etc...)
- apply failure (nodes are not labeled, image download
failure, etc...)
- operation not permitted
Change-Id: Iec27f356bd0047b2c7ef860ab3a2528f5a371868
Story: 2003908
Task: 26792
Signed-off-by: Tee Ngo <Tee.Ngo@windriver.com>
This update contains the following changes for Distributed
Cloud Fernet Key Synching & Management:
1. Disable key rotation cron job for distributed cloud
2. Add a fernet key repo config option in puppet sysinv
3. Add fernet repo sysinv APIs for create/update/retrieve keys
4. Add a fernet operator to create/update/retrieve the keys
Story: 2002842
Task: 22786
Change-Id: Ia14caeef067fa481e3a4159c1658289250632779
Signed-off-by: Tao Liu <tao.liu@windriver.com>
Puppet manifests now include logging data to push to
the conf file. This is needed for a subsequent code
change to change the logging backend to oslo_log
Change-Id: I303e199fd3c984af20564c43bdb98c460cbed0f1
Story: 2004007
Task: 27608
Signed-off-by: Lachlan Plant <lachlan.plant@windriver.com>
Support bare metal and pod based keystone in sysinv. The existing
keystone_authtoken section of sysinv.conf remains and is used for
platform service authentication, while openstack service authentication
parameters are moved to a new openstack_keystone_authtoken section.
Admin credentials are used in the new openstack_keystone_authtoken
section and the region name parameters are also moved to this new
section.
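A hedged sketch of how such a sysinv.conf fragment might be rendered follows; all option names and values here are placeholders, and only the two section names come from this change.

```python
import configparser
import io

cfg = configparser.ConfigParser()
# Platform (bare metal) keystone: platform service authentication.
cfg["keystone_authtoken"] = {
    "auth_url": "http://127.0.0.1:5000",  # illustrative value
    "region_name": "RegionOne",
}
# Containerized keystone: openstack service authentication, using
# admin credentials and its own region name (per the commit).
cfg["openstack_keystone_authtoken"] = {
    "auth_url": "http://keystone.openstack.svc.cluster.local:80",
    "region_name": "RegionOne",
}
buf = io.StringIO()
cfg.write(buf)
print(buf.getvalue())
```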
Change-Id: I7a53dd5a2dc52213e0f1e0cc748649a33f0f9f40
Story: 2002876
Task: 26926
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
Adding configuration to the VIM for containerized keystone. The
VIM will now support two keystone instances:
- platform: bare metal keystone used to authenticate with
platform services (e.g. sysinv, patching)
- openstack: containerized keystone used to authenticate with
openstack services (e.g. nova, neutron, cinder)
For now, the same configuration will be used for both, as we
still only deploy with the baremetal keystone.
Story: 2002876
Task: 26872
Change-Id: If4bd46a4c14cc65978774001cb2887e5d3e3607b
In support of the HA Improvements feature maintenance is required to,
upon request, send SM a summary of maintenance's heartbeat responsiveness
during the last 20 heartbeat periods.
This update adds the required port assignments to the mtc.ini file
in support of said communications.
With this update the mtc.ini file will be updated to contain the
following entries.
; Communication ports between SM and maintenance
sm_server_port = 2124 ; port sm receives mtce commands from
sm_client_port = 2224 ; port mtce receives sm commands from
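For illustration, the quoted entries can be parsed with Python's configparser; the [agent] section name is an assumption, while the two port entries come from the text above (";" starts an inline comment in this file).

```python
import configparser

MTC_INI = """
[agent]
sm_server_port = 2124 ; port sm receives mtce commands from
sm_client_port = 2224 ; port mtce receives sm commands from
"""

# Treat ";" as an inline comment delimiter, matching the mtc.ini style.
cfg = configparser.ConfigParser(inline_comment_prefixes=(";",))
cfg.read_string(MTC_INI)
print(cfg.getint("agent", "sm_server_port"))  # 2124
print(cfg.getint("agent", "sm_client_port"))  # 2224
```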
Change-Id: I05c022f7e4dcdeaea71bc0020641baa331daae57
Story: 2003576
Task: 26837
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
This commit introduces puppet configuration enabling LLDP to operate over
OVS. Specifically, separate port flows are configured to handle LLDP
traffic.
In addition, the lldpd daemon is restricted from
operating over bridge, tap, and ovs-netdev devices.
Story: 2002946
Task: 22940
Change-Id: Ibadc9c082425412b5b68b02a55e8c02692de0e17
Signed-off-by: Steven Webster <steven.webster@windriver.com>
Do not load vim plugins and disable vim audits instead of
just disabling the endpoints as was previously done in
Change 599741. Leave setting of (new) Nova and (pre-existing)
Neutron endpoint disabled flags for infrastructure host services usage.
Story: 2002876
Task: 26573
Change-Id: Id3af829562e5765b99dbab23d913d65a4e6ec4a7
Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
The health query API request triggers sysinv to query the alarm list.
The alarm query is attempted via a sysinv database API which is no
longer supported, resulting in REST API request failures.
This update contains the following changes to address the issue:
1. Add FM catalog info to the sysinv puppet class and manifest
2. Add the service catalog to the user request context
3. Add an FM client interface to communicate with the FM API
4. Update the health query to retrieve the alarm list via the FM client
Closes-Bug: 1789983
Change-Id: I31b256f6de22fe70cba59b08bf927c8b0ac119ee
Signed-off-by: Tao Liu <tao.liu@windriver.com>
The current maintenance heartbeat failure action handling is to Fail
and Gracefully Recover the host. This means that maintenance will
ensure that a heartbeat failed host is rebooted/reset before it is
recovered but will avoid rebooting it a second time if its recovered
uptime indicates that it has already rebooted.
This update expands that single action handling behavior to support
three new actions. In doing so it adds a new configuration service
parameter called heartbeat_failure_action. The customer can configure
this new parameter with any one of the following 4 actions in order of
decreasing impact.
fail - Host is failed and gracefully recovered.
- Current Network specific alarms continue to be raised/cleared.
Note: Prior to this update this was standard system behavior.
degrade - Host is only degraded while it is failing heartbeat.
- Current Network specific alarms continue to be raised/cleared.
- heartbeat degrade reason is cleared as are the alarms when
heartbeat responses resume.
alarm - The only indication of a heartbeat failure is by alarm.
- Same set of alarms as in above action cases
- Only in this case no degrade, no failure, no reboot/reset
none - Heartbeat is disabled; no multicast heartbeat message is sent.
- All existing heartbeat alarms are cleared.
- The heartbeat soak as part of the enable sequence is bypassed.
The selected action is a system wide setting.
The selected setting also applies to Multi-Node Failure Avoidance.
The default action is the legacy action Fail.
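The four actions and their documented effects can be summarized in a small lookup table; the function and key names are illustrative, not mtce's actual types, and the per-action behavior follows the description above.

```python
def heartbeat_failure_effects(action: str) -> dict:
    """Map a heartbeat_failure_action value to its documented effects.

    Illustrative summary of the commit text, not mtce code:
    only "fail" reboots/resets the host, only "degrade" leaves the
    host degraded, and "none" disables heartbeat entirely.
    """
    table = {
        "fail":    {"alarms": True,  "degrade": False,
                    "fail_host": True,  "heartbeat": True},
        "degrade": {"alarms": True,  "degrade": True,
                    "fail_host": False, "heartbeat": True},
        "alarm":   {"alarms": True,  "degrade": False,
                    "fail_host": False, "heartbeat": True},
        "none":    {"alarms": False, "degrade": False,
                    "fail_host": False, "heartbeat": False},
    }
    return table[action]

print(heartbeat_failure_effects("alarm"))
```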
This update also
1. Removes redundant inservice failure alarm for MNFA case in support
of degrade only action. Keeping it would make that alarm handling
case unnecessarily complicated.
2. No longer used 'hbs calibration' code is removed (cleanup).
3. Small amount of heartbeat logging cleanup.
Test Plan:
PASS: fail: Verify MNFA and recovery
PASS: fail: Verify Single Host heartbeat failure and recovery
PASS: fail: Verify Single Host heartbeat failure and recovery (from none)
PASS: degrade: Verify MNFA and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery (from alarm)
PASS: alarm: Verify MNFA and recovery
PASS: alarm: Verify Single Host heartbeat failure and recovery
PASS: alarm: Verify Single Host heartbeat failure and recovery (from degrade)
PASS: none: Verify heartbeat disable, fail ignore and no recovery
PASS: none: Verify Single Host heartbeat ignore and no recovery
PASS: none: Verify Single Host heartbeat ignore and no recovery (from fail)
PASS: Verify action change behavior from none to alarm with active MNFA
PASS: Verify action change behavior from alarm to degrade with active MNFA
PASS: Verify action change behavior from degrade to none with active MNFA
PASS: Verify action change behavior from none to fail with active MNFA
PASS: Verify action change behavior from fail to none with active MNFA
PASS: Verify action change behavior from degrade to fail then MNFA timeout
PASS: Verify all heartbeat action change customer logs
PASS: verify heartbeat stats clear over action change
PASS: Verify LO DOR (several large labs - compute and storage systems)
PASS: Verify recovery from failure of active controller
PASS: Verify 3 host failure behavior with MNFA threshold at 3 (action:fail)
PASS: Verify 2 host failure behavior with MNFA threshold at 3 (action:fail)
Change-Id: I198505fb7a923cc760b12082acff1e5bac929ef2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
The maintenance system implements a high availability (HA) feature
designed to detect the simultaneous heartbeat failure of a group
of hosts and avoid failing all those hosts until heartbeat resumes
or after a set period of time.
This feature is called Multi-Node Failure Avoidance, aka MNFA, and
currently has the hosts threshold set to 3 and timeout set to 100 secs.
This update implements enhancements to that existing feature by
making the 'number-of-hosts threshold' and 'timeout period'
customer configurable service parameters.
The new service parameters are listed under platform:maintenance and
are displayed with the following command:
> system service-parameter-list
mnfa_threshold: This new label and value is added to the puppet
managed /etc/mtc.ini and represents the number of hosts that are
required to fail heartbeat as a group; within the heartbeat
failure window (heartbeat_failure_threshold) after which maintenance
activates MNFA Mode.
This update changes the default number of failing hosts from
3 to 2 while allowing a configurable range from 2 to 100.
mnfa_timeout: This new label and value is added to the puppet
managed /etc/mtc.ini. While MNFA mode is active, it remains active
until the number of failing hosts drops below the mnfa_threshold or this
timer expires. MNFA mode deactivates on the first occurrence of
either case. Upon deactivation the remaining failed hosts are no
longer treated as a failure group but instead are all Gracefully
Recovered individually. A value of zero imposes no timeout, making the
deactivation criteria solely host based.
This update changes the default 100 second timer to 0 (no timeout)
while permitting a valid time range from 100 to 86400 secs (1 day).
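The deactivation rule described above can be sketched as a small predicate; the function name and signature are illustrative, while the threshold/timeout semantics follow the text.

```python
def mnfa_deactivate(failing_hosts: int, threshold: int,
                    elapsed_secs: float, timeout_secs: int) -> bool:
    """Decide whether MNFA mode should deactivate.

    MNFA deactivates when the number of failing hosts drops below
    the threshold, or when the timer expires; a timeout of 0 means
    no timeout (deactivation is solely host based).
    """
    if failing_hosts < threshold:
        return True
    if timeout_secs > 0 and elapsed_secs >= timeout_secs:
        return True
    return False

print(mnfa_deactivate(1, 2, 10, 0))    # True: below threshold
print(mnfa_deactivate(3, 2, 200, 0))   # False: no timeout configured
print(mnfa_deactivate(3, 2, 200, 100)) # True: timer expired
```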
DocImpact
Story: 2003576
Task: 24903
Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
List of changes:
1. Remove all fault management (FM) database tables from sysinv DB
2. Remove all FM commands from sysinv REST API service
3. Remove all FM CLI commands from cgts client
4. Add FM user to config controller to support region config
5. Update backup restore to reference the new alarm database table
6. Update controller config test files and add the new FM user
7. Add a FM puppet module in order to manage configuration data and
database; to configure user, service and endpoint in Keystone
8. Add a FM puppet operator to populate FM and SNMP configuration data
9. Update NFV puppet to support FM endpoint configuration
10. Update haproxy manifest to support active-active FM API service
Story: 2002828
Task: 22747
Change-Id: I96d22a18d5872c2e5398f2e9e26a7056fe9b4e82
Signed-off-by: Tao Liu <tao.liu@windriver.com>
Change puppet to set appropriate dcorch ownerships and privileges
for /etc/dcorch/api-paste.ini and /etc/dcorch/dcorch.conf
Story: 2002992
Task: 23006
Change-Id: I5be797de8bb9d8a7e73b9b7888e155f9f103e7fd
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>