Commit Graph

631 Commits

Author SHA1 Message Date
OpenDev Sysadmins d11d0b2a53 OpenDev Migration Patch
This commit was bulk generated and pushed by the OpenDev sysadmins
as a part of the Git hosting and code review systems migration
detailed in these mailing list posts:

http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

Attempts have been made to correct repository namespaces and
hostnames based on simple pattern matching, but it's possible some
were updated incorrectly or missed entirely. Please reach out to us
via the contact information listed at https://opendev.org/ with any
questions you may have.
2019-04-19 19:52:29 +00:00
Scott Little f703fcd766 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: Ib5222b996a44d146033b52a6447ced7fa10e9f1c
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-22 13:29:44 -05:00
Zuul 718e23254f Merge "DevStack plugin updates for bionic job" 2019-02-22 16:52:14 +00:00
Scott Little 354b7edd3d Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: I9f54f6bed6fc7281c078ce6c54963e378430a67d
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-21 13:04:15 -05:00
Zuul 12df3fd47f Merge "Set docker service monitored by pmond" 2019-02-21 06:19:58 +00:00
Dean Troyer ed19c6c0bd DevStack plugin updates for bionic job
This makes some cleanup changes to the DevStack plugin for the
change to master and bionic.
* Define values for precedence handling by the upstream devstack playbook
* Add STX_INST_DIR for a deterministic install location
* Add stx-update to required plugins
* Sort functions in the main plugin
* Cleanup comments in a couple of places

Change-Id: I147882e8980f3e1b599008205db268eb9e7736b0
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-20 15:07:09 -06:00
Bin Qian 5d13b4c911 Set docker service monitored by pmond
Override the docker service so that systemd:
1. creates/removes the /var/run/dockerd.pid file at service start/stop.
2. does not restart the service automatically on exit or failure.

Deploy docker.conf so that pmond monitors the docker service.

Story: 2002843
Task: 29391

Change-Id: I3595d0d4f97d90e4119fc1455bcf164aebc5d6ec
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-02-19 11:33:00 -05:00
Scott Little cc346be366 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: Ic0dcbf0f9548cb23f48565d702a00d3b07480735
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-14 12:29:29 -05:00
Zuul 5246aa4f28 Merge "Add numa node and huge page memory monitoring" 2019-02-13 21:41:36 +00:00
Scott Little 4f8f7a86f4 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: I36341c16146ea08ad0d12f620f21e662571f5d40
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-13 12:31:26 -05:00
Eric MacDonald 4dadf61bea Add numa node and huge page memory monitoring
This update adds titled support to the existing
Platform Memory monitor collectd plugin.

Instance Mapping

Plugin Refinements                      Instance Name
-------------------------------------   ----------
Platform Memory                         platform
Platform Memory Numa Node 0             node0
Platform Memory Numa Node 1             node1
Platform Memory Numa Node 0 Huge Pages  node0_hugepages
Platform Memory Numa Node 1 Huge Pages  node1_hugepages

New Alarm Entity IDs added to existing 100.103 alarm ID

host=<hostname>.numa=node0
host=<hostname>.numa=node1
host=<hostname>.numa=node0_hugepages
host=<hostname>.numa=node1_hugepages

Modified memory plugin thresholds and added alarm notifier
to support collectd requiring samples to be 'gt' rather
than 'ge' the specified thresholds for a severity change.

This update also corrects a few subtle pep8 warnings in
a few of the existing Python plugins.

There is no need for an rmond update because numa and
huge page monitoring were never enabled in rmond.

Story: 2002823
Task: 29369

PASS: Verify logging of all memory instance types
PASS: Verify monitoring of new numa node memory
PASS: Verify monitoring of new numa node huge page memory
PASS: Verify memory instance alarm handling in fm notifier
PASS: Verify memory instance alarm load on startup
PASS: Verify memory instance alarm clear ; runtime condition gone
PASS: Verify memory instance alarm clear ; startup condition gone

Regression:
PASS: Verify End-To-End Sample Collection for all monitored resources.
Corner Case:
PASS: Verify alarm reporting with threshold of zero
PROG: Verify memory alarm raised at threshold value
PASS: Verify memory alarm cleared 1 below threshold value
PASS: Verify above case for both major and critical thresholds

Change-Id: I4e2612ac7b3d906be4b0a140286dbbb095ce7e1b
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-13 10:30:42 -05:00
Zuul a7bc2baa1a Merge "Enable kubelet service to be monitored by pmond" 2019-02-12 20:13:07 +00:00
Bin Qian 2b5e9fe31a Enable kubelet service to be monitored by pmond
Generate/clear kubelet.pid file at start/stop of kubelet service

Story: 2002843
Task: 29216

Change-Id: I41206c7ea14d79b5d0cbca945e7a6488eda9b7bb
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-02-11 15:33:00 -05:00
Scott Little e27f8169ee Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: I6e6b9430c376c9944c9fed372192bcb09dbee47e
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-11 12:17:26 -05:00
Zuul c17094ab17 Merge "Add network interface monitoring plugin to collectd" 2019-02-11 16:09:30 +00:00
Zuul b880a911db Merge "Check for mount before demoting DRBD filesystem" 2019-02-07 21:13:34 +00:00
Zuul 42622c6978 Merge "Configurable Host HTTP/HTTPS Port Binding" 2019-02-07 19:28:46 +00:00
Don Penney a01bae238b Check for mount before demoting DRBD filesystem
This update enhances the DRBD OCF script to check whether
a filesystem is mounted before attempting to demote it from
Primary to Secondary. The demotion attempt will result in
DRBD state change failures reported to the console if it
is still in use.

Change-Id: Ie5abe5d0858f75bd0d31ce8d8d1d04e7beb83132
Story: 2004520
Task: 29398
Signed-off-by: Don Penney <don.penney@windriver.com>
2019-02-07 11:20:44 -05:00
Eric MacDonald e8c9676d98 Add network interface monitoring plugin to collectd
This update introduces interface monitoring for oam,
mgmt and infra networks as a collectd plugin.

The interface plugin runs and queries the new maintenance
Link Monitor daemon for Link Model and Information every
10 seconds.

The plugin then manages port and interface alarms based on the
link model, similar to how rmon did in the past.

Severity: Interface and Port levels

Alarm Level  Minor        Major              Critical
-----------  -----  ---------------------    ----------------------------
Interface     N/A   One of lag pair is Up    All Interface ports are Down
     Port     N/A   Physical Link is Down    N/A

Degrade support for interface monitoring is added to the mtce
degrade notifier. Any link down condition results in a host
degrade condition, as it did in rmon.

Sample Data: represented as % of total links Up for that network interface
100 or 100% percent used - all links of interface are up.
 50 or  50% percent used - one of lag pair is Up and the other is Down
  0 or   0% percent used - all ports for that network are Down
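
The sample encoding and the severity table above can be sketched as follows. This is a minimal illustration, not the plugin's actual code; the function names and the list-of-booleans port model are assumptions made for the example.

```python
def links_up_percent(port_states):
    """Return the percentage of ports that are up (0 to 100).

    port_states is a hypothetical list of booleans, True meaning link up.
    """
    if not port_states:
        raise ValueError("at least one port is required")
    up = sum(1 for state in port_states if state)
    return 100.0 * up / len(port_states)


def interface_severity(port_states):
    """Map the sample to an interface alarm level per the table above."""
    percent = links_up_percent(port_states)
    if percent == 0:
        return "critical"  # all interface ports are down
    if percent < 100:
        return "major"     # e.g. one of a lag pair is up, the other down
    return None            # all links up, no alarm
```

A lagged interface with one port down yields a sample of 50 and a major alarm; there is no minor level for interfaces.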

The plugin documents all of this in its header.

This update also

1. Adds the new lmond process to syslog-ng config file.
2. Adds the new lmond process to the mtce patch script.
3. Modifies the cpu, df and memory threshold settings by -1.
   rmon thresholds were precise whereas collectd requires
   that the samples cross the thresholds, not just meet them.
   So for example, in terms of a 90% usage action the
   threshold needs to be 89.
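
The reason for the -1 adjustment can be sketched as below. This is an illustration only, assuming whole-number percentage samples; the function names are hypothetical and are not taken from collectd or from the plugins.

```python
def crosses(sample, threshold):
    """Strict comparison, as the message describes for collectd:
    a sample must exceed the threshold, not merely meet it."""
    return sample > threshold


def adjusted_threshold(action_level):
    """Lower the configured threshold by one so that a sample equal
    to the intended action level still triggers the action."""
    return action_level - 1
```

With an intended action level of 90, a sample of 90 does not cross a threshold of 90 but does cross the adjusted threshold of 89.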

Test Plan: (WIP but almost complete)

PASS: Verify interface plugin startup
PASS: Verify interface plugin logging
PASS: Verify interface plugin Link Status Query and response handling
PASS: Verify monitor, sample storage and grafana display
PASS: Verify port and interface alarm matches what rmon produced
PASS: Verify lmon port config from manifest configured plugin
PASS: Verify lmon port config from lmon.conf
PASS: Verify single interface failure handling and recovery
PASS: Verify lagged interface failure handling and recovery
PASS: Verify link loss of lagged interface shared between mgmt and oam (hp380)
PASS: Verify network interface failure handling ; single port
PASS: Verify network interface degrade handling ; lagged interface
PEND: Verify network interface degrade handling ; vlan interface
PASS: Verify HTTP request timeout period and handling
PASS: Verify link status query failure handling - invalid uri (timeout)
PASS: Verify link status query failure handling - missing uri (timeout)
PASS: Verify link status query failure handling - status fail
PASS: Verify link status query failure handling - bad json resp

Change-Id: I2e2dfe6ddfa06a46770245540c7153d330bdf196
Story: 2002823
Task: 28635
Depends-On: https://review.openstack.org/#/c/633264
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-06 14:18:14 -05:00
Tao Liu 448321fed4 Configurable Host HTTP/HTTPS Port Binding
In order to avoid conflicts with containerized services
binding to standard HTTP (80) / HTTPS (443) port numbers,
the default port numbers are changed to 8080 and 8443.

Lighttpd port configuration is performed through puppet,
and the packaged lighttpd.conf uses port 80. As a result,
lighttpd is bound to port 80 before running config_controller.
This prevents patching before running config_controller.
This update changes the default http port to 8080 in the
packaged lighttpd.conf.

8080 is the HTTP port and 8008 is the Horizon port. The default
config file is changed here to be consistent with the port
number configured via puppet.

Story: 2004642
Task: 29300
Depends-On: https://review.openstack.org/#/c/634237/

Change-Id: I52b8f602dc2349ffabd9b90344dfafaf703ee4d7
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2019-02-06 10:58:26 -06:00
Scott Little 52a51cbb15 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: Ie11b0475fd0eae5427d303d0c6fadf5f0a1d11f9
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-06 11:34:44 -05:00
Zuul 34bc8404f0 Merge "Resolve AIO-SX shutdown hang with CEPH ordering hooks" 2019-02-06 00:33:48 +00:00
Don Penney a883e82866 Resolve AIO-SX shutdown hang with CEPH ordering hooks
In ceph-10.2.6, the ceph init script uses systemd-run to launch
ceph-mon and ceph-osd services. This generates transient systemd
service files with basic configuration. On node shutdown, ceph is
being shut down while it is still in use by containers, and without
unmapping the RBD devices, causing the libceph kernel module to
hang trying to communicate with the ceph monitor.

This update patches the ceph init script to generate systemd
override config files for ceph-mon and ceph-osd that provide
improved ordering during shutdown, as well as a script to run
as part of the docker.service shutdown (by packaging a systemd
override) to unmap the RBD devices. This ordering ensures kubelet
and docker services are shut down first, then the RBD devices are
cleaned up, followed by the shutdown of the ceph services and
service management (SM). Once kubelet and docker have shut down,
the ceph-preshutdown.sh script is able to cleanly unmount and
unmap the RBD devices and unload the rbd and libceph
kernel modules.

In ceph-11.0.1, the use of systemd-run was replaced with proper
systemd service configuration files. Once ceph is upgraded for
StarlingX, the ordering and cleanup will need to be revisited.

Story: 2004520
Task: 28258
Change-Id: I6f7d7b9e704121c54211afd86b38df015b8d7a63
Signed-off-by: Don Penney <don.penney@windriver.com>
2019-02-05 17:59:09 -05:00
Eric MacDonald fab989b5bc Add on-demand instance support to collectd alarm manager plugin
Many plugins need support for on-demand instance sampling
and alarming. The filesystem and memory monitoring plugins
are perfect examples. The number of numa nodes or monitored
file systems vary from host to host.

This update adds on-demand instance support. Any plugin
can now support multiple instances. As new plugin
instances are learned, memory is allocated for them
and linked to that plugin's base object, and each is managed as a
separate instance within the scope of its parent.
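
The on-demand instance idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the alarm manager plugin's implementation; the class names PluginObject and PluginInstance and their attributes are hypothetical.

```python
class PluginInstance:
    """Per-instance state (for example one numa node or one filesystem)."""

    def __init__(self, parent, name):
        self.parent = parent   # link back to the plugin's base object
        self.name = name
        self.alarmed = False   # hypothetical per-instance alarm state


class PluginObject:
    """Base object for a plugin; instances are created as they are learned."""

    def __init__(self, name):
        self.name = name
        self.instances = {}

    def get_instance(self, instance_name):
        """Return the named instance, allocating it on first sight."""
        inst = self.instances.get(instance_name)
        if inst is None:
            inst = PluginInstance(self, instance_name)
            self.instances[instance_name] = inst
        return inst
```

A host with two numa nodes would end up with two instances under the same base object, while a host with one node would only ever allocate one.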

The following additional enhancements were made to the common
alarm and degrade plugins.

1. added /opt/etcd as a new monitored filesystem.
2. added common support for vswitch alarm/degrade handling.
3. a few general cleanup changes for code maintainability.

Change-Id: I05b4de78f30fc27362c63b6dbfc97268d6588e4f
Story: 2002823
Task: 29297
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-05 12:39:11 -05:00
Scott Little b9a13b38e9 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: I0fc135031ad40558f4227fb36905ae7b4a8cdc95
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-02-04 12:03:27 -05:00
Zuul dc14b89999 Merge "Add low latency per-cpu power management" 2019-02-04 13:56:07 +00:00
Daniel Chavolla 1e9f9ff1f4 Add low latency per-cpu power management
Refactor low latency compute per-cpu power management
out of stx-nova into libvirt qemu hook

Story: 2004610
Task: 28508

Change-Id: I80432b36c4e71d957db51f1742ef87fb519acce2
Signed-off-by: Daniel Chavolla <daniel.chavolla@windriver.com>
2019-02-01 14:27:10 -05:00
Kristal Dale 750b102ed5 Update config
Update conf.py for release notes to include the project
variable, set to the project name. This is so the string
above the left nav renders the project name.

Story: 2004900
Task: 29230

Change-Id: Ib112654387cd8873f805ede225df1bf3e292c697
Signed-off-by: Kristal Dale <kristal.dale@intel.com>
2019-01-30 16:40:20 -08:00
Scott Little 277007da66 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: I4e15d5f5ded260bc7c403a7a8a9a5afe4b103fd2
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-01-29 12:57:40 -05:00
Zuul e76af6d929 Merge "Uprev kubernetes to 1.12.3" 2019-01-24 16:01:06 +00:00
Zuul 0e94333b17 Merge "build mariadb docker image with galera arbitrator added" 2019-01-23 23:43:07 +00:00
Chris Friesen 8b811b39a8 build mariadb docker image with galera arbitrator added
In order to support running the galera arbitrator we need to add
it to the openstack-helm mariadb docker image.  This means building
our own docker image for now.

I've talked with "jayahn" on the openstack-helm IRC channel, and they
said they had no objection to adding galera-arbitrator to their mariadb
image, so we should upstream it as soon as possible.  Once it's
upstreamed we can remove this.

Change-Id: I6ab2607abcd8e0d130ef80fbd1979c62a20a6ff4
Story: 2004712
Task: 29053
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-01-23 17:11:45 -06:00
Scott Little d6a1fd98d6 Merge remote-tracking branch 'starlingx/master' into HEAD
Change-Id: Ic566b87ddbfc0f838dded07306b19b73cd566161
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-01-23 15:59:11 -05:00
Wei Zhou ed8655fa77 Move STX specific files from stx-ceph to stx-integ
By moving STX specific files from stx-ceph to stx-integ, we
decouple STX code from the upstream ceph repo. When making
changes in those STX files, we no longer need to open a pull
request in the stx-ceph repo.

Change-Id: Ifaaae452798561ddfa7557cf59b072535bec7687
Story: 2002844
Task: 28993
Signed-off-by: Wei Zhou <wei.zhou@windriver.com>
2019-01-23 10:05:40 -05:00
Al Bailey 76989227c7 Uprev kubernetes to 1.12.3
Rather than storing a diff file of the spec file changes,
the original spec file is included for easier comparison.

Story: 2002843
Task: 28909
Change-Id: I11b327e292e9acdeee66d0869f2b159698e40706
Depends-On: Ifb2ca9f36ae2a2f69038f0aad05a4af93eaaa5ad
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-01-22 09:39:50 -06:00
Zuul e12b3a436f Merge "Remove alarm query before clear in NTP plugin" 2019-01-21 14:59:56 +00:00
Eric MacDonald abaff6b275 Remove alarm query before clear in NTP plugin
The issue titled 'NTP 100.14 alarm is not cleared' exposed
a problem where the NTP plugin alarm clear operation is
circumvented when its precursor fm_api.get_fault call
returns None because the fm process is not running.
From the caller's point of view the None return suggests
that the alarm to be cleared does not exist, so the code
skips the call to clear.

This update works around this by simply issuing the
clear without the query.
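
The failure mode and the workaround can be sketched as below. StubFmApi is a hypothetical stand-in written for this example, not the real fm_api client; its method signatures, return values, and the alarm and entity strings are assumptions.

```python
class StubFmApi:
    """Hypothetical stand-in for the FM client used by the plugin."""

    def __init__(self, running, alarms):
        self.running = running
        self.alarms = set(alarms)

    def get_fault(self, alarm_id, entity_id):
        if not self.running:
            return None  # indistinguishable from "no such alarm"
        key = (alarm_id, entity_id)
        return key if key in self.alarms else None

    def clear_fault(self, alarm_id, entity_id):
        if not self.running:
            return False  # failure is visible, so the caller can retry
        self.alarms.discard((alarm_id, entity_id))
        return True


def clear_with_query(api, alarm_id, entity_id):
    """Old pattern: a None query result causes the clear to be skipped."""
    if api.get_fault(alarm_id, entity_id) is None:
        return True  # treated as "nothing to clear"
    return api.clear_fault(alarm_id, entity_id)


def clear_without_query(api, alarm_id, entity_id):
    """Workaround: always issue the clear and report its result."""
    return api.clear_fault(alarm_id, entity_id)
```

While the stub is not running, clear_with_query reports success even though the alarm is still present, whereas clear_without_query reports the failure so the clear can be retried.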

Change-Id: Idcc05bb0e7e1aa1082af1e8ecdcb1a5463b19440
Closes-Bug: 1812440
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-18 16:32:53 -05:00
Jerry Sun 7bb43963d3 Build registry-token-server without dep
This change reworks the registry-token-server package spec with
go dependencies downloaded at mirror-download time, rather than
at build time.  The dependencies (at fixed revisions) are
extracted into the package's build tree for compilation.

Story: 2002840
Task: 22783
Depends-On: https://review.openstack.org/#/c/631001/
Change-Id: Ib7d745c6469beacf029195c3e6eaa4935f398483
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
Signed-off-by: Jason McKenna <jason.mckenna@windriver.com>
2019-01-18 09:36:15 -05:00
Zuul 9fe8574234 Merge "Helm repository replication" 2019-01-17 15:40:23 +00:00
Ovidiu Poncea 6db8e31b21 Add StarlingX specific restart command for Ceph monitors
Since we don't use systemd to manage Ceph and we have pmon monitoring we
have to make sure that:
1. Restarting is properly handled, as "systemctl restart" will return
   an error and the manifest will fail;
2. Pmon does not check ceph-mon status during restart. Otherwise we risk
   getting into a race condition between the puppet restart and pmon
   detecting that ceph is down and trying a restart.

Both are resolved when using /etc/init.d/ceph-init-wrapper restart.

Change-Id: Ie316bb611a006bbbc92ac22c52c3973cc9f15109
Co-Authored-By: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Implements: containerization-2002844-CEPH-persistent-storage-backend-for-Kubernetes
Story: 2002844
Task: 28723
Signed-off-by: Ovidiu Poncea <Ovidiu.Poncea@windriver.com>
2019-01-16 17:05:57 +02:00
Scott Little a02e003618 Update .gitreview for f/stein
Change-Id: I2cfd8fa508b8f072b333d6a19d1674c8aa948e60
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-01-15 14:23:07 -05:00
Angie Wang d2a4c3d012 Helm repository replication
This updates helm-upload to stop syncing charts to the standby
controller, since charts are now stored in the DRBD filesystem.

Story: 2004520
Task: 28343
Depends-On: https://review.openstack.org/#/c/630763/
Change-Id: I12f17fae6124650d878ba7a560f94b7a8ed36e56
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2019-01-14 15:05:47 -05:00
Zuul 5d7ebb734c Merge "change 'compute' to 'worker' in collect utils" 2019-01-10 15:12:24 +00:00
Zuul 1ac4479e80 Merge "Add NTP server monitoring as a collectd plugin" 2019-01-10 15:11:53 +00:00
Eric MacDonald 4d7c958711 Add NTP server monitoring as a collectd plugin
This update replaces the currently existing but disabled
ntpq.py plugin with one that does not rely on an external
query_ntp_servers.sh.

This new ntpq.py is an entirely new, self-contained
implementation of what rmon and query_ntp_servers.sh
were doing, but now more efficiently, all in one Python
plugin file.

Story: 2002823
Task: 22859

Test Plan:
PASS: Verify handling of one and two unreachable NTP servers.
PASS: Verify handling of pingable but not an NTP server.
PASS: Verify NTP server re-provisioning from unreachable to reachable server.
PASS: Verify NTP server re-provisioning from reachable to unreachable server.
PASS: Verify NTP server alarms suppressed while controller is locked.
PASS: Verify NTP asserted alarms show up on unlock until cleared.
PASS: Verify NTP server monitoring occurs on controller only.
PASS: Verify NTP unreachable server alarms are cleared over a collectd restart
PASS: Verify NTP minor IP alarms are cleared on process startup
PASS: Verify NTP minor IP alarm clear retries when FM call fails on process startup.
PASS: Verify NTP alarm assertion retry while FM call fails at runtime.
PASS: Verify NTP alarm clear retry while FM call fails at runtime.
PASS: Verify NTP monitoring after controller Swact.
PASS: Verify NTP monitoring cadence is every 10 minutes.
PASS: Verify NTP plugin logs are useful and assist debug without flooding.

Change-Id: I67c4c5518a6e5dec64b4e419ab7ee2ffcefb9bf3
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-09 12:47:00 -05:00
Zuul 40905d8dd8 Merge "Adding a pylint tox and zuul job for stx-integ" 2019-01-08 22:24:24 +00:00
Jerry Sun bfca86f5d9 Remove Docker Registry Token Server From Build
Remove Docker Registry Token Server from build for now. Currently,
it needs network access to build, which doesn't work for some people.
Removing it from the build for now to decide how we want to rework
this.

Story: 2002840
Task: 22783

Change-Id: I7991f68288b45255ea850110ce24087297c185ca
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2019-01-08 15:24:27 -05:00
Zuul 0dcd28bea1 Merge "Upversion helm spec to v2.12.1" 2019-01-08 17:18:10 +00:00
Al Bailey 7fbddc4096 Adding a pylint tox and zuul job for stx-integ
The failing pylint warnings and errors are currently
suppressed.  They will be fixed by subsequent commits.

Story: 2004515
Task: 28791
Change-Id: I93a89554bf2dfbd9d1cbd96728a7663c408a79b1
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-01-08 11:14:46 -06:00
Zuul 293f483aea Merge "Add Docker Registry Token Server" 2019-01-08 17:12:22 +00:00