This makes some cleanup changes to the DevStack plugin for the
change to master and bionic.
* Define values for precedence handling by the upstream devstack playbook
* Add STX_INST_DIR for a deterministic install location
* Add stx-update to required plugins
* Sort functions in the main plugin
* Clean up comments in a couple of places
Change-Id: I147882e8980f3e1b599008205db268eb9e7736b0
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
Override the docker service so that systemd:
1. creates/removes the /var/run/dockerd.pid file at service start/stop.
2. does not restart the service automatically on exit or failure.
Deploy docker.conf so that pmond monitors the docker service.
Story: 2002843
Task: 29391
Change-Id: I3595d0d4f97d90e4119fc1455bcf164aebc5d6ec
Signed-off-by: Bin Qian <bin.qian@windriver.com>
This update adds titled support to the existing
Platform Memory monitor collectd plugin.
Instance Mapping

Plugin Refinements                      Instance Name
--------------------------------------  ---------------
Platform Memory                         platform
Platform Memory Numa Node 0             node0
Platform Memory Numa Node 1             node1
Platform Memory Numa Node 0 Huge Pages  node0_hugepages
Platform Memory Numa Node 1 Huge Pages  node1_hugepages
New Alarm Entity IDs added to existing 100.103 alarm ID
host=<hostname>.numa=node0
host=<hostname>.numa=node1
host=<hostname>.numa=node0_hugepages
host=<hostname>.numa=node1_hugepages
Modified the memory plugin thresholds and added alarm notifier
support for collectd's requirement that samples be greater than
('gt'), rather than greater than or equal to ('ge'), the specified
threshold for a severity change.
This update also corrects a few subtle pep8 warnings in
a few of the existing Python plugins.
There is no need for an rmond update because numa and
huge page monitoring was never enabled in rmond.
Story: 2002823
Task: 29369
PASS: Verify logging of all memory instance types
PASS: Verify monitoring of new numa node memory
PASS: Verify monitoring of new numa node huge page memory
PASS: Verify memory instance alarm handling in fm notifier
PASS: Verify memory instance alarm load on startup
PASS: Verify memory instance alarm clear ; runtime condition gone
PASS: Verify memory instance alarm clear ; startup condition gone
Regression:
PASS: Verify End-To-End Sample Collection for all monitored resources.
Corner Case:
PASS: Verify alarm reporting with threshold of zero
PROG: Verify memory alarm raised at threshold value
PASS: Verify memory alarm cleared 1 below threshold value
PASS: Verify above case for both major and critical thresholds
Change-Id: I4e2612ac7b3d906be4b0a140286dbbb095ce7e1b
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Closes-Bug: 1814360
Test Case:
Deploy 2 controller and 1 compute on bare metal
Change-Id: I4ec59180a28ac743935601332cb8f210e87e4a85
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
Generate/clear kubelet.pid file at start/stop of kubelet service
Story: 2002843
Task: 29216
Change-Id: I41206c7ea14d79b5d0cbca945e7a6488eda9b7bb
Signed-off-by: Bin Qian <bin.qian@windriver.com>
This update enhances the DRBD OCF script to check whether
a filesystem is mounted before attempting to demote it from
Primary to Secondary. If the filesystem is still in use, the
demotion attempt results in DRBD state change failures being
reported to the console.
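
The mount check described above can be sketched as follows. This is a
Python illustration only, not the shell code of the OCF script; the
function names, device names and sample mount table are hypothetical,
and the input is assumed to be in /proc/mounts format.

```python
def mounted_devices(mounts_text):
    """Return the set of device names found in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each /proc/mounts line is the device.
            devices.add(fields[0])
    return devices


def can_demote(device, mounts_text):
    """Demotion is only attempted when nothing on the device is mounted."""
    return device not in mounted_devices(mounts_text)


# Hypothetical mount table used only for this illustration.
SAMPLE_MOUNTS = (
    "/dev/drbd0 /var/lib/postgresql ext4 rw,noatime 0 0\n"
    "tmpfs /run tmpfs rw,nosuid,nodev 0 0\n"
)

print(can_demote("/dev/drbd0", SAMPLE_MOUNTS))  # still mounted
print(can_demote("/dev/drbd1", SAMPLE_MOUNTS))  # not mounted
```
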
Change-Id: Ie5abe5d0858f75bd0d31ce8d8d1d04e7beb83132
Story: 2004520
Task: 29398
Signed-off-by: Don Penney <don.penney@windriver.com>
This update introduces interface monitoring for oam,
mgmt and infra networks as a collectd plugin.
The interface plugin runs and queries the new maintenance
Link Monitor daemon for Link Model and Information every
10 seconds.
The plugin then manages port and interface alarms based on the
link model, similar to how rmon did in the past.
Severity: Interface and Port levels

Alarm Level  Minor  Major                  Critical
-----------  -----  ---------------------  ----------------------------
Interface    N/A    One of lag pair is Up  All Interface ports are Down
Port         N/A    Physical Link is Down  N/A
Degrade support for interface monitoring is added to the mtce
degrade notifier. Any link down condition results in a host
degrade condition, as was the case in rmon.
Sample Data: represented as % of total links Up for that network interface
100 or 100% percent used - all links of interface are up.
50 or 50% percent used - one of lag pair is Up and the other is Down
0 or 0% percent used - all ports for that network are Down
The plugin documents all of this in its header.
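
The sample and severity rules above can be sketched as follows. This is
a minimal illustration, not the plugin's code; the function names and
the use of a list of booleans (True meaning a link is Up) are
assumptions made for the example.

```python
def percent_links_up(link_states):
    """Sample value: percent of an interface's links that are Up."""
    if not link_states:
        return 0
    return 100 * sum(link_states) // len(link_states)


def interface_severity(link_states):
    """Interface-level alarm severity derived from the link states."""
    if not any(link_states):
        return "critical"  # all interface ports are Down
    if not all(link_states):
        return "major"     # one of a lag pair is Up, the other Down
    return "clear"         # every link is Up


def port_severity(link_up):
    """Port-level alarm severity: a Down physical link is major."""
    return "clear" if link_up else "major"


lag_pair = [True, False]
print(percent_links_up(lag_pair), interface_severity(lag_pair))
```
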
This update also
1. Adds the new lmond process to syslog-ng config file.
2. Adds the new lmond process to the mtce patch script.
3. Modifies the cpu, df and memory threshold settings by -1.
rmon thresholds were precise whereas collectd requires
that the samples cross the thresholds, not just meet them.
So for example, in terms of a 90% usage action the
threshold needs to be 89.
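
The threshold adjustment in item 3 can be illustrated with a short
sketch. The function names are hypothetical; the only behaviour taken
from the text above is that a sample must exceed the configured
threshold, not just meet it.

```python
def collectd_threshold(action_percent):
    """Configured threshold for an action intended at action_percent."""
    return action_percent - 1


def crosses(sample, threshold):
    """A sample counts only when it is strictly greater than the threshold."""
    return sample > threshold


threshold = collectd_threshold(90)
print(threshold)
print(crosses(90, threshold))  # a 90% sample crosses the 89 threshold
print(crosses(89, threshold))  # an 89% sample only meets it
```
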
Test Plan: (WIP but almost complete)
PASS: Verify interface plugin startup
PASS: Verify interface plugin logging
PASS: Verify interface plugin Link Status Query and response handling
PASS: Verify monitor, sample storage and grafana display
PASS: Verify port and interface alarms match what rmon produced
PASS: Verify lmon port config from manifest configured plugin
PASS: Verify lmon port config from lmon.conf
PASS: Verify single interface failure handling and recovery
PASS: Verify lagged interface failure handling and recovery
PASS: Verify link loss of lagged interface shared between mgmt and oam (hp380)
PASS: Verify network interface failure handling ; single port
PASS: Verify network interface degrade handling ; lagged interface
PEND: Verify network interface degrade handling ; vlan interface
PASS: Verify HTTP request timeout period and handling
PASS: Verify link status query failure handling - invalid uri (timeout)
PASS: Verify link status query failure handling - missing uri (timeout)
PASS: Verify link status query failure handling - status fail
PASS: Verify link status query failure handling - bad json resp
Change-Id: I2e2dfe6ddfa06a46770245540c7153d330bdf196
Story: 2002823
Task: 28635
Depends-On: https://review.openstack.org/#/c/633264
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
In order to avoid conflicts with containerized services
binding to standard HTTP (80) / HTTPS (443) port numbers,
the default port numbers are changed to 8080 and 8443.
Lighttpd port configuration is performed through puppet,
and the packaged lighttpd.conf uses port 80. As a result,
lighttpd is bound to port 80 before config_controller is run,
which prevents patching before running config_controller.
This update changes the default http port to 8080 in the
packaged lighttpd.conf.
8080 is the http port and 8008 is the horizon port. The default
config file is changed here to be consistent with the port
number configured via puppet.
Story: 2004642
Task: 29300
Depends-On: https://review.openstack.org/#/c/634237/
Change-Id: I52b8f602dc2349ffabd9b90344dfafaf703ee4d7
Signed-off-by: Tao Liu <tao.liu@windriver.com>
In ceph-10.2.6, the ceph init script uses systemd-run to launch
ceph-mon and ceph-osd services. This generates transient systemd
service files with basic configuration. On node shutdown, ceph is
being shut down while it is still in use by containers, and without
unmapping the RBD devices, causing the libceph kernel module to
hang trying to communicate with the ceph monitor.
This update patches the ceph init script to generate systemd
override config files for the ceph-mon and ceph-osd services that
provide improved ordering during shutdown, as well as a script to run
as part of the docker.service shutdown (by packaging a systemd
override) to unmap the RBD devices. This ordering ensures kubelet
and docker services are shut down first, then the RBD devices are
cleaned up, followed by the shutdown of the ceph services and
service management (SM). Once kubelet and docker have shut down,
the ceph-preshutdown.sh script is able to cleanly unmount and
unmap the RBD devices and unload the rbd and libceph
kernel modules.
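
The cleanup order described above can be sketched as a command plan.
This is an illustration only and does not reproduce the contents of
ceph-preshutdown.sh; the function name, the /dev/rbd<N> device naming
and the choice of umount, rbd unmap and modprobe -r are assumptions
made for the example.

```python
def preshutdown_plan(rbd_device_ids):
    """Build the ordered list of commands for RBD cleanup.

    Each device is unmounted and then unmapped; the kernel modules are
    unloaded only after every device has been released.
    """
    plan = []
    for dev_id in rbd_device_ids:
        device = "/dev/rbd%s" % dev_id
        plan.append(["umount", device])
        plan.append(["rbd", "unmap", device])
    plan.append(["modprobe", "-r", "rbd"])
    plan.append(["modprobe", "-r", "libceph"])
    return plan


# Print the plan instead of running it; the device ids are hypothetical.
for command in preshutdown_plan(["0", "1"]):
    print(" ".join(command))
```
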
In ceph-11.0.1, the use of systemd-run was replaced with proper
systemd service configuration files. Once ceph is upgraded for
StarlingX, the ordering and cleanup will need to be revisited.
Story: 2004520
Task: 28258
Change-Id: I6f7d7b9e704121c54211afd86b38df015b8d7a63
Signed-off-by: Don Penney <don.penney@windriver.com>
Many plugins need support for on-demand instance sampling
and alarming. The filesystem and memory monitoring plugins
are perfect examples. The number of numa nodes or monitored
file systems varies from host to host.
This update adds on-demand instance support. Any plugin
can now support multiple instances. As new plugin
instances are learned, memory is allocated for them
and linked to that plugin's base object. Each is managed as a
separate instance but within the scope of its parent.
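
The on-demand instance handling described above can be sketched as
follows. This is a minimal illustration, not the plugin code; the class
names, attributes and sample values are hypothetical.

```python
class PluginInstance:
    """State kept for one learned instance, e.g. one numa node."""

    def __init__(self, name):
        self.name = name
        self.last_sample = None
        self.alarm_severity = "clear"


class PluginObject:
    """Base object for a plugin; owns the instances learned at runtime."""

    def __init__(self, plugin_name):
        self.plugin_name = plugin_name
        self.instances = {}

    def get_instance(self, name):
        # Allocate the instance the first time its name is seen and
        # link it to this base object; reuse it afterwards.
        if name not in self.instances:
            self.instances[name] = PluginInstance(name)
        return self.instances[name]

    def record(self, name, value):
        instance = self.get_instance(name)
        instance.last_sample = value
        return instance


memory = PluginObject("memory")
memory.record("node0", 41.5)
memory.record("node1", 12.0)
memory.record("node0", 43.0)
print(sorted(memory.instances))
```
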
The following additional enhancements were made to the common
alarm and degrade plugins.
1. added /opt/etcd as a new monitored filesystem.
2. added common support for vswitch alarm/degrade handling.
3. a few general cleanup changes for code maintainability.
Change-Id: I05b4de78f30fc27362c63b6dbfc97268d6588e4f
Story: 2002823
Task: 29297
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Refactor low latency compute per-cpu power management
out of stx-nova into libvirt qemu hook
Story: 2004610
Task: 28508
Change-Id: I80432b36c4e71d957db51f1742ef87fb519acce2
Signed-off-by: Daniel Chavolla <daniel.chavolla@windriver.com>
Update conf.py for release notes to include the project
variable, set to the project name. This is so the string
above the left nav renders the project name.
Story: 2004900
Task: 29230
Change-Id: Ib112654387cd8873f805ede225df1bf3e292c697
Signed-off-by: Kristal Dale <kristal.dale@intel.com>
The integrity tarball in my local mirror was wrong, which made the
patch incorrect. Correct the patch using the right tarball.
Story: 2004521
Task: 29194
Change-Id: Iee0e7afa12b8583d1bb3d620a5f7626a28f57fed
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Port an upstream patch to fix the build failure with the CentOS 7.6
kernel. Upgrading the tpm driver to pick up this patch would cause
other build failures, because some structures are missing in the 957
kernel. The upstream patch is therefore backported instead of
upgrading the tpm driver.
Depends-On: https://review.openstack.org/625785
Depends-On: https://review.openstack.org/625786
Story: 2004521
Task: 28534
Change-Id: I00d88f4d27ac47107825a17b3bf6d8c74194a7ff
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
In order to support running the galera arbitrator we need to add
it to the openstack-helm mariadb docker image. This means building
our own docker image for now.
I've talked with "jayahn" on the openstack-helm IRC channel, and they
said they had no objection to adding galera-arbitrator to their mariadb
image, so we should upstream it as soon as possible. Once it's
upstreamed we can remove this.
Change-Id: I6ab2607abcd8e0d130ef80fbd1979c62a20a6ff4
Story: 2004712
Task: 29053
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
By moving STX-specific files from stx-ceph to stx-integ, we
decouple STX code from the upstream ceph repo. When making
changes to those STX files, we no longer need to make a pull
request in the stx-ceph repo.
Change-Id: Ifaaae452798561ddfa7557cf59b072535bec7687
Story: 2002844
Task: 28993
Signed-off-by: Wei Zhou <wei.zhou@windriver.com>
Port an upstream patch to fix the build failure with the new kernel.
Depends-On: https://review.openstack.org/625785
Depends-On: https://review.openstack.org/625786
Story: 2004521
Task: 28584
Change-Id: I261d2d9534d90064d250ffabc11221caadcc2a04
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>