This update adds the ipsec auth server pmon configuration file to the
mtce-control package. The pmon configuration file is only needed
on controller nodes, as ipsec-server runs on controllers only.
Test Plan:
PASS: In a deployed system, verify ipsec-server is running
PASS: Kill the ipsec-server process, verify that it is restarted
by pmon.
Story: 2010940
Task: 49484
Co-Authored-By: Andy Ning <andy.ning@windriver.com>
Change-Id: Iadb9ca6f086640d008880a21cfd97256b00ab7ab
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
Update debian package versions to use git commits for:
- mtce (old 9, new 30)
- mtce-common (old 1, new 9)
- mtce-compute (old 3, new 4)
- mtce-control (old 7, new 10)
- mtce-storage (old 3, new 4)
The Debian packaging has been changed to reflect all the
git commits under the directory, and not just the commits
to the metadata folder.
This ensures that any new code submissions under those
directories will increment the versions.
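As a rough illustration of the versioning idea above (not the actual
build tooling), the revision number can be derived by counting the
commits that touch a package directory. The function names and the
base_rev parameter are hypothetical; "git rev-list --count" with a
path limiter is a standard git invocation.

```python
import subprocess


def rev_count_command(pkg_dir, base_rev=None):
    """Build the git command that counts commits touching pkg_dir.

    base_rev is an optional starting commit (hypothetical parameter);
    when given, only commits after it are counted.
    """
    rev_range = f"{base_rev}..HEAD" if base_rev else "HEAD"
    return ["git", "rev-list", "--count", rev_range, "--", pkg_dir]


def rev_count(pkg_dir, base_rev=None, repo="."):
    """Run the command inside repo and return the commit count as an int."""
    out = subprocess.run(
        rev_count_command(pkg_dir, base_rev),
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())
```

Because the path limiter covers the whole directory, a commit to any
file under it raises the count, not only a commit to the packaging
metadata.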
Test Plan:
PASS: build-pkgs -p mtce
PASS: build-pkgs -p mtce-common
PASS: build-pkgs -p mtce-compute
PASS: build-pkgs -p mtce-control
PASS: build-pkgs -p mtce-storage
Story: 2010550
Task: 47401
Task: 47402
Task: 47403
Task: 47404
Task: 47405
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I4846804320b0ad3ec10799a468a9ee3bf7973587
Remove the installation of per-package preset installs
since they are centrally managed now by the ISO install
for the following packages:
- mtce-compute
- mtce-control
- mtce-storage
Story: 2009968
Task: 46406
Test Plan
PASS Build package
PASS Build ISO
PASS Check for nonexistent preset file in /etc/systemd/system-preset
Depends-On: https://review.opendev.org/c/starlingx/integ/+/853653
Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: Ica1a99efe2336fdb6096086f46189dfd25efc6e1
Removed conf files from /etc/pmon.d/
as they are being moved to another location.
This is part of an effort to allow pmon conf files
to be selected at runtime by kickstarts.
The change is debian-only, since centos support
will be dropped soon.
Centos' pmon conf files remain in /etc/pmon.d/
Test Plan:
PASS - deb doesn't install anything to /etc/pmon.d/
PASS - rpm files unchanged
PASS - AIOSX unlocked-enabled-available
PASS - Standard 2+2 unlocked-enabled-available
Story: 2010211
Task: 46306
Depends-On: https://review.opendev.org/c/starlingx/metal/+/855095
Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com>
Change-Id: I086db0750df5626d2a8ba1010153ce4f45535ca5
Created a duplicate install of /etc/pmon.d/*.conf files
to /usr/share/starlingx/pmon.d/
This is part of an effort to allow pmon conf files
to be selected at runtime by kickstarts.
Test Plan:
PASS: duplicate conf on deb
Story: 2010211
Task: 46112
Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com>
Change-Id: Ie07c1bfa370da5b2ec71fe3fce948d59be1dd098
- Ensure that the service is started when the package
is installed.
- Ensure that the service dependencies are started
when the package is installed.
- Simplify debian/rules to use the Makefile in order
to install the files that are needed.
Test Plan
PASS Build package and ISO
PASS Boot and check for goenabled-control.service
Story: 2009101
Task: 43023
Signed-off-by: Chuck Short <charles.short@windriver.com>
Change-Id: I3863042357257ffbcfaf8084da2f44853e0b6264
Modified mtce and mtce-control to address the following
failing services on Debian:
hbsAgent.service
hbsClient.service
hwmon.service
lmon.service
mtcalarm.service
mtclog.service
runservices.service
Applied fix:
- Included modified .service files for debian
directly into the deb_folder.
- Changed the init files to account for the different
locations of the init-functions and service daemons
on Debian and CentOS
- Included "override_dh_installsystemd" section
to rules in order to start services at boot.
Test Plan:
PASS: Package installed and ISO built successfully
PASS: Ran "systemctl list-units --failed" and verified that the
services are not failing
PASS: Ran "systemctl status <service_name>" for
each service and verified that they are active
Story: 2009101
Task: 44192
Signed-off-by: Matheus Machado Guilhermino <Matheus.MachadoGuilhermino@windriver.com>
Change-Id: I50915c17d6f50f5e20e6448d3e75bfe54a75acc0
Some of the code used TRUE instead of true, which did not compile
on Debian. These instances were changed to true.
Some #define constants generated narrowing errors because their
values are negative as 32-bit integers. These values are now
explicitly cast to int in the case statements that caused the errors.
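To illustrate why such constants trigger narrowing errors, the sketch
below reinterprets an unsigned 32-bit value as a signed 32-bit integer.
The constant is a made-up example, not one of the actual #define
values from the code.

```python
import ctypes

# Hypothetical constant; the real #define values are not shown here.
EXAMPLE_CONSTANT = 0xFFFFFFFE

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_in_int32(value):
    """True when value can be stored in a signed 32-bit int unchanged."""
    return INT32_MIN <= value <= INT32_MAX


def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit int."""
    return ctypes.c_int32(value).value
```

Here fits_in_int32(EXAMPLE_CONSTANT) is False while
as_int32(EXAMPLE_CONSTANT) is -2: the value only becomes a valid int
after an explicit conversion, which is what the added casts do.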
Story: 2009101
Task: 43426
Signed-off-by: Tracey Bogue <tracey.bogue@windriver.com>
Change-Id: Iffc4305660779010969e0c506d4ef46e1ebc2c71
The current heartbeat cluster state change notification
needs to be sent when heartbeat pulses begin to be missed
rather than only after the host has reached the Heartbeat
Loss threshold. This buys SM more time, almost a full
second, and in doing so provides more accurate data for
it to make its SM heartbeat failure handling decisions.
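A minimal sketch of the earlier notification, assuming a per-host
counter of consecutive missed pulses. The class name, the event
strings and the "recovered" event are illustrative assumptions and
are not taken from the hbsAgent code.

```python
class MissTracker:
    """Track consecutive missed pulses per host and report state changes."""

    def __init__(self, loss_threshold):
        # Hypothetical stand-in for the Heartbeat Loss threshold.
        self.loss_threshold = loss_threshold
        self.misses = {}

    def record(self, host, responded):
        """Record one pulse period; return the notification to send, if any."""
        previous = self.misses.get(host, 0)
        current = 0 if responded else previous + 1
        self.misses[host] = current
        if previous == 0 and current == 1:
            return "missing"    # early notification: pulses begin to be missed
        if current == self.loss_threshold:
            return "loss"       # host reached the loss threshold
        if previous > 0 and current == 0:
            return "recovered"  # assumed event, for illustration only
        return None
```

With a threshold of 3, the "missing" notification is produced two
pulse periods before "loss", which is the extra time given to SM.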
This update also begins sending maintenance heartbeat
cluster state change notifications just before the next
multicast pulse request but after the cluster vault is
updated from the last pulse period. This ensures that
SM gets the most up-to-date cluster information.
This update also changes the hbsAgent's service file
to depend on the local hbsClient. By doing so, the
hbsAgent shuts down earlier over a graceful reboot
thereby preventing the hbsAgent from continuing to
report healthy response to the inactive controller
during active controller shutdown.
This way the inactive SM sees the failed active
controller when it queries the cluster in its
fail-pending state resulting in an inactive SM
take-over rather than stand-down.
Additional hbsAgent service file changes prevent systemd
from auto-recovering a failed hbsAgent process, as it is
monitored and managed by pmond, and fix the ExecStop
command line.
Test Plan:
PASS: Verify active controller graceful reboot.
Standby controller takes over rather than shutting down
- 30 of 30 iterations
PASS: Verify active controller forced reboot
PASS: Verify enabled standby controller graceful reboot
PASS: Verify Standard System install
PASS: Verify AIO DX system install
Regression:
PASS: Verify SM Uncontrolled Swact if active
controller Mgmnt link drops.
PASS: Verify handling of downed cluster interface in
- AIO DX (fail) and Standard (degrade) system
PASS: Verify no coredumps
PASS: Verify update as a patch
Change-Id: I6869631e091eb28a3cbb6f15d9a8ccd939c54410
Closes-Bug: 1906556
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Maintenance heartbeat service should not be multicast
messaging over an 'lo' interface which in IPv6 leads
to socket failures, log flooding and the inability to
detect and report pmond process failure.
To fix that, this update
- configures pulse messaging to unicast for monitored
networks configured as 'lo'.
- prevents heartbeating over the cluster network if it
and the management network are both configured on
the 'lo' interface.
- improves logging to avoid flooding in the presence of
socket setup or access errors.
- stops logging netlink events (interface state changes)
on unmonitored network interfaces.
- maintains heartbeat disabled state until the management
network is up.
- modifies hbsAgent socket failure handling and its pmon
conf file so that a persistent socket failure during
startup is alarmed as an hbsAgent process failure.
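The interface rules in the list above can be sketched as a small
decision function. This is an illustration under assumptions, not
the hbsAgent implementation; the function name, the dictionary keys
and the interface names used in the example call are hypothetical.

```python
LOOPBACK = "lo"


def plan_heartbeat(mgmt_iface, cluster_iface):
    """Return the pulse mode per monitored network.

    A value of None means the network is not heartbeated.
    """
    def mode(iface):
        # Multicast over 'lo' is what led to the socket failures.
        return "unicast" if iface == LOOPBACK else "multicast"

    plan = {"mgmt": mode(mgmt_iface), "cluster": mode(cluster_iface)}
    # Skip the cluster network when both networks sit on the loopback.
    if mgmt_iface == LOOPBACK and cluster_iface == LOOPBACK:
        plan["cluster"] = None
    return plan
```

For example, plan_heartbeat("eth0", "lo") keeps multicast on the
management network and switches the cluster network to unicast.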
Test Plan:
PASS: Verify logging over system install and socket errors
PASS: Verify unicast messaging when cluster is set to 'lo'
PASS: Verify no cluster network heartbeat when it and mgmnt
are set to 'lo'.
Regression:
PASS: Verify heartbeat messaging and cluster info
PASS: Verify pmond process failure alarm management
PASS: Verify heartbeat failure detection and graceful recovery
PASS: Verify AIO SX IPv6 system install and run
PASS: Verify AIO DX IPv6 system install and run
PASS: Verify Standard IPv6 system install and run
PASS: Verify Storage system IPv6 install and run
PASS: Verify Storage system IPv4 install and run
PASS: Verify MNFA handling in IPv6 storage system
Change-Id: I5a2a0b2dee0c690617c4e0b0e2ab8b1172b2dc49
Closes-Bug: 1884585
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
This update makes use of the PKG_GITREVCOUNT variable
to auto-version the mtce packages in this repo.
Change-Id: Ifb4da4570e0261bbdcf0d7af79b8add7cfc133ac
Story: 2006166
Task: 39822
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
1. Rename Titanium Cloud to StarlingX for .spec files
2. Rename Titanium Cloud to StarlingX for .service file
Test:
After the de-brand change, bootimage.iso was built in the flock layer
and installed on the dev machine to validate the changes.
Please note that the de-brand changes are being done in batches; this is batch 1.
Story: 2006387
Task: 36207
Change-Id: Ifa4dc5c7aa3189815e00b796fc833852e88c8fe3
Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>
The openSUSE spec files need to have the path of the source code in
the setup to have the package generation automated through the _service
file in OBS.
Change-Id: I2b7c08d5772025c02821dfb9fc944fff0f5b6f90
Story: 2006508
Task: 36812
Signed-off-by: Marcela Rosales <marcela.a.rosales.jimenez@intel.com>
It is required for the goenabled and hbsAgent scripts headers to be
compliant with LSB in order to build on OBS infrastructure.
Story: 2005684
Task: 33442
Change-Id: Ic1ad5722b725c04d91f1650065faca3dc7b5c2c9
Signed-off-by: Hayde Martinez <hayde.martinez.landa@intel.com>
This update introduces mtce changes to support Active-Active Heartbeating.
The purpose of Active-Active Heartbeating is to help avoid Split-Brain.
Active-Active heartbeating has each controller maintain a 5 second
heartbeat response history cache of each network for all monitored
hosts as well as the on-going health of storage-0 if provisioned and
enabled.
This is referred to as the 'heartbeat cluster history'
Each controller then includes its cluster history in each heartbeat
pulse request message.
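One way to picture the 'heartbeat cluster history' is a fixed-length
window per monitored network. The sketch below assumes one entry per
second, giving five entries for the 5 second history; the class and
method names are hypothetical and the real message layout is not
shown here.

```python
from collections import deque

HISTORY_PERIODS = 5  # assumes one entry per second of the 5 second history


class ClusterHistory:
    """Per-network record of which hosts responded in recent periods."""

    def __init__(self, networks):
        self.history = {net: deque(maxlen=HISTORY_PERIODS) for net in networks}

    def record_period(self, network, responding_hosts):
        """Append one period's responders; the oldest period is dropped."""
        self.history[network].append(frozenset(responding_hosts))

    def snapshot(self):
        """Return a plain copy suitable for embedding in a pulse request."""
        return {net: list(periods) for net, periods in self.history.items()}
```

The bounded deque discards the oldest period automatically, so the
snapshot never holds more than the last five periods.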
The hbsClient, now modified to handle heartbeat from both controllers,
saves each controller's heartbeat cluster history in a local cache and
criss-crosses the data in its pulse responses.
So when the hbsClient receives a pulse request from controller-0 it
saves its reported history and then replaces that history information
in its response to controller-0 with what it saved from controller-1's
last pulse request; i.e. its view of the system.
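The criss-cross exchange described above can be sketched as follows.
This is a simplified model, not the hbsClient code: the class name,
the response dictionary and the use of plain strings for the history
payload are assumptions made for illustration.

```python
PEER = {"controller-0": "controller-1", "controller-1": "controller-0"}


class PulseClient:
    """Keeps the last history reported by each controller."""

    def __init__(self):
        self.saved = {}  # controller name -> last reported history

    def handle_request(self, controller, history):
        """Save the sender's history and answer with its peer's history."""
        self.saved[controller] = history
        # None until the peer controller has sent at least one request.
        peer_history = self.saved.get(PEER[controller])
        return {"to": controller, "history": peer_history}
```

Each controller therefore receives the other controller's view in the
pulse response, never an echo of its own.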
Controller-0, receiving a host's pulse response, saves its peer's
heartbeat cluster history so that it has a summary of heartbeat
cluster history for the last 5 seconds for each monitored network
of every monitored host in the system from both controllers'
perspectives. Same for controller-1 with controller-0's history.
The hbsAgent is then further enhanced to support a query request
for this information.
So now SM, when it needs to make a decision to avoid Split-Brain
or otherwise, can query either controller for its heartbeat cluster
history and get the last 5 second summary view of heartbeat (network)
responsiveness from both controllers' perspectives to help decide which
controller to make active.
This involved removing the hbsAgent process from SM control and monitor
and adding a new hbsAgent LSB init script for process launch, service
file to run the init script and pmon config file for hbsAgent process
monitoring.
With hbsAgent now running on both controllers, changes to maintenance
were required to send inventory to hbsAgent on both controllers,
listen for hbsAgent event messages over the management interface
and inform both hbsAgents which controller is active.
The hbsAgent running on the inactive controller
- does not send heartbeat events to maintenance
- does not raise or clear alarms or produce customer logs
Test Plan:
Feature:
PASS: Verify hbsAgent runs on both controllers
PASS: Verify hbsAgent as pmon monitored process (not SM)
PASS: Verify system install and cluster collection in all system types (10+)
PASS: Verify active controller hbsAgent detects and handles heartbeat loss
PASS: Verify inactive controller hbsAgent detects and logs heartbeat loss
PASS: Verify heartbeat cluster history collection functions properly.
PASS: Verify storage-0 state tracking in cluster info.
PASS: Verify storage-0 not responding handling
PASS: Verify heartbeat response is sent back to only the requesting controller.
PASS: Verify heartbeat history is correct from each controller
PASS: Verify MNFA from active controller after install to controller-0
PASS: Verify MNFA from active controller after swact to controller-1
PASS: Verify MNFA for 80%+ of the hosts in the storage system
PASS: Verify SM cluster query operation and content from both controllers
PASS: Verify restart of inactive hbsAgent doesn't clear existing heartbeat alarms
Logging:
PASS: Verify cluster info logs.
PASS: Verify feature design logging.
PASS: Verify hbsAgent and hbsClient design logs on all hosts add value
PASS: Verify design logging from both controllers in heartbeat loss case
PASS: Verify design logging from both controllers in MNFA case
PASS: Verify clog logs cluster info vault status and updates for controllers
PASS: Verify clog1 logs full cluster state change for all hosts
PASS: Verify clog2 logs cluster info save/append logs for controllers
PASS: Verify clog3 memory dumps a cluster history
PASS: Verify USR2 forces heartbeat and cluster info log dump
PASS: Verify hourly heartbeat and cluster info log dump
PASS: Verify loss events force heartbeat and cluster info log dump
Regression:
PASS: Verify Large System DOR
PASS: Verify pmond regression test that now includes hbsAgent
PASS: Verify Lock/Unlock of inactive controller (x3)
PASS: Verify Swact behavior (x10)
PASS: Verify compute Lock/Unlock
PASS: Verify storage-0 Lock/Unlock
PASS: Verify compute Host Failure and Graceful Recovery
PASS: Verify Graceful Recovery Retry to Max:3 then Full Enable
PASS: Verify Delete Host
PASS: Verify Patching hbsAgent and hbsClient
PASS: Verify event driven cluster push
Story: 2003576
Task: 24907
Change-Id: I5baf5bcca23601a99473d039356d58250ffb01b5
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
There are 2 duplicated LICENSE files in mtce-control, mtce-compute,
and mtce-storage. Additionally, LICENSE was not placed in the root
directory of the src RPM, so this patch is made as an enhancement or fix.
After this change, license file location and code structure in all 4
modules (mtce-common, mtce-compute, mtce-storage and mtce-control)
will be the same.
Test method: make a clean build and check src RPM and binary RPM
to assure there is only one LICENSE in correct place.
Story: 2004186
Task: 27676
Change-Id: Id71a7450e8b45438c5d15976ae8e853b9ba8f4f5
Signed-off-by: Yong Hu <yong.hu@intel.com>
Rename files and folders in mtce-compute, mtce-control, and
mtce-storage. Also update the package names in the bsp-files/
filter_out_* scripts accordingly.
Story: 2004079
Task: 27485
Change-Id: Ic1e9bd4bb8d72f30ddcc2a2bfc602a1a34e583da
Signed-off-by: Yong Hu <yong.hu@intel.com>