Commit Graph

7 Commits

Author SHA1 Message Date
Eric MacDonald 14bb67789e Add pxeboot network mtcAlive messaging to Maintenance
The introduction of the new pxeboot network requires maintenance
verify and report on messaging failures over that network.

Towards that, this update introduces periodic mtcAlive messaging
between the mtcAgent and mtcClinet.

Test Plan:

PASS: Verify install and provision each system type with a mix
             of networking modes ; ethernet, bond and vlan
             - AIO SX, AIO DX, AIO DX plus
             - Standard System 2+1
             - Storage System 2+1+1
PASS: Verify feature with physical on management interface
PASS: Verify feature with vlan on management interface
PASS: Verify feature with bonded management interface
PASS: Verify feature with bonded vlans on management interface
PASS: Verify in bonded cases handling with 2, 1 or no slaves found
PASS: Verify mgmt-combined or separate cluster-host network
PASS: Verify mtcClient pxeboot interface address learning
             - for worker and storage nodes       ; dhcp leases file
             - for controller nodes before unlock ; dhcp leases file
             - for controller nodes after unlock  ; static from ifcfg
             - from controller within 10 seconds of process restart
PASS: Verify mtcAgent pxeboot interface address learning from
             dnsmasq.hosts file
PASS: Verify pxeboot mtcAlive initiation, handling, loss detection
             and recovery
PASS: Verify success and failure handling of all new pxeboot ip
             address learning functions ;
             - dhcp - all system node installs.
             - dnsmasq.hosts - active controller for all hosts.
             - interfaces.d - controller's mtcClient pxeboot address.
             - pxeboot req mtcAlive - mtcAgent mtcAlive request message.
PASS: Verify mtcClient pxeboot network 'mtcAlive request' and 'reboot'
             command handling for ethernet, vlan and bond configs.
PASS: Verify mtcAlive sequence number monitoring, out-of-sequence
             detection, handling and logging.
PASS: Verify pxeboot rx socket binding and non-blocking attribute
PASS: Verify mtcAgent handling stress soaking of sustained incoming
             500+ msgs/sec ; batch handling and logging.
PASS: Verify mtcAgent and mtcClient pxeboot tx and rx socket messaging,
             failure recovery handling and logging.
PASS: Verify pxeboot receiver is not setup on the oam interface on
             controller-0 first install until after initial config
             complete.

Regression:

PASS: Verify mtcAgent/mtcClient online and offline state management
PASS: Verify mtcAgent/mtcClient command handling
      - over management network
      - over cluster-host network
PASS: Verify mtcClient interface chain log for all iface types
      - bond    : vlan123 -> pxeboot0 (802.3ad 4) -> enp0s8 and enp0s9
      - vlan    : vlan123 -> enp0s8
      - ethernet: enp0s8
PASS: Verify mtcAgent/mtcClient handling and logging including debug
      logging for standard operations
      - node install and unlock
      - node lock and unlock
      - node reinstall, reboot, reset
PASS: Verify graceful recovery handling of heartbeat loss failure.
      - node reboot
      - management interface down
PASS: Verify systemcontroller and subcloud install with dc-libvirt
PASS: Verify no log flooding, coredumps, memory leaks

Story: 2010940
Task: 49541
Change-Id: Ibc87b85e3e0e07c3b8c40b5291bd3372506fbdfb
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-03-28 15:28:27 +00:00
Eric MacDonald e379fdfe18 Prevent pmond process recovery when system is not running
The maintenance process monitor (pmon) should only
recover failed processes when the system state is
'running' or 'degraded'.

The current implementation allowed process recovery
for other non-inservice states, including an unknown
state if systemd returns no data on the state query.

This update tighten's up the system state check by
adding retries to the state query utility and
restricting accepted states to 'running' and 'degraded'.

This change then prevents pmon from inadvertently killing
and recovering the mtcClient which indirectly kills off
the mtcClient's fail-safe sysreq reboot child thread
if pmon state query returns anything other than running
or degraded during a shut down.

Change-Id: I605ae8be06f8f8351a51afce98a4f8bae54a40fd
Closes-Bug: 1883519
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-06-15 11:09:47 -04:00
Eric MacDonald b13750d5f0 Make Mtce system mode scan case in-sensitive
Mtce is looking for 'Standard' in system_type and will
default to AIO if the 'S' in 'standard' is not uppercase.

This update makes the mtce system_type driver handle
system_type and system_mode readings case in-sensitive.

Tested on-system case variations for both type and mode
in platform.conf.

Closes-Bug: 1827904
Change-Id: I5e33097e1b13e5b5d385929dd13e7912ae89ead8
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-05-06 19:14:14 +00:00
Teresa Ho 8e51a1660a Refactor infrastructure network in mtce code
Updated to read the host cluster-host parameter in /etc/hosts
file.
Replaced references of infra network with cluster-host network

Story: 2004273
Task: 29473

Change-Id: I199fb82e5f6b459b181196d0802f1a74220b796e
Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
2019-04-18 09:32:41 -04:00
Alex Kozyrev 506ef3fd7f MTCE: reading BMC passwords from Barbican secret storage.
Use Openstack Barbican API to retrieve BMC passwords stored by SysInv.
See SysInv commit for details on how to write password to Barbican.
MTCE is going to find corresponding secret by host uuid and retrieve
secret payload associated with it. mtcSecretApi_get is used to find
secret reference, based on a hostname. mtcSecretApi_read is used to
read a password using the reference found on a prevoius step.
Also, did a little cleanup and removed old unused token handling code.

Depends-On: I7102a9662f3757c062ab310737f4ba08379d0100
Change-Id: I66011dc95bb69ff536bd5888c08e3987bd666082
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-02-14 09:04:46 -05:00
Eric MacDonald 7941ee5bbb Add new Link Monitor (lmond) daemon to Mtce
This update introduces a new Link Monitor daemon to the Mtce
flock of daemons and disable rmon's interface monitoring.

This new daemon parses the platform.conf file and using the
interface names assigned to each monitored network (mgmt,
infra and oam) queries the kernel for their physical,
bonded and vlan interface names and then registers to listen
for netlink events.

All link/interface state change (netlink) events that correspond
to any of the interfaces or links assiciated with the monitored
networks are tracked by this new daemon.

This new daemon then also implements an http listener for
localhost initiated GET requests targeted to /mtce/lmond
on port 2122 and responds with a json link_info string that
contains a summary of monitored networks, links and their
current Up/Down status.

lmond behavioral summary:
  1. learn interface/port model,
  2. load initial link status for learned links,
  3. listen for link status change events
  4. provide link status info to http GET Query requests.

Another update to stx-integ implements the collectd interface
plugin that periodically issues the Link Status GET requests
for the purponse of alarming port and interface Down conditions,
clearing alarms on Up state changes, and storing sample data
that represents the percentage of active links for each monitored
network.

Test Plan:

PASS: Verify lmond process startup
PASS: Verify lmond logging and log rotation
PASS: Verify lmond process monitoring by pmon
PASS: Verify lmond interface learning on process startup
PASS: Verify lmond port learning on process startup
PASS: Verify lmond handling of vlan and bond interface types
PASS: Verify lmond http link info GET Query handling
PASS: Verify lmond has no memory leak during normal and eventfull operation

Change-Id: I58915644e60f31e3a12c3b451399c4f76ec2ea37
Story: 2002823
Task: 28635
Depends-On:
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-01 14:57:40 -05:00
Jim Gauld 6a5e10492c Decouple Guest-server/agent from stx-metal
This decouples the build and packaging of guest-server, guest-agent from
mtce, by splitting guest component into stx-nfv repo.

This leaves existing C++ code, scripts, and resource files untouched,
so there is no functional change. Code refactoring is beyond the scope
of this update.

Makefiles were modified to include devel headers directories
/usr/include/mtce-common and /usr/include/mtce-daemon.
This ensures there is no contamination with other system headers.

The cgts-mtce-common package is renamed and split into:
- repo stx-metal: mtce-common, mtce-common-dev
- repo stx-metal: mtce
- repo stx-nfv: mtce-guest
- repo stx-ha: updates package dependencies to mtce-pmon for
  service-mgmt, sm, and sm-api

mtce-common:
- contains common and daemon shared source utility code

mtce-common-dev:
- based on mtce-common, contains devel package required to build
  mtce-guest and mtce
- contains common library archives and headers

mtce:
- contains components: alarm, fsmon, fsync, heartbeat, hostw, hwmon,
  maintenance, mtclog, pmon, public, rmon

mtce-guest:
- contains guest component guest-server, guest-agent

Story: 2002829
Task: 22748

Change-Id: I9c7a9b846fd69fd566b31aa3f12a043c08f19f1f
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2018-09-18 17:15:08 -04:00