Commit Graph

5 Commits

Author SHA1 Message Date
Eric MacDonald 14bb67789e Add pxeboot network mtcAlive messaging to Maintenance
The introduction of the new pxeboot network requires maintenance
verify and report on messaging failures over that network.

Towards that, this update introduces periodic mtcAlive messaging
between the mtcAgent and mtcClinet.

Test Plan:

PASS: Verify install and provision each system type with a mix
             of networking modes ; ethernet, bond and vlan
             - AIO SX, AIO DX, AIO DX plus
             - Standard System 2+1
             - Storage System 2+1+1
PASS: Verify feature with physical on management interface
PASS: Verify feature with vlan on management interface
PASS: Verify feature with bonded management interface
PASS: Verify feature with bonded vlans on management interface
PASS: Verify in bonded cases handling with 2, 1 or no slaves found
PASS: Verify mgmt-combined or separate cluster-host network
PASS: Verify mtcClient pxeboot interface address learning
             - for worker and storage nodes       ; dhcp leases file
             - for controller nodes before unlock ; dhcp leases file
             - for controller nodes after unlock  ; static from ifcfg
             - from controller within 10 seconds of process restart
PASS: Verify mtcAgent pxeboot interface address learning from
             dnsmasq.hosts file
PASS: Verify pxeboot mtcAlive initiation, handling, loss detection
             and recovery
PASS: Verify success and failure handling of all new pxeboot ip
             address learning functions ;
             - dhcp - all system node installs.
             - dnsmasq.hosts - active controller for all hosts.
             - interfaces.d - controller's mtcClient pxeboot address.
             - pxeboot req mtcAlive - mtcAgent mtcAlive request message.
PASS: Verify mtcClient pxeboot network 'mtcAlive request' and 'reboot'
             command handling for ethernet, vlan and bond configs.
PASS: Verify mtcAlive sequence number monitoring, out-of-sequence
             detection, handling and logging.
PASS: Verify pxeboot rx socket binding and non-blocking attribute
PASS: Verify mtcAgent handling stress soaking of sustained incoming
             500+ msgs/sec ; batch handling and logging.
PASS: Verify mtcAgent and mtcClient pxeboot tx and rx socket messaging,
             failure recovery handling and logging.
PASS: Verify pxeboot receiver is not setup on the oam interface on
             controller-0 first install until after initial config
             complete.

Regression:

PASS: Verify mtcAgent/mtcClient online and offline state management
PASS: Verify mtcAgent/mtcClient command handling
      - over management network
      - over cluster-host network
PASS: Verify mtcClient interface chain log for all iface types
      - bond    : vlan123 -> pxeboot0 (802.3ad 4) -> enp0s8 and enp0s9
      - vlan    : vlan123 -> enp0s8
      - ethernet: enp0s8
PASS: Verify mtcAgent/mtcClient handling and logging including debug
      logging for standard operations
      - node install and unlock
      - node lock and unlock
      - node reinstall, reboot, reset
PASS: Verify graceful recovery handling of heartbeat loss failure.
      - node reboot
      - management interface down
PASS: Verify systemcontroller and subcloud install with dc-libvirt
PASS: Verify no log flooding, coredumps, memory leaks

Story: 2010940
Task: 49541
Change-Id: Ibc87b85e3e0e07c3b8c40b5291bd3372506fbdfb
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-03-28 15:28:27 +00:00
marvin 6ccd0f4a43 Monitor the datanetwork for non-OpenStack work node
Update the lmon to support datanetwork interface monitoring
and use collectd to control the alarm information. Now lmon
will obtain the list of interfaces from /etc/lmon/lmon.conf
which can be generated by puppet.

Change-Id: Ice72eda03d1bbdee6c644b1ed7ab878c942eb85c
Story: #2002948
Task: #37326
Signed-off-by: marvin <weifei.yu@intel.com>
2019-12-26 04:00:47 +00:00
Teresa Ho 8e51a1660a Refactor infrastructure network in mtce code
Updated to read the host cluster-host parameter in /etc/hosts
file.
Replaced references of infra network with cluster-host network

Story: 2004273
Task: 29473

Change-Id: I199fb82e5f6b459b181196d0802f1a74220b796e
Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
2019-04-18 09:32:41 -04:00
Eric MacDonald f55ef546a7 Remove Resource Monitor ; aka rmon, from the load
All rmon resource monitoring has been moved to collectd.

This update removes rmon from mtce and the load.

Story: 2002823
Task: 30045

Test Plan:
PASS: Build and install a standard system.
PASS: Inspect mtce rpm list
PASS: Inspect logs
PASS: Check pmon.d

Change-Id: I7cf1fa071eac89274e7fae1f307e14d548cc945b
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-03-19 16:12:38 -04:00
Eric MacDonald 7941ee5bbb Add new Link Monitor (lmond) daemon to Mtce
This update introduces a new Link Monitor daemon to the Mtce
flock of daemons and disable rmon's interface monitoring.

This new daemon parses the platform.conf file and using the
interface names assigned to each monitored network (mgmt,
infra and oam) queries the kernel for their physical,
bonded and vlan interface names and then registers to listen
for netlink events.

All link/interface state change (netlink) events that correspond
to any of the interfaces or links assiciated with the monitored
networks are tracked by this new daemon.

This new daemon then also implements an http listener for
localhost initiated GET requests targeted to /mtce/lmond
on port 2122 and responds with a json link_info string that
contains a summary of monitored networks, links and their
current Up/Down status.

lmond behavioral summary:
  1. learn interface/port model,
  2. load initial link status for learned links,
  3. listen for link status change events
  4. provide link status info to http GET Query requests.

Another update to stx-integ implements the collectd interface
plugin that periodically issues the Link Status GET requests
for the purponse of alarming port and interface Down conditions,
clearing alarms on Up state changes, and storing sample data
that represents the percentage of active links for each monitored
network.

Test Plan:

PASS: Verify lmond process startup
PASS: Verify lmond logging and log rotation
PASS: Verify lmond process monitoring by pmon
PASS: Verify lmond interface learning on process startup
PASS: Verify lmond port learning on process startup
PASS: Verify lmond handling of vlan and bond interface types
PASS: Verify lmond http link info GET Query handling
PASS: Verify lmond has no memory leak during normal and eventfull operation

Change-Id: I58915644e60f31e3a12c3b451399c4f76ec2ea37
Story: 2002823
Task: 28635
Depends-On:
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-01 14:57:40 -05:00