Commit Graph

11 Commits

Author SHA1 Message Date
Eric MacDonald aaf9d08028 Mtce: Fix bmc password fetch error handling
The mtcAgent process sometimes segfaults while trying to fetch
the bmc password from a failing barbican process.

With that issue fixed the mtcAgent sends the bmc access
credentials to the hardware monitor (hwmond) process which
then segfaults for a reason similar

In cases where the process does not segfault but also does not
get a bmc password, the mtcAgent will flood its log file.

This update

 1. Prevents the segfault case by properly managing acquired
    json-c object releases. There was one in the mtcAgent and
    another in the hardware monitor (hwmond).

    The json_object_put object release api should only be called
    against objects that were created with very specific apis.
    See new comments in the code.

 2. Avoids log flooding error case by performing a password size
    check rather than assume the password is valid following the
    secret payload receive stage.

 3. Simplifies the secret fsm and error and retry handling.

 4. Deletes useless creation and release of a few unused json
    objects in the common jsonUtil and hwmonJson modules.

Note: This update temporarily disables sensor and sensorgroup
      suppression support for the debian hardware monitor while
      a suppression type fix in sysinv is being investigated.

Test Plan:

PASS: Verify success path bmc password secret fetch
PASS: Verify secret reference get error handling
PASS: Verify secret password read error handling
PASS: Verify 24 hr provision/deprov success path soak
PASS: Verify 24 hr provision/deprov error path path soak
PASS: Verify no memory leak over success and failure path soaking
PASS: Verify failure handling stress soak ; reduced retry delay
PASS: Verify blocking secret fetch success and error handling
PASS: Verify non-blocking secret fetch success and error handling
PASS: Verify secret fetch is set non-blocking
PASS: Verify success and failure path logging
PASS: Verify all of jsonUtil module manages object release properly
PASS: Verify hardware monitor sensor model creation, monitoring,
             alarming and relearning. This test requires suppress
             disable in order to create sensor groups in debian.
PASS: Verify both ipmi and redfish and switch between them with
             just bm_type change.
PASS: Verify all above tests in CentOS
PASS: Verify over 4000 provision/deprovision cycles across both
             failure and success path handling with no process
             failures

Closes-Bug: 1975520
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Ibbfdaa1de662290f641d845d3261457904b218ff
2022-06-01 15:21:05 +00:00
Eric MacDonald 7539d36c3f Prevent mtcClient from sending to uninitialized socket in AIO SX
The mtcClient will perform a socket reinit if it detects a socket
failure. The mtcClient also avoids setting up its controller-1
cluster network socket for the AIO SX system type ; because there
is no controller-1 provisioned.

Most AIO SX systems have the management/cluster networks set to
the 'loopback' interface. However, when an AIO SX system is setup
with its management and cluster networks on physical interfaces,
with or without vlan, the mtcAlive send message utility will try
to send to the uninitialized controller-1 cluster socket. This
leads to a socket error that triggers a socket reinitialization
loop which causes log flooding.

This update adds a check to the mtcAlive send utility to avoid
sending mtcAlive to controller-1 for AIO SX system type where
there is no controller-1 provisioned; no send,no error,no flood.

Since this update needed to add a system type check, this update
also implemented a system type definition rename from CPE to AIO.
Other related definitions and comments were also changed to make
the code base more understandable and maintainable

Test Plan:

PASS: Verify AIO SX with mgmnt/clstr on physical (failure mode)
PASS: Verify AIO SX Install with mgmnt/clstr on 'lo'
PASS: Verify AIO SX Lock msg and ack over mgmnt and clstr
PASS: Verify AIO SX locked-disabled-online state
PASS: Verify mtcClient clstr socket error detect/auto-recovery (fit)
PASS: Verify mtcClient mgmnt socket error detect/auto-recovery (fit)

Regression:

PASS: Verify AIO SX Lock and Unlock (lazy reboot)
PASS: Verify AIO DX and DC install with pv regression and sanity
PASS: Verify Standard system install with pv regression and sanity

Change-Id: I658d33a677febda6c0e3fcb1d7c18e5b76cb3762
Closes-Bug: 1897334
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2021-04-21 10:20:10 -04:00
Eric MacDonald 126cdfa369 Make daemon_get_file_str return first line in specified file
The current implementation will return only the first
group of characters up to the first space from the first
line of the specified file.

This function was intended to return the entire first line.

Change-Id: Ic34361c32aeff564f4645070279cdb53d5b87626
Closes-Bug: 1896669
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2020-09-22 18:19:24 -04:00
Eric MacDonald 4d2383818f Add bmc protocol select to maintenance
This update adds BMC Info Query command handling and
info logging to maintenance.

Example of the logs produced by the BMC Query are

  compute-2 manufacturer is Intel Corporation
  compute-2 model number:<str>  part number:<str>  serial number:<str>
  compute-2 BIOS firmware version is SE5C620.86B.00.01.0013.030920180427
  compute-2 BMC  firmware version is unavailable
  compute-2 power is on
  compute-2 has 2 processors
  compute-2 has 192 GiB of memory

Please note that the default protocol remains IPMI even
if Redfish support is detected. This is because the
power/reset/netboot control implementation for Redfish
has not yet been implemented.

Test Plan:

PASS: Verify redfish BMC info query logging.
PASS: Verify IPMI remains the default selected protocol.

Regression:

PASS: Verify sensor monitoring and alarming still works.
PASS: Verify power-off command handling.
PASS: Verify power-on command handling.
PASS: Verify reset command handling.
PASS: Verify reboot command handling.
PASS: Verify reinstall (netboot) command handling.

Change-Id: I654056119018a1751a70495e3df8b541d9e00b93
Story: 2005861
Task: 35826
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-09-08 14:14:15 -04:00
Eric MacDonald b13750d5f0 Make Mtce system mode scan case in-sensitive
Mtce is looking for 'Standard' in system_type and will
default to AIO if the 'S' in 'standard' is not uppercase.

This update makes the mtce system_type driver handle
system_type and system_mode readings case in-sensitive.

Tested on-system case variations for both type and mode
in platform.conf.

Closes-Bug: 1827904
Change-Id: I5e33097e1b13e5b5d385929dd13e7912ae89ead8
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-05-06 19:14:14 +00:00
Teresa Ho 8e51a1660a Refactor infrastructure network in mtce code
Updated to read the host cluster-host parameter in /etc/hosts
file.
Replaced references of infra network with cluster-host network

Story: 2004273
Task: 29473

Change-Id: I199fb82e5f6b459b181196d0802f1a74220b796e
Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
2019-04-18 09:32:41 -04:00
Eric MacDonald 7e8be89143 Make Mtce default to Simplex system type if label is missing
This update refactors daemon_system_type function so that it
returns a SIMPLEX system type if it is unable to properly
find and parse the system_mode/system_type from platform.conf

This is needed for Ansible Bootstrap Deployment where mtcAgent
and mtcClient need to run and function like it would in a
simplex system prior to the system type being added to the
platform.conf file.

Change-Id: Ib0130f3559ee3aa8d8d8203ea59d4896a571944f
Story: 2004695
Task: 28714
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-04 14:15:40 +00:00
Zuul e472f51aed Merge "fix compilation warnings in c/cpp files" 2018-10-30 02:37:06 +00:00
Yong Hu 40404e86de fix compilation warnings in c/cpp files
The warnings might be treated as errors when the build system using compiler,
for example, gcc/g++ 8.2.1, with "-O2 -Wall -Wextra -Werror" options.

Story: 2004134
Task: 27591

Change-Id: I576a8c0305a4c32772fbc750ef39c73334b19336
Signed-off-by: Yong Hu <yong.hu@intel.com>
2018-10-23 07:38:33 +00:00
Martin Chen c89f0ffa20 Fix resource leak issues, file not close case
Partial-Bug: 1794903

Change-Id: Id6de282c27374d578a0a41869ec6a934e6675db4
Signed-off-by: Martin Chen <haochuan.z.chen@intel.com>
2018-10-20 02:08:42 +08:00
Jim Gauld 6a5e10492c Decouple Guest-server/agent from stx-metal
This decouples the build and packaging of guest-server, guest-agent from
mtce, by splitting guest component into stx-nfv repo.

This leaves existing C++ code, scripts, and resource files untouched,
so there is no functional change. Code refactoring is beyond the scope
of this update.

Makefiles were modified to include devel headers directories
/usr/include/mtce-common and /usr/include/mtce-daemon.
This ensures there is no contamination with other system headers.

The cgts-mtce-common package is renamed and split into:
- repo stx-metal: mtce-common, mtce-common-dev
- repo stx-metal: mtce
- repo stx-nfv: mtce-guest
- repo stx-ha: updates package dependencies to mtce-pmon for
  service-mgmt, sm, and sm-api

mtce-common:
- contains common and daemon shared source utility code

mtce-common-dev:
- based on mtce-common, contains devel package required to build
  mtce-guest and mtce
- contains common library archives and headers

mtce:
- contains components: alarm, fsmon, fsync, heartbeat, hostw, hwmon,
  maintenance, mtclog, pmon, public, rmon

mtce-guest:
- contains guest component guest-server, guest-agent

Story: 2002829
Task: 22748

Change-Id: I9c7a9b846fd69fd566b31aa3f12a043c08f19f1f
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2018-09-18 17:15:08 -04:00