StarlingX Bare Metal and Node Management, Hardware Maintenance
Eric MacDonald 8a223f395d Mtce: Add heartbeat cluster information for SM query
This is part one of a two-part HA Improvements feature that introduces
the collection of heartbeat health at the system level.

The full feature is intended to provide service management (SM)
with the last 2 seconds of maintenance's heartbeat health view that
is reflective of each controller's connectivity to each host
including its peer controller.

The heartbeat cluster summary information is additional information
for SM to draw on when deciding which controller is healthier and
if/when to switch over, ultimately to avoid split-brain scenarios
in a two-controller system.

Feature Behavior: A common heartbeat cluster data structure is
introduced and published to the sysroot for SM. The heartbeat
service populates and maintains a local copy of this structure
with data that reflects the responsiveness for each monitored
network of all the monitored hosts for the last 20 heartbeat
periods. Mtce sends the current cluster summary to SM upon request.
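
The structure described above can be pictured with a minimal Python sketch.
It is illustrative only and is not the mtce implementation: the class names,
the field names, the "mgmt" network label and the per-period
(hosts_enabled, hosts_responding) entry layout are all assumptions. Only the
20-period history depth comes from the text above; with the 2 second window
mentioned earlier, that depth would imply a 100 ms heartbeat period.

```python
from collections import deque
from dataclasses import dataclass, field

HISTORY_PERIODS = 20  # "last 20 heartbeat periods" from the commit text


def _new_history():
    # Bounded buffer: appending a 21st entry silently drops the oldest one.
    return deque(maxlen=HISTORY_PERIODS)


@dataclass
class NetworkHistory:
    """Hypothetical per-network history, oldest entry first."""
    # Each entry is (hosts_enabled, hosts_responding) for one heartbeat period.
    entries: deque = field(default_factory=_new_history)

    def record(self, hosts_enabled, hosts_responding):
        self.entries.append((hosts_enabled, hosts_responding))

    def all_responding(self):
        # True only if every retained period saw all enabled hosts respond.
        return all(enabled == responding for enabled, responding in self.entries)


@dataclass
class ClusterSummary:
    """Hypothetical local copy of one controller's cluster view."""
    controller: int
    histories: dict = field(default_factory=dict)  # network name -> NetworkHistory

    def record(self, network, hosts_enabled, hosts_responding):
        history = self.histories.setdefault(network, NetworkHistory())
        history.record(hosts_enabled, hosts_responding)

    def summary(self):
        # Plain-data snapshot, i.e. what would be handed to SM on request.
        return {net: list(h.entries) for net, h in self.histories.items()}
```

A bounded deque is used here so that the history never grows past the last
20 periods, which matches the fixed window the text describes.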

General flow of cluster feature wrt hbsAgent:

  hbs_cluster_init: general data init
  hbs_cluster_nums: set controller and network numbers
  forever:

    select:
      hbs_cluster_add / hbs_cluster_del: add/del hosts from mtcAgent
      hbs_sm_handler -> hbs_cluster_send: send cluster to SM

    heartbeating:
      hbs_cluster_append: add controller cluster to pulse request
      hbs_cluster_update: get controller cluster data from pulse responses
      hbs_cluster_save: save other controller cluster view in cluster vault
      hbs_cluster_log: log cluster state changes (clog)
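
The flow above can be walked through with a small Python sketch. This is a
toy model under stated assumptions, not the hbsAgent code: the event
dictionaries, the state layout, the "mgmt" network name and the rule of
logging only when the set of missing hosts changes are all hypothetical.
The comments map each branch to the function named in the flow.

```python
def snapshot(state):
    """Build this controller's cluster summary (what SM would be sent)."""
    return {
        "controller": state["controller"],
        "networks": list(state["networks"]),
        "hosts": sorted(state["hosts"]),
        "vault": dict(state["vault"]),
    }


def run(events, controller=0, networks=("mgmt", "infra")):
    # hbs_cluster_init / hbs_cluster_nums: data init, controller and networks
    state = {
        "controller": controller,
        "networks": networks,
        "hosts": set(),
        "vault": {},         # other controller's view, keyed by controller number
        "missing": [],       # hosts that missed the most recent pulse
        "clog": [],          # cluster state change log
        "last_request": None,
    }
    sent_to_sm = []

    for event in events:     # a finite event list stands in for "forever"
        kind = event["kind"]
        if kind == "add":                # hbs_cluster_add
            state["hosts"].add(event["host"])
        elif kind == "del":              # hbs_cluster_del
            state["hosts"].discard(event["host"])
        elif kind == "sm_query":         # hbs_sm_handler -> hbs_cluster_send
            sent_to_sm.append(snapshot(state))
        elif kind == "pulse":
            # hbs_cluster_append: attach this controller's cluster to the request
            state["last_request"] = {"cluster": snapshot(state)}
            # hbs_cluster_update: read cluster data out of the pulse responses
            responding = set()
            for response in event["responses"]:
                responding.add(response["host"])
                peer = response.get("peer_cluster")
                # hbs_cluster_save: keep the other controller's view in the vault
                if peer is not None and peer["controller"] != controller:
                    state["vault"][peer["controller"]] = peer
            # hbs_cluster_log: log only when the cluster state changes
            missing = sorted(state["hosts"] - responding)
            if missing != state["missing"]:
                state["clog"].append(f"missing hosts: {missing}")
                state["missing"] = missing
    return state, sent_to_sm
```

Feeding the loop a list of "add", "pulse", "sm_query" and "del" events
returns the final state and every summary that would have been sent to SM.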

Test Plan:

  PASS: Verify compute system install
  PASS: Verify storage system install
  PASS: Verify cluster data ; all members of structure
  PASS: Verify storage-0 state management
  PASS: Verify add of second controller
  PASS: Verify add of storage-0 node
  PASS: Verify behavior over Swact
  PASS: Verify lock/unlock of second controller ; overall behavior
  PASS: Verify lock/unlock of storage-0 ; overall behavior
  PASS: Verify lock/unlock of storage-1 ; overall behavior
  PASS: Verify lock/unlock of compute nodes ; overall behavior
  PASS: Verify heartbeat failure and recovery of compute node
  PASS: Verify heartbeat failure and recovery of storage-0
  PASS: Verify heartbeat failure and recovery of controller
  PASS: Verify delete of controller node
  PASS: Verify delete of storage-0
  PASS: Verify delete of compute node
  PASS: Verify cluster when controller-1 active / controller-0 disabled
  PASS: Verify MNFA and recovery handling
  PASS: Verify handling in presence of multiple failure conditions
  PASS: Verify hbsAgent memory leak soak test with continuous SM query.
  PASS: Verify active controller-1 infra network failure behavior.
  PASS: Verify inactive controller-1 infra network failure behavior.

Change-Id: I4154287f6dcf5249be5ab3180f2752ab47c5da3c
Story: 2003576
Task: 24907
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-10-05 22:47:17 +00:00
api-ref/source         stx-metal: API ref doc content added.                       2018-09-28 17:33:34 +00:00
bsp-files              Fix bug with PXE Boot Server                                2018-10-01 14:20:41 -04:00
doc                    [Doc] OpenStack API Reference Guide                         2018-09-05 19:59:26 -05:00
installer              Fix linters issues and enable tox/zuul linters job as gate  2018-09-05 09:02:25 +08:00
kickstart              Rename mwa-* subdirectories to match the git repo name      2018-07-03 16:29:24 -04:00
mtce                   Mtce: Add heartbeat cluster information for SM query        2018-10-05 22:47:17 +00:00
mtce-common            Mtce: Add heartbeat cluster information for SM query        2018-10-05 22:47:17 +00:00
mtce-compute           Fix linters issues and enable tox/zuul linters job as gate  2018-09-05 09:02:25 +08:00
mtce-control           Rename mwa-* subdirectories to match the git repo name      2018-07-03 16:29:24 -04:00
mtce-storage           Rename mwa-* subdirectories to match the git repo name      2018-07-03 16:29:24 -04:00
releasenotes           [Doc] stx.2018.10 Release Summary                           2018-09-27 11:45:18 -05:00
.gitignore             [Doc] OpenStack API Reference Guide                         2018-09-05 19:59:26 -05:00
.gitreview             Add .gitreview                                              2018-05-31 07:36:43 -07:00
.zuul.yaml             Add api-ref job                                             2018-09-28 11:00:48 -05:00
CONTRIBUTORS.wrs       StarlingX open source release updates                       2018-05-31 07:36:43 -07:00
LICENSE                StarlingX open source release updates                       2018-05-31 07:36:43 -07:00
README.rst             StarlingX open source release updates                       2018-05-31 07:36:43 -07:00
centos_iso_image.inc   Decouple Guest-server/agent from stx-metal                  2018-09-18 17:15:08 -04:00
centos_pkg_dirs        Decouple Guest-server/agent from stx-metal                  2018-09-18 17:15:08 -04:00
test-requirements.txt  pep8 job enable and fix pep8 reported issue                 2018-09-06 09:45:51 +08:00
tox.ini                Add some jobs for docs and releasenotes                     2018-09-13 20:59:12 -05:00

README.rst

stx-metal

StarlingX Bare Metal Management