Commit Graph

12 Commits

Author SHA1 Message Date
Erich Cordoba c8735e882a Remove version from sm folder
The sm component had the 1.0.0 version in the folder name, this
change removes that version and updates the centos_pkg_dirs.

Story: 2006623
Task: 36827

Depends-On: https://review.opendev.org/#/c/685128/
Change-Id: I6725d1f961c2a82275da5fabbff8e89a8dd6f245
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-26 14:11:31 -05:00
Bin Qian 7d35e10e2c Handling provision/deprovision request
Handle provision/deprovision request
To provision a service
1. load service and service group member records
2. reload service dependency table
3. schedule the newly provisioned service to its proper state

To deprovision a service
1. deregister all monitor timers
2. remove the service and service group member records
3. reload service dependency table
4. deregister service process monitor

Story: 2005486
Task: 30621

Depends-on: https://review.opendev.org/#/c/653749/
Depends-on: https://review.opendev.org/#/c/653783/

Change-Id: Ib6a49da31e2e50e8e1175e39c34d04d333616f9d
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-04-25 15:08:18 -04:00
Marcela Rosales 3d478b6e27 Standardizing Makefile install target name for service-mgmt/sm
Change-Id: I0b4a5aa7d498a49550c5890a79d7ff8edeae148d
Story: 2004043
Task: 27540
Signed-off-by: Marcela Rosales <marcela.a.rosales.jimenez@intel.com>
2019-02-01 12:51:53 -06:00
Luis Botello 196c036013 Improve security by avoiding buffer overflows
This patch adds compiler flags to improve the security of STX code.
Flags added:
Format string vulnerabilities: CFLAGS="-Wformat -Wformat-security"
With string format warnings treated as errors, unsafe format strings
are rejected at compile time, which avoids this class of buffer overflow.
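
As an illustration of what these flags catch, here is a minimal sketch. The helper name `format_log_line` and the buffer size are hypothetical and not taken from the sm sources. GCC's `-Wformat-security` reports calls where the format argument is not a string literal and no format arguments follow; the safe form keeps the format constant and passes the input as data.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the sm sources): formats an untrusted
// message into a fixed-size buffer.
std::string format_log_line(const std::string& untrusted)
{
    char buf[64];

    // Unsafe form that -Wformat -Wformat-security is meant to flag:
    //     std::snprintf(buf, sizeof(buf), untrusted.c_str());
    // A non-literal format lets any '%' in the input act as a conversion.

    // Safe form: constant format string, input passed as an argument.
    // snprintf truncates to the buffer size and always NUL-terminates.
    std::snprintf(buf, sizeof(buf), "msg=%s", untrusted.c_str());
    return std::string(buf);
}
```

Because the input is only ever consumed by `%s`, a message such as `100%s` is copied literally instead of being interpreted.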

Story: 2004380
Task: 28823

Signed-off-by: Luis Botello <luis.botello.ortega@intel.com>
Reviewed-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
             Victor Rodriguez <vm.rod25@gmail.com>
Suggested-by: Victor Rodriguez <vm.rod25@gmail.com>
             Erich Cordoba <erich.cordoba.malibran@intel.com>

Change-Id: I45a0002288db434bc79c477c231f900e477347a1
2019-01-09 05:34:07 -06:00
Bin Qian 28e293bda5 Retrieve hbs cluster info
This change:
1. adds code to receive cluster info updates from hbsAgent.
2. supports on-demand (asynchronous) hbs cluster info queries.

Depends-On: I7d294d40e84469df6b6a6f6dd490cf3c4557b711

Story: 2003577
Task: 27816

Change-Id: Idb65abc58b4afe9649aba442f0798c24d9fffb10
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-11-30 11:59:23 -05:00
Bin Qian 133da10b08 split-brain avoidance improvement
This change enables one-way communication via BMC (if configured)
through mtce when the 2 controllers have lost all communication with
each other.
The algorithm is:
when all communication is lost, both the active and standby
controllers verify their interfaces (mgmt, infra, and oam).
If the active controller is healthy, it requests a bmc reset
through mtce against the standby controller.
If the standby controller is healthy, it activates itself and waits
a total of 45 seconds before requesting a bmc reset through mtce
against the active controller.
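
The decision described above can be sketched as follows. This is an illustration only: the `Role` and `Decision` types and the function `on_all_comms_lost` are hypothetical names, not identifiers from the sm code, and the behaviour of an unhealthy node is an assumption because the commit message does not describe it.

```cpp
// Hypothetical types for illustration; not taken from the sm sources.
enum class Role { Active, Standby };

struct Decision {
    bool activate_self;       // standby takes over services
    int  wait_seconds;        // delay before requesting the reset
    bool request_peer_reset;  // ask mtce for a BMC reset of the peer
};

// The 45 second delay comes from the commit message.
constexpr int kStandbyWaitSeconds = 45;

// Called once a controller has lost all communication with its peer and
// has verified its own interfaces (mgmt, infra, oam).
Decision on_all_comms_lost(Role role, bool interfaces_healthy)
{
    if (!interfaces_healthy) {
        // Assumption: an unhealthy node takes no action in this sketch.
        return Decision{false, 0, false};
    }
    if (role == Role::Active) {
        // Healthy active controller: reset the standby right away.
        return Decision{false, 0, true};
    }
    // Healthy standby: activate, then wait before resetting the active.
    return Decision{true, kStandbyWaitSeconds, true};
}
```

The delay on the standby side gives a healthy active controller time to issue its reset first, so that both controllers do not reset each other.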

Changes also include:
1. add a new initial failover state.
   The initial state is the state before the node is enabled.
2. remove the failover thread.
   A worker thread action now performs time-consuming operations.
3. remove the entire failover action table

Story: 2003577
Task: 24901
Change-Id: I7d294d40e84469df6b6a6f6dd490cf3c4557b711
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-11-08 20:18:43 +00:00
Bin Qian edc8a56472 Introduce failover FSM
Introduce failover FSM to handle communication failure between
controllers.
Failover FSM has 4 states:
Normal: the system is running with full redundancy
Fail Pending: a communication failure has occurred
Failed: the controller is determined to have failed. Its peer will
        assume service
Survived: the controller is determined to be the survivor. Its peer
        has failed

The controllers are in one of the below possible state pairs:
normal/normal, fail-pending/fail-pending, failed/survived

A failed controller will not resume responsibility before the
system restores its full redundancy (normal/normal)

A survivor will not fail before the system restores its
full redundancy (normal/normal)

A future implementation may allow an administrator to force
a failed controller to become active, to recover manually
(with the possibility of losing data), should the survivor
no longer be capable of providing service.
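
The four states and the permitted state pairs can be sketched as below. The enum and function names are hypothetical and chosen for illustration; they are not the identifiers used in the sm sources.

```cpp
// Hypothetical names for illustration; not taken from the sm sources.
enum class FailoverState { Normal, FailPending, Failed, Survived };

// Returns true when (local, peer) is one of the pairs listed in the
// commit message: normal/normal, fail-pending/fail-pending,
// failed/survived (in either order).
bool is_valid_pair(FailoverState local, FailoverState peer)
{
    switch (local) {
    case FailoverState::Normal:
        return peer == FailoverState::Normal;
    case FailoverState::FailPending:
        return peer == FailoverState::FailPending;
    case FailoverState::Failed:
        return peer == FailoverState::Survived;
    case FailoverState::Survived:
        return peer == FailoverState::Failed;
    }
    return false;
}
```

A check of this kind makes the pairing rule explicit: a controller in the Failed state only ever has a Survived peer, and both leave those states together when the pair returns to normal/normal.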

Story: 2003577
Task: 26404

Change-Id: I51635e9e60b6fb6bad89e06c9f08d3f28e21db82
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-09-18 08:08:40 -04:00
Bin Qian 68b5ce3835 SM to monitor infra i/f and swact when needed
Individual services should not fail themselves and trigger a swact when the infra i/f goes down.
SM will collect the overall system health state to schedule the services.

Story: 2003577
Task: 24899

Change-Id: Ifa7453136f34768b99e2bcd741d1065e69ef452e
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-09-11 02:28:26 +00:00
Bin Qian a0c243b4a3 Create worker thread
This change creates a worker thread mechanism so that upcoming
one-time tasks can run in a separate thread.

Story: 2003577
Task: 24898
Change-Id: I5378b80763b104bcf0af95cb083de0cf61463788
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-09-04 15:08:48 -04:00
Bin Qian 53a055cb3a remove incorrect logging when standby controller failed
Add a condition so that logging happens only when an active controller
failure triggers an uncontrolled swact.
The following changes are made:
1. move get_controller_state to a new sm_failover_utils.c and rename it
   to sm_get_controller_state.
2. use the above function to ensure logging happens only when the controller
   scheduling state is changing (swact).

Closes-Bug: 1788697

Change-Id: I145b579c2d31e8c9e184894774d3a1c06c9149d7
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-08-24 14:20:08 -04:00
Bin Qian c2c8228d30 sm components to use C++11 standard
This commit does not have any functional or performance change.
This change adds the -std=c++11 compile option to Makefiles to
enable using C++11 standard features.
This change also cleans up errors that the C++11 standard reports:
Werror=literal-suffix,
e.g:
   "msg_seq=%"PRIi64"."
   error: invalid suffix on literal;
          C++11 requires a space between literal and identifier
changed to:
   "msg_seq=%" PRIi64 "."
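
A self-contained sketch of the corrected form is shown below. The function name `format_msg_seq` and the buffer size are hypothetical; only the format string comes from the commit message. In C++11 a string literal followed directly by an identifier is parsed as a user-defined literal suffix, so the `PRIi64` macro needs spaces around it to be expanded and concatenated.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit sequence number with the
// C++11-compatible spacing around the PRIi64 macro.
std::string format_msg_seq(std::int64_t msg_seq)
{
    char buf[48];
    // "msg_seq=%" PRIi64 "." is three adjacent tokens that the
    // preprocessor and compiler join into one format string.
    std::snprintf(buf, sizeof(buf), "msg_seq=%" PRIi64 ".", msg_seq);
    return std::string(buf);
}
```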

Story: 2003493
Task: 24770

Change-Id: I0225a4326ff8320f36246cc5678698781e903617
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2018-08-20 15:44:01 +00:00
Dean Troyer 17c909ec83 StarlingX open source release updates
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-05-31 07:36:26 -07:00