High Availability domain regression test cases and Fault Management

Domain test cases for the stx.2019.05 release.

    Storyboard tasks 29557, 29558, and 29561 are to create regression
    test plans for High Availability, Fault Management, and SNMP.
  Change-Id: I4e37e74c684f1db3b49191d1cac19413dd078cb6

Change-Id: Ia9c6b1060f3604583b244a13689bd8c97f7fa930
This commit is contained in:
Jeyan 2019-03-26 10:46:13 -04:00
parent ae18918fff
commit 52c05f0bf4
3 changed files with 653 additions and 0 deletions


@@ -0,0 +1,271 @@
=================
Fault Management
=================
This test plan covers Fault Management manual regression. It covers basic
functionality for the following features:
- Enhanced_Log_Management
- SNMP
----------------------
Overall Requirements:
----------------------
This test will require access to the following configurations:
- Regular system
- Storage system
- AIO-DX systems
----------
Test Cases
----------
.. contents::
:local:
:depth: 1
```````````````````````````````
FM_Enhanced_Log_Management_01
```````````````````````````````
:Test ID: FM_Enhanced_Log_Management_01
:Test Title: test_verify_install_of_SDK_module_on_Ubuntu
:Tags: P2,FM,Enhanced log management,regression
+++++++++++++++++++
Testcase Objective:
+++++++++++++++++++
Purpose of this test is to verify installation of the remote logging SDK
module (Kibana log collection tool) on an Ubuntu machine.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.
++++++++++
Test Steps
++++++++++
1. FTP the SDK module for the Kibana log collection tool to an Ubuntu OS
machine (see the transfer sketch after these steps).
2. tar xfv wrs-install-log-server-1.0.0.tgz
3. Follow the instructions in the README file; the example below installs
the udp transport.
code::
cd install-log-server
sudo ./install-log-server.sh -i <Server IP> -u
...
4. Open a web browser and open the Kibana website to connect to the log server.
code::
http://<log server ip address>:5601
...
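As a transfer sketch for step 1 (the user, host, and paths are hypothetical,
and scp is used here in place of FTP), the SDK tarball can be copied and
extracted as follows:
code::
# copy the SDK package to the Ubuntu machine (hypothetical user/host/path)
scp wrs-install-log-server-1.0.0.tgz ubuntu@<ubuntu host>:/home/ubuntu/
# extract it on the Ubuntu machine
tar xfv wrs-install-log-server-1.0.0.tgz
...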
+++++++++++++++++
Expected Behavior
+++++++++++++++++
Able to launch the Kibana log collection tool using a web browser at
http://<log server ip address>:5601
```````````````````````````````
FM_Enhanced_Log_Management_02
```````````````````````````````
:Test ID: FM_Enhanced_Log_Management_02
:Test Title: test_verify_configure_TIS_for_external_log_collection_using_udp
:Tags: P2,FM,Enhanced log management,regression
+++++++++++++++++++
Testcase Objective:
+++++++++++++++++++
This test configures external logging on TIS with the UDP option and
verifies that logs are collected on the server.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.
++++++++++
Test Steps
++++++++++
1. Set up the log server as per test case FM_Enhanced_Log_Management_01.
2. Configure the TIS server to forward logs using the CLI below, with the
udp transport option matching the SDK install.
code::
system remotelogging-modify --ip_address 128.224.186.92 --transport udp \
--enabled True
...
3. Verify that logs are collected and visible over a period of 10 minutes at
http://<log server ip address>:5601 (see the sketch after these steps).
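One way to confirm forwarding (a sketch; the marker string is arbitrary) is
to write a tagged message to syslog on the controller and then search for it
in Kibana:
code::
# generate an easily searchable log entry on the active controller
logger "remote-logging-test-marker"
# then search for remote-logging-test-marker in Kibana at
# http://<log server ip address>:5601
...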
+++++++++++++++++
Expected Behavior
+++++++++++++++++
Able to launch the Kibana log collection tool using a web browser and see
logs being collected at http://<log server ip address>:5601
```````````````````````````````
FM_Enhanced_Log_Management_03
```````````````````````````````
:Test ID: FM_Enhanced_Log_Management_03
:Test Title: test_verify_remote_logging_disable_and_enable
:Tags: P2,FM,Enhanced log management,regression
+++++++++++++++++++
Testcase Objective:
+++++++++++++++++++
This test disables and re-enables remote logging on TIS with the UDP option
and verifies that log collection stops and resumes accordingly.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.
++++++++++
Test Steps
++++++++++
1. Set up remote logging as per test case FM_Enhanced_Log_Management_02.
2. Disable remote logging with the CLI below on TIS.
code::
system remotelogging-modify --ip_address 128.224.186.92 \
--transport udp --enabled False
...
3. Verify that logs are not collected over a period of 5 minutes or more at
http://<log server ip address>:5601; no new logs should appear while
remote logging is disabled.
4. Re-enable remote logging with the CLI below on TIS.
code::
system remotelogging-modify --ip_address 128.224.186.92 \
--transport udp --enabled True
...
5. Verify that logs are collected again over a period of 5 minutes or more
at http://<log server ip address>:5601 (see the sketch after these steps).
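A sketch for bracketing the disabled window (the marker strings are
arbitrary):
code::
# while remote logging is disabled, emit a marker on the controller
logger "remote-logging-disabled-marker"
# after re-enabling remote logging, emit a second marker
logger "remote-logging-enabled-marker"
# only the second marker should appear in Kibana
...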
+++++++++++++++++
Expected Behavior
+++++++++++++++++
Able to launch the Kibana log collection tool using a web browser; logs are
seen while remote logging is enabled and are not seen while it is disabled.
``````````
FM_SNMP_04
``````````
:Test ID: FM_SNMP_04
:Test Title: test_creating_new_community_string_from_cli
:Tags: P2,FM,SNMP,regression
+++++++++++++++++++
Testcase Objective:
+++++++++++++++++++
Verify that an SNMP community string can be created from the CLI.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
++++++++++
Test Steps
++++++++++
1. Create a community string using the CLI below.
code::
system snmp-comm-add -c <community>
...
2. Verify the created community string using the CLI below.
code::
system snmp-comm-list
...
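Optionally (a sketch assuming net-snmp tools on an external client and SNMP
reachable on the OAM interface), the new community string can be exercised
with snmpwalk:
code::
# from an external client; <community> and <oam ip> are placeholders
snmpwalk -v2c -c <community> <oam ip>
...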
+++++++++++++++++
Expected Behavior
+++++++++++++++++
Able to create an SNMP community string and display it.
``````````
FM_SNMP_05
``````````
:Test ID: FM_SNMP_05
:Test Title: SNMP_cli_trap_dest_can_be_deleted
:Tags: P2,FM,SNMP,regression
+++++++++++++++++++
Testcase Objective:
+++++++++++++++++++
To verify that a trap destination can be deleted and that traps are no
longer received afterwards.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An SNMP trap receiver should be installed to receive traps.
++++++++++
Test Steps
++++++++++
1. Create a community string using the CLI below.
code::
system snmp-comm-add -c <community>
...
2. Create a trapdest using the CLI below. Use the IP address of the client
and the community string that was already created.
code::
system snmp-trapdest-add -i <ip_address> -c <community>
...
3. Verify that the created trapdest is displayed.
code::
system snmp-trapdest-list
...
4. Restart snmpd using the CLI below.
code::
sudo /etc/init.d/snmpd restart
...
5. Verify that the trap is received by the trap listener by checking
messages in the SNMP viewer (see the listener sketch after these steps).
6. Delete the trapdest using the CLI below.
code::
system snmp-trapdest-delete <ip_address>
...
7. Verify that the trapdest is deleted.
code::
system snmp-trapdest-list
...
8. Verify that traps are no longer received by the trap listener.
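As a sketch for the listener side (assuming net-snmp's snmptrapd on the trap
receiver host, configured to accept the test community string):
code::
# run the trap daemon in the foreground, logging to stdout
sudo snmptrapd -f -Lo
# traps from the controller appear here while the trapdest exists,
# and stop appearing after it is deleted
...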
+++++++++++++++++
Expected Behavior
+++++++++++++++++
While the trap destination exists, trap messages are seen on the trap
listener; after the trapdest is deleted, no messages are received.
----------
References
----------
https://wiki.openstack.org/wiki/StarlingX/Containers/Installation


@@ -0,0 +1,380 @@
=================
High availability
=================
Titanium Cloud's Service Management (SM) and Maintenance (Mtce) systems
handle transient and persistent networking failures between controllers and
service hosts (storage/compute). For instance, a transient loss of management
network carrier on the active controller currently triggers an immediate
fail-over to the standby controller, even though the very same failure may
exist for that controller as well; i.e. it may be no healthier than the
current active controller. A persistent loss of heartbeat messaging to
several or all nodes in the system results in the forced failure and reboot
of all affected nodes once connectivity has been re-established. In most of
these cases the network event that triggered the fault handling is external
to the system, e.g. the reboot of a common messaging switch, and truly beyond
the control of Titanium Cloud HA (High Availability) services. In such cases
it is best to be more fault tolerant/forgiving than overactive.
----------------------
Overall Requirements
----------------------
This test will require access to the following configurations:
- Regular system
- Storage system
- AIO-DX systems
----------
Test Cases
----------
.. contents::
:local:
:depth: 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_01
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_01
:Test Title: test_split_brain_avd_active_or_standby_based_on_only_storage_and
_standby_controller_blocked_on_active_controller
:Tags: P2,HA,Recovery improvement,regression
+++++++++++++++++++
Testcase Objective
+++++++++++++++++++
Purpose of this test is to verify the split brain scenario: a swact of the
active controller triggered by blocking the standby controller and storage
on the active controller.
++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be a storage system.
+++++++++++
Test Steps
+++++++++++
1. Using the CLI below, from the active controller (controller-0) disconnect
management to storage-0 and controller-1. Both the storage host and the
controller should be blocked: block the storage host first, then
immediately block the controller.
code::
sudo iptables -I INPUT 1 -s 192.168.222.204 -j DROP
...
2. Verify the connection failure alarm (see the sketch after these steps).
3. Verify that controller-1 becomes active. Verify system host-list from
controller-1.
4. Reboot the new standby controller (controller-0). Once the reboot
completes, verify system host-list from the active controller.
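A minimal verification sketch from controller-1, using commands referenced
elsewhere in this plan:
code::
# check for communication failure alarms
fm alarm-list
# confirm host states from the new active controller
system host-list
...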
++++++++++++++++++
Expected Behavior
++++++++++++++++++
controller-1 becomes active
System host-list shows right states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_02
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_02
:Test Title: test_aio_dx_direct_active_controller_lost_connection_to_standby_ip
:Tags: P2,HA,Recovery improvement,regression
+++++++++++++++++++
Testcase Objective
+++++++++++++++++++
Purpose of this test is to verify that the standby controller reboots when
it loses connectivity to the active controller.
++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be an AIO-DX direct-connect system. The system should be
connected to a BMC module and provisioned. If the BMC is not provisioned, the
expected behavior differs: there won't be a reboot of the standby controller.
+++++++++++
Test Steps
+++++++++++
1. Block the standby IP (management IP) from the active controller (see the
verification sketch after this step).
code::
sudo iptables -I INPUT 1 -s 192.168.222.204 -j DROP
...
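A verification sketch on the surviving controller, using commands already
referenced in this plan:
code::
# confirm the service management state and the swact
sudo sm-dump
# confirm host states; controller-0 should reboot if the BMC is provisioned
system host-list
...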
++++++++++++++++++
Expected Behavior
++++++++++++++++++
The standby controller (controller-1) becomes active
System host-list shows the right states
controller-0 reboots if the BMC is provisioned
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_03
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_03
:Test Title: test_split_brain-avd_aio_dx_direct_active_controller_lost
_connection_to_standby_ip_table_drop_on_mgt_infra_and_oam
:Tags: P2,HA,Recovery improvement,regression
+++++++++++++++++++
Testcase Objective
+++++++++++++++++++
To verify the split-brain scenario by triggering connection failures on the
MGT, infra, and OAM networks on the AIO-DX-Direct standby controller
++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be an AIO-DX-Direct connected system.
+++++++++++
Test Steps
+++++++++++
1. Provision the BMC and verify that it is provisioned. (If the BMC is not
available there won't be a reboot for the loss of connection; the expected
behavior at the time of connection loss is different.)
2. From the standby controller, block traffic from the active controller on
the MGT, infra, and OAM networks, for example as below.
code::
sudo iptables -I INPUT 1 -s 192.168.204.4 -j DROP && sudo iptables -I \
INPUT 1 -s 192.168.205.3 -j DROP && sudo iptables -I \
INPUT 1 -s 128.150.150.96 -j DROP
...
3. Verify the loss of connectivity and the alarms on the active controller
(see the sketch after this step).
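A sketch for checking the alarms and then restoring connectivity (the
addresses mirror the block example above):
code::
# on the active controller, list the raised alarms
fm alarm-list
# on the standby controller, remove the DROP rules to restore connectivity
sudo iptables -D INPUT -s 192.168.204.4 -j DROP
sudo iptables -D INPUT -s 192.168.205.3 -j DROP
sudo iptables -D INPUT -s 128.150.150.96 -j DROP
...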
++++++++++++++++++
Expected Behavior
++++++++++++++++++
Loss of connectivity and alarms are seen on the active controller
System host-list shows right states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_04
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_04
:Test Title: test_split-brain-avd_active/standby_number_of_the_nodes_reachable
_changes_couple_of_times
:Tags: P2,HA,Recovery improvement,regression
++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
Purpose of this test is to verify that the active/standby controller
selection criteria in a split brain scenario are based on the healthier
controller. The scenario is repeated after the new active/standby is
selected, with another connection failure on a compute.
+++++++++++++++++++++
Test Pre-Conditions:
+++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.
+++++++++++
Test Steps
+++++++++++
1. From the active controller (controller-0), block the controller and
compute-0 communication (if management and infra are provisioned, both
need to be blocked).
code::
sudo iptables -I INPUT 1 -s 192.168.223.57 -j DROP && sudo iptables\
-I INPUT 1 -s 192.168.222.156 -j DROP && sudo iptables -I INPUT 1 \
-s 192.168.222.4 -j DROP && sudo iptables -I INPUT 1 -s \
128.224.150.57 -j DROP
...
2. Verify the connection failure alarm.
3. Verify the swact.
4. Unblock compute-0 to controller-0 from controller-0 using the iptables
command below.
code::
sudo iptables -D INPUT -s 192.168.223.57 -j DROP && sudo iptables -D \
INPUT -s 192.168.222.156 -j DROP && sudo iptables -D INPUT -s \
192.168.222.4 -j DROP && sudo iptables -D INPUT -s 192.168.223.4 -j \
DROP
...
5. Repeat the above steps from the current active controller, blocking
traffic from controller-1 to compute-0.
+++++++++++++++++++
Expected Behavior
+++++++++++++++++++
controller-1 becomes active
System host-list shows right states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_05
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_05
:Test Title: test_MNFA_timeouts_2mins_1_hour
:Tags: P2,HA,Recovery improvement,regression
++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
Purpose of this test is to validate the triggering of MNFA (Multi Node
Failure Avoidance) mode and its alarms with different mnfa_timeout values
(2 minutes or 1 hour).
+++++++++++++++++++++
Test Pre-Conditions
+++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.
+++++++++++
Test Steps
+++++++++++
1. From the active controller, set mnfa_timeout (2 minutes or 1 hour), which
controls how long MNFA can stay active before graceful recovery of the
affected hosts. Use the commands below, e.g. for a 2-minute (120 second)
timeout:
code::
system service-parameter-list
system service-parameter-modify service=platform section=maintenance \
mnfa_timeout=120
system service-parameter-apply platform
...
2. Apply the change and verify that alarm 250.001 (controller-0
configuration is out-of-date) is cleared, using the command
system service-parameter-apply platform
3. Trigger a heartbeat failure by powering off any node other than the
active controller.
4. Verify event-list --log to see the MNFA enter and exit events below (see
the sketch after these steps). If mnfa_timeout is set to 120 seconds, the
time difference between the mnfa enter and exit logs will be 120 seconds;
if it is set to 1 hour, it will be 1 hour. The following strings will be
seen in the event log:
host=controller-1.event=mnfa_enter
host=controller-1.event=mnfa_exit
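A quick sketch for pulling the MNFA events out of the event log (the grep
filter is an assumption; the event strings come from the step above):
code::
# list event logs and filter for the MNFA enter/exit records
system event-list --log | grep mnfa
...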
++++++++++++++++++
Expected Behavior
++++++++++++++++++
In the above test, the MNFA enter and exit events are seen in the event-list log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_06
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_06
:Test Title: test_MNFA_timeouts_default
:Tags: P2,HA,Recovery improvement,regression
+++++++++++++++++++
Testcase Objective
+++++++++++++++++++
Purpose of this test is to validate the trigger of MNFA mode with the default
values.
++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.
+++++++++++
Test Steps
+++++++++++
1. From the active controller, set mnfa_timeout (2 minutes or 1 hour), which
controls how long MNFA can stay active before graceful recovery of the
affected hosts.
To check the current value of mnfa_timeout, use system service-parameter-list.
Eg:
code::
system service-parameter-modify service=platform section=maintenance \
mnfa_timeout=<value>
system service-parameter-apply platform
...
2. Apply the change and verify that alarm 250.001 (controller-0
configuration is out-of-date) is cleared, using the command
system service-parameter-apply platform
3. Trigger a heartbeat failure by powering off any node other than the
active controller.
4. Verify system event-list --log to see the MNFA enter and exit events
below (see the sketch after these steps).
5. Verify system host-list. Hosts that are off-line during the MNFA window
are shown as degraded between the MNFA enter and exit events.
host=controller-1.event=mnfa_enter
host=controller-1.event=mnfa_exit
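A sketch for watching host state and the MNFA events during this test, using
commands referenced in the steps above:
code::
# confirm host availability/degraded states during the MNFA window
system host-list
# afterwards, confirm the MNFA enter/exit events
system event-list --log | grep mnfa
...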
++++++++++++++++++
Expected Behavior
++++++++++++++++++
In the above test, the MNFA enter and exit events are seen in the event-list log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_07
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_07
:Test Title: test_pull_management_and_OAM_cable_on_active_controller
:Tags: P2,HA,Recovery improvement,regression
++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
This test verifies the alarms and swact caused by pulling the OAM and MGT
cables on the active controller.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
Any 2+2 system installed with the latest load.
+++++++++++
Test Steps
+++++++++++
1. Verify there are no alarms using fm alarm-list.
2. Physically remove the OAM and MGT cables on the active controller
(controller-0).
3. Verify alarm IDs (400.005, 200.005).
4. Verify that a swact to the standby controller occurred, using
sudo sm-dump.
5. Verify system host-list on the new active controller: all hosts are
available and the standby controller is off-line (see the sketch after
these steps).
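A verification sketch on the new active controller, using the commands
referenced in the steps above:
code::
# confirm the cable-pull alarms (400.005, 200.005)
fm alarm-list
# confirm the swact and service states
sudo sm-dump
# confirm host availability; the former active controller shows off-line
system host-list
...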
++++++++++++++++++
Expected Behavior
++++++++++++++++++
System swacts, with alarms raised for the cable pull on OAM and MGT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_08
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Test ID: HA_Cloud_Recovery_improvements_08
:Test Title: test_pull_management_cable_on_standby_controller
:Tags: P2,HA,Recovery improvement,regression
++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
Pull the management cable on the standby controller and verify the alarm.
++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
Any 2+2 system installed with the latest load.
++++++++++++
Test Steps:
++++++++++++
1. Verify there are no alarms using fm alarm-list.
2. Physically remove the MGT cable on the standby controller (controller-0).
3. Verify the current alarm list with fm alarm-list; alarm IDs
(400.005, 200.005).
4. Verify there is no change in the active controller and other host states;
the standby host will be off-line.
code::
system host-list
...
++++++++++++++++++
Expected Behavior
++++++++++++++++++
The management failure alarm IDs (400.005, 200.005) are raised
System host-list shows the right host states
-----------
References:
-----------
https://wiki.openstack.org/wiki/StarlingX/Containers/Installation


@@ -13,8 +13,10 @@ For more information about StarlingX, see https://docs.starlingx.io/.
.. toctree::
:maxdepth: 2
fault_management/index
gnochi/index
heat/index
high_availability/index
horizon/index
maintenance/index
networking/index