diff --git a/doc/source/manual_tests/fault_management/index.rst b/doc/source/manual_tests/fault_management/index.rst
new file mode 100644
index 0000000..6e2c29e
--- /dev/null
+++ b/doc/source/manual_tests/fault_management/index.rst
@@ -0,0 +1,271 @@

================
Fault Management
================

This test plan covers Fault Management manual regression. It covers basic
functionality for the following features:

- Enhanced Log Management
- SNMP

--------------------
Overall Requirements
--------------------

This test will require access to the following configurations:

- Regular system
- Storage system
- AIO-DX systems

----------
Test Cases
----------

.. contents::
   :local:
   :depth: 1

``````````````````````````````
FM_Enhanced_Log_Management_01
``````````````````````````````

:Test ID: FM_Enhanced_Log_Management_01
:Test Title: test_verify_install_of_SDK_module_on_Ubuntu
:Tags: P2,FM,Enhanced log management,regression

++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
The purpose of this test is to verify installation of the remote logging SDK
module (Kibana log collection tool) on an Ubuntu server.

++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. FTP the SDK module for the Kibana log collection tool to the Ubuntu
   machine.
2. Extract the package: tar xfv wrs-install-log-server-1.0.0.tgz
3. Follow the instructions from the README file; the example below installs
   the UDP transport:

   .. code-block:: none

      cd install-log-server
      sudo ./install-log-server.sh -i -u

4. Open a web browser and open the Kibana website to connect to the log
   server:

   .. code-block:: none

      http://<log_server_ip>:5601

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The Kibana log collection tool can be launched in a web browser at
http://<log_server_ip>:5601.

``````````````````````````````
FM_Enhanced_Log_Management_02
``````````````````````````````

:Test ID: FM_Enhanced_Log_Management_02
:Test Title: test_verify_configure_TIS_for_external_log_collection_using_udp
:Tags: P2,FM,Enhanced log management,regression

++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.

++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
This test configures external logging on TIS with the UDP option and verifies
that logs are collected on the remote server.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Set up the log server as described in test case
   FM_Enhanced_Log_Management_01.
2. Configure TIS to forward logs to the remote server over UDP, using the CLI
   below as shown in the SDK install instructions:

   .. code-block:: none

      system remotelogging-modify --ip_address 128.224.186.92 \
      --transport udp --enabled True

3. Verify that logs are collected and visible over a time period of 10
   minutes at http://<log_server_ip>:5601, as sketched below.
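If no logs show up in Kibana in step 3, it can help to first confirm that
syslog traffic from the TIS controller is actually reaching the log server.
The sketch below is only a suggestion: it assumes the SDK forwards syslog
over the default UDP syslog port 514 (adjust the port if the install script
configured a different one).

.. code-block:: bash

   # On the remote log server: watch for incoming syslog packets from the
   # TIS controller (assumes the default syslog UDP port 514).
   sudo tcpdump -n -i any udp port 514

   # On the TIS active controller: emit a test log line that should be
   # forwarded to the remote server.
   logger "remote-logging smoke test"
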
++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The Kibana log collection tool can be launched in a web browser at
http://<log_server_ip>:5601 and the forwarded logs can be seen being
collected.

``````````````````````````````
FM_Enhanced_Log_Management_03
``````````````````````````````

:Test ID: FM_Enhanced_Log_Management_03
:Test Title: test_verify_remote_logging_disable_and_enable
:Tags: P2,FM,Enhanced log management,regression

++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An external VM or server is needed to host the remote logging server.
The remote logging SDK should be available on that server.

++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
This test verifies that remote logging on TIS can be disabled and re-enabled,
and that logs are only collected on the remote server while it is enabled.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Set up remote logging as described in test case
   FM_Enhanced_Log_Management_02.
2. Disable remote logging on TIS with the CLI below:

   .. code-block:: none

      system remotelogging-modify --ip_address 128.224.186.92 \
      --transport udp --enabled false

3. Verify at http://<log_server_ip>:5601 that no new logs are collected over
   a time period of 5 minutes or more. There should be no logs while remote
   logging is disabled.
4. Re-enable remote logging on TIS with the CLI below:

   .. code-block:: none

      system remotelogging-modify --ip_address 128.224.186.92 \
      --transport udp --enabled True

5. Verify at http://<log_server_ip>:5601 that logs are again collected over a
   time period of 5 minutes or more.

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The Kibana log collection tool can be launched in a web browser; logs are
seen when remote logging is enabled and are not seen when it is disabled.

``````````
FM_SNMP_04
``````````

:Test ID: FM_SNMP_04
:Test Title: test_creating_new_community_string_from_cli
:Tags: P2,FM,SNMP,regression

++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
Verify that an SNMP community string can be created from the CLI.

++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Create a community string using the CLI below:

   .. code-block:: none

      system snmp-comm-add -c <community_string>

2. Verify the created community string using the CLI below:

   .. code-block:: none

      system snmp-comm-list

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The SNMP community string can be created and is displayed in the list.

``````````
FM_SNMP_05
``````````

:Test ID: FM_SNMP_05
:Test Title: SNMP_cli_trap_dest_can_be_deleted
:Tags: P2,FM,SNMP,regression

++++++++++++++++++++
Testcase Objective:
++++++++++++++++++++
To verify that a trap destination can be deleted and that traps are no longer
received afterwards.

++++++++++++++++++++
Test Pre-Conditions:
++++++++++++++++++++
The system should be installed with a load that has this feature.
An SNMP trap receiver should be installed to receive the traps (a minimal
listener sketch follows these steps).

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Create a community string using the CLI below:

   .. code-block:: none

      system snmp-comm-add -c <community_string>

2. Create a trap destination using the CLI below. Use the IP address of the
   client and the community string that was already created:

   .. code-block:: none

      system snmp-trapdest-add -i <ip_address> -c <community_string>

3. Verify that the created trap destination is displayed:

   .. code-block:: none

      system snmp-trapdest-list

4. Restart snmpd using the command below:

   .. code-block:: none

      sudo /etc/init.d/snmpd restart

5. Verify that traps are received by the trap listener by checking for
   messages in the SNMP viewer.
6. Delete the trap destination using the CLI below:

   .. code-block:: none

      system snmp-trapdest-delete <ip_address>

7. Verify that the trap destination is deleted:

   .. code-block:: none

      system snmp-trapdest-list

8. Verify that traps are no longer received by the trap listener.
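If no dedicated SNMP viewer is available, a minimal trap listener can be run
on the client machine with net-snmp's snmptrapd. This is only a sketch: it
assumes net-snmp is installed on the trap receiver, and <community_string> is
a placeholder for the community string created in step 1.

.. code-block:: bash

   # Accept and log traps for the configured community string.
   # (Appends to /etc/snmp/snmptrapd.conf on the trap receiver.)
   echo 'authCommunity log <community_string>' | \
       sudo tee -a /etc/snmp/snmptrapd.conf

   # Run snmptrapd in the foreground and log received traps to stdout.
   sudo snmptrapd -f -Lo
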
++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
While the trap destination exists, trap messages are seen on the trap
listener; after the trap destination is deleted, no further messages are
seen.

----------
References
----------
https://wiki.openstack.org/wiki/StarlingX/Containers/Installation

diff --git a/doc/source/manual_tests/high_availability/index.rst b/doc/source/manual_tests/high_availability/index.rst
new file mode 100644
index 0000000..6a5f4dd
--- /dev/null
+++ b/doc/source/manual_tests/high_availability/index.rst
@@ -0,0 +1,380 @@

=================
High Availability
=================

Titanium Cloud's Service Management (SM) and Maintenance (Mtce) systems
handle transient and persistent networking failures between controllers and
service hosts (storage/compute). For instance, a transient loss of management
network carrier on the active controller currently triggers an immediate
fail-over to the standby controller, even though the very same failure may
exist for that controller as well; i.e. it may be no more healthy than the
current active controller. A persistent loss of heartbeat messaging to
several or all nodes in the system results in the forced failure and reboot
of all affected nodes once connectivity has been re-established. In most of
these cases the network event that triggered fault handling is external to
the system, e.g. the reboot of a common messaging switch, and truly beyond
the control of the Titanium Cloud HA (High Availability) services. In such
cases it is best to be fault tolerant and forgiving rather than overactive.

--------------------
Overall Requirements
--------------------

This test will require access to the following configurations:

- Regular system
- Storage system
- AIO-DX systems

----------
Test Cases
----------

.. contents::
   :local:
   :depth: 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_01
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_01
:Test Title: test_split_brain_avd_active_or_standby_based_on_only_storage_and
 standby_controller_blocked_on_active_controller
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
The purpose of this test is to verify the split-brain scenario: a swact from
the active controller is triggered by blocking the standby controller and
storage on the active controller.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be a storage system.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Using the CLI below on the active controller (controller-0), disconnect
   management traffic from storage-0 and controller-1. Block both the storage
   node and the standby controller: storage-0 should be blocked first and
   controller-1 immediately afterwards.

   .. code-block:: none

      sudo iptables -I INPUT 1 -s 192.168.222.204 -j DROP

2. Verify the connection failure alarm.
3. Verify that controller-1 becomes active. Verify system host-list from
   controller-1.
4. Reboot the new standby controller (controller-0). Once the reboot is
   complete, verify system host-list from the active controller.

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
controller-1 becomes active.
System host-list shows the right states.
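Cleanup note for HA_Cloud_Recovery_improvements_01: once the checks are done,
the block added in step 1 can be removed so that the affected hosts recover.
This is only a sketch mirroring the example rule above; the address shown is
the example management address used in step 1.

.. code-block:: bash

   # Remove the DROP rule that was inserted in step 1 so that management
   # traffic to the blocked host recovers.
   sudo iptables -D INPUT -s 192.168.222.204 -j DROP
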
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_02
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_02
:Test Title: test_aio_dx_direct_active_controller_lost_connection_to_standby_ip
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
The purpose of this test is to verify that the standby controller is rebooted
when it loses connectivity to the active controller.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be an AIO-DX direct-connect system. The system should be
connected to a BMC module and the BMC should be provisioned. If the BMC is
not provisioned, the expected behavior differs: there will be no reboot of
the standby controller.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Block the standby (management) IP from the active controller:

   .. code-block:: none

      sudo iptables -I INPUT 1 -s 192.168.222.204 -j DROP

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The standby controller (controller-1) becomes active.
System host-list shows the right states.
controller-0 reboots if the BMC is provisioned.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_03
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_03
:Test Title: test_split_brain_avd_aio_dx_direct_active_controller_lost
 connection_to_standby_ip_table_drop_on_mgt_infra_and_oam
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
To verify the split-brain scenario by triggering a connection failure on the
MGT, infra, and OAM networks on an AIO-DX direct-connect standby controller.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should be an AIO-DX direct-connect system.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Provision the BMC and verify that it is provisioned. (If the BMC is not
   available there will be no reboot on loss of connection, so the expected
   behavior at the time of the connection loss is different.)
2. From the standby controller, drop the MGT, infra, and OAM traffic coming
   from the active controller; for example (a scripted form of the same
   commands is sketched after this test case):

   .. code-block:: none

      sudo iptables -I INPUT 1 -s 192.168.204.4 -j DROP && sudo iptables -I \
      INPUT 1 -s 192.168.205.3 -j DROP && sudo iptables -I \
      INPUT 1 -s 128.150.150.96 -j DROP

3. Verify the loss of connectivity and the alarm on the active controller.

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
Loss of connectivity and the corresponding alarm are seen on the active
controller.
System host-list shows the right states.
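The chained iptables command in step 2 of HA_Cloud_Recovery_improvements_03
can also be written as a small loop, which is easier to adapt when the
addresses differ on the system under test. This is only a sketch; the
addresses below are the example MGT, infra, and OAM addresses from step 2.

.. code-block:: bash

   #!/bin/bash
   # Insert a DROP rule for each source address (example MGT, infra and OAM
   # addresses from step 2); replace them with the addresses of the system
   # under test.
   for src in 192.168.204.4 192.168.205.3 128.150.150.96; do
       sudo iptables -I INPUT 1 -s "$src" -j DROP
   done
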
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_04
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_04
:Test Title: test_split_brain_avd_active/standby_number_of_the_nodes_reachable
 _changes_couple_of_times
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
The purpose of this test is to verify that, in a split-brain scenario, the
active/standby controller selection is based on which controller is
healthier. The scenario is repeated after the active and standby controllers
have been selected, with another connection failure towards a compute.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. From the active controller (controller-0), block communication with
   controller-1 and compute-0 (if both management and infra are provisioned,
   both need to be blocked):

   .. code-block:: none

      sudo iptables -I INPUT 1 -s 192.168.223.57 -j DROP && sudo iptables \
      -I INPUT 1 -s 192.168.222.156 -j DROP && sudo iptables -I INPUT 1 \
      -s 192.168.222.4 -j DROP && sudo iptables -I INPUT 1 -s \
      128.224.150.57 -j DROP

2. Verify the connection failure alarm.
3. Verify that a swact occurs.
4. Unblock the compute-0 to controller-0 traffic from controller-0 using the
   iptables command below:

   .. code-block:: none

      sudo iptables -D INPUT -s 192.168.223.57 -j DROP && sudo iptables -D \
      INPUT -s 192.168.222.156 -j DROP && sudo iptables -D INPUT -s \
      192.168.222.4 -j DROP && sudo iptables -D INPUT -s 192.168.223.4 -j \
      DROP

5. Repeat the above steps from the current active controller, this time
   blocking traffic from compute-0 on controller-1.

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
controller-1 becomes active.
System host-list shows the right states.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_05
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_05
:Test Title: test_MNFA_timeouts_2mins_1_hour
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
The purpose of this test is to validate that MNFA (Multi Node Failure
Avoidance) mode is triggered and the corresponding events are raised with
different mnfa_timeout values (2 minutes or 1 hour).

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. From the active controller, set mnfa_timeout (2 minutes or 1 hour), which
   is how long MNFA can stay active before graceful recovery of the affected
   hosts. Use the commands below, e.g. for a 2 minute (120 second) timeout:

   .. code-block:: none

      system service-parameter-list
      system service-parameter-modify service=platform section=maintenance \
      mnfa_timeout=120
      system service-parameter-apply platform

2. Apply the change and verify that alarm 250.001 "controller-0 Configuration
   is out-of-date" is cleared, using the command
   system service-parameter-apply platform.
3. Trigger a heartbeat failure by powering off nodes other than the active
   controller.
4. Verify with event-list --log that the MNFA enter and exit events below are
   seen (a polling sketch follows this test case). If mnfa_timeout is set to
   120 seconds, the time difference between the mnfa enter and exit log
   entries will be 120 seconds; if it is set to 1 hour, it will be 1 hour.
   The following strings will be seen in the event log::

      host=controller-1.event=mnfa_enter
      host=controller-1.event=mnfa_exit

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
MNFA enter and exit events are raised in the event list log.
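One way to watch for the MNFA transitions while
HA_Cloud_Recovery_improvements_05 runs is to poll the event log from the
active controller. This is only a sketch: it assumes the fm event log CLI
referenced in step 4, and the exact option spelling (--log vs. --logs) may
vary by release.

.. code-block:: bash

   # Poll the event log every 10 seconds and show only MNFA enter/exit
   # records (option spelling may differ between releases).
   watch -n 10 'fm event-list --logs | grep -E "mnfa_enter|mnfa_exit"'
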
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_06
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_06
:Test Title: test_MNFA_timeouts_default
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
The purpose of this test is to validate that MNFA mode is triggered with the
default timeout values.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
The system should have at least 3 computes and 2 controllers.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. From the active controller, check the current value of mnfa_timeout with
   system service-parameter-list and, if it has been changed, set it back to
   the default value:

   .. code-block:: none

      system service-parameter-modify service=platform section=maintenance \
      mnfa_timeout=<default_value>
      system service-parameter-apply platform

2. Apply the change and verify that alarm 250.001 "controller-0 Configuration
   is out-of-date" is cleared, using the command
   system service-parameter-apply platform.
3. Trigger a heartbeat failure by powering off nodes other than the active
   controller.
4. Verify with system event-list --log that the MNFA enter and exit events
   below are seen.
5. Verify system host-list. Hosts will show as degraded while they are
   off-line, between the MNFA enter and exit events::

      host=controller-1.event=mnfa_enter
      host=controller-1.event=mnfa_exit

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
MNFA enter and exit events are raised in the event list log.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_07
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_07
:Test Title: test_pull_management_and_OAM_cable_on_active_controller
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
This test verifies the alarms and the swact that result from pulling the OAM
and MGT cables on the active controller.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
Any 2+2 system installed with the latest load.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Verify that there are no alarms in fm alarm-list.
2. Physically remove the OAM and MGT cables on the active controller
   (controller-0).
3. Verify the alarm IDs (400.005, 200.005).
4. Verify that a swact has occurred and controller-0 is now the standby
   controller, using sudo sm-dump.
5. Verify with system host-list on the new active controller that all hosts
   are available and the standby controller is off-line.

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The system swacts, with alarms raised for the cable pull on OAM and MGT.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HA_Cloud_Recovery_improvements_08
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Test ID: HA_Cloud_Recovery_improvements_08
:Test Title: test_pull_management_cable_on_standby_controller
:Tags: P2,HA,Recovery improvement,regression

++++++++++++++++++++
Testcase Objective
++++++++++++++++++++
Pull the management cable on the standby controller and verify the alarm.

++++++++++++++++++++
Test Pre-Conditions
++++++++++++++++++++
Any 2+2 system installed with the latest load.

++++++++++++++++++++
Test Steps
++++++++++++++++++++
1. Verify that there are no alarms in fm alarm-list.
2. Physically remove the MGT cable on the standby controller (controller-0).
3. Verify the current alarm list with fm alarm-list; the alarm IDs (400.005,
   200.005) should be present.
4. Verify that there is no change in the active controller and the other
   hosts' states; the standby host will be off-line:

   .. code-block:: none

      system host-list

++++++++++++++++++++
Expected Behavior
++++++++++++++++++++
The management failure alarm IDs (400.005, 200.005) are raised.
Host states are correct in system host-list.

----------
References
----------
https://wiki.openstack.org/wiki/StarlingX/Containers/Installation

diff --git a/doc/source/manual_tests/index.rst b/doc/source/manual_tests/index.rst
index 53a3c35..838ce2a 100644
--- a/doc/source/manual_tests/index.rst
+++ b/doc/source/manual_tests/index.rst
@@ -13,8 +13,10 @@ For more information about StarlingX, see https://docs.starlingx.io/.
 .. toctree::
    :maxdepth: 2

+   fault_management/index
    gnochi/index
    heat/index
+   high_availability/index
    horizon/index
    maintenance/index
    networking/index