Saba Touheed Mujawar 4c42927040 Add retry robustness for Kubernetes upgrade control plane
A rare intermittent failure can occur during the control plane upgrade
step: either puppet hits its timeout before the upgrade is completed,
or kubeadm hits its own Upgrade Manifest timeout (at 5m).

This change retries the process by reporting the failure to the
conductor when the puppet manifest apply fails. Because the request is
sent over RPC with options, we don't get the return code directly and
hence cannot use a retry decorator. Instead, we use the sysinv report
callback feature to handle the success/failure path.
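
A minimal sketch of the callback-driven retry described above. This is
not the sysinv implementation: the class, the apply_manifest hook, and
every state string other than upgrading-first-master are hypothetical
names chosen for illustration. The retry limit of 2 is taken from the
test plan.

```python
# Illustrative only: hypothetical names, not the sysinv conductor API.
KUBE_UPGRADING_FIRST_MASTER = "upgrading-first-master"
KUBE_UPGRADING_SECOND_MASTER = "upgrading-second-master"  # assumed value
RETRYABLE_STATES = (KUBE_UPGRADING_FIRST_MASTER, KUBE_UPGRADING_SECOND_MASTER)
MAX_RETRIES = 2  # the test plan checks that the code retries 2 times


class ControlPlaneUpgrade(object):
    """Retries a manifest apply from its report callback."""

    def __init__(self, apply_manifest):
        # apply_manifest(on_report) starts the apply and returns nothing
        # useful; the outcome arrives later through on_report(success).
        self._apply_manifest = apply_manifest
        self.state = KUBE_UPGRADING_FIRST_MASTER
        self.retries = 0

    def start(self):
        self._apply_manifest(self.report)

    def report(self, success):
        if self.state not in RETRYABLE_STATES:
            # For example the upgrade was aborted: do not retry.
            return
        if success:
            self.state = "upgraded-first-master"  # hypothetical value
        elif self.retries < MAX_RETRIES:
            self.retries += 1
            self._apply_manifest(self.report)
        else:
            self.state = "upgrading-first-master-failed"  # hypothetical value
```

Because the outcome only arrives through report(), the retry decision
has to live in the callback rather than in a decorator around the call
that sends the request.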

TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send a STOP signal to pause the process so that the
      puppet manifest times out, and check that the retry code works
      and that the upgrade completes during the retry attempts.
PASS: Manually decrease the puppet timeout to a very low number
      and verify that the code retries 2 times and updates the
      failure state.
PASS: Perform an orchestrated k8s upgrade, manually send a STOP
      signal to pause the kubeadm process during the
      upgrading-first-master step, and perform system
      kube-upgrade-abort. Verify that the upgrade is aborted
      successfully and that the code does not attempt the retry
      mechanism for k8s upgrade control-plane, as it is not in the
      desired KUBE_UPGRADING_FIRST_MASTER or
      KUBE_UPGRADING_SECOND_MASTER state.
PASS: Perform a manual k8s upgrade; on a k8s upgrade control-plane
      failure, perform a manual upgrade-abort successfully.
      Perform an orchestrated k8s upgrade; on a k8s upgrade
      control-plane failure after retries, nfv aborts automatically.

Closes-Bug: 2056326

Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
            https://review.opendev.org/c/starlingx/stx-puppet/+/911945
            https://review.opendev.org/c/starlingx/integ/+/913422

Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 08:49:36 -04:00
agent Report port and device inventory after the worker manifest 2024-03-01 09:21:21 -05:00
api Merge "Fix LDAP issue for DC subcloud" 2024-03-13 20:18:24 +00:00
cert_alarm Cert-Alarm audit 2021-09-24 11:44:40 -04:00
cert_mon Cert-mon improvement 2022-03-16 13:21:02 -03:00
common Merge "Update system:node clusterrolebinding for new host" 2024-03-07 00:04:53 +00:00
conductor Add retry robustness for Kubernetes upgrade control plane 2024-03-19 08:49:36 -04:00
db New RESTful API and DB schema for network to address-pools. 2024-03-06 07:34:14 -03:00
helm Remove armada and helm v2 2023-03-23 17:19:33 -03:00
loads Add plugin to extract ostree playbooks from load 2023-03-31 09:43:49 -03:00
objects Remove ObjectListBase from sysinv 2023-03-15 14:49:39 +00:00
openstack Fixes the time calculation in the is_expired method of the Token class 2021-02-17 22:06:21 -03:00
puppet Introduce Puppet variables for primary and secondary pool addresses. 2024-03-12 07:25:46 -03:00
README.txt Create basic set of unit tests for sysinv ihosts API 2019-07-24 11:35:15 -05:00
__init__.py Fix: "__builtin__" issue for Python 2/3 compatible code 2018-12-19 10:21:57 +08:00
base.py py3: Fix iterator index operator usage 2021-08-13 16:32:10 +00:00
conf_fixture.py Add ZeroMQ RPC backend 2022-11-24 13:28:01 -03:00
events_for_testing.yaml Host compute service failure alarm removal 2023-02-10 09:30:43 -05:00
fake_policy.py Deprecate old policy engine and restrict access 2022-08-10 11:18:38 -03:00
keyring_fixture.py Initial framework and unit tests for puppet plugins 2019-07-24 09:08:52 -04:00
matchers.py Enable Bad String Formatting Linting 2019-07-17 14:36:37 -04:00
policy.yaml Deprecate old policy engine and restrict access 2022-08-10 11:18:38 -03:00
policy_fixture.py Deprecate old policy engine and restrict access 2022-08-10 11:18:38 -03:00
stubs.py Fix: "map" issue for Python 2/3 compatible code 2018-12-18 11:03:00 +08:00
test_dbsync.py StarlingX open source release updates 2018-05-31 07:35:52 -07:00
test_images.py Deprecate the sysinv.openstack.common utils files 2019-12-04 10:58:39 -06:00
test_utils.py Improve application metadata validation code 2023-10-25 14:11:02 +00:00
utils.py Removing mox from sysinv in StarlingX 2019-12-20 08:04:22 -06:00

README.txt

This file describes the current status of the sysinv tests, the areas where
issues still exist, and what to do in order to test them.

--------------------------------------------------------------------------------
RUNNING TESTS:

To run the tests, navigate in a console to
$MY_REPO/stx/stx-config/sysinv/sysinv/sysinv

On your first ever run of tox tests enter:
tox --recreate -e py27
This will make sure tox's environment is fresh and fully built.

To run both py27 (the actual unit tests) and the flake8 formatting check:
tox

You can also run both py27 and flake8 by entering the following instead:
tox -e flake8,py27
The above order of environments matters. If py27 comes first, flake8 won't run.

To run either individually enter:
tox -e py27
tox -e flake8

--------------------------------------------------------------------------------
RUNNING TESTS WITH POSTGRESQL:

The default behaviour is to run the sysinv tests with the MySQL database. This
should be fine in most cases.

If you really want to test with PostgreSQL, in a local Ubuntu VM or similar:
- go to test_migrations.py and in the function
  test_postgresql_opportunistically, comment out the self.skipTest line to
  enable the test to be run.
- Also go to the function test_postgresql_connect_fail and comment out the
  self.skipTest line so that test can be run as well.
- Lastly, in the function _reset_databases, go to the bottom and uncomment
  self._reset_pg(conn_pieces) so the postgres DB can be reset between runs.
  If this last line is not uncommented, your first run of the py27 tests will
  work, but after that you will get
  migrate.exceptions.DatabaseAlreadyControlledError

Do not push these local changes upstream to the repo.

To set up the postgres db for the first time enter the following in console:
sudo apt-get install postgresql postgresql-contrib
pip install psycopg2

sudo -u postgres psql
CREATE USER openstack_citest WITH CREATEDB LOGIN PASSWORD 'openstack_citest';
CREATE DATABASE openstack_citest WITH OWNER openstack_citest;
\q
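
The user, password and database created above are typically addressed through
a single SQLAlchemy-style connection URL. The sketch below is an assumption,
not code from test_migrations.py: the URL form, the host "localhost" and the
helper name split_connection_url are illustrative. It uses Python 3's
urllib.parse; under py27 the same function lives in the urlparse module.

```python
from urllib.parse import urlparse

# Assumed URL built from the credentials created above; the host is a guess.
CONNECTION_URL = (
    "postgresql://openstack_citest:openstack_citest@localhost/openstack_citest"
)


def split_connection_url(url):
    """Split a database URL into the pieces needed to connect to it."""
    pieces = urlparse(url)
    return {
        "backend": pieces.scheme,
        "user": pieces.username,
        "password": pieces.password,
        "host": pieces.hostname,
        # The path component is "/<database>", so drop the leading slash.
        "database": pieces.path.lstrip("/"),
    }


if __name__ == "__main__":
    for key, value in sorted(split_connection_url(CONNECTION_URL).items()):
        print("%s = %s" % (key, value))
```

Printing the pieces is a quick way to confirm that the values match what was
entered in psql before running the migration tests.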

--------------------------------------------------------------------------------
OUTSTANDING ISSUES:

tests/api/test_acl.py
    test_authenticated
        Fails due to an HTTPS connection failure caused by an invalid user
        token, which raises webtest.app.AppError:
        Bad response: 401 Unauthorized 'Authentication required'

    test_non_admin
        Fails due to invalid user token resulting in
        raise mismatch_error testtools.matchers._impl.MismatchError: 401 != 403
        Occurs against Www-Authenticate: Keystone uri='https://127.0.0.1:5000'

    test_non_admin_with_admin_header
        Fails due to invalid user token resulting in
        raise mismatch_error testtools.matchers._impl.MismatchError: 401 != 403

tests/conductor/test_manager.py
    test_configure_ihost_new
        IOError: [Errno 13] Permission denied: '/tmp/dnsmasq.hosts'
        This directory does not exist. I am not sure if this directory is
        still supposed to exist, if it has moved, or if this entire test is
        based on deprecated/replaced functionality.

    test_configure_ihost_no_hostname
        os.rename(temp_dnsmasq_hosts_file, dnsmasq_hosts_file)
        OSError: [Errno 1] Operation not permitted
        Fails because the dnsmasq files don't exist.

    test_configure_ihost_replace
        IOError: [Errno 13] Permission denied: '/tmp/dnsmasq.hosts'
        This dnsmasq file doesn't exist. Same issue as in the first test.

There is also the issue of using postgres for the db migrations in
tests/db/sqlalchemy/test_migrations.py. These migrations can only be run on
local VMs such as Ubuntu, and not on the build servers or on Jenkins, because
someone would have to set up the database manually on those systems. Since
there is presently no way of getting postgres running in a virtual environment
(e.g. tox's), it would have to be set up on the actual build server. Multiple
people running these tests at the same time would then interact with the same
db and could run into issues.

Postgres is used because, between versions, some columns of enumerated types
are altered, and SQLite has no ALTER COLUMN support and only limited ALTER
TABLE support. Alembic and sqlalchemy-migrate offer solutions to this, but
presently there is no intention to incorporate either of these packages.
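
The SQLite limitation mentioned above can be reproduced with Python's built-in
sqlite3 module. This is a small illustration, not part of the sysinv test
suite; the table and column names are hypothetical.

```python
import sqlite3


def try_alter_column():
    """Return SQLite's error text for an ALTER COLUMN, or None if it worked."""
    conn = sqlite3.connect(":memory:")
    try:
        # demo_host is a made-up table used only for this demonstration.
        conn.execute("CREATE TABLE demo_host (id INTEGER PRIMARY KEY, state TEXT)")
        # Adding a column is one of the few ALTER TABLE forms SQLite accepts.
        conn.execute("ALTER TABLE demo_host ADD COLUMN note TEXT")
        try:
            # Changing the type of an existing column is rejected by SQLite.
            conn.execute(
                "ALTER TABLE demo_host ALTER COLUMN state TYPE VARCHAR(64)"
            )
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print(try_alter_column())
```

A migration that changes an enumerated column therefore cannot be exercised
against SQLite without rebuilding the table, which is why the migration tests
target postgres.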

--------------------------------------------------------------------------------
TESTING DECISIONS:

We've chosen to use flake8 instead of PEP8 because PEP8 results in a lot more
insignificant issues being found. flake8 combines PEP8 with PyFlakes, adding
syntax and import checking to the code formatting checks. Additionally, flake8
provides the option to test code complexity and return warnings if the
complexity exceeds whatever limit you've set.
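
To give a feel for what a complexity limit measures, the sketch below counts
branch points in a piece of source code using the standard ast module. It is a
rough approximation written for this README, not flake8's own algorithm, and
the function name approx_complexity is hypothetical.

```python
import ast

# Node types treated as one extra path each (a simplification).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler)

SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
'''


def approx_complexity(source):
    """Return 1 plus the number of branch points found in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _BRANCH_NODES) for node in ast.walk(tree))


if __name__ == "__main__":
    print(approx_complexity(SAMPLE))  # prints 4: two ifs, one for, plus 1
```

With a limit of 3, a function like the sample would produce a warning, while
a limit of 10 would let it pass.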

--------------------------------------------------------------------------------