481 Commits

Author SHA1 Message Date
Zuul 3399c7fc54 Merge "Rename sw-manager upgrade-strategy to sw-deploy-strategy" 2024-03-19 21:23:44 +00:00
Saba Touheed Mujawar 471d1001e0 Set timeout for KubeHostUpgradeControlPlaneStep to 420s
The history for KubeHostUpgradeControlPlaneStep timeout of 600s
was to give significant headroom in doing control-plane upgrade.
This step was known to run long, but we had limited data, so
we set the value large. The underlying kubeadm
UpgradeManifestTimeout was 5 minutes, so any timeout larger than
300s was ineffective.

This updates KubeHostUpgradeControlPlaneStep timeout
to 420s. This is intentionally engineered to be larger than
the resultant time for sysinv code to reach completion of the
Kubernetes Upgrade control-plane step with retries and
accounting for failure.

The timeout is engineered using the following equation.
This accounts for retries, hitting kubeadm upgrade timeout
each try, and some buffer for the sysinv report callback
mechanism.

nfv_timeout = ImageDownloadTime + retries*
                        (UpgradeControlPlaneTimeout + buffer)

Following are the engineered parameters:

ImageDownloadTime = 0s (images are pre-pulled before this step)
UpgradeControlPlaneTimeout = 3 minutes (kubeadm UpgradeManifestTimeout)
buffer = 30s
2 retries

Result:
Engineered puppet timeout for upgrade control-plane:
= UpgradeControlPlaneTimeout + buffer = 3*60s + 30s = 210s

Engineered NFV timeout:
= 0s + 2(180s + 30s) = 420s
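
The arithmetic above can be checked with a minimal sketch. The function
and parameter names below are hypothetical and are not taken from the
nfv or sysinv code; only the equation and the engineered values come
from this commit message.

```python
def engineered_timeouts(image_download_s, upgrade_timeout_s, buffer_s, retries):
    """Return (puppet_timeout, nfv_timeout) in seconds.

    Hypothetical helper: it only restates the equation from the commit
    message, it is not the code that sets the real timeouts.
    """
    # Per-attempt budget: the upgrade timeout plus the callback buffer.
    puppet_timeout = upgrade_timeout_s + buffer_s
    # Overall budget: image download time plus every attempt timing out.
    nfv_timeout = image_download_s + retries * puppet_timeout
    return puppet_timeout, nfv_timeout


puppet_timeout, nfv_timeout = engineered_timeouts(
    image_download_s=0, upgrade_timeout_s=3 * 60, buffer_s=30, retries=2)
print(puppet_timeout, nfv_timeout)  # 210 420
```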

Test Plan:
PASS: Perform orchestrated k8s upgrade, manually STOP kubeadm process
      during k8s upgrade control-plane step. Check logs to verify
      puppet timeout and also verify sysinv attempts retry mechanism
      before nfv timeout.

Partial-Bug: 2056326

Change-Id: I73ab8ea7cd7fc3816372260983c4b54a02cdcc4c
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 13:44:09 -04:00
Joshua Kraitberg cc56e9405e Rename sw-manager upgrade-strategy to sw-deploy-strategy
This commit renames the upgrade-strategy to sw-deploy-strategy for the
sw-manager CLI. I've also improved the usage of consts within
nfv-client.

The backend remains unchanged, and functionality is not impacted beyond
the new command naming.

TEST PLAN
PASS: Use sw-deploy-strategy to upgrade from stx8 to stx9, DX
      Upgrade is not stable, but the strategy should
      execute as expected
PASS: Try to use old upgrade-strategy, it should not exist
PASS: Confirm tab auto-complete works for sw-deploy-strategy
PASS: Do sw-manager sw-deploy-strategy show
      Output should detect strategy correctly.

Story: 2011045
Task: 49716
Change-Id: I82f602b25f114097a21e83b51bb80fc8e4af9603
Signed-off-by: Joshua Kraitberg <joshua.kraitberg@windriver.com>
2024-03-14 18:33:15 -04:00
Vanathi.Selvaraju bac2f0a09e NFV API to list current strategy type and state.
This new NFV API will list the current running
Strategy type and its state.

Test Plan:
PASSED: On a DX system, Create a strategy of any type.
API should return the strategy type and state.
PASSED: On a DX system, there are no existing strategies.
API returns none.
PASSED: Create a Strategy of any type, use REST API
to get the current active strategy (sample output in
comments section).
PASSED: Orchestrated k8s upgrade with the changes,
(sample output in the comments section).
PASSED: Performed three consecutive k8s upgrades
v1.24.4 - v1.25.3 - v1.26.1 - v1.27.5

Story: 2011045
Task: 49580

Change-Id: I2e367ccb5f9ff42aa0d912598fb1617a4a7ae0ee
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2024-03-11 12:03:49 +00:00
Zuul e85f7905c2 Merge "Add retry at nfv orchestration level" 2024-03-08 19:28:00 +00:00
sshathee 9c73d3b254 Add retry at nfv orchestration level
This commit introduces retry on failure for cases such
as kubelet killing pods due to resource contention during
kubernetes upgrade.

Test Plan:
    PASS: Simulate kubeapiserver pod failure by adding wrong resource
    in rest api request and check retries.

    PASS: Verify kubernetes orchestrated update works with
    changes on aio-sx

    PASS: Verify changes are working on AIO-DX, with strategy
    created on controller-0 and applied on controller-1

Closes-Bug: #2053236
Change-Id: I816b09bb0cd767380e5093d4732d161e4cc8cb24
Signed-off-by: sshathee <shunmugam.shatheesh@windriver.com>
2024-03-08 06:05:04 -05:00
Zuul d619a37a6a Merge "Alarm 900.701 raised on failing to remove node taint." 2024-02-27 17:41:22 +00:00
Vanathi.Selvaraju e230abb543 Alarm 900.701 raised on failing to remove node taint.
Additional checks added to ensure that alarm is
raised by VIM on failing to remove node taint.

Test Plan:
PASSED: On a DX system, locked and unlocked one of
the controllers to check if taints are removed.
PASSED: On a DX system, tweaked the code to
fail untainting of node.
Alarm 900.701 is raised as there is node taint.
PASSED: On a DX system, check if the alarm 900.701
is removed on locking the node.
PASSED: Deployed a DX system with ISO that has the
changes. No trace of alarm 900.701 after
bootstrapping.
PASSED: On a DX system Node taint alarm exists with
multiple taints, node was locked followed by unlock.
On successful untaint, the alarm is cleared.
PASSED: On a DX system Node taint alarm exists, node was
locked followed by unlock. On successful untaint, the
alarm 900.701 is cleared.

Closes-Bug: 2046273
Depends-On: https://review.opendev.org/c/starlingx/fault/+/904788

Change-Id: I4206b336cbe0021f2b45e3b3cd24b42ca43bc60e
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2024-02-15 12:12:32 -05:00
Igor Soares b732e56b43 Account for new Kubernetes upgrade statuses
Update the Kubernetes upgrade orchestration code to account for two new
statuses: upgrade-starting and upgrade-starting-failed.

The new statuses were introduced to support updating StarlingX
applications during the kube-upgrade-start step.

Test Plan:
PASS: build-pkgs -a && build-image
PASS: sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      Check if Kubernetes upgrade started and finished successfully.
      sw-manager kube-upgrade-strategy delete
PASS: sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      sw-manager kube-upgrade-strategy abort
      Check if Kubernetes upgrade was successfully aborted.
      sw-manager kube-upgrade-strategy delete
PASS: Fresh install AIO-SX
      Create a platform-integ-apps updated tarball without the
      metadata.yaml file and copy it over to
      /usr/local/share/applications/helm/.
      sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      Confirm that the Kubernetes upgrade failed
      Check if Kubernetes upgrade was successfully aborted.
      sw-manager kube-upgrade-strategy delete
PASS: Fresh install AIO-DX
      Create a platform-integ-apps updated tarball without the
      metadata.yaml file and copy it over to
      /usr/local/share/applications/helm/.
      sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      Confirm that the Kubernetes upgrade failed
      sw-manager kube-upgrade-strategy delete
      Copy a working platform-integ-apps tarball to
      /usr/local/share/applications/helm/.
      sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      Confirm that the Kubernetes upgrade was resumed and successfully
      finished.
PASS: Fresh install AIO-SX with Kubernetes 1.24.4
      sw-manager kube-upgrade-strategy create --to-version v1.27.5
      sw-manager kube-upgrade-strategy apply
      Check if Kubernetes upgrade started and finished successfully.

Story: 2010929
Task: 49461

Depends-on: https://review.opendev.org/c/starlingx/config/+/905005

Change-Id: I1a8b86c9ecf8cc21a9cb25ee57d6930944cef261
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-02-13 14:57:23 -03:00
Yuxing Jiang d730d8edc2 Ignore PTP alarm in vim strategies
This commit ignores the PTP alarm for SyncE in all the vim strategies
as it should not impact related operations.

Test-plan:
1. Manually raise 100.119 alarm by fmClientCli
2. Verify kube-rootca-update and system-config-update strategies can
be created/applied/deleted successfully along with the 100.119 alarm.

Closes-bug: 2051950
Change-Id: Ie2cc570c28de95802ff1b88e30f3c9f562b3c892
Signed-off-by: Yuxing Jiang <Yuxing.Jiang@windriver.com>
2024-02-01 17:21:58 -05:00
Zuul 8af76f954d Merge "Nfv upgrade orchestration for kube-upgrade-storage" 2024-01-09 22:03:36 +00:00
Luiz Felipe Kina b90eeb1436 Nfv upgrade orchestration for kube-upgrade-storage
This adds the kube-upgrade-storage step on the orchestration for k8s
upgrade.

VIM build stages stay the same, but the VIM apply stage changes. After
the networking upgrade, there is an addition of storage upgrade and
after that everything stays the same.

Test Plan:
 PASS: Run a kubernetes upgrade with the kube-upgrade-storage step and
       observe that the image for volume-snapshot-controller is changed
 PASS: Run the kube-upgrade-storage with an unexpected state and
       expect failure.

Story: 2010877
Task: 48588

Change-Id: Ib5ff848ed67c3e57c9cfcf6d1a41abbc192a7935
Signed-off-by: Luiz Felipe Kina <LuizFelipe.EiskeKina@windriver.com>
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
2023-12-18 09:08:44 -03:00
Vanathi.Selvaraju 42442c85d9 Optimize kubelet upgrade phase using VIM orchestrator.
Optimizing kubelet upgrade by reducing the wait
time and adding in more logs for debugging. The
wait time is removed as we have the additional
kubelet state check in place (Closes-Bug: 2044209)
and an existing timeout of 900 secs.

Test Plan:
PASSED: On a DX system, removed the wait time of 60 sec
and tested 3 consecutive k8s upgrades thrice.
PASSED: On a DX system, tweaked the code and tested
on the kubelet upgrade retry timeout (900 sec).
PASSED: On a DX system, tweaked code and tested
the existing behaviour that retry is not
occurring in case of failure.

Closes-Bug: 2045776

Change-Id: I321d7eae5ef7ebd29c1c6aca97992e3e20acb457
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2023-12-11 17:54:38 +00:00
Vanathi.Selvaraju 71406be649 K8s upgrade failed in host-unlock phase
K8s upgrade failed in the host-unlock phase
because VIM tried to unlock the host while the
kubelet upgrade was still in progress.

Test Plan:
PASSED: On a DX system, apply the fix and trigger
K8s upgrade.
PASSED: On an SX system, three consecutive k8s upgrades.
PASSED: On a DX system, two consecutive k8s upgrades.

Closes-Bug: 2044209

Change-Id: If4bcabcba6aabd1aee6918d73fd6994e94f0b1f8
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2023-11-23 18:13:33 +00:00
Vanathi.Selvaraju cdd758efb7 Post K8s upgrade DC system controller swacts in a loop
After K8s upgrade from 1.22 to 1.23 the system
swacts in a loop. This occurs because an attribute
of the VIM object is not available, leading to
multiple restarts of the VIM service and resulting
in swacts.

Test Plan:
PASSED: Induce condition Kube upgrade object null state
causing vim restarts, apply fix, system stabilizes.

Closes-Bug: 2043859

Change-Id: Iccb3106cade1308aa6e9232013366c2b9181557b
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2023-11-18 20:45:17 -05:00
Jorge Saffe 9feef4232d sw-manager fails with SSL and CA Cert provided.
When sw-manager is used through a secure connection (https
enabled) either with the remote CLI or within the cluster
via the public interface, the operation fails if the
Certificate Authority's cert is not included among the
system's trusted CAs.

The sw-manager client lacks implemented methods for
referencing a local Certificate Authority Cert during calls.
Therefore, if the CA is not among the system's trusted CAs,
all calls made by sw-manager's CLI will fail since
authentication in Keystone will also fail.

Other CLIs like fm or platform allow referencing a CA Cert
via the "REQUESTS_CA_BUNDLE" environment variable. The fix
involves loading, if defined, the CA Cert referenced by
such an environment variable, and adjusting SSL calls to
verify connections using the provided CA Cert.
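
A minimal sketch of the approach described above, using only the Python
standard library. The helper names are hypothetical and this is not the
sw-manager implementation; it only illustrates reading the
"REQUESTS_CA_BUNDLE" environment variable, when defined, and verifying
connections against that CA cert.

```python
import os
import ssl


def resolve_ca_bundle(environ=None):
    """Return the CA bundle path from REQUESTS_CA_BUNDLE, or None if unset."""
    environ = os.environ if environ is None else environ
    return environ.get("REQUESTS_CA_BUNDLE") or None


def build_ssl_context(environ=None):
    """Build a verifying SSL context that also trusts the referenced CA cert."""
    # Start from the system's trusted CAs with verification enabled.
    context = ssl.create_default_context()
    ca_bundle = resolve_ca_bundle(environ)
    if ca_bundle is not None:
        # Add the operator-provided CA on top of the system trust store.
        context.load_verify_locations(cafile=ca_bundle)
    return context
```

The resulting context could then be passed to whatever HTTPS client the
caller uses; when the variable is unset, behaviour falls back to the
system's trusted CAs.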

Test Plan:
  PASS Fresh Install SX Env

  PASS Source openrc.sh file (internal interface).
  PASS sw-manager patch-strategy show

  PASS Enable secure mode (https)
  PASS Download OpenStack RC File from Horizon.
  PASS Source RC file inside cluster (public interface).
  PASS Set REQUESTS_CA_BUNDLE with CA-Cert path.
  PASS sw-manager patch-strategy show

  PASS Enable secure mode (https)
  PASS Download OpenStack RC File from Horizon.
  PASS Install remote CLI (custom container with changes)
  PASS Source downloaded RC file
  PASS Set REQUESTS_CA_BUNDLE with CA-Cert path.
  PASS sw-manager patch-strategy show

Closes-bug: 2033561

Change-Id: If5b70714cde09bd8c329b976a8148daee9001415
Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com>
2023-08-30 21:43:07 +00:00
Yuxing Jiang 6dba3df3e3 Implement system_config_update orchestration
This commit adds VIM orchestration for system config update to drive the
host swact/lock/unlock once these operations are expected to update the
system config.

CLI: sw-manager system-config-update-strategy
    apply create delete show

Test plan:
1. Create a system config update strategy successfully. --passed
2. Create a system config update strategy failed if the host resource
doesn't exist. --passed
3. Create a system config update strategy failed if an unexpected alarm
exists and the strategy alarm restrictions is set as strict. --passed
4. Create a system config update strategy successfully if an unexpected
alarm exists and the strategy alarm restrictions is set as relax.
--passed
5. Create a system config update strategy failed if a controller offline
in a DX system. --passed
6. Create a system config update strategy failed if a storage offline in
a standard system. --passed
7. Create a system config update strategy successfully if a worker
offline in a standard system. --passed
8. Apply a system config update strategy successfully. --passed
9. Apply a system config update strategy failed if a host failed to
lock. --passed
10. Delete a system config update strategy successfully if the strategy
is complete or failed. --passed
11. Create a system config update strategy with strategy not required
for the host in k8s, verify the host is excluded from the strategy.

Story: 2010719
Task: 47910

Change-Id: I052bc5b2004f17de870a81c523d0a1f4e422a902
Signed-off-by: Yuxing Jiang <Yuxing.Jiang@windriver.com>
2023-07-17 17:36:44 -04:00
Al Bailey 3bd5eed446 Adding kube-upgrade-abort support
Trigger a kube-upgrade abort when kube upgrade steps
encounter a failure.

Trigger a cleanup when constructing a new kube upgrade
strategy where an aborted kube-upgrade is detected.
This cleanup occurs during the 'build' phase.

This commit also includes additional INFO level logs
as each kube-upgrade step is invoked.  Previously some
were info and some were debug.

Test Plan:
  PASS: Trigger the downloading images phase to fail and
   observe that the kube-upgrade becomes 'aborted'.
  PASS: Create a kube-upgrade strategy where an aborted
   kube-upgrade exists, and observe that it is cleaned up
   and a fresh kube upgrade strategy is created.

Story: 2010565
Task: 48219
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I6d0ef0bdaaee73c76d6be40b9d5d0143332f83a0
2023-06-27 17:45:48 +00:00
Zuul 9e519abfb9 Merge "Add logs to inform when nodes are tainted" 2023-06-12 16:55:50 +00:00
Al Bailey f1f7fe3292 Audit kube upgrade changes more frequently
Kube upgrade orchestrator used built-in host-audit events
(emitted at thirty second interval) to determine when a
kube-upgrade or kube-host-upgrade query should be invoked.

This meant that kube-upgrade steps would typically take at
least two of those intervals to detect a relatively quick
transition.

Now the kube-upgrade audit will be responsible for the
kube upgrade and kube host upgrade queries, and will run
at its own interval (5 seconds) to allow the steps to more
rapidly detect completion.

AIO-SX kube upgrade is two to three minutes faster.

Test Plan:
  PASS: AIO-SX kube upgrade

Story: 2010565
Task: 48173
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Ib4878322d0846b8df935f643352b028ff11fc184
2023-06-05 00:39:43 +00:00
Igor Soares 0ece2f8e0e Add logs to inform when nodes are tainted
Add logs to inform when the 'services=disabled:NoExecute' taint is added
to nodes as well as when they are removed.

This aims to improve future log analysis by facilitating the
identification of cases where taints could be mistakenly added to nodes.

Test Plan:
PASS: lock controller and check if the operation is properly logged
PASS: unlock controller and check if the operation is properly logged

Partial-Bug: 2022008
Change-Id: Ie9c7432211621a9fbf7aa90282dfb91405f90c33
Signed-off-by: Igor Soares <igor.piressoares@windriver.com>
2023-05-31 18:47:37 -03:00
Al Bailey c908b9625d Remove polling from two kube upgrade steps
The kube-upgrade strategy invokes a synchronous POST
operation to start and cleanup a kubernetes upgrade.
kube-upgrade-start takes about 3 seconds.
kube-upgrade-complete takes about one second.

The old code would ignore the results of those REST API
calls, and enter a polling mode to check the host state.
This could take up to two minutes to complete.

Polling is unnecessary (for these steps), and by using
the results from the NFV plugin being invoked, the step
can quickly be completed.

This reduces the time of a kube-upgrade strategy by
over three minutes.

Test Plan:
  PASS: AIO-SX kube upgrade

Story: 2010565
Task: 48152
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Icb511e70eea445b0d07a139d437e018fb3c505f3
2023-05-30 19:04:08 +00:00
Al Bailey a4280ebf59 Add host cordon steps to kube upgrade orch
When updating the control plane and kubelets the
host needs to be cordoned to prevent it from
doing kubernetes work during that time period:

system kube-host-cordon <host>
 < update control plane >
 < update kubelet >
system kube-host-uncordon <host>

Currently only supported for simplex.

Depends-On: https://review.opendev.org/c/starlingx/config/+/880333

Test Plan:
   PASS: AIO-SX single kube upgrade (1.24 -> 1.25)
   PASS: Resume AIO-SX single kube upgrade after cordon started.
   PEND: AIO-SX multi-kube upgrade
   PEND: AIO-DX kube upgrade

Story: 2010565
Task: 47772
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I54262d4ff31a2da005fffb6d30bb6872ee52f6d4
2023-05-19 18:47:22 +00:00
Zuul 4ff5e50e91 Merge "Combine multiple kube-upgrades into one strategy" 2023-05-08 15:26:33 +00:00
Davlet Panech ffe7ea0784 Fix github mirroring for this repo
Updating the rsa ssh host key based on:
https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/

Note: In the future, StarlingX should have a zuul job and
secret setup for all repos so we do not need to do this
for every repo.

Needed to rename the secret, because zuul fails if like-named
secrets have different values in different branches of the same
repo.

Partial-Bug: #2015246
Change-Id: I1d4a2a4e8b220d8966b7ee67243445eeb1601296
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2023-04-28 12:38:52 -04:00
Al Bailey 0f83fc1169 Combine multiple kube-upgrades into one strategy
This algorithm change is Simplex only.
The algorithm for multi-version upgrade for k8s is:
- system kube-upgrade-start <final version>
- system kube-upgrade-download-images
- system kube-upgrade-networking
- system kube-host-cordon controller-0 (future)
- loop <v> from current version to final version
 - system kube-host-upgrade controller-0 control-plane <v>
 - system kube-host-upgrade controller-0 kubelet <v>
- system kube-host-uncordon controller-0 (future)
- system kube-upgrade-complete
- system kube-upgrade-delete
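
The step ordering listed above can be sketched as a small generator of
command strings. This is an illustration only: the function name, the
"cordon" flag and the version labels are hypothetical, and the command
strings simply mirror the list in this commit message rather than the
actual VIM strategy code.

```python
def kube_upgrade_commands(versions, host="controller-0", cordon=False):
    """Return the ordered command list for a multi-version upgrade.

    `versions` holds each version to step through, ending with the final
    one. `cordon` adds the steps marked "(future)" in the list above.
    """
    final_version = versions[-1]
    commands = [
        f"system kube-upgrade-start {final_version}",
        "system kube-upgrade-download-images",
        "system kube-upgrade-networking",
    ]
    if cordon:
        commands.append(f"system kube-host-cordon {host}")
    # Control plane, then kubelet, once per version in the path.
    for version in versions:
        commands.append(f"system kube-host-upgrade {host} control-plane {version}")
        commands.append(f"system kube-host-upgrade {host} kubelet {version}")
    if cordon:
        commands.append(f"system kube-host-uncordon {host}")
    commands.append("system kube-upgrade-complete")
    commands.append("system kube-upgrade-delete")
    return commands


# Placeholder version labels, not real Kubernetes releases.
for command in kube_upgrade_commands(["vA", "vB", "vC"]):
    print(command)
```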

This change does the following additional cleanup:
- remove patch-apply intermediate steps during kube-upgrade
- remove all patching mixins from kube upgrade strategy

Test Plan:
  PASS: (multi-upgrade) AIO-SX kube upgrade orchestration
        v1.21.8 to v1.24.4
  PASS: (single upgrade) AIO-DX kube-upgrade orchestration
        v1.21.8 to v1.22.5

Depends-On: https://review.opendev.org/c/starlingx/config/+/877988

Story: 2010565
Task: 47741
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Id654212e198321c6518b8feaa85cd5301167735c
2023-04-28 13:10:29 +00:00
Luan Nunes Utimura db3f5525d8 NFVI: Default guest_plugin_disabled to True
Following the work previously done in [1] and [2] to deactivate
guest-related services in VIM, this commit changes the default value of
config variable `guest_plugin_disabled` to `True` so that it reflects
the change proposed in [3] (same modification but on Puppet's side).

As reported in [3], loading this plugin while having some of its
services deactivated (or functionalities removed) has proven to be
a problem when stx-openstack is applied, as both nova-compute service
and hypervisor are caught in an enable/disable loop indefinitely after
the first host lock/unlock with the application applied.

[1] https://review.opendev.org/c/starlingx/nfv/+/869817
[2] https://review.opendev.org/c/starlingx/nfv/+/870538
[3] https://review.opendev.org/c/starlingx/stx-puppet/+/879359

Test Plan (on AIO-SX):
PASS - Remove 'guest_plugin_disabled' from /etc/nfv/vim/config.ini,
       reload VIM services and verify that the guest plugin wasn't
       loaded by default:
       $ tail -f /var/log/nfv*.log

Related-Bug: 2015088

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/879359

Signed-off-by: Luan Nunes Utimura <LuanNunes.Utimura@windriver.com>
Change-Id: I7e254fe2db2a6bcc6b98a26cc712c5d03ef7ffad
2023-04-05 08:58:50 -03:00
Al Bailey 9784b7526a pylint cleanup for nfv to use standard modules
Cleanup the code to allow unsuppressing the following
pylint checks:

 C0207 use-maxsplit-arg
 R1730 consider-using-min-builtin
 R1731 consider-using-max-builtin

These three types of pylint checks report scenarios where
a standard python command can be used rather than trying
to 'reinvent the wheel'.
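
As an illustration of what these three checks ask for, here is a short
sketch. The functions are made-up examples, not code from the nfv
repository; each comment shows the pattern pylint reports and the body
shows the standard replacement.

```python
def first_field(line):
    """Return the text before the first ':' in `line`."""
    # C0207 use-maxsplit-arg: reported for line.split(":")[0], which
    # splits the whole string just to keep the first piece.
    return line.split(":", maxsplit=1)[0]


def cap(value, limit):
    """Clamp `value` so it never exceeds `limit`."""
    # R1730 consider-using-min-builtin: reported for
    #     if value > limit:
    #         value = limit
    return min(value, limit)


def at_least(value, minimum):
    """Raise `value` so it is never below `minimum`."""
    # R1731 consider-using-max-builtin: reported for
    #     if value < minimum:
    #         value = minimum
    return max(value, minimum)
```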

There is no functional difference for any of these changes

Test Plan:
  PASS: tox
  PASS: create / delete a kubernetes upgrade strategy

Story: 2010531
Task: 47617
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Ieb638cbcf7280f6fa322a062467dfc098efdec5e
2023-03-15 15:28:54 +00:00
Zuul 4ae4240c80 Merge "Cleanup pep8 un-used variable warnings" 2023-03-13 20:14:09 +00:00
Zuul 248410c78e Merge "Adding unit tests to improve nfvi infra coverage" 2023-03-13 20:14:03 +00:00
Al Bailey 6f68610806 Cleanup pep8 un-used variable warnings
Enable F841 flake8 check 'variables assigned but unused'.

This primarily involved removing the variable or renaming
it as '_' to indicate it is an unused variable.

There should be no runtime difference from these changes.

Test Plan:
  PASS: tox
  PASS: sw-manager strategy creation (CLI commands)

Story: 2010531
Task: 47606
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I2431fe0c0ec7d6e01ab72668efc20c9c5e77e88f
2023-03-08 15:18:00 +00:00
Zuul 39238b4467 Merge "Change pod rollout timeout for K8s root CA change" 2023-03-01 14:53:51 +00:00
Al Bailey a867908fc5 Adding unit tests to improve nfvi infra coverage
The nfvi_infrastructure_api plugin has almost no unit test
line coverage.

This review adds some basic creation tests and validates
some static methods and properties.

This improves overall code coverage from 22% to 23%
This improves code coverage for that module from 0% to 6%

This change has no impact on runtime.

Test Plan:
  PASS: tox -e coverage

Story: 2010531
Task: 47551
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I24a532739e3bd62e88aab96cf6aaa3f607b5d198
2023-03-01 14:10:25 +00:00
Some Body 3b002163e1 Unsuppress pylint E1101 no-member
E1101 no-member was globally suppressed in pylint.rc.
Remove the suppression and instead suppress on a per-line basis.
This protects future submissions from allowing this type of error.

The per-line suppressions may be cleaned up in the future when the
classes are refactored into their own files.

These changes have no runtime impact.  They are comments only.

Test Plan:
  Pass: tox -e pylint

Story: 2010531
Task: 47534
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I07895bc5e9d6fb11d62a8da50e8b02835493aa76
2023-02-25 21:07:08 +00:00
Zuul 7c0853eb4a Merge "Unsuppress pylint E1111 assignment-from-no-return" 2023-02-17 20:42:49 +00:00
Joao Victor Portal 244b29636c Change pod rollout timeout for K8s root CA change
The Puppet timeout for pod rollout in stages "trustbothcas" and
"trustnewca" was recently changed from 600s to 3600s. In this commit,
the timeouts for these stages in the kubernetes root CA update strategy
are also updated.

Test Plan:

PASS: In an AIO-SX, execute the Kubernetes Root CA update through
sw-manager strategy and check that it is completed successfully without
reaching any timeout in stages "trustbothcas" and "trustnewca".

PASS: In an AIO-SX, artificially make the pod rollout script hang for 15
minutes and check that the stages "trustbothcas" and "trustnewca" are
still completed successfully.

Closes-Bug: 2004594
Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com>
Change-Id: I96d04de95e424e15bd79f049be644909bb0dcff7
2023-02-15 20:31:11 -03:00
Al Bailey 94321e9d57 Debian: python3 fix for OpenStackRestAPIExceptions
When the NFV uses tasks and futures and coroutines to
interact with openstack APIs, an OpenStackRestAPIException
can be returned as a task result.

The exception needs to be 'pickled' when sent across the
queue/socket for the 'simulated' asyncio workflow.

However, the pickle code for that exception was broken in
python3. It was relying on a python2 'message' attribute
of the base Exception class to exist, which no longer
exists (in python3)

This was causing the pickle command to quietly fail and
the code waiting for the task result would timeout and
not report back the failure information.

The fix is to ensure that there is a 'message' property
on that exception type.

Unit tests have been added for all the pickleable
exceptions, to ensure their '__reduce__' and other
interactions with 'pickle' are not reporting any failures.
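
A minimal sketch of the kind of fix described above. The class below is
a hypothetical stand-in, not the real OpenStackRestAPIException, and its
constructor arguments are assumptions. It shows a 'message' property
that does not rely on the python2-only attribute, plus a '__reduce__'
that tells pickle how to rebuild the exception.

```python
import pickle


class RestAPIError(Exception):
    """Hypothetical stand-in for an exception sent across a queue."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self._message = message
        self.status_code = status_code

    @property
    def message(self):
        # Python 3 removed BaseException.message, so expose it explicitly.
        return self._message

    def __reduce__(self):
        # Tell pickle which callable and arguments recreate this object.
        return (RestAPIError, (self.message, self.status_code))


if __name__ == "__main__":
    # Round trip through pickle, as a task result crossing a queue would.
    original = RestAPIError("upgrade failed", 500)
    restored = pickle.loads(pickle.dumps(original))
    print(restored.message, restored.status_code)
```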

Test Plan:
 PASS: create and apply a kube-upgrade-strategy for an
 older version of kubernetes and observe it reports its
failure error (rather than a timeout)

Closes-Bug: #2007285
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I3a8776163a78330810ae1097ddd1831b1b26a212
2023-02-14 15:21:13 +00:00
Zuul 7753ea0dc7 Merge "Add support for SQLAlchemy 1.4 to NFV" 2023-02-13 16:51:51 +00:00
Zuul 8e5be6762c Merge "Update debian package versions to use git commits" 2023-02-13 16:48:44 +00:00
Al Bailey d42b8694d1 Unsuppress pylint E1111 assignment-from-no-return
E1111 assignment-from-no-return had been suppressed in pylint.rc
and is no longer suppressed.
This allows future code changes to be validated against this check.

- the kubernetes client was suppressing the incorrect code as part of a
workaround to support 'None' names.
- A unit test was incorrectly assigning the result of a method call and
has been updated.

These changes have no runtime impact.

Test Plan:
  Pass: tox -e pylint

Story: 2010531
Task: 47344
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I4c2a5ce60c0ceef4328af8ccedcd1e71b1d42330
2023-02-11 11:45:31 -05:00
Zuul 319d78f0dc Merge "Host compute service failure alarm removal" 2023-02-11 13:54:13 +00:00
Vanathi.Selvaraju 65bbbe1f0d Host compute service failure alarm removal
Remove the stale alarm 270.001 (Host compute service failure)
raised by the VIM. This might be an old reference to nova.
It’s likely not in use since stx.

Test Plan:
PASS: Verify with a load without the changes (removal of alarm)
and the event log in platform.log shows an entry for 270.001 alarm.
PASS: Verify with a load with changes of alarm removal and
the event log in platform.log does not show an entry for 270.001 alarm.

Depends-On: https://review.opendev.org/c/starlingx/fault/+/872603

Closes-Bug: 2004744

Change-Id: Icafb079fc2b58fb4126ac325804901ebd3f8f66e
Signed-off-by: Vanathi.Selvaraju <vanathi.selvaraju@windriver.com>
2023-02-10 09:27:46 -05:00
Al Bailey 611cb9317c Update debian package versions to use git commits
The Debian packaging has been changed to reflect all the
git commits under the directory, and not just the commits
to the metadata folder.

This ensures that any new code submissions under those
directories will increment the versions.

Note:
 nfv uses GITREVCOUNT because its src_path is 'None'

Test Plan:
 PASS: build-iso and unlock AIO-SX
 PASS: build-pkgs -p nfv
 PASS: build-pkgs -p nova-api-proxy
 PASS: commit a file change under nfv, and rebuild and
     see that the nfv version has increased.
 PASS: commit a file change one directory before nfv, and
    rebuild and see that the nfv version has not increased.

Story: 2010550
Task: 47222

Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I06804af6e356174b608b18219b1f1c8176c99375
2023-02-09 17:00:49 +00:00
Luan Nunes Utimura 1e475dca0c Debian: Fix nova actions
Since the platform migration to Debian, it was observed that the
following Nova actions stopped working:
  - pause;
  - unpause;
  - suspend;
  - resume;
  - live-migration.

The reason behind that is that some packages related to Nova, which have
already been migrated to Debian, still have some incompatibilities with
Python 3. Consequently, whenever these Nova actions were executed, some
exceptions occurred on the nova-api-proxy and NFV side, preventing them
from working.

Therefore, this change aims to improve this compatibility.

Most of the changes were necessary due to the fact that in Python 3
there is more of a distinction between `bytes` and `str`, whereas in
Python 2 `bytes` is just an alias for `str`.
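
A short sketch of the `bytes` versus `str` distinction mentioned above.
The helper name is hypothetical and is not taken from the
nova-api-proxy or NFV code; it shows one common way to normalise a
value that may arrive as either type.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# In Python 3 these are different types and never compare equal,
# whereas in Python 2 both literals were plain `str`.
print(b"pause" == "pause")           # False
print(to_text(b"pause") == "pause")  # True
```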

Test Plan (on AIO-DX):
PASS - Successfully perform a VM pause, unpause, suspend, resume.
PASS - Successfully perform a VM live-migration.

Closes-Bug: 2003813

Signed-off-by: Luan Nunes Utimura <LuanNunes.Utimura@windriver.com>
Change-Id: I918fe6e3deaa68630c797449649012e9fbf16fe4
2023-02-03 08:53:52 -03:00
Al Bailey c4890e651b Suppress new pylint 2.16 errors in nfv
pylint 2.16.0 was released on 2023-02-01.
One of the new checks is causing stx/nfv to fail tox pylint.

From the 2.16.0 release notes:
Rename broad-except to broad-exception-caught and add new
checker broad-exception-raised which will warn if general
exceptions BaseException or Exception are raised.

The fix is to suppress the new error code, and possibly
fix it in a future task.

Story: 2010531
Task: 47254

Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I3e353809943ffa66660bf39b99ecaf90f5f3fddc
2023-02-01 21:20:15 +00:00
Al Bailey b8ff450f58 Add support for SQLAlchemy 1.4 to NFV
An internal variable was dropped in SQLAlchemy 1.4.

This update attempts to use the newer syntax from 1.4
and falls back to the older syntax if the newer mechanism
cannot be used.

This will allow StarlingX to migrate to a newer version of
SQLAlchemy without breaking the NFV database code.

Test Plan:
  PASS: verify coverage using SQLAlchemy 1.2 that the
 exception code is executed and unit tests pass.
  PASS: verify coverage using SQLAlchemy 1.4 that the
 exception code is not executed and unit tests pass.
  PASS: Build/Install/Bootstrap/Unlock AIO-SX to verify
 existing runtime behaviour is not impacted.

Story: 2010531
Task: 47237

Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I4063dac0b3229b4c1fdb6c5121154665ffc32903
2023-01-30 21:08:56 +00:00
Zuul 2debfbc72a Merge "Replace SafeConfigParser with ConfigParser" 2023-01-30 17:43:15 +00:00
Zuul 824804422c Merge "Cleanup pylint.rc file" 2023-01-30 17:35:22 +00:00
Zuul 5bf5c96c7d Merge "Port stx-nova-api-proxy image to stx-debian" 2023-01-26 17:56:04 +00:00
Luan Nunes Utimura 6a552b6528 Port stx-nova-api-proxy image to stx-debian
This change enables building the stx-nova-api-proxy image within the
Debian build framework. It is now based on stx-debian and following the
new convention for StarlingX images.

Test Plan:
PASS: Build stx-nova-api-proxy image
PASS: Manually upload built image to a system, use helm-override to
      change the nova-api-proxy container image and apply stx-openstack
PASS: Ensure the nova-api-proxy pod successfully starts and is running
PASS: Ensure nova-api-proxy pod liveness and readiness probes are
      healthy

Depends-On: https://review.opendev.org/c/starlingx/root/+/871314

Story: 2010072
Task: 47217

Signed-off-by: Luan Nunes Utimura <LuanNunes.Utimura@windriver.com>
Change-Id: I80e4a046e6ae89b1dae2f583fb14723713194e45
2023-01-26 07:39:17 -03:00