Commit Graph

5324 Commits

Author SHA1 Message Date
Zuul db68afcb23 Merge "Create unit tests for the metadata validation logic" 2024-04-18 20:42:33 +00:00
Zuul 37beadd020 Merge "Introduce multi-version auto downgrade for apps" 2024-04-18 17:53:43 +00:00
Zuul 4f66b01ad8 Merge "Dual-stack: ceph matches address name and family" 2024-04-18 15:41:36 +00:00
Zuul 4d02bf979a Merge "i_host capabilities column cleaning" 2024-04-18 12:39:40 +00:00
Eduardo Juliano Alberti f5c6e2130c i_host capabilities column cleaning
This change adds a new activate script to clean old information
from the capabilities column of the i_host table.

To allow the usage of the Kubernetes Power Manager StarlingX app, the
minimum and maximum allowed frequencies and the available cstates
were stored in the capabilities column for each node (i_host
table).

Due to a new approach, information is stored in specific columns
for each parameter. To avoid residual information in the capabilities
column, during the information migration process, the script removes
such parameters from the column.
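
A minimal sketch of the cleanup described above, assuming the
capabilities column is available as a Python dict. The key names in
STALE_KEYS and the helper names are hypothetical; the commit message
does not list the actual ones.

```python
# Hypothetical key names standing in for the frequency and cstates
# parameters that were migrated to their own columns.
STALE_KEYS = ("min_cpu_mhz_allowed", "max_cpu_mhz_allowed", "cstates_available")


def clean_capabilities(capabilities, stale_keys=STALE_KEYS):
    """Return a copy of the capabilities dict without the migrated keys."""
    return {k: v for k, v in capabilities.items() if k not in stale_keys}


def should_run(action):
    """Act only on the 'activate' step; every other action is skipped."""
    return action == "activate"
```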

Test Plan:
PASS: capabilities column cleaned (target information was removed)
PASS: script successfully skipped in other action steps

Story: 2011069
Task: 49822

Change-Id: Ifc955ecdb0c2a5b47c0dadbda083811fd2456b05
Signed-off-by: Eduardo Juliano Alberti <eduardo.alberti@windriver.com>
2024-04-18 08:37:40 -03:00
Zuul b6682707ae Merge "Implement new certificate APIs" 2024-04-17 21:55:09 +00:00
Zuul 5d18fe7b75 Merge "Add new sysinv unauthenticated region_id api" 2024-04-17 21:55:03 +00:00
rummadis d7d886c3e1 Add new sysinv unauthenticated region_id api
This update adds a new sysinv unauthenticated 'region_id' API that
returns region_name

curl  http://pxecontroller:6385/v1/isystems/region_id

System inventory will return a dictionary containing the
region_name

{
"region_name": "<system.region_name>"
}
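
A hedged client-side sketch of consuming this response, using only the
Python standard library. The URL is the one from the curl example
above; the function names and the timeout value are assumptions made
for illustration.

```python
import json
from urllib.request import urlopen

# URL taken from the curl example in this commit message.
REGION_ID_URL = "http://pxecontroller:6385/v1/isystems/region_id"


def parse_region_name(body):
    """Extract region_name from the JSON body returned by the endpoint."""
    return json.loads(body)["region_name"]


def fetch_region_name(url=REGION_ID_URL, timeout=10):
    """Query the unauthenticated endpoint and return the region name."""
    with urlopen(url, timeout=timeout) as response:
        return parse_region_name(response.read().decode("utf-8"))
```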

Test Plan:

PASS: Verify success path handling
PASS: Verify multiple dictionary keys

Story: 2011100
Task: 49864

Change-Id: Iaeb77fdd90e5eb06cf9fc9d7da994dd22bfbee14
Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>
2024-04-17 14:21:28 -04:00
amantri cca5becb65 Implement new certificate APIs
Add an API /v1/certificate/get_all_certs to retrieve all the
platform certs(oidc, wra, adminep, etcd,
service account certs, system-restapi-gui-certificate,
open-ldap, openstack, system-registry-local-certificate,
k8s certs) in JSON response and use this response to format
the "system certificate-list" output as "show-certs.sh" output.

Add an API /v1/certificate/get_all_k8s_certs to retrieve all the
tls,opaque certs in JSON response and use this response to
format the "system k8s-certificate-list" output as
"show-certs.sh -k" output

Implement "system certificate-show <cert name>",
"system k8s-certificate-show <cert name>" to show the full
details of the certificate.

Implement filters in the API and CLI to show expired and soon-to-expire
certificates
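
The filter semantics can be sketched as follows. This is an
illustration only, not the sysinv implementation: the function names
are hypothetical, and it assumes "soon to expire" means still valid
but expiring within N days.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expiry, now):
    """A certificate is expired once its expiry date has passed."""
    return expiry <= now


def expires_within(expiry, now, days):
    """True if the certificate is still valid but expires within `days`."""
    return now < expiry <= now + timedelta(days=days)


def filter_certs(certs, now, expired=False, soon_to_expiry=None):
    """Filter a {name: expiry datetime} mapping like the new list options."""
    if expired:
        return {n: e for n, e in certs.items() if is_expired(e, now)}
    if soon_to_expiry is not None:
        return {n: e for n, e in certs.items()
                if expires_within(e, now, soon_to_expiry)}
    return dict(certs)
```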

Testcases:
PASS: Verify all the cert values(Residual Time,Issue  Date, Expiry Date
      ,Issuer,Subject,filename,Renewal) are showing fine for all the
      following cert paths when "system certificate-list" is executed
	  /etc/kubernetes/pki/apiserver-etcd-client.crt
	  /etc/kubernetes/pki/apiserver-kubelet-client.crt
	  /etc/pki/ca-trust/source/anchors/dc-adminep-root-ca.crt
	  /etc/ssl/private/admin-ep-cert.pem
	  /etc/etcd/etcd-client.crt
	  /etc/etcd/etcd-server.crt
	  /etc/kubernetes/pki/front-proxy-ca.crt
	  /etc/kubernetes/pki/front-proxy-client.crt
	  /var/lib/kubelet/pki/kubelet-client-current.pem
	  /etc/kubernetes/pki/ca.crt
	  /etc/ldap/certs/openldap-cert.crt
	  /etc/ssl/private/registry-cert.crt
	  /etc/ssl/private/server-cert.pem
PASS: Verify all the cert values(Residual Time,Issue Date, Expiry Date
      ,Issuer,Subject,filename,Renewal) are showing fine for all the
       service accts when "system certificate-list" is executed
          /etc/kubernetes/scheduler.conf
          /etc/kubernetes/admin.conf
	  /etc/kubernetes/controller-manager.conf
PASS: Verify the system-local-ca secret is shown in the output of
      "system certificate-list"
PASS: List ns,secret name in the output of ssl,docker certs if the
      system-restapi-gui-certificate, system-registry-local-certificate
      exist on the system when "system certificate-list" executed
PASS: Apply oidc app verify that in "system certificate-list" output
      "oidc-auth-apps-certificate", oidc ca issuer and wad cert are
      shown with all proper values
PASS: Deploy WRA app verify that "mon-elastic-services-ca-crt",
      "mon-elastic-services-extca-crt" secrets are showing in the
      "system certificate-list" output and also kibana,
      elastic-services cert from mon-elastic-services-secrets secret
PASS: Verify all the cert values(Residual Time,Issue Date, Expiry Date
      ,Issuer,Subject,filename,Renewal) are showing fine for all the
      Opaque,tls type secrets when "system k8s-certificate-list" is
      executed
PASS: Execute "system certificate-show <cert name>" for each
      cert in the "system certificate-list" output and
      check all details of it
PASS: Execute "system certificate-list --expired" shows the
      certificates which are expired
PASS: Execute "system certificate-list --soon_to_expiry <N>"
      shows the expiring certificates within the specified
      N days
PASS: Execute "system k8s-certificate-list --expired" shows the
      certificates which are expired
PASS: Execute "system k8s-certificate-list --soon_to_expiry <N>"
      shows the expiring certificates within the specified
      N days
PASS: On DC system verify that admin endpoint certificates are
      shown with all values when "system certificate-list" is
      executed
PASS: Verify the following apis
	/v1/certificate/get_all_certs
        /v1/certificate/get_all_k8s_certs
        /v1/certificate/get_all_certs?soon_to_expiry=<no of days>
        /v1/certificate/get_all_k8s_certs?soon_to_expiry=<no of days>
        /v1/certificate/get_all_certs?expired=True
        /v1/certificate/get_all_k8s_certs?expired=True

Story: 2010848
Task: 48730
Task: 48785
Task: 48786

Change-Id: Ia281fe1610348596ccc1e3fad7816fe577c836d1
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
2024-04-17 14:18:21 -04:00
Zuul 8d6d0605c3 Merge "Deprecate,add new system certificate commands" 2024-04-17 17:42:50 +00:00
Zuul a4ab746619 Merge "Update network interface puppet resource gen to support dual-stack" 2024-04-17 15:29:33 +00:00
amantri 732437f3cd Deprecate,add new system certificate commands
Deprecate the existing system certificate commands and
add new commands

Testcases:
PASS: Bootstrap the system with changes and verify that system is
      installed successfully
PASS: Run update_platform_certificates and verify
      it is successful
PASS: Verify the following commands are not working anymore
	system modify --https_enabled=True
	system modify --https_enabled=False
PASS: Verify the following new ca commands
	system ca-certificate-install <pemfile>
	system ca-certificate-list
	system ca-certificate-show <UUID>
	system ca-certificate-uninstall
PASS: Verify new openstack commands are working
	system os-certificate-install --mode < server | ca >  <pemfile>
	system os-certificate-list
	system os-certificate-show <UUID>
PASS: Verify the following are not working anymore
	system certificate-install -m ssl <pemfile>
	system certificate-install -m openstack <pemfile>
        system certificate-install -m openstack_ca <pemfile>
	system certificate-install -m ssl_ca <pemfile>
	system certificate-install -m docker_registry <pemfile>
	system certificate-uninstall -m ssl_ca <pemfile>

Story: 2010848
Task: 48474

Change-Id: Ic5d4f3c60b196f5be0602502dcd8a3af50cc8e62
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
2024-04-17 15:20:37 +00:00
Zuul 4604cbf410 Merge "Send the correct mgmt-IP to mtce" 2024-04-17 14:00:47 +00:00
Zuul 5f4e3a3378 Merge "Adding QAT devices support in sysinv" 2024-04-17 13:49:58 +00:00
Zuul 4c510b0cad Merge "Update dnsmasq.hosts on every dhcp lease event and process startup" 2024-04-17 13:18:10 +00:00
Andre Kantek 4459b82f32 Dual-stack: ceph matches address name and family
This change splits the IP service for each platform network into ipv4
and ipv6 to support dual-stack. It still supports single-stack (when
there is only ipv4 or ipv6). Each service is instantiated if there is
a configuration for it.

Ceph was not taking into account the address family to generate the
list of IPs using the primary pool. This led to incorrect puppet
variable content.
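
A minimal sketch of matching addresses by family, using the standard
ipaddress module. The function names are hypothetical and the real
sysinv code paths are not shown here.

```python
import ipaddress


def addresses_for_family(addresses, family):
    """Keep only the addresses whose IP version matches family (4 or 6)."""
    return [a for a in addresses if ipaddress.ip_address(a).version == family]


def primary_pool_addresses(addresses, primary_pool_network):
    """Select the addresses in the same family as the primary pool subnet."""
    family = ipaddress.ip_network(primary_pool_network).version
    return addresses_for_family(addresses, family)
```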

Test Plan:
[PASS] install, lock, unlock and swact for the following setups
       - AIO-SX (IPv4 and IPv6)
       - AIO-DX (IPv4 and IPv6)
       - Standard (IPv4 and IPv6)
       - DC (SysCtrl=AIO-DX, subcloud=AIO-SX)
[PASS] Add dual-stack configuration and validate services operation
       with lock, unlock and swact:
       - AIO-SX (IPv4 and IPv6)
       - AIO-DX (IPv4 and IPv6)
       - Standard (IPv4 and IPv6)
       - DC (SysCtrl=AIO-DX, subcloud=AIO-SX), using the admin network

Story: 2011027
Task: 49763

Change-Id: Icda298c51cdd2535146b1e11669f1c6f64c232b7
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-04-17 07:24:16 -03:00
Zuul e0a9c407ac Merge "Apply Helm Overrides to initially disabled charts." 2024-04-16 20:31:46 +00:00
Lucas Ratusznei Fonseca ff3a5d2341 Update network interface puppet resource gen to support dual-stack
This change updates the puppet resource generation logic for network
interfaces to support dual-stack.

Change summary
==============

- Aliases / labels
    Previously, each alias was associated to a specific network. Now,
    since more than one address can be associated to the same network,
    the aliases are also associated to addresses. The label name is
    now :<network_id>-<address_id>. The network_id is 0 if there's no
    network associated with the alias, that's the case for the base
    interface config or for the cases where the address is not
    associated to a network. The address_id is 0 if there's no address
    associated with the alias, which is the case for the base config
    and for when there's no static address associated to the network,
    i.e. the method is DHCP.

- Static addresses
    Previously, interfaces with more than one static address not
    associated with pools would be assigned just the first one. Now,
    an alias config is generated for each address.

- CentOS compatibility
    All the code related to CentOS was removed.

- Duplex-direct mode
    Duplex-direct systems must have DAD disabled for management and
    cluster-host interfaces. The disable DAD command is now generated
    only in the base interface config for all types of interfaces.

- Address pool names
    The change assumes a new standard for address pool names, they will
    be formed by the old names with the suffixes '-ipv4' or '-ipv6'.
    For example: management-ipv4, management-ipv6. Since other systems
    that rely on the previous standard are not yet upgraded to
    dual-stack, the constant DUAL_STACK_COMPATIBILITY_MODE was
    introduced to control resource generation and validation logic in a
    way that assures compatibility. The constant and the conditionals
    will be removed once the other modules are updated. The
    conditionals were implemented more as a way to highlight which
    parts of the code are affected and make the changes easier in the
    future.

- Tests / DB Base
    The base class for tests was updated to generate more consistent
    database states. Mixins for dual-stack cases were also created.

- Tests / Interface
    Most of the test functions in the class InterfaceTestCase caused
    unnecessary updates to the database and the context. The class
    was split in two, the first one containing the tests that only
    need the basic database setup (controller, one interface
    associated with the mgmt network), and the other one for the tests
    that need different setups.
    A new fixture was created to test multiple system configs (IPv4,
    IPv6, dual-stack), which inspects in detail the generated
    hieradata. The tests associated with the InterfaceHostV6TestCase
    were moved to the new fixture, and new ones were introduced.
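
The alias label and address pool naming rules from the change summary
can be sketched as below. The helper names are hypothetical and this
is not the actual puppet resource generation code.

```python
def alias_label(network_id=None, address_id=None):
    """Build the ':<network_id>-<address_id>' label.

    A missing network or address (base config, DHCP, or an address not
    associated with a network) is represented by 0.
    """
    return ":{}-{}".format(network_id or 0, address_id or 0)


def pool_name(base_name, family):
    """Append the '-ipv4' or '-ipv6' suffix to an old-style pool name."""
    if family not in (4, 6):
        raise ValueError("family must be 4 or 6")
    return "{}-ipv{}".format(base_name, family)
```

For example, alias_label() gives ':0-0' for the base interface config,
and pool_name('management', 6) gives 'management-ipv6'.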

Test plan
=========

Online setup tests
------------------

System: STANDARD (2 Controllers, 2 Storages, 1 Worker)

Stack setups:
  - Single stack IPv4
  - Single stack IPv6
  - Dual stack, primary IPv4
  - Dual stack, primary IPv6

[PASS] TC1 - Online setup, regular ethernet
    mgmt0 (Ethernet) -> PXEBOOT, MGMT, CLUSTER_HOST

[PASS] TC2 - Online setup, VLAN over ethernet
    pxe0 (Ethernet) -> PXEBOOT
    mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST

[PASS] TC3 - Online setup, bonding
    mgmt0 (Bond) -> PXEBOOT, MGMT, CLUSTER_HOST

[PASS] TC4 - Online setup, VLAN over bonding
    pxe0 (Bond) -> PXEBOOT
    mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST

Installation tests
------------------

Systems:
  - AIO-SX
  - AIO-DX
  - Standard (2 Controllers, 2 Storages, 1 Worker)

[PASS] TC5 - Regular installation on VirtualBox, IPv4

[PASS] TC6 - Regular installation on VirtualBox, IPv6

Data interface tests
--------------------

System: AIO-DX

Setup:
    data0 -> Ethernet, ipv4_mode=static, ipv6_mode=static
    data1 -> VLAN on top of data0, ipv4_mode=static, ipv6_mode=static

For both interfaces, the following was performed:

[PASS] TC7 - Add static IPv4 address
[PASS] TC8 - Add static IPv6 address
[PASS] TC9 - Add IPv4 route
[PASS] TC10 - Add IPv6 route
[PASS] TC11 - Remove IPv4 route
[PASS] TC12 - Remove IPv6 route
[PASS] TC13 - Remove static IPv4 address
[PASS] TC14 - Remove static IPv6 address

Story: 2011027
Task: 49815
Change-Id: Ib9603cbd444b21aefbcd417780a12c079f3d0b0f
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2024-04-16 16:23:15 -03:00
Fabiano Correa Mercer 4919bf7213 Send the correct mgmt-IP to mtce
After the management reconfiguration, it was not possible to apply a reboot-required
patch because the sysinv was sending the old mgmt IP address to the mtce.
Consequently, mtce wasn't creating the required file (/var/run/.node_locked) during
the host-lock command.
This file is essential for the sw-patch tool to proceed with the installation.

Additionally, the management network reconfiguration runtime manifest can be executed
prematurely if the MGMT_NETWORK_RECONFIGURATION_ONGOING flag is used.
However, users might introduce other changes that could unintentionally trigger the
runtime manifests before the host-unlock command.
This could lead to unexpected keystone changes, potentially causing CLI blockage or
system reboots.

The MGMT_NETWORK_RECONFIGURATION_ONGOING flag is created when initiating management
network reconfiguration commands and is intended to avoid updates to the dnsmasq
files until system reboot.
Changed to MGMT_NETWORK_RECONFIGURATION_UNLOCK because this flag is intended to
guarantee keystone changes only occur during the unlock command.

Tests done:
IPv4 AIO-SX fresh install
IPv4 AIO-DX with mgmt in vlan fresh install
IPv4 DC with subcloud AIO-SX
IPv4 AIO-SX mgmt reconfig and apply a reboot-required patch
IPv4 subcloud AIO-SX mgmt reconfig and apply a reboot-required patch

Partial-Bug: #2060066
Story: 2010722
Task: 49810

Change-Id: I138d8e31edd60a41a4595cfb8bd2dc478bc01013
2024-04-16 18:00:26 +00:00
Zuul 78b3fe851f Merge "Create a set_users_options method in openstack endpoint config" 2024-04-15 18:32:23 +00:00
Eric MacDonald 2a2733e7b3 Update dnsmasq.hosts on every dhcp lease event and process startup
A call to _generate_dnsmasq_hosts_file is added to sysinv
conductor handle_dhcp_lease.

There are currently a number of calls to _generate_dnsmasq_hosts_file
as shown in this list.

  _create_or_update_address
  _allocate_addresses_for_host
  _unallocate_addresses_for_host
  _remove_addresses_for_host
   mgmt_ip_set_by_ihost
   reserve_ip_for_third_monitor_node
   reserve_ip_for_cinder
  _init_controller_for_upgrade

Most of these are for specific system configuration changes.
However, the introduction of the pxeboot network repurposed
the use and content of the dnsmasq.hosts file.

Given that the dnsmasq hosts file no longer maintains management
network addresses and hostname associations, it is questionable whether
the call to _generate_dnsmasq_hosts_file is still needed for them at
all. That could be discussed or commented on in this review.

Testing system responsiveness to dhcp lease changes revealed
that the dnsmasq.hosts file would only get updated when one of
the aforementioned procedures is called. In many cases it took
a long time after the leases file was updated before the change
was realized by the system through a dnsmasq.hosts update.

Adding this call to every dhcp lease update makes sense in that
it makes the responsiveness of dhcp lease changes immediate.

Also, there is the possibility that a lease event is missed if it
occurs while the conductor is not running. This may occur over a
swact, patch or simply process restart by puppet.

This update addresses this gap by adding a one-time call to
_generate_dnsmasq_hosts_file in the _controller_config_active_apply
audit so that the dnsmasq hosts and addn_hosts files get updated
over a conductor process restart.
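
The call pattern described above can be sketched with a toy class.
Only the three method names come from this commit message; the class,
the flag attribute and the call counter are hypothetical, and the real
conductor does far more.

```python
class ConductorSketch:
    """Toy stand-in for the conductor; only the call pattern is modelled."""

    def __init__(self):
        self.generated = 0  # counts hosts file regenerations
        self._hosts_synced_since_start = False  # hypothetical flag

    def _generate_dnsmasq_hosts_file(self):
        # The real method rewrites the dnsmasq hosts files; here we count.
        self.generated += 1

    def handle_dhcp_lease(self, lease):
        # Existing lease handling would go here, followed by the new call.
        self._generate_dnsmasq_hosts_file()

    def _controller_config_active_apply(self):
        # One-time regeneration after a process (re)start, covering lease
        # events missed while the conductor was not running.
        if not self._hosts_synced_since_start:
            self._generate_dnsmasq_hosts_file()
            self._hosts_synced_since_start = True
```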

Test Plan:

PASS: Build, test and install AIO DX Plus system.
PASS: Verify the dnsmasq.hosts and dnsmasq.addn_hosts files get
      updated immediately following a dnsmasq dhcp lease event
      and conductor process restart.
PASS: Verify the dnsmasq.hosts and dnsmasq.addn_hosts files get
      updated over a swact.

Story: 2010940
Task: 49788
Change-Id: I8d55b865f682bd5e0e210481a9dff318baab436b
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-04-15 17:46:55 +00:00
Igor Soares 1d228bab28 Introduce multi-version auto downgrade for apps
Introduce automatic downgrade of StarlingX applications to the
multiple application version feature.

Auto downgrades are triggered by default in scenarios in which the applied
application bundle is not available anymore under the applications
folder but an older version of the same app is. For instance, when
platform patches are removed and a previously available ostree is
deployed, thus restoring the old set of available apps under the
/usr/local/share/applications/helm/ directory.

A new section called 'downgrades' can be added to the metadata.yaml file
to disable the default behavior. For example:

downgrades:
  auto_downgrade: false

When auto downgrades are disabled the current applied version remains
unchanged.
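
A sketch of how the option could be evaluated, assuming metadata.yaml
has already been parsed into a dict. The function and argument names
are hypothetical; only the 'downgrades' / 'auto_downgrade' keys and
the enabled-by-default behavior come from the description above.

```python
def auto_downgrade_enabled(metadata):
    """Return the auto_downgrade setting, defaulting to True when absent."""
    downgrades = metadata.get("downgrades") or {}
    return bool(downgrades.get("auto_downgrade", True))


def should_downgrade(metadata, applied_bundle_available, older_version_available):
    """Downgrade only if the applied bundle is gone and an older one exists."""
    return (not applied_bundle_available
            and older_version_available
            and auto_downgrade_enabled(metadata))
```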

Test plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX fresh install.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that does not
      contain the downgrade section.
      Confirm that platform-integ-apps is downgraded.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that has the
      auto_downgrade metadata option set to 'true'.
      Confirm that platform-integ-apps is downgraded.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that has the
      auto_downgrade metadata option set to 'false'.
      Confirm that the originally applied platform-integ-apps version
      remains unchanged.
PASS: Run a kubernetes upgrade with apps to be pre and post updated.
      Confirm that apps are successfully updated and not downgraded
      after the Kubernetes upgrade has finished.

Story: 2010929
Task: 49847

Change-Id: I33f0e0a5b8db128aef76fb93ba322364881097cf
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-04-15 12:56:05 -03:00
Zuul 2b50ac3e06 Merge "Add pxeboot network hostname resolution for controllers" 2024-04-15 14:52:38 +00:00
Joshua Reed 967eedadb7 Apply Helm Overrides to initially disabled charts.
The previous implementation of the _get_list_of_charts
method would not take into account whether or not a
particular application chart was enabled or disabled.

This change now only includes charts that are enabled, or
if the function caller asks for all of them with the
include_disabled override set.  The override is set as a
part of the perform_app_upload routine to ensure overrides
are generated and applied to all charts, including those
which are initially disabled.
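
The filtering rule can be sketched as follows. The real
_get_list_of_charts works on application objects and kustomization
files; this simplified stand-in takes (name, enabled) pairs, and only
the include_disabled flag name comes from the description above.

```python
def get_list_of_charts(charts, include_disabled=False):
    """Return chart names, skipping disabled ones unless asked for all.

    charts is a list of (name, enabled) pairs, an illustrative shape only.
    """
    return [name for name, enabled in charts if enabled or include_disabled]
```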

This change also seeks to handle issues where the
kustomize-orig.yaml file is not created by the time the
perform_app_upload routine runs _get_list_of_charts by
including an extra check.

Finally, the override generation in the perform_app_apply
function is moved to happen first in the sequence of events
such that the app object is populated with overrides prior
to any other operations occurring.  This must be done to
ensure the correct chart list is used.

This fix ensures that:

1. When all charts are needed, an option can be specified
   (i.e. when determining all the container images needed for
   the application). This is done with the include_disabled flag.
2. All possible charts, as filtered by the metadata/user
   driven and DB stored enabled status, are consistently
   returned regardless of the current state of the top-level
   application kustomization.yaml.
3. A final check for kustomization-orig.yaml is performed and
   the file is created, if missing, before
   _get_list_of_charts executes with include_disabled=True

Test Plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX full install with clean bootstrap
PASS: Enable the cms-replication chart on the dell-storage app
PASS: Use system helm-override-update to pass
      --set config.clusterID=ClusterA
PASS: system application-apply dell-storage
PASS: Check that the YAML structure of the
      configmap/dell-replication-controller-config
      is set for ClusterA and properly formatted.
PASS: Additional check to ensure that stx-openstack application
      successfully uploads and applies.
PASS: Check that helm overrides are generated even for an
      application that doesn't have a kustomize operator.  This
      was done for the metrics-server app.  A helm override was
      created and the subsequent metrics-server.yaml file in
      /opt/platform/helm contained the override after the
      system application-apply command was run.

Relates to previous attempt at a fix:
https://review.opendev.org/c/starlingx/config/+/890570

Closes-Bug: 2029303

Change-Id: I4c501b982e4061e5067ca0e8e43f37a9eecfcb68
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2024-04-12 13:12:28 -06:00
Erickson Silva de Oliveira 0c852b54fb Fix condition for deleting database partition
In change [1], a restore in progress check was added, however
the flag used for this is removed at the end of the restore
playbook that is executed on controller-0, causing possible
problems if agents on other nodes send a report with incomplete
information due to restore.

To resolve this, the _verify_restore_in_progress() function
was used, which queries the restore status in the database,
and is only modified when executing the "system restore-complete"
command. This way we will know that from then on the agent's
reports can be considered.

Additionally, the “system host-reinstall” command has also
been observed to cause similar issues if run on a restored system.

To prevent this from happening, another condition was added,
which checks if inv_state is "reinstalling".
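
The resulting guard can be sketched as below. The function name and
its boolean argument are hypothetical; in sysinv the restore status
comes from _verify_restore_in_progress() and inv_state from the host
record.

```python
def may_delete_partition(restore_in_progress, inv_state):
    """Decide whether an agent report may trigger DB partition deletion."""
    if restore_in_progress:
        # Reports may be incomplete until "system restore-complete" is run.
        return False
    if inv_state == "reinstalling":
        # A host being reinstalled can also send incomplete information.
        return False
    return True
```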

[1]: https://review.opendev.org/c/starlingx/config/+/899510

Test-Plan:
  PASS: AIO-SX fresh install
  PASS: Standard fresh install
  PASS: create/modify/delete a partition in the
        controller-0/controller-1/compute-0 followed by a reboot and check the status
        with 'system host-disk-partition-list'.
  PASS: Restart of sysinv-conductor and/or sysinv-agent services
        during puppet manifest applying.
  PASS: AIO-SX Backup and Restore
  PASS: Standard Backup and Restore

Closes-Bug: 2061170

Change-Id: I6c142439c9f13dcdeb493892a5a9283f6a1e2d00
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2024-04-12 14:31:49 -03:00
David Bastos fa5855845c Create unit tests for the metadata validation logic
These changes organize and add more unit tests to
the Validate_metadata_file function.

Improvements implemented:
  - A separate test file was created to be exclusive
    to the app_metadata.py file.
  - Data input was organized in external files.
  - Unit tests added individually to each key in the
    yaml file.

Test plan:
PASS: Run tox py39, pylint and verify that they are
      all passing.
PASS: The coverage reported by 'tox -e cover' improved
      from 13% to 58%. Within the same file there are
      other functions to be tested that are outside the
      scope of this change.

Story: 2010929
Task: 49834

Change-Id: If4bdb734990582f302b1e0d20179e02c524de546
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
2024-04-12 11:47:30 -03:00
Zuul a1aa5b93fb Merge "Add Intermediate CA support to IPsec configuration" 2024-04-12 14:39:31 +00:00
Zuul 019eeb5016 Merge "Fix IPsec certificates renewal script" 2024-04-12 14:31:29 +00:00
Zuul 82c95f934e Merge "Make minimum Kubernetes version field mandatory" 2024-04-11 13:58:16 +00:00
Leonardo Mendes 2446746b41 Fix IPsec certificates renewal script
This commit fixes the IPsec certificates renewal script, which is set up
as a cron job to run daily at midnight. Due to a recent change, the
name of the system-local-ca certificate was changed to system-local-ca-1
and the function that returns the time left to the certificate
expiration was not working properly.

Test Plan:
PASS: Change system date to simulate IPsec cert is about to expire,
      adjust the system to work properly all pods and services needed
      to run ipsec-client and run the script, verify IPsec cert,
      private key and trusted CA cert are renewed, and IKE SAs and
      CHILD SAs are re-established.
PASS: Change the certificate /etc/swanctl/x509ca/system-local-ca-1.crt
      to simulate the IPsec trusted CA cert is different from
      the system-local-ca in k8s secret, run the script, verify the
      trusted CA and IPsec cert/key are renewed, and IKE SAs and CHILD
      SAs are re-established.

Story: 2010940
Task: 49850

Change-Id: Iea88211221d55df763f3f86853d402fffcb58c68
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-04-11 10:43:51 -03:00
Eric MacDonald 7e34c08e96 Add pxeboot network hostname resolution for controllers
Worker and storage nodes currently support pxeboot-N hostname
nslookup resolution because the pxeboot network address they lease
over dhcp from dnsmasq persists.

However, this is not true for the controllers. Although they may
initially dhcp for a pxeboot address, that address is overridden
by their statically assigned pxeboot network address(es).

Adding the controller pxeboot network hostnames and addresses to
the dnsmasq.addn_hosts file yields proper pxeboot hostname resolution
for controllers.
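
A minimal sketch of generating such entries, assuming the file uses
the hosts-file layout of an address followed by a hostname. The
function name and the input mapping are hypothetical.

```python
def addn_hosts_lines(controller_pxeboot_addresses):
    """Format one 'address hostname' line per controller pxeboot entry.

    controller_pxeboot_addresses maps a pxeboot hostname to its address.
    """
    return ["{} {}".format(addr, name)
            for name, addr in sorted(controller_pxeboot_addresses.items())]
```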

From Controllers:

  sysadmin@controller-0:~$ nslookup pxeboot-2
  Server:         fdff:10:80:27::2
  Address:        fdff:10:80:27::2#53

  Name:   pxeboot-2
  Address: 192.168.202.3

  sysadmin@controller-0:~$ nslookup pxeboot-1
  Server:         fdff:10:80:27::2
  Address:        fdff:10:80:27::2#53

  Name:   pxeboot-1
  Address: 192.168.202.2

  sysadmin@controller-1:~$ nslookup pxeboot-1
  Server:         fdff:10:80:27::2
  Address:        fdff:10:80:27::2#53

  Name:   pxeboot-1
  Address: 192.168.202.2

  sysadmin@controller-1:~$ nslookup pxeboot-2
  Server:         fdff:10:80:27::2
  Address:        fdff:10:80:27::2#53

  Name:   pxeboot-2
  Address: 192.168.202.3

From Worker:

  sysadmin@worker-0:~$ nslookup pxeboot-1
  Server:         192.168.204.1
  Address:        192.168.204.1#53

  Name:   pxeboot-1
  Address: 169.254.202.2

Now all hosts in the system support pxeboot hostname nslookup

Also, this update adds an explicit call to _generate_dnsmasq_hosts_file
to the conductor process restart. This handles the case where dnsmasq
publishes a new lease while the sysinv conductor is not running or is
being restarted.

Test Plan:

PASS: Verify build and install AIO DX Plus system.
PASS: Verify format of new additions to dnsmasq.addn_hosts file.
PASS: Verify nslookup using controller pxeboot hostnames from either
      controller or even a worker node.
PASS: Verify no new pep8 warnings or errors are added to the conductor
      manager.py.

Story: 2010940
Task: 49829
Change-Id: Ibacdaadd24cf8c73fec98167d4a79fece341b1e6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-04-10 17:16:31 +00:00
Raphael Lima c7f3d71c5f Create a set_users_options method in openstack endpoint config
This commit creates the set_users_options in
openstack_config_endpoints.py, which is required in [1] in order to set
the ignore_lockout_failure_attempts for both sysinv and admin users
during the sysinv bootstrap process.

[1]: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913930

Test plan:
    Note that all of the test cases were performed with the changes from
    [1].
1. PASS: Verify the openstack user, role, service and endpoints
configuration for sysinv after bootstrap
2. PASS: Verify that both the admin and sysinv openstack users contain
the ignore_lockout_failure_attempts option set to True.

Story: 2011035
Task: 49844

Change-Id: I9c11e7305602d24f8170759f5f9363e4a6d012a4
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
2024-04-10 10:01:14 -03:00
Leonardo Mendes 49df34a4f4 Add Intermediate CA support to IPsec configuration
The current implementation of IPsec configuration by IPsec
server/client supports Root CA only. This commit adds support
for Intermediate CA. Now, the IPsec Auth Server sends both certificates
to the IPsec Auth client to store. If it's a self-signed certificate,
the same certificate is sent as Root CA.
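
The selection rule can be sketched as below. The function and argument
names are hypothetical, and certificates are treated as opaque values;
the actual server code is not shown in this commit message.

```python
def ca_certs_to_send(issuing_ca, root_ca=None):
    """Return the (Root CA, Intermediate CA) pair sent to the client.

    When no separate root is supplied, the issuing CA is assumed to be
    self-signed and is reused as the Root CA.
    """
    if root_ca is None:
        return issuing_ca, issuing_ca
    return root_ca, issuing_ca
```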

Test plan:
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0 and a self-signed CA
      installed. Run "ipsec-client pxecontroller --opcode 1" in
      controller-1. Observe that 4 CA certificates are created,
      but they are the same certificate. Observe that a security
      association is established between the hosts via "swanctl
      --list-sas" command.
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0 and a self-signed CA
      installed. Run "ipsec-client pxecontroller --opcode 2" in
      controller-1. Observe that the previously created CertificateRequest
      was deleted and a new one was generated for controller-1's node.
      The new certificate is sent to IPsec Client with Root and
      Intermediate CA, which are the same, to be stored and the
      swanctl rekey command executed successfully.
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0 and an intermediate CA
      installed. Run "ipsec-client pxecontroller --opcode 1" in
      worker-0. Observe that 4 CA certificates are created,
      including Root and Intermediate CA. Observe that a security
      association is established between the hosts via "swanctl
      --list-sas" command.
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0 and an Intermediate CA
      installed. Run "ipsec-client pxecontroller --opcode 2" in
      worker-0. Observe that the previously created CertificateRequest
      was deleted and a new one was generated for worker-0's node.
      The new certificate is sent to IPsec Client with Root and
      Intermediate CA to be stored and the swanctl rekey command
      executed successfully.
PASS: In a DX system, simulate the IPsec cert is about to expire,
      run the script, verify IPsec cert, private key and trusted CA
      cert are renewed.

Story: 2010940
Task: 49825

Change-Id: I25c973350c4f460233a4e6e5ddda8366b948d120
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-04-09 16:01:53 -03:00
Zuul 7299fa6118 Merge "Fix usage of address_get_by_name" 2024-04-05 22:24:24 +00:00
Steven Webster 92f00f80fa Fix usage of address_get_by_name
Recent commit https://opendev.org/starlingx/config/commit/634d4916
introducing changes for dual-stack networking made a change
to the DB api's address_get_by_name to return a list of IPv4 and
IPv6 addresses rather than a singular address.  As such, the list
can be empty if there are no addresses associated with a
particular name, rather than throwing an AddressNotFoundByName
exception.

Currently, the interface_network code depends on the
AddressNotFoundByName exception to determine whether a new
address needs to be allocated for a dynamic network.

This can cause an issue for worker, storage nodes when one of
their interfaces is associated with certain networks (such as
the storage network).

The symptom of this may be an interface which is 'DOWN' after
unlock, as its interface configuration file is marked for a
'static' address, with no address present (because it wasn't
allocated).

This commit fixes the issue by simply checking that the list
returned by address_get_by_name is empty.
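
The changed check can be sketched as follows. This is a minimal illustration, not the sysinv code: FakeDbApi, needs_allocation and the address names are hypothetical stand-ins, and only the list-returning behaviour of address_get_by_name is taken from the description above.

```python
class FakeDbApi:
    """Hypothetical stand-in for the DB API described above."""

    def __init__(self, addresses_by_name):
        self._addresses_by_name = addresses_by_name

    def address_get_by_name(self, name):
        # After the dual-stack change: return a (possibly empty) list
        # instead of raising AddressNotFoundByName.
        return list(self._addresses_by_name.get(name, []))


def needs_allocation(dbapi, name):
    """Return True when no address exists yet for the given name."""
    # An empty list now plays the role the exception used to play.
    return not dbapi.address_get_by_name(name)
```

Code that still waits for the exception would never see it raised and would skip the allocation, which matches the symptom described above.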

Test Plan:

- Fresh install of a Standard system.
- Ensure named addresses are present in the DB for all
  nodes (mgmt, cluster-host, oam for controllers)
- Create a new address pool and storage network and
  assign it to a worker node interface.
- Unlock the worker node and ensure the address is
  present on the interface and it is in 'UP' state.

Story: 2011027
Task: 49627

Change-Id: I9763f7c71797d9b321e7bf9e1b6db759378af632
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2024-04-05 10:56:47 -04:00
Igor Soares 3773c65f61 Make minimum Kubernetes version field mandatory
Make the supported_k8s_version:minimum metadata field mandatory for
StarlingX applications.

The minimum supported Kubernetes version must be informed in the
application metadata.yaml file. For instance:

supported_k8s_version:
  minimum: 1.24.4

Existing applications were previously updated to include the
mandatory field as part of story 2010929.
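
A check of this kind could look roughly like the sketch below. It assumes the metadata.yaml file has already been parsed into a dict; the function name and error messages are hypothetical and are not taken from the sysinv code.

```python
def validate_supported_k8s_version(metadata):
    """Return the minimum version, or raise ValueError if it is missing.

    `metadata` is assumed to be the parsed content of metadata.yaml.
    """
    section = metadata.get("supported_k8s_version")
    if not isinstance(section, dict):
        raise ValueError("supported_k8s_version section is missing")

    minimum = section.get("minimum")
    if minimum is None or str(minimum).strip() == "":
        # A section that only carries 'maximum' is rejected as well.
        raise ValueError("supported_k8s_version:minimum is mandatory")

    return str(minimum)
```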

Test plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX fresh install
PASS: Attempt to upload a modified version of platform-integ-apps without
      the supported_k8s_version section.
      Confirm that the upload failed.
PASS: Attempt to upload a modified version of platform-integ-apps with
      the supported_k8s_version section but containing only the
      maximum supported Kubernetes version.
      Confirm that the upload failed.
PASS: Upload/apply/update/remove/delete a working version of
      platform-integ-apps.

Story: 2010929
Task: 49538

Change-Id: I10160dfcfcc82eb8978b96c87e356db7b6cd227a
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-04-05 11:11:05 -03:00
Zuul 8ea80c4b27 Merge "Filter cert-mon for geo-redundancy in audit and DC_CertWatcher" 2024-04-04 21:54:21 +00:00
Kyle MacLeod 03443ef16c Filter cert-mon for geo-redundancy in audit and DC_CertWatcher
This commit adds a filter for querying all subclouds from dcmanager, to
account for secondary subclouds that should not be audited by cert-mon
for this system controller. The filter is performed against a list of
invalid deploy states that should be considered when querying
the list of subclouds from dcmanager.

Likewise, the DC_CertWatcher -> DCIntermediateCertRenew flow must ensure
that subclouds which are secondary to this system controller are ignored
by the kubernetes watch in place for the DC intermediate cert renewal
detection. Subclouds are filtered by the watch based on their online
state and their deploy-status. A subcloud with invalid deploy state is
ignored by this system controller.
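
The filtering described above can be sketched as below. The field names, the "online" value and the contents of INVALID_DEPLOY_STATES are assumptions made for illustration only; the actual list of invalid deploy states lives in the cert-mon code and is not reproduced here.

```python
# Hypothetical values; the real list of invalid deploy states is
# defined by cert-mon/dcmanager and may differ.
INVALID_DEPLOY_STATES = frozenset({"secondary", "secondary-failed"})


def subclouds_to_audit(subclouds):
    """Keep only online subclouds whose deploy status is valid.

    Each subcloud is assumed to be a dict with 'name',
    'availability-status' and 'deploy-status' keys.
    """
    return [
        sc["name"]
        for sc in subclouds
        if sc["availability-status"] == "online"
        and sc["deploy-status"] not in INVALID_DEPLOY_STATES
    ]
```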

Test Cases

PASS:
- Trigger audits on service restart. Verify that offline/secondary
  subclouds are excluded.
- Ensure full daily audit is executed. Verify that all subclouds
  belonging to this system controller are audited. Secondary subclouds
  are not audited.
- Verify that DC_CertWatcher -> DCIntermediateCertRenew watch events are
  ignored for subclouds that are offline and/or in an invalid deploy state

Closes-Bug: 2060068

Change-Id: Iffe3d7c76db8d2f17aed0bfebc792af0f9d75ca2
Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
2024-04-04 15:36:06 -04:00
Rei Oliveira 5d853423ef Wrap 'classes' parameter as a list in config_dict object
This change fixes a type mismatch bug introduced in [1]. A python list
is expected but a python str is provided instead.

[1] https://review.opendev.org/c/starlingx/config/+/893566

This type mismatch causes the 'deadlock' prevention logic never to be
invoked. In [2] below, the 'if classes' branch is never entered:

[2] 85a548ffcc/sysinv/sysinv/sysinv/sysinv/conductor/manager.py (L13481)

Test plan:

PASS: Run 'sudo chage -M 999 sysadmin; sudo chage -M 888 sysadmin;
      sudo chage -M 777 sysadmin'. Notice 'out of config alarm' in
      'fm alarm-list'. Verify that it clears up after about 5 min.
PASS: Verify in i_user db table and /etc/shadow that it correctly
      contains the last password age, 777 in this case.

Note: In a managed subcloud, the value in /etc/shadow file will
be changed again in about 20 min to sync with the sysadmin password
and age in the system controller.

Closes-Bug: 2034446

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I24d9807e9eb2d94e026be7b8f3448a6cd42fcdd6
2024-04-04 14:45:03 -03:00
Zuul eff4e08e44 Merge "Prevent multiple datanetworks to same interface" 2024-04-03 16:26:02 +00:00
Md Irshad Sheikh 463165eca8 Adding QAT devices support in sysinv
The commit adds code to auto discover QAT devices with ids 4940 & 4942
and list them as part of system host-device-list command.

Also, the host-device-modify command has been modified to not allow
any QAT device configuration due to upstream qat_service code
limitations. QAT devices are already initialized with the max VF
number and other default configurations during bootstrap, so
no further modification is required.

TEST CASES:

PASSED: The development iso should be successfully deployed.
        And QAT devices should get listed using
        host-device-list command.

PASSED: system host-device-modify command should raise error
        when tried to edit any QAT configuration.

PASSED: system host-device-show command should show all default
        QAT device configurations.

Story: 2010604
Task: 49701

Change-Id: Id6b00b9e69b233d513e42375d5f8196ddd745e20
Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
2024-04-03 07:51:28 -04:00
Caio Bruchert f6158f5b02 Prevent multiple datanetworks to same interface
Since the sriov-network-device-plugin upgrade to v3.5.1, assigning
multiple datanetworks to the same interface is no longer possible.

This change restricts the system interface-datanetwork-assign command to
prevent that from happening.
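
The restriction can be illustrated with the sketch below. It is not the sysinv implementation: the in-memory dict, AssignmentError and assign_datanetwork are hypothetical names that only model the rule of one datanetwork per interface.

```python
class AssignmentError(Exception):
    """Raised when an interface already has a datanetwork assigned."""


def assign_datanetwork(assignments, interface, datanetwork):
    """Record the assignment, refusing a second datanetwork per interface.

    `assignments` maps interface name -> datanetwork name.
    """
    existing = assignments.get(interface)
    if existing is not None:
        raise AssignmentError(
            "%s is already assigned to %s" % (interface, existing))
    assignments[interface] = datanetwork
```

The same datanetwork may still be assigned to different interfaces, for example to sriov0 and to a VF interface created on top of it.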

Test Plan:
PASS: assign datanetwork1 to sriov0 interface: ok
PASS: assign datanetwork2 to same sriov0 interface: fails
PASS: create new vf0 interface on top of sriov0: ok
PASS: assign datanetwork1 to vf0: ok
PASS: assign datanetwork2 to vf0: fails
PASS: create new vf1 interface on top of sriov0: ok
PASS: assign datanetwork2 to vf1: ok
PASS: assign datanetwork1 to vf1: fails

Closes-Bug: 2059960

Change-Id: If3ab95594917089f01475f9595c9059edeae85f5
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
2024-04-02 17:25:41 -03:00
Zuul e4b32b7e16 Merge "Fix charts upload when there are existing ones" 2024-04-02 16:57:39 +00:00
Igor Soares b1b160f48b Fix charts upload when there are existing ones
This fixes a bug that prevents StarlingX application charts from
being uploaded to the helm repository when one or more of them have been
uploaded before.

The charts upload logic was changed to check if all charts provided by
the given application are valid prior to uploading. If a chart is
invalid then no charts for that application will be uploaded, since the
upload process cannot proceed in that scenario.
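
A validate-first flow of this kind might be sketched as below. The callables is_valid and upload, and the already_uploaded set, are hypothetical placeholders for the real chart validation, Helm repository upload and repository lookup.

```python
def upload_charts(charts, is_valid, already_uploaded, upload):
    """Validate every chart first, then upload only the missing ones.

    Returns the list of charts that were actually uploaded.
    """
    invalid = [chart for chart in charts if not is_valid(chart)]
    if invalid:
        # Nothing is uploaded if any chart of the application is invalid.
        raise ValueError("invalid charts: %s" % ", ".join(invalid))

    uploaded = []
    for chart in charts:
        if chart in already_uploaded:
            continue  # an existing chart no longer blocks the others
        upload(chart)
        uploaded.append(chart)
    return uploaded
```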

Test Plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX fresh install
PASS: Build a platform-integ-apps version containing one existing chart
      and two nonexistent charts in the local Helm repository.
      Update platform-integ-apps to the built version.
      Confirm that the existing chart was not re-uploaded and that the
      nonexistent ones were correctly uploaded to the Helm repository.
PASS: Apply/remove/delete platform-integ-apps

Closes-Bug: 2053074
Depends-on: https://review.opendev.org/c/starlingx/integ/+/912305

Change-Id: I155d457f58be1986cc6f25178929aedfbe1d0693
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-04-02 12:05:28 -03:00
Zuul 1e2bdd1d93 Merge "Migration script to delete encrypted-fs attribute" 2024-04-02 14:03:41 +00:00
Zuul 2cbdc83b04 Merge "Expose Kubernetes ApiextensionsV1Api" 2024-04-01 19:16:12 +00:00
Jagatguru Prasad Mishra 2274dfb942 Migration script to delete encrypted-fs attribute
The previous release contains the attribute 'encrypted-fs' in the
'platform config' service parameter, which is used to enable
and disable the luks file system. The upgrade activity needs to
delete this unused attribute from the database.

This change adds a new migration script to delete the 'encrypted-fs'
attribute. The script will run only when the from-release is 22.12
and during upgrade-activate.

Test Plan:
PASS: build-pkgs -c controllerconfig
PASS: build-image
PASS: AIO-SX upgrade to 24.xx from previous release with luks
      file system disabled. After upgrade activate encrypted-fs
      attribute should be deleted from DB.
PASS: AIO-SX upgrade to 24.xx from previous release with luks
      file system enabled. After upgrade activate encrypted-fs
      attribute should be deleted from DB.
PASS: AIO-DX upgrade to 24.xx from previous release with luks
      file system disabled. After upgrading controller-1(upgraded/
      active controller), script execution should delete
      encrypted-fs attribute from the DB.

Story: 2010873
Task: 49663

Change-Id: I96ed9bf572f20e64419763eb285f4997c37ddf9b
Signed-off-by: Jagatguru Prasad Mishra <jagatguruprasad.mishra@windriver.com>
2024-04-01 05:58:25 +00:00
Zuul 25d58ebcf8 Merge "First check Root CAs on kube-cert-rotation.sh" 2024-03-29 00:06:34 +00:00
Rei Oliveira 01a5ea0843 First check Root CAs on kube-cert-rotation.sh
As of now, the script only verifies the validity of leaf certificates
and, if expired, will regenerate them based on K8s/etcd Root CAs.
It doesn't account for the possibility of Root CAs being expired.
It will generate leaf certificates based on Root CAs, even if said
Root CAs are expired.

This change fixes that behaviour by first checking validity of
Root CAs and only allowing leaf certificate renewal if RCAs are
valid.
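
The ordering of the two checks can be sketched as below. kube-cert-rotation.sh itself is a shell script; this Python version only illustrates the logic, and the function name, certificate names and the expiry dicts are hypothetical.

```python
from datetime import datetime, timezone


def leaf_certs_to_renew(root_ca_expiries, leaf_expiries, now=None):
    """Return expired leaf certificates, but only if all Root CAs are valid.

    Both arguments map a certificate name to its expiry datetime.
    """
    now = now or datetime.now(timezone.utc)

    expired_roots = sorted(
        name for name, expiry in root_ca_expiries.items() if expiry <= now)
    if expired_roots:
        # Renewing leaf certificates from an expired Root CA is refused.
        raise RuntimeError(
            "Root CAs are expired: %s" % ", ".join(expired_roots))

    return sorted(
        name for name, expiry in leaf_expiries.items() if expiry <= now)
```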

Test plan:

PASS: Cause Root CAs to expire, run kube-cert-rotation.sh script
      and verify that it fails with an error saying Root CAs are
      expired and leaf certificates are not renewed.
PASS: Ensure Root CAs are valid, cause leaf certificates
      to expire, run kube-cert-rotation.sh and verify that the
      script executes normally and is able to renew
      the leaf certificates.

Closes-Bug: 2059708

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I98dfd8d1417754f3c723d8ddd52a856785ffc83b
2024-03-28 14:28:34 -03:00
Zuul de9d380dc9 Merge "Update swanctl.conf cacerts w/ system-local-ca files" 2024-03-28 15:10:34 +00:00