Commit Graph

5276 Commits

Author SHA1 Message Date
Zuul 25d58ebcf8 Merge "First check Root CAs on kube-cert-rotation.sh" 2024-03-29 00:06:34 +00:00
Rei Oliveira 01a5ea0843 First check Root CAs on kube-cert-rotation.sh
As of now, the script only verifies the validity of leaf certificates
and, if expired, will regenerate them based on K8s/etcd Root CAs.
It doesn't account for the possibility of the Root CAs themselves
being expired, and will generate leaf certificates even from expired
Root CAs.

This change fixes that behaviour by first checking validity of
Root CAs and only allowing leaf certificate renewal if RCAs are
valid.
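
A minimal Python sketch of the validity-first ordering described above
(the real logic lives in the shell script kube-cert-rotation.sh; the CA
paths below are assumptions for illustration):

```python
# Illustrative sketch only: the actual script is shell; paths are assumptions.
import subprocess
import sys

ROOT_CAS = ["/etc/kubernetes/pki/ca.crt", "/etc/etcd/ca.crt"]  # assumed paths

def is_valid(cert_path, seconds=0):
    """Return True if the certificate will not expire within `seconds`
    (openssl x509 -checkend returns 0 in that case)."""
    return subprocess.run(
        ["openssl", "x509", "-checkend", str(seconds), "-noout", "-in", cert_path],
        capture_output=True).returncode == 0

def main():
    # New behaviour: refuse to renew leaf certificates if any Root CA is expired.
    for ca in ROOT_CAS:
        if not is_valid(ca):
            print(f"Root CA {ca} is expired; leaf certificates will not be renewed")
            return 1
    # ... existing leaf-certificate renewal path runs only when all Root CAs are valid
    return 0

if __name__ == "__main__":
    sys.exit(main())
```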

Test plan:

PASS: Cause Root CAs to expire, run kube-cert-rotation.sh script
      and verify that it fails with an error saying Root CAs are
      expired and leaf certificates are not renewed.
PASS: Ensure to have valid Root CAs, cause leaf certificates
      to expire, run kube-cert-rotation.sh and verify that the
      script executes normally and is able to renew
      the leaf certificates.

Closes-Bug: 2059708

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I98dfd8d1417754f3c723d8ddd52a856785ffc83b
2024-03-28 14:28:34 -03:00
Zuul de9d380dc9 Merge "Update swanctl.conf cacerts w/ system-local-ca files" 2024-03-28 15:10:34 +00:00
Manoel Benedito Neto abef79e45f Update swanctl.conf cacerts w/ system-local-ca files
This commit introduces a new configuration for swanctl.conf file
where cacerts references two system-local-ca files. The two files
represent the previous (system-local-ca-0.crt) and the current
(system-local-ca-1.crt) certificates associated with system-local-ca.

The main goal of this implementation is to maintain SAs on all nodes
during the update of the system-local-ca certificate.
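
A rough Python sketch of the two-file rotation idea, keeping the
previous CA alongside the current one so peers holding either
certificate keep their SAs while nodes converge (file names are from
this commit; the directory and helper are assumptions):

```python
# Hypothetical helper illustrating the two-file CA rotation described above.
import os
import shutil

CA_DIR = "/etc/swanctl/x509ca"  # assumed strongSwan swanctl CA directory
PREVIOUS = os.path.join(CA_DIR, "system-local-ca-0.crt")  # previous CA
CURRENT = os.path.join(CA_DIR, "system-local-ca-1.crt")   # current CA

def rotate_system_local_ca(new_ca_pem: str):
    """Keep the old CA as -0 and install the new one as -1, so SAs
    established against the old CA remain valid while nodes converge."""
    if os.path.exists(CURRENT):
        shutil.copyfile(CURRENT, PREVIOUS)
    with open(CURRENT, "w") as f:
        f.write(new_ca_pem)
```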

Test plan:
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0. Run "ipsec-client
      pxecontroller --opcode 1" in worker-0. Observe that certificates,
      keys and swanctl.conf files are created in worker-0 node. Observe
      that a security association is established between the hosts via
      "sudo swanctl --list-sas" command.
PASS: In a DX system with available enabled active status with IPsec
      server being executed from controller-0. Run "ipsec-client
      pxecontroller --opcode 2" in controller-1. Observe that the
      previously created CertificateRequest is deleted and a new one is
      generated for controller-1. The new certificate is sent to the
      IPsec client and stored, and the swanctl rekey command executes
      successfully.

Story: 2010940
Task: 49777

Change-Id: I638932a602ed9423d20ed448e5aada499ef65d77
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
2024-03-28 13:40:10 +00:00
Zuul ecdb0d3b9f Merge "Execute data migration to network_addresspools table" 2024-03-26 19:00:31 +00:00
Zuul 160aed20ba Merge "Handle FM user during endpoint config" 2024-03-26 18:01:25 +00:00
Andre Kantek 578c5b73d8 Execute data migration to network_addresspools table
This change implements the upgrade data migration to:
1) fill the network_addresspools table with the information from
   the networks and address_pools tables
2) fill the correct value of primary_pool_family for each networks
   table entry
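
A simplified sketch of the shape such a data migration could take
(legacy sqlalchemy-migrate style; table and column names beyond those
in the commit message are assumptions, not the actual sysinv migration
script):

```python
# Illustrative only: not the actual migration; column names are assumptions.
from sqlalchemy import MetaData, Table, select

def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    networks = Table("networks", meta, autoload=True)
    address_pools = Table("address_pools", meta, autoload=True)
    network_addresspools = Table("network_addresspools", meta, autoload=True)

    conn = migrate_engine.connect()
    for net in conn.execute(select([networks])):
        pool = conn.execute(
            select([address_pools]).where(
                address_pools.c.uuid == net.pool_uuid)).first()
        if pool is None:
            continue
        # 1) link the network to its address pool in the new table
        conn.execute(network_addresspools.insert().values(
            network_id=net.id, address_pool_id=pool.id))
        # 2) record the address family of the primary pool on the network
        conn.execute(networks.update()
                     .where(networks.c.id == net.id)
                     .values(primary_pool_family="IPv4" if pool.family == 4
                             else "IPv6"))
```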

Test Plan
[PASS] Upgrade controller-1 and verify that the tables networks and
       network_addresspools have the correct values after data
       migration

Story: 2011027
Task: 49774

Change-Id: I91a2471c6ecce47c46945034c9c461ad42ae16d7
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-03-26 14:28:51 -03:00
Zuul 5c1569362b Merge "Report port and device inventory after the worker manifest" 2024-03-26 16:15:05 +00:00
Zuul 99bcb3314f Merge "Create script to update first upgraded controller attributes" 2024-03-26 15:38:57 +00:00
Tara Subedi 933d3a3a73 Report port and device inventory after the worker manifest
This is an incremental fix of bug 2053149.
Upon network boot (first boot) of a worker node, the agent manager is
supposed to report ports/devices without waiting for the worker
manifest, as that manifest never runs on first boot. Without this,
after a system restore, the compute node cannot be unlocked due to the
sriov config update.

The kickstart records the first boot by creating
"/etc/platform/.first_boot", which the agent manager deletes. If the
agent manager crashes, it restarts; this time it no longer sees the
.first_boot file, does not know this is still the first boot, and will
not report inventory for the worker node.

This commit fixes the issue by creating the volatile file
"/var/run/.first_boot" before deleting "/etc/platform/.first_boot";
the agent relies on both files to determine whether this is the first
boot. This preserves the same logic across multiple crashes/restarts
of the agent manager.
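
A small Python sketch of the first-boot detection described above
(file paths are from the commit; the helper names are illustrative,
not the actual agent manager code):

```python
# Sketch of the two-flag first-boot detection; helper names are hypothetical.
import os

FIRST_BOOT_FLAG = "/etc/platform/.first_boot"
VOLATILE_FLAG = "/var/run/.first_boot"

def is_first_boot() -> bool:
    # Either flag means we are still within the first boot, even if the agent
    # already consumed (removed) the persistent flag and then restarted.
    return os.path.exists(FIRST_BOOT_FLAG) or os.path.exists(VOLATILE_FLAG)

def consume_first_boot_flag():
    # Create the volatile marker before removing the persistent one, so a crash
    # between the two steps cannot lose the "first boot" state.
    if os.path.exists(FIRST_BOOT_FLAG):
        open(VOLATILE_FLAG, "w").close()
        os.remove(FIRST_BOOT_FLAG)
```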

TEST PLAN:
PASS: AIO-DX bootstrap has no issues. lock/unlock has no issues.
PASS: Network-boot worker node, before doing unlock, restart agent
      manager (sysinv-agent), check sysinv.log to see ports are reported.

Closes-Bug: 2053149
Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
2024-03-26 10:37:56 -04:00
Zuul 85a548ffcc Merge "Correct Kubernetes control-plane upgrade robustness skip_update_config" 2024-03-25 20:20:03 +00:00
Zuul 839b9b554d Merge "Add IPsec certificates renewal cron job" 2024-03-25 15:07:05 +00:00
Jim Gauld 4522150c87 Correct Kubernetes control-plane upgrade robustness skip_update_config
This removes the skip_update_config parameter from the
_config_apply_runtime_manifest() call when upgrading Kubernetes
control-plane. This parameter was unintentionally set to True,
so this configuration step did not persist. This caused
generation of 250.001 config-out-of-date alarms during kube
upgrade.

The review that introduced the bug:
https://review.opendev.org/c/starlingx/config/+/911100
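
A toy model of why skip_update_config=True left the host reporting
out-of-date config (names are hypothetical, not the sysinv
implementation):

```python
# Toy illustration only; not sysinv code.
class Host:
    def __init__(self):
        self.config_target = "uuid-2"
        self.config_applied = "uuid-1"   # stale -> 250.001 alarm condition

    def config_apply_runtime_manifest(self, config_uuid, skip_update_config=False):
        # apply the manifest ... (omitted)
        if not skip_update_config:
            self.config_applied = config_uuid  # persist what was applied

host = Host()
host.config_apply_runtime_manifest("uuid-2", skip_update_config=True)
assert host.config_applied != host.config_target   # alarm condition remains
host.config_apply_runtime_manifest("uuid-2")        # the fix: persist the config
assert host.config_applied == host.config_target
```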

TEST PLAN:
- watch /var/log/nfv-vim.log for each orchestrated upgrade
PASS: orchestrated k8s upgrade (no faults)
      - AIO-SX, AIO-DX, Standard

PASS: orchestrated k8s upgrade, with fault insertion during
      control-plane upgrade first attempt
      - AIO-SX
      - AIO-DX (both controller-0, controller-1)
      - Standard (both controller-0, controller-1)

PASS: orchestrated k8s upgrade, with fault insertion during
      control-plane upgrade first and second attempt, trigger abort
      - AIO-SX
      - AIO-DX (first controller)

Closes-Bug: 2056326

Change-Id: I629c8133312faa5c95d06960b15d3e516e48e4cb
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
2024-03-23 19:56:04 -04:00
Heitor Matsui 26848898b7 Create script to update first upgraded controller attributes
As part of the USM major release upgrade, this commit creates a
script to update the first upgraded controller attributes during
deploy start, so that sysinv integration does not fail and deploy
host can run successfully.

Test Plan
PASS: run simulated deploy host for AIO-SX successfully
PASS: run simulated deploy host for AIO-DX successfully

Story: 2010676
Task: 49744

Change-Id: Ic179e63dc9088df9ced8aff01ebf320ab8fa6374
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
2024-03-22 17:57:38 -03:00
Zuul 6775a04444 Merge "Fix runtime_config_get method to avoid type error" 2024-03-22 19:59:19 +00:00
Zuul c5b40d42b6 Merge "Prune stale backup in progress alarm 210.001" 2024-03-22 19:38:31 +00:00
rummadis a3a20fcf59 Prune stale backup in progress alarm 210.001
A user is unable to take a subcloud backup when there is a stale
backup-in-progress alarm.

Example:
When a user tries to take a subcloud backup in a Distributed Cloud
environment and a stale 210.001 alarm is present on the subcloud,
the user cannot trigger the subsequent subcloud backup.

This fix identifies such 210.001 alarms and clears them if they
have been pending for more than 1 hour.
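
A hedged sketch of the pruning idea; the FM client method names and
alarm field names here are assumptions for illustration, not the
exact FM API:

```python
# Assumed fm-client-style interface; method/field names are illustrative.
from datetime import datetime, timedelta

BACKUP_IN_PROGRESS_ALARM = "210.001"
STALE_AFTER = timedelta(hours=1)

def prune_stale_backup_alarms(fm_client, now=None):
    now = now or datetime.utcnow()
    alarms = fm_client.get_faults_by_id(BACKUP_IN_PROGRESS_ALARM) or []
    for alarm in alarms:
        # Timestamp format is an assumption for this sketch.
        raised_at = datetime.strptime(alarm.timestamp, "%Y-%m-%d %H:%M:%S.%f")
        if now - raised_at > STALE_AFTER:
            # A backup cannot legitimately be "in progress" this long; clear it
            # so a new subcloud backup can be triggered.
            fm_client.clear_fault(BACKUP_IN_PROGRESS_ALARM,
                                  alarm.entity_instance_id)
```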

TEST PLAN:
PASS: DC-libvirt setup with 2 controllers
and 2 subclouds
PASS: verified stale 210.001 getting removed

Closes-Bug: 2058516

Change-Id: Iedcc5e41cd4245c538d331d9aa8c2b6cc445acce
Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>
2024-03-22 14:44:47 -04:00
Gustavo Pereira b356e7ac5a Add mtce to endpoint reconfiguration script
Add mtce user to endpoint reconfiguration script to improve bootstrap
execution time. The related puppet class and tasks will be removed in
commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/912319.

Test Plan:
PASS: Deploy a subcloud without the changes and record its bootstrap
execution time. Deploy another subcloud with the proposed changes.
Verify successful subcloud deployment and the bootstrap execution
time is 80s faster.

PASS: Verify a successful AIO-SX deployment.

PASS: Verify a successful AIO-DX controller deployment.

PASS: Verify a successful DC environment deployment.

Story: 2011035
Task: 49695

Change-Id: I2075026bd378ef3b30978a6d420fbb2253ba290c
Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
2024-03-22 14:48:15 -03:00
Heitor Matsui fd5d603d86 Fix runtime_config_get method to avoid type error
An issue was found when config_applied for a host assumed the default
value, which is the string "install" (refer to [1]): runtime_config_get
raised a type error when comparing the string "install" against the
"id" column, which is of type int.

This commit fixes the runtime_config_get method by inverting the
logic: if the id passed is an int, compare it against the id column;
otherwise assume it is a string and compare it against the config_uuid
column.

[1] 15aefdc468/sysinv/sysinv/sysinv/sysinv/agent/manager.py (L116)
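
A toy model of the corrected type handling (illustrative only; the
real method queries the sysinv runtime_config table):

```python
# Toy illustration; not the actual sysinv DB API.
def runtime_config_get(rows, id_or_uuid):
    if isinstance(id_or_uuid, int):
        # integer -> match the "id" column (primary key)
        return next(r for r in rows if r["id"] == id_or_uuid)
    # anything else (e.g. the default string "install") -> match config_uuid,
    # instead of comparing a string against the integer "id" column
    return next(r for r in rows if r["config_uuid"] == id_or_uuid)

rows = [{"id": 1, "config_uuid": "install"},
        {"id": 2, "config_uuid": "0c7c1d22-8b9a-4c4e-9c7a-1d2e3f4a5b6c"}]
assert runtime_config_get(rows, 2)["id"] == 2
assert runtime_config_get(rows, "install")["id"] == 1
```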

Test Plan
PASS: set config_applied="install" for a host, force inventory
      report and observe no more database errors on sysinv.log
PASS: install/bootstrap/unlock AIO-DX

Story: 2010676
Task: 49745

Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: I9c687a1eb67c62291f1d2aa9cef1d6fbe993d0fa
2024-03-21 17:12:17 -03:00
Zuul 1573412c4d Merge "Modify Host Personality for attribute max_cpu_mhz_configured" 2024-03-21 18:48:43 +00:00
Zuul a1211d16d4 Merge "Handle Barbican user during endpoint config" 2024-03-21 17:04:07 +00:00
Poornima Y N 7fc11de9ee Modify Host Personality for attribute max_cpu_mhz_configured
max_cpu_mhz_configured is a host attribute that can be configured on
hosts where turbo frequency is enabled. For a host whose role is both
controller and worker, the personality handling for this attribute did
not account for that scenario.

Changed the sysinv conductor to update the host personalities based on
the function(s) the node operates, handling the scenario where the
host acts as both controller and worker node.
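
An illustrative sketch of deriving the personalities to update from
the node's function(s); the helper and constants are assumptions, not
the sysinv conductor code:

```python
# Hypothetical helper illustrating the combined controller+worker handling.
CONTROLLER = "controller"
WORKER = "worker"

def personalities_for_max_cpu_update(subfunctions: str):
    """subfunctions is a comma-separated list, e.g. 'controller,worker' on AIO."""
    functions = {f.strip() for f in subfunctions.split(",")}
    personalities = []
    if CONTROLLER in functions:
        personalities.append(CONTROLLER)
    if WORKER in functions:
        personalities.append(WORKER)
    return personalities

# On an AIO (controller+worker) node both personalities are returned, so the
# max_cpu_mhz_configured update is applied to the combined role as well.
assert personalities_for_max_cpu_update("controller,worker") == ["controller",
                                                                 "worker"]
```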

TEST PLAN:
PASS: Build and deploy ISO on Simplex
PASS: Check whether the max cpu freq is set on a Simplex
      Below are the commands:
      system host-show <host_id> | grep is_max_cpu_configurable
      system service-parameter-list --name cpu_max_freq_min_percentage
      system service-parameter-modify platform config cpu_max_freq_min_percentage=<>
      system host-update <host_id> max_cpu_mhz_configured=<value in mhz>

      After above commands check whether cpu is set using below command:
      sudo turbostat

Closes-Bug: 2058476

Change-Id: I08a5d1400834afca6a0eeaaa8813ac8d71a9db15
Signed-off-by: Poornima Y N <Poornima.Y.N@windriver.com>
2024-03-21 04:55:02 -04:00
Salman Rana bdac091e77 Handle FM user during endpoint config
Add FM user to endpoint reconfiguration script, following
the migration of FM bootstrap from puppet to Ansible:
https://review.opendev.org/c/starlingx/ansible-playbooks/+/913251

Openstack related operations (user, service and
endpoint configuration) are now handled exclusively
by sysinv config_endpoints

Test Plan:
1. PASS: Verify full DC system deployment - System Controller + 3
         Subclouds install/bootstrap (virtual lab)
2. PASS: Verify Openstack FM user created
3. PASS: Verify Admin role for the FM user set in the services project
4. PASS: Verify Openstack FM service created
5. PASS: Verify admin, internal and public endpoints configured for FM

Story: 2011035
Task: 49722

Change-Id: I7d2f1596595ec2613cd5de1ca3d99427ea32d52d
Signed-off-by: Salman Rana <salman.rana@windriver.com>
2024-03-20 14:24:59 +00:00
Zuul 15aefdc468 Merge "Add retry robustness for Kubernetes upgrade control plane" 2024-03-19 21:23:41 +00:00
Zuul c4b7c51ffb Merge "Update IPsec IKE daemon log config" 2024-03-19 18:44:01 +00:00
Saba Touheed Mujawar 4c42927040 Add retry robustness for Kubernetes upgrade control plane
This addresses a rare intermittent failure during the control-plane
upgrade step, where puppet hits its timeout before the upgrade
completes, or kubeadm hits its own Upgrade Manifest timeout (at 5m).

This change retries the process by reporting the failure to the
conductor when the puppet manifest apply fails. Since RPC is used to
send messages with options, the return code is not available directly
and hence a retry decorator cannot be used. Instead, the sysinv report
callback feature is used to handle the success/failure path.
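
A toy sketch of the retry decision described above (hypothetical
names; the real path is an asynchronous report callback in the sysinv
conductor, and the state strings stand in for the KUBE_UPGRADING_*
constants):

```python
# Toy illustration only; not sysinv code.
MAX_RETRIES = 2

def handle_report(state, attempt, success):
    """Decide the next action when the control-plane upgrade report comes back."""
    if success:
        return "complete"
    if attempt < MAX_RETRIES and state in ("upgrading-first-master",
                                           "upgrading-second-master"):
        return "retry"           # re-run the control-plane upgrade
    return "failed"              # after retries, leave it to abort/orchestration

assert handle_report("upgrading-first-master", 0, success=False) == "retry"
assert handle_report("upgrading-first-master", 2, success=False) == "failed"
assert handle_report("aborting", 0, success=False) == "failed"
```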

TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send a STOP signal to pause the process so that the
      puppet manifest times out, and check that the retry code works
      and the upgrade completes on a retry attempt.
PASS: Manually decrease the puppet timeout to a very low number and
      verify that the code retries 2 times and then updates the
      failure state.
PASS: Perform orchestrated k8s upgrade, Manually send STOP
      signal to pause the kubeadm process during step
      upgrading-first-master and perform system kube-upgrade-abort.
      Verify that upgrade-aborted successfully and also verify
      that code does not try the retry mechanism for
      k8s upgrade control-plane as it is not in desired
      KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER
      state
PASS: Perform a manual k8s upgrade; on k8s upgrade control-plane
      failure, perform a manual upgrade-abort successfully.
      Perform an orchestrated k8s upgrade; on k8s upgrade control-plane
      failure, nfv aborts automatically after the retries.

Closes-Bug: 2056326

Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
            https://review.opendev.org/c/starlingx/stx-puppet/+/911945
            https://review.opendev.org/c/starlingx/integ/+/913422

Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 08:49:36 -04:00
Zuul 2a072b65c5 Merge "Allow mgmt and admin network reconfig" 2024-03-19 11:47:32 +00:00
Zuul 78d3acbb5d Merge "Addition of OTS Token activation procedure" 2024-03-18 22:06:07 +00:00
Fabiano Correa Mercer 2fb32cf88d Allow mgmt and admin network reconfig
This change allows the management and admin networks to be
reconfigured at the same time in an AIO-DX subcloud.
Currently, it is necessary to lock and unlock the controller in
order to reconfigure the management network on an AIO-SX.
If the customer changes the management network first, the new mgmt
network will be in the database but the changes will only be
applied during the unlock / reboot of the system.
The admin network changes, however, are applied at runtime: if the
admin network is changed after the management network reconfig,
some of the admin changes will apply the new mgmt network values
before the system has been updated with the new mgmt IP range,
causing a puppet error and leaving the system incorrectly
configured.

Tests done:
IPv4 AIO-SX subcloud mgmt network reconfig
IPv4 AIO-SX subcloud admin network reconfig
IPv4 AIO-SX subcloud admin and mgmt network reconfig
IPv4 AIO-SX subcloud mgmt and admin network reconfig

Story: 2010722
Task: 49724

Change-Id: I113eab2618f34b305cb7c4ee9bb129597f3898bb
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
2024-03-18 15:58:40 -03:00
Hugo Brito 2b07588a8e Handle Barbican user during endpoint config
Add Barbican user to endpoint reconfiguration script.

Openstack related operations (user, service and endpoint configuration)
are now handled exclusively by sysinv config_endpoints

Test Plan:
1. PASS: Verify full DC system deployment - System Controller + 3
         Subclouds install/bootstrap (virtual lab)
2. PASS: Verify Openstack Barbican user created
3. PASS: Verify Admin role for the Barbican user set in the services
         project
4. PASS: Verify Openstack Barbican service created
5. PASS: Verify admin, internal and public endpoints configured for
         Barbican

Story: 2011035
Task: 49738

Change-Id: I8045cb12d3faa20147b0b84bc9e5ce6c2e0cddf2
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
2024-03-18 14:32:51 -03:00
Andy Ning 441097fd18 Update IPsec IKE daemon log config
This change updated the IPsec IKE daemon log (charon.log) configuration
so that more details are logged and in a better format.

Test Plan:
PASS: Run ipsec-client to generate charon-log.conf and restart ipsec,
      verify charon logs capture new details and in the new expected
      format.

Story: 2010940
Task: 49711
Change-Id: I0c2943ba60e1867dfcebddca175058b62dde4ad7
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-03-15 11:59:12 -04:00
Zuul ae8bb0f4d5 Merge "Fix failed pods not being detected by rootca health check" 2024-03-14 19:48:59 +00:00
Victor Romano d807f868d6 Fix failed pods not being detected by rootca health check
In the health check prior to the rootca update, a bug prevented
CrashLoopBackOff pods from being detected as unhealthy. This is
because the pods are in phase "Running", but the status of the
container itself is "ready: false". This commit adds an additional
check for "Running" pods: if any container inside the pod is not
ready, the pod is deemed unhealthy.
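
A minimal sketch of the added readiness check, operating on pod status
dicts shaped like the Kubernetes API response (simplified; not the
actual sysinv health-check code):

```python
# Simplified illustration of the readiness check described above.
def pod_is_healthy(pod: dict) -> bool:
    phase = pod["status"].get("phase")
    if phase not in ("Running", "Succeeded"):
        return False
    if phase == "Running":
        # A Running pod can still be broken (e.g. CrashLoopBackOff): every
        # container must report ready.
        for cs in pod["status"].get("containerStatuses", []):
            if not cs.get("ready", False):
                return False
    return True

crashlooping = {"status": {"phase": "Running",
                           "containerStatuses": [{"ready": False}]}}
assert not pod_is_healthy(crashlooping)
```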

Test plan:
  - PASS: Attempt to perform a rootca update with a pod in
          CrashLoopBackOff state. Verify the update is not possible
          and the health check fails, with the pod being shown as
          unhealthy in "system health-query-kube-upgrade --rootca".
  - PASS: Verify the rootca update is possible if no pods are in
          CrashloopBackoff state.

Closes-Bug: 2057779

Change-Id: I115b6621df11516db2279fe6bc96452d27975c50
Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>
2024-03-14 08:58:42 -03:00
Manoel Benedito Neto 56e2d1e2cd Addition of OTS Token activation procedure
This commit adds an OTS Token activation procedure to the IPsec
server implementation. With this implementation, the OTS Token is
activated when the PKI Auth response message is sent from the IPsec
server to the IPsec client. The Token expiry time was increased to
7 seconds because the Kubernetes API dependency may delay the IPsec
Auth procedure by a few seconds, affecting the OTS Token validation
criteria.
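
A toy illustration of an activate-then-expire one-time token (the
7-second window is from this commit; the class and its methods are
hypothetical):

```python
# Hypothetical one-time token model; not the actual IPsec server code.
import time

TOKEN_EXPIRY_SECONDS = 7

class OtsToken:
    def __init__(self):
        self.activated_at = None
        self.used = False

    def activate(self):
        # Activated when the PKI Auth response is sent to the IPsec client.
        self.activated_at = time.monotonic()

    def is_valid(self):
        return (self.activated_at is not None
                and not self.used
                and time.monotonic() - self.activated_at <= TOKEN_EXPIRY_SECONDS)

    def consume(self):
        ok = self.is_valid()
        self.used = True            # one-time use
        return ok
```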

Test plan:
PASS: Full build, system install, bootstrap and unlock DX system w/
      unlocked enabled available status.
PASS: In a DC system with available enabled active status with IPsec
      server being executed from controller-0. Run "ipsec-client
      pxecontroller --opcode 1" in worker-0. Observe that certificates,
      keys and swanctl.conf files are created in worker-0 node. Observe
      that a security association is established between the hosts via
      "sudo swanctl --list-sas" command.
PASS: In a DC system with available enabled active status with IPsec
      server being executed from controller-0. Run "ipsec-client
      pxecontroller --opcode 2" in controller-1. Observe that the
      previously created CertificateRequest is deleted and a new one is
      generated for controller-1. The new certificate is sent to the
      IPsec client and stored, and the swanctl rekey command executes
      successfully.

Story: 2010940
Task: 49712

Change-Id: I1c65edf14fd7ae3f47309b35048a805e0306038d
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
2024-03-13 18:32:13 -03:00
Zuul b5344801fd Merge "Fix LDAP issue for DC subcloud" 2024-03-13 20:18:24 +00:00
Steven Webster f8d30588ad Fix LDAP issue for DC subcloud
This commit fixes an LDAP authentication issue seen on worker nodes
of a subcloud after a rehoming procedure was performed.

There are two main parts:

1. Since every host of a subcloud authenticates with the system
   controller, we need to reconfigure the LDAP URI across all nodes
   of the system when the system controller network changes (upon
   rehome).  Currently, it is only being reconfigured on controller
   nodes.

2. Currently, the system uses an SNAT rule to allow worker/storage
   nodes to authenticate with the system controller when the admin
   network is in use.  This is because the admin network only exists
   between controller nodes of a distributed cloud.  The SNAT rule
   is needed to allow traffic from the (private) management network
   of the subcloud over the admin network to the system controller
   and back again.  If the admin network is _not_ being used,
   worker/storage nodes of the subcloud can authenticate with the
   system controller, but routes must be installed on the
   worker/storage nodes to facilitate this.  It becomes tricky to
   manage in certain circumstances of rehoming/network config.
   This traffic really should be treated in the same way as that
   of the admin network.

This commit addresses the above by:

1. Reconfiguring the ldap_server config across all nodes upon
   system controller network changes.

2. Generalizing the current admin network nat implementation to
   handle the management network as well.

Test Plan:

IPv4, IPv6 distributed clouds

1. Rehome a subcloud to another system controller and back again
   (mgmt network)
2. Update the subcloud to use the admin network (mgmt -> admin)
3. Rehome the subcloud to another system controller and back again
   (admin network)
4. Update the subcloud to use the mgmt network (admin -> mgmt)

After each of the numbered steps, the following were performed:

a. Ensure the system controller could become managed, online, in-sync
b. Ensure the iptables SNAT rules were installed or updated
   appropriately on the subcloud controller nodes.
c. Log into a worker node of the subcloud and ensure sudo commands
   could be issued without LDAP timeout.
d. Log into a worker node with LDAP user X via the console and verify
   the login succeeds

In general, tcpdump was also used to ensure the SNAT translation was
actually happening.

Partial-Bug: #2056560

Change-Id: Ia675a4ff3a2cba93e4ef62b27dba91802811e097
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2024-03-13 14:27:13 -04:00
Zuul 74437d5311 Merge "Revert "Modify Memory Field Names"" 2024-03-13 14:50:37 +00:00
Andy Ning 3fbe5f1aa6 Add IPsec certificates renewal cron job
This change added the IPsec certificates renewal script and set it up
as a cron job to run daily at midnight.

Test Plan:
PASS: After a DX system deployed, verify the script is in the correct
      directory with right permission, and is added in
      /var/spool/cron/crontabs/root
PASS: Simulate the IPsec cert is about to expire, run the script,
      verify IPsec cert, private key and trusted CA cert are renewed,
      and IKE SAs and CHILD SAs are re-established.
PASS: Simulate a failure condition (eg, ipsec-client return non zero),
      run the script, verify the IPsec renewal fails, and alarm
      250.004 is raised.
PASS: Run the script when the IPsec cert is not about to expire, and
      verify the script finishes successfully and alarm 250.004 is cleared.
PASS: Simulate the IPsec trusted CA cert is different from the
      system-local-ca in k8s secret, run the script, verify the trusted
      CA and IPsec cert/key are renewed, and IKE SAs and CHILD SAs are
      re-established.

Story: 2010940
Task: 49705

Depends-On: https://review.opendev.org/c/starlingx/fault/+/912598
Change-Id: I69236399b59655dd67ac7b01c4472a4b7ab911e5
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-03-13 10:46:24 -04:00
Zuul 1b9bc6ed76 Merge "Introduce Puppet variables for primary and secondary pool addresses." 2024-03-13 13:33:01 +00:00
Zuul 2a621c1bc5 Merge "Use correct hiera file for downgrade" 2024-03-12 16:55:10 +00:00
Andre Kantek fcebab8ef3 Introduce Puppet variables for primary and secondary pool addresses.
Details:

This change extracts the addresses from both the primary and secondary
address pools and makes them available for use in Puppet manifests.

To accommodate the dual stack configuration, the address allocation
for non-controller nodes was updated for both management and
cluster-host networks.

As the data migration functionality for the upgrade is still under
development, a temporary solution was implemented. Logic was added
to directly access the network's "pool_uuid" field and retrieve
addresses through it whenever the "network_addresspools" list is
empty, which is expected to occur immediately following an upgrade.
This allows for uninterrupted network operation during the upgrade
process.

Variable Naming:

The following naming convention will be used for the variables:
$platform::network::[network_type]::[ipv4/ipv6]::params::{var_name}

Variable Usage:

Primary Pool: Existing variables will be maintained and populated with
addresses from the primary pool. This ensures compatibility with
applications that currently rely on them. They have the format
$platform::network::[network_type]::params::{var_name}

The variable platform::network::[network_type]::params::subnet_version
indicates the primary pool protocol.

Secondary Pool: New variables with the above naming convention will
be introduced, allowing applications to utilize addresses from the
secondary pool if needed.

Benefits:

Improved modularity and reusability of network configurations.
Clear separation of concerns between primary and secondary pools.
Easier implementation of applications requiring addresses from either pool.

Notes:

Replace [network_type] with the network type: oam, mgmt, cluster_host, ...
Replace [ipv4/ipv6] with either "ipv4" or "ipv6" depending on
         the address family.
Replace [variable_name] with a descriptive name for the specific
         variable (e.g., "subnet_version", "interface_address").
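
A small sketch of how the variable names compose under this convention
(illustration only; the real values are generated by the sysinv puppet
plugins):

```python
# Illustrative name composition for the primary-pool and per-family variables.
def pool_param(network_type: str, var_name: str, family: str = None) -> str:
    """family=None -> primary-pool (legacy) name; 'ipv4'/'ipv6' -> per-family name."""
    if family is None:
        return f"platform::network::{network_type}::params::{var_name}"
    return f"platform::network::{network_type}::{family}::params::{var_name}"

# Existing (primary pool) variable, kept for compatibility:
assert pool_param("mgmt", "subnet_version") == \
    "platform::network::mgmt::params::subnet_version"
# New per-family variable usable for the secondary pool:
assert pool_param("mgmt", "interface_address", "ipv6") == \
    "platform::network::mgmt::ipv6::params::interface_address"
```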

Test Plan:

[PASS] unit tests implemented
[PASS] AIO-SX, Standard installation (IPv4 and IPv6)
       - using the dependency change the secondary pool was introduced
       - system was locked/unlocked and no puppet manifests were
          detected
       - inspection of system.yaml and controller-0.yaml to verify
         variables content
       - no alarms or disabled services were found
       - in standard added hosts with dual-stack config and verified
         that addresses were allocated for mgmt and cluster-host and
         after unlock the interface id was assigned to the respective
         entries.
[PASS] For standard systems during upgrade, simulate node unlock by:
       - Clearing the "network_addresspools" table after Ansible
         execution and before DM configuration.
       - Installing remaining nodes with the table empty. This mimics
         the post-upgrade scenario.

Story: 2011027
Task: 49679

Depends-On: https://review.opendev.org/c/starlingx/config/+/908915

Change-Id: If252fa051b2ba5b5eb3033ff269683af741091d2
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-03-12 07:25:46 -03:00
Zuul e378036a0d Merge "Add sysinv upgrades support for Kubernetes 1.29.2" 2024-03-11 23:14:56 +00:00
Zuul 0befaa8ff0 Merge "Implement new network-addrpool CLI" 2024-03-11 20:12:46 +00:00
Zuul 6945a0fd6b Merge "Implement IPsec Cert-Renewal Operation" 2024-03-11 20:03:25 +00:00
Zuul 00e59bc277 Merge "Fix upgrade-script not expecting additional parameter" 2024-03-11 19:01:58 +00:00
Zuul a396dff37c Merge "Prevent configuring the Dell Minerva NIC VFs" 2024-03-11 17:28:01 +00:00
Heitor Matsui 1aa5e59b99 Fix upgrade-script not expecting additional parameter
With commit [1], a new upgrade script was included, but since it
does not expect the new port parameter it broke the new USM
feature "software deploy start".

This commit fixes the issue.

[1] https://review.opendev.org/c/starlingx/config/+/909866

Test Plan
PASS: run software deploy start successfully

Story: 2010676
Task: 49699

Change-Id: I79101b53e6c335ed9fe5b412ca029d1c17df3cea
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
2024-03-11 13:48:22 -03:00
Zuul 6c3df45f05 Merge "Report port and device inventory after the worker manifest" 2024-03-11 16:19:09 +00:00
Zuul 6e54e4437d Merge "Add CONF option to set default auto_update value" 2024-03-11 13:32:33 +00:00
Fabiano Correa Mercer 8a18249fda Use correct hiera file for downgrade
During an upgrade abort scenario where both
controllers are already upgraded to release N+1,
a potential issue arises.
Release N+1 utilizes a new hieradata file named
hostname-X.yaml, while release N uses the older
ip.yaml.
Controller-0 must be downgraded first, making
controller-1 the active node.
However, controller-1 attempts to update the
hieradata file at
/opt/platform/puppet/<Release N>/.../controller-0.yaml,
but this file doesn't exist because release N uses ip.yaml.
Solution:
The system needs to identify this downgrade scenario
and update the correct hieradata file for release N:
/opt/platform/puppet/<Release N>/hieradata/<ip>.yaml
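
An illustrative sketch of picking the per-host hieradata file by
target release (the release cut-over value and helper are assumptions,
not the actual puppet code):

```python
# Hypothetical helper; the release value below is an assumption for the sketch.
import os

HOSTNAME_BASED_RELEASE = "24.09"   # assumed: first release using hostname-X.yaml

def hieradata_file(puppet_path, release, hostname, mgmt_ip):
    # Newer releases key the per-host hieradata by hostname; older releases
    # (the downgrade target) still key it by management IP.
    if release >= HOSTNAME_BASED_RELEASE:
        return os.path.join(puppet_path, release, "hieradata", f"{hostname}.yaml")
    return os.path.join(puppet_path, release, "hieradata", f"{mgmt_ip}.yaml")

# During a downgrade, updates destined for release N must go to the <ip>.yaml file:
assert hieradata_file("/opt/platform/puppet", "22.12", "controller-0",
                      "192.168.204.2").endswith("192.168.204.2.yaml")
```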

Tests Done:
AIO-DX IPv6 fresh install
AIO-DX IPv6 upgrade abort


Story: 2010722
Task: 49692

Change-Id: I848543e7606ddc5bb24ddadb07a7a74d56126044
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
2024-03-11 13:19:37 +00:00