Commit Graph

598 Commits

Author SHA1 Message Date
Zuul 37beadd020 Merge "Introduce multi-version auto downgrade for apps" 2024-04-18 17:53:43 +00:00
amantri cca5becb65 Implement new certificate APIs
Add an API /v1/certificate/get_all_certs to retrieve all the
platform certs (oidc, wra, adminep, etcd,
service account certs, system-restapi-gui-certificate,
open-ldap, openstack, system-registry-local-certificate,
k8s certs) in a JSON response and use this response to format
the "system certificate-list" output like the "show-certs.sh" output.

Add an API /v1/certificate/get_all_k8s_certs to retrieve all the
tls and opaque certs in a JSON response and use this response to
format the "system k8s-certificate-list" output like the
"show-certs.sh -k" output.

Implement "system certificate-show <cert name>",
"system k8s-certificate-show <cert name>" to show the full
details of the certificate.

Implement filters in the api and cli to show expired and
soon-to-expire certificates.
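
The filter behaviour can be sketched as a tiny URL builder (illustrative only; cert_list_url is not a sysinv function, but the paths and query parameters are the ones exercised in the test cases):

```python
# Hypothetical helper (not part of sysinv) that composes the query URLs
# for the new certificate endpoints and their expiry filters.

BASE = "/v1/certificate"

def cert_list_url(k8s=False, expired=False, soon_to_expiry=None):
    """Build the REST path for get_all_certs / get_all_k8s_certs with
    the optional ?expired=True and ?soon_to_expiry=<N> filters."""
    path = BASE + ("/get_all_k8s_certs" if k8s else "/get_all_certs")
    params = []
    if expired:
        params.append("expired=True")
    if soon_to_expiry is not None:
        params.append("soon_to_expiry=%d" % soon_to_expiry)
    return path + ("?" + "&".join(params) if params else "")
```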

Testcases:
PASS: Verify all the cert values (Residual Time, Issue Date, Expiry Date,
      Issuer, Subject, filename, Renewal) are showing fine for all the
      following cert paths when "system certificate-list" is executed
	  /etc/kubernetes/pki/apiserver-etcd-client.crt
	  /etc/kubernetes/pki/apiserver-kubelet-client.crt
	  /etc/pki/ca-trust/source/anchors/dc-adminep-root-ca.crt
	  /etc/ssl/private/admin-ep-cert.pem
	  /etc/etcd/etcd-client.crt
	  /etc/etcd/etcd-server.crt
	  /etc/kubernetes/pki/front-proxy-ca.crt
	  /etc/kubernetes/pki/front-proxy-client.crt
	  /var/lib/kubelet/pki/kubelet-client-current.pem
	  /etc/kubernetes/pki/ca.crt
	  /etc/ldap/certs/openldap-cert.crt
	  /etc/ssl/private/registry-cert.crt
	  /etc/ssl/private/server-cert.pem
PASS: Verify all the cert values (Residual Time, Issue Date, Expiry Date,
      Issuer, Subject, filename, Renewal) are showing fine for all the
       service accts when "system certificate-list" is executed
          /etc/kubernetes/scheduler.conf
          /etc/kubernetes/admin.conf
	  /etc/kubernetes/controller-manager.conf
PASS: Verify the system-local-ca secret is shown in the output of
      "system certificate-list"
PASS: List ns,secret name in the output of ssl,docker certs if the
      system-restapi-gui-certificate, system-registry-local-certificate
      exist on the system when "system certificate-list" executed
PASS: Apply oidc app verify that in "system certificate-list" output
      "oidc-auth-apps-certificate", oidc ca issuer and wad cert are
      shown with all proper values
PASS: Deploy WRA app verify that "mon-elastic-services-ca-crt",
      "mon-elastic-services-extca-crt" secrets are showing in the
      "system certificate-list" output and also kibana,
      elastic-services cert from mon-elastic-services-secrets secret
PASS: Verify all the cert values (Residual Time, Issue Date, Expiry Date,
      Issuer, Subject, filename, Renewal) are showing fine for all the
      Opaque,tls type secrets when "system k8s-certificate-list" is
      executed
PASS: Execute "system certificate-show <cert name>" for each
      cert in the "system certificate-list" output and
      check all details of it
PASS: Execute "system certificate-list --expired" shows the
      certificates which are expired
PASS: Execute "system certificate-list --soon_to_expiry <N>"
      shows the expiring certificates within the specified
      N days
PASS: Execute "system k8s-certificate-list --expired" shows the
      certificates which are expired
PASS: Execute "system k8s-certificate-list --soon_to_expiry <N>"
      shows the expiring certificates within the specified
      N days
PASS: On DC system verify that admin endpoint certificates are
      shown with all values when "system certificate-list" is
      executed
PASS: Verify the following apis
	/v1/certificate/get_all_certs
        /v1/certificate/get_all_k8s_certs
        /v1/certificate/get_all_certs?soon_to_expiry=<no of days>
        /v1/certificate/get_all_k8s_certs?soon_to_expiry=<no of days>
        /v1/certificate/get_all_certs?expired=True
        /v1/certificate/get_all_k8s_certs?expired=True

Story: 2010848
Task: 48730
Task: 48785
Task: 48786

Change-Id: Ia281fe1610348596ccc1e3fad7816fe577c836d1
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
2024-04-17 14:18:21 -04:00
Zuul a4ab746619 Merge "Update network interface puppet resource gen to support dual-stack" 2024-04-17 15:29:33 +00:00
Zuul 5f4e3a3378 Merge "Adding QAT devices support in sysinv" 2024-04-17 13:49:58 +00:00
Lucas Ratusznei Fonseca ff3a5d2341 Update network interface puppet resource gen to support dual-stack
This change updates the puppet resource generation logic for network
interfaces to support dual-stack.

Change summary
==============

- Aliases / labels
    Previously, each alias was associated with a specific network. Now,
    since more than one address can be associated with the same network,
    aliases are also associated with addresses. The label name is
    now :<network_id>-<address_id>. The network_id is 0 if there's no
    network associated with the alias, which is the case for the base
    interface config or when the address is not associated with a
    network. The address_id is 0 if there's no address associated with
    the alias, which is the case for the base config and for when
    there's no static address associated with the network, i.e. the
    method is DHCP.

- Static addresses
    Previously, interfaces with more than one static address not
    associated with pools would be assigned just the first one. Now,
    an alias config is generated for each address.

- CentOS compatibility
    All the code related to CentOS was removed.

- Duplex-direct mode
    Duplex-direct systems must have DAD disabled for management and
    cluster-host interfaces. The disable DAD command is now generated
    only in the base interface config for all types of interfaces.

- Address pool names
    The change assumes a new standard for address pool names, they will
    be formed by the old names with the suffixes '-ipv4' or '-ipv6'.
    For example: management-ipv4, management-ipv6. Since other systems
    that rely on the previous standard are not yet upgraded to
    dual-stack, the constant DUAL_STACK_COMPATIBILITY_MODE was
    introduced to control resource generation and validation logic in a
    way that assures compatibility. The constant and the conditionals
    will be removed once the other modules are updated. The
    conditionals were implemented more as a way to highlight which
    parts of the code are affected and make the changes easier in the
    future.

- Tests / DB Base
    The base class for tests was updated to generate more consistent
    database states. Mixins for dual-stack cases were also created.

- Tests / Interface
    Most of the test functions in the class InterfaceTestCase caused
    unnecessary updates to the database and the context. The class
    was split in two, the first one containing the tests that only
    need the basic database setup (controller, one interface
    associated with the mgmt network), and the other one for the tests
    that need different setups.
    A new fixture was created to test multiple system configs (IPv4,
    IPv6, dual-stack), which inspects in detail the generated
    hieradata. The tests associated with the InterfaceHostV6TestCase
    were moved to the new fixture, and new ones were introduced.
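
The alias label scheme described above can be sketched as follows (a hedged illustration; alias_label is not the actual puppet resource code):

```python
# Illustrative sketch of the alias label naming described above:
# ":<network_id>-<address_id>", where id 0 means "no network" (base
# config, or address without a network) or "no address" (base config,
# or DHCP method with no static address).

def alias_label(network_id=0, address_id=0):
    return ":%d-%d" % (network_id, address_id)

# Base interface config: no network and no address associated.
BASE_CONFIG_LABEL = alias_label()
```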

Test plan
=========

Online setup tests
------------------

System: STANDARD (2 Controllers, 2 Storages, 1 Worker)

Stack setups:
  - Single stack IPv4
  - Single stack IPv6
  - Dual stack, primary IPv4
  - Dual stack, primary IPv6

[PASS] TC1 - Online setup, regular ethernet
    mgmt0 (Ethernet) -> PXEBOOT, MGMT, CLUSTER_HOST

[PASS] TC2 - Online setup, VLAN over ethernet
    pxe0 (Ethernet) -> PXEBOOT
    mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST

[PASS] TC3 - Online setup, bonding
    mgmt0 (Bond) -> PXEBOOT, MGMT, CLUSTER_HOST

[PASS] TC4 - Online setup, VLAN over bonding
    pxe0 (Bond) -> PXEBOOT
    mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST

Installation tests
------------------

Systems:
  - AIO-SX
  - AIO-DX
  - Standard (2 Controllers, 2 Storages, 1 Worker)

[PASS] TC5 - Regular installation on VirtualBox, IPv4

[PASS] TC6 - Regular installation on VirtualBox, IPv6

Data interface tests
--------------------

System: AIO-DX

Setup:
    data0 -> Ethernet, ipv4_mode=static, ipv6_mode=static
    data1 -> VLAN on top of data0, ipv4_mode=static, ipv6_mode=static

For both interfaces, the following was performed:

[PASS] TC7 - Add static IPv4 address
[PASS] TC8 - Add static IPv6 address
[PASS] TC9 - Add IPv4 route
[PASS] TC10 - Add IPv6 route
[PASS] TC11 - Remove IPv4 route
[PASS] TC12 - Remove IPv6 route
[PASS] TC13 - Remove static IPv4 address
[PASS] TC14 - Remove static IPv6 address

Story: 2011027
Task: 49815
Change-Id: Ib9603cbd444b21aefbcd417780a12c079f3d0b0f
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2024-04-16 16:23:15 -03:00
Igor Soares 1d228bab28 Introduce multi-version auto downgrade for apps
Introduce automatic downgrade of StarlingX applications to the
multiple application version feature.

Auto downgrades are triggered by default in scenarios in which the
applied application bundle is no longer available under the
applications folder but an older version of the same app is. For
instance, when platform patches are removed and a previously available
ostree is deployed, thus restoring the old set of available apps under
the /usr/local/share/applications/helm/ directory.

A new section called 'downgrades' can be added to the metadata.yaml file
to disable the default behavior. For example:

downgrades:
  auto_downgrade: false

When auto downgrades are disabled the current applied version remains
unchanged.
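
The decision reduces to a one-line check over the parsed metadata (hypothetical function name; metadata shown as an already-parsed dict):

```python
# Auto downgrade is the default; it is suppressed only when the
# bundle's metadata.yaml carries downgrades.auto_downgrade: false.

def auto_downgrade_allowed(metadata):
    return bool(metadata.get("downgrades", {}).get("auto_downgrade", True))
```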

Test plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX fresh install.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that does not
      contain the downgrade section.
      Confirm that platform-integ-apps is downgraded.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that has the
      auto_downgrade metadata option set to 'true'.
      Confirm that platform-integ-apps is downgraded.
PASS: Apply platform-integ-apps.
      Update platform-integ-apps using a tarball that is not available
      under /usr/local/share/applications/helm/ and that has the
      auto_downgrade metadata option set to 'false'.
      Confirm that the originally applied platform-integ-apps version
      remains unchanged.
PASS: Run a kubernetes upgrade with apps to be pre and post updated.
      Confirm that apps are successfully updated and not downgraded
      after the Kubernetes upgrade has finished.

Story: 2010929
Task: 49847

Change-Id: I33f0e0a5b8db128aef76fb93ba322364881097cf
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-04-15 12:56:05 -03:00
Md Irshad Sheikh 463165eca8 Adding QAT devices support in sysinv
This commit adds code to auto-discover QAT devices with ids 4940 & 4942
and list them as part of the system host-device-list command.

Also, the host-device-modify command has been modified to not allow
any QAT device configuration due to upstream qat_service code
limitations. QAT devices are already initialized with the maximum VF
number and other default configurations during bootstrap, so
no further modification is required.

TEST CASES:

PASSED: The development iso should be successfully deployed.
        And QAT devices should get listed using
        host-device-list command.

PASSED: system host-device-modify command should raise an error
        when trying to edit any QAT configuration.

PASSED: system host-device-show command should show all default
        QAT device configurations.

Story: 2010604
Task: 49701

Change-Id: Id6b00b9e69b233d513e42375d5f8196ddd745e20
Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
2024-04-03 07:51:28 -04:00
Tara Subedi 933d3a3a73 Report port and device inventory after the worker manifest
This is incremental fix of bug:2053149.
Upon network boot (first boot) of a worker node, the agent manager is
supposed to report ports/devices without waiting for the worker
manifest, as that never runs on first boot. Without this, after a
system restore, the compute node cannot be unlocked due to the sriov
config update.

The kickstart records first boot as "/etc/platform/.first_boot", and
the agent manager deletes this file. If the agent manager crashes, it
starts again; this time it does not see the .first_boot file, does not
know this is still the first boot, and does not report inventory for
the worker node.

This commit fixes this issue by creating the volatile file
"/var/run/.first_boot" before deleting "/etc/platform/.first_boot";
the agent relies on both files to figure out whether this is the first
boot. This preserves the same logic across multiple crashes/restarts
of the agent manager.
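
The two-flag handshake can be sketched like this (an in-memory set stands in for the filesystem; helper names are illustrative):

```python
PERSISTENT = "/etc/platform/.first_boot"   # written by kickstart
VOLATILE = "/var/run/.first_boot"          # created by the agent

def consume_first_boot(files):
    """Create the volatile marker *before* deleting the persistent one,
    so a crashed and restarted agent still sees first boot."""
    if PERSISTENT in files:
        files.add(VOLATILE)
        files.discard(PERSISTENT)

def is_first_boot(files):
    return PERSISTENT in files or VOLATILE in files
```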

TEST PLAN:
PASS: AIO-DX bootstrap has no issues. lock/unlock has no issues.
PASS: Network-boot worker node, before doing unlock, restart agent
      manager (sysinv-agent), check sysinv.log to see ports are reported.

Closes-Bug: 2053149
Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
2024-03-26 10:37:56 -04:00
Saba Touheed Mujawar 4c42927040 Add retry robustness for Kubernetes upgrade control plane
A rare intermittent failure behaviour can occur during the upgrading
control plane step, where puppet hits its timeout before the upgrade
is completed or kubeadm hits its own Upgrade Manifest timeout (at 5m).

This change retries the process by reporting failure to the conductor
when the puppet manifest apply fails. Since it uses RPC to send
messages with options, we don't get the return code directly and
hence cannot use a retry decorator. So we use the sysinv report
callback feature to handle the success/failure path.
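
A minimal sketch of the callback-driven retry (names and retry count are illustrative; the real state lives in the conductor):

```python
class ControlPlaneUpgradeRetry:
    """Track manifest-apply failures reported back to the conductor and
    decide between retrying and marking the upgrade step failed."""
    MAX_RETRIES = 2

    def __init__(self):
        self.attempts = 0

    def on_report(self, success):
        # Invoked from the sysinv report callback, since the RPC cast
        # that applies the manifest returns no status code directly.
        if success:
            return "done"
        if self.attempts < self.MAX_RETRIES:
            self.attempts += 1
            return "retry"
        return "failed"
```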

TEST PLAN:
PASS: Perform simplex and duplex k8s upgrade successfully.
PASS: Install iso successfully.
PASS: Manually send STOP signal to pause the process so that
      puppet manifest timeout and check whether retry code works
      and in retry attempts the upgrade completes.
PASS: Manually decrease the puppet timeout to very low number
      and verify that code retries 2 times and updates failure
      state
PASS: Perform orchestrated k8s upgrade, Manually send STOP
      signal to pause the kubeadm process during step
      upgrading-first-master and perform system kube-upgrade-abort.
      Verify that upgrade-aborted successfully and also verify
      that code does not try the retry mechanism for
      k8s upgrade control-plane as it is not in desired
      KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER
      state
PASS: Perform manual k8s upgrade, for k8s upgrade control-plane
      failure perform manual upgrade-abort successfully.
      Perform Orchestrated k8s upgrade, for k8s upgrade control-plane
      failure after retries nfv aborts automatically.

Closes-Bug: 2056326

Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806
            https://review.opendev.org/c/starlingx/stx-puppet/+/911945
            https://review.opendev.org/c/starlingx/integ/+/913422

Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-19 08:49:36 -04:00
Zuul a396dff37c Merge "Prevent configuring the Dell Minerva NIC VFs" 2024-03-11 17:28:01 +00:00
Zuul 6c3df45f05 Merge "Report port and device inventory after the worker manifest" 2024-03-11 16:19:09 +00:00
Zuul b9ab073997 Merge "Upgrade changes to support MGMT FQDN" 2024-03-08 19:56:49 +00:00
Zuul c24d0950bc Merge "Fix delete process to apps that have charts disabled" 2024-03-07 13:43:22 +00:00
Zuul 40168bb769 Merge "Add mgmt_ipsec flag handling" 2024-03-06 18:55:18 +00:00
Leonardo Mendes 31ee720a54 Add mgmt_ipsec flag handling
This commit adds mgmt_ipsec flag handling to the IPSec Auth Server
for successful and failed negotiations, following the requirements
below:

- If the negotiation succeeds, the flag needs to be set to "enabled",
  which can then be checked during certificate renewal operation.
- If the negotiation fails, the flag needs to be removed from host
  in negotiation, so that the host can retry the negotiation.
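
The flag transitions can be sketched as follows (illustrative helpers; the real flag is stored in the i_host table's capabilities column):

```python
def start_negotiation(capabilities):
    capabilities["mgmt_ipsec"] = "enabling"

def finish_negotiation(capabilities, success):
    if success:
        # Checked later during certificate renewal operations.
        capabilities["mgmt_ipsec"] = "enabled"
    else:
        # Drop the flag so the host can retry the negotiation.
        capabilities.pop("mgmt_ipsec", None)
```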

Test Plan:
PASS: Full build, system install, bootstrap and unlock DX system w/
      unlocked enabled available status.
PASS: Execute "sudo ipsec-client pxecontroller" command. Open another
      terminal and execute the command "echo "Li69nux*" | sudo -S -u
      postgres psql -d sysinv -c "select capabilities from i_host;""
      to see mgmt_ipsec flag being updated first to "enabling" and
      then updated to "enabled" at the end of operation.
PASS: To simulate a flag removal, first execute the command
      "kubectl delete clusterissuer system-local-ca" and then repeat
      the first test. Observe that the flag will be updated to
      "enabling" and will be removed when an error occurs during
      the process.

Story: 2010940
Task: 49659

Change-Id: I0746cc890b4bf6d3c9722d096b62247652e164d4
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-03-06 11:45:13 -03:00
David Bastos c9b71ebd65 Fix delete process to apps that have charts disabled
When deleting an application that has one chart or more disabled,
the app framework was not able to correctly delete the disabled
charts from the helm repository.

If, after deleting an app, an attempt was made to upload that same
app, a failure would occur, informing that the charts were already
in the helm repository.

The correction consists of using the kustomization-orig.yaml file
instead of kustomization.yaml in the deletion process to list the
enabled and disabled charts.

Another fix was made for the case where an application has the status
"upload failed" and an attempt is made to delete another app. This
caused a Python runtime error because the get_chart_tarball_path
function tried to access a dictionary key that wasn't there.

The solution was to check if the key for that chart exists and only
then try to access it. New logs are added to alert the user if the
chart does not exist.
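
The guard can be sketched as follows (hypothetical signature; the real get_chart_tarball_path operates on the framework's own structures):

```python
import logging

LOG = logging.getLogger(__name__)

def get_chart_tarball_path(chart_paths, chart_name):
    """Return the tarball path for chart_name, or None (with a log
    message) instead of raising KeyError when the chart is absent."""
    if chart_name not in chart_paths:
        LOG.info("Chart %s not found in the helm repository", chart_name)
        return None
    return chart_paths[chart_name]
```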

Test Plan:
PASS: Build-pkgs
PASS: Upload, apply, remove and delete dell-storage
PASS: Upload, apply, remove and delete oidc-auth-apps
PASS: upload, apply, remove and delete metrics-server
PASS: Deletes app that has charts disabled and all charts are
      deleted from the helm repository correctly.
PASS: After deleting and trying to upload the same app, no error
      occurs and the upload and apply process is completed
      successfully.
PASS: Deleting an app with another app with "upload failed"
      status and no Python runtime error occurs

Closes-Bug: 2055697

Change-Id: I22de414e8780fe3691d06bdd015e4c927dcc10f0
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
2024-03-05 17:20:31 -03:00
Fabiano Correa Mercer d449622f4a Upgrade changes to support MGMT FQDN
The release stx.9 with FQDN support for the MGMT network
uses hieradata with the new pattern:
<hostname>.yaml
But the release stx.8 is still using the old name:
<mgmt_ip>.yaml
During an upgrade, controller-0 wants to update
the <mgmt_ip>.yaml while controller-1 wants to use
the <hostname>.yaml, so it is necessary to change
the code to use/update the right hieradata.
Additionally, during an upgrade the active
controller running the old release can't resolve
the FQDN (i.e. controller.internal); for this
reason the FQDN cannot be used during the
controller-1 upgrade.
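
The file-name selection can be sketched as follows (illustrative; release detection is reduced to a boolean):

```python
def hieradata_file(hostname, mgmt_ip, on_new_release):
    """stx.9 (FQDN-aware) hosts use <hostname>.yaml; hosts still on
    stx.8 keep the old <mgmt_ip>.yaml name during the upgrade."""
    return ("%s.yaml" % hostname) if on_new_release else ("%s.yaml" % mgmt_ip)
```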

Test Plan:
IPv6 AIO-SX fresh install
IPv6 AIO-DX fresh install
IPv4 AIO-SX upgrade from previous release
    without story 2010722 to new release
    that has the story 2010722 (not master)
IPv4 AIO-DX upgrade from previous release
    without story 2010722 to new release
    that has the story 2010722 (not master)
IPv4 STANDARD upgrade from previous release
    without story 2010722 to new release
    that has the story 2010722 (not master)
IPv6 AIO-DX upgrade from previous release
    without story 2010722 to new release
    that has the story 2010722 (not master)
IPv6 DC lab upgrade from previous release
    without story 2010722 to new release
    that has the story 2010722 (not master)

Story: 2010722
Task: 48609

Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
Change-Id: I555185bea7fadb772a4023b6ecb4379e01e0f16c
2024-03-05 12:42:21 -03:00
Tara Subedi 9c3bf050cd Report port and device inventory after the worker manifest
The SR-IOV configuration of a device is not retained across reboots,
until puppet manifests bind/enable completes. The sysinv-agent should
not report device inventory at any time after it is started, it should
wait until puppet worker manifest completes. Though during bootstrap
(fresh install), restore, network-boot and subsequent reboots in case
of non-worker roles (controller, storage) sysinv-agent can report at
any time it is started.

Upon reboot, SR-IOV configuration (of ACC100) (sriov_numvfs=0) is
updated to intended configuration by puppet worker manifest. In this
case, there is a small chance that the sysinv-agent audit (every 60
seconds) will run before the driver configuration. Since the agent will
only actually report the port and device inventory once, the SR-IOV
configuration data is not accurately reflected in the db, thus
requiring additional lock/unlock(s) to force correction.

After fresh install/restore/network boot and reboot, the
/etc/platform/.initial_worker_config_complete and
/var/run/.worker_config_complete files do not exist until the puppet
worker manifest completes. The sysinv-agent audit happened to read
device inventory before the driver configuration (i.e. before the
worker manifest completed), so it was not accurately reflected in the
db.

This commit fixes this so that port and device configuration are only
reported after the worker manifest has completed, in case the host is
being configured with a worker subfunction.
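
The reporting gate can be sketched as a pure function (illustrative names; the flag-file checks are reduced to booleans):

```python
def can_report_inventory(subfunctions, first_boot, worker_manifest_done):
    """First boot always reports (the worker manifest never runs there);
    non-worker hosts report whenever the agent starts; worker hosts
    otherwise wait for the worker manifest to complete."""
    if first_boot or "worker" not in subfunctions:
        return True
    return worker_manifest_done
```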

TEST PLAN:
   PASS: Fresh install node (that has ACC100 device) AIO, check
         host-device-list/show (before config/unlock) to see
         ACC100 device config:: driver:None, vf-driver:None, N:0.

   PASS: After above, update config (ACC100 device config::
         driver:igb_uio, vf-driver:igb_uio, N:1) and also use
         host-label-assign as sriovdp=enabled and unlock, for
         subsequent reboots validate device config as
         (driver:igb_uio, vf-driver:igb_uio, N:1) and validate
         content of /etc/pcidp/config.json.

   PASS: Restore node from backup (ACC100 device config::
         driver:igb_uio, vf-driver:igb_uio, N:1 and also
          host-label-assign as sriovdp=enabled), once node
         come back up, check host-device-list/show for after-boot
         update time and num_vfs = 1. Also validate content of
         /etc/pcidp/config.json.

    PASS: In AIO-DX setup, ports and devices can be listed and
         the second worker node can be unlocked after the
         network-boot.

Closes-Bug: 2053149
Change-Id: I69d483041bd75ea0abbd68cedccfbc5f10062c75
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
2024-03-01 09:21:21 -05:00
Caio Bruchert 46fc50f419 Prevent configuring the Dell Minerva NIC VFs
Since the Dell Operator will be responsible for VF configuration, the
configuration of this NIC using sysinv is being blocked to prevent user
mistakes. This is valid for the Dell Minerva NICs using Marvell CNF105xx
family devices.

The CNF105xx device IDs were found in the octeon_ep driver source code:
https://github.com/MarvellEmbeddedProcessors/pcie_ep_octeon_host
/drivers/octeon_ep/octep_main.h
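
The check can be sketched as follows (the blocked device-ID set below is a placeholder; the real IDs come from the octeon_ep header cited above):

```python
# Hypothetical sketch: reject pci-sriov for the Marvell CNF105xx family
# while leaving pci-passthrough and other devices untouched.

BLOCKED_SRIOV_DEVICE_IDS = {"cnf105xx-id-placeholder"}

def validate_ifclass_change(ifclass, device_id):
    if ifclass == "pci-sriov" and device_id in BLOCKED_SRIOV_DEVICE_IDS:
        raise ValueError(
            "VF configuration for this NIC is managed by the Dell Operator")
    return True
```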

Test Plan:
PASS: host-if-modify class from none to pci-passthrough: allowed
PASS: host-if-modify class from pci-passthrough to none: allowed
PASS: host-if-modify class from none to pci-sriov w/ VFs: blocked
PASS: host-if-modify class from none to pci-sriov w/ VFs for other
      devices: allowed

Story: 2010047
Task: 49650

Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
Change-Id: Ib6a20952060331ac230b01813be28116fbceef36
2024-02-29 13:46:35 -03:00
Zuul 16589b9828 Merge "Remove support for ignoring k8s isolated CPUs in sysinv" 2024-02-27 20:47:11 +00:00
Zuul e6610a898a Merge "Kubernetes periodic audit for cluster health" 2024-02-27 06:33:01 +00:00
Zuul 5815c70a88 Merge "Use mgmt_ipsec in sysinv for ipsec request check" 2024-02-23 17:02:30 +00:00
rakshith mr 1dc7a93f82 Kubernetes periodic audit for cluster health
A periodic audit for the K8S cluster checks the health of the
endpoints: APISERVER, SCHEDULER, CONTROLLER and KUBELET.

The audit will set/clear K8S cluster health alarm every
3 minutes.

Test Plan:
PASS: Trigger K8S cluster down alarm by manually modifying
      /etc/kubernetes/manifests/kube-apiserver.yaml configuration
      file to break the K8s cluster.
      Verify that alarm is raised within 3 minutes.
PASS: Restore the manually modified configuration. This will
      restore K8S service.
      Expect to see the alarm cleared within 3 minutes.
PASS: Fresh install on AIO-SX, and checked alarm audit log.
PASS: With K8S cluster down (for several minutes), initiate platform
      upgrade.
      Verify that k8s health check blocks the upgrade due to 850.002
      alarm.

Story: 2011037
Task: 49534

Depends-On: https://review.opendev.org/c/starlingx/fault/+/907054
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/907345

Change-Id: I958dbb46f151df602030bd2d7576b3b3705b8ca2
Signed-off-by: rakshith mr <rakshith.mr@windriver.com>
2024-02-21 01:56:21 -05:00
Andy Ning a79dae2db6 Use mgmt_ipsec in sysinv for ipsec request check
Currently the ipsec server uses inv_state in sysinv to validate the
auth request from the client. This was found to be problematic. This
change updates the ipsec server to use the "mgmt_ipsec:enabling|enabled"
flag in the capabilities of the i_host table for the validation check.

Also, a minor refactoring moves the flag setting into a function in
the utils module, since the flag setting will eventually be used in
multiple places.

Test Plan:
PASS: DX deployment in VBox. Verify controller-0 and controller-1
      are installed, bootstrap and unlocked successfully with IPSec
      configured and enabled.

Story: 2010940
Task: 49558

Change-Id: I397ea29d73ad8a3b8b8ce5500a4501c7bc2fbfbc
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-02-20 15:37:23 -05:00
Zuul 767b30be38 Merge "Add coredump default service parameters" 2024-02-14 21:33:32 +00:00
Heron Vieira 50a658cedd Add coredump default service parameters
Adding coredump process_size_max, external_size_max and
keep_free default service parameters so coredump service is configured
with default values from the start, keeping it explicit for the user
what is configured on a fresh install.

Test plan
PASS: AIO-SX install, bootstrap and initial unlock
PASS: Verify if coredump service parameters are added after initial
      unlock.
PASS: Verify if coredump config file is changed after initial unlock

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/897856

Closes-bug: 2039064

Change-Id: I13b1c1e0d6c34b34cf6ed3f1cb86c8511ac24b44
Signed-off-by: Heron Vieira <heron.vieira@windriver.com>
2024-02-14 18:41:57 +00:00
Kaustubh Dhokte e276cac428 Remove support for ignoring k8s isolated CPUs in sysinv
As we no longer have any users for this feature, we remove
support for ignoring isolated CPUs. This change removes code
that supports this feature in sysinv.

Test Plan:
AIO-SX:
PASS: Manually create /etc/kubernetes/ignore_isolcpus,
      assign host label 'kube-ignore-isol-cpus=enabled',
      yet a test pod is allocated to the application-isolated
      CPUs.

Story: 2010878
Task: 49571

Change-Id: I21d3319bd967a7a0524e922295fbcc75770a02e6
Signed-off-by: Kaustubh Dhokte <kaustubh.dhokte@windriver.com>
2024-02-14 17:47:15 +00:00
Zuul 15dc296f4a Merge "Optimizing image downloads" 2024-01-16 17:11:37 +00:00
Zuul 9e0c55868d Merge "Introduce support for multiple application bundles" 2024-01-16 14:10:31 +00:00
Thiago Miranda caf9de1603 Optimizing image downloads
In this commit, we obtain a list of images already present in
containerd to avoid unnecessary checks and pulls, reducing CPU
consumption.
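
The skip logic amounts to a set difference (illustrative helper; the real code queries containerd for its image list):

```python
def images_to_pull(required, present):
    """Return only the required images not already in containerd,
    preserving order, so only those are checked and pulled."""
    present_set = set(present)
    return [img for img in required if img not in present_set]
```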

TEST PLAN:
PASS: Lock/Unlock controllers
PASS: Successfully swact between controllers
PASS: Successfully recover after power down and up both controllers
PASS: Successfully bootstrap (Simplex and Duplex)
PASS: Successfully recover after active controller goes down
PASS: Successfully application lifecycle

Story: 2010985
Task: 49228

Change-Id: I58dd11c8d590b60ab100f79a03e17c5921e3721b
Signed-off-by: Thiago Miranda <tmarques@windriver.com>
Co-authored-by: Eduardo Juliano Alberti <eduardo.alberti@windriver.com>
2024-01-16 12:43:29 +00:00
Igor Soares ea00765271 Introduce support for multiple application bundles
Parse the metadata of all application bundles under the helm application
folder and save it to the kube_app_bundle table. This is done during
sysinv startup and when a new ostree commit is deployed.

The auto update logic was changed to enable retrieving metadata from
the database for all available bundles of a given app and compute which
bundle should be used to carry out the upgrade.

The bundle choice is done based on the minimum and maximum Kubernetes
versions supported by the application. If multiple bundles fit that
criteria then the application with the highest version number is chosen.
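
The selection rule can be sketched as follows (hypothetical data shape; versions shown as comparable tuples):

```python
def choose_bundle(bundles, k8s_version):
    """bundles: iterable of (app_version, k8s_min, k8s_max) tuples.
    Keep bundles whose supported range admits the running Kubernetes
    version, then pick the highest application version."""
    eligible = [b for b in bundles if b[1] <= k8s_version <= b[2]]
    return max(eligible, key=lambda b: b[0]) if eligible else None
```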

The 65-k8s-app-upgrade.sh script also takes into account multiple
bundles during the platform upgrade activation step, prioritizing
lowest versions available to ensure compatibility with the Kubernetes
version carried over from the N release. A follow-up change will improve
this mechanism to discover specific app versions.

When platform patches are applied and the ostree is changed then the
content of the helm application folder is reevaluated and the database
updated accordingly if there are new or removed bundles.

Test plan:
PASS: build-pkgs -a && build-image
PASS: Fresh AIO-SX install.
PASS: Fresh AIO-DX install.
PASS: Manually place multiple tarballs of one application with
      different versions under /usr/local/share/applications/helm/
      and check if the app is updated correctly.
PASS: Build a reboot required patch that removes the istio
      bundle and adds a new metrics-server version.
      Apply the reboot required patch.
      Check if istio was removed from the kube_app_bundle table.
      Check if the metrics-server previous version was removed from the
      kube_app_bundle table.
      Check if the metrics-server new version was added to the
      kube_app_bundle table.
      Check if metrics-server was updated to the new version added
      to the database.
PASS: Build a no reboot required patch that does not restart
      sysinv, removes the istio bundle and adds a new metrics-server
      version.
      Apply the no reboot required patch.
      Check if istio was removed from the kube_app_bundle table.
      Check if the metrics-server previous version was removed from the
      kube_app_bundle table.
      Check if the metrics-server new version was added to the
      kube_app_bundle table.
      Check if metrics-server was updated to the new version added
      to the database.
PASS: Build a no reboot required patch that restarts sysinv,
      removes the istio bundle and adds a new metrics-server version.
      Apply the no reboot required patch.
      Check if istio was removed from the kube_app_bundle table.
      Check if the metrics-server previous version was removed from the
      kube_app_bundle table.
      Check if the metrics-server new version was added to the
      kube_app_bundle table and was updated.
      Check if metrics-server was updated to the new version added
      to the database.
PASS: Install power-metrics on stx-8.
      Run platform upgrade from stx-8 placing two different versions of
      metrics-server under /usr/local/share/applications/helm/.
      Check if default apps and metrics-server were properly updated
      during upgrade-activate step.
      Check if power-metrics was auto updated after upgrade-complete
      step.

Story: 2010929
Task: 49097

Change-Id: I46f7cb6ebc59ad49157e9044a4937a406313671e
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-01-15 17:49:29 -03:00
Zuul 68859e37db Merge "Create kube_app_bundle table" 2024-01-15 15:19:47 +00:00
Zuul d65514be34 Merge "Steps for kube-upgrade-storage" 2024-01-09 22:25:06 +00:00
Igor Soares ab469de093 Create kube_app_bundle table
This commit creates a new table called kube_app_bundle. This table will
be used to store metadata extracted from StarlingX application bundles.

Database API methods were created to allow bulk inserts into the
table, checking whether it is empty, retrieving entries by
application name, and pruning all data.

A follow-up commit will enable the Application Framework to populate and
retrieve data from the table.
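A minimal sketch of those four access patterns, using an in-memory SQLite table as a stand-in for the real sysinv database layer; the column set and function names are assumptions, not the actual schema.

```python
# Hedged sketch of the kube_app_bundle API methods: bulk insert,
# emptiness check, retrieval by application name, and pruning.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE kube_app_bundle (
                    name TEXT, version TEXT, file_path TEXT)""")

def bundle_bulk_insert(rows):
    conn.executemany("INSERT INTO kube_app_bundle VALUES (?, ?, ?)", rows)

def bundle_is_empty():
    return conn.execute(
        "SELECT COUNT(*) FROM kube_app_bundle").fetchone()[0] == 0

def bundle_get_by_name(name):
    return conn.execute(
        "SELECT name, version, file_path FROM kube_app_bundle"
        " WHERE name = ?", (name,)).fetchall()

def bundle_prune_all():
    conn.execute("DELETE FROM kube_app_bundle")
```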

Test plan:
PASS: build-pkgs -a && build-image
PASS: AIO-SX fresh install
      Check if the kube_app_bundle table was created as expected
PASS: AIO-DX fresh install
      Check if the kube_app_bundle table was created as expected
PASS: upgrade from stx-8
      Check if the kube_app_bundle table was created as expected

Story: 2010929
Task: 49097

Change-Id: Ifd10f9e5e4a2d26c42d2b83084e073c7834cd75a
Signed-off-by: Igor Soares <igor.piressoares@windriver.com>
2024-01-08 16:53:40 -03:00
Zuul 4189d9a116 Merge "Avoid self-signed cert creation for HTTPS" 2023-12-18 14:34:54 +00:00
Zuul 2032f761ce Merge "Handling Luks filesystem" 2023-12-12 21:32:37 +00:00
Rahul Roshan Kachchap dea0a20af2 Handling Luks filesystem
Update the is_system_usable_block_device() method to make sure
that LUKS filesystems are ignored by sysinv-agent when detecting
partition-able block devices.
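A hedged sketch of that filter, assuming the agent looks at the filesystem type reported for each block device (as shown by, e.g., `lsblk -o NAME,FSTYPE`); the function name and input shape are illustrative, not the sysinv-agent code.

```python
# LUKS containers report the filesystem type "crypto_LUKS"; such
# devices must not be offered as partition-able disks.
def filter_usable_devices(blockdevices):
    """Keep devices whose fstype is anything other than crypto_LUKS."""
    return [dev["name"] for dev in blockdevices
            if dev.get("fstype") != "crypto_LUKS"]
```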

Depends on: https://review.opendev.org/c/starlingx/config-files/+/903438

Test Plan:
PASS: build-pkgs -c -p sysinv-agent
PASS: build-image
PASS: AIO-DX bootstrap
PASS: No LOG.errors in the sysinv logs stemming
      from the sysinv-agent
PASS: No LUKS filesystem device reported in
      system host-disk-list

Story: 2010872
Task: 49234

Change-Id: I9c8afbb203fbc914021ed25593ab9124df00d599
Signed-off-by: Rahul Roshan Kachchap <rahulroshan.kachchap@windriver.com>
2023-12-12 20:22:24 +00:00
Fabiano Correa Mercer 661ab6480a Updates after the mgmt network reconfiguration
Updates the no_proxy list in the
service-parameter-list during the management
network reconfiguration.

In the first reboot after the management network
reconfiguration, the system will use the new
management IPs and some files, like /etc/hosts,
will be updated.
It is necessary to update the following paths
with the new values:

/opt/platform/sysinv
/opt/platform/config

Additionally, during the first reboot the
system is still using the old mgmt IPs until
apply_network_config.sh and the puppet code update
the system.
The sw-patch services start before or at the same
time as these operations and can use the old
MGMT IPs and fail to answer audit requests.
For this reason it is necessary to restart
these services.

Tests:
IPv6 mgmt network reconfig in subcloud AIO-SX
IPv4 mgmt network reconfig in standalone AIO-SX
AIO-DX Fresh install
AIO-SX Fresh install
AIO-SX IPv4 apply patch after mgmt reconfig

Story: 2010722
Task: 49203

Change-Id: I8a17f50c229a53965e13c889f0ea6ff8efd687c3
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
2023-12-07 10:58:18 -03:00
Marcelo Loebens f23b3f1a89 Avoid self-signed cert creation for HTTPS
REST API & Web Server TLS certificate (system-restapi-gui-certificate)
is now installed at bootstrap in the filesystem to be used for HTTPS.

This guarantees that the server-cert.pem is already present upon the
first unlock of the system, removing the need to create a self-signed
cert.

The self-signed cert will only be created if the
system-restapi-gui-certificate does not exist (test scenarios), to
avoid hard failures when switching to HTTPS.

Test plan:
PASS: Deploy an AIO-SX. Verify:
      - system-restapi-gui-certificate TLS cert is correctly installed
        in /etc/ssl/private/server-cert.pem before unlocking the
        controller.
      - HTTPS is enabled and openstack public endpoints change into it
        after unlocking the controller.
      - The target certificates are issued by 'system-local-ca', and
        are managed by cert-manager.
      - The certificates in /etc/ssl/private are correct.
      - It's possible to log into the local Docker Registry.
      - Horizon is working as expected.

PASS: Deploy an AIO-DX. After unlocking controller-1, SSH to it and
      verify that the Rest API / GUI certificate created during
      bootstrap is installed as the file
      '/etc/ssl/private/server-cert.pem'.

Story: 2009811
Task: 48976

Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/902088

Change-Id: If9aa644898b179fbae2b5248c84c764199bb9b7c
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2023-12-04 16:29:02 -04:00
Jagatguru Prasad Mishra 0fb91eb62a Block host-unlock till apparmor manifest completes
If the following commands are issued in quick succession,
1. system host-update controller-0 apparmor=enabled
2. system host-unlock controller-0

The puppet runtime manifest, which is executed asynchronously,
will not have enough time to run and the apparmor module won't
get loaded after unlock.

This feature adds reporting of the apparmor runtime
manifest status. The 'in progress' status is persisted
in the i_host table and used to validate host-unlock.
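A hedged sketch of that unlock validation: the apparmor status persisted in the i_host table rejects host-unlock while still 'in progress'. The field name is an assumption; the rejection message follows the test plan below.

```python
# Hypothetical gate: block host-unlock until the asynchronous
# apparmor runtime manifest reports completion.
APPARMOR_IN_PROGRESS = "in progress"

def check_unlock_allowed(host):
    if host.get("apparmor_config_status") == APPARMOR_IN_PROGRESS:
        raise Exception("Can not unlock %s apparmor configuration "
                        "in progress." % host["hostname"])
    return True
```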

Closes-Bug: 2042926

Test plan:
PASS: AIO-DX: Issue host-unlock command soon after
      'system host-update <host> apparmor=enabled' command.
      Verify that host-unlock fails with message 'Can not unlock
      <hostname> apparmor configuration in progress.'
PASS: AIO-DX: Enable/disable the apparmor module on a host using
      host-update command and verify if it is enabled/disabled
      respectively after reboot
PASS: AIO-SX: Enable/disable the apparmor module on a host using
      host-update command and verify if it is enabled/disabled
      respectively after reboot

Change-Id: I8f13ad4316e4edd4a6c73648ee4b06eb379ebe76
Signed-off-by: Jagatguru Prasad Mishra <jagatguruprasad.mishra@windriver.com>
2023-11-16 02:49:42 -05:00
Zuul 171b6b99ff Merge "Use FQDN for MGMT network" 2023-11-02 19:37:55 +00:00
Zuul 4657131748 Merge "Fix the condition to delete a stuck partition in the database" 2023-11-01 21:11:30 +00:00
Zuul 23578c3f71 Merge "Additional mechanism for unsafe force" 2023-11-01 18:30:19 +00:00
Zuul b4872623a4 Merge "Introduce Kubernetes upgrade metadata for stx apps" 2023-11-01 16:52:42 +00:00
Zuul 1d6ef90409 Merge "Create runtime_config table" 2023-11-01 16:39:03 +00:00
Gabriel de Araújo Cabral 2a39372b51 Fix the condition to delete a stuck partition in the database
The changes in review [1] introduced a condition to delete a
partition from the database when it doesn't exist in the agent
report and, at the same moment, no puppet from the
'platform::partitions::runtime' class is running.

A partition with the status "Creating on unlock" satisfies both
conditions, because the agent won't report it and the puppet that
creates partitions won't be running. This commit changes the
behavior to not delete a partition with this status, because it
will still be created during unlock.

Additionally, a failure was also identified in the check condition
when puppet is running, which was causing the partition to be
deleted incorrectly. To fix this, an in-file flag was implemented
to identify puppet manifest execution.
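The corrected deletion condition can be sketched as below; the names, status string, and the `manifest_applying` parameter (standing in for the in-file flag) are assumptions for illustration.

```python
# Hedged sketch: a DB partition missing from the agent report is only
# pruned when no partition manifest is applying and it is not still
# pending creation on unlock.
STATUS_CREATING_ON_UNLOCK = "Creating on unlock"

def should_delete_partition(partition, agent_reported_uuids,
                            manifest_applying):
    """manifest_applying mirrors the in-file flag set while the
    'platform::partitions::runtime' manifest executes."""
    if partition["uuid"] in agent_reported_uuids:
        return False   # agent still reports it; nothing stuck
    if manifest_applying:
        return False   # puppet may be creating it right now
    if partition["status"] == STATUS_CREATING_ON_UNLOCK:
        return False   # will only exist after the unlock
    return True        # stale row: safe to prune
```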

[1] https://review.opendev.org/c/starlingx/config/+/889090

Test-Plan:
  PASS: AIO-SX fresh install
  PASS: AIO-DX fresh install
  PASS: create/modify/delete a partition in the
        controller-0|1 followed by a reboot and check the status
        with 'system host-disk-partition-list'.
  PASS: Restart of sysinv-conductor and/or sysinv-agent services
        during puppet manifest applying.
  PASS: AIO-SX upgrade stx 7.0 to stx 8.0
  PASS: AIO-SX Backup and Restore

Closes-Bug: 2028254

Change-Id: I2024ab841ca3edbcc140de9b4ea0fbea12044791
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
Signed-off-by: Erickson Silva <Erickson.SilvadeOliveira@windriver.com>
2023-11-01 12:31:55 -03:00
Fabiano Correa Mercer a06a299c84 Use FQDN for MGMT network
The management network is used extensively for all internal
communication.
Since the network was originally private before it was exposed
for external communication in a distributed cloud configuration,
it was never designed to be reconfigured.
To support MGMT network reconfiguration, the idea is to configure
the applications to use the hostname/FQDN instead of a static
MGMT IP address.
In this way the MGMT network can be changed and the services and
applications will still work, since they use the hostname/FQDN
and the DNS is responsible for translating it to the current
MGMT IP address.
The use of FQDN will be applied for all installation modes: AIO-SX,
AIO-DX, Standard, AIO-PLUS and DC subclouds. But given the
complexities of supporting the multi-host reconfiguration,
the MGMT network reconfiguration will focus on support for AIO-SX
only.
The DNSMASQ service must start as soon as possible to translate
the FQDN to IP address.
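A minimal illustration of the design choice above, assuming a helper that builds service endpoints (the hostname and port are illustrative): with endpoints keyed on the management FQDN, a reconfiguration only has to update the DNS records served by dnsmasq, not every consumer.

```python
# Hypothetical sketch: endpoints reference the management FQDN, never
# a literal MGMT IP, so changing the MGMT subnet only changes DNS data.
def build_endpoint(scheme, mgmt_fqdn, port, path=""):
    # DNS (dnsmasq) resolves mgmt_fqdn to whatever the current MGMT
    # IP is, for IPv4 or IPv6 alike.
    return "%s://%s:%d%s" % (scheme, mgmt_fqdn, port, path)
```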
Test plan (Debian only)
 - AIO-SX and AIO-DX virtualbox installation IPv4/IPv6
 - Standard virtualbox installation IPv6
 - DC virtualbox installation IPv4 ( AIO-SX/DX subclouds )
 - AIO-SX and AIO-DX installation IPv4/IPv6
 - AIO-DX plus installation IPv6
 - DC IPv6 and subcloud AIO-SX
 - AIO-DX host-swact
 - DC IPv4 virtualbox with subcloud AIO-DX and AIO-DX
 - AIO-SX to AIO-DX migration
 - netstat -tupl ( no services are using the MGMT IP address )
 - Ran sanity/regression tests
 - Backup and Restore for AIO-SX/AIO-DX

Story: 2010722
Task: 48241

Change-Id: If340354755ec401dac1b0da2c93e278e390f81a9
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
2023-10-31 20:45:40 -04:00
Matheus Guilhermino b73ab54bdd Additional mechanism for unsafe force
In some scenarios, a force operation should not override a
protective semantic check, even when --force is used.
To provide a way to bypass those semantic checks completely,
a new "--unsafe" option is introduced.

Whenever an unsafe scenario is identified, with or without using
--force, the following message is displayed in addition to the
specific warning:

"Use --force --unsafe if you wish to lock anyway."

This change includes a bypass for the following scenario (only
one identified so far):

3 hosts in the quorum:
controller-0 unlocked and enabled
controller-1 unlocked and enabled
storage-0 unlocked and enabled
Expected behavior:
Storage-0 is locked
Attempt to lock controller-1 (which is rejected)
Attempt to --force lock controller-1 (which should be rejected)
Attempt to --force --unsafe lock controller-1 (which is allowed)
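The two-flag gate described above can be sketched as follows; the function name, the quorum floor, and the monitor-count parameter are assumptions inferred from the test plan, while the rejection message follows the commit text.

```python
# Hedged sketch: --force alone no longer bypasses the protective
# semantic check; only --force together with --unsafe does.
MIN_STORAGE_MONITORS = 2  # quorum floor assumed from the test plan

def check_lock_allowed(force, unsafe, available_monitors):
    """Reject the lock when it would leave fewer than 2 storage
    monitors, unless both --force and --unsafe were given."""
    if available_monitors - 1 < MIN_STORAGE_MONITORS:
        if force and unsafe:
            return True  # operator explicitly accepted the risk
        raise Exception("Use --force --unsafe if you wish to "
                        "lock anyway.")
    return True
```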

Test Plan:
PASS: Fresh Install and Bootstrap (AIO-SX and Storage)
PASS: Can't lock a controller when only 2 storage
      monitors are available
PASS: Can't force lock a controller when only 2 storage
      monitors are available
PASS: Successfully unsafe lock a controller when only 2 storage
      monitors are available

Closes-bug: 2027685

Change-Id: I1d9a57c472d888b9ffc9bbe3acd87fd77f84fa52
Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>
2023-10-27 17:12:04 -03:00
Igor Soares 3511174f95 Introduce Kubernetes upgrade metadata for stx apps
This commit handles Kubernetes upgrade related metadata for StarlingX
applications. The metadata retrieved is parsed, validated and
stored into the appropriate variables for future use.

The new metadata section introduced has the following form:
k8s_upgrades:
  auto_update: true/false
  timing: pre/post

This new block aims to inform the Application Framework whether apps
should be automatically updated (auto_update: true/false) if a
Kubernetes upgrade is taking place. It also informs when applications
should be updated, either during kube-upgrade-start (timing: pre) or
during kube-upgrade-complete (timing: post).

In addition, improvements were made to the already existing metadata
section:
supported_k8s_version:
  minimum: <version>
  maximum: <version>

A bug was found on the existing method that checks the supported
Kubernetes version. An exception was being raised when comparing
different formats such as 'v1.0.0' and '1.0.0'. This bug was fixed by
standardizing the formats on the comparison code.
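A sketch of that standardization, assuming the comparison works on numeric tuples after stripping the optional 'v' prefix; the helper names are illustrative, not the actual sysinv code.

```python
# Hedged sketch: normalize 'v1.24.4' and '1.24.4' to the same tuple
# before comparing, so mixed formats no longer raise an exception.
def normalize_k8s_version(version):
    return tuple(int(part) for part in version.lstrip("v").split("."))

def version_in_supported_range(active, minimum, maximum):
    return (normalize_k8s_version(minimum)
            <= normalize_k8s_version(active)
            <= normalize_k8s_version(maximum))
```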

It is not the goal of this commit to implement the logic to check
whether an app should be updated based on the active Kubernetes version.

Test plan:
PASS: Create a test application containing new valid metadata
      Upload the test application
      Apply the test application
PASS: Create a test application without supported_k8s_version:minimum
      Upload the test application
      Check if a warning message was raised on the logs
PASS: Create a test application without supported_k8s_version:minimum
      Move test application tarball to Helm applications folder
      Wait for the auto update process to start
      Check if a warning message was raised on the logs
      Check if the application was successfully updated
PASS: Create a test application without the k8s_upgrades section
      Check if k8s_upgrades:auto_update defaults to true
      Check if k8s_upgrades:timing defaults to false
      Check if a warning message was raised on the logs
      Check if the application was successfully updated

Story: 2010929
Task: 48929

Change-Id: I54362b036b25b6f42a18a2a29e43e2936a8a328d
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2023-10-25 14:10:45 +00:00
Kyale, Eliud 703592fa1a Block host-unlock till kernel manifest completes
If the following commands are issued in quick succession,
1. system host-kernel-modify controller-0 lowlatency
2. system host-unlock controller-0

The puppet runtime manifest, which is executed asynchronously,
will not have enough time to run and will end up being run
on the next reboot, leading to alarms being raised.

This feature adds reporting of the kernel runtime
manifest status. The 'in progress' status is persisted
in the ihost table and used to validate host-unlock.

Story: 2010731
Task: 48684

Test plan:

PASS - AIO-SX: DM config with kernel: lowlatency
               Verify no kernel config alarms raised
               and lowlatency kernel is running

PASS - AIO-DX: DM config with kernel: lowlatency
               Verify no kernel config alarms raised
               and lowlatency kernel is running

PASS - AIO-DX: Test really fast unlock
               Verify unlock is blocked

Change-Id: I5f30e6f94eae3b287b402a15d1739d61b7d20ca9
Signed-off-by: Kyale, Eliud <Eliud.Kyale@windriver.com>
2023-10-18 14:42:50 -04:00