distcloud/distributedcloud/dcmanager/manager
Zhang Rong(Jon) 26bb7011e4 Fix issues with PGA sync_status
This commit addresses the issue where the primary site's PGA status
remains 'in-sync' even after the secondary site becomes unreachable.
With this fix, the PGA status will be updated to 'unknown' upon the
secondary site's failure. Additionally, the status will transition to
'in-sync' once the secondary site is operational again.
If there are any changes in the association while the secondary site is
down, the PGA status will be set to 'failed'. The sync status will then
transition to 'out-of-sync' upon secondary site recovery.
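The status transitions described above can be modelled as a small
transition table. This is a minimal sketch, not the dcmanager code: the
status strings come from this commit message, but the Event names and the
next_status() helper are hypothetical.

```python
from enum import Enum


class SyncStatus(str, Enum):
    """PGA sync_status values named in this commit message."""
    IN_SYNC = "in-sync"
    OUT_OF_SYNC = "out-of-sync"
    UNKNOWN = "unknown"
    FAILED = "failed"


class Event(Enum):
    """Hypothetical events driving the transitions (not dcmanager names)."""
    PEER_DOWN = "peer-down"
    CHANGED_WHILE_DOWN = "changed-while-down"
    PEER_UP = "peer-up"


# (current status, event) -> next status, as described in the commit message.
TRANSITIONS = {
    (SyncStatus.IN_SYNC, Event.PEER_DOWN): SyncStatus.UNKNOWN,
    (SyncStatus.UNKNOWN, Event.CHANGED_WHILE_DOWN): SyncStatus.FAILED,
    (SyncStatus.UNKNOWN, Event.PEER_UP): SyncStatus.IN_SYNC,
    (SyncStatus.FAILED, Event.PEER_UP): SyncStatus.OUT_OF_SYNC,
}


def next_status(current: SyncStatus, event: Event) -> SyncStatus:
    """Return the new sync_status; pairs not in the table leave it unchanged."""
    return TRANSITIONS.get((current, event), current)
```

Keeping the rules in one table makes it easy to check each documented
transition (and test case) against a single line.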

With this commit, the audit thread on the primary site also updates the
PGA sync_status. If the primary site is down and the SPG is migrated to
the secondary site, then upon primary site recovery its audit thread
updates the PGA sync_status on both sites accordingly.
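A rough sketch of that recovery audit, based on test case 5 below
(out-of-sync while the migration is still running, in-sync once it
completes). AssociationRecord and audit_after_primary_recovery() are
hypothetical stand-ins, not the actual peer_group_audit_manager.py code.

```python
from dataclasses import dataclass


@dataclass
class AssociationRecord:
    """Hypothetical stand-in for one site's copy of the PGA."""
    site: str
    sync_status: str


def audit_after_primary_recovery(migration_in_progress: bool,
                                 *records: AssociationRecord) -> str:
    """Set the same PGA sync_status on every site's record and return it.

    While the SPG migration to the secondary site is still running the
    association is reported out-of-sync; once it completes, in-sync.
    """
    status = "out-of-sync" if migration_in_progress else "in-sync"
    for record in records:
        record.sync_status = status
    return status
```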

Finally, the commit prevents the peer group from being updated on the
secondary site.
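The guard can be pictured as an early rejection in the update path. This
is a minimal sketch assuming the caller already knows whether the local
site is primary for the group; PeerGroupUpdateNotAllowed and
update_peer_group() are hypothetical names.

```python
class PeerGroupUpdateNotAllowed(Exception):
    """Hypothetical error; dcmanager's real exception type may differ."""


def update_peer_group(peer_group: dict, changes: dict,
                      is_primary_site: bool) -> dict:
    """Return an updated copy of peer_group, refusing updates on the secondary."""
    if not is_primary_site:
        # Assumption: edits are expected to be made on the primary site.
        raise PeerGroupUpdateNotAllowed(
            f"peer group {peer_group.get('name')!r} cannot be updated "
            "on the secondary site")
    updated = dict(peer_group)  # copy so the caller's dict is not mutated
    updated.update(changes)
    return updated
```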

Test Case:
1. PASS - Shutdown of site2 (secondary site) results in the
        synchronization status of the peer group association
        transitioning from 'in-sync' to 'unknown'.
2. PASS - Restoration of site2 (secondary site) leads to the
        synchronization status of the peer group association on
        the primary site changing to 'in-sync', and the peer
        group association status on site2 also reflects 'in-sync'.
3. PASS - While the secondary site is offline, execute some operations
        that result in the PGA sync_status being set to "failed". Recover
        the secondary site and verify that the PGA sync_status is set to
        out-of-sync on both sites.
4. PASS - Verify that updating the peer group on the secondary site is
        disallowed.
5. PASS - Shut down the primary site and migrate the SPG to the
        secondary site. Restore the primary site while the migration is
        in progress. Verify that the PGA sync_status is out-of-sync.
        Verify that the PGA sync_status is set to in-sync shortly after
        the migration is complete.

Closes-Bug: 2055030

Change-Id: I67f4200118621205c539b24eb764e3cc5acf12c0
Signed-off-by: Zhang Rong(Jon) <rong.zhang@windriver.com>
2024-03-08 00:41:09 +08:00
README.rst Move content to subdir to support relocated packaging 2019-11-04 13:57:02 -05:00
__init__.py Move content to subdir to support relocated packaging 2019-11-04 13:57:02 -05:00
peer_group_audit_manager.py Fix issues with PGA sync_status 2024-03-08 00:41:09 +08:00
peer_monitor_manager.py Fix issues with PGA sync_status 2024-03-08 00:41:09 +08:00
service.py Update tox pylint/pep8 for dcmanager 2024-01-18 21:51:25 +00:00
subcloud_manager.py Merge "Report rehoming playbook failures" 2024-01-31 17:29:40 +00:00
system_peer_manager.py Fix issues with PGA sync_status 2024-03-08 00:41:09 +08:00

README.rst

Service

The DC Manager Service is responsible for:

The main subcloud state machine, as well as all operations on subclouds, including creation, deletion and update.

service.py:

Runs the DC Manager service in multi-worker mode and establishes the RPC server.

subcloud_manager.py:

Manages all subcloud-related activities, such as creation, deletion, availability status and management state.

audit_manager.py:

A periodic audit that contacts each subcloud and ensures that at least one of each service group is up and active, which is a prerequisite for declaring the subcloud online.
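The online criterion above reduces to an "all groups, any active member" check. This is a minimal sketch with a hypothetical data shape (group name mapped to member states); the real audit gathers this information by contacting each subcloud.

```python
from typing import Iterable, Mapping


def subcloud_is_online(service_groups: Mapping[str, Iterable[str]]) -> bool:
    """Return True only if every service group has at least one 'active' member.

    service_groups maps a service group name to the states reported for its
    members (hypothetical shape, not the dcmanager data model).
    """
    if not service_groups:
        # Nothing reported: the subcloud cannot be declared online.
        return False
    return all(
        any(state == "active" for state in states)
        for states in service_groups.values()
    )
```

The explicit empty check matters because `all()` over an empty collection returns True.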

scheduler.py:

Thread group manager, also responsible for periodic timer tasks, i.e. the audit.
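A stdlib-only sketch of a thread group that runs periodic timer tasks such as an audit. PeriodicTaskGroup is a hypothetical name; the actual scheduler.py may rely on a different library.

```python
import threading
from typing import Callable, List


class PeriodicTaskGroup:
    """Hypothetical thread group that runs callables on fixed intervals."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._threads: List[threading.Thread] = []

    def add_timer(self, interval: float, func: Callable[[], None]) -> None:
        """Call func every `interval` seconds in its own thread until stopped."""
        def loop() -> None:
            # Event.wait() returns True once stop() sets the event, ending the loop.
            while not self._stop.wait(interval):
                func()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        self._threads.append(thread)

    def stop(self) -> None:
        """Signal every timer loop to exit and wait for the threads to finish."""
        self._stop.set()
        for thread in self._threads:
            thread.join()
```

Waiting on a shared Event instead of calling `time.sleep()` lets `stop()` interrupt every timer immediately rather than after its current interval.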