Subcloud GEO Redundancy Error Root Cause and Correction Action
This section describes error scenarios that can occur while using the GEO Redundancy feature. The scenarios assume two distributed clouds, site A and site B, with the GEO Redundancy feature activated and site A designated as the primary site and site B as the non-primary site. The GEO Redundancy feature allows subclouds to be migrated to the non-primary site when the primary site becomes unavailable, and migrated back to the primary site when it becomes available again.
The error scenarios are divided into the following categories: protection group setup, migration, and post migration.
Protection group setup
This category covers errors and issues detected during setup of the protection group.
Error scenarios | Recovery mechanism |
---|---|
Site A goes down temporarily in the middle of association. | |
Site A is down in the middle of synchronization and remains offline for an extended period of time. How does the user check the syncing status from site B to initiate the migration? | |
After initial sync is completed, site B goes down. How does site A sync to site B after site B comes back online? | Site A needs to keep track of subcloud group updates when site B is down. The sync status will go into unknown status in site A. |
Site B is offline while creating peer group association to associate peer and a . | |
Swact occurs in site A while a peer group association is syncing. | |
Swact occurs in site B while a peer group association is syncing. | |
In the event of either site going down or swact occurring: | |
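The recovery mechanism for the case where site B goes down after the initial sync can be pictured with a small model. The following is only an illustrative sketch, not the product's implementation: the `PeerAssociation` class, its methods, and the `in-sync` label are hypothetical names introduced here. Only the `unknown` status and the idea of site A tracking subcloud group updates while site B is down come from the table above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeerAssociation:
    """Hypothetical model of site A's view of its association with site B."""

    # "in-sync" is an assumed label; only "unknown" appears in the text.
    sync_status: str = "in-sync"
    pending_updates: List[str] = field(default_factory=list)

    def record_update(self, update: str, peer_reachable: bool) -> List[str]:
        """Queue a subcloud group update; deliver the queue if site B is up."""
        self.pending_updates.append(update)
        if peer_reachable:
            return self.flush()
        # Site B is down: keep tracking updates and report an unknown status.
        self.sync_status = "unknown"
        return []

    def flush(self) -> List[str]:
        """Deliver all tracked updates, in order, once site B is back online."""
        delivered = list(self.pending_updates)
        self.pending_updates.clear()
        self.sync_status = "in-sync"
        return delivered


assoc = PeerAssociation()
assoc.record_update("update-1", peer_reachable=False)
print(assoc.sync_status)   # status while site B is unreachable
print(assoc.flush())       # updates replayed after site B returns
```

The queue keeps updates in the order they were made, so replaying them after site B comes back online reproduces the same sequence of changes.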
Migration
Assumption: Subclouds will be migrated to site B if site A goes down.
The following are the error scenarios that can occur during peer group migration.
Error scenarios | Recovery mechanism |
---|---|
What will be the status of the peer group if some subclouds failed to migrate? | |
How to recover when the subcloud rehome fails because of incorrect bootstrap address or bootstrap values and site A cannot recover in a time period? | |
How to fix when the subcloud has incorrect bootstrap address or bootstrap values in the following situations of the migration of site B? | |
Site B goes down during migration. | |
Post migration
Audit operations will be triggered when the network is restored or the retrieved `migration_status` of the peer group is changed to `complete`.
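The audit trigger condition can be expressed as a simple predicate. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical and do not correspond to any actual product API. Only the `migration_status` field and the `complete` value are taken from the text.

```python
from typing import Optional


def should_trigger_audit(
    network_was_down: bool,
    network_is_up: bool,
    previous_migration_status: Optional[str],
    current_migration_status: Optional[str],
) -> bool:
    """Return True when either trigger condition described above holds.

    Hypothetical helper: an audit runs if the network has just been
    restored, or if the retrieved migration_status changed to "complete".
    """
    network_restored = network_was_down and network_is_up
    became_complete = (
        current_migration_status == "complete"
        and previous_migration_status != "complete"
    )
    return network_restored or became_complete
```

Comparing against the previously retrieved status means the audit fires once, on the transition to `complete`, rather than on every poll while the status stays `complete`.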
Error scenarios | Recovery mechanism |
---|---|
Site B goes down after the peer group has been migrated to its site. | Upon site A recovery, the administrator can trigger the migration of the peer group back to site A. |