Merge "Document the best practices, considerations, and recommendation for backups (r8,dsR8)" into r/stx.8.0

This commit is contained in:
Zuul 2023-08-31 17:09:01 +00:00 committed by Gerrit Code Review
commit 4840f18260
3 changed files with 190 additions and 82 deletions

View File

@ -2,112 +2,220 @@
.. hgq1552923986183
.. _backing-up-starlingx-system-data:
===================
Back Up System Data
===================
============================
Back Up System Data Overview
============================
A system data backup of a |prod-long| system captures core system
information needed to restore a fully operational |prod-long| cluster.
A system data backup of the |prod-long| system captures core system information
needed to restore a fully operational |prod-long| cluster. See :ref:`Run
Ansible Backup Playbook Locally on the Controller
<running-ansible-backup-playbook-locally-on-the-controller>` or :ref:`Run
Ansible Backup Playbook Remotely <running-ansible-backup-playbook-remotely>`
for information on how to backup up a standalone system or a System Controller.
.. contents:: In this section:
In a |prod-dc| environment, see :ref:`Backup a Subcloud/Group of Subclouds
using DCManager CLI
<backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42>` for
how to remotely backup a subcloud from the System Controller.
.. contents:: |minitoc|
:local:
:depth: 1
.. _backing-up-starlingx-system-data-section-N1002E-N1002B-N10001:
System Data Backups include:
.. _backing-up-starlingx-system-data-ul-enh-3dl-lp:
- platform configuration details
- system databases
- patching and package repositories
- home directory for the **sysadmin** user and all |LDAP| user accounts.
.. warning::
During a system backup, if the files contained in 'sysadmin' user's home
directory (``/home/sysadmin``) result in the overall size of the backup
being larger than 2 Gbytes, the backup operation may fail.
.. xreflink See |sec-doc|: :ref:`Local LDAP Linux User Accounts
<local-ldap-linux-user-accounts>` for additional information.
.. note::
If there is any change in hardware configuration, for example, new
NICs, a system backup is required to ensure that there is no
configuration mismatch after system restore.
.. _backing-up-starlingx-system-data-section-N10089-N1002B-N10001:
------------------------------------
Detailed contents of a system backup
------------------------------------
The backup contains details as listed below:
Contents of System Backup
-------------------------
.. _backing-up-starlingx-system-data-ul-s3t-bz4-kjb:
- Platform Configuration Data.
The following content is included in the backup:
All platform configuration data and files required to fully restore the
system to a working state following the platform restore procedure.
- All platform configuration data required to fully restore the system to a
working state following the platform restore procedure.
- \(Optional\) Any end user container images in **registry.local**; that
is, any images other than |org| system and application images.
|prod| system and application images are repulled from their
original source, external registries during the restore procedure.
- Platform and Kubernetes databases.
- Home directory 'sysadmin' user, and all |LDAP| user accounts
\(item=/etc\)
- Platform configuration files.
- Patching and package repositories:
- Platform certificates and keys.
- item=/opt/patching
- Home directory for the sysadmin user and all |LDAP| user accounts.
- item=/var/www/pages/updates
- End-user container images in ``registry.local``; that is, any images other
than |org| system and application images. |prod| system and application
images are re-pulled from their original source, and (optional) external
registries during the restore procedure.
- Distributed Cloud Vault (Central System Controller only).
.. _backing-up-starlingx-system-data-section-N1021A-N1002B-N10001:
The following content is excluded from the backup:
-----------------------------------
Data not included in system backups
-----------------------------------
- Application |PVC| data on Ceph clusters.
.. _backing-up-starlingx-system-data-ul-im2-b2y-lp:
- Modifications manually made to the file systems, such as configuration
changes on the ``/etc`` directory. After a restore operation has been
completed, these modifications must be reapplied.
- Application |PVCs| on Ceph clusters.
- Home directories and passwords of local user accounts. They must be backed up
manually by the sysadmin.
- StarlingX application data. Use the command :command:`system
application-list` to display a list of installed applications.
- Modifications manually made to the file systems, such as configuration
changes on the /etc directory. After a restore operation has been completed,
these modifications have to be reapplied.
- Home directories and passwords of local user accounts. They must be
backed up manually by the system administrator.
- The /root directory. Use the **sysadmin** account instead when root
access is needed.
- The ``/root`` directory. Use the sysadmin account instead when root access is
needed.
.. note::
The system data backup can only be used to restore the cluster from
which the backup was made. You cannot use the system data backup to
restore the system to different hardware. Perform a system data backup
for each cluster and label the backup accordingly.
To ensure recovery from the backup file during a restore procedure,
containers must be in the active state when performing the backup.
Containers that are in a shutdown or paused state at the time of the
backup will not be recovered after a subsequent restore procedure.
Ceph data may be retained when restoring to the same servers and cluster.
System Backup Size
------------------
Consider the following for backup size:
- The base size of a platform system backup sizes range from 10MB to 30MB,
depending on the size of the system and deployment. |AIO-SX| systems are
typically 20MB or less.
- Backup of user home directories can cause the backup archive to be very large
and is limited to 2GB or less.
- Total backup size of the System Controller should be below 100MB when using
centralized backup and restore of subclouds.
- Container images are large and will only be backed up locally to avoid large
image archives being transferred for each subcloud. Container images
that are not present on the system may be pulled as part of platform and
application deployment, or restored separately to the local registry
(``registry.local``).
System Backup Filesystem Usage
------------------------------
The following filesystems are used during the backup operations of the system
for both local and centralized backup.
**Staging Storage**
The host filesystem used to stage temporary files during backup operations. The
filesystem may also be used to temporarily store final backup images prior to
transfer, if the filesystem is sufficiently sized to store the backup archives.
Host filesystem name: backup
System path: ``/opt/backups``
Default size: 25GB
For more information on how to modify the host filesystem sizes see
:ref:`Resize Filesystems on a Host <resizing-filesystems-on-a-host>`.
**Local Storage**
The host filesystem used to store backup files in a protected partition which
does not get wiped during system reinstallation. The protected local backup
partition is typically used by |AIO-SX| systems where there is no redundant
filesystem storage and it is the default for local backups.
.. note::
The filesystem is shared with system release pre-staging and needs to be
sized for both pre-staging installation media and backup archives.
System Path: ``/opt/platform-backup/backups``
Default Size: 30GB
**Centralized Storage**
The |prod-dc| Vault filesystem is used to store backup archives when using
centralized backup and restore of subclouds. The filesystem size must be
increased to accommodate subcloud backup archive storage. A separate backup
archive is stored per subcloud and release, and therefore, must be sized to
accommodate all backups.
System path: ``/opt/dc-vault/backups/<subcloud-name>/<release-version>``
Default size: 15GB
.. note::
The filesystem is shared for |prod-dc| subcloud deployment and management
and must be sized to store subcloud deployment files (subcloud
configuration, ISO images and subcloud staging files).
For more information on how to modify the controller filesystem sizes see
:ref:`Storage on Controller Hosts
<controller-hosts-storage-on-controller-hosts>`.
Distributed Cloud Centralized Backups
-------------------------------------
A subcloud's system data and optionally container images (from
``registry.local``) can be backed up using DCManager CLI command line
interface. The subcloud's system backup data can either be stored locally on
the subcloud or on the System Controller. The subcloud's container image
backup (from ``registry.local``) can only be stored locally on the subcloud to
avoid overloading the central storage and the network with large amount of data
transfer and redundant storage of images in a central location.
.. image:: figures/system-controller-backup-and-restore.png
:width: 800
For more information on the CLI operation of the centralized backup
capability see :ref:`Backup a Subcloud/Group of Subclouds using DCManager CLI
<backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42>`.
For more information on DCManager - Subcloud Backup API see `Subcloud
Backups
<https://docs.starlingx.io/api-ref/distcloud/api-ref-dcmanager-v1.html#subcloud-backups>`__.
Execution Time for System Backups
---------------------------------
- The time to execute system backups is approximately 3-4 minutes for an idle
system.
- Centralized backups may require additional time for network transfer for
larger backups.
- Subcloud backups may be initiated and monitored from the DCManager |CLI| or
API, including parallel backups.
- A minor alarm (210.001) "System Backup in progress" is raised while backing
up an individual system.
- Systems with at least 4 platform cores will have much faster execution times.
Recommended Backup and Retention Policies
-----------------------------------------
- Use of the centralized backup mechanism of the System Controllers to perform
and store the backups of subclouds. Standalone systems and System Controllers
backups can be performed locally or remotely, and the archive must be stored
off the system.
- All backups are done during off-peak hours (i.e. maintenance window).
- Weekly backups should be performed under normal steady state conditions to
ensure the system can be restored to a fully operational state.
- Nightly backups are the exception and should only be performed in periods
of significant reconfiguration to the system such as during large/mass
rollout (addition of subclouds), upgrade cycle of multiple sites, or
disaster recovery rehoming of subclouds.
- Backups should be performed prior to performing maintenance operations or
applying configuration changes to the platform or hosted applications.
- The retention period of backups should be approximately one month.
- Since Kubernetes is an intent-based system, the most recent backup is the
most important.
When the system data backup is complete, the backup file must be kept in a
secured location, probably holding multiple copies of them for redundancy
purposes.
.. seealso::
:ref:`Run Ansible Backup Playbook Locally on the Controller

Binary file not shown.

After

Width:  |  Height:  |  Size: 103 KiB

View File

@ -70,7 +70,7 @@ End user container images in ``registry.local`` will be backed up during the
upgrade process. This only includes images other than |prod| system and
application images. These images are limited to 5 GB in total size. If the
system contains more than 5 GB of these images, the upgrade start will fail.
For more details, see :ref:`Detailed contents of a system backup
For more details, see :ref:`Contents of System Backup
<backing-up-starlingx-system-data-ul-s3t-bz4-kjb>`.
.. rubric:: |proc|