diff --git a/doc/source/specs/stx-6.0/approved/storage_2009074_upgrade-ceph-from-mimic-to-nautilus.rst b/doc/source/specs/stx-6.0/approved/storage_2009074_upgrade-ceph-from-mimic-to-nautilus.rst
new file mode 100644
index 0000000..82aefd9
--- /dev/null
+++ b/doc/source/specs/stx-6.0/approved/storage_2009074_upgrade-ceph-from-mimic-to-nautilus.rst
@@ -0,0 +1,249 @@
+..
+  This work is licensed under a Creative Commons Attribution 3.0 Unported
+  License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+===================================
+Ceph upgrade from Mimic to Nautilus
+===================================
+
+Storyboard:
+https://storyboard.openstack.org/#!/story/2009074
+
+This story covers the upgrade of Ceph from Mimic to Nautilus. The upgrade also includes the code
+and configuration changes in StarlingX components that are needed to support Nautilus.
+
+Official instructions on how to migrate from Mimic to Nautilus can be found in [1]_.
+
+Problem description
+===================
+
+Mimic reached end of life on 2020-07-22. We need to choose an actively maintained release that
+supports automated version migration between releases (i.e., when the MON/OSD/MDS services start,
+their service data is migrated from the Mimic formats to the new formats, if required).
+
+This will require evaluating historic HA reliability code and removing/retiring unneeded code,
+enabling newly supported default features (BlueStore, systemd service files, ceph-volume instead
+of ceph-disk for OSD deployment), and easing future upgrades.
+
+Use Cases
+---------
+
+Users should keep access to the same storage features they have today, without noticing a
+difference between Ceph versions.
+
+Proposed change
+===============
+
+First, we should focus on building a StarlingX ISO that includes Ceph Nautilus.
+
+Other choices, such as Octopus or Pacific, are ruled out because we want to align with what is
+currently supported by Debian Bullseye, which is Nautilus. In addition, Pacific only supports
+migration from Octopus or Nautilus [2]_.
+
+After the image is built, we can evaluate the downstream changes made to Ceph Mimic and port those
+that are still needed to downstream Ceph Nautilus. Next, we should be able to successfully install
+an AIO-SX system with Ceph Nautilus built into it.
+
+The integration of the Ceph subsystems with the new Ceph version also needs to be checked. The
+subsystems are:
+
+* ceph-manager
+
+* python-cephclient
+
+* mgr-restful-plugin
+
+* puppet
+
+* ansible playbooks
+
+New features enablement
+-----------------------
+
+With the ISO available, we can verify the enablement of new features such as:
+
+* Switch OSDs from FileStore to BlueStore
+
+  * BlueStore has been the default backend for OSDs since Luminous and improves performance over
+    the previous FileStore backend. More details can be found in [3]_.
+
+* Switch from sysvinit services/HA scripts to systemd/HA driven services
+
+  * Ceph upstream uses systemd to control ceph process initialization. This was disabled
+    downstream to maintain the historical (Ceph Hammer and Ceph Jewel) sysvinit script
+    optimizations.
+
+* Migrate OSD deployment to ceph-volume due to the deprecation of ceph-disk
+
+Investigate differences between ceph versions
+---------------------------------------------
+
+Some commands may have been changed or deprecated in the migration from Mimic to Nautilus. We need
+to identify those commands and assess the differences and their impact on the overall system.
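+
+To make this concrete, the sketch below shows one way such a probe could be scripted. Python is
+chosen here because the impacted StarlingX components (sysinv, ceph-manager, python-cephclient)
+are Python based; the candidate command list is illustrative only, and the real one must be
+populated from the Nautilus release notes.
+
+.. code-block:: python
+
+   # Hypothetical compatibility probe: run read-only monitor commands whose
+   # syntax may have changed between Mimic and Nautilus and report which
+   # ones the running cluster still accepts.
+   import json
+   import subprocess
+
+   # Illustrative candidates only; populate from the upgrade notes.
+   CANDIDATE_COMMANDS = [
+       ["ceph", "health", "detail", "--format", "json"],
+       ["ceph", "osd", "pool", "ls", "detail", "--format", "json"],
+       ["ceph", "osd", "metadata", "--format", "json"],
+   ]
+
+
+   def probe(cmd):
+       """Return True if the command runs and still emits valid JSON."""
+       try:
+           out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
+           json.loads(out)  # also catches output-format changes
+           return True
+       except (subprocess.CalledProcessError, ValueError, OSError):
+           return False
+
+
+   if __name__ == "__main__":
+       for cmd in CANDIDATE_COMMANDS:
+           status = "ok" if probe(cmd) else "CHANGED/REMOVED"
+           print("{:<50} {}".format(" ".join(cmd), status))
+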
+The impact may be on the following projects/modules:
+
+* config:
+
+  * sysinv/cgts-client
+  * sysinv/sysinv
+
+* stx-puppet:
+
+  * puppet-manifests/src/modules/platform/manifests/ceph.pp
+
+* utilities:
+
+  * ceph/ceph-manager
+  * ceph/python-cephclient
+
+* integ:
+
+  * ceph
+  * config/puppet-modules/openstack/puppet-ceph-2.2.0 - upgrade to 3.1.1
+
+Alternatives
+------------
+
+Ceph Octopus might be an alternative if it shows up in the Debian Bullseye package list and if
+time permits.
+
+Data model impact
+-----------------
+
+N/A
+
+REST API impact
+---------------
+
+The impact will depend on the command changes required by Nautilus.
+
+Security impact
+---------------
+
+N/A
+
+Other end user impact
+---------------------
+
+N/A
+
+Performance Impact
+------------------
+
+* A performance improvement is expected when switching OSDs from FileStore to BlueStore.
+* Replacing ceph-disk with ceph-volume should increase reliability and improve performance.
+  Details to be verified in [4]_.
+
+Other deployer impact
+---------------------
+
+N/A
+
+Developer impact
+----------------
+
+N/A
+
+Upgrade impact
+--------------
+
+Upgrading to subsequent releases should become simpler.
+The new features to be enabled should provide a better user experience.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Vinícius Lopes da Silva (viniciuslopesdasilva)
+
+Other contributors:
+  - Delfino Gomes Curado Filho (dcuradof)
+  - Felipe Sanches Zanoni (fsanches)
+  - Mauricio Biasi do Monte Carmelo (mbiasido)
+  - Thiago Oliveira Miranda (thiagooliveiramiranda)
+  - Alan Kyoshi (akyoshi)
+  - Daniel Pinto Barros (dbarros)
+
+Repos Impacted
+--------------
+
+- config
+- integ
+- stx-puppet
+- ha
+- ansible-playbook
+- utilities
+
+Work Items
+----------
+
+* Verify compatibility between Nautilus and Mimic. According to the
+  `upgrade compatibility notes `_,
+  some commands have changed between versions, and we should assess the impact on the current
+  implementation.
+
+* Current OSDs are FileStore based, while Nautilus supports BlueStore OSDs, so we need to
+  determine the feasibility of migrating from FileStore to BlueStore. We also need to determine
+  whether FileStore and BlueStore OSDs can coexist.
+
+* Ceph's default use of systemd to control ceph process initialization is currently
+  `disabled `_. It should be re-enabled, and the changes required in the
+  `init script `_ and `pmon `_ should be evaluated.
+
+* ceph-disk is currently used to deploy OSDs. The problem is that ceph-disk is deprecated, so we
+  should use ceph-volume in its place. This will require investigating the impact of the change.
+  In the worst-case scenario, it is still possible to use ceph-disk, since it remains available
+  through the Ceph Pacific release (the latest to date).
+
+* Evaluate the code from the `current patch `_ set applied on Mimic and port the relevant
+  patches to the Nautilus branch.
+
+* Ensure that the integration between Ceph and its subsystems (ceph-manager, python-cephclient,
+  mgr-restful-plugin, Puppet code, ansible-playbooks) is working correctly.
+
+Dependencies
+============
+
+N/A
+
+Testing
+=======
+
+All validation activities should pass Sanity/Storage regression tests.
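+
+As a minimal sketch (assuming a pytest-style layout, which is not necessarily how the actual
+Sanity/Storage suites are organized), an automated post-upgrade check could assert cluster health
+and that every OSD reports the expected BlueStore backend:
+
+.. code-block:: python
+
+   # Hypothetical post-upgrade sanity checks. The ceph commands and JSON
+   # fields used here exist in Nautilus; the test harness is assumed.
+   import json
+   import subprocess
+
+
+   def ceph(*args):
+       """Run a ceph CLI command and parse its JSON output."""
+       out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
+       return json.loads(out)
+
+
+   def test_cluster_is_healthy():
+       # HEALTH_WARN may be tolerable right after an upgrade; HEALTH_ERR is not.
+       assert ceph("health")["status"] in ("HEALTH_OK", "HEALTH_WARN")
+
+
+   def test_all_osds_use_bluestore():
+       # Fails if any OSD was left on FileStore after the migration.
+       for meta in ceph("osd", "metadata"):
+           assert meta.get("osd_objectstore") == "bluestore"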
+
+Standard configuration scenarios
+--------------------------------
+
+* AIO-SX
+* AIO-DX
+* Standard 2C+2W
+* Storage 2C+2S+2W
+* Storage Tiers - Can be done on AIO-SX; should be valid across all installs
+
+Additional scenarios
+--------------------
+
+* SSD Journal Disks - Use SSD journal disks and validate proper configuration in a storage lab
+* Peer Groups - Provision the system with up to 8 (replication 2) and 9 (replication 3) storage
+  hosts
+* OSD disk replacement - Validate the OSD disk replacement procedure
+
+Backup and restore scenarios
+----------------------------
+
+* B&R - AIO-SX
+* B&R - AIO-DX
+* B&R - Standard 2C+2W
+* B&R - Storage 2C+2S+2W
+
+Documentation Impact
+====================
+
+The changes to be made should not interfere with system usage. At this time,
+no documentation changes are expected to be required.
+
+References
+==========
+
+.. [1] https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous
+.. [2] https://docs.ceph.com/en/latest/releases/pacific/#upgrade-from-pre-nautilus-releases-like-mimic-or-luminous
+.. [3] https://ceph.io/en/news/blog/2017/new-luminous-bluestore/
+.. [4] https://docs.ceph.com/en/latest/ceph-volume/intro/#ceph-disk-replaced