Merge "stx-6.0: Initial spec for ceph upgrade"
commit fbae7bdad8

..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

===================================
Ceph upgrade from Mimic to Nautilus
===================================

Storyboard:
https://storyboard.openstack.org/#!/story/2009074

This story covers the upgrade of Ceph from Mimic to Nautilus. The upgrade also includes the code and
configuration changes in StarlingX components that are needed to support Nautilus.

The official instructions for migrating from Mimic to Nautilus can be found in [1]_.

Problem description
===================

Mimic reached end of life on 2020-07-22. We need to move to an active release that supports
automated migration between releases (i.e. when the MON/OSD/MDS services start, their data is
migrated from the Mimic format to the new format if required).

This requires evaluating the historic HA reliability code and removing or retiring what is no longer
needed, enabling the new default features (BlueStore, systemd service files, ceph-volume instead
of ceph-disk for OSD deployment), and making future upgrades easier.

Use Cases
---------

Users should keep access to the current storage features without noticing a
difference between Ceph versions.

Proposed change
===============

First, we should focus on building a StarlingX ISO that includes Ceph Nautilus.

Other choices such as Octopus or Pacific are ruled out because we want to align with what is currently
supported by Debian Bullseye, which is Nautilus. In addition, Pacific only supports migration from
Octopus or Nautilus [2]_.

Once the image is built, we can evaluate the downstream changes made on Ceph Mimic and port those
that are still needed to the downstream Ceph Nautilus. Next, we should be able to successfully
install an AIO-SX system with Ceph Nautilus.

The integration of the following Ceph subsystems with the new Ceph version must also be checked:

* ceph-manager

* python-cephclient

* mgr-restful-plugin

* puppet

* ansible playbooks

New features enablement
-----------------------

With an ISO available, we can verify the enablement of new features such as:

* Switch OSDs to BlueStore from FileStore

  * BlueStore is the default OSD backend (starting in Luminous) and improves performance over the
    previous FileStore backend. More details can be found in [3]_.

* Switch from sysvinit services/HA scripts to systemd/HA driven services

  * Ceph upstream uses systemd to control Ceph process initialization. This was disabled downstream
    to maintain the historical (Ceph Hammer and Ceph Jewel) sysvinit script optimizations.

* Migrate OSD deployment to use ceph-volume due to ceph-disk deprecation

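For the FileStore to BlueStore switch listed above, one way to see which backend each OSD currently
uses is to inspect the ``osd_objectstore`` field reported by ``ceph osd metadata``. The following is
a minimal sketch, not part of the spec: the helper names and the sample data are hypothetical, and
the JSON is assumed to have been captured beforehand (for example with
``ceph osd metadata --format json``).

```python
import json
from collections import Counter


def count_objectstores(osd_metadata_json):
    """Count OSDs per object store backend.

    The input is assumed to be the JSON text printed by `ceph osd metadata`,
    i.e. a list with one object per OSD.
    """
    entries = json.loads(osd_metadata_json)
    return Counter(entry.get("osd_objectstore", "unknown") for entry in entries)


def filestore_osd_ids(osd_metadata_json):
    """Return the sorted ids of OSDs that still report a FileStore backend."""
    entries = json.loads(osd_metadata_json)
    return sorted(
        entry["id"] for entry in entries
        if entry.get("osd_objectstore") == "filestore"
    )


# Illustrative data only: real output carries many more fields per OSD.
SAMPLE_METADATA = json.dumps([
    {"id": 0, "osd_objectstore": "filestore"},
    {"id": 1, "osd_objectstore": "bluestore"},
    {"id": 2, "osd_objectstore": "filestore"},
])

if __name__ == "__main__":
    print(dict(count_objectstores(SAMPLE_METADATA)))
    print(filestore_osd_ids(SAMPLE_METADATA))
```

A report like this could help decide whether a mixed FileStore/BlueStore cluster is an acceptable
intermediate state during migration.
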
Investigate differences between ceph versions
---------------------------------------------

Some commands may have changed or been deprecated between Mimic and Nautilus. We need to identify
those commands, determine how they differ, and assess their impact on the overall system.
The following projects/modules might be affected:

* config:

  * sysinv/cgts-client
  * sysinv/sysinv

* stx-puppet:

  * puppet-manifests/src/modules/platform/manifests/ceph.pp

* utilities:

  * ceph/ceph-manager
  * ceph/python-cephclient

* integ:

  * ceph
  * config/puppet-modules/openstack/puppet-ceph-2.2.0 - upgrade to 3.1.1

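As a starting point for this investigation, an inventory of the ``ceph`` CLI invocations found in
the modules above could be compared by hand against the Nautilus release notes. The sketch below is
only an illustration under stated assumptions: the function name, the regular expression, and the
sample input are hypothetical, and a simple text match like this will miss commands that are
assembled dynamically.

```python
import re
from collections import defaultdict

# Matches "ceph" followed by one to three lowercase sub-command words,
# e.g. "ceph osd set noout". Names such as "ceph-manager" do not match.
CEPH_CALL_RE = re.compile(r"\bceph((?:\s+[a-z][a-z0-9_-]*){1,3})")


def inventory_ceph_commands(sources):
    """Map each detected sub-command to the (path, line number) pairs using it.

    `sources` is a dict of {path: file contents}; how the files are read
    (git checkout, os.walk, ...) is left to the caller.
    """
    found = defaultdict(set)
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in CEPH_CALL_RE.finditer(line):
                command = " ".join(match.group(1).split())
                found[command].add((path, lineno))
    return dict(found)


# Illustrative input only, not taken from the StarlingX repositories.
SAMPLE_SOURCES = {
    "ceph.pp": "exec { 'ceph osd set noout': }\n"
               "# ceph-manager restarts\n"
               "ceph health\n",
}

if __name__ == "__main__":
    for command, locations in sorted(inventory_ceph_commands(SAMPLE_SOURCES).items()):
        print(command, sorted(locations))
```
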
Alternatives
------------

Ceph Octopus might be an alternative if it shows up in the Debian Bullseye package list and if
time permits.

Data model impact
-----------------

N/A

REST API impact
---------------

The impact will depend on which command changes Nautilus requires.

Security impact
---------------

N/A

Other end user impact
---------------------

N/A

Performance Impact
------------------

* Switching OSDs from FileStore to BlueStore should improve performance.
* Replacing ceph-disk with ceph-volume should increase reliability and improve performance.
  Details to be verified in [4]_.

Other deployer impact
---------------------

N/A

Developer impact
----------------

N/A

Upgrade impact
--------------

Upgrades to subsequent releases should become simpler.
The new features to be enabled should provide a better user experience.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Vinícius Lopes da Silva (viniciuslopesdasilva)

Other contributors:
  - Delfino Gomes Curado Filho (dcuradof)
  - Felipe Sanches Zanoni (fsanches)
  - Mauricio Biasi do Monte Carmelo (mbiasido)
  - Thiago Oliveira Miranda (thiagooliveiramiranda)
  - Alan Kyoshi (akyoshi)
  - Daniel Pinto Barros (dbarros)

Repos Impacted
--------------

- config
- integ
- stx-puppet
- ha
- ansible-playbook
- utilities

Work Items
----------

* Verify compatibility between Nautilus and Mimic. According to the
  `upgrade compatibility notes <https://docs.ceph.com/en/latest/releases/nautilus/#upgrade-compatibility-notes>`_,
  some commands have changed between versions, and we should assess their impact
  on the current implementation.

* Current OSDs are FileStore based, while Nautilus supports BlueStore OSDs. We need
  to determine the feasibility of migrating from FileStore to BlueStore OSDs, and
  whether FileStore and BlueStore OSDs can coexist.

* Ceph's default use of systemd to control Ceph process initialization is currently
  `disabled <https://github.com/starlingx-staging/stx-ceph/commit/ecbbc1c833106a1151c6ccb93eebbad93b55b2c2>`_.
  It should be re-enabled, and the changes needed in the
  `init script <https://github.com/starlingx-staging/stx-ceph/commits/stx/v13.2.2/src/init-ceph.in>`_ and
  `pmon <https://opendev.org/starlingx/integ/src/branch/master/ceph/ceph/files/ceph-init-wrapper.sh>`_
  should be evaluated.

* ceph-disk is currently used to deploy OSDs. Because ceph-disk is deprecated, we should
  use ceph-volume in its place, which requires investigating the impact of this change. In the
  worst case, ceph-disk can still be used, since it is available through the Ceph Pacific release
  (the latest to date).

* Evaluate the `current patch <https://github.com/starlingx-staging/stx-ceph/commits/stx/v13.2.2>`_ set
  applied on Mimic and port the relevant patches to the Nautilus branch.

* Ensure that the integration between Ceph and its subsystems (ceph-manager, python-cephclient,
  mgr-restful-plugin, Puppet code, ansible-playbooks) works correctly.

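While validating these work items on a running cluster, the ``ceph versions`` command can show
whether every daemon is already on the target release. The sketch below is a hedged illustration,
not part of the planned implementation: the function name is hypothetical, the sample report uses
placeholder commit hashes, and the version strings are assumed to follow the usual
``ceph version X.Y.Z (<hash>) <release> (<status>)`` layout.

```python
import json
import re

# Extracts the release name ("mimic", "nautilus", ...) from a version string.
RELEASE_RE = re.compile(r"^ceph version \S+ \(\S+\) (\w+) \(")


def daemons_not_on(versions_json, target="nautilus"):
    """Return {daemon type: count} for daemons not yet on the target release.

    `versions_json` is assumed to be the JSON text printed by `ceph versions`.
    The aggregated "overall" section is skipped to avoid double counting.
    """
    report = json.loads(versions_json)
    pending = {}
    for daemon_type, by_version in report.items():
        if daemon_type == "overall":
            continue
        for version_string, count in by_version.items():
            match = RELEASE_RE.match(version_string)
            release = match.group(1) if match else "unknown"
            if release != target:
                pending[daemon_type] = pending.get(daemon_type, 0) + count
    return pending


# Illustrative report; the all-zero hashes are placeholders, not real builds.
MIMIC = "ceph version 13.2.2 (0000000000000000000000000000000000000000) mimic (stable)"
NAUTILUS = "ceph version 14.2.22 (0000000000000000000000000000000000000000) nautilus (stable)"
SAMPLE_VERSIONS = json.dumps({
    "mon": {NAUTILUS: 1},
    "mgr": {NAUTILUS: 1},
    "osd": {MIMIC: 2, NAUTILUS: 1},
    "overall": {MIMIC: 2, NAUTILUS: 3},
})

if __name__ == "__main__":
    print(daemons_not_on(SAMPLE_VERSIONS))
```

An empty result would indicate that all reported daemons run the target release.
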
Dependencies
============

N/A

Testing
=======

All validation activities should pass Sanity/Storage regression tests.

Standard configuration scenarios
--------------------------------

* AIO-SX
* AIO-DX
* Standard 2C+2W
* Storage 2C+2S+2W
* Storage Tiers - Can be done on AIO-SX, should be valid across all installs

Additional scenarios
--------------------

* SSD Journal Disks - Use SSD journal disks and validate proper configuration on a storage lab
* Peer Groups - Provision a system with up to 8 (replication 2) and 9 (replication 3) storage hosts
* OSD disk replacement - Validate the OSD disk replacement procedure

Backup and restore scenarios
----------------------------

* B&R - AIO-SX
* B&R - AIO-DX
* B&R - Standard 2C+2W
* B&R - Storage 2C+2S+2W

Documentation Impact
====================

The changes should not interfere with system usage. At this time,
no documentation changes are expected to be required.

References
==========

.. [1] https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous
.. [2] https://docs.ceph.com/en/latest/releases/pacific/#upgrade-from-pre-nautilus-releases-like-mimic-or-luminous
.. [3] https://ceph.io/en/news/blog/2017/new-luminous-bluestore/
.. [4] https://docs.ceph.com/en/latest/ceph-volume/intro/#ceph-disk-replaced