When analyzing the ceph-process-states.log file, we observed a
recurring error scenario. In the /etc/init.d/ceph-init-wrapper
script (run for osd.0), 'timeout' consistently failed with exit
code 125 on the execute_ceph_cmd call. The failure was caused by
a missing timeout duration argument, which made 'timeout'
interpret 'ceph' as an invalid time interval.
To fix this bug, we added the missing initialization of the
$WAIT_FOR_CMD variable. This ensures that the command is
executed correctly and prevents the 'timeout' error from
recurring.
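The failure mode can be reproduced with a minimal sketch ('sleep 0' stands in for the real ceph command; only the variable name is taken from the script):

```shell
WAIT_FOR_CMD=""                  # the uninitialized variable
timeout $WAIT_FOR_CMD sleep 0    # expands to: timeout sleep 0
echo "uninitialized: exit=$?"    # 'sleep' is an invalid time interval -> 125
WAIT_FOR_CMD=10                  # the fix: initialize the duration
timeout $WAIT_FOR_CMD sleep 0
echo "initialized: exit=$?"      # -> 0
```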
Test Plan:
- PASS: Force the disk process to be reported as hung and
check the aforementioned log for the desired output.
Closes-Bug: 2037728
Change-Id: Ic337b212b74c0cc76f25f4aaf9a99d77f8d9250d
Signed-off-by: Tiago Leal <Tiago.Leal@windriver.com>
The change [1] introduced an issue that made it impossible to
use the ceph_mgr_lifecycle_days variable.
Analyzing the code showed that ceph_mgr_lifecycle_days is an
attribute of the Config class, not of ServiceMonitor.
The fix is simply to replace the use of 'self' with CONFIG.
Test Plan:
PASS: AIO-SX fresh install
PASS: Verified there are no errors in the
log /var/log/mgr-restful-plugin.log
Closes-Bug: 2023553
[1]: https://review.opendev.org/c/starlingx/integ/+/885881
Change-Id: Icb46f1589057607e24123b69e9ab44994580585a
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
Ceph reports the wrong version when running the following
commands:
ceph --version;
ceph mgr versions;
ceph tell osd.* version.
To fix this, dl_hook was adjusted so that the correct ceph
version is returned: the SHA of the last commit is captured and
saved in the .git_version file. Patch 0001 was removed because
it prevented the .git_version file from being read.
With this fix, the following output is shown after running
the 'ceph --version' command:
ceph version 14.2.22 (2ebd6bae80aca32269eb24d6471ebd72c22da03b)
nautilus (stable)
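The idea behind the dl_hook adjustment can be sketched as a small helper (function name and the src/.git_version layout are assumptions, not the actual dl_hook code):

```shell
# Record the checkout's HEAD SHA so the build can embed the real version.
write_git_version() {             # $1 = path to the ceph source tree
    git -C "$1" rev-parse HEAD > "$1/src/.git_version"
}
```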
Test Plan:
PASS AIO-SX fresh install
PASS Run and check output ceph version commands
Closes-Bug: 2024681
Change-Id: I331e8b74b964b752e57b7a27b7c0c9054119ea51
Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
The ceph-mgr daemon has a behavior where its RSS memory grows
continuously. Over a few months, depending on the system, this can
amount to more than 1 GB of growth. In tests performed on storage and
duplex systems, the average growth is around 10 MiB per day on the
active controller.
Since Ceph is open source, a thorough search was performed on the
Internet and Ceph repo for information about this growth behavior
in memory consumption of ceph-mgr, both in Ceph 14.2.22 (present
on the system) and in later versions. However, nothing that could
help to fix the problem was found. As there were no reports about
this bug, I reported it on the Ceph tracker: https://tracker.ceph.com/issues/61702
A new approach to fix the problem is to automatically restart
ceph-mgr every 7 days, so memory use returns to its initial
state when the daemon is restarted, avoiding unbounded memory
growth. It was also verified that the restart has no impact on
the running processes.
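A rough sketch of such a lifecycle check, assuming a timestamp file touched at every successful (re)start (paths, names, and the restart action are placeholders, not the actual change):

```shell
# Decide whether ceph-mgr has exceeded its lifecycle.
should_restart() {                 # $1 = stamp file, $2 = lifecycle days
    # find prints the file only if it is older than the lifecycle
    [ -n "$(find "$1" -mtime +"$(($2 - 1))" 2>/dev/null)" ]
}

STAMP=${STAMP:-/tmp/.ceph_mgr_started}
[ -f "$STAMP" ] || touch "$STAMP"
if should_restart "$STAMP" 7; then
    echo "restarting ceph-mgr"     # placeholder for the real restart
    touch "$STAMP"                 # reset the lifecycle timer
fi
```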
Test-Plan:
PASS: Changed the fix in an AIO-DX to restart ceph-mgr every one
day.
PASS: After one day, ceph-mgr restarted and its RSS memory use went
back to the initial state.
Closes-Bug: 2023553
Change-Id: I1c62efaf0ca1d37ba93a24fc99b8db7156973102
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
Since the current ceph depends on the containerd runtime rather
than the docker service, move the original config file from the
docker service to containerd. This ensures the correct dependency
order and avoids failures during shutdown.
Also rename the config script to make its purpose clearer.
TestPlan:
PASS: build-pkgs -a
PASS: build-image
PASS: Jenkins Installation
PASS: Run the testcase pods
PASS: shutdown -hP now
Closes-Bug: 2020610
Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I84719699bdc245cc6f3c0eb6ee4b81544d35459d
When nodes are unlocked, mtc starts ceph processes after a
successful boot. The same was observed at the end of the AIO-SX
optimized restore playbook.
This causes a race condition with sm and pmon, leading to failures
in some scenarios.
To avoid this, the script called by mtc will not start ceph processes
anymore. It will only set the flag to enable ceph to run on the node
and the processes will be started by sm or pmon later.
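Conceptually, the mtc-called script now reduces to something like this sketch (the flag path and message are assumptions for illustration):

```shell
# Instead of starting ceph daemons directly, only drop the enable flag;
# sm/pmon will notice it and start the processes later.
CEPH_ENABLED_FLAG=${CEPH_ENABLED_FLAG:-/tmp/.node_ceph_enabled}
touch "$CEPH_ENABLED_FLAG"
echo "ceph enabled on this node; start deferred to sm/pmon"
```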
Test Plan:
For each installation setup do:
- Fresh install and verify Ceph is running with HEALTH_OK status;
- Swact controllers and verify Ceph has HEALTH_OK status;
- Run DOR (Dead Office Recovery) and verify Ceph has HEALTH_OK
status;
- Lock/Unlock Controllers/Storage nodes and check Ceph has
HEALTH_OK status;
- Reboot active Controller and check Ceph has HEALTH_OK status.
PASS: AIO-SX
PASS: AIO-DX
PASS: Standard (2+2)
PASS: Standard with dedicated Storage (2+2+2)
PASS: B&R AIO-SX
PASS: B&R Optimized AIO-SX
Closes-bug: 2023445
Change-Id: I0c81749c6db1e17761aa8aca6276eff50f135959
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
For AIO-DX, the Ceph monitor was not being started after an
uncontrolled swact caused by a sudden power off/reboot of the active
controller, breaking system high availability. This happens because
a flag indicates on which controller the last active ceph monitor
was running, to prevent starting a ceph monitor without drbd-cephmon
data in sync, which could cause Ceph data corruption. That flag also
prevented the data corruption that occurs when the mgmt network is
down and both controllers are set active, starting a ceph monitor
without drbd-cephmon in sync.
To prevent data corruption and to maintain system high availability,
this fix checks the mgmt network carrier instead of managing flags.
If no carrier is detected on mgmt network interface, then ceph mon and
osd are stopped and only allowed to start again after mgmt network has
carrier.
For AIO-DX Direct, all networks are also verified. If none of them
has carrier, the other controller is considered down, and the
working controller is allowed to remain active even if the mgmt
network has no carrier.
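The carrier check itself can be sketched as below (the sysfs base path is parameterized here only so the logic can be exercised without real NICs; 'mgmt0' is a placeholder interface name):

```shell
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

has_carrier() {                    # $1 = interface name
    # the kernel exposes link carrier state as 0/1 in sysfs
    [ "$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null)" = "1" ]
}

if has_carrier mgmt0; then
    echo "mgmt carrier present: ceph mon/osd may start"
else
    echo "no mgmt carrier: stop ceph mon/osd"
fi
```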
Test-Plan:
PASS: Run system host-swact on AIO-DX and verify ceph is running
with status HEALTH_OK
PASS: Force an uncontrolled swact on AIO-DX by killing a critical
process and verify if ceph is running with status HEALTH_OK
PASS: Disconnect OAM and MGMT networks for both controllers on
AIO-DX and verify ceph mon and osd stop on both controllers.
Reconnect OAM and MGMT networks and verify if ceph is running
and status is HEALTH_OK
PASS: Reboot or power off the active controller and verify on the
other controller that ceph is running with status HEALTH_WARN
because one host is down. Power on the controller and wait
until it is online/available. Verify ceph reports HEALTH_OK
after all OSDs are up and data is recovered.
Closes-bug: 2020889
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I38470f43eba86f88fb9cfe47869d2393cacbd365
This change resolves the scenario where, after an uncontrolled
swact caused by killing one of the critical processes twice, the
ceph-mon service does not start on the new active controller,
triggering yet another swact.
A flag was created to signal a complete shutdown of ceph-mon.
After an uncontrolled swact, the system checks whether the flag
exists and, if so, starts the ceph-mon service on the new active
controller.
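A hypothetical sketch of the flag protocol (flag path, function names, and the start action are illustrative, not the actual change):

```shell
FLAG=${FLAG:-/tmp/.ceph_mon_complete_shutdown}

mark_complete_shutdown() {         # called when ceph-mon stops cleanly
    touch "$FLAG"
}

start_mon_if_flagged() {           # called on the new active controller
    if [ -f "$FLAG" ]; then
        rm -f "$FLAG"
        echo "starting ceph-mon"   # placeholder for the real start
    fi
}
```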
Test Plan:
PASS: System host-swact.
PASS: Ceph recovery after rebooting the active controller.
PASS: Ceph recovery after uncontrolled swact killing a critical
process twice.
PASS: Ceph recovery after mgmt network outage for a few minutes
even when rebooting controllers.
PASS: Ceph recovery after case of dead office recovery (DOR).
PASS: Upgrade success from stx 7.0 to 8.0 in a duplex lab.
Closes-bug: 2017133
Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
Change-Id: I6784ec76afa3e62ee14e8ca8f3d6c0212a9f6f3e
The Debian ceph packaging has been changed to track changes
to the following locations:
pkg_path/debian
pkg_path/files
stx/git/ceph
This ensures that any new code submissions under those
directories will increment the pkg version.
Test Plan:
PASS: build-pkgs -p ceph
Story: 2010550
Task: 47717
Signed-off-by: Luis Sampaio <luis.sampaio@windriver.com>
Change-Id: I08d944da7ffcf446028fc1a2add9553408043c6f
Problem:
On node shutdown, ceph is getting shut down while it is still in use by
the pods/containers. This leads to hangs which eventually leads to the
hostwd service timing out and triggering a reboot.
Solution:
The old dependencies are not suitable for the current version of
ceph because we now use the containerd runtime instead of the
docker service. Meanwhile, the ceph init script uses systemd-run
to launch the systemd scopes for the ceph components
(ceph-mon|osd|mds).
The script generates transient systemd scope files with basic
configuration.
This update patches the ceph init script to generate systemd overrides
config files for the ceph components that provide improved ordering
during shutdown. This ordering ensures kubelet and containerd services
are shut down first, then the ceph scopes and service management (SM).
As a result, the hostwd service timeout is no longer triggered and
shutdown works properly.
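An illustrative shape of such a generated override (unit names, the drop-in path, and the chosen ordering directive are assumptions about the change, not the exact generated files):

```shell
# Generate a drop-in for a ceph scope. Stop order is the reverse of
# start order, so Before= makes systemd stop kubelet/containerd first
# during shutdown, then this ceph scope.
OVR_DIR=${OVR_DIR:-/tmp/systemd-overrides}/ceph-mon.scope.d
mkdir -p "$OVR_DIR"
cat > "$OVR_DIR/override.conf" <<'EOF'
[Unit]
Before=kubelet.service containerd.service
EOF
echo "wrote $OVR_DIR/override.conf"
```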
TestPlan:
PASS: build-pkgs
PASS: build-image
PASS: Jenkins installation
PASS: kubectl create -f ceph-fuse.yaml
PASS: After checking the pod is running with 'kubectl get pods',
execute the command "sudo shutdown -hP now"
PASS: The shutdown completes without an OS reboot.
The yaml file is as follows:
$ cat ceph-fuse.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rwx-test-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrx-centos
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      run: centos
  template:
    metadata:
      labels:
        run: centos
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: run
                operator: In
                values:
                - centos
            topologyKey: kubernetes.io/hostname
      containers:
      - name: centos
        image: centos/tools
        imagePullPolicy: IfNotPresent
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do dd if=/dev/zero of=/mnt1/aaa bs=1K count=100 && sleep 1; done;" ]
        volumeMounts:
        - name: pvc1
          mountPath: "/mnt1"
      restartPolicy: Always
      volumes:
      - name: pvc1
        persistentVolumeClaim:
          claimName: rwx-test-claim
Closes-Bug: 2011610
Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I2c093c490ba177fbfc816e44dc227890270cac83
This change is part of the solution to resolve the scenario where
Ceph MON starts without having data in sync when there is no
communication with the peer, leading to PG issues.
Improvements:
Removed starting Ceph MON and MDS from ceph.sh script called by
mtcClient for AIO-DX:
- Ceph MDS was not being managed, only started by ceph.sh
script called from mtcClient. Now it will be managed by PMON.
- Ceph MON will continue to be managed by SM.
Ceph-init-wrapper script will verify some conditions to start
Ceph MON safely:
- First, check if drbd-cephmon role is Primary.
- Then, check if drbd-cephmon partition is mounted correctly.
- Check flags (inside the drbd-cephmon path) for the last active
Ceph MON process (Controller-0 or Controller-1). This flag is
created by the last successful Ceph MON start.
- If the last active monitor is the other one, check if
drbd-cephmon is UpToDate/UpToDate, meaning that data is synchronized
between controllers.
We also improved the /etc/init.d/ceph script so that Ceph OSD can
be stopped even when no Ceph MON is available. Previously, stopping
an OSD without a Ceph Monitor would hang because the command to
flush the journal waited forever to reach an available Ceph Monitor.
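The condition chain above can be condensed into a pure-logic sketch (argument names and values are illustrative; the real script queries drbd and mount state directly):

```shell
# Decide whether it is safe to start Ceph MON on this controller.
safe_to_start_mon() {
    local role=$1 mounted=$2 last_active=$3 me=$4 dstate=$5
    [ "$role" = "Primary" ] || return 1    # drbd-cephmon must be Primary
    [ "$mounted" = "yes" ]  || return 1    # cephmon partition mounted
    if [ "$last_active" != "$me" ]; then
        # the other controller ran the last monitor: require synced data
        [ "$dstate" = "UpToDate/UpToDate" ] || return 1
    fi
    return 0
}
```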
Test Plan:
PASS: system host-swact.
PASS: Ceph recovery after mgmt network outage for a few minutes
even when rebooting controllers.
PASS: Ceph recovery after rebooting active controller.
PASS: Ceph recovery after case of dead office recovery (DOR).
PASS: Running shellcheck on ceph-base.ceph.init, ceph.sh,
and ceph-init-wrapper.sh files without any complaints
about the lines related to the changes.
Closes-bug: 2004183
Signed-off-by: Hediberto Cavalcante da Silva <hediberto.cavalcantedasilva@windriver.com>
Change-Id: Id09432aecef68b39adabf633c74545f2efa02e99
The Debian packaging has been changed to reflect all the
git commits under the directory, and not just the commits
to the metadata folder.
The package builder appends the number of revisions added since
the one defined in BASE_SRCREV to the name of the generated .deb
file. Thus, any new code submission in these directories increments
the revision counter in the filename.
Test Plan:
PASS: Created new code submissions to the ceph package and verified
that the version in the package name was incremented.
PASS: Created new code submissions to the parted package and
verified that the version in the package name was incremented.
PASS: Created new code submissions to the trident-installer package
and verified that the version in the package name was incremented.
Story: 2010550
Task: 47490
Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
Change-Id: I16f834fd77f0abafaed39a0d4e6cd78d35fa4b98
The Ceph initialization script /etc/init.d/ceph failed to start an
osd when the osd disk was already mounted and the umount failed
because the disk was in use.
The script line contains an umount command that fails if the
partition is in use; the subsequent mount command then fails,
returning 32.
If the error is that the partition is already mounted, look for the
'already mounted on ${fs_path}' text in the output, ignore the
mount error, return success, and continue the start script.
An example of error text output:
=== osd.0 ===
Mounting xfs on controller-0:/var/lib/ceph/osd/ceph-0
umount: /var/lib/ceph/osd/ceph-0: target is busy.
mount: /var/lib/ceph/osd/ceph-0: /dev/nvme2n1p1 already mounted
on /var/lib/ceph/osd/ceph-0.
failed: 'modprobe xfs ; egrep -q '^[^ ]+ /var/lib/ceph/osd/ceph-0 '
/proc/mounts && umount /var/lib/ceph/osd/ceph-0 ;
mount -t xfs -o rw,noatime,inode64,logbufs=8,logbsize=256k
/dev/disk/by-path/pci-0000:11:00.0-nvme-1-part1
/var/lib/ceph/osd/ceph-0'
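The tolerance check reduces to a small predicate like this sketch (the message text comes from the log above; the function name and argument shape are assumed):

```shell
# Treat mount rc 32 as success when the partition is already mounted
# at the expected location.
mount_ok() {            # $1 = mount rc, $2 = mount output, $3 = fs_path
    [ "$1" -eq 0 ] && return 0
    if [ "$1" -eq 32 ] && echo "$2" | grep -q "already mounted on $3"; then
        return 0        # already where we want it: not an error
    fi
    return 1
}
```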
Test-Plan:
PASS: Validate the new script with partition already mounted
on right location in AIO-SX and AIO-DX.
PASS: Validate the new script with partition already mounted
but on a different location in AIO-SX and AIO-DX.
PASS: Validate the new script with partition not mounted in
AIO-SX and AIO-DX.
Closes-bug: 1999826
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I6f0c1a3c2742de62040a690dd3d65785bdc1de73
* Add the missing 's' to fix the syntax error:
File "/usr/sbin/ceph-manage-journal", line 200, in mount_data_partition
print("Failed to mount %(node)s to %(path), aborting" % params)
ValueError: unsupported format character ',' (0x2c) at index 35
* Add a function to find mpath node in /dev/mapper
Test Plan:
PASS: AIO-SX with Ceph, 1 osd
PASS: AIO-SX with Ceph, 2 osd
PASS: AIO-SX with Ceph, 4 osd
Story: 2010046
Task: 45427
Signed-off-by: Jackie Huang <jackie.huang@windriver.com>
Signed-off-by: Thiago Miranda <ThiagoOliveira.Miranda@windriver.com>
Change-Id: I08f1f226343bf0140abb1ec8825533abb3f57e43
This work is part of Debian integration effort.
This work only affects Debian; it could be ported to CentOS
without issues.
The bug prevents the Maintenance check for
/etc/services.d/controller/ceph.sh from completing successfully
after unlock, which results in a reboot.
Debian uses /lib/lsb/init-functions vs CentOS /etc/init.d/functions.
init-functions calls hooks from /lib/lsb/init-functions.d/.
One of the hooks redirects the lsb script call to a systemctl call.
systemctl calls for the ceph service do not work on CentOS or
Debian. On CentOS we do not source /etc/init.d/functions, so we do
not need /lib/lsb/init-functions on Debian either.
Based on the reasoning above, drop the sourcing of
/lib/lsb/init-functions.
Tests on AIO-SX:
CentOS: not affected, skip
Debian:
PASS: live patch controller, unlock, no unwanted reboot initiated by
Maintenance
PASS: build-pkgs, extract contents and check /etc/init.d/ceph
Story: 2009101
Task: 44791
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: I49b79e78b0f832096dca98ca2cfd68c454679b95
Add debian packaging infrastructure for
integ/ceph to build a debian package.
Test Plan: build-pkg; build-image; same contents as RPM
PASS build-pkg
PASS build-image
PASS same contents and permissions as RPM
Attention:
In order to avoid memory issues during the build,
please do one of the following:
- Developers with only 32G RAM will need to
temporarily unmount /var/lib/sbuild/build
so that the build system uses the disk instead of tmpfs
OR
- update /etc/fstab to set the size for
the sbuild tmpfs filesystem in the pkgbuilder container:
tmpfs /var/lib/sbuild/build tmpfs uid=sbuild,gid=sbuild,mode=2770,size=40G 0 0
Note:
Build times can be long. In order to accelerate it,
adjust the values of MINIKUBECPUS/MINIKUBEMEMORY
in import-stx file (tools repo) before building
the containers with stx-init-env.
Depends-On: https://review.opendev.org/c/starlingx/tools/+/827884
Story: 2009101
Task: 44304
Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com>
Change-Id: Idc8ee1ebac5c973622c1c599f4a04c001bfa89a6
Ceph failed to build after the nspr library update.
This library is used only for library tests.
To fix this and prevent it from happening again, the tests are no
longer compiled.
Test Plan:
PASS: Compile master branch without build-avoidance and
verify it finishes with no errors.
Closes-Bug: 1958560
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I74046f1e76b242655f86c71354248f1bcb9ff76a
Changed ceph.spec to enable the generation of python 3 packages.
It is important to highlight that the python 2 packages will still
be generated, and they are the ones used in the StarlingX
installation. Initially, the python 3 packages will only be present
on stx-base-image.
Commented lines for the updated ceph submodules were also cleaned
up in centos_tarball-dl.lst.
Test plan:
Complete build run
Starlingx installation
stx-openstack apply - check that the helm chart can create ceph pools
Depends-On: https://review.opendev.org/c/starlingx/tools/+/824575
Story: 2009074
Task: 44281
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I52dac30849a7072b80cad388b16d2b50ea22391a
Ceph mgr-restful-plugin was running ceph-mgr on port 8003 instead of
port 7999.
The problem was that mgr-restful-plugin configured the server
port at the mgr/restful/server_port key, as used in Mimic.
In Nautilus this key changed to
config/mgr/mgr/restful/server_port.
Test Plan:
- Tested on AIO-SX using netstat to check the port and curl to get
data using port 7999.
Story: 2009074
Task: 44160
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: Ib534089bd30c5b1e2c7db98bbd2f495b1545f420
After upgrading to ceph nautilus, the mgr-restful-plugin log shows a
message of command failure when running 'ceph config-key get
config/mgr/restful/controller-0/crt'.
This happens on both controllers and can lead to spotty access by
components that need REST API access.
Changing the path to the certificate from
'config/mgr/restful/controller-0/crt' to
'config/mgr/mgr/restful/controller-0/crt' and the path to the key from
'config/mgr/restful/controller-0/key' to
'config/mgr/mgr/restful/controller-0/key' fixed the problem
Test plan:
- Tested on AIO-DX
Story: 2009074
Task: 44100
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: Ifb0d3c7b8b3669472ef3b579951b9850fdf4bbbc
Updating TIS_BASE_SRCREV to reflect the source rev of branch 14.2.22
of stx-ceph.
Updating TIS_PATCH_VER to account for the 43 previous packaging
changes that went in with Mimic.
Test plan:
- Build ceph package and check the package name
Story: 2009074
Task: 44013
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I6e51dedd62e851c4716bc27812a447d08694ed46
Disable by default the warnings about monitors allowing insecure
global_id reclaim, and set "auth allow insecure global id reclaim"
to true by default for all monitors. The main goal is to allow a
mixed set of ceph versions.
A next step is to let the user, through service parameters, mix in
non-compliant ceph clients installed by other applications.
Gdisk was added back as it is necessary for StarlingX.
Test plan:
PASS: Build successfully
PASS: Install on AIO-SX, AIO-DX, Standard and Storage configs
successfully and without alarms (fm alarm-list) or ceph warnings
(ceph -s).
PASS: platform-integ-apps is applied successfully
Story: 2009074
Task: 43464
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I5f3e432444b60ab73136431bb94bb6ab532ae0ab
This needs to be done because we want to keep compatibility with
puppet-ceph-2.4.1-1 and our current version of puppet.
As this version of puppet-ceph only uses ceph-disk we will keep it
until we are able to move on to ceph-volume. This will probably be
possible when StarlingX is using version 3.1.1 of puppet-ceph.
Test plan:
PASS: Build successfully
Story: 2009074
Task: 43465
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: Ie9570f01728df28ee4ea357b1e618c5a4c0a3803
Add the upgraded submodules as dependencies in
centos_tarball-dl.lst file. It is important to highlight that dpdk
is added twice because seastar and SPDK depend on different
versions of dpdk.
* boost_1_72_0.tar.bz2
* c-ares-fd6124c74da0801f23f9d324559d8b66fb83f533.tar.gz
* civetweb-bb99e93da00c3fe8c6b6a98520fb17cf64710ce7.tar.gz
* dmclock-4496dbc6515db96e08660ac38883329c5009f3e9.tar.gz
* dpdk-96fae0e24c9088d9690c38098b25646f861a664b.tar.gz
* dpdk-a1774652fbbb1fe7c0ff392d5e66de60a0154df6.tar.gz
* fmt-80021e25971e44bb6a6d187c0dac8a1823436d80.tar.gz
* intel-ipsec-mb-134c90c912ea9376460e9d949bb1319a83a9d839.tar.gz
* rocksdb-4c736f177851cbf9fb7a6790282306ffac5065f8.tar.gz
* seastar-0cf6aa6b28d69210b271489c0778f226cde0f459.tar.gz
* spawn-5f4742f647a5a33b9467f648a3968b3cd0a681ee.tar.gz
* spdk-fd292c568f72187e172b98074d7ccab362dae348.tar.gz
* zstd-b706286adbba780006a47ef92df0ad7a785666b6.tar.gz
Merged the changes of the ceph 14 spec into this repo. For now,
python3 is disabled by default for the StarlingX build, but this
will probably change in the future. For the python3 build to work,
more dependencies will be needed.
Test plan:
PASS: Build successfully
Depends-On: https://review.opendev.org/c/starlingx/tools/+/814591
Story: 2009074
Task: 42946
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: Iab9d0b57b00da4ba595d2b2f24194f058c850f5b
- socket requires bytes and we need to explicitly convert str to bytes.
- check_output() returns bytes under python3, while under python2
it returns str; passing universal_newlines=True makes it return
str regardless of the python version.
Story: 2006796
Task: 42297
Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: Ie3921c4ae6211a8b0d290bdbdb195ce07036afbc
(cherry picked from commit 26c16b3eb8)
Defer the start of Ceph OSDs to SM in order to avoid a race
condition between MTC and SM when starting OSDs. This is only
required for AIO-DX, where SM manages the floating monitor and
OSDs.
Closes-Bug: 1932351
Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Change-Id: Ia718ae696d8158e63660ee54d226271a6bcb476e
This patch updates our Popen call to enable
newlines for calls that we parse or consume the output for.
Without universal_newlines=True, the output is treated as bytes
under python3 which leads to issues later where we are using it as
strings.
See https://docs.python.org/3/glossary.html#term-universal-newlines
Story: 2006796
Task: 42696
Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: I9b93907c05486b1f76aebe181af812c243285d6a
The MTC client manages ceph services via ceph.sh, which
is installed on all node types in
/etc/services.d/{controller,worker,storage}/ceph.sh.
Since AIO controllers have both controller and worker
personalities, the MTC client executes the ceph script
twice (/etc/services.d/worker/ceph.sh and
/etc/services.d/controller/ceph.sh).
This double execution generates several issues.
We fix this by exiting the ceph script when it is the one from
/etc/services.d/worker on AIO systems.
Closes-Bug: 1928934
Change-Id: I3e4dc313cc3764f870b8f6c640a6033822639926
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
ceph-preshutdown.sh is called as a post operation when docker is
stopped/restarted. Based on current service dependencies, when docker is
restarted this will also trigger a restart of containerd.
Puppet manifests will restart containerd and docker for various
operations both on system boot and during runtime operations when their
configuration has changed.
This update adds conditions to ensure that the RBD devices are only
unmounted when the system is shutting down. This avoids the RBD backed
persistent volumes from being forcibly removed from running pods and
being remounted read-only during these restart scenarios.
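One hedged way to express "only on shutdown" (the detection method below is an assumption for illustration, not necessarily the one used by ceph-preshutdown.sh):

```shell
# Only unmount RBD devices when the system is actually going down,
# not on a runtime docker/containerd restart.
if systemctl list-jobs 2>/dev/null | grep -q 'shutdown.target'; then
    echo "system shutdown in progress: unmounting RBD devices"
else
    echo "runtime restart: leaving RBD mounts alone"
fi
```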
Change-Id: I7adfddf135debcc8bcaa1f93866e1a276b554c88
Closes-Bug: #1901449
Signed-off-by: Robert Church <robert.church@windriver.com>
This update makes use of the PKG_GITREVCOUNT variable
to auto-version the packages in this repo.
Story: 2007750
Task: 39951
Change-Id: I854419c922b9db4edbbf6f1e987a982ec2ec7b59
Signed-off-by: Dongqi Chen <chen.dq@neusoft.com>
Free port 5001 to be used by keystone.
Story: 2007347
Task: 39392
Change-Id: Id789591bf22931494e970aaf3b12e9e5cbe223fa
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
During stuck-peering recovery, if file descriptors are not
released, the state machine does not advance to the OPERATIONAL
state.
Partial-bug: 1856064
Change-Id: I3fba7be661ebf223eac63608574323ad98d33b75
Signed-off-by: Paul Vaduva <Paul.Vaduva@windriver.com>
OSDs might become stuck peering.
Recover from such state.
Closes-bug: 1851287
Change-Id: I2ef1a0e93d38c3d041ee0c5c1e66a4ac42785a68
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Explicitly set ceph-mgr configuration file path to
/etc/ceph/ceph.conf to avoid surprises. ceph-mon
and ceph-osd are also started with '-c' (--conf)
pointing to /etc/ceph/ceph.conf.
Change-Id: I4915952f17b4d96a8fce3b4b96335693f9b6c76b
Closes-bug: 1843082
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>
python-cephclient certificate validation fails when connecting
to ceph-mgr restful plugin because server URL doesn't match
CommonName (CN) or SubjectAltName (SAN).
Setting CN to match server hostname fixes this issue but
raises a warning caused by missing SAN.
Using CN=ceph-restful and SAN=<hostname> fixes the issue
and clears the warning.
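For illustration, a self-signed certificate with that CN/SAN split can be produced as in this sketch (output paths are scratch files; the plugin generates its certificate through its own code path, and -addext needs OpenSSL 1.1.1+):

```shell
# CN identifies the service; SAN carries the actual hostname so URL
# validation succeeds without warnings.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout /tmp/restful-key.pem -out /tmp/restful-crt.pem \
    -subj "/CN=ceph-restful" \
    -addext "subjectAltName=DNS:$(hostname)" 2>/dev/null
openssl x509 -in /tmp/restful-crt.pem -noout -subject
```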
Change-Id: I6e8ca93c7b51546d134a6eb221c282961ba50afa
Closes-bug: 1828470
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>
When a swact occurs and ceph-init-wrapper is slow to respond
to a status request, it gets killed by SM. This leaves behind the
flag file that marks a status check in progress.
When the controller swacts back, ceph-init-wrapper sees a status
check in progress and waits for it to finish (with a timeout).
Because it does not respond fast enough, SM repeatedly tries to
start ceph-init-wrapper to bring the ceph-mon service up. This
happens a few times until the service is declared failed and the
controller swacts back again.
To fix this, use flock instead of flag files, since these locks
are automatically released by the OS when the process is killed.
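The difference between the two mechanisms is easy to see in a sketch: an flock held on a file descriptor disappears with its holder, whereas a flag file survives a kill (lock path and messages are illustrative):

```shell
LOCK=${LOCK:-/tmp/.ceph_status.lock}

# Run the status check under an exclusive, non-blocking lock; if the
# process is killed mid-check, the kernel drops the lock automatically.
(
    flock -n 9 || { echo "status already in progress"; exit 1; }
    echo "checking ceph status"
) 9>"$LOCK"
```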
Change-Id: If1912e8575258a4f79321d8435c8ae1b96b78b98
Closes-bug: 1840176
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>