Commit Graph

91 Commits

Author SHA1 Message Date
Tiago Leal 10a6701d71 Fix timeout command in ceph-init-wrapper
When analyzing the ceph-process-states.log file, we observed a
recurring error scenario. In the /etc/init.d/ceph-init-wrapper
osd.0 script, 'timeout' was consistently failing with exit
code 125 on the execute_ceph_cmd function call. The failure was
caused by a missing duration argument, which made 'timeout'
interpret 'ceph' as an (invalid) time interval.

To fix this bug, we introduced the necessary initialization
of the $WAIT_FOR_CMD variable. This ensures that the command is
executed correctly, addressing the issue and preventing the
recurrence of the 'timeout' error.
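A minimal reproduction of the failure mode (a sketch, assuming GNU coreutils 'timeout' is available; 'true' stands in for the real ceph invocation):

```python
import subprocess

# Without a duration, 'timeout' parses its first argument ('ceph' here)
# as a time interval, fails, and exits 125 without ever running the
# wrapped command.
broken = subprocess.run(["timeout", "ceph", "osd", "stat"],
                        capture_output=True)

# With a duration supplied (the role of the $WAIT_FOR_CMD value),
# the wrapped command runs normally.
fixed = subprocess.run(["timeout", "10", "true"], capture_output=True)

print(broken.returncode, fixed.returncode)
```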

Test Plan:
  - PASS: Force the disk process to be reported as hung and
    check the aforementioned log for the desired output.

Closes-Bug: 2037728
Change-Id: Ic337b212b74c0cc76f25f4aaf9a99d77f8d9250d
Signed-off-by: Tiago Leal <Tiago.Leal@windriver.com>
2024-01-08 19:38:04 +00:00
Erickson Silva de Oliveira 2737967430 Fix use of ceph_mgr_lifecycle_days variable
The change [1] introduced an issue that made it
impossible to use the ceph_mgr_lifecycle_days variable.

Analyzing the code, we observed that the
ceph_mgr_lifecycle_days variable belongs to the
Config class, not to the ServiceMonitor class.

The fix is simply to replace the use of
'self' with CONFIG.
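The bug shape can be sketched with minimal stand-in classes (class names mirror the description above; the attributes are illustrative, not the real mgr-restful-plugin code):

```python
class Config:
    ceph_mgr_lifecycle_days = 7  # illustrative value

CONFIG = Config()

class ServiceMonitor:
    def lifecycle_days_broken(self):
        # Before the fix: the variable is not an attribute of
        # ServiceMonitor, so this raises AttributeError.
        return self.ceph_mgr_lifecycle_days

    def lifecycle_days_fixed(self):
        # After the fix: read it from the Config instance instead.
        return CONFIG.ceph_mgr_lifecycle_days

mon = ServiceMonitor()
try:
    mon.lifecycle_days_broken()
    broken_raised = False
except AttributeError:
    broken_raised = True

print(broken_raised, mon.lifecycle_days_fixed())
```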

Test Plan:
  PASS: AIO-SX fresh install
  PASS: Notice that there are no errors in the
        log /var/log/mgr-restful-plugin.log

Closes-Bug: 2023553

[1]: https://review.opendev.org/c/starlingx/integ/+/885881

Change-Id: Icb46f1589057607e24123b69e9ab44994580585a
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2023-07-05 13:50:44 +00:00
Pedro Vinícius Silva da Cruz 21990f2259 Present the correct version of ceph
Ceph reports the wrong version when running the
commands:

ceph --version;
ceph mgr versions;
ceph tell osd.* version.

To fix this, dl_hook was adjusted so that the correct ceph
version is returned: the SHA of the last commit is taken and saved
inside the .git_version file. Patch 0001 was removed because it
was preventing the .git_version file from being read.

With this fix, the following output is shown after running
the 'ceph --version' command:

ceph version 14.2.22 (2ebd6bae80aca32269eb24d6471ebd72c22da03b)
nautilus (stable)

Test Plan:
PASS AIO-SX fresh install
PASS Run and check output ceph version commands

Closes-Bug: 2024681

Change-Id: I331e8b74b964b752e57b7a27b7c0c9054119ea51
Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
2023-06-23 14:19:04 +00:00
Zuul d1ba1d9e80 Merge "Restart the ceph-mgr daemon every 7 days to control RSS memory growth" 2023-06-15 21:04:10 +00:00
Gabriel de Araújo Cabral 8f6d2eb85a Restart the ceph-mgr daemon every 7 days to control RSS memory growth
The ceph-mgr daemon has a behavior where its RSS memory grows
continuously. Over a few months, depending on the system, this can
amount to more than 1GB of growth. In tests performed on storage and
duplex systems, the average growth is around 10MiB per day on the
active controller.

Since Ceph is open source, a thorough search was performed on the
Internet and in the Ceph repo for information about this growth in
the memory consumption of ceph-mgr, both in Ceph 14.2.22 (present
on the system) and in later versions. However, nothing that could
help fix the problem was found. As there were no reports of
this bug, I reported it on the Ceph tracker: https://tracker.ceph.com/issues/61702

A new approach to fix the problem is to automatically restart
ceph-mgr every 7 days, so the memory use goes back to the initial
state when the daemon is restarted, avoiding the possibility of
memory overflow. Also, it was verified that there weren't any
impacts on the running processes after the restart.
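The intent can be sketched as a simple uptime check (illustrative names only; the actual fix lives in the ceph scripts and its mechanism may differ):

```python
RESTART_INTERVAL_DAYS = 7  # interval chosen by this change

def restart_due(started_at, now, interval_days=RESTART_INTERVAL_DAYS):
    """Return True once the daemon has been up for the full interval."""
    return (now - started_at) >= interval_days * 86400

DAY = 86400
print(restart_due(0, 8 * DAY))  # up 8 days: restart is due
print(restart_due(0, 1 * DAY))  # up 1 day: not yet
```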

Test-Plan:
  PASS: Changed the fix in an AIO-DX to restart ceph-mgr every one
        day.
  PASS: After one day, ceph-mgr restarted and its RSS memory use went
        back to the initial state.

Closes-Bug: 2023553

Change-Id: I1c62efaf0ca1d37ba93a24fc99b8db7156973102
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
2023-06-15 16:29:42 +00:00
Zuul d03fd2ebaa Merge "Fix Ceph processes start race condition" 2023-06-13 23:05:40 +00:00
Zhixiong Chi 8ee6262720 ceph: Adjust the override config file to the containerd runtime
Since the current ceph depends on the containerd runtime rather
than the docker service, move the original config file from the
docker service to containerd. This ensures the correct dependency
order and avoids failures during shutdown.
Meanwhile, rename the config script to make it clearer.

TestPlan:
PASS: build-pkgs -a
PASS: build-image
PASS: Jenkins Installation
PASS: Run the testcase pods
PASS: shutdown -hP now

Closes-Bug: 2020610

Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I84719699bdc245cc6f3c0eb6ee4b81544d35459d
2023-06-12 22:17:52 -04:00
Felipe Sanches Zanoni ad20667beb Fix Ceph processes start race condition
When nodes are unlocked, mtc will start ceph processes after a
successful boot. The same was also observed at the end of the AIO-SX
optimized restore playbook.
This causes a race condition with sm and pmon, leading to failures
in some scenarios.

To avoid this, the script called by mtc will not start ceph processes
anymore. It will only set the flag to enable ceph to run on the node
and the processes will be started by sm or pmon later.

Test Plan:
  For each installation setup do:
    - Fresh install and verify Ceph is running with HEALTH_OK status;
    - Swact controllers and verify Ceph has HEALTH_OK status;
    - Run DOR (Dead Office Recover) and verify Ceph has HEALTH_OK
      status;
    - Lock/Unlock Controllers/Storage nodes and check Ceph has
      HEALTH_OK status;
    - Reboot active Controller and check Ceph has HEALTH_OK status.

  PASS: AIO-SX
  PASS: AIO-DX
  PASS: Standard (2+2)
  PASS: Standard with dedicated Storage (2+2+2)
  PASS: B&R AIO-SX
  PASS: B&R Optimized AIO-SX

Closes-bug: 2023445
Change-Id: I0c81749c6db1e17761aa8aca6276eff50f135959
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
2023-06-10 15:14:10 +00:00
Felipe Sanches Zanoni 655ab05b71 Fix Ceph mon and osd processes start/stop conditions
For AIO-DX, the Ceph monitor was not being started after an
uncontrolled swact caused by a sudden power off/reboot of the active
controller, breaking system high availability. This happens because a
flag records on which controller the last active ceph monitor was
running, to prevent starting the ceph monitor without drbd-cephmon
data in sync, which could cause Ceph data corruption. That flag also
prevented data corruption when the mgmt network was down and both
controllers were set to be active, which would start the ceph monitor
without drbd-cephmon in sync.

To prevent data corruption and to maintain system high availability,
this fix checks the mgmt network carrier instead of managing flags.
If no carrier is detected on mgmt network interface, then ceph mon and
osd are stopped and only allowed to start again after mgmt network has
carrier.

For AIO-DX Direct, all networks are also verified. If none of the
networks has carrier, then the other controller is considered down,
letting the working controller remain in the active state even if
the mgmt network has no carrier.
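The carrier check can be sketched as below (a sketch only; the real check is in the shell scripts, and the sysfs path is the standard kernel interface, not code from this change):

```python
import os
import tempfile

def has_carrier(iface, sysfs_root="/sys/class/net"):
    """True if the interface reports link carrier ('1' in sysfs)."""
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Interface missing or administratively down: no carrier.
        return False

# Demonstrate against a fake sysfs tree (the real check would read
# /sys/class/net/<mgmt-iface>/carrier on the controller).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "mgmt0"))
with open(os.path.join(root, "mgmt0", "carrier"), "w") as f:
    f.write("1\n")

up = has_carrier("mgmt0", sysfs_root=root)
down = has_carrier("absent0", sysfs_root=root)
print(up, down)
```

With such a predicate, ceph mon and osd would be stopped while the mgmt interface reports no carrier and allowed to start again once it does.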

Test-Plan:
  PASS: Run system host-swact on AIO-DX and verify ceph is running
        with status HEALTH_OK
  PASS: Force an uncontrolled swact on AIO-DX by killing a critical
        process and verify if ceph is running with status HEALTH_OK
  PASS: Disconnect OAM and MGMT networks for both controllers on
        AIO-DX and verify ceph mon and osd stop on both controllers.
        Reconnect OAM and MGMT networks and verify if ceph is running
        and status is HEALTH_OK
  PASS: Reboot or power off the active controller and verify on the
        other controller that ceph is running with status HEALTH_WARN
        because one host is down. Power on the controller, wait until
        it is online/available. Verify ceph is HEALTH_OK after all
        OSDs are up and the data is recovered.

Closes-bug: 2020889

Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I38470f43eba86f88fb9cfe47869d2393cacbd365
2023-05-31 13:38:02 -03:00
Pedro Vinícius Silva da Cruz 09e29800cb Fix AIO-DX Uncontrolled Swact ceph-mon failure
This change resolves the scenario where, after an uncontrolled
swact caused by killing one of the critical processes twice, the
ceph-mon service does not start on the new active controller,
triggering yet another swact.

A flag was created to signal a complete shutdown of ceph-mon.
After an uncontrolled swact, the system verifies whether the flag
exists and, if so, starts the ceph-mon service on the new active
controller.

Test Plan:
    PASS: System host-swact.
    PASS: Ceph recovery after rebooting the active controller.
    PASS: Ceph recovery after uncontrolled swact killing a critical
          process twice.
    PASS: Ceph recovery after mgmt network outage for a few minutes
          even when rebooting controllers.
    PASS: Ceph recovery after case of dead office recovery (DOR).
    PASS: Upgrade success from stx 7.0 to 8.0 in a duplex lab.

Closes-bug: 2017133

Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
Change-Id: I6784ec76afa3e62ee14e8ca8f3d6c0212a9f6f3e
2023-04-26 13:41:25 -04:00
Luis Sampaio dcd05bea43 Update Ceph debian package versioning
The Debian ceph packaging has been changed to track changes
to the following locations:

pkg_path/debian
pkg_path/files
stx/git/ceph

This ensures that any new code submissions under those
directories will increment the pkg version.

Test Plan:
PASS: build-pkgs -p ceph

Story: 2010550
Task: 47717
Signed-off-by: Luis Sampaio <luis.sampaio@windriver.com>
Change-Id: I08d944da7ffcf446028fc1a2add9553408043c6f
2023-03-27 10:36:47 -07:00
Zuul 406b7b49d8 Merge "Reorder ceph shutdown to after containers" 2023-03-14 21:04:43 +00:00
Zhixiong Chi 54868df244 Reorder ceph shutdown to after containers
Problem:
On node shutdown, ceph is shut down while it is still in use by
the pods/containers. This leads to hangs, which eventually cause the
hostwd service to time out and trigger a reboot.

Solution:
The old dependencies are not suitable for the current version of ceph
because we are now using the containerd runtime instead of the
docker service. Meanwhile, the ceph init script uses systemd-run to
launch systemd scopes for the ceph components (ceph-mon|osd|mds).
The script generates transient systemd scope files with basic
configuration.

This update patches the ceph init script to generate systemd override
config files for the ceph components that provide improved ordering
during shutdown. This ordering ensures the kubelet and containerd
services are shut down first, then the ceph scopes and service
management (SM). As a result, the hostwd timeout isn't triggered and
shutdown now works properly.
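The shutdown ordering can be expressed with a drop-in of this shape (illustrative only; the exact unit names, paths, and contents are those generated by the patched init script and are not reproduced here). Note that a 'Before=' ordering at startup is reversed at shutdown, so the ceph scope stops after kubelet and containerd:

```ini
# Illustrative drop-in for a transient ceph scope
[Unit]
# Reverse order applies on shutdown: kubelet and containerd stop
# first, then this ceph scope.
Before=kubelet.service containerd.service
```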

TestPlan:
PASS: build-pkgs
PASS: build-image
PASS: Jenkins installation
PASS: kubectl create -f ceph-fuse.yaml
PASS: After checking the pod is running with 'kubectl get pods',
      execute the command "sudo shutdown -hP now"
PASS: The shutdown works well without an OS reboot.

The yaml file is as follows:
$cat ceph-fuse.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rwx-test-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrx-centos
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      run: centos
  template:
    metadata:
      labels:
        run: centos
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: run
                operator: In
                values:
                - centos
            topologyKey: kubernetes.io/hostname
      containers:
      - name: centos
        image: centos/tools
        imagePullPolicy: IfNotPresent
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do dd if=/dev/zero of=/mnt1/aaa bs=1K count=100 && sleep 1; done;" ]
        volumeMounts:
        - name: pvc1
          mountPath: "/mnt1"
      restartPolicy: Always
      volumes:
      - name: pvc1
        persistentVolumeClaim:
          claimName: rwx-test-claim

Closes-Bug: 2011610

Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I2c093c490ba177fbfc816e44dc227890270cac83
2023-03-14 17:13:52 +00:00
Zuul ab3a42c6db Merge "Update debian package versions to use git commits" 2023-03-13 20:14:15 +00:00
Hediberto Cavalcante da Silva b629db6b9f AIO-DX Ceph Optimizations
This change is part of the solution to resolve the scenario where
Ceph MON starts without having data in sync when there is no
communication with the peer, leading to PG issues.

Improvements:

Removed starting Ceph MON and MDS from ceph.sh script called by
mtcClient for AIO-DX:
- Ceph MDS was not being managed, only started by ceph.sh
  script called from mtcClient. Now it will be managed by PMON.
- Ceph MON will continue to be managed by SM.

Ceph-init-wrapper script will verify some conditions to start
Ceph MON safely:
- First, check if drbd-cephmon role is Primary.
- Then, check if drbd-cephmon partition is mounted correctly.
- Check flags (inside the drbd-cephmon path) for the last active
Ceph MON process (Controller-0 or Controller-1). This flag will be
created by the last successful Ceph MON start.
- If the last active monitor is the other one, check if
drbd-cephmon is UpToDate/UpToDate, meaning that data is synchronized
between controllers.

We also made some improvements to the /etc/init.d/ceph script so it
can stop a Ceph OSD even if no Ceph MON is available. Stopping an OSD
without a Ceph Monitor used to hang, because the command to flush the
journal would wait forever to reach an available Ceph Monitor.

Test Plan:
    PASS: system host-swact.
    PASS: Ceph recovery after mgmt network outage for few minutes
          even when rebooting controllers.
    PASS: Ceph recovery after rebooting active controller.
    PASS: Ceph recovery after case of dead office recovery (DOR).
    PASS: Running shellcheck on ceph-base.ceph.init, ceph.sh,
          and ceph-init-wrapper.sh files without any complaints
          about the lines related to the changes.

Closes-bug: 2004183

Signed-off-by: Hediberto Cavalcante da Silva <hediberto.cavalcantedasilva@windriver.com>
Change-Id: Id09432aecef68b39adabf633c74545f2efa02e99
2023-03-06 12:44:00 -05:00
Pedro Vinícius Silva da Cruz 4bc42115a6 Update debian package versions to use git commits
The Debian packaging has been changed to reflect all the
git commits under the directory, and not just the commits
to the metadata folder.

The package builder appends the number of revisions added since
the one defined in BASE_SRCREV to the name of the generated .deb
file. Thus, any new submission to the code in these directories
increments this revision counter in the filename.

Test Plan:
PASS: Created new code submissions to package ceph and verify that
the version in the package name was incremented.
PASS: Created new code submissions to package parted and verify that
the version in the package name was incremented.
PASS: Created new code submissions to package trident-installer and
verify that the version in the package name was incremented.

Story: 2010550
Task: 47490

Signed-off-by: Pedro Vinícius Silva da Cruz <pedro.silvadacruz@windriver.com>
Change-Id: I16f834fd77f0abafaed39a0d4e6cd78d35fa4b98
2023-03-01 14:30:15 -05:00
Felipe Sanches Zanoni 08a571dc86 Enable ceph init script to use already mounted osd filesystem
The Ceph initialization script /etc/init.d/ceph was failing to start
an osd when the osd disk was already mounted and the umount failed
because the disk was in use.

The script has an umount command that fails if the partition is in
use, and the subsequent mount command then fails returning 32.
If the error is that the partition is already mounted, look for the
'already mounted on ${fs_path}' text in the output, ignore the mount
error by returning success, and continue the start script.

An example of error text output:
 === osd.0 ===
 Mounting xfs on controller-0:/var/lib/ceph/osd/ceph-0
 umount: /var/lib/ceph/osd/ceph-0: target is busy.
 mount: /var/lib/ceph/osd/ceph-0: /dev/nvme2n1p1 already mounted
   on /var/lib/ceph/osd/ceph-0.
 failed: 'modprobe xfs ; egrep -q '^[^ ]+ /var/lib/ceph/osd/ceph-0 '
   /proc/mounts && umount /var/lib/ceph/osd/ceph-0 ;
   mount -t xfs -o rw,noatime,inode64,logbufs=8,logbsize=256k
   /dev/disk/by-path/pci-0000:11:00.0-nvme-1-part1
   /var/lib/ceph/osd/ceph-0'
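The check can be sketched as a predicate over the mount error output (a sketch; the actual change is in the /etc/init.d/ceph shell script, and the real output may wrap across lines, which the script must tolerate):

```python
def mount_error_ignorable(output, fs_path):
    """A failed mount is treated as success only when the output says
    the partition is already mounted at the expected OSD path."""
    return "already mounted on {}".format(fs_path) in output

msg = ("mount: /var/lib/ceph/osd/ceph-0: /dev/nvme2n1p1 "
       "already mounted on /var/lib/ceph/osd/ceph-0.")
print(mount_error_ignorable(msg, "/var/lib/ceph/osd/ceph-0"))  # ignorable
print(mount_error_ignorable(msg, "/some/other/path"))          # real failure
```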

Test-Plan:
  PASS: Validate the new script with partition already mounted
   on right location in AIO-SX and AIO-DX.
  PASS: Validate the new script with partition already mounted
   but on a different location in AIO-SX and AIO-DX.
  PASS: Validate the new script with partition not mounted in
   AIO-SX and AIO-DX.

Closes-bug: 1999826

Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I6f0c1a3c2742de62040a690dd3d65785bdc1de73
2023-01-20 17:06:47 +00:00
Zuul 7cff0c4d7f Merge "ceph-manage-journal: add support for mpath device" 2022-05-24 15:58:07 +00:00
Jackie Huang f00e55b736 ceph-manage-journal: add support for mpath device
* Add the missing 's' to fix the syntax error:

  File "/usr/sbin/ceph-manage-journal", line 200, in mount_data_partition
    print("Failed to mount %(node)s to %(path), aborting" % params)
ValueError: unsupported format character ',' (0x2c) at index 35

* Add a function to find mpath node in /dev/mapper
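The format-string bug quoted in the traceback above can be reproduced directly; the trailing 's' conversion is what the fix adds (the params values here are illustrative):

```python
params = {"node": "/dev/mapper/mpatha-part1",
          "path": "/var/lib/ceph/osd/ceph-0"}

# Before the fix: '%(path)' has no conversion character, so the ','
# after it is read as an unsupported format character -> ValueError.
try:
    "Failed to mount %(node)s to %(path), aborting" % params
    raised = False
except ValueError:
    raised = True

# After the fix: both placeholders end in 's'.
fixed = "Failed to mount %(node)s to %(path)s, aborting" % params
print(raised)
print(fixed)
```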

Test Plan:

PASS: AIO-SX with Ceph, 1 osd
PASS: AIO-SX with Ceph, 2 osd
PASS: AIO-SX with Ceph, 4 osd

Story: 2010046
Task: 45427

Signed-off-by: Jackie Huang <jackie.huang@windriver.com>
Signed-off-by: Thiago Miranda <ThiagoOliveira.Miranda@windriver.com>
Change-Id: I08f1f226343bf0140abb1ec8825533abb3f57e43
2022-05-24 12:40:41 +00:00
Andrei Suciu 0b3bdc6f66 Debian: replace ceph workarounds
Description:
- replace library path
- change call for getting stack trace
- update ownership for /var/lib/ceph
- remove ceph user creation

Test Plan:
PASSED: build packages and image /Debian
PASSED: bootstrap and unlock /Debian
PASSED: checked for failed processes /Debian
PASSED: check system application-list /Debian
PASSED: checked for ceph alarms /Debian
PASSED: checked ceph status, puppet logs /Debian
PASSED: checked ceph and system application-list status after
unlock /CentOS

Story: 2009965
Task: 45438

Change-Id: If864d288e5b63928f18a5b31551b4cd479b00fe8
2022-05-24 12:31:47 +00:00
Dan Voiculeasa 5bcfd552de debian: Fix ceph lsb script
This work is part of Debian integration effort.
This work only affect Debian. We can port this to CentOS without
issues.

This prevents Maintenance check for /etc/services.d/controller/ceph.sh
from successfully completing after unlock, which results in a reboot.

Debian uses /lib/lsb/init-functions vs CentOS /etc/init.d/functions.
init-functions calls hooks from /lib/lsb/init-functions.d/.
One of the hooks redirect the lsb script call to a systemctl call.
Systemctl calls for ceph service don't work on CentOS or Debian.
There is no sourcing of /etc/init.d/functions so we don't need it for
/lib/lsb/init-functions either.

Using the reasoning above drop sourcing of /lib/lsb/init-functions.

Tests on AIO-SX:
CentOS: not affected, skip
Debian:
PASS: live patch controller, unlock, no unwanted reboot initiated by
Maintenance
PASS: build-pkgs, extract contents and check /etc/init.d/ceph

Story: 2009101
Task: 44791
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: I49b79e78b0f832096dca98ca2cfd68c454679b95
2022-03-16 15:42:51 +02:00
Yue Tao 4a709349a9 meta_data.yaml: add sha256sum checksum
Test Plan:
Pass: Verify sha256sum checksum via "download -s"

Story: 2008846
Task: 44578

Signed-off-by: Yue Tao <Yue.Tao@windriver.com>
Change-Id: I78d9dff2af0afb18c6db4e8d2d39ef79b5cf5864
2022-03-03 14:30:40 +08:00
Leonardo Fagundes Luz Serrano 83065c5298 Add debian package for Ceph
Add debian packaging infrastructure for
integ/ceph to build a debian package.

Test Plan: build-pkg; build-image; same contents as RPM

PASS build-pkg
PASS build-image
PASS same contents and permissions as RPM

Attention:

In order to avoid memory issues during the build,
please do one of the following:

- Developers with only 32G RAM will need to
temporarily unmount /var/lib/sbuild/build
so that the build system uses the disk instead of tmpfs

OR

- update /etc/fstab to set the size for
the sbuild tmpfs filesystem in the pkgbuilder container:

tmpfs /var/lib/sbuild/build tmpfs uid=sbuild,gid=sbuild,mode=2770,size=40G 0 0

Note:
Build times can be long. In order to accelerate it,
adjust the values of MINIKUBECPUS/MINIKUBEMEMORY
in import-stx file (tools repo) before building
the containers with stx-init-env.

Depends-On: https://review.opendev.org/c/starlingx/tools/+/827884

Story: 2009101
Task: 44304

Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com>
Change-Id: Idc8ee1ebac5c973622c1c599f4a04c001bfa89a6
2022-02-11 17:19:41 +00:00
Zuul 459541141c Merge "Enable generation of Ceph's Python 3 packages" 2022-01-21 00:26:05 +00:00
Felipe Sanches Zanoni 94b8a78799 Ceph build failure
Ceph fails to build after the nspr library update.

The nspr library is used only for library tests.
To fix this and prevent it from happening again, the tests
are no longer compiled.

Test Plan:
    PASS: Compile master branch without build-avoidance and
          verify it finishes with no errors.

Closes-Bug: 1958560
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I74046f1e76b242655f86c71354248f1bcb9ff76a
2022-01-20 14:52:12 -05:00
Delfino Curado 563c59599d Enable generation of Ceph's Python 3 packages
Changed ceph.spec to enable the generation of python 3 packages.
It's important to highlight that the python 2 packages will continue
to be generated, and they are the ones used in the StarlingX
installation.

The python 3 packages will initially be present only in stx-base-image.

There is also a clean-up in centos_tarball-dl.lst of commented lines
for ceph submodules that were updated.

Test plan:
Complete build run
Starlingx installation
stx-openstack apply - check that the helm chart can create ceph pools

Depends-On: https://review.opendev.org/c/starlingx/tools/+/824575
Story: 2009074
Task: 44281

Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I52dac30849a7072b80cad388b16d2b50ea22391a
2022-01-13 11:01:05 -05:00
Felipe Sanches Zanoni b0b59243b2 Ceph mgr-restful-plugin has new server_port config location
Ceph mgr-restful-plugin was running ceph-mgr on port 8003 instead of
port 7999.

The problem was that mgr-restful-plugin was configuring the server
port at the mgr/restful/server_port key used in Mimic.
This key has changed to config/mgr/mgr/restful/server_port in
Nautilus.

Test Plan:
 - Tested on AIO-SX using netstat to check the port and curl to get
data using port 7999.

Story: 2009074
Task: 44160

Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: Ib534089bd30c5b1e2c7db98bbd2f495b1545f420
2021-12-09 21:23:36 +00:00
Felipe Sanches Zanoni 205b6e48b2 Fix mgr-restful-plugin not running correctly
After upgrading to ceph nautilus, the mgr-restful-plugin log shows a
message of command failure when running 'ceph config-key get
config/mgr/restful/controller-0/crt'.

This happens on both controllers and can lead to spotty access by
components that need REST API access.

Changing the path to the certificate from
'config/mgr/restful/controller-0/crt' to
'config/mgr/mgr/restful/controller-0/crt', and the path to the key
from 'config/mgr/restful/controller-0/key' to
'config/mgr/mgr/restful/controller-0/key', fixed the problem.

Test plan:
 - Tested on AIO-DX

Story: 2009074
Task: 44100
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: Ifb0d3c7b8b3669472ef3b579951b9850fdf4bbbc
2021-11-30 20:03:45 +00:00
Delfino Curado a869978f09 Updating ceph build_srpm.data
Updating TIS_BASE_SRCREV to reflect the source rev of branch 14.2.22
of stx-ceph.

Updating TIS_PATCH_VER to account for the 43 previous packaging
changes that went in with Mimic.

Test plan:
 - Build ceph package and check the package name

Story: 2009074
Task: 44013

Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I6e51dedd62e851c4716bc27812a447d08694ed46
2021-11-19 17:45:59 -05:00
Delfino Curado 6db6fe5bbd Change ceph-mon configuration
Disable by default the warnings about monitors allowing insecure
global_id reclaim, and set "auth allow insecure global id reclaim"
to true by default for all monitors. The main goal here is to
enable a mixed set of ceph versions.

A next step is to allow the user, through service parameters, to
mix in non-compliant ceph clients installed by other applications.

Gdisk was added again, as it is necessary for StarlingX.

Test plan:

PASS: Build successfully
PASS: Install on AIO-SX, AIO-DX, Standard and Storage configs
successfully and without alarms (fm alarm-list) or ceph warnings
(ceph -s).
PASS: platform-integ-apps is applied successfully

Story: 2009074
Task: 43464

Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: I5f3e432444b60ab73136431bb94bb6ab532ae0ab
2021-10-26 18:47:36 -04:00
Delfino Curado 0b038dae3c Add ceph-disk to build
This needs to be done because we want to keep compatibility with
puppet-ceph-2.4.1-1 and our current version of puppet.

As this version of puppet-ceph only uses ceph-disk, we will keep it
until we are able to move to ceph-volume. This will probably be
possible when StarlingX is using version 3.1.1 of puppet-ceph.

Test plan:

PASS: Build successfully

Story: 2009074
Task: 43465

Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: Ie9570f01728df28ee4ea357b1e618c5a4c0a3803
2021-10-26 17:11:56 -04:00
Delfino Curado d92e321f71 Integrate ceph version 14 in StarlingX build
Add the upgraded submodules as dependencies in the
centos_tarball-dl.lst file. It's important to highlight that dpdk is
added twice because seastar and SPDK depend on different versions of
dpdk.

    * boost_1_72_0.tar.bz2
    * c-ares-fd6124c74da0801f23f9d324559d8b66fb83f533.tar.gz
    * civetweb-bb99e93da00c3fe8c6b6a98520fb17cf64710ce7.tar.gz
    * dmclock-4496dbc6515db96e08660ac38883329c5009f3e9.tar.gz
    * dpdk-96fae0e24c9088d9690c38098b25646f861a664b.tar.gz
    * dpdk-a1774652fbbb1fe7c0ff392d5e66de60a0154df6.tar.gz
    * fmt-80021e25971e44bb6a6d187c0dac8a1823436d80.tar.gz
    * intel-ipsec-mb-134c90c912ea9376460e9d949bb1319a83a9d839.tar.gz
    * rocksdb-4c736f177851cbf9fb7a6790282306ffac5065f8.tar.gz
    * seastar-0cf6aa6b28d69210b271489c0778f226cde0f459.tar.gz
    * spawn-5f4742f647a5a33b9467f648a3968b3cd0a681ee.tar.gz
    * spdk-fd292c568f72187e172b98074d7ccab362dae348.tar.gz
    * zstd-b706286adbba780006a47ef92df0ad7a785666b6.tar.gz

Merged the changes of ceph 14 spec in this repo. For now python3
is disabled by default for StarlingX build but this will probably
change in the future. For python3 build to work, more dependencies
will be needed.

Test plan:

PASS: Build successfully

Depends-On: https://review.opendev.org/c/starlingx/tools/+/814591
Story: 2009074
Task: 42946
Signed-off-by: Delfino Curado <delfinogomes.curadofilho@windriver.com>
Change-Id: Iab9d0b57b00da4ba595d2b2f24194f058c850f5b
2021-10-26 17:09:26 -04:00
Charles Short 0acb956dce Fix python3 incompatibility
- socket requires bytes, so we need to explicitly convert str to bytes.
- check_output() returns bytes under python3, while python2 returns
  str; passing universal_newlines=True makes it return str no matter
  which python version is used.
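The difference can be seen with a trivial command (a sketch run under python3):

```python
import subprocess

# Default: bytes under python3.
raw = subprocess.check_output(["echo", "hello"])

# With universal_newlines=True: str on both python2 and python3,
# so downstream string handling keeps working unchanged.
text = subprocess.check_output(["echo", "hello"], universal_newlines=True)

print(type(raw).__name__, type(text).__name__)
```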

Story: 2006796
Task: 42297

Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: Ie3921c4ae6211a8b0d290bdbdb195ce07036afbc
(cherry picked from commit 26c16b3eb8)
2021-07-26 14:35:12 -04:00
Zuul 0f497f800e Merge "On AIO-DX only start Ceph MON and MDS via MTC" 2021-06-29 20:19:03 +00:00
Pedro Henrique Linhares 12d564b37d On AIO-DX only start Ceph MON and MDS via MTC
Defer the start of Ceph OSDs to SM in order to avoid a race condition
between MTC and SM when starting OSDs. This is only required for
AIO-DX, where SM manages the floating monitor and OSDs.

Closes-Bug: 1932351
Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Change-Id: Ia718ae696d8158e63660ee54d226271a6bcb476e
2021-06-29 13:26:30 -04:00
Charles Short 3cec8b6ac9 Address python3 string issues with subprocess
This patch updates our Popen calls to enable universal
newlines for calls whose output we parse or consume.
Without universal_newlines=True, the output is treated as bytes
under python3 which leads to issues later where we are using it as
strings.

See https://docs.python.org/3/glossary.html#term-universal-newlines

Story: 2006796
Task: 42696

Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: I9b93907c05486b1f76aebe181af812c243285d6a
2021-06-25 12:19:10 -04:00
Mihnea Saracin 3225570530 Execute once the ceph services script on AIO
The MTC client manages ceph services via ceph.sh which
is installed on all node types in
/etc/service.d/{controller,worker,storage}/ceph.sh

Since the AIO controllers have both controller and worker
personalities, the MTC client will execute the ceph script
twice (/etc/service.d/worker/ceph.sh,
/etc/service.d/controller/ceph.sh).
This behavior will generate some issues.

We fix this by exiting the ceph script if it is the one from
/etc/services.d/worker on AIO systems.
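The guard can be sketched as a predicate (illustrative Python; the actual change is an early exit in the ceph.sh shell script):

```python
def should_run_ceph_script(script_path, is_aio):
    """On AIO systems, skip the worker copy of ceph.sh so the MTC
    client effectively runs the script only once."""
    if is_aio and "/worker/" in script_path:
        return False
    return True

print(should_run_ceph_script("/etc/services.d/worker/ceph.sh", True))
print(should_run_ceph_script("/etc/services.d/controller/ceph.sh", True))
```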

Closes-Bug: 1928934
Change-Id: I3e4dc313cc3764f870b8f6c640a6033822639926
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
2021-05-20 18:08:47 +03:00
Robert Church 46d8d8fdf1 Add conditions to when RBD devices are unmounted
ceph-preshutdown.sh is called as a post operation when docker is
stopped/restarted. Based on current service dependencies, when docker is
restarted this will also trigger a restart of containerd.

Puppet manifests will restart containerd and docker for various
operations both on system boot and during runtime operations when their
configuration has changed.

This update adds conditions to ensure that the RBD devices are only
unmounted when the system is shutting down. This prevents the
RBD-backed persistent volumes from being forcibly removed from
running pods and remounted read-only during these restart scenarios.

Change-Id: I7adfddf135debcc8bcaa1f93866e1a276b554c88
Closes-Bug: #1901449
Signed-off-by: Robert Church <robert.church@windriver.com>
2020-12-14 19:04:31 -05:00
Dongqi Chen af359d4938 Add auto-versioning to starlingx/integ packages
This update makes use of the PKG_GITREVCOUNT variable
to auto-version the packages in this repo.

Story: 2007750
Task: 39951
Change-Id: I854419c922b9db4edbbf6f1e987a982ec2ec7b59
Signed-off-by: Dongqi Chen <chen.dq@neusoft.com>
2020-06-24 09:48:28 +08:00
Zuul 502e80c7fa Merge "Change ceph manager port" 2020-04-15 14:27:21 +00:00
Dan Voiculeasa e7bbd7e7b1 Change ceph manager port
Free port 5001 to be used by keystone.

Story: 2007347
Task: 39392

Change-Id: Id789591bf22931494e970aaf3b12e9e5cbe223fa
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
2020-04-14 10:55:44 +03:00
Paul Vaduva bed7388b67 Release FDs when stuck peering recovery
During stuck peering recovery, if file descriptors are
not released, the state machine does not advance to the
OPERATIONAL state.

Partial-bug: 1856064

Change-Id: I3fba7be661ebf223eac63608574323ad98d33b75
Signed-off-by: Paul Vaduva <Paul.Vaduva@windriver.com>
2020-03-11 08:11:51 -04:00
Dan Voiculeasa 11fd5d9cd4 ceph-init-wrapper: Detect stuck peering OSDs and restart them
OSDs might become stuck peering.
Recover from such state.

Closes-bug: 1851287

Change-Id: I2ef1a0e93d38c3d041ee0c5c1e66a4ac42785a68
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
2019-11-25 09:37:48 +00:00
Zuul d51e846143 Merge "ceph: mgr-restful-plugin set ceph-mgr config file path" 2019-09-11 18:16:56 +00:00
Zuul bc4877e5bb Merge "ceph: mgr restful plugin set certificate to match host name" 2019-09-11 16:35:05 +00:00
Daniel Badea edc7f8495d ceph: mgr-restful-plugin set ceph-mgr config file path
Explicitly set ceph-mgr configuration file path to
/etc/ceph/ceph.conf to avoid surprises. ceph-mon
and ceph-osd are also started with '-c' (--conf)
pointing to /etc/ceph/ceph.conf.

Change-Id: I4915952f17b4d96a8fce3b4b96335693f9b6c76b
Closes-bug: 1843082
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>
2019-09-11 16:30:06 +00:00
Zuul 4b6a275e4f Merge "ceph-init-wrapper use flock instead of flag files" 2019-09-09 19:34:31 +00:00
Daniel Badea fcaa49ecaf ceph: mgr restful plugin set certificate to match host name
python-cephclient certificate validation fails when connecting
to the ceph-mgr restful plugin because the server URL doesn't match
the CommonName (CN) or SubjectAltName (SAN).

Setting CN to match the server hostname fixes this issue but
raises a warning caused by the missing SAN.

Using CN=ceph-restful and SAN=<hostname> fixes the issue
and clears the warning.

Change-Id: I6e8ca93c7b51546d134a6eb221c282961ba50afa
Closes-bug: 1828470
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>
2019-09-09 06:53:58 +00:00
Scott Little 062ec89dbb Relocated some packages to repo 'utilities'
List of relocated subdirectories:

ceph/ceph-manager
ceph/python-cephclient
filesystem/nfscheck
logging/logmgmt
security/tpm2-openssl-engine
security/wrs-ssl
tools/collector
tools/engtools/hostdata-collectors
utilities/build-info
utilities/namespace-utils
utilities/pci-irq-affinity-agent
utilities/platform-util
utilities/tis-extensions
utilities/update-motd

Story: 2006166
Task: 35687
Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3
Change-Id: I2bf543a235507a4eff644a7feabd646a99d1474f
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: I85dda6d09028f57c1fb0f96e4bcd73ab9b9550be
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-09-05 20:31:36 -04:00
Daniel Badea 9faad45703 ceph-init-wrapper use flock instead of flag files
When a swact occurs and ceph-init-wrapper is slow to respond
to a status request, it gets killed by SM. This means the
corresponding flag file that marks a status in progress is left
behind.

When the controller swacts back, ceph-init-wrapper sees a status
in progress and waits for it to finish (with a timeout).
Because it does not respond fast enough, SM tries to start
ceph-init-wrapper again to get the ceph-mon service up and running.

This happens a couple of times until the service is declared
failed and the controller swacts back.

To fix this, we use flock instead of flag files, as the
locks are automatically released by the OS when the process
is killed.
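A sketch of the flock approach (standard fcntl.flock semantics; file names are illustrative, not the wrapper's real lock files):

```python
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "ceph-status.lock")

# The "status in progress" holder takes an exclusive lock.
holder = open(lock_path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)

# While the holder is alive, a second opener cannot take the lock...
probe = open(lock_path, "w")
try:
    fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
    busy = False
except BlockingIOError:
    busy = True

# ...but closing the holder's descriptor (which also happens when its
# process is killed) releases the lock with no stale state left behind.
holder.close()
fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)  # now succeeds
probe.close()
print(busy)
```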

Change-Id: If1912e8575258a4f79321d8435c8ae1b96b78b98
Closes-bug: 1840176
Signed-off-by: Daniel Badea <daniel.badea@windriver.com>
2019-08-27 14:53:32 +00:00