On AIO deployments, puppet is run twice, with two different sets of
manifests:
1. 'controller': to configure controller services
2. 'worker': to configure worker services
Ceph is configured when the 'controller' manifests are applied, so
there is no need to configure it again when the 'worker' set is applied.
This commit adds new puppet classes that encapsulate ceph configuration
based on node personality, and adds a check so that it is not applied
a second time on controllers.
If the ceph manifests are executed a second time, we hit a race
between SM's process monitoring and the 'worker' puppet manifests,
which trigger a restart of ceph-mon as part of reconfiguration.
After a reboot on AIO, SM takes control of ceph-mon monitoring once
the 'controller' puppet manifests finish applying. As part of this,
SM listens for process death notifications, taking the pid from the
.pid file, and periodically executes '/etc/init.d/ceph status
mon.controller' for more advanced monitoring.
When the 'worker' manifests are executed, they trigger a restart of
ceph-mon through '/etc/init.d/ceph restart', which has two steps:
'stop', in which ceph-mon is stopped, and 'start', in which it is
started again.
In the first step, stopping ceph-mon leads to the death of the
ceph-mon process and the removal of its pid file. SM promptly detects
this and immediately triggers a start of ceph-mon, which creates a
new pid file. The problem is that the init script is still in the
middle of the restart, and at the end of its 'stop' step it cleans up
the new pid file instead of the old one.
This leads to controllers swacting a couple of times before the system
gets rid of the rogue process.
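The sequence above can be replayed with a small, deterministic toy
model. This is only an illustrative sketch, not code from SM or from
the ceph init script: the Node class, replay_stop_step() and the pid
numbers are hypothetical, and the model assumes a "rogue" process is
any ceph-mon still running with no pid file pointing at it. Only the
'stop' step is modelled.

```python
class Node:
    """Toy model of one controller: live ceph-mon pids plus the pid file."""

    def __init__(self):
        self.running = set()   # pids of live ceph-mon processes
        self.pid_file = None   # pid stored in the .pid file, None if absent
        self._next_pid = 100   # arbitrary starting pid for the model

    def start_mon(self):
        """Start a ceph-mon and record its pid in the pid file."""
        pid = self._next_pid
        self._next_pid += 1
        self.running.add(pid)
        self.pid_file = pid
        return pid


def replay_stop_step(sm_reacts):
    """Replay the 'stop' half of the restart; return the rogue pids.

    sm_reacts=True models SM restarting ceph-mon as soon as it sees the
    old process die, before the init script has finished its 'stop' step.
    """
    node = Node()
    old_pid = node.start_mon()      # ceph-mon already running under SM

    node.running.discard(old_pid)   # 'stop': the old ceph-mon dies
    if sm_reacts:
        node.start_mon()            # SM starts a new one, new pid file
    node.pid_file = None            # end of 'stop': pid file cleaned up

    # Assumption: a rogue process is one that no pid file tracks.
    return {pid for pid in node.running if pid != node.pid_file}


if __name__ == "__main__":
    print(replay_stop_step(sm_reacts=True))    # {101}
    print(replay_stop_step(sm_reacts=False))   # set()
```

With sm_reacts=False, which corresponds to not re-running the ceph
manifests while SM is monitoring ceph-mon, nothing is left behind.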
Change-Id: I2a0df3bab716a553e71e322e1515bee2bb2f700d
Co-authored-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Story: 2002844
Task: 29214
Signed-off-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>