Update SM lsb script for quick start

After the SM main process terminates, the pidof command still returns
the ID of the SM subprocess until that subprocess exits. The start
action treats this as a sign that SM is already running (a false
positive) and skips starting it.

Change the SM lsb script to detect when only the subprocess ID is
returned, and kill that subprocess to speed up SM recovery.
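The distinction between "only the subprocess is left" and "SM is running" can be sketched in bash as below. This is a minimal sketch, not the script's actual code: the helper name sm_pid_state is hypothetical, and it assumes (as the script's own comment says) that a running SM reports two PIDs while a single PID means only the subprocess remains. Counting array elements is used here in place of the grep regex matching in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the PID list printed by `pidof /usr/bin/sm`.
# Assumption (from the script comment): a running SM reports two PIDs,
# the main process and its subprocess; one PID means only the
# subprocess is left after the main process died.
sm_pid_state() {
    local -a pids
    # Split the whitespace-separated PID list into an array.
    read -r -a pids <<< "$1"
    case ${#pids[@]} in
        0) echo "stopped" ;;          # nothing running; safe to start
        1) echo "subprocess-only" ;;  # the false positive for a plain pidof check
        2) echo "running" ;;          # main process and subprocess present
        *) echo "unknown" ;;
    esac
}
```

A caller would pass the live PID list, for example sm_pid_state "$(pidof /usr/bin/sm)".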

Revert the earlier change that extended startuptime to 15 seconds; set it back to 5.

Test Cases:
    Kill the SM process and observe that SM starts immediately after the
    subprocess is killed. SM recovers within 2 seconds (measured from the
    last log entry of the old SM to the first log entry of the new SM).
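The recovery time in the test case is the difference between two log timestamps. A minimal sketch, assuming GNU date and timestamps in a "YYYY-MM-DD HH:MM:SS[.fff]" form (the actual SM log format is not shown on this page); recovery_seconds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds elapsed between two timestamps, e.g. the
# last log line of the killed SM and the first log line of the new SM.
# Requires GNU date (for -d and %N). Both times are parsed as UTC so
# that only their difference matters.
recovery_seconds() {
    local t_last t_first
    t_last=$(date -u -d "$1" +%s.%N) || return 1
    t_first=$(date -u -d "$2" +%s.%N) || return 1
    # Shell arithmetic is integer-only, so let awk do the subtraction.
    awk -v a="$t_last" -v b="$t_first" 'BEGIN { printf "%.3f\n", b - a }'
}
```

For example, recovery_seconds "2022-12-02 17:40:10.250" "2022-12-02 17:40:11.750" prints 1.500.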

Change-Id: Ida834e7dd31a493ee6193b4d8ee73ebd97513de2
Closes-Bug: 1998349
Signed-off-by: Bin Qian <bin.qian@windriver.com>
Bin Qian 2022-12-02 17:38:23 -05:00
parent 8b5ee400b5
commit 88aeba251b
2 changed files with 40 additions and 7 deletions


@@ -55,19 +55,52 @@ case "$1" in
fi
echo -n "Starting ${SM_NAME}: "
if [ -n "`pidof ${SM}`" ]
then
# PMOND might have restarted SM already.
RETVAL=0
else
c=0
p=$(pidof ${SM})
# pidof /usr/bin/sm returns 2 pids. When the SM main process is killed,
# the subprocess id is still returned until the subprocess goes away,
# and calling start-stop-daemon --start too early will fail.
#
# Add a loop below to wait up to 10 seconds for the subprocess to finish.
# The subprocess terminates by itself in around 5 seconds; try killing it.
# A slightly longer wait for the main process to go away is not a concern:
# if SM is actually running (possibly started by pmon or systemd),
# it is already functioning.
c=0
p=$(pidof ${SM})
while [[ $c -lt 10 && ${p} ]]
do
logger "SM waiting ${p}"
if [[ $(echo ${p} | grep "^[0-9]*$") ]]; then
# only subprocess running, try killing it
kill -9 ${p}
fi
sleep 1
c=$(( ${c} + 1 ))
p=$(pidof ${SM})
done
running=0
if [[ "${c}" == "10" ]]; then
if [[ $(echo ${p} | grep "^[0-9]*$") ]]; then
kill -9 ${p}
elif [[ $(echo ${p} | grep "^[0-9]* [0-9]*$") ]]; then
running=1
logger "pidof ${SM} still running."
RETVAL=0
fi
fi
if [[ "${running}" == "0" ]]; then
start-stop-daemon --start -b -x ${SM} -- ${sm_args}
RETVAL=$?
fi
if [ ${RETVAL} -eq 0 ]
then
echo "OK"
else
echo "FAIL"
echo "FAIL ${RETVAL}"
RETVAL=1
fi
;;
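The wait-and-kill logic in the start action could also be factored into a function. The sketch below is hypothetical (wait_for_sm_exit is not part of this commit) and keeps the same two-PID assumption; it returns 0 when the caller should run start-stop-daemon and 1 when SM still appears to be running, mirroring the running flag above. The logger calls are omitted.

```shell
#!/usr/bin/env bash
# Hypothetical refactoring of the start action's wait loop.
# wait_for_sm_exit NAME MAX_TRIES
#   Polls `pidof NAME` about once per second, up to MAX_TRIES times.
#   A single PID is assumed to be the leftover subprocess and is killed.
#   Returns 0 if the caller should start SM, 1 if two PIDs remain
#   (SM assumed to be running already). Logging is omitted.
wait_for_sm_exit() {
    local name=$1 max_tries=$2 tries=0
    local -a pids
    read -r -a pids <<< "$(pidof "$name" 2>/dev/null)"
    while (( tries < max_tries && ${#pids[@]} > 0 )); do
        if (( ${#pids[@]} == 1 )); then
            kill -9 "${pids[0]}" 2>/dev/null || true   # only the subprocess is left
        fi
        sleep 1
        tries=$(( tries + 1 ))
        read -r -a pids <<< "$(pidof "$name" 2>/dev/null)"
    done
    case ${#pids[@]} in
        1) kill -9 "${pids[0]}" 2>/dev/null || true; return 0 ;;  # timed out on the subprocess
        2) return 1 ;;                                            # main process still up
        *) return 0 ;;                                            # none left (or unexpected count)
    esac
}
```

A caller might then write: if wait_for_sm_exit "${SM}" 10; then start-stop-daemon --start -b -x ${SM} -- ${sm_args}; fi. Returning a status instead of setting a global flag keeps the start action's control flow in one if statement.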


@@ -10,7 +10,7 @@ script = /etc/init.d/sm
style = lsb ; lsb
severity = critical ; minor, major, critical
restarts = 3 ; restarts before error assertion
-startuptime = 15 ; seconds to wait after process start
+startuptime = 5 ; seconds to wait after process start
interval = 5 ; number of seconds to wait between restarts
debounce = 20 ; number of seconds to wait before degrade clear
quorum = 1 ; process is in the host watchdog quorum
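To check which startuptime value a host actually picked up, a small helper can read one key from a conf file written in this "key = value ; comment" layout. This is a sketch under assumptions: the function name conf_value is hypothetical, and the file path is left to the caller because this page does not show the file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a conf file whose
# lines look like "key = value ; comment". Prints nothing if KEY is absent.
conf_value() {
    local file=$1 key=$2
    awk -F'=' -v k="$key" '
        {
            name = $1
            gsub(/[ \t]+/, "", name)              # trim whitespace around the key
            if (name == k) {
                val = $2
                sub(/;.*/, "", val)               # drop the trailing "; comment"
                gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim whitespace around the value
                print val
                exit
            }
        }' "$file"
}
```

For example, conf_value "$CONF_FILE" startuptime (with CONF_FILE pointing at the process conf file) would print 5 after this change.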