<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
padding: 10px;
}
</style>
</head>
<body>
<table>
<caption>
<font size=10>Process restart information</font>
</caption>
<thead>
<tr>
<th>Process/Service</th>
<th>Function</th>
<th>In service patchable</th>
<th>Managed by</th>
<th>Restart command</th>
<th>Patch Restart command</th>
<th>Restart dependency</th>
<th>Impact (if restarted while in operation)</th>
<th>Special handling required</th>
</tr>
</thead>
<tr>
<td><font color="blue">ceilometer-polling</font></td>
<td>Daemon that polls OpenStack services and builds meters</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/init.d/openstack-ceilometer-polling restart</b></td>
<td><b></b></td>
<td>N</td>
<td>As batch_polled_samples is set to True, some samples that are in the
pollsters' memory may be lost if the process is restarted exactly
when polling has just finished and the samples are about to be
published to RabbitMQ. This window is about 10 milliseconds for
cpu_source and roughly 0.03 milliseconds to 1 second for
meter-related sources.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">ceilometer-agent-notification</font></td>
<td>Daemon that listens for notifications on the message queue, converts
them to Events and Samples, and applies pipeline actions
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service ceilometer-agent-notification</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-agent-notification stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-agent-notification start
</td>
<td><b></b></td>
<td>N</td>
<td>May lose some samples/events if the process is restarted while they
are being transformed or converted.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">ceilometer-collector</font></td>
<td>Daemon that gathers and records event and metering data created by
notification and polling agents
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service ceilometer-collector</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-collector stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-collector start
</td>
<td><b></b></td>
<td>N</td>
<td>May lose some samples/events if the process is restarted while they
are being persisted in the Postgres DB. This is a tiny window,
especially with recent optimization work (no message signature
verification, a single call to the create_sample stored procedure).<br>
Note: Making sure that child processes and their database
connections are released when a parent process is stopped is part
of collector functionality. It is not specific to in-service
patching.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">ceilometer-api</font></td>
<td>Service to query and view data recorded by the collector</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service ceilometer-api</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-api stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/ceilometer-api start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, Horizon or CLI ceilometer requests
will fail. Horizon requests will be re-established automatically on
the next polling interval. CLI commands need to be re-issued.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">ceilometer-expirer-active</font></td>
<td>Cron job that purges expired samples and events as well as related
meter and resource data
</td>
<td>Y</td>
<td>CRON</td>
<td><b>N/A</b><br><br>
To run the expirer manually: /usr/bin/ceilometer-expirer-active
</td>
<td><b></b></td>
<td>N</td>
<td>There is no need to restart after patching. The change will take
effect the next time the expirer cron job runs.<br>
Unless there are new features specifically planned for the expirer,
this code is very stable.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">haproxy</font></td>
<td>A proxy service responsible for forwarding external REST
API requests to OpenStack and Titanium Cloud services that listen on the
internal interfaces.
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service haproxy</b><br>
which runs the following:<br><br>
/bin/sh /etc/init.d/haproxy stop<br>
/bin/sh /etc/init.d/haproxy start
</td>
<td><b>/usr/local/sbin/patch-restart-haproxy</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will fail
and new requests will get a connection error until the service is
re-enabled.
</td>
<td>Y</td>
</tr>
<tr>
<td><font color="blue">sm</font></td>
<td>Service management daemon</td>
<td>N</td>
<td>PMON</td>
<td><b>/etc/init.d/sm restart</b></td>
<td><b></b></td>
<td>N</td>
<td>Will cause all services on the active controller to be disabled
before the standby controller takes over control.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">sm-api</font></td>
<td>Daemon that provides the SM API</td>
<td>N</td>
<td>PMON</td>
<td><b></b></td>
<td><b></b></td>
<td>N</td>
<td></td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">sm-eru</font></td>
<td>Daemon that records SM ERU data</td>
<td>N</td>
<td></td>
<td><b></b></td>
<td><b></b></td>
<td>N</td>
<td></td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">sm-watchdog</font></td>
<td>Daemon that loads the NFS watchdog module to look for and handle
stalled NFS threads
</td>
<td>N</td>
<td></td>
<td><b></b></td>
<td><b></b></td>
<td>N</td>
<td></td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">neutron-server</font></td>
<td>Service that manages network functions</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service neutron-server</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/neutron-server stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/neutron-server start
</td>
<td><b>/bin/neutron-restart neutron-server</b><br/>or<br/><b>/bin/neutron-restart --all</b></td>
<td>N</td>
<td>Neutron services will be unavailable while it restarts, which
will prevent instances from being created while it is down. RPCs
from compute nodes may fail while it is restarting.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">neutron-dhcp-agent</font></td>
<td>Agent on compute node that manages DHCP servers for tenant
networks
</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/init.d/neutron-dhcp-agent restart</b></td>
<td><b>/bin/neutron-restart neutron-dhcp-agent</b><br/>or<br/><b>/bin/neutron-restart --all</b></td>
<td>N</td>
<td>Will prevent binding new DHCP servers while it is down. Requires
special handling to kill metadata haproxy processes for networks.
</td>
<td>Y</td>
</tr>
<tr>
<td><font color="blue">neutron-metadata-agent</font></td>
<td>Agent on compute node serving metadata to nodes</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/init.d/neutron-metadata-agent restart</b></td>
<td><b>/bin/neutron-restart neutron-metadata-agent</b><br/>or<br/><b>/bin/neutron-restart --all</b></td>
<td>N</td>
<td>Nodes will not be able to receive metadata while it is down</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">neutron-sriov-nic-agent</font></td>
<td>Agent on compute node responsible for setting SR-IOV port
information
</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/init.d/neutron-sriov-nic-agent restart</b></td>
<td><b>/bin/neutron-restart neutron-sriov-nic-agent</b><br/>or<br/><b>/bin/neutron-restart --all</b></td>
<td>N</td>
<td>Will not be able to set device parameters while restarting</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">neutron-bgp-dragent</font></td>
<td>BGP dynamic routing agent on controller node
</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/init.d/neutron-bgp-dragent restart</b></td>
<td><b>/bin/neutron-restart neutron-bgp-dragent</b><br/>or<br/><b>/bin/neutron-restart --all</b></td>
<td>N</td>
<td>Will not be able to set device parameters while restarting</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">keystone-all</font></td>
<td>Keystone provides identity, token management, service catalog,
and policy functionality.
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service keystone</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/keystone stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/keystone start
</td>
<td><b>/usr/local/sbin/patch-restart-processes keystone-all</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will fail
and new requests will get a connection error until the service is
re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">aodh-api</font></td>
<td>Aodh service that handles API requests for OpenStack alarming.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service aodh-api</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-api stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-api start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will fail
and new requests will get a connection error until the service is
re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">aodh-evaluator</font></td>
<td>Aodh service that performs threshold evaluation for OpenStack
alarming.
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service aodh-evaluator</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-evaluator stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-evaluator start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, no OpenStack alarm threshold
evaluations will be executed until the service is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">aodh-listener</font></td>
<td>Aodh service that generates alarms based on events.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service aodh-listener</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-listener stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-listener start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, no OpenStack event-based alarms will
be generated until the service is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">aodh-notifier</font></td>
<td>Aodh service that sends OpenStack alarm notifications.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service aodh-notifier</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-notifier stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/aodh-notifier start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, no OpenStack alarm threshold
notifications will be issued until the service is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">aodh-expirer-active</font></td>
<td>Cron job that purges expired OpenStack alarms</td>
<td>Y</td>
<td>CRON</td>
<td><b>N/A</b><br><br>
To run the expirer manually: /usr/bin/aodh-expirer-active
</td>
<td><b></b></td>
<td>N</td>
<td>There is no need to restart after patching. The change will take
effect the next time the expirer cron job runs.<br>
Unless there are new features specifically planned for the expirer,
this code is very stable.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">heat-api</font></td>
<td>Heat service that handles API requests for OpenStack orchestration.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service heat-api</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, Horizon or CLI heat requests will
fail. Horizon will re-establish automatically. CLI commands need
to be re-issued. Heat stack updates in progress may fail.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">heat-api-cfn</font></td>
<td>Heat service for AWS CloudFormation API requests.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service heat-api-cfn</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api-cfn stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api-cfn start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, CloudFormation API requests such as
autoscaling will not be processed.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">heat-api-cloudwatch</font></td>
<td>Heat service for AWS CloudWatch metric collection.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service heat-api-cloudwatch</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api-cloudwatch stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-api-cloudwatch start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, stats sent from VMs will not be
processed.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">heat-engine</font></td>
<td>Heat orchestration engine service that processes stack operations.</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service heat-engine</b><br>
which runs the following:<br><br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-engine stop<br>
/bin/sh /usr/lib/ocf/resource.d/openstack/heat-engine start
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, OpenStack heat orchestration
commands will not be processed. Stacks being created, deleted or
updated will fail and need to be re-initiated.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">heat-purge-deleted-active</font></td>
<td>Cron job that purges deleted OpenStack Heat stacks from the
database
</td>
<td>Y</td>
<td>CRON</td>
<td><b>N/A</b><br><br>
To run the purge job manually: /usr/bin/heat-purge-deleted-active
</td>
<td><b></b></td>
<td>N</td>
<td>There is no need to restart after patching. The change will take
effect the next time the cron job runs.<br>
Unless there are new features specifically planned, this code is
very stable.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">Glance</font></td>
<td>Glance imaging service - a single script restarts both glance-api
and glance-registry.
</td>
<td>Y</td>
<td>SM</td>
<td><b>/usr/bin/restart-glance</b><br>
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
continue and new requests will get a connection error until the
service is re-enabled. If the graceful restart takes more than 30
seconds, the process is killed. Timers are configurable from the
restart script.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">Cinder</font></td>
<td>Cinder volume service - a single script restarts cinder-volume,
cinder-scheduler, cinder-api and cinder-backup.
</td>
<td>Y</td>
<td>SM</td>
<td><b>/usr/bin/restart-cinder</b><br>
</td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
continue and new requests will get a connection error until the
service is re-enabled. Timers are configurable from the restart
script.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">Horizon</font></td>
<td>Horizon - the OpenStack dashboard GUI used to control OpenStack and Titanium Cloud
operations
</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart service horizon</b><br>
</td>
<td><b>/usr/bin/horizon-patching-restart</b></td>
<td>N</td>
<td>When Horizon is restarted via the patch restart command, all users
will be logged out. If they try to log back in before the server is
up again, they will see an internal server error. It usually takes
less than a minute for the service to restart.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">IO-Monitor</font></td>
<td>Daemon which monitors cinder devices and raises alarms for excessive storage IO load.</td>
<td>Y</td>
<td>PMON</td>
<td><b>pmon-restart io-monitor-manager</b></td>
<td><b>/usr/local/sbin/patch-restart-processes io-monitor-manager</b></td>
<td>N</td>
<td>Generally there should be no impact. It is very unlikely that an
excessive storage IO load will occur, and go undetected, during the
couple of seconds it takes for the io-monitor process to restart.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">vim</font></td>
<td>Virtual Infrastructure Manager</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service vim</b></td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, requests through the VIM API or
through the Nova API Proxy will fail. Any instance actions normally
triggered due to instance state changes (from nova) will not occur
until the process starts up again and audits the instance states.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">vim-api</font></td>
<td>Virtual Infrastructure Manager API</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service vim-api</b></td>
<td><b></b></td>
<td>N</td>
<td>While the service is restarting, requests through the external VIM
API will fail.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">vim-webserver</font></td>
<td>Virtual Infrastructure Manager Web Server</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service vim-webserver</b></td>
<td><b></b></td>
<td>N</td>
<td>No impact. This service is for design use only.</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-api</font></td>
<td>Nova API Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-api</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-placement-api</font></td>
<td>Nova Placement API Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-placement-api</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-conductor</font></td>
<td>Nova Conductor Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-conductor</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-scheduler</font></td>
<td>Nova Scheduler Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-scheduler</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-console-auth</font></td>
<td>Nova Console Auth Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-console-auth</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-novnc</font></td>
<td>Nova VNC Service</td>
<td>Y</td>
<td>SM</td>
<td><b>sm-restart-safe service nova-novnc</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">nova-compute</font></td>
<td>Nova Compute Service</td>
<td>Y</td>
<td>PMON</td>
<td><b>/usr/local/sbin/pmon-restart nova-compute</b><br></td>
<td><b>/bin/nova-restart</b></td>
<td>N</td>
<td>While the service is restarting, outstanding requests will
fail and new requests will get a connection error until the service
is re-enabled.
</td>
<td>N</td>
</tr>
<tr>
<td><font color="blue">ceph-osd & ceph-mon</font></td>
<td>Ceph OSD and Monitor processes</td>
<td>Y</td>
<td>PMON</td>
<td><b>/etc/ceph/ceph_pmon_wrapper.sh restart</b><br></td>
<td><b>/etc/ceph/ceph_pmon_wrapper.sh restart</b></td>
<td>N</td>
<td>Ceph processes on a node will restart (ceph-mon and ceph-osd). The restart
will take at most 30 seconds and functionality should not be affected. Note that this
command should not be executed at the same time on storage-0 and any of the
controller nodes, as restarting two of the three ceph-mon daemons at the same
time is not supported.
</td>
<td>Restarting it on controller-0, controller-1 &amp; storage-0
at the same time as glance, cinder, nova, ceph-rest-api, sysinv or ceph-manager
on the active controller should be avoided, due to a ~30 second delay to Ceph APIs.
This delay happens when any of the ceph-mon daemons changes state and may cause timeouts
when dependent services restart. Recommendations: (1) on the active controller,
restart Ceph before the other services; (2) avoid updating controller-0, controller-1 &amp;
storage-0 at the same time.</td>
</tr>
<tfoot>
<tr>
<th>Process/Service</th>
<th>Function</th>
<th>In service patchable</th>
<th>Managed by</th>
<th>Restart command</th>
<th>Patch Restart command</th>
<th>Restart dependency</th>
<th>Impact (if restarted while in operation)</th>
<th>Special handling required</th>
</tr>
</tfoot>
</table>
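<p>
The snippet below is a minimal, illustrative sketch of how the restart commands
documented in the table above might be invoked from a shell script during
in-service patching. It is not a packaged patch script: the service and process
names are examples taken from the table, and the error handling shown is an
assumption, not a requirement.
</p>
<pre>
#!/bin/bash
# Illustrative only: restart one SM-managed service and one PMON-managed
# process after patching, using the commands documented in the table above.

# Restart an SM-managed service (example: nova-api) with the documented
# sm-restart-safe command; stop on failure.
sm-restart-safe service nova-api || exit 1

# Restart a PMON-managed process (example: io-monitor-manager) with the
# documented patch restart helper.
/usr/local/sbin/patch-restart-processes io-monitor-manager || exit 1
</pre>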
</body>
</html>