BSD LICENSE Copyright(c) 2013-2016, Wind River Systems, Inc. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Wind River Systems nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ----------------------------------------------------------------------- DESCRIPTION =========== Guest Server Scaling is a service to allow a guest to scale the capacity of a single guest server up and down on demand. Current supported scaling operation is CPU scaling. The resources can be scaled up/down from the nova CLI or GUI. Scaling can also be set up via heat to be automatically triggered based on Ceilometer statistics. (This will not be covered in this document, see the full documentation and the heat SDK for how to configure heat templates for scaling a single guest server.) This package contains an agent and a number of scripts to be included in the guest image. These will handle the guest side of the coordinated efforts involved in scaling up/down guest resources. DEPENDENCIES ============ NOTE that this wrs-guest-scale SDK module has both a compile-time and run-time dependency on the wrs-server-group SDK module. This wrs-guest-scale SDK module requires that the wrs-server-group SDK tarball has been previously extracted and built, and that the resulting libraries and headers have been placed in a location that can be found by the normal build tools, or the WRS_SERVER_GROUP_DIR environment variable has been set. The output of BOTH the wrs-guest-scale SDK module and the wrs-server-group SDK module are required to be installed in a guest image for guest resource scaling. REQUIREMENTS ============ Compilation: Linux OS, x86_64 architecture gcc compiler development libraries and headers for glibc development libraries and headers for libguesthostmsg (built by the wrs-server-group SDK package) development libraries and headers for json-c VM Runtime: Linux OS, x86_64 architecture; CONFIG_HOTPLUG_CPU=y|m runtime libraries for glibc runtime libraries for libguesthostmsg "guest_agent" binary daemon (provided by the wrs-server-group SDK package) runtime libraries for json-c The code has been tested with glibc 2.15, gcc 4.6 and json-c 0.12.99 but it should run on other versions without difficulty. DELIVERABLE =========== The Guest Server Scaling service is delivered as source with the required Makefiles in a compressed tarball called "wrs-guest-scale-#.#.#.tgz", such that it can be compiled for the applicable guest linux distribution. COMPILE ======= Pre-requisite: Ensure that the wrs-server-group SDK tarball has been previously extracted and built, and that the resulting libraries and headers have been placed in a location that can be found by the normal build tools, or the WRS_SERVER_GROUP_DIR environment variable has been set. Extract the tarball contents: tar xvf wrs-guest-scale-#.#.#.tgz To compile: # Note: assumes wrs-server-group-#.#.#.tgz has already been extracted and compiled. cd wrs-guest-scale-#.#.# # If wrs-guest-scale-#.#.#.tgz and wrs-server-group-#.#.#.tgz where extracted in a common directory make # Otherwise supply the path to where wrs-server-group can be found. e.g. make WRS_SERVER_GROUP_DIR=/usr/src/wrs-server-group-#.#.# This will produce: 1) An executable "bin/guest_scale_agent". This handles the basic vCPU scaling in the guest, and calls out to a helper script if present to support application-specific customization. It should be configured to respawn (via /etc/inittab or some other process monitor) if it dies for any reason. This executable must be installed into the guest (e.g. in "/usr/sbin") and configured to run at startup as early as possible. NOTE The "guest_agent" executable from the wrs-server-group SDK package MUST ALSO be installed into the guest (e.g. in "/usr/bin"), configured to run at startup as early as possible and configured to respawn via /etc/inittab or some other process monitor (in case it dies for any reason). 2) A script "script/app_scale_helper". This is an optional script that is intended to allow for app-specific customization. If present, it must be installed in "/usr/sbin". If present, it will be called by "guest_scale_agent" when scaling in either direction. 3) A script "script/offline_cpus". This must be run later in the init sequence, after guest_scale_agent has started up but before the application has started any CPU-affined applications. A helper script "script/init_offline_cpus" has been provided, and should be installed to "/etc/init.d/offline_cpus. The "offline_cpus" script will offline vCPUs in the guest to match the status on the hypervisor. This covers the case where we are booting up with some CPUs offlined by the hypervisor. 4) For systemd users, the files "scripts/guest-scale-agent.service" and "scripts/offline-cpus.service" should be copied to /lib/systemd/system/. Note: The inclusion of the files into the build system and the guest image and the configuration of the guest startup scripts is left up to the user to allow for different build systems and init subsystems in the guest. INSTALL ======= Installing in a running VM: As the root user 1) Copy "bin/guest_scale_agent" to /usr/sbin in the VM. 2) Copy "scripts/app_scale_helper" and "scripts/offline_cpus" to /usr/sbin in the VM. 3) Copy "scripts/init_offline_cpus" to "/etc/init.d/offline_cpus". (Note the name change.) 4) Copy "scripts/guest-scale-agent.service" and "scripts/offline-cpus.service" to /lib/systemd/system/. 5) Run "systemctl enable guest-scale-agent.service", "systemctl enable offline-cpus.service", "systemctl start guest-scale-agent.service", "systemctl start offline-cpus.service" The VM should now be ready to scale up and down. USAGE ===== The service is designed to be simple to use. A basic description is given below, but more details are provided in the source and scripts. 1) Create a new flavor (or edit an existing flavor) such that the number of vCPUs in the flavor matches the desired maximum number of vCPUs. To specify the minimum number of vCPUs, create an "extra spec" metadata entry for the flavor with a key of "wrs:min_vcpus" and a value that is an integer number between one and the max number of vCPUs. This can be done from the CLI or the GUI. (In the GUI select the "Admin" tab, go to the "Flavor" navigation link, click on a flavor name, select the "Extra Specs" tab, click on "Create", select "Minimum Number of CPUs" from the pulldown, and enter the desired value.) 2) Build BOTH the wrs-server-group SDK package and this wrs-guest-scale package, and install the output of BOTH packages in an image. Lastly, ensure that the CONFIG_HOTPLUG_CPU kernel config option is set in the image kernel. 3) Boot the image. It will come up with the full set of vCPUs. 4) To reduce the number of online vCPUs in the guest server, run "nova scale cpu down" from the controller (or anywhere else you can run nova commands). This will pass a message up into the guest, where it will be handled by "guest_scale_agent". That in turn will call out to "/usr/sbin/app_scale_helper" (if it exists) which is expected to pick a vCPU to offline. This script can be modified/replaced as needed for application- specific purposes. By default, it will select the highest-numbered online vCPU. If the script isn't present or errors out, then "guest_scale_agent" will itself select the highest-numbered online vCPU as the one to be offlined. It will then tell the guest kernel to offline the selected vCPU, and will pass the selected vCPU back down to the hypervisor, which will adjust vCPU affinity so that the underlying physical CPU can be freed up for use by other VMs. At this point displaying the information for the guest server will show it using less than the maximum number of cpus. 5) To increase the number of online vCPUs, run "nova scale cpu up". Assuming the resources are available the hypervisor will allocate a physical CPU and will associate it with the guest server. "guest_scale_agent" will set the lowest-numbered offline vCPU to "online", and will pass the vCPU number to "/usr/sbin/app_scale_helper" (if it exists) for the application to do any special handling that may be required. The behaviour of a scaled-down server during various nova operations is as follows: live migration: server remains scaled-down pause/unpause: server remains scaled-down stop/start: server remains scaled-down evacuation: server remains scaled-down rebuild: server remains scaled-down automatic restart on crash: server remains scaled-down cold migration: server reverts to max vcpus resize: server reverts to max vcpus for the new flavor If a snapshot is taken of a scaled-down server, a new server booting the snapshot will start with the number of vCPUs specified by the flavor. CAVEATS ======= It is possible for the scale-up operation to fail if the worker node has already allocated all of its resources to other guests. If this happens, the system will not do any automatic migration to try to free up resources. Manual action will be required to free up resources. Any CPUs that are handling userspace DPDK/AVP packet processing should not be offlined. It may appear to work, but may lead to packet loss. This can be enforced by setting the wrs:min_vcpus value appropriately high. If hyperthreading is used, the flavor must set hw:cpu_thread_policy to isolate and set cpu_policy to dedicated.