From 2bd78beba8aaef4ebaa2a5ca0290d4a8887bef5e Mon Sep 17 00:00:00 2001
From: Guilherme Santos
Date: Wed, 17 Apr 2024 11:46:00 -0300
Subject: [PATCH] C-state Management Application on StarlingX

This commit introduces the StarlingX specification for C-state
Management, an application that allows Kubernetes resources to
dynamically control their C-states.

Story: 2011105
Task: 49878

Author: Guilherme Santos
Co-author: Vinicius Lobo
Change-Id: Iebae30c72d94e3d490ecc00a55462aa70fa77516
Signed-off-by: Guilherme Santos
---
 .../starlingx-2011105-cstate-management.rst | 303 ++++++++++++++++++
 1 file changed, 303 insertions(+)
 create mode 100644 doc/source/specs/stx-10.0/approved/starlingx-2011105-cstate-management.rst

diff --git a/doc/source/specs/stx-10.0/approved/starlingx-2011105-cstate-management.rst b/doc/source/specs/stx-10.0/approved/starlingx-2011105-cstate-management.rst
new file mode 100644
index 0000000..e9af383
--- /dev/null
+++ b/doc/source/specs/stx-10.0/approved/starlingx-2011105-cstate-management.rst
@@ -0,0 +1,303 @@
+..
+  This work is licensed under a Creative Commons Attribution 3.0 Unported
+  License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+..
+  Many thanks to the OpenStack Nova team for the Example Spec that formed the
+  basis for this document.
+
+===========================================
+C-state Management Application on StarlingX
+===========================================
+
+Storyboard: `#2011105`_
+
+The objective of this spec is to introduce the C-state Management
+Application in the StarlingX Platform.
+
+Problem description
+===================
+
+StarlingX, in its current versions, offers a comprehensive set of features
+for power management.
+These features let users and applications control the acceptable frequency
+range (minimum and maximum frequency) per core; the behavior of cores within
+that range (governor); which power levels (C-states) a given core can enter;
+and how the system behaves under workloads with known intervals/demands.
+`Kubernetes Power Manager`_ controls these features on targeted CPUs/cores,
+allowing individualized configurations.
+
+Currently, the power levels of cores allocated to containerized applications
+are assigned to pods at deployment and persist for the pods' entire
+lifecycle. Applications, however, need finer granularity: the ability to
+control their CPU idle states (C-states) at run time. The
+`C-state Management Application` offers a set of endpoints that enable pods to
+dynamically query and adjust their C-states. It therefore allows users to
+save energy by idling the cores assigned to their applications based on
+pre-defined parameters (traffic, time of day, etc.).
+
+Use Cases
+---------
+
+With the introduction of these new capabilities for C-state management,
+StarlingX end users and deployers gain enhanced control over CPU core
+configurations. These new features are beneficial for optimizing power
+consumption and performance.
+
+We identify the following potential impacts on StarlingX stakeholders from
+this dynamic C-state management integration:
+
+* End users: The ability to adjust the maximum C-state level of CPU cores
+  assigned to pods through REST API requests offers increased flexibility
+  without disrupting existing workflows. This feature ensures seamless
+  integration with applications running on StarlingX, enhancing user
+  experience.
+
+* Deployers: The introduction of dynamic C-state management may necessitate
+  minor adjustments for deployers, primarily related to ensuring that assigned
+  CPU cores are appropriately configured as application-isolated. Additionally,
+  deployers may need to ensure that REST API requests for C-state adjustments
+  originate from the same node where the application's pods are deployed,
+  maintaining security and efficiency.
+
+* Developers: The integration of C-state management brings significant
+  enhancements to the development workflow within StarlingX. By incorporating
+  dynamic C-state management functionality, developers gain a more granular
+  level of control over CPU core configurations, allowing for finer
+  optimization of power usage and system performance.
+
+Proposed change
+===============
+
+The new `C-state Management Application` will be introduced to StarlingX,
+resulting in the addition of a REST API that empowers pods to dynamically
+control their C-states. When disabled, the application will not change
+StarlingX's standard behavior. When enabled, the Kubernetes resources will be
+able to programmatically manage their C-states.
+
+`C-state Management Application` essentially provides endpoints that enable the
+following functionalities:
+
+* Change the Maximum C-state Level of CPU Cores.
+
+  * The application, via its REST API, initiates a request to modify the
+    maximum C-state level of the CPU cores allocated to its pods.
+  * The assigned CPU cores must be configured as application-isolated.
+  * The request originates from the node on which the application's pods
+    are deployed.
+
+* Query the Maximum Available C-state Levels.
+
+  * The application, through its REST API, sends a request to inquire about
+    the maximum C-state levels available for modification.
+
+* Query the Maximum C-state Configuration.
+
+  * The application, utilizing its REST API, requests information regarding
+    the configured maximum C-state from the node where its pods are currently
+    deployed.
+
+
+This specification also proposes that applications using the
+`C-state Management Application` API shall be able to execute the following
+actions:
+
+* Request the cloud platform to change the maximum C-state level of the CPU
+  cores assigned to it if they are configured as application-isolated CPUs.
+
+* Query the C-state levels that are available for it to change.
+
+* Request maximum C-state changes from the same node that its pods are
+  running on.
+
+* Query the configured C-state from the same node that its pods are running on.
+
+* Change only the max C-state level of the cores that are assigned to it.
+
+It is also required that the cloud platform shall be able to:
+
+* Process the C-state level requests (change/query) and respond with whether
+  the change occurred, or report the current max C-state level.
+
+* Process the max C-state level requests (change/query) on the Platform
+  cores; in other words, it shall run the API producer on the Platform cores.
+
+* Fulfill a request to change the max C-state within a matter of
+  seconds.
+
+Alternatives
+------------
+
+None
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Other end user impact
+---------------------
+
+A new REST API will be available, resulting in procedural changes for
+dynamically managing C-states on StarlingX. Users should be aware that
+the `C-state Management Application` is not designed to work in tandem with
+`Kubernetes Power Manager`_. Therefore, we recommend using only one of
+these applications at a time.
+
+Performance Impact
+------------------
+
+Given the nature of dynamic C-state management, impacts related to power
+consumption and latency are expected to vary based on the usage of
+`C-state Management Application`. The following shall be considered:
+
+* Power Consumption: By actively monitoring and controlling the C-states,
+  applications can optimize power consumption based on workload demands,
+  reducing the overall energy consumption in the cluster. On the other hand,
+  an incorrect or inconsistent configuration might lead to performance
+  degradation.
+
+* Latency: C-states range from C0 to Cn. C0 indicates an active state. All
+  other C-states (C1-Cn) represent idle sleep states with different parts of
+  the processor powered down. As the C-states get deeper, the exit latency
+  (the time to transition back to C0) becomes longer and the power savings
+  become greater. This potentially increases the time required for processing
+  varying workloads based on pre-defined parameters.
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+Please see the `Use Cases`_ section.
+
+Upgrade impact
+--------------
+
+None
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+
+* Guilherme Batista Leite (guilhermebatista)
+
+Other contributors:
+
+* Alyson Deives Pereira (adeivesp)
+* Eduardo Juliano Alberti (ealberti)
+* Fabio Studyny Higa (fstudyny)
+* Guilherme Henrique Pereira dos Santos (gsantos1)
+* Vinicius Fernando Rocha Lobo (vrochalo)
+
+Repos Impacted
+--------------
+
+* starlingx/docs
+* starlingx/config
+* starlingx/app-cstate-management (new)
+
+
+Work Items
+----------
+
+The following work items are expected to be carried out, with the understanding
+that the storyboard will be updated as more work items are found to be
+necessary.
+
+Spikes and Design
+*****************
+
+* Basic testing of per-CPU latency specification.
+* Review of the proposed design.
+* Evaluation of options to reduce latency and of the expected reduction.
+
+Development Work Items
+**********************
+
+* Publish proposed HLD.
+* Share proof of concept for evaluation and testing.
+* Provide technical support during testing.
+* Merge proof of concept to StarlingX codebase.
+* Create FluxCD manifest for C-state DaemonSet.
+* Create StarlingX application to wrap the FluxCD manifest.
+* Enhance C-state application to support IPv6 addresses.
+* Enhance C-state application to prevent modification of CPUs allocated to
+  other Pods.
+* Enable installation via system application.
+
+Customer Documentation
+**********************
+
+* Publish a usage guide describing the available functionality and how to
+  use it.
+* Provide sample code showing how to make use of the functionality.
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+System configuration
+--------------------
+The tests will be conducted in the following system configurations:
+
+* AIO-SX
+* AIO-DX
+* Standard
+
+Test Scenarios
+--------------
+
+* Functional tests for `C-state Management Application` and its customizations.
+* Unit testing the impacted code areas.
+* Performance testing to identify and address any performance impacts.
+* Backup and restore tests.
+
+Documentation Impact
+====================
+
+End-user documentation must be created, adding a guide to
+`C-state Management Application` deployments, configurations and
+customizations.
+
+References
+==========
+#. `Kubernetes Power Manager`_
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - stx-10.0
+     - Introduced
+
+.. Links
+.. _#2011105: https://storyboard.openstack.org/#!/story/2011105
+.. _Kubernetes Power Manager: https://github.com/intel/kubernetes-power-manager
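
Reviewer note, not part of the patch: the spec refers throughout to a "maximum C-state" per core but does not say how the platform would apply it. The sketch below assumes one plausible mechanism, the Linux cpuidle sysfs interface, where each `cpuN/cpuidle/stateM` directory exposes `name`, `latency` (exit latency in microseconds) and `disable`. The helper names `read_idle_states` and `states_to_disable` are hypothetical and do not come from the proposed application. It only computes which deeper states would have to be disabled for a requested maximum, which also makes the exit-latency trade-off from the Performance Impact section visible per state.

```python
from pathlib import Path
from typing import List, NamedTuple

# Assumption: the standard Linux cpuidle sysfs location; the spec names no path.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


class IdleState(NamedTuple):
    index: int        # N in the stateN directory name (0 is the shallowest)
    name: str         # contents of stateN/name, e.g. "C1E"
    latency_us: int   # contents of stateN/latency (exit latency, microseconds)
    disabled: bool    # True when stateN/disable holds a non-zero value


def read_idle_states(cpu: int, root: Path = SYSFS_CPU_ROOT) -> List[IdleState]:
    """Return the idle states of one CPU, shallowest first."""
    states = []
    for state_dir in (root / f"cpu{cpu}" / "cpuidle").glob("state[0-9]*"):
        states.append(
            IdleState(
                index=int(state_dir.name[len("state"):]),
                name=(state_dir / "name").read_text().strip(),
                latency_us=int((state_dir / "latency").read_text()),
                disabled=(state_dir / "disable").read_text().strip() != "0",
            )
        )
    return sorted(states)  # tuples compare by their first field, the index


def states_to_disable(states: List[IdleState], max_cstate: str) -> List[int]:
    """Indices of every state deeper than max_cstate (hypothetical helper)."""
    by_name = {s.name: s.index for s in states}
    if max_cstate not in by_name:
        raise ValueError(
            f"unknown C-state {max_cstate!r}; available: {sorted(by_name)}"
        )
    limit = by_name[max_cstate]
    return [s.index for s in states if s.index > limit]
```

On a real node, `read_idle_states(0)` would read the live sysfs tree for CPU 0. Applying the result would mean writing `1` to each selected `stateN/disable` file with root privileges, which is deliberately left out of this sketch.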