C-state Management Application on StarlingX

This commit introduces the StarlingX specification for C-state
Management, an application that allows Kubernetes resources to
dynamically control their C-states.

Story: 2011105
Task: 49878

Author: Guilherme Santos <guilherme.santos@windriver.com>
Co-author: Vinicius Lobo <vinicius.rochalobo@windriver.com>

Change-Id: Iebae30c72d94e3d490ecc00a55462aa70fa77516
Signed-off-by: Guilherme Santos <guilherme.santos@windriver.com>
Guilherme Santos 2024-04-17 11:46:00 -03:00 committed by Vinícius Fernando Rocha Lobo
parent 31bf76b1f8
commit 2bd78beba8

..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

..
  Many thanks to the OpenStack Nova team for the Example Spec that formed the
  basis for this document.

===========================================
C-state Management Application on StarlingX
===========================================
Storyboard: `#2011105`_
The objective of this spec is to introduce the C-state Management
Application in StarlingX Platform.
Problem description
===================
StarlingX currently offers a comprehensive set of power management features
that allow users and applications to control the acceptable frequency range
(minimum and maximum frequency) per core; the behavior of cores within that
range (governor); which power levels (C-states) a given core can access; and
the behavior of the system in the face of workloads with known
intervals/demands. `Kubernetes Power Manager`_ controls these features on
targeted CPUs/cores, allowing individualized configurations.
Currently, the power levels of cores allocated to containerized applications
are assigned when the pods are deployed and persist for the pods' entire
lifecycle. Applications, however, need finer granularity: the ability to
control their CPU idle states (C-states) at run time. The
`C-state Management Application` offers a set of endpoints that enable pods to
dynamically query and adjust their C-states. It therefore allows users to
save energy by idling the cores assigned to their applications based on
pre-defined parameters (traffic, time of day, etc.).
Use Cases
---------
With the introduction of these new capabilities for C-state management,
StarlingX end users and deployers gain enhanced control over the CPU core
configurations. These new features are beneficial for optimizing power
consumption and performance.
We identify the following potential impacts to StarlingX's stakeholders with
this dynamic C-state management integration:
* End users: The ability to adjust the maximum C-state level of CPU cores
assigned to pods through REST API requests offers increased flexibility
without disrupting existing workflows. This feature ensures seamless
integration with applications running on StarlingX, enhancing user
experience.
* Deployers: The introduction of dynamic C-state management may necessitate
minor adjustments for deployers, primarily related to ensuring that assigned
CPU cores are appropriately configured as application-isolated. Additionally,
deployers may need to ensure that REST API requests for C-state adjustments
originate from the same node where the application's pods are deployed,
maintaining security and efficiency.
* Developers: The integration of C-state management brings significant
enhancements to the development workflow within StarlingX. By incorporating
a dynamic C-state management functionality, developers gain a more granular
level of control over CPU core configurations, allowing for finer
optimization of power usage and system performance.
Proposed change
===============
The new `C-state Management Application` will be introduced to StarlingX,
adding a REST API that enables pods to dynamically control their C-states.
When disabled, the application will not change StarlingX's standard
behavior. When enabled, Kubernetes resources will be able to
programmatically manage their C-states.
The `C-state Management Application` essentially provides endpoints that
enable the following functionality:

* Change the maximum C-state level of CPU cores.

  * The application, via its REST API, initiates a request to modify the
    maximum C-state level of the CPU cores allocated to its pods.
  * The assigned CPU cores must be configured as application-isolated.
  * The request originates from the node on which the application's pods
    are deployed.

* Query the maximum available C-state levels.

  * The application, through its REST API, sends a request to inquire about
    the maximum C-state levels available for modification.

* Query the maximum C-state configuration.

  * The application, using its REST API, requests information regarding
    the configured maximum C-state from the node where its pods are currently
    deployed.

This specification also proposes that applications using the
`C-state Management Application` API shall be able to execute the following
actions:
* Request the cloud platform to change the maximum C-state level of the CPU
cores assigned to it if they are configured as application-isolated CPUs.
* Query the C-state levels that are available for it to change.
* Request maximum C-state changes from the same node that its pods are
running on.
* Query the configured C-state from the same node that its pods are running on.
* Change only the maximum C-state level of the cores that are assigned to it.
It is also required that the cloud platform shall be able to:
* Process the C-state level requests (change/query) and respond indicating
whether the change occurred or report the current maximum C-state level.
* Process the maximum C-state level requests (change/query) on the platform
cores; in other words, the API producer shall run on the platform cores.
* Fulfill a request to change the maximum C-state within a matter of
seconds.
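
The rules above can be sketched as a small in-memory model. This is only an
illustration of the checks a request has to pass, not the application's actual
implementation or API: the ``Node`` class, its method names, the pod and node
names, and the C-state labels are all hypothetical, and the sketch assumes
that a core with no configured maximum may reach the deepest level.

```python
from dataclasses import dataclass, field


class CStateError(Exception):
    """Raised when a request violates one of the rules listed above."""


@dataclass
class Node:
    """Hypothetical in-memory view of one node (not the real data model)."""
    name: str
    levels: list[str]               # C-states ordered from shallowest to deepest
    isolated_cores: set[int]        # application-isolated cores on this node
    pod_cores: dict[str, set[int]]  # pod name -> cores assigned to that pod
    max_cstate: dict[int, str] = field(default_factory=dict)  # core -> max level

    def _check_origin(self, pod: str, origin_node: str) -> None:
        # Requests must come from the node on which the pod is running.
        if origin_node != self.name:
            raise CStateError("request must originate from the pod's own node")
        if pod not in self.pod_cores:
            raise CStateError(f"unknown pod {pod!r} on node {self.name}")

    def available_levels(self) -> list[str]:
        """Query the C-state levels that may be selected as a maximum."""
        return list(self.levels)

    def get_max_cstate(self, pod: str, origin_node: str) -> dict[int, str]:
        """Query the configured maximum C-state of the pod's cores."""
        self._check_origin(pod, origin_node)
        deepest = self.levels[-1]  # assumption: default is the deepest level
        return {core: self.max_cstate.get(core, deepest)
                for core in sorted(self.pod_cores[pod])}

    def set_max_cstate(self, pod: str, origin_node: str, level: str) -> dict[int, str]:
        """Change the maximum C-state, touching only the pod's own cores."""
        self._check_origin(pod, origin_node)
        if level not in self.levels:
            raise CStateError(f"level {level!r} is not available")
        cores = self.pod_cores[pod]
        not_isolated = cores - self.isolated_cores
        if not_isolated:
            raise CStateError(f"cores {sorted(not_isolated)} are not application-isolated")
        for core in cores:
            self.max_cstate[core] = level
        return self.get_max_cstate(pod, origin_node)


# Example with made-up names: "app-a" owns isolated cores 2 and 3.
node = Node(name="worker-0", levels=["C0", "C1", "C6"],
            isolated_cores={2, 3}, pod_cores={"app-a": {2, 3}, "app-b": {4}})
print(node.available_levels())                          # ['C0', 'C1', 'C6']
print(node.set_max_cstate("app-a", "worker-0", "C1"))   # {2: 'C1', 3: 'C1'}
```

In this sketch a request from another node, for an unknown level, or for a pod
whose cores are not application-isolated (such as ``app-b`` above) is rejected
with ``CStateError`` and leaves the stored configuration unchanged.
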
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Other end user impact
---------------------
A new REST API will be available, resulting in procedural changes for
dynamically managing C-states on StarlingX. Users should be aware that
the `C-state Management Application` is not designed to work in tandem with
`Kubernetes Power Manager`_. We therefore recommend using only one of
these applications at a time.
Performance Impact
------------------
Given the nature of dynamic C-state management, impacts related to power
consumption and latency are expected to vary based on the usage of
`C-state Management Application`. The following shall be considered:
* Power Consumption: By actively monitoring and controlling the C-states,
applications can optimize power consumption based on workload demands,
reducing the overall energy consumption in the cluster. On the other hand,
an incorrect or inconsistent configuration might lead to performance
degradation.
* Latency: C-states range from C0 to Cn. C0 indicates an active state. All
other C-states (C1-Cn) represent idle sleep states with different parts of
the processor powered down. As the C-states get deeper, the exit latency
(the time to transition back to C0) becomes longer and the power savings
become greater. Allowing deeper C-states therefore potentially increases the
time a core needs to start processing a new workload after being idle.
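
The trade-off between exit latency and power savings can be illustrated with
the standard Linux ``cpuidle`` sysfs interface, where each idle state of a CPU
exposes its ``name`` and its exit ``latency`` in microseconds. The sketch below
is an assumption about how a workload might choose a maximum C-state; it is not
taken from the application. The helper names and the 50 microsecond budget are
hypothetical.

```python
from pathlib import Path


def read_idle_states(cpu: int, sysfs_root: str = "/sys/devices/system/cpu") -> list[dict]:
    """Return the idle states of one CPU, ordered from shallowest to deepest."""
    cpuidle = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    state_dirs = sorted(cpuidle.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):]))
    states = []
    for state_dir in state_dirs:
        states.append({
            "index": int(state_dir.name[len("state"):]),
            "name": (state_dir / "name").read_text().strip(),
            # Exit latency in microseconds, as reported by the kernel.
            "exit_latency_us": int((state_dir / "latency").read_text()),
        })
    return states


def deepest_state_within(states: list[dict], budget_us: int):
    """Pick the deepest state whose exit latency fits the latency budget."""
    allowed = [s for s in states if s["exit_latency_us"] <= budget_us]
    return max(allowed, key=lambda s: s["index"]) if allowed else None


# Example: list the states of CPU 0 and pick a maximum for a 50 us budget.
# On a system without cpuidle support the list is simply empty.
states = read_idle_states(0)
for state in states:
    print(f"{state['name']}: exit latency {state['exit_latency_us']} us")
choice = deepest_state_within(states, budget_us=50)
print("deepest state within budget:", choice["name"] if choice else "none")
```

A latency-sensitive workload would pick a small budget and keep its cores in
shallow states, while a workload that is idle for long periods could accept a
larger budget in exchange for greater power savings.
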
Other deployer impact
---------------------
None
Developer impact
----------------
Please see the `Use Cases`_ section.
Upgrade impact
--------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* Guilherme Batista Leite (guilhermebatista)
Other contributors:
* Alyson Deives Pereira (adeivesp)
* Eduardo Juliano Alberti (ealberti)
* Fabio Studyny Higa (fstudyny)
* Guilherme Henrique Pereira dos Santos (gsantos1)
* Vinicius Fernando Rocha Lobo (vrochalo)
Repos Impacted
--------------
* starlingx/docs
* starlingx/config
* starlingx/app-cstate-management (new)
Work Items
----------
The following work items are expected to be carried out, with the understanding
that the storyboard will be updated as more work items are found to be
necessary.
Spikes and Design
*****************
* Basic testing of per-CPU latency specification.
* Review of the proposed design.
* Evaluation of options to reduce latency and expected latency reduction.
Development Work Items
**********************
* Publish proposed HLD.
* Share proof of concept for evaluation and testing.
* Provide technical support during testing.
* Merge proof of concept to StarlingX codebase.
* Create FluxCD manifest for C-state DaemonSet.
* Create StarlingX application to wrap the FluxCD manifest.
* Enhance C-state application to support IPv6 addresses.
* Enhance C-state application to prevent modification of CPUs allocated to
other Pods.
* Installation via system application.
Customer Documentation
**********************
* Publish the usage guide for what functionality is available and how to make
use of it.
* Sample code showing how to make use of the functionality.
Dependencies
============
None
Testing
=======
System configuration
--------------------
The tests will be conducted in the following system configurations:
* AIO-SX
* AIO-DX
* Standard
Test Scenarios
--------------
* Functional tests for `C-state Management Application` and its customizations.
* Unit testing the impacted code areas.
* Performance testing to identify and address any performance impacts.
* Backup and restore tests.
Documentation Impact
====================
End-user documentation must be created, including a guide to
`C-state Management Application` deployment, configuration, and
customization.
References
==========
#. `Kubernetes Power Manager`_
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx-10.0
     - Introduced

.. Links
.. _#2011105: https://storyboard.openstack.org/#!/story/2011105
.. _Kubernetes Power Manager: https://github.com/intel/kubernetes-power-manager