From 1ed103250b3d6df2bae92b0cee794bfce835d7af Mon Sep 17 00:00:00 2001
From: Greg Waines
Date: Tue, 20 Nov 2018 12:57:44 -0500
Subject: [PATCH] Create Spec: StarlingX - Distributed Cloud - Synchronized Keystone

As agreed upon within Edge-Computing meetings, this specification proposes an
additional Identity solution for the Edge Reference Architecture; i.e. a
'Synchronized Keystone' solution. This solution addresses Edge-Computing
Use Cases where full autonomy is required on network connectivity loss but
without the overhead of running an Identity Provider (IDP) presence at each
Edge Cloud site.

Change-Id: Ie60c324e01c23b262336ce24c481e359c5bd61d7
Signed-off-by: Greg Waines
---
 ...distcloud-2002842-synchronizedKeystone.rst | 545 ++++++++++++++++++
 1 file changed, 545 insertions(+)
 create mode 100644 specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst

diff --git a/specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst b/specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst
new file mode 100644
index 0000000..76cf352
--- /dev/null
+++ b/specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst
@@ -0,0 +1,545 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License. http://creativecommons.org/licenses/by/3.0/legalcode
+
+..
+ Many thanks to the OpenStack Nova team for the Example Spec that formed the
+ basis for this document.
+
+=========================================
+Distributed Cloud - Synchronized Keystone
+=========================================
+
+| Storyboard: https://storyboard.openstack.org/#!/story/2002842
+| ( Distributed Cloud Keystone Scalability )
+|
+
+The OpenStack Edge-Computing group has defined an Edge Reference Architecture.
+For Identity Management, it uses Federated Keystone to manage Identity across
+all Edge Clouds.
+If 'full autonomy' is required at Edge Clouds, this requires
+a Distributed Identity Provider Solution with an Identity Provider (IDP)
+presence at every Edge Cloud.
+
+The Federated Keystone solution makes sense where:
+
+* Integration with an existing IDP infrastructure is already required,
+* Large deployments would benefit from distributed IDP solutions,
+* Partial autonomy is acceptable in the presence of edge cloud isolation,
+  or
+* The cost of hosting an IDP presence at every Edge Cloud is acceptable for
+  full autonomy.
+
+The OpenStack Edge-Computing group recognizes that there is no single
+'one-size-fits-all' architecture for the Edge. As agreed upon within
+the OpenStack Edge-Computing meetings, this specification proposes an
+additional Identity solution for the Edge Reference Architecture; i.e. a
+'Synchronized Keystone' solution. In the Synchronized Keystone solution,
+a Synchronization Framework synchronizes the Identity Resources of a Central
+Cloud to all of the Edge Clouds.
+
+Synchronized Keystone provides an Identity solution for the edge where:
+
+* a simpler standalone Identity solution can be used for the edge cloud
+  deployments, and
+* the edge cloud sites are compute-power-limited deployments, e.g. small
+  All-In-One (AIO) simplex / duplex servers, where the cost of hosting
+  an IDP presence in support of full autonomy is too high.
+
+Problem description
+===================
+
+In a distributed edge cloud environment, with 100s or 1000s of edge cloud
+sites, the centralized orchestration of cloud services across all the edge
+cloud sites is imperative for operational usability. This specification
+deals specifically with the centralized orchestration of the Identity Cloud
+Service across all the edge cloud sites.
+
+For the Identity Cloud Service, in a distributed edge cloud environment, it is
+desired to support the same set of Users and Projects across all edge clouds.
+I.e.
+At any edge cloud, a user can log in with the same User name and Project
+name, using the same authentication credentials and getting the same
+authorization capabilities and roles.
+
+Note that for some use cases, network connectivity between the edge cloud and
+the central cloud is not reliable. The Identity Cloud Service at the edge
+cloud must be fully autonomous in the event of network connectivity loss to
+the central cloud. I.e. both Service Users and Tenant Users must
+continue to be able to authenticate and be authorized when the edge cloud is
+isolated from the central cloud.
+
+This specification also enables an optimization for orchestration scalability
+in the distributed edge cloud environment. The orchestration of services
+across all edge clouds requires authentication, typically of the same user,
+across 100s/1000s of edge clouds. With the Identity Service's Users and
+Projects now synchronized across all edge clouds, additionally
+synchronizing Fernet Keys across all edge clouds means that a Fernet
+Token generated at the Central Cloud can be used at any or all edge clouds,
+reducing the 100s or 1000s of authentication operations to a single
+authentication.
+
+Use Cases
+=========
+
+The requirement for common Identity Users and Projects across all edge clouds
+applies to all Edge Computing Use Cases.
+
+The Use Cases that require full autonomy of edge clouds (in the event of edge
+cloud isolation) are Use Cases where:
+
+* There are both
+
+  * Remote physical users (at a central cloud site) and
+  * Local physical users (at edge cloud sites).
+
+* All 'userids' are centrally managed for security reasons,
+* At the edge cloud site,
+
+  * When connectivity to central cloud is lost
+
+    * local edge users must be able to manage their edge cloud and workloads on
+      the edge cloud,
+    * ... using their normal userid credentials.
+
+Examples of such Use Cases are:
+
+* Management of Retail Chains (e.g.
+  Walmart)
+* Large Hospital Campus
+* Large Control Plant
+
+These are also Use Cases where the simplicity of a standalone Identity solution
+for the edge would be desirable.
+
+Background
+==========
+
+The Distributed Cloud (DC) sub-project within StarlingX already supports a
+Synchronization Framework which is used to synchronize Nova, Neutron, Cinder
+and StarlingX resources from the Central Cloud to all of the Edge Clouds.
+
+This Synchronization Framework provides:
+
+* Synchronization Request Management
+
+  * Managing Synchronization Request Message Queues per Edge Cloud,
+  * With retry on failure.
+
+* The Overall Synchronization Audit Sequencing,
+* Connectivity Status tracking for Edge Clouds, and
+* Synchronization Status tracking for Edge Clouds.
+
+For the existing framework, each Service being synchronized implements the
+following within the Synchronization Framework:
+
+* an API Proxy
+
+  * For intercepting Service API calls in order to trigger immediate
+    synchronization to Edge Clouds,
+
+* a DC Orchestration Module
+
+  * For Service-specific details of Service API Request building and auditing,
+  * For managing the mapping of resources in each subcloud to the canonical
+    resource in the central cloud, and
+  * (in future) for dealing with any API / Schema differences between Central
+    Cloud and Edge Cloud (e.g. in Software Upgrade scenario).
+
+Currently the existing Synchronization Framework supports REST API-based
+synchronization of a Service's resources.
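
The Synchronization Request Management described in the Background (a request queue per Edge Cloud, with retry on failure) can be pictured with the following minimal sketch. It is an illustration only, not the StarlingX implementation: the class name ``SyncRequestManager``, the ``send`` callable and the ``max_attempts`` limit are all hypothetical.

```python
from collections import defaultdict, deque


class SyncRequestManager:
    """Hypothetical per-Edge-Cloud request queues with retry on failure."""

    def __init__(self, send, max_attempts=3):
        # send(edge_cloud, request) -> bool is an assumed transport hook.
        self._send = send
        self._max_attempts = max_attempts
        self._queues = defaultdict(deque)

    def enqueue(self, edge_cloud, request):
        """Queue a synchronization request for one Edge Cloud."""
        self._queues[edge_cloud].append((request, 0))

    def pending(self, edge_cloud):
        """Number of requests still waiting for this Edge Cloud."""
        return len(self._queues[edge_cloud])

    def drain(self, edge_cloud):
        """Try every queued request once.

        Failed requests are re-queued until max_attempts is reached;
        requests that exhaust their attempts are returned to the caller.
        """
        queue = self._queues[edge_cloud]
        given_up = []
        for _ in range(len(queue)):
            request, attempts = queue.popleft()
            if self._send(edge_cloud, request):
                continue
            attempts += 1
            if attempts < self._max_attempts:
                queue.append((request, attempts))
            else:
                given_up.append(request)
        return given_up
```

In this sketch a request that keeps failing is eventually handed back to the caller; in the framework described here, the periodic audit is what is expected to repair any state that immediate synchronization missed.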
+
+For OpenStack Keystone, a REST API-based synchronization approach will not
+work since not all details of Keystone resources are exposed through Keystone's
+REST APIs, e.g.:
+
+* User-IDs and Project-IDs can NOT be set on POST
+  (required to be synchronized so that Fernet Tokens can be used on any/all
+  edge clouds)
+* Revocation events, generated internally by Keystone to track events that
+  affect token validity, are NOT exposed via Keystone REST API.
+
+Proposed change
+===============
+
+Synchronization Framework Support for Keystone DB-based Synchronization
+-----------------------------------------------------------------------
+
+This specification proposes enhancing StarlingX's Distributed Cloud
+Synchronization Framework to support DB-based synchronization of a Service's
+resources.
+
+I.e. use the existing Synchronization Framework in order to leverage the
+existing retry mechanisms, audit mechanisms, sync status tracking, etc.,
+but in this case, the Service Module within the 'DC Orchestration Engine'
+would synchronize DB Records by:
+
+* Directly querying/setting the Service's DB, and
+* Using a new (admin-only) StarlingX DC DB SYNC Service and its REST API
+  on the StarlingX Edge Cloud which exposes the DB operations remotely
+  for synchronization purposes.
+
+The Service's API Proxy triggers an immediate DB sync of the affected row(s)
+of the Service's DB table(s) in response to a particular API request, while the
+Synchronization Framework's Audit Mechanism (default every 10 mins) deals
+with non-API events, unexpected events and/or errors to ensure required DB
+Table(s) are in-sync.
+
+The following Keystone resources will be synchronized with this method:
+Users, Passwords, Projects, Roles, Role Assignments and Token Revocation
+Events.
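
As a rough illustration of what the periodic audit has to decide, the sketch below compares the records of one DB table in the Central Cloud against the same table in an Edge Cloud. It is a simplification under stated assumptions: records are modelled as plain dicts keyed by ID, the function name ``audit_records`` is hypothetical, and only create and update actions are produced because this spec lists only GET, POST and PUT operations.

```python
def audit_records(central, edge):
    """Compare two {record_id: attributes} mappings.

    Returns (to_create, to_update): IDs missing from the Edge Cloud and
    IDs whose attributes differ from the Central Cloud copy.
    """
    to_create = [rid for rid in sorted(central) if rid not in edge]
    to_update = [
        rid for rid in sorted(central)
        if rid in edge and edge[rid] != central[rid]
    ]
    return to_create, to_update


# Illustrative data only; the attribute names are not the Keystone schema.
central_users = {
    "u1": {"name": "alice", "enabled": True},
    "u2": {"name": "bob", "enabled": True},
}
edge_users = {
    "u1": {"name": "alice", "enabled": False},
}
missing, stale = audit_records(central_users, edge_users)
```

Here ``missing`` would drive POST requests and ``stale`` would drive PUT requests against the DB SYNC Service on the Edge Cloud.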
+
+Synchronization of Fernet Keys
+------------------------------
+
+This specification also proposes enhancing StarlingX's Distributed
+Cloud Synchronization Framework to support API-based synchronization of
+the Fernet Key Repo.
+
+New REST APIs for bulk syncing of the Fernet Key Repo, updating the Fernet
+Key Repo (on rotation of keys) and auditing of the Fernet Key Repo are
+added to the STX-CONFIG service.
+
+The Synchronization Framework will be extended to support Fernet Key Repo
+synchronization through the STX-CONFIG service, adding a Fernet Key Manager to
+the STX-CONFIG DC Orchestration Module for managing the Fernet Key Repo
+synchronization messaging done by the Synchronization Framework.
+
+Alternatives
+============
+
+An alternative solution considered for synchronizing Keystone would be to use
+the built-in DB synchronization of the open-source DBs used within StarlingX for
+the OpenStack Service DBs. I.e. use the built-in DB Synchronization
+capabilities of MariaDB or PostgreSQL, both of which support replication
+of DB Tables from a single R/W Master to multiple ReadOnly Slaves.
+
+However, the built-in DB synchronization solutions of MariaDB and PostgreSQL
+do NOT support handling different DB Schemas in the Central
+Cloud and Edge Clouds; i.e. required for Software Upgrade scenarios, or even
+just a heterogeneous mix of openstack-versioned edge clouds.
+
+Data model impact
+=================
+
+There are no DB Model changes required to any Services.
+
+REST API impact
+===============
+
+Synchronization Framework Support for Keystone DB-based Synchronization
+-----------------------------------------------------------------------
+
+The following REST APIs were added to the STX-DISTCLOUD service to support
+DB-based synchronization of Services between the Central Cloud and the
+Edge Clouds:
+
+NOTE: These are public REST APIs in the sense that the Central Cloud
+will use these REST APIs to synchronize data to the Edge Clouds.
+HOWEVER these REST APIs are NOT intended to be used by an end user.
+
+* GET /v1.0/identity/users
+
+  * Description: DB SYNC List all identity users
+  * Normal Response Codes: 200
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Response Parameters:
+
+    * < all users of the Keystone DB Table >
+
+      * < all the attributes of the Keystone User DB Table >
+
+* GET /v1.0/identity/users/
+
+  * Description: DB SYNC Get specific identity user
+  * Normal Response Codes: 200
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Response Parameters:
+
+    * < all the attributes of the Keystone User DB Table >
+
+* POST /v1.0/identity/users
+
+  * Description: DB SYNC create identity user (and password)
+  * Normal Response Codes: 201
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Request Parameters:
+
+    * < all the attributes of the Keystone User DB Table >
+
+* PUT /v1.0/identity/users/
+
+  * Description: DB SYNC update identity user (and password)
+  * Normal Response Codes: 202
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Request Parameters:
+
+    * < all the attributes of the Keystone User DB Table >
+
+
+...
+and similarly for the other Keystone DB Resources
+
+* GET /v1.0/identity/projects
+* GET /v1.0/identity/projects/
+* POST /v1.0/identity/projects
+* PUT /v1.0/identity/projects/
+
+|
+
+* GET /v1.0/identity/assignments
+* GET /v1.0/identity/assignments/
+* POST /v1.0/identity/assignments
+* PUT /v1.0/identity/assignments/
+
+|
+
+* GET /v1.0/identity/token-revocation-events
+* GET /v1.0/identity/token-revocation-events/
+* POST /v1.0/identity/token-revocation-events
+
+Synchronization of Fernet Keys
+------------------------------
+
+The following REST APIs were added to the STX-CONFIG service to support
+synchronization of the Fernet Key Repo between the Central Cloud and the
+Edge Clouds:
+
+NOTE: These are public REST APIs in the sense that the Central Cloud
+will use these REST APIs to synchronize data to the Edge Clouds. HOWEVER
+these REST APIs are NOT intended to be used by an end user.
+
+* POST /v1/fernet_repo
+
+  * Description: Distribute fernet repo
+  * Normal Response Codes: 201
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Request Parameters:
+
+    * Content-Type application/json
+
+      * Style: Plain
+      * Type: Xsd:String
+      * Description: The list of Fernet Keys.
+
+* PUT /v1/fernet_repo
+
+  * Description: Update fernet repo with keys
+  * Normal Response Codes: 202
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Request Parameters:
+
+    * Content-Type application/json
+
+      * Style: Plain
+      * Type: Xsd:String
+      * Description: The list of Fernet Keys.
+
+* GET /v1/fernet_repo
+
+  * Description: List contents of fernet_repo (the keys)
+  * Normal Response Codes: 200
+  * Error Response Codes: computeFault (400, 500, …),
+    serviceUnavailable (503), badRequest (400), unauthorized (401),
+    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
+  * Response Parameters:
+
+    * Fernet_keys
+
+      * Style: Plain
+      * Type: Xsd:List
+      * Description: The list of fernet keys
+
+Security impact
+===============
+
+This work only impacts security in a Distributed Cloud environment.
+
+In a Distributed Cloud environment, this work directly manipulates Identity
+data by synchronizing selected Keystone resources and Fernet Keys between
+the Central Cloud and the Edge Clouds.
+
+The only external impact is that in a Distributed Cloud environment,
+a Token created on any Cloud (Central or Edge) can be used on any or
+all Clouds (Central or Edge).
+
+Other end user impact
+=====================
+
+This work only impacts end users in a Distributed Cloud environment.
+
+In a Distributed Cloud environment, a user can indirectly interact with the
+feature when using ANY OpenStack Service API across Edge Clouds by
+leveraging the fact that a Token created on the Central Cloud can be
+used on any or all Edge Clouds.
+
+In a Distributed Cloud environment, in an edge cloud network isolation
+scenario, an end user, local to the edge site, can now log in / authenticate
+with their normal userid and credentials and manage their workloads.
+
+Performance Impact
+==================
+
+This work only impacts performance in a Distributed Cloud environment.
+
+Overall there is a reduced amount of synchronization messaging between
+the Central Cloud and the Edge Clouds in a Distributed Cloud Environment.
+
+Logically more data is being synchronized; i.e. Fernet Keys and selected
+Keystone DB Resources, in addition to the existing selected STX, Nova,
+Neutron and Cinder DB Resources.
+However, the ability to use a
+single Token, generated on the Central Cloud, for ALL Edge Cloud
+synchronization messages drastically reduces the Synchronization
+Framework messaging.
+
+Other deployer impact
+=====================
+
+There are no deployer impacts with this work.
+
+Developer impact
+=================
+
+In a Distributed Cloud environment, developers implementing new services
+that orchestrate across all Edge Clouds should leverage the fact that
+a Token created on the Central Cloud can be used on ANY / ALL Edge Clouds,
+in order to reduce their messaging impact on the system.
+
+
+Upgrade impact
+===============
+
+In a Distributed Cloud environment, there are upgrade impacts with this work;
+i.e. when upgrading from OpenStack Version N to OpenStack Version N+1.
+
+This work is sensitive to any Keystone DB Model changes. However, the
+architecture of the DB-based synchronization within the StarlingX
+Distributed Cloud Synchronization Framework does support the ability
+to manage DB Schema changes between the Central Cloud and the Edge Cloud.
+This was one of the major reasons for choosing this approach.
+
+The plan for Software Upgrades (from one OpenStack Version to another), in
+a Distributed Cloud environment, is that the Central Cloud will be
+upgraded first to version N+1, and then the Edge Clouds.
+
+If the Keystone DB Schema changes between version N and version N+1,
+the N+1 version of the Distributed Cloud Synchronization Framework must
+implement the Keystone DB Schema conversions between N+1 and N,
+for all synchronization messages during the Rolling Software Upgrade
+across the entire Distributed Cloud system.
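
To make the N+1 to N conversion concrete, the sketch below shows one possible shape for such a conversion step, applied to a record before it is sent to an Edge Cloud that is still on version N. Everything here is assumed for illustration: the function name ``downgrade_record``, the rename and drop maps, and the column names in the example are hypothetical and do not describe any real Keystone schema change.

```python
def downgrade_record(record, renamed=None, added=None):
    """Convert a version N+1 record (a dict) to the version N layout.

    renamed: {column name in N+1: column name in N}
    added:   columns that exist only in N+1 and are dropped for N
    """
    renamed = renamed or {}
    added = set(added or ())
    converted = {}
    for column, value in record.items():
        if column in added:
            continue  # version N has no such column
        converted[renamed.get(column, column)] = value
    return converted


# Hypothetical example: N+1 renamed "extra" to "options" and added "tags".
record_n1 = {"id": "p1", "name": "demo", "options": "{}", "tags": "a,b"}
record_n = downgrade_record(
    record_n1, renamed={"options": "extra"}, added=["tags"]
)
```

A table-driven conversion like this keeps the per-release differences in data rather than in code, which is one way the Central Cloud could serve a mix of Edge Cloud versions during a rolling upgrade.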
+
+Implementation
+==============
+
+Assignee(s)
+===========
+
+Primary assignee:
+  Andy Ning
+
+Other contributors:
+  Tao Liu
+
+Repos Impacted
+==============
+
+Repositories in StarlingX that are impacted by this spec:
+
+* stx-distcloud
+
+Work Items
+===========
+
+Synchronization Framework Support for Keystone DB-based Synchronization
+-----------------------------------------------------------------------
+
+* Introduce dbsync agent/api on sub cloud, and add it to StarlingX as a new
+  service,
+* REST APIs between dcorch engine and dbsync agent (POST/PUT/GET),
+* Implement dbsync client to wrap dbsync APIs into Python functions,
+* Enhance identity module within dcorch engine to do DB-based resource
+  synchronization,
+* Enhance identity module within dcorch engine to do DB-based resource audit,
+* Add new resources to be synced (token revocation events),
+
+  * NOTE: the current code already syncs users, passwords, projects, roles and
+    role assignments ... albeit using API-based synchronization,
+
+* Deployment and configuration of new StarlingX DistCloud Services,
+* Unit test.
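
The work item about wrapping the dbsync APIs into Python functions could look roughly like the sketch below. It only builds request objects for the two collection endpoints listed in this spec and sends nothing. The class name ``DBSyncClient``, its method names and the endpoint URL are hypothetical; ``X-Auth-Token`` is the usual OpenStack token header, assumed here to apply to the dbsync API as well.

```python
import json
import urllib.request


class DBSyncClient:
    """Hypothetical thin wrapper that builds dbsync REST requests."""

    def __init__(self, endpoint, token):
        self._endpoint = endpoint.rstrip("/")
        self._token = token

    def _build(self, method, path, body=None):
        # Serialize the body only when one is given (POST/PUT).
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {
            "X-Auth-Token": self._token,
            "Accept": "application/json",
        }
        if data is not None:
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            self._endpoint + path, data=data, headers=headers, method=method
        )

    def list_users_request(self):
        """Request for GET /v1.0/identity/users."""
        return self._build("GET", "/v1.0/identity/users")

    def create_user_request(self, user_record):
        """Request for POST /v1.0/identity/users with a full user record."""
        return self._build("POST", "/v1.0/identity/users", user_record)
```

A caller would pass the returned object to ``urllib.request.urlopen`` (or an equivalent HTTP session); keeping request construction separate from sending makes the wrapper easy to unit test without a running Edge Cloud.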
+
+
+Synchronization of Fernet Keys
+------------------------------
+
+* Add new stx-config APIs (POST) for central cloud to distribute fernet repo
+  including RPC between stx-config API and conductor,
+* Add new stx-config APIs (GET) for central cloud to audit existing keys
+  including RPC between stx-config API and conductor,
+* Add new stx-config APIs (PUT) for central cloud to update repo with keys
+  including RPC between stx-config API and conductor,
+* Within stx-config, safely retrieve and update fernet keys,
+* Enhance stx-distcloud orch engine (or cron job) to rotate keys and
+  call stx-config APIs to distribute new keys,
+* Enhance stx-distcloud orch engine to audit fernet keys across managed
+  sub clouds, and call stx-config APIs to distribute keys if mismatches are found,
+* Enhance dc manager to trigger key distribution when a sub cloud becomes
+  managed,
+* Add logic to stx-config to empty and re-setup fernet repo locally when
+  receiving an empty POST,
+* stx-config/stx-metal/stx-distcloud unit test (Tox),
+* Manifest for fernet repo and keys creation during deployment may not need
+  any changes on either the central cloud or the sub clouds.
+
+Dependencies
+============
+
+There are no external dependencies for this work.
+
+I.e. there are NO requirements on changes to OpenStack Keystone.
+
+Testing
+=======
+
+Need to do explicit testing of Fernet Token synchronization and Keystone
+DB Resource synchronization between Central Cloud and Edge Clouds.
+
+Need to do COMPLETE regression of StarlingX Distributed Cloud (DC)
+functionality.
+
+Should qualitatively evaluate performance / messaging scalability
+improvements before and after this work.
+
+Need to do a SANITY regression of StarlingX in a NON-DC environment.
+
+Documentation Impact
+====================
+
+Currently there is no documentation on the StarlingX Distributed Cloud
+functionality. When this documentation is created, the work of this
+specification should be described at a functional level.
+
+References
+==========
+
+None.
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - 19.03
+     - Introduced