author Tee Ngo <tee.ngo@windriver.com> 2019-01-09 12:10:59 -0500
committer Tee Ngo <tee.ngo@windriver.com> 2019-01-10 13:42:29 -0500
commit a76f381204e5ed273385e65d88f1ac5dc6bf65e0
tree 4e9e0fe5cd100d11401014d745a732a311fe8eef
parent 632037e512d6203fa13ebacc851b775c0ee6f4b4
Adding spec: Ansible bootstrap deployment
Proposing specification on how the bootstrap and configuration of the initial host can be orchestrated by an Ansible playbook.

Story: 2004695
Change-Id: I895768eae975f2b6a880e82db2c0d9e452f8099c
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
Notes (review):
    Code-Review+1: Bart Wensley <barton.wensley@windriver.com>
    Code-Review+2: Brent Rowsell <brent.rowsell@windriver.com>
    Code-Review+2: Ian Jolliffe <ian.jolliffe@windriver.com>
    Code-Review+1: Matt Peters <matt.peters@windriver.com>
    Code-Review+2: Curtis Collicutt <curtis@serverascode.com>
    Code-Review+2: Shuquan Huang <huang.shuquan@99cloud.net>
    Code-Review+2: Saul Wold <sgw@linux.intel.com>
    Code-Review+2: Dean Troyer <dtroyer@gmail.com>
    Workflow+1: Ian Jolliffe <ian.jolliffe@windriver.com>
    Verified+2: Zuul
    Submitted-by: Zuul
    Submitted-at: Thu, 24 Jan 2019 19:11:01 +0000
    Reviewed-on: https://review.openstack.org/629581
    Project: openstack/stx-specs
    Branch: refs/heads/master
-rw-r--r-- specs/2019.03/approved/deployment-improvements-2004695-ansible-bootstrap-deployment.rst 508
1 files changed, 508 insertions, 0 deletions
diff --git a/specs/2019.03/approved/deployment-improvements-2004695-ansible-bootstrap-deployment.rst b/specs/2019.03/approved/deployment-improvements-2004695-ansible-bootstrap-deployment.rst
new file mode 100644
index 0000000..a1e20cd
--- /dev/null
+++ b/specs/2019.03/approved/deployment-improvements-2004695-ansible-bootstrap-deployment.rst
@@ -0,0 +1,508 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7
8============================
9Ansible Bootstrap Deployment
10============================
11
12Storyboard: https://storyboard.openstack.org/#!/story/2004695.
13
14This spec describes the initial phase of the StarlingX deployment
15improvement effort.
16
17Problem description
18===================
19
20The primary controller is currently configured using the ``config_controller``
21Python script, which can only be executed on the controller console. The script
22requires input for many networking aspects upfront in order to run both
23bootstrap operations and host configuration to completion. Over time, the
24script logic has grown overly complex to accommodate a plethora of host
25configuration scenarios, which in turn has increased the configuration time.
26
27Furthermore, once all required input configuration parameters have been
28successfully validated, the script will run all its steps. If the script fails
29due to a software issue or a configuration mistake, a re-install will be
30required. It is not possible for the user to apply a software patch and/or
31rerun the script to apply updated configurations.
32
33Use Cases
34=========
35
36* As a developer/tester/operator, I need the ability to configure the
37 controller remotely.
38* As a developer/tester/operator, I need the ability to modify and
39 reapply configurations during initial host config.
40* As a developer/tester/operator, I need the ability to automate the
41 initial host deployment and build out my system from there.
42* As a developer in the StarlingX community, I would like to streamline
43 the initial host config using an industry-adopted tool to enable
44 automation and to promote process/code visibility and customization.
45
46Proposed change
47===============
48
49Existing workflow with config_controller (high level)
50-----------------------------------------------------
51**Config_controller:**
52
531. Create bootstrap hiera config
542. Apply bootstrap puppet manifest
553. Persist local configuration
564. Populate initial system inventory
575. Create system hiera config
586. Apply controller puppet manifest
597. Finalize controller configuration
608. Activate all services
61
62**Host-configuration:**
63
64 Manual or scripted configurations required for unlock.
65
66**Host-unlock:**
67
681. Apply controller puppet manifest (and worker, storage puppet manifests
69 for All-in-one)
702. Activate all services
71
72Proposed workflow with Ansible Playbook (high level)
73----------------------------------------------------
74The bootstrap and configuration of the initial host will be orchestrated
75by an Ansible Playbook [1]_.
76
77**Playbook:**
78
791. Apply bootstrap puppet manifest
802. Populate system configuration (with defaults and user-supplied config)
813. Bring up Kubernetes master node and essential services
82
83**Host-configuration:**
84
85 Manual or scripted configurations required for unlock.
86
87**Host-unlock:**
88
891. Apply controller puppet manifest (and worker, storage puppet manifests
90 for All-in-one)
912. Activate all services
92
93After step 2 of the Playbook, the host configuration will resemble
94All-in-one simplex (i.e. defaulting to the loopback interface) until it
95is unlocked for the first time. Interface configuration is deferred
96to ensure the network connection is not interrupted while the playbook is
97being *played*. Interface reconfiguration will only take effect on unlock
98operations. Previously, it took effect during the controller manifest apply
99in ``config_controller``, a step that the Playbook does not perform.
100
101Scope of the new workflow
102-------------------------
103The new workflow will cover the **initial config** for all supported system
104configurations in a containerized platform.
105
106Bootstrap playbook roles and tasks (high level)
107-----------------------------------------------
108Below is a list of major roles and tasks. The names are deliberately long
109to make them self-explanatory for review purposes. They can later be
110shortened, since role variables should be prefixed with role names.
111During implementation, some roles and tasks will likely be decomposed or
112combined.
113
114Role: validate-config-input
115 * Task: validate-config
116Role: prepare-environment-for-execution
117 * Task: validate-environment
118 * Task: set-environment-variables
119Role: cleanup-environment-after-execution
120 * Task: unset-environment-variables
121 * Task: remove-temp-files
122Role: store-admin-password
123 * Task: validate-password
124 * Task: store-password
125Role: apply-bootstrap-manifest
126 * Task: generate-bootstrap-data
127 * Task: apply-manifest
128Role: populate-initial-config
129 * Task: persist-keyring
130 * Task: set-permanent-puppet-workdir
131 * Task: set-permanent-pxe-configdir
132 * Task: set-postgres-config-for-mate
133 * Task: process-branding-and-banner
134 * Task: populate-system-config
135 * Task: populate-load-config
136 * Task: populate-network-config
137 * Task: populate-controller-config
138 * Task: create-loopback-interface
139 * Task: update-local-dns
140 * Task: update-platform-config-file
141 * Task: add-dns-server
142Role: bring-up-kubernetes-master-and-dependent-services
143 * Task: bring-up-kubernetes-master
144 * Task: bring-up-tiller
145 * Task: bring-up-fault-management
146 * Task: bring-up-maintenance
147 * Task: bring-up-vim
148
149Playbook directory layout
150-------------------------
151The directory layout of the playbook initially could be as follows:
152
153bootstrap.yml
154
155roles/
156    validate-config-input/
157        tasks/
158            main.yml
159        handlers/
160            main.yml
161        files/
162            <scripts, files>
163        vars/
164            main.yml
165        defaults/
166            main.yml
167        meta/
168            main.yml
169
170    prepare-environment-for-execution/
171
172    cleanup-environment-after-execution/
173
174    store-admin-password/
175
176    apply-bootstrap-manifest/
177
178    populate-initial-config/
179
180    bring-up-kubernetes-master-and-dependent-services/
181
182Playbook pre_tasks and post_tasks
183---------------------------------
184The pre_tasks and post_tasks can be as simple as marking the start and end
185of the playbook execution.
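
A minimal sketch of what ``bootstrap.yml`` might look like, assuming the roles
listed earlier and the ``bootstrap`` inventory group used in this spec. The
order of the roles and the use of the ``debug`` module as start and end
markers are assumptions for illustration; the final playbook may differ.

.. code-block:: yaml

    ---
    # bootstrap.yml (illustrative sketch, not the final playbook)
    - hosts: bootstrap

      # Ansible runs pre_tasks before roles, and post_tasks after them.
      pre_tasks:
        - name: Mark the start of playbook execution
          debug:
            msg: "Bootstrap playbook started"

      # Role order is an assumption; cleanup is placed last here.
      roles:
        - validate-config-input
        - prepare-environment-for-execution
        - store-admin-password
        - apply-bootstrap-manifest
        - populate-initial-config
        - bring-up-kubernetes-master-and-dependent-services
        - cleanup-environment-after-execution

      post_tasks:
        - name: Mark the end of playbook execution
          debug:
            msg: "Bootstrap playbook completed"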
186
187Running ``bootstrap playbook``
188------------------------------
189ansible-playbook bootstrap.yml -u <named-account-with-sudo-privileges>
190[-K -i <config-input-file> -e <list-of-variable-value-pairs-to-overwrite>
191--ask-vault-pass]
192
193The playbook should be run using the wrsroot account. However, it can be run
194using another account with sudo privileges, provided that the account has
195already been set up. Many playbook tasks must be run as root.
196The -K option prompts for the privilege escalation password.
197
198Overwriting playbook defaults
199-----------------------------
200The ``bootstrap playbook`` will come with default variables and an Ansible
201hosts file, /etc/ansible/hosts.yml. These defaults and the content of the hosts
202file are meant for running the playbook locally and bootstrapping the initial
203controller for All-in-one simplex in VirtualBox. In practice, some of these
204defaults will need to be overwritten with user-supplied values.
205
206Variables that usually require overwriting are:
207
208* host IP (for running the playbook remotely)
209* system properties
210* Management, OAM, PXE, cluster subnets
211* Default DNS server
212
213There are various ways to overwrite variables in an Ansible Playbook.
214
215**Overwrite with configuration input file**
216
217One simple and clean option is to overwrite with the -i command line parameter.
218The content of the provided configuration input file must be in YAML format.
219
220The default hosts (Ansible inventory) file will have the following entries:
221
222bootstrap:
223  hosts:
224    local:
225      ansible_connection: local
226
227  vars:
228    ansible_user: wrsroot
229    ansible_become: true
231To overwrite the bootstrap host (for remote execution) and/or the user, use
232entries such as the following in the custom configuration input file:
233
234bootstrap:
235  hosts:
236    remote:
237      ansible_host: '128.224.150.83'
238      ansible_connection: ssh
239
240  vars:
241    ansible_user: wrsroot
242    ansible_become: true
243
244To overwrite the role default variables, one option is to add the list of
245overwritten variables under the ``vars`` section of the configuration input file:
246
247  vars:
248    system_mode: duplex-direct
249    dns_server: 8.8.8.8
250
251**Overwrite with role vars**
252
253Another option to overwrite role defaults is to replace the main.yml file under
254the ``vars`` directory of the corresponding role(s) with custom one(s) before
255running the playbook. This takes precedence over the overwriting method above.
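
As an illustration, a custom role ``vars`` file could look like the sketch
below. The variable names are taken from the example above; which role owns
them is an assumption, since the list of role defaults has not been defined.

.. code-block:: yaml

    ---
    # roles/populate-initial-config/vars/main.yml (illustrative sketch)
    # Placing these variables in this particular role is an assumption.
    system_mode: duplex-direct
    dns_server: 8.8.8.8

Because role ``vars`` rank above inventory group variables in Ansible's
variable precedence, values set here win over the same names set under
``vars`` in the configuration input file.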
256
257**Overwrite with extra vars**
258
259The command line -e option, which has the highest precedence, can also be
260used to overwrite defaults. However, this method can be cumbersome if many
261defaults need overwriting and the playbook is run manually.
262
263The list of role defaults as well as the preferred method to overwrite
264these defaults will be documented after the playbook has been developed.
265
266Overwriting sensitive variables
267-------------------------------
268The admin password is a sensitive variable that usually needs to be
269overwritten. To ensure sensitive information is encrypted, sensitive
270variables and values are copied to a vault file and secured using the
271``ansible-vault encrypt`` command. The corresponding defaults will need to be
272mapped to the variables in the vaulted file using Jinja2 syntax.
273
274The command line argument --ask-vault-pass or --vault-password-file will need
275to be supplied when running the playbook with an encrypted vault file.
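
The mapping between a role default and a vaulted variable could be sketched as
follows. The file name ``secrets.yml`` and the variable names
``admin_password`` and ``vault_admin_password`` are hypothetical; the actual
names will be settled during implementation.

.. code-block:: yaml

    ---
    # secrets.yml (hypothetical name), shown before encryption.
    # Encrypt it with: ansible-vault encrypt secrets.yml
    vault_admin_password: "<admin-password>"

.. code-block:: yaml

    ---
    # roles/store-admin-password/defaults/main.yml (illustrative sketch)
    # The default refers to the vaulted variable through Jinja2.
    admin_password: "{{ vault_admin_password }}"

One way to load the vault file, assumed here, is to pass it as
``-e @secrets.yml`` together with ``--ask-vault-pass``.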
276
277For development/test purposes, these variables can simply be overwritten
278using the command line -e option.
279
280Validating configuration parameters
281-----------------------------------
282The config_controller script has extensive logic to validate config
283parameters in the user input file, which could be leveraged in the
284validate-config-input role of the ``bootstrap playbook``.
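
For illustration only, simple checks in the validate-config-input role could
be expressed with Ansible's ``assert`` module, as sketched below. This sketch
does not reuse the config_controller validation logic; the list of supported
modes is hypothetical, and the variable names come from the overwrite example
earlier in this spec.

.. code-block:: yaml

    ---
    # roles/validate-config-input/tasks/main.yml (illustrative sketch)
    - name: Validate system mode
      assert:
        that:
          - system_mode in supported_system_modes
        msg: "Unsupported system_mode: {{ system_mode }}"
      vars:
        # Hypothetical list; the real set is defined by the implementation.
        supported_system_modes:
          - simplex
          - duplex
          - duplex-direct

    - name: Validate that a DNS server is supplied
      assert:
        that:
          - dns_server is defined
          - dns_server | length > 0
        msg: "dns_server must be set"

Failing early in this role stops the play before any bootstrap manifest is
applied, so a configuration mistake can be corrected and the playbook rerun.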
285
286Config_controller script changes
287--------------------------------
288Currently this complex script has multiple uses: a) perform initial
289configuration required mainly to bring up the controller services,
290b) backup system configuration, c) restore system configuration from
291backup file, d) clone the image, and e) restore the system from a clone.
292
293The proposed Ansible bootstrap deployment will replace the initial system
294configuration aspect of the script. The script will continue to be used for
295other operations. Relevant code will be removed from the script once the
296implementation of the playbook is complete.
297
298Puppet changes
299--------------
300The initial ``bootstrap playbook`` will leverage the existing Puppet
301bootstrap.pp manifest to bring up the following services that will be
302used by the playbook for the remaining tasks:
303
304**Required services to bring up Kubernetes master:**
305
306* docker
307* etcd
308
309**Required services for host unlock:**
310
311* fm
312* mtcAgent
313* nfv-vim
314
315The puppet .pp files, and in some cases .py files, related to these services
316and Kubernetes will require updates.
317
318Sysinv changes
319--------------
320Traditionally, the ``config_controller`` script is provided with all
321required parameters either interactively or via a config file to perform
322both bootstrap operations and host configuration. Networking and storage
323provisioning using system commands beyond this point have certain
324restrictions as the controller manifest has been applied.
325
326With the Ansible bootstrap deployment method, some system commands will
327require changes to support manual configuration adjustments and replays of
328the ``bootstrap playbook``. The ``cgtsclient`` will also need minor
329modification to avoid requesting the smapi endpoint, which is not yet
330available at this early stage.
331
332Maintenance changes
333-------------------
334Some minor tweaks to maintenance code will be required for maintenance
335Client and Agent to operate properly during the bootstrap phase.
336
337Packaging of ``bootstrap playbook`` in the ISO and SDK
338------------------------------------------------------
339The playbook will be packaged in the ISO as well as SDK to allow
340both local and remote execution.
341
342Alternatives
343============
344
345Additional host configuration roles to support the initial host-unlock
346were considered. However, this would add much of the complex modeling of
347input configuration (i.e. more upfront planning) to the initial deployment step.
348
349Data model impact
350=================
351
352No impact to existing system inventory data model.
353
354REST API impact
355===============
356
357At this time, no REST API impact is anticipated.
358
359Security impact
360===============
361
362The proposal is to make use of Ansible Playbook, a multi-node configuration
363and deployment orchestration tool that is well adopted, partly due to
364Ansible's secure architecture and design.
365
366The scope of the proposed ``bootstrap playbook`` is limited to bringing the
367initial controller to the state where it can be unlocked and where other
368Kubernetes nodes on an internal cluster network, if configured, can join.
369
370Remote execution of the Playbook is only possible over SSH using a named
371account with sudo privileges. Ansible vault will be used to store
372secrets/private information where applicable. As such, no additional security
373impact is introduced.
374
375Other end user impact
376=====================
377
378The user will be expected to interact with the feature using
379ansible-playbook [2]_ and ansible-vault [3]_ commands. The bootstrap deployment
380method will give the user more flexibility to customize and automate
381the deployment.
382
383Once the initial controller is ready to accept system commands and the
384Kubernetes master is up, the user can:

385* perform minimum host configurations and unlock the host
386* join other Kubernetes nodes and perform more extensive custom
387 configurations before the unlock
388
389The playbook can be replayed to update system properties and general
390networking information. It cannot be replayed after the host is unlocked.
391
392Performance Impact
393==================
394
395Ansible execution overhead is unknown at this time. However, as the
396controller manifest application and services activation steps are deferred
397until host-unlock, the time to bring the controller to an unlock-ready state
398should be significantly shorter than with the traditional method.
399
400Other deployer impact
401=====================
402
403None
404
405Developer impact
406================
407
408See end user impact.
409
410The developers can extend the ``bootstrap playbook`` with custom host
411configuration role(s) or another playbook to suit their specific needs.
412
413Upgrade impact
414==============
415
416None, as this is the initial release of Bootstrap Deployment using
417Ansible Playbook.
418
419Implementation
420==============
421
422Assignee(s)
423===========
424
425Primary assignee:
426
427* Tee Ngo (teewrs)
428
429Other contributors:
430
431* Eric McDonald (emacdona)
432
433Repos Impacted
434==============
435
436* stx-config
437* stx-metal
438* stx-root
439* stx-docs
440
441Work Items
442==========
443
444* Modify maintenance to enable maintenance operations during bootstrap
445 phase.
446* Modify sysinv and cgtsclient to be more flexible with configuration
447 updates during bootstrap deployment using either system commands or APIs.
448* Modify puppet classes and python scripts to allow launching a limited
449 number of services required for bootstrap operations and initial host
450 unlock.
451* Create a ``bootstrap`` Playbook to bring up Kubernetes master node and
452 configure the primary controller based on default and user-supplied config
453 parameters.
454* Package the Playbook as part of the ISO & SDK to allow both on-premises
455 and remote execution.
456* Make other necessary changes to support primary controller configuration
457 using either the playbook or traditional config_controller until the
458 transition is complete. This includes lab setup tool changes.
459
460
461Dependencies
462============
463
464* config_controller script
465* Ansible [4]_
466* Containerized OpenStack based deployment
467
468Testing
469=======
470
471This story changes the way a StarlingX system is deployed, specifically
472how the primary controller is configured, which will require changes in
473existing automated installation and lab setup tools.
474
475The system deployment tests will be limited to All-in-one simplex,
476All-in-one duplex, and Standard configurations. Deployment tests for
477Region and Distributed Cloud configurations are deferred until support
478for these configurations in a containerized OpenStack based platform is
479available. At that point, the ``bootstrap playbook`` will be extended
480either with additional roles or with new playbook(s) to process the steps in
481``config_region`` and ``config_subcloud``. This will be documented either
482in a later version of this spec or in a separate spec.
483
484Documentation Impact
485====================
486
487This story affects the StarlingX installation and configuration
488documentation. Specific details of the documentation changes will be
489addressed once the implementation is complete.
490
491References
492==========
493
494.. [1] https://docs.ansible.com/ansible/2.7/user_guide/playbooks.html
495.. [2] https://docs.ansible.com/ansible/2.7/cli/ansible-playbook.html
496.. [3] https://docs.ansible.com/ansible/2.7/cli/ansible-vault.html
497.. [4] https://docs.ansible.com/ansible/2.7/index.html
498
499History
500=======
501
502.. list-table:: Revisions
503 :header-rows: 1
504
505 * - Release Name
506 - Description
507 * - TBD
508 - Introduced