Add pods wait time to initial bootstrap play

In the latest loads, which include a kernel update among other
code changes to various StarlingX repos, not all kube-system
pods are started before the host becomes online, whereas they
consistently were in the same slow lab with an older load. As a
result, the bootstrap playbook often fails in this slow lab near
the end, where it verifies that the kube-system pods are ready.

This commit is a follow-up to commit 97181aa756. It applies a
30-second pause to the initial play to ensure all pods have been
started before executing the task that waits for them to become
ready. The total wait time for a replay remains unchanged at 60
seconds (the 30-second default plus 30 seconds when services are
restarted).
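The wait-time arithmetic can be sketched as a throwaway playbook. This is a minimal illustration, not part of the commit: the file name pods_wait_time_demo.yml is hypothetical, and restart_services is assumed to be passed on the command line. It mirrors the two set_fact expressions this commit adds rather than reproducing the bootstrap playbook.

```yaml
# Hypothetical demo playbook (not part of this commit) mirroring how
# pods_wait_time is derived: 30 seconds on the initial play, 60 seconds
# on a replay that restarts services.
- hosts: localhost
  gather_facts: false
  vars:
    # Assumption: restart_services defaults to false here and is
    # overridden with -e restart_services=true to simulate a replay.
    restart_services: false
  tasks:
    - name: Default the pause to 30 seconds (initial play)
      set_fact:
        pods_wait_time: "{{ pods_wait_time | default(30) }}"

    - name: Add 30 seconds when services must be restarted (replay)
      set_fact:
        # Filters bind tighter than +, so this is (pods_wait_time | int) + 30.
        pods_wait_time: "{{ pods_wait_time | int + 30 }}"
      when: restart_services | bool

    - name: Show the resulting pause
      debug:
        msg: "Will pause {{ pods_wait_time }} seconds"
```

Running `ansible-playbook pods_wait_time_demo.yml` should report a 30-second pause, and adding `-e restart_services=true` should report 60 seconds.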

Tests:
  Play and replay the bootstrap playbook locally on slow
  hardware.

Closes-Bug: 1831664
Change-Id: I525c7771eafad2b9e79dd89e985696fb16bb5b24
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
Tee Ngo 2019-06-13 22:46:03 -04:00
parent dffbfc8df2
commit 9568d970f1
3 changed files with 5 additions and 7 deletions


@@ -15,7 +15,7 @@
 # - Prepare admin.conf
 # - Set k8s environment variable for new shell
 # - Prepare Calico config and activate Calico networking
-# - Precare Multus config and activate Multus networking
+# - Prepare Multus config and activate Multus networking
 # - Prepare SRIOV config and activate SRIOV networking
 # - Prepare SRIOV device plugin config and activate SRIOV device plugin
 # - Restrict coredns to master node and set anti-affnity (duplex system)


@@ -65,13 +65,9 @@
   until: online_check.rc == 0
   retries: 10
-# Don't need to run this task for initial play as it will take a while to pull
-# Armada image and additional time to wait for controller-0 to become online
-# during which time kube-system pods are all started.
-- name: Wait for 60 seconds to ensure kube-system pods are all started
+- name: Wait for {{ pods_wait_time }} seconds to ensure kube-system pods are all started
   wait_for:
-    timeout: 60
-  when: restart_services
+    timeout: "{{ pods_wait_time }}"
 - name: Start parallel tasks to wait for Kubernetes component, Networking and Tiller pods to reach ready state
   command: kubectl --kubeconfig=/etc/kubernetes/admin.conf wait --namespace=kube-system --for=condition=Ready pods --selector {{ item }} --timeout=30s

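The reworked pause followed by the readiness check can be sketched as a standalone playbook. This is an illustration under stated assumptions: it needs a running cluster with admin.conf at the same path the original command uses, and the selectors listed are illustrative upstream labels, not necessarily the ones the bootstrap playbook passes in. The original starts its kubectl wait tasks in parallel; this sketch runs them one after another for simplicity.

```yaml
# Sketch only: pause, then wait for kube-system pods matching each selector.
- hosts: localhost
  gather_facts: false
  vars:
    pods_wait_time: 30
    # Assumption: illustrative labels; the real playbook's selectors may differ.
    pod_selectors:
      - k8s-app=kube-proxy
      - k8s-app=calico-node
      - k8s-app=kube-dns
  tasks:
    - name: Wait for {{ pods_wait_time }} seconds to ensure kube-system pods are all started
      # wait_for with only a timeout simply sleeps for that many seconds.
      wait_for:
        timeout: "{{ pods_wait_time }}"

    - name: Wait for pods matching each selector to reach the Ready condition
      command: >-
        kubectl --kubeconfig=/etc/kubernetes/admin.conf wait
        --namespace=kube-system --for=condition=Ready pods
        --selector {{ item }} --timeout=30s
      loop: "{{ pod_selectors }}"
      # Waiting does not modify the cluster, so never report a change.
      changed_when: false
```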

@@ -190,6 +190,7 @@
     derived_network_params:
       place_holder: place_holder
     ansible_remote_tmp: "{{ ansible_remote_tmp | default(lookup('ini', 'remote_tmp section=defaults file={{ playbook_dir }}/ansible.cfg')) }}"
+    pods_wait_time: "{{ pods_wait_time | default(30) }}"
 - name: Turn on use_docker_proxy flag
   set_fact:
@@ -358,6 +359,7 @@
 - name: Turn on restart services flag if management/oam/cluster network or docker config is changed
   set_fact:
     restart_services: true
+    pods_wait_time: "{{ pods_wait_time|int + 30 }}"
   when: reconfigure_endpoints or
         docker_config_update or
         (prev_cluster_host_subnet != cluster_host_subnet) or