Commit Graph

11 Commits

Author SHA1 Message Date
Scott Little 23a41191c1 Relocated some packages to repo 'utilities'
List of relocated subdirectories:

pm-qos-mgr
worker-utils

Story: 2006166
Task: 35687
Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3
Change-Id: I63df9a59a8a409ab4b700b76fd4d39acb6ab0ed7
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: Ie6fc7b2a185168424cb6158e817b6e240af89d5e
2019-09-05 15:35:20 -04:00
Bin Yang a30e09bb87 add get_platform_cpus
Add helper function for platform cpu number calculation
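
As a rough illustration of this kind of calculation, the sketch below counts the CPUs named in a range list. It is not the helper added by this commit: the function name count_cpus and the input format (a string such as "0-1,16-17") are assumptions made for the example.

```python
def count_cpus(cpu_list):
    """Count the CPUs named in a range list such as "0-1,16-17".

    Hypothetical helper: the name and the input format are assumptions.
    """
    total = 0
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            total += int(last) - int(first) + 1  # inclusive range
        else:
            total += 1  # single CPU index
    return total
```

For example, count_cpus("0-1,16-17") counts two CPUs in each range.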

Change-Id: I6f656594e80fb067794cc14ba8f01db84585d198
Partial-Bug: 1834796
Signed-off-by: Bin Yang <bin.yang@intel.com>
2019-08-01 11:23:22 +08:00
Jim Gauld 696f987a17 AIO reaffine DRBD tasks during startup
This will speed up the initial DRBD sync on AIO when there is a limited
number of platform cores by reaffining DRBD tasks to use all cpus.

This enhances the affine-tasks init script to dynamically reaffine CPU
intensive DRBD tasks. The receiver threads (i.e., drbd_r_*)
may use a full core each. On systems with fast disks, we notice the
receiver threads and softirq processing get CPU limited by the
number of platform cores configured.

The DRBD receiver tasks are reaffined initially to float across all
cores. This will poll for newly created DRBD resources and reaffine
them as they are found until all DRBD resources have started.
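
The selection and reaffining step described above might look roughly like the following Python sketch. It is not the init script from this commit: the function names, the pid-to-name mapping passed in, and the decision to ignore tasks that cannot be reaffined are all assumptions. Only os.sched_setaffinity is a real (Linux-only) standard library call.

```python
import os


def find_drbd_receivers(comm_by_pid):
    """Return the pids whose task name matches drbd_r_*, in sorted order.

    comm_by_pid is a hypothetical mapping of pid -> task name, for example
    one built by reading /proc/<pid>/comm for every pid.
    """
    return sorted(
        pid for pid, comm in comm_by_pid.items() if comm.startswith("drbd_r_")
    )


def reaffine(pids, cpus):
    """Set the CPU affinity of each pid to the given set of CPU indices."""
    for pid in pids:
        try:
            os.sched_setaffinity(pid, cpus)
        except OSError:
            # Assumption: skip tasks that exited or that refuse the change.
            pass
```

A polling loop would call find_drbd_receivers repeatedly and pass any newly seen pids to reaffine until all expected DRBD resources have started.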

This script waits for sufficient platform readiness criteria. Once the
system is at steady-state, this will ensure that DRBD tasks are
constrained to platform cores and do not run on cores with
VMs/containers. The DRBD configuration file affinity option is left
as-is in case the DRBD kernel threads are restarted for some reason.

Change-Id: I019137ea1cf3736768ad8882bd8d8628cc5c2857
Closes-Bug: 1832781
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2019-07-17 16:47:05 -04:00
Jim Gauld dba4175523 AIO reaffine tasks and k8s-infra during startup
This update reimplements the affine-tasks init script and service to
dynamically reaffine tasks and k8s-infra cgroup cpuset on AIO nodes.
This accommodates CPU intensive phases of work. Tasks are initially
allowed to float across all cores. Once the system is at steady state,
this will ensure that K8S pods are constrained to platform cores and
do not run on cores with VMs/containers.

This will speed up the first stx-application apply, as well as pod
recovery after lock/unlock, reboot, and controller swact.

This script waits forever for sufficient platform readiness criteria
(e.g., system critical pods are recovered, critical openstack pods
are running, nova-compute pod is running) before reaffining back
to platform cores.

This corrects the pod affinity problem seen on AIO that was introduced
by the fix for bug 1826592 (commit e513baad44); that fix allowed the
AIO to not time out, but left pods floating.

Change-Id: Ic257378eac451904a200a0f2e79f7bc4f8373009
Partial-Bug: 1832781
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2019-07-16 12:46:30 -04:00
Zuul 50574cd7b7 Merge "Add to worker-utils scripts the LSB header"
2019-06-13 12:50:12 +00:00
Marcela Rosales 4aff0a9cea Add to worker-utils scripts the LSB header
The OBS infrastructure requires worker-utils scripts to have LSB
headers in order to build the RPM packages for openSUSE.

Change-Id: I92936f7e2c6fe80a825d23ae8f333e406c45e6d4
Story: 2005679
Task: 33679
Signed-off-by: Marcela Rosales <marcela.a.rosales.jimenez@intel.com>
2019-06-12 15:17:33 -05:00
Marcela Rosales f8a0ce059e Remove execution permissions from worker_reserved.conf in worker-utils
worker_reserved.conf is a configuration file, so it should not be
installed with execution permissions in the Makefile. This is
causing an error for openSUSE packaging.

Change-Id: I26b0ecc0266600ee1a9d2eb1fbc3b7b79b6d37d9
Story: 2005679
Task: 33676
Signed-off-by: Marcela Rosales <marcela.a.rosales.jimenez@intel.com>
2019-06-12 12:33:58 -05:00
Jim Gauld 209e346ab4 Container pinning on worker nodes and All-in-one servers
This story will pin the infrastructure and openstack pods to the
platform cores for worker nodes and All-in-one servers.

This configures systemd system.conf parameter
CPUAffinity=<platform_cpus> by generating
/etc/systemd/system.conf.d/platform-cpuaffinity.conf .
All services launch tasks with the appropriate cpu affinity.
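
For illustration, the content of such a drop-in could be rendered as in the Python sketch below. The commit generates the file with puppet, so this is not the actual implementation: the function name and the space-separated CPU formatting are assumptions. The [Manager] section is where systemd's system.conf options, including CPUAffinity, are set.

```python
def render_cpuaffinity_dropin(platform_cpus):
    """Return drop-in text that sets systemd's default CPU affinity.

    Hypothetical helper: platform_cpus is any iterable of CPU indices.
    """
    cpus = " ".join(str(cpu) for cpu in sorted(set(platform_cpus)))
    return "[Manager]\nCPUAffinity={}\n".format(cpus)
```

The returned text would be written to the platform-cpuaffinity.conf path named above.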

This creates the cgroup called 'k8s-infra' for the following subset
of controllers ('cpuacct', 'cpuset', 'cpu', 'memory', 'systemd').
This configures custom cpuset.cpus (i.e., cpuset) and cpuset.mems
(i.e., nodeset) based on sysinv platform configurable cores. This is
generated by puppet using sysinv host cpu information and is stored
to the hieradata variables:
- platform::kubernetes::params::k8s_cpuset
- platform::kubernetes::params::k8s_nodeset

This creates the cgroup called 'machine.slice' for the controller
'cpuset' and sets cpuset.cpus and cpuset.mems to the parent values.
This prevents VMs from inheriting those settings from libvirt.

Note: systemd automatically mounts cgroups and all available
resource controllers, so the new puppet code does not need to do
that.

Kubelet is now launched with --cgroup-root /k8s-infra by configuring
kubeadm.yaml with the option: cgroupRoot: "/k8s-infra" .

For openstack based worker nodes including AIO
(i.e., host-label openstack-compute-node=enabled):
- the k8s cpuset and nodeset include the assigned platform cores

For non-openstack based worker nodes including AIO:
- the k8s cpuset and nodeset include all cpus except the assigned
  platform cores. This will be refined in a later update since
  we need to isolate cpusets of k8s infrastructure from other pods.

The cpuset topology can be viewed with the following:
 sudo systemd-cgls cpuset

The task cpu affinity can be verified with the following:
 ps-sched.sh

The dynamic affining of platform tasks during start-up is disabled.
That code requires cleanup and is likely no longer required,
since we are using systemd CPUAffinity and cgroups.

This includes a few small fixes to enable testing of this feature:
- facter platform_res_mem was updated to not require 'memtop', since
  that depends on the existence of NUMA nodes. This was failing in QEMU
  environments where the host does not have NUMA nodes, which occurs
  when there is no CPU topology specified.
- cpumap_functions.sh updated parameter defaults so that calling
  bash scripts may enable 'set -u' undefined variable checking.
- the generation of platform_cpu_list did not have all threads.
- the cpulist-to-ranges inline code was incorrect; in certain
  scenarios the rstrip(',') would take out the wrong commas.
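
A cpulist-to-ranges conversion can be written without any trailing-comma stripping by collecting the ranges first and joining them at the end. The Python sketch below shows one such approach. It is not the code from this commit, and the function name and input type (an iterable of integer CPU indices) are assumptions.

```python
def cpulist_to_ranges(cpus):
    """Collapse CPU indices into a range string, e.g. [0, 1, 2, 5] -> "0-2,5".

    Hypothetical helper: duplicates are ignored and input order is irrelevant.
    """
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu        # first CPU opens the first range
        elif cpu == prev + 1:
            prev = cpu                # contiguous: extend the current range
        else:
            ranges.append((start, prev))
            start = prev = cpu        # gap: close the range, open a new one
    if start is not None:
        ranges.append((start, prev))  # close the final range
    # Joining the finished pieces means no separator ever needs stripping.
    return ",".join(
        str(a) if a == b else "{}-{}".format(a, b) for a, b in ranges
    )
```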

Story: 2004762
Task: 28879

Change-Id: I6fd21bac59fc2d408132905b88710da48aa8d928
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2019-04-11 01:39:44 -04:00
David Sullivan fbbc2f6c29 Remove requires kubelet from affine tasks service
The affine tasks service requires the kubelet service. This causes the
kubelet service to start on worker nodes before the service would be
enabled by the manifests. On AIO nodes a docker interface will appear
before config_controller. The requires parameter is not needed and has
been removed.

Change-Id: I412edae9d0b323e1ae1ff6e81a8d958c38d98609
Closes-Bug: 1814946
Signed-off-by: David Sullivan <david.sullivan@windriver.com>
2019-02-06 21:15:18 +00:00
David Sullivan 76d3082421 Platform process pinning
Add a service to pin processes back to platform cores after
affine-platform. Previously this was done during the nova-compute
wrapper script. In kubernetes this script is not run so we need to add
a new service to pin tasks back to the platform cores.

Depends-On: https://review.openstack.org/#/c/634035/
Story: 2002843
Task: 29125
Change-Id: Ia8ccacb5546a8ea66010b024fe04ed39f9ef447d
Signed-off-by: David Sullivan <david.sullivan@windriver.com>
2019-01-30 20:27:30 +00:00
Tao Liu 6256b0d106 Change compute node to worker node personality
This update replaces the compute personality & subfunction
with worker, and updates internal and customer visible
references.

In addition, the compute-huge package has been renamed to
worker-utils as it contains various scripts/services that are
used to affine running tasks or interface IRQs to specific CPUs.
The worker_reserved.conf is now installed to /etc/platform.

The cpu function 'VM' has also been renamed to 'Application'.

Tests Performed:
Non-containerized deployment
AIO-SX: Sanity and Nightly automated test suite
AIO-DX: Sanity and Nightly automated test suite
2+2 System: Sanity and Nightly automated test suite
2+2 System: Horizon Patch Orchestration
Kubernetes deployment:
AIO-SX: Create, delete, reboot and rebuild instances
2+2+2 System: worker nodes are unlocked and enabled, with no alarms

Story: 2004022
Task: 27013

Change-Id: I0e0be6b3a6f25f7fb8edf64ea4326854513aa396
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-12-13 14:15:55 -05:00