From acefd544f0f02aa348e29a46be925436349e542d Mon Sep 17 00:00:00 2001
From: Jim Gauld
Date: Thu, 14 Feb 2019 15:42:07 -0500
Subject: [PATCH] Mitigate memory leak of sessions by disabling sudo for sriov agent

The sriov agent was polling devices via 'sudo ip link show', and this
resulted in a severe memory leak. 'sudo' goes through the host
'dbus-daemon', and somewhere along that path the host does not clean up
login sessions.

Symptoms:
- memory gradually runs out until the system becomes unstable; the host
  reboots spontaneously due to delay or OOM
- huge growth of kernel slab
- thousands of /sys/fs/cgroup/systemd/user.slice/user-0.slice
  session-x*.scope files with empty 'tasks', i.e., sessions that should
  have been deleted
- huge latency seen with ssh and various systemd commands

The problem is mitigated by disabling 'sudo' for the sriov agent, using
a helm override that configures [agent]/root_helper=''.

Testing:
- Verified that we could launch a VM with an SR-IOV interface; VFs were
  able to set MAC and VLAN attributes.

Closes-Bug: 1815106

Change-Id: I0c57629c01b7407c99cc7f38b409019ab87af859
Signed-off-by: Jim Gauld
---
 sysinv/sysinv/sysinv/sysinv/helm/neutron.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/sysinv/sysinv/sysinv/sysinv/helm/neutron.py b/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
index 169e7173e0..4f7bc3d1a5 100644
--- a/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
+++ b/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
@@ -246,6 +246,14 @@ class NeutronHelm(openstack.OpenstackBaseHelm):
             'securitygroup': {
                 'firewall_driver': 'noop',
             },
+            # Mitigate host OS memory leak of cgroup session-*scope files
+            # and kernel slab resources. The leak is triggered using 'sudo'
+            # which utilizes the host dbus-daemon. The sriov agent frequently
+            # polls devices via 'ip link show' using run_as_root=True, but
+            # does not actually require 'sudo'.
+            'agent': {
+                'root_helper': '',
+            },
             'sriov_nic': sriov_nic,
         }
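
Reviewer note: the override works because a command marked run_as_root=True only gains a 'sudo' prefix when root_helper is non-empty. The sketch below is a simplified model of that behaviour, not Neutron's actual code. `build_command` and `sriov_agent_overrides` are hypothetical names introduced for illustration, and the sketch assumes the agent process already has the privileges it needs for 'ip link show', as the commit message states.

```python
import shlex


def build_command(cmd, root_helper, run_as_root=False):
    """Return the argv to execute (hypothetical helper, not Neutron's API)."""
    argv = [str(part) for part in cmd]
    if run_as_root:
        # shlex.split('') returns [], so an empty helper adds no prefix.
        argv = shlex.split(root_helper) + argv
    return argv


# Hypothetical overrides mirroring the dict touched by the patch.
sriov_agent_overrides = {
    'securitygroup': {'firewall_driver': 'noop'},
    'agent': {'root_helper': ''},
}

poll_cmd = ['ip', 'link', 'show']

# With a 'sudo' helper, every poll spawns sudo and opens a login session.
with_sudo = build_command(poll_cmd, 'sudo', run_as_root=True)

# With the empty helper from the override, the command runs directly.
without_sudo = build_command(
    poll_cmd,
    sriov_agent_overrides['agent']['root_helper'],
    run_as_root=True,
)

print(with_sudo)
print(without_sudo)
```

In this model the run_as_root=True call sites stay unchanged; only the configured helper changes, so the mitigation can be delivered purely as a helm override.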