Report port and device inventory after the worker manifest

This is an incremental fix for bug 2053149.
Upon network boot (first boot) of a worker node, the agent manager is
supposed to report ports/devices without waiting for the worker
manifest, as the manifest never runs on first boot. Without this, after
a system restore, the compute node cannot be unlocked because of the
SR-IOV config update.

Kickstart records the first boot by creating "/etc/platform/.first_boot".
The agent manager deletes this file. If the agent manager crashes, it is
started again. This time it does not see the .first_boot file, so it
does not know this is still the first boot and does not report inventory
for the worker node.

This commit fixes the issue by creating the volatile file
"/var/run/.first_boot" before deleting "/etc/platform/.first_boot". The
agent checks both files to determine whether this is the first boot, so
the same logic holds across multiple crashes/restarts of the agent
manager.
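
The two-flag check described above can be sketched in isolation as follows.
This is a minimal illustration, not the sysinv code: the helper names
(is_first_boot, consume_first_boot_flag, simulate_restarts) and the temporary
paths are hypothetical, and the sketch creates the volatile file with open()
in append mode, whereas the commit itself uses os.mknod().

```python
import os
import tempfile


def is_first_boot(persistent_flag, volatile_flag):
    """Return True if either first-boot flag file exists."""
    return os.path.exists(persistent_flag) or os.path.exists(volatile_flag)


def consume_first_boot_flag(persistent_flag, volatile_flag):
    """Create the volatile flag, then delete the persistent one."""
    if not os.path.exists(persistent_flag):
        return
    try:
        # Append mode creates the file if it is missing and leaves it
        # untouched otherwise (assumption: stands in for os.mknod()).
        with open(volatile_flag, "a"):
            pass
    except OSError:
        print("%s could not be created." % volatile_flag)
    os.remove(persistent_flag)


def simulate_restarts():
    """Walk through first start, a restart after a crash, and a reboot."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for /etc/platform and /var/run.
        persistent = os.path.join(root, "platform.first_boot")
        volatile = os.path.join(root, "run.first_boot")
        open(persistent, "w").close()  # stands in for kickstart

        results = [is_first_boot(persistent, volatile)]  # first start
        consume_first_boot_flag(persistent, volatile)
        # Agent restarts after a crash: the volatile flag is still there.
        results.append(is_first_boot(persistent, volatile))
        # A reboot clears the volatile directory.
        os.remove(volatile)
        results.append(is_first_boot(persistent, volatile))
        return results
```

Calling simulate_restarts() returns one boolean per stage. The middle value
is the case this commit addresses: the persistent flag is already gone, but
the restart is still treated as part of the first boot.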

TEST PLAN:
PASS: AIO-DX bootstrap has no issues. Lock/unlock has no issues.
PASS: Network-boot a worker node; before unlocking it, restart the agent
      manager (sysinv-agent) and check sysinv.log to confirm that ports
      are reported.

Closes-Bug: 2053149
Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
Tara Subedi 2024-03-25 13:51:26 -04:00
parent 1573412c4d
commit 933d3a3a73
2 changed files with 11 additions and 1 deletions

@@ -224,7 +224,8 @@ class AgentManager(service.PeriodicService):
         self._first_grub_update = False
         self._inventoried_initial = False
         self._inventory_reported = set()
-        self._first_boot_flag = os.path.exists(FIRST_BOOT_FLAG)
+        self._first_boot_flag = os.path.exists(FIRST_BOOT_FLAG) or \
+            os.path.exists(constants.VOLATILE_FIRST_BOOT_FLAG)
 
     def start(self):
         super(AgentManager, self).start()
@@ -579,6 +580,14 @@ class AgentManager(service.PeriodicService):
                     host_uuid,
                     msg_dict)
                 if os.path.exists(FIRST_BOOT_FLAG):
+                    # Create volatile first_boot file, that will be checked by agent manager
+                    # when it get crashed and restarted, so that it will know this boot is still
+                    # first boot.
+                    try:
+                        os.mknod(constants.VOLATILE_FIRST_BOOT_FLAG)
+                    except OSError:
+                        LOG.error("%s could not be created." % constants.VOLATILE_FIRST_BOOT_FLAG)
+
                     os.remove(FIRST_BOOT_FLAG)
                     LOG.info("Removed %s" % FIRST_BOOT_FLAG)
             except exception.SysinvException:

@@ -2120,6 +2120,7 @@ DEFAULT_DNS_SERVICE_DOMAIN = 'cluster.local'
 
 # First boot
 FIRST_BOOT_FLAG = os.path.join(tsc.PLATFORM_CONF_PATH, ".first_boot")
+VOLATILE_FIRST_BOOT_FLAG = os.path.join(tsc.VOLATILE_PATH, ".first_boot")
 
 # Ansible bootstrap
 ANSIBLE_BOOTSTRAP_FLAG = os.path.join(tsc.VOLATILE_PATH, ".ansible_bootstrap")