kdump-tools: disable AER to fix kdump hung issue

This issue appeared after the kernel was updated from 5.10.112 to
5.10.152. The offending commit is d83d886e69bd (PCI/ERR: Recover from
RCEC AER errors), which comes from the linux-yocto 5.10 stable tree. It
causes the board to hang after kdump is triggered.

The issue can be reproduced on a Supermicro A2SDi-16C-TP8F board with
BIOS version 1.4 (build date 01/29/2021).

We don't need PCI AER functionality enabled in the kdump kernel, and it
causes some boards to hang in certain situations, as the kernel AER
error log below shows. So we disable it by appending pci=noaer to the
kdump kernel command line.

KERNEL AER ERROR LOG:
[    7.409296] pcieport 0000:00:05.0: AER: Multiple Corrected error
received: 0000:00:05.0
[    7.417311] BUG: kernel NULL pointer dereference, address:
0000000000000028
[    7.418296] #PF: supervisor read access in kernel mode
[    7.418296] #PF: error_code(0x0000) - not-present page
[    7.418296] PGD 0 P4D 0
[    7.418296] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    7.418296] CPU: 0 PID: 93 Comm: irq/25-aerdrv Not tainted
5.10.0-6-amd64 #1 Debian 5.10.152-1.stx.25
[    7.418296] Hardware name: Supermicro
SYS-E300-9A-16CN8TP/A2SDi-16C-TP8F, BIOS 1.4 01/29/2021
[    7.418296] RIP: 0010:pci_walk_bus+0x25/0x90
[    7.418296] Code: 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 fd
48 c7 c7 20 37 9a 99 41 54 49 89 f4 55 48 89 d5 53 4c 89 eb e8 2b 5a 56
00 <49> 8b 7d 28 eb 1f 48 8b 47 18 48 85 c0 74 31 4c 8b 70 28 48 89 c3
[    7.418296] RSP: 0000:ffffa60040173dc8 EFLAGS: 00010282
[    7.418296] RAX: ffff8b553fded001 RBX: 0000000000000000 RCX:
0000000000000000
[    7.418296] RDX: ffff8b553fded000 RSI: ffffffff9833c6e0 RDI:
ffffffff999a3720
[    7.418296] RBP: ffffa60040173e10 R08: 0000000000000002 R09:
ffffa60040173d74
[    7.418296] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffffff9833c6e0
[    7.418296] R13: 0000000000000000 R14: 0000000000000028 R15:
ffff8b555e206328
[    7.418296] FS:  0000000000000000(0000) GS:ffff8b55bec00000(0000)
knlGS:0000000000000000
[    7.418296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.418296] CR2: 0000000000000028 CR3: 000000087d80a000 CR4:
00000000003506f0
[    7.418296] Call Trace:
[    7.418296]  find_source_device+0x34/0x5a
[    7.418296]  aer_isr.cold+0x89/0x9e
[    7.418296]  ? __set_cpus_allowed_ptr+0xb6/0x220
[    7.418296]  ? disable_irq_nosync+0x10/0x10
[    7.418296]  irq_thread_fn+0x20/0x60
[    7.418296]  irq_thread+0x104/0x1b0
[    7.418296]  ? irq_finalize_oneshot.part.0+0xe0/0xe0
[    7.418296]  ? irq_thread_check_affinity+0xa0/0xa0
[    7.418296]  kthread+0x133/0x150
[    7.418296]  ? __kthread_bind_mask+0x60/0x60
[    7.418296]  ret_from_fork+0x22/0x30
[    7.418296] Modules linked in:
[    7.418296] CR2: 0000000000000028

TEST PLAN:
PASS: build-pkgs -c -p kdump-tools
PASS: build-pkgs -c -p kdump-tools-rt
PASS: boot
PASS: on both affected and unaffected platforms:
      systemctl enable kdump-tools.service
      systemctl start kdump-tools.service
      echo 1 >/proc/sysrq-trigger
      echo 'c' > /proc/sysrq-trigger
      vmcore has been created successfully
      system boots back up automatically
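
A minimal sketch of an extra check that could accompany the plan above: confirm that pci=noaer is present in a kernel command line string. The helper name cmdline_has_param is hypothetical and not part of kdump-tools, and reading /proc/cmdline from inside the kdump kernel is an assumption about where the check would be run.

```shell
#!/bin/sh
# Hypothetical helper, not part of kdump-tools.
# Returns 0 if parameter $2 appears as a whole, space-separated word
# in the kernel command line string $1, and 1 otherwise.
cmdline_has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Example use (assumed to run inside the kdump kernel):
#   cmdline_has_param "$(cat /proc/cmdline)" pci=noaer && echo "AER disabled"
```

Matching whole words keeps a longer parameter that merely starts with pci=noaer from being counted as a match.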

Closes-Bug: 1999646

Change-Id: I9ffc6e96d4b7fbd0b29a806d4d96dfc8e89dc4c6
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
Peng Zhang 2022-12-17 08:38:58 +08:00
parent 1f965684b8
commit 470193ffc9
2 changed files with 30 additions and 0 deletions


@@ -0,0 +1,29 @@
From 88e8f23536d60aa163c72ffdbe453315c5102d3c Mon Sep 17 00:00:00 2001
From: Peng Zhang <Peng.Zhang2@windriver.com>
Date: Thu, 15 Dec 2022 00:09:32 -0800
Subject: [PATCH] kdump-tools: disable AER to fix kdump hung issue
We don't need pci AER functionality enabled in the kdump kernel,
and it causes some boards to hang in certain situations. So just
disable it.
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
---
rules | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/debian/rules b/debian/rules
index 72b7d6d..b428331 100755
--- a/debian/rules
+++ b/debian/rules
@@ -14,7 +14,7 @@ ifeq ($(DEB_HOST_ARCH),arm64)
else ifeq ($(DEB_HOST_ARCH),ppc64el)
KDUMP_CMDLINE_APPEND += maxcpus=1 irqpoll noirqdistrib nousb
else
- KDUMP_CMDLINE_APPEND += nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0
+ KDUMP_CMDLINE_APPEND += nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0 pci=noaer
endif
%:
--
2.34.1


@@ -1,2 +1,3 @@
0001-kdump-tools-add-vmlinuz-and-initrd.img-soft-link.patch
0002-kdump-tools-adapt-check_secure_boot-checking.patch
0003-kdump-tools-disable-AER-to-fix-kdump-hung-issue.patch