AR# 69433

2017.1/2 Zynq UltraScale+ MPSoC: Linux kernel crash from EDAC driver

Description

The EDAC driver can cause the Linux kernel to stall or crash. Example logs captured in four different scenarios are shown below.

1) When the CPU is in sleep mode:


PMUFW: PmProcTrSleepToActive: SLEEP->ACTIVE NODE_APU_0
PMUFW: PmPowerRequestParent: NODE_APU_0->NODE_APU
[  149.889952] INFO: rcu_sched self-detected stall on CPU
[  149.895015]     3-...: (5249 ticks this GP) idle=d59/140000000000001/0 softirq=649/649 fqs=2625
[  149.903523]      (t=5250 jiffies g=182 c=181 q=46)
[  149.908029] Task dump for CPU 3:
[  149.911239] kworker/u8:1    R  running task        0    28      2 0x00000002
[  149.918278] Workqueue: edac-poller edac_device_workq_function
[  149.923995] Call trace:
[  149.926432] [<ffffff80080881a8>] dump_backtrace+0x0/0x1a8
[  149.931812] [<ffffff8008088364>] show_stack+0x14/0x20
[  149.936845] [<ffffff80080c0d44>] sched_show_task+0x94/0xf0
[  149.942312] [<ffffff80080c2ee0>] dump_cpu_task+0x40/0x50
[  149.947608] [<ffffff800812f458>] rcu_dump_cpu_stacks+0xb4/0xe8
[  149.953422] [<ffffff80080e9000>] rcu_check_callbacks+0x668/0x838
[  149.959411] [<ffffff80080ec6c4>] update_process_times+0x34/0x60
[  149.965311] [<ffffff80080fbaf4>] tick_sched_handle.isra.4+0x3c/0x50
[  149.971559] [<ffffff80080fbb4c>] tick_sched_timer+0x44/0x90
[  149.977115] [<ffffff80080ed1c8>] __hrtimer_run_queues+0xf0/0x178
[  149.983103] [<ffffff80080ed558>] hrtimer_interrupt+0x98/0x1c8
[  149.988832] [<ffffff80086aac38>] arch_timer_handler_phys+0x30/0x40
[  149.994994] [<ffffff80080dfa00>] handle_percpu_devid_irq+0x78/0x128
[  150.001242] [<ffffff80080da74c>] generic_handle_irq+0x24/0x38
[  150.006968] [<ffffff80080dadd4>] __handle_domain_irq+0x5c/0xb8
[  150.012782] [<ffffff80080814cc>] gic_handle_irq+0x64/0xc0
[  150.018163] Exception stack(0xffffffc87ba37b90 to 0xffffffc87ba37cc0)
[  150.024586] 7b80:                                   0000000000000002 ffffff8008681ce8
[  150.032404] 7ba0: 0000000000000000 0000000000000001 ffffffc87ffb8598 0000000000000000
[  150.040215] 7bc0: ffffffc87ffb8580 0000000000000000 ffffffc87ba33b60 ffffffc87ba34000
[  150.048026] 7be0: 0000000000000780 0000000000000000 0000000000000bbc ffffffc87ae98d00
[  150.055836] 7c00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  150.063647] 7c20: 0000000000000000 0000000000000040 0000000000000002 ffffff8008681ce8
[  150.071458] 7c40: 000000000000000f ffffff8009412590 ffffff8009387000 ffffff8009387000
[  150.079269] 7c60: ffffffc87ba37d48 ffffffc87b87ec78 ffffffc87b87eea8 ffffffc87ba37cc0
[  150.087080] 7c80: ffffff8008681f20 ffffffc87ba37cc0 ffffff8008100408 0000000020000145
[  150.094890] 7ca0: ffffffc87ba37cc0 ffffff8008100430 ffffffffffffffff 0000000000000001
[  150.102701] [<ffffff80080827b0>] el1_irq+0xb0/0x140
[  150.107555] [<ffffff8008100408>] smp_call_function_single+0x88/0x128
[  150.113891] [<ffffff8008681f20>] cortex_arm64_edac_check+0x80/0xd8
[  150.120053] [<ffffff800867e0f8>] edac_device_workq_function+0x78/0xc0
[  150.126475] [<ffffff80080b0da0>] process_one_work+0x120/0x380
[  150.132201] [<ffffff80080b1048>] worker_thread+0x48/0x4b0
[  150.137582] [<ffffff80080b6b18>] kthread+0xd0/0xe8
[  150.142355] [<ffffff8008082e80>] ret_from_fork+0x10/0x50
[  150.147648] INFO: rcu_sched detected stalls on CPUs/tasks:
[  150.153117]     3-...: (5251 ticks this GP) idle=d59/140000000000000/0 softirq=649/649 fqs=2626
[  150.161627]     (detected by 2, t=5318 jiffies, g=182, c=181, q=48)
[  150.167607] Task dump for CPU 3:
[  150.170818] kworker/u8:1    R  running task        0    28      2 0x00000002
[  150.177852] Workqueue: edac-poller edac_device_workq_function
[  150.183573] Call trace:
[  150.186009] [<ffffff8008085360>] __switch_to+0x90/0xa8
[  150.191129] [<ffffffc87b87ec00>] 0xffffffc87b87ec00

 

2) During kernel boot, the system can lock up after a few seconds or minutes. This happens about 50% of the time.


PetaLinux 2017.1 plnx_aarch64 /dev/ttyPS0

plnx_aarch64 login: root
Password:
root@plnx_aarch64:~# [   80.027055] INFO: rcu_sched self-detected stall on CPU
[   80.032108]  0-...: (5249 ticks this GP) idle=6a1/140000000000001/0 softirq=1297/1297 fqs=2625
[   80.035054] INFO: rcu_sched detected stalls on CPUs/tasks:
[   80.035060]  0-...: (5249 ticks this GP) idle=6a1/140000000000001/0 softirq=1297/1297 fqs=2625
[   80.035064]  (detected by 2, t=5252 jiffies, g=95, c=94, q=2)
[   80.035065] Task dump for CPU 0:
[   80.035070] kworker/u8:1    R  running task        0    28      2 0x00000002
[   80.035082] Workqueue: edac-poller edac_device_workq_function
[   80.035084] Call trace:
[   80.035090] [<ffffff80080852c4>] __switch_to+0x8c/0xa0
[   80.035094] [<ffffffc07586ec00>] 0xffffffc07586ec00
[   80.100425]   (t=5268 jiffies g=95 c=94 q=2)
[   80.106561] Task dump for CPU 0:
[   80.111645] kworker/u8:1    R  running task        0    28      2 0x00000002
[   80.120575] Workqueue: edac-poller edac_device_workq_function
[   80.128221] Call trace:
[   80.132629] [<ffffff80080880f0>] dump_backtrace+0x0/0x198
[   80.140047] [<ffffff800808829c>] show_stack+0x14/0x20
[   80.147140] [<ffffff80080bf064>] sched_show_task+0x94/0xf0
[   80.154685] [<ffffff80080c0f18>] dump_cpu_task+0x40/0x50
[   80.162077] [<ffffff800812a7bc>] rcu_dump_cpu_stacks+0xb4/0xe8
[   80.169978] [<ffffff80080e4c3c>] rcu_check_callbacks+0x67c/0x860
[   80.178015] [<ffffff80080e7d0c>] update_process_times+0x34/0x60
[   80.185978] [<ffffff80080f67f0>] tick_sched_handle.isra.4+0x38/0x48
[   80.194281] [<ffffff80080f6844>] tick_sched_timer+0x44/0x90
[   80.201865] [<ffffff80080e8670>] __hrtimer_run_queues+0xf0/0x178
[   80.209870] [<ffffff80080e8a00>] hrtimer_interrupt+0x98/0x1c8
[   80.217594] [<ffffff80086ae038>] arch_timer_handler_phys+0x30/0x40
[   80.225733] [<ffffff80080dbe00>] handle_percpu_devid_irq+0x78/0x128
[   80.233939] [<ffffff80080d6b24>] generic_handle_irq+0x24/0x38
[   80.241612] [<ffffff80080d7184>] __handle_domain_irq+0x5c/0xb8
[   80.249352] [<ffffff80080814cc>] gic_handle_irq+0x64/0xc0
[   80.256633] Exception stack(0xffffffc075a47b90 to 0xffffffc075a47cc0)
[   80.264967] 7b80:                                   0000000000000000 ffffff8008686400
[   80.274692] 7ba0: 0000000000000000 0000000000000001 ffffffc077f72698 ffffffc077f72680
[   80.284408] 7bc0: 0000000000000000 0000000000000000 ffffffc075a43b60 ffffffc075a44000
[   80.294109] 7be0: 0000000000000780 0000000000000000 ffffffc074f6c800 0000000000000000
[   80.303812] 7c00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   80.313480] 7c20: 0000000000000000 0000000000000000 0000000000000000 ffffff8008686400
[   80.323104] 7c40: 0000000000000000 ffffff8008cf1488 ffffff8008c67000 ffffff8008c67000
[   80.332744] 7c60: ffffffc075a47d48 ffffffc07586ec78 ffffffc07586eea8 ffffffc075a47cc0
[   80.342384] 7c80: ffffff8008686630 ffffffc075a47cc0 ffffff80080fb110 0000000060000145
[   80.352017] 7ca0: ffffffc075a47d40 ffffff800890dac4 ffffffffffffffff ffffff8008c39000
[   80.361647] [<ffffff80080827b0>] el1_irq+0xb0/0x140
[   80.368315] [<ffffff80080fb110>] smp_call_function_single+0x88/0x128
[   80.376458] [<ffffff8008686630>] cortex_arm64_edac_check+0x78/0xd0
[   80.384428] [<ffffff8008682848>] edac_device_workq_function+0x78/0xc0
[   80.392663] [<ffffff80080af654>] process_one_work+0x1bc/0x380
[   80.400182] [<ffffff80080af860>] worker_thread+0x48/0x4a8
[   80.407304] [<ffffff80080b5284>] kthread+0xd4/0xe8
[   80.413780] [<ffffff8008082e80>] ret_from_fork+0x10/0x50

 

3) While configuring an IP address on the target:



root@plnx_aarch64:~# ifconfig eth0 192.168.1.44
 
[   96.153479] INFO: rcu_sched self-detected stall on CPU
[   96.158544]  0-...: (5249 ticks this GP) idle=437/140000000000001/0 softirq=1318/1318 fqs=2625
[   96.167216]   (t=5250 jiffies g=-27 c=-28 q=7)
[   96.171645] Task dump for CPU 0:
[   96.174856] kworker/u8:2    R  running task        0    32      2 0x00000002
[   96.181901] Workqueue: edac-poller edac_device_workq_function
[   96.187616] Call trace:
[   96.190053] [<ffffff8008088138>] dump_backtrace+0x0/0x198
[   96.195431] [<ffffff80080882e4>] show_stack+0x14/0x20
[   96.200466] [<ffffff80080c0db4>] sched_show_task+0x94/0xf0
[   96.205932] [<ffffff80080c2fe0>] dump_cpu_task+0x40/0x50
[   96.211231] [<ffffff800812ec74>] rcu_dump_cpu_stacks+0xb4/0xe8
[   96.217046] [<ffffff80080e8efc>] rcu_check_callbacks+0x67c/0x860
[   96.223034] [<ffffff80080ec5b4>] update_process_times+0x34/0x60
[   96.228936] [<ffffff80080fba68>] tick_sched_handle.isra.4+0x38/0x48
[   96.235184] [<ffffff80080fbabc>] tick_sched_timer+0x44/0x90
[   96.240739] [<ffffff80080ed070>] __hrtimer_run_queues+0xf0/0x178
[   96.246728] [<ffffff80080ed400>] hrtimer_interrupt+0x98/0x1c8
[   96.252460] [<ffffff80084e4298>] arch_timer_handler_phys+0x30/0x40
[   96.258622] [<ffffff80080df8f8>] handle_percpu_devid_irq+0x78/0x128
[   96.264870] [<ffffff80080da5fc>] generic_handle_irq+0x24/0x38
[   96.270598] [<ffffff80080dac74>] __handle_domain_irq+0x5c/0xb8
[   96.276414] [<ffffff80080814cc>] gic_handle_irq+0x64/0xc0
[   96.281795] Exception stack(0xffffffc075a63b90 to 0xffffffc075a63cc0)
[   96.288218] 3b80:                                   0000000000000000 ffffff80084bd4b0
[   96.296030] 3ba0: 0000000000000000 0000000000000001 ffffffc077f7e898 ffffffc077f7e880
[   96.303843] 3bc0: 0000000000000000 0000000000000000 ffffffc075a5f360 ffffffc075a60000
[   96.311654] 3be0: 0000000000000780 0000000000000000 0000000000002ecc 0000000000000000
[   96.319467] 3c00: 0024f47300000000 00003ccab2000000 ffffff80080ed740 0000007f955cf0b0
[   96.327279] 3c20: 0000000000040900 0000000000000000 0000000000000000 ffffff80084bd4b0
[   96.335090] 3c40: 0000000000000000 ffffff800895a580 ffffff80088e7000 ffffff80088e7000
[   96.342903] 3c60: ffffffc075a63d48 0000000000000000 0000000000000000 ffffffc075a63cc0
[   96.350714] 3c80: ffffff80084bd6e4 ffffffc075a63cc0 ffffff80081003e0 0000000060000145
[   96.358527] 3ca0: ffffffc075a63d40 ffffff8008662fbc ffffffffffffffff ffffff800809ab08
[   96.366339] [<ffffff80080827b0>] el1_irq+0xb0/0x140
[   96.371201] [<ffffff80081003e0>] smp_call_function_single+0x88/0x128
[   96.377537] [<ffffff80084bd6e4>] cortex_arm64_edac_check+0x7c/0xd8
[   96.383699] [<ffffff80084b98d8>] edac_device_workq_function+0x78/0xc0
[   96.390124] [<ffffff80080b0ed4>] process_one_work+0x1bc/0x380
[   96.395851] [<ffffff80080b10e0>] worker_thread+0x48/0x4a8
[   96.401233] [<ffffff80080b6cf4>] kthread+0xd4/0xe8
[   96.406006] [<ffffff8008082e80>] ret_from_fork+0x10/0x50
 

4) During shutdown in Linux:

[  192.124571] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u8:4:1264]
[  192.147930] Kernel panic - not syncing: softlockup: hung tasks
[  192.165207] CPU: 1 PID: 1264 Comm: kworker/u8:4 Tainted: G           O L  4.9.0-dcu-v2017.1 #1
[  192.190984] Hardware name: ZynqMP
[  192.204527] Workqueue: edac-poller edac_device_workq_function
[  192.221711] Call trace:
[  192.657623] fc00: ffffffffffffffff ffffff800813addc
[  192.672205] [<ffffff80080827b0>] el1_irq+0xb0/0x124
[  192.686789] [<ffffff800813ae18>] smp_call_function_single+0x180/0x1d8
[  192.706058] [<ffffff800813afc0>] smp_call_function_any+0x68/0x160
[  192.724287] [<ffffff80086bd34c>] cortex_arm64_edac_check+0xd4/0x110
[  192.743035] [<ffffff80086b879c>] edac_device_workq_function+0x84/0xd0
[  192.762305] [<ffffff80080dd80c>] process_one_work+0x14c/0x490
[  192.779491] [<ffffff80080ddba8>] worker_thread+0x58/0x4a8
[  192.795635] [<ffffff80080e43cc>] kthread+0xec/0x100
[  192.810217] [<ffffff8008082e80>] ret_from_fork+0x10/0x50
[  193.899587] SMP: failed to stop secondary CPUs 0-1
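
All four logs share one signature: a stall or soft-lockup report, together with the edac-poller workqueue and cortex_arm64_edac_check in the call trace. The following is a minimal Python sketch for checking a captured console log against that signature. The function name and the marker lists are illustrative assumptions drawn from the logs above; they are not part of any Xilinx tool.

```python
# Hypothetical helper: the marker strings are copied from the example logs
# in this answer record and may not cover every variant of the failure.
STALL_MARKERS = (
    "rcu_sched self-detected stall",
    "rcu_sched detected stalls",
    "soft lockup",
)
EDAC_FRAMES = (
    "edac_device_workq_function",
    "cortex_arm64_edac_check",
)


def matches_edac_stall(log_text):
    """Return True if the log holds a stall report and EDAC poller frames."""
    has_stall = any(marker in log_text for marker in STALL_MARKERS)
    has_edac = any(frame in log_text for frame in EDAC_FRAMES)
    return has_stall and has_edac
```

A match only suggests that a log resembles the scenarios shown here. It does not prove that the EDAC driver is the root cause.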

Solution

To work around this issue, disable the CONFIG_EDAC_CORTEX_ARM64 driver in the kernel configuration and rebuild the kernel.
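
In a kernel .config file, a disabled option normally appears as the comment line "# CONFIG_EDAC_CORTEX_ARM64 is not set" (or is absent), while an enabled one appears as "CONFIG_EDAC_CORTEX_ARM64=y" or "=m". As a rough sketch, the Python helper below reports the state of the option from the text of a config file. The function name is hypothetical, and where the config file lives depends on the build flow, so the path is left to the caller.

```python
import re

OPTION = "CONFIG_EDAC_CORTEX_ARM64"


def option_state(config_text, option=OPTION):
    """Return 'y' or 'm' if the option is enabled in the config text, else 'n'."""
    # An enabled option is written as OPTION=y (built in) or OPTION=m (module).
    pattern = rf"^{re.escape(option)}=([ym])\s*$"
    enabled = re.search(pattern, config_text, re.MULTILINE)
    if enabled:
        return enabled.group(1)
    # '# OPTION is not set' or a missing entry both mean the driver is off.
    return "n"
```

For example, calling option_state(open(".config").read()) after reconfiguring the kernel should return "n" once the workaround is applied.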

Date 07/10/2017
Status Active
Type General Article