linux的crash之hardlock排查記錄
3.10.0-327的內核,crash記錄如下:
KERNEL: vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 48
DATE: Wed Oct 18 20:37:18 2017
UPTIME: 1 days, 09:43:06
LOAD AVERAGE: 13.42, 10.66, 9.48
TASKS: 7329
NODENAME: host-10-229-143-10
RELEASE: 3.10.0-327.22.2.el7.x86_64
VERSION: #1 SMP Fri Sep 29 15:13:08 CST 2017
MACHINE: x86_64 (2199 Mhz)
MEMORY: 383.6 GB
PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10"
PID: 24023
COMMAND: "fas_readwriter"
TASK: ffff882f460a2e00 [THREAD_INFO: ffff882f10c44000]
CPU: 10
STATE: TASK_RUNNING (PANIC)-----------------------------------------R狀態死鎖,進程長時間處於TASK_RUNNING 狀態搶占CPU而不發生切換,一般是,進程關搶占後一直執行任務,或者進程關搶占後處於死循環或者睡眠,此時往往會導致多個CPU互鎖,整個系統異常。
crash> bt
PID: 24023 TASK: ffff882f460a2e00 CPU: 10 COMMAND: "fas_readwriter"
#0 [ffff882fbfd459c8] machine_kexec at ffffffff81051c5b
#1 [ffff882fbfd45a28] crash_kexec at ffffffff810f3ec2
#2 [ffff882fbfd45af8] panic at ffffffff816326d1
#3 [ffff882fbfd45b78] watchdog_overflow_callback at ffffffff8111d0e2
#4 [ffff882fbfd45b88] __perf_event_overflow at ffffffff811608d1
#5 [ffff882fbfd45c00] perf_event_overflow at ffffffff811613a4
#6 [ffff882fbfd45c10] intel_pmu_handle_irq at ffffffff81032628
#7 [ffff882fbfd45e60] perf_event_nmi_handler at ffffffff81642bcb
#8 [ffff882fbfd45e80] nmi_handle at ffffffff81642319
#9 [ffff882fbfd45ec8] do_nmi at ffffffff81642430
#10 [ffff882fbfd45ef0] end_repeat_nmi at ffffffff81641753
[exception RIP: put_compound_page+336]
RIP: ffffffff81178b60 RSP: ffff882f10c47d80 RFLAGS: 00000006
RAX: 006016c60138402c RBX: ffffea0123302a40 RCX: 0000000000000022
RDX: 0000000000000246 RSI: 000000000a6a9000 RDI: ffffea0123300000
RBP: ffff882f10c47d98 R8: ffff882f10c47dc8 R9: ffff882f10c47d74
R10: ffff880000000298 R11: 000000000a6aa000 R12: ffffea0123300000
R13: 0000000000000246 R14: 0000000000000000 R15: ffffea0123302a40
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#11 [ffff882f10c47d80] put_compound_page at ffffffff81178b60
#12 [ffff882f10c47da0] put_page at ffffffff81178bac
#13 [ffff882f10c47db0] get_futex_key at ffffffff810e3c86
#14 [ffff882f10c47e08] futex_wake at ffffffff810e3f1a
#15 [ffff882f10c47e70] do_futex at ffffffff810e6a12
#16 [ffff882f10c47f08] sys_futex at ffffffff810e6f20
#17 [ffff882f10c47f80] system_call_fastpath at ffffffff81649909
首先,一般hardlock的觸發原因是因為關中斷時間太長,那麽需要查找對應的堆棧中是否有如此處理,而關中斷的常見函數,如spinlock,irq_disable等。
根據堆棧,get_futex_key有一段代碼如下:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
page_head = page;
if (unlikely(PageTail(page))) {
put_page(page);
/* serialize against __split_huge_page_splitting() */
local_irq_disable();-------------------------------------------------------------------------關中斷
if (likely(__get_user_pages_fast(address, 1, !ro, &page) == 1)) {------------------調用了__get_user_pages_fast
page_head = compound_head(page);
/*
* page_head is valid pointer but we must pin
* it before taking the PG_lock and/or
* PG_compound_lock. The moment we re-enable
* irqs __split_huge_page_splitting() can
* return and the head page can be freed from
* under us. We can‘t take the PG_lock and/or
* PG_compound_lock on a page that could be
* freed from under us.
*/
if (page != page_head) {
get_page(page_head);
put_page(page);
}
local_irq_enable();
} else {
local_irq_enable();
goto again;
}
}
#else
page_head = compound_head(page);
if (page != page_head) {
get_page(page_head);
put_page(page);
}
#endif
確定下CONFIG_TRANSPARENT_HUGEPAGE 是否配置了:
grep CONFIG_TRANSPARENT_HUGEPAGE /boot/config-3.10.0-327.22.2.el7.x86_64
CONFIG_TRANSPARENT_HUGEPAGE=y
說明已經配置了,反匯編get_futex_key確認下,通過簡單搜索__get_user_pages_fast是否編譯確認了確實開啟了透明巨頁。
下一步,需要分析,為什麽put_page調用put_compound_page會長時間不返回。
void put_page(struct page *page)
{
if (unlikely(PageCompound(page)))
put_compound_page(page);
else if (put_page_testzero(page))
__put_single_page(page);
}
系統配置的/proc/sys/vm/nr_hugepages為0。
linux的crash之hardlock排查記錄