
Analyzing and Resolving a Linux Kernel Crash

kernel crash linux

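I. Problem scenario and environment

System environment:

Red Hat 6.4, kernel 2.6.32-358


Problem:

An iptables rule was added to the mangle table with NFQUEUE as its target. When an HTTP request matched the rule, the machine rebooted immediately. This happened twice, sporadically, but the problem could not be reproduced on the machine after it had rebooted.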


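II. Investigation

1. Checked the messages, kernel, and dmesg logs: nothing abnormal was found.

2. Checked the machine's load before the reboot: CPU, memory, disk I/O, and network I/O were all normal.

3. The reboot only happened once NFQUEUE was used as the target, so a system-level problem was suspected, most likely in the iptables NFQUEUE path. NFQUEUE hands packets from the kernel to user space for processing, so the search was narrowed to the kernel or libnetfilter_queue.

4. The server's console screen might show useful output at the moment of the reboot, but the server sits in the customer's data center, so checking it was impractical.

5. Running last to view the reboot history turned up something unexpected: the entry for the NFQUEUE-triggered reboot contained a crash record, meaning the system had crashed and the crash caused the reboot. So this was a system or kernel crash.

6. Linux systems of this kind usually have kdump installed and configured by default. When the kernel crashes, kdump captures the memory as it was before the crash and writes a dump file, vmcore, under /var/crash/<date>. The crash utility can analyze the vmcore file and recover important information about the state of the kernel just before it crashed. A search on the machine did indeed find the vmcore file for this crash.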
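Steps 5 and 6 can be scripted. The following is a minimal sketch, assuming the kdump target directory is /var/crash as described above and that the dump file is named exactly vmcore; the helper names crash_sessions and find_vmcores are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Read `last` output on stdin and keep only the sessions whose
# logout column reads "crash" (the session ended without a clean logout).
crash_sessions() {
    grep -e '- crash'
}

# Print every file named "vmcore" under a kdump target directory.
# Defaults to /var/crash when no directory is given.
find_vmcores() {
    find "${1:-/var/crash}" -type f -name vmcore 2>/dev/null
}
```

Typical use would be `last -x | crash_sessions` followed by `find_vmcores /var/crash`. An empty result from either command only means nothing matched; it does not rule out a crash.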


III. Analyzing the vmcore file

1. Install the kernel-debuginfo package that matches the crashed kernel:
# yum install kernel-debuginfo-2.6.32-358.el6.x86_64


2. Analyze the vmcore with the crash command that ships with the system:

# crash /usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/vmlinux vmcore
crash 7.1.0-6.el6
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
      KERNEL: vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Oct 31 11:53:41 2017
      UPTIME: 342 days, 12:15:26
LOAD AVERAGE: 0.00, 0.02, 0.00
       TASKS: 1050
    NODENAME: web_yp_49_202.mobileztgame
     RELEASE: 2.6.32-358.el6.x86_64
     VERSION: #1 SMP Tue Jan 29 11:47:41 EST 2013
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 128 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
         PID: 0
     COMMAND: "swapper"
        TASK: ffff882069324080  (1 of 40)  [THREAD_INFO: ffff881068896000]
         CPU: 5
       STATE: TASK_RUNNING (PANIC)


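The PANIC line in the crash output shows why the kernel went down: it dereferenced a NULL pointer.

The bt command prints the stack backtrace from just before the crash. Its output: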

crash> bt
PID: 0      TASK: ffff882069324080  CPU: 5   COMMAND: "swapper"
 #0 [ffff8800618a3750] machine_kexec at ffffffff81035b7b
 #1 [ffff8800618a37b0] crash_kexec at ffffffff810c0db2
 #2 [ffff8800618a3880] oops_end at ffffffff815111d0
 #3 [ffff8800618a38b0] no_context at ffffffff81046bfb
 #4 [ffff8800618a3900] __bad_area_nosemaphore at ffffffff81046e85
 #5 [ffff8800618a3950] bad_area_nosemaphore at ffffffff81046f53
 #6 [ffff8800618a3960] __do_page_fault at ffffffff810476b1
 #7 [ffff8800618a3a80] do_page_fault at ffffffff8151311e
 #8 [ffff8800618a3ab0] page_fault at ffffffff815104d5
    [exception RIP: nf_queue+152]
    RIP: ffffffff81475718  RSP: ffff8800618a3b60  RFLAGS: 00010207
    RAX: 0000000000000020  RBX: 0000000000000000  RCX: ffff8810638a3c00
    RDX: 0000000000000002  RSI: ffff880959189980  RDI: 0000000000000000
    RBP: ffff8800618a3bd0   R8: 0000000000021773   R9: 0000000000000001
    R10: 000000000000000e  R11: 0000000000000006  R12: ffff880959189980
    R13: 0000000000000000  R14: ffffffff8147e8b0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800618a3bd8] nf_hook_slow at ffffffff81474800
#10 [ffff8800618a3c58] ip_rcv at ffffffff8147ef54
#11 [ffff8800618a3c98] __netif_receive_skb at ffffffff8144819b
#12 [ffff8800618a3cf8] netif_receive_skb at ffffffff8144a578
#13 [ffff8800618a3d38] napi_skb_finish at ffffffff8144a680
#14 [ffff8800618a3d58] napi_gro_receive at ffffffff8144cc29
#15 [ffff8800618a3d78] ixgbe_poll at ffffffffa015e44c [ixgbe]
#16 [ffff8800618a3e68] net_rx_action at ffffffff8144cd43
#17 [ffff8800618a3ec8] __do_softirq at ffffffff81076fb1
#18 [ffff8800618a3f38] call_softirq at ffffffff8100c1cc
#19 [ffff8800618a3f50] do_softirq at ffffffff8100de05
#20 [ffff8800618a3f70] irq_exit at ffffffff81076d95
#21 [ffff8800618a3f80] do_IRQ at ffffffff81516c95
--- <IRQ stack> ---
#22 [ffff881068897db8] ret_from_intr at ffffffff8100b9d3
    [exception RIP: intel_idle+222]
    RIP: ffffffff812d37ae  RSP: ffff881068897e68  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffff881068897ed8  RCX: 0000000000000000
    RDX: 00000000000e3cb1  RSI: 0000000000000000  RDI: 00000000379d13ba
    RBP: ffffffff8100b9ce   R8: 0000000000000004   R9: 0000000000000050
    R10: 0069229e5ea9dbfa  R11: 0000000000000000  R12: ffff8800618b15a0
    R13: 0000000000000000  R14: 0069229c2b297a40  R15: ffff8800618b16a0
    ORIG_RAX: ffffffffffffff62  CS: 0010  SS: 0018
#23 [ffff881068897ee0] cpuidle_idle_call at ffffffff81414ef7
#24 [ffff881068897f00] cpu_idle at ffffffff81009fc6



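Reading the backtrace from bottom to top gives the call chain that led to the crash: an idle CPU took a NIC interrupt, and the softirq receive path ran from ixgbe_poll through ip_rcv and nf_hook_slow into nf_queue. The exception frame shows that the page fault was raised at RIP ffffffff81475718 (nf_queue+152), with RBX equal to 0. The dis command disassembles that address: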

crash> dis -l ffffffff81475718
/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/netfilter/nf_queue.c: 221
0xffffffff81475718 <nf_queue+152>:      mov    (%rbx),%r12


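The faulting instruction loads from the address held in RBX, which the register dump shows is 0, and it maps to line 221 of nf_queue.c. That pinpoints the code where the fault occurred: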

# vim /usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/netfilter/nf_queue.c +221
215         segs = skb_gso_segment(skb, 0);
216         kfree_skb(skb);
217         if (IS_ERR(segs))
218                 return 1;
219
220         do {
221                 struct sk_buff *nskb = segs->next;
222
223                 segs->next = NULL;
224                 if (!__nf_queue(segs, elem, pf, hook, indev, outdev, okfn,
225                                 queuenum))
226                         kfree_skb(segs);
227                 segs = nskb;
228         } while (segs);
229         return 1;




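Reading skb_gso_segment (a function in net/core/dev.c, not a struct) explains the fault. Its header comment states that it may return NULL when the skb requires no segmentation. nf_queue only checks IS_ERR(segs), and NULL is not an error pointer, so the check passes and segs->next on line 221 dereferences a NULL pointer, which crashes the kernel. Because the problem is GSO-related, it should be possible to work around it by adjusting the system's GSO settings. The relevant part of skb_gso_segment: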

# vim /usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/core/dev.c +1728
1728 /**
1729  *      skb_gso_segment - Perform segmentation on skb.
1730  *      @skb: buffer to segment
1731  *      @features: features for the output path (see dev->features)
1732  *
1733  *      This function segments the given skb and returns a list of segments.
1734  *
1735  *      It may return NULL if the skb requires no segmentation.  This is
1736  *      only possible when GSO is used for verifying header integrity.
1737  */
1738 struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features)
1739 {
1740         struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
1741         struct packet_type *ptype;
1742         __be16 type = skb->protocol;
1743         int err;


The corresponding patch, found online:

https://patchwork.kernel.org/patch/6615071/
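The interactive bt and dis steps above can also be replayed non-interactively, which is handy when the same dump has to be re-examined later. This is only a sketch: write_crash_script is a made-up helper, and it assumes the installed crash utility accepts a command file through its -i option.

```shell
#!/bin/sh
# Hypothetical helper: write a crash command file that runs `bt`,
# disassembles one address with source line info, and then exits.
#   $1: path of the command file to create
#   $2: address to disassemble (e.g. the exception RIP from `bt`)
write_crash_script() {
    out=$1
    addr=$2
    printf '%s\n' 'bt' "dis -l $addr" 'exit' > "$out"
}
```

For the dump analyzed here, one would call `write_crash_script /tmp/nfq.cmds ffffffff81475718` and then run `crash -i /tmp/nfq.cmds /usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/vmlinux vmcore`. The /tmp/nfq.cmds path is an arbitrary choice.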


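IV. Reproducing the problem

1. When the problem was first noticed, the attempt to reproduce it used the following request: curl "t.test.com". It did not reproduce.

2. After reading up on TSO/GSO/LRO/GRO, the likely explanation was that the request was too small to trigger any packet segmentation or reassembly, so the faulty path was never reached. Making the request larger reproduced the problem with the following URL: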

# curl "t.test.com/v2/user-manage/css/bootstrap.min.css?test1=sdfsfsdfsdfa&test2_id=2234234234234234234&test_id=50129009890098&test_token=1670056402|_80_m_lxxj1298|1493196793|c726299f2d03b8462764bacf20e2395f|sdfsdfdsfsdffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffsdfsdfsdfdsfsdfhgjgjghjghjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjfhjgjfghjfjfhjjjjjjjjjjjjjjjjjjjjjfffffadfsfsdfsdfsdfsdfsdfdsfdssdfsdfsdfsdfsdfsdf"

The iptables rules in place were:

# ipset create lee hash:ip hashsize 819200 maxelem 100000 timeout 300
# ipset add lee 1.1.1.1 timeout 300
# iptables -t mangle -I PREROUTING -p tcp -m multiport --dports 80,443 -m set --match-set lee src -m string --string t.test.com --algo kmp --from 0 --to 1480 -j NFQUEUE
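The oversized request from step 2 does not have to be typed by hand. Below is a minimal sketch that builds a URL with a chosen amount of filler. The helper name padded_url and the pad query parameter are made up for illustration, and no particular size is guaranteed to trigger the crash.

```shell
#!/bin/sh
# Hypothetical helper: print a URL for host $1 whose query string
# carries $2 filler characters ('f'), to inflate the HTTP request.
padded_url() {
    host=$1
    n=$2
    # Emit n NUL bytes and turn each one into the letter 'f'.
    pad=$(head -c "$n" /dev/zero | tr '\0' 'f')
    printf 'http://%s/?pad=%s\n' "$host" "$pad"
}
```

A request could then be sent with `curl "$(padded_url t.test.com 600)"`, raising the size until the behaviour changes. Only do this against a test machine, since a successful reproduction reboots it.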


V. Conclusion

This is a Linux kernel bug.


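VI. Solutions

1. Upgrade the kernel. Judging from the patch and the source code, kernels after 3.0 should have fixed this problem; the 3.10 kernel source was checked and already contains the fix.

2. Use DROP instead of the NFQUEUE target in the iptables rule (this is the recommended option).

3. Adjust the NIC's GSO-related offload settings. Disabling LRO was found to stop the reboots. The command: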

# ethtool -K eth0 lro off

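About LRO:

Linux 2.6.24 added LRO (Large Receive Offload) support for IPv4 TCP. LRO aggregates several TCP segments into a single skb and hands them to the upper layers of the network stack later as one large packet. This reduces the per-skb processing cost in the upper layers and improves the system's capacity to receive TCP traffic. All of this requires support from the NIC driver.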
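Before and after changing an offload setting it is worth confirming what the NIC actually reports. The sketch below parses the output of `ethtool -k`; the helper name offload_state is made up for illustration, and the feature names are the ones ethtool prints on typical systems, which may differ by ethtool version.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and
# print the state ("on" or "off") of the feature named in $1,
# ignoring any trailing marker such as "[fixed]".
offload_state() {
    sed -n "s/^[[:space:]]*$1: \([a-z]*\).*/\1/p"
}
```

For example, `ethtool -k eth0 | offload_state large-receive-offload` should print off once LRO has been disabled, and the same helper can be pointed at generic-receive-offload or generic-segmentation-offload.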


VII. References

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/kernel_crash_dump_guide/sect-crash-running-the-utility

https://patchwork.kernel.org/patch/6615071/

https://www.ibm.com/developerworks/cn/linux/l-cn-network-pt/index.html


This article comes from the “佳” blog. Please retain this attribution: http://leejia.blog.51cto.com/4356849/1978729
