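Pointer misuse in driver development: Unable to handle kernel NULL pointer dereference at virtual address
Preface
Today I want to talk about an error caused by misusing a pointer in driver development: Unable to handle kernel NULL pointer dereference at virtual address xxxxxxxx. I ran into it while writing an LCD driver that uses DMA: while setting up the memory for the DMA transfer, the code dereferenced a NULL pointer. The kernel printed the following: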
[ 72.820000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 72.820000] pgd = c0004000
[ 72.820000] [00000000] *pgd=00000000
[ 72.830000] Internal error: Oops: 817 [#1] ARM
[ 72.830000] Modules linked in: disp_tft(O) sec_mmap(O)
[ 72.830000] CPU: 0 Tainted: G O (3.6.5 #55)
[ 72.830000] PC is at __memzero+0x4c/0x80
[ 72.830000] LR is at 0x0
[ 72.830000] pc : [<c0167a0c>] lr : [<00000000>] psr: 00000113
[ 72.830000] sp : c0407db4 ip : 00000000 fp : c0407dcc
[ 72.830000] r10: 00000140 r9 : 00200000 r8 : 000000f0
[ 72.830000] r7 : dfcd0000 r6 : de2c0000 r5 : 00000000 r4 : 00000001
[ 72.830000] r3 : 00000000 r2 : 00000000 r1 : ffffffd0 r0 : 00000000
[ 72.830000] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 72.830000] Control: 10c53c7d Table: 7f1cc059 DAC: 00000015
[ 72.830000] Process swapper (pid: 0, stack limit = 0xc04062e8)
[ 72.830000] Stack: (0xc0407db4 to 0xc0408000)
......
[ 72.830000] Backtrace:
[ 72.830000] [<c0172c68>] (sg_init_table+0x0/0x38) from [<bf020324>] (lcd_flash_timer+0x144/0x340 [disp_tft])
[ 72.830000] r5:de81a0d4 r4:de240340
[ 72.830000] [<bf0201e0>] (lcd_flash_timer+0x0/0x340 [disp_tft]) from [<c0057e60>] (run_timer_softirq+0x134/0x1d4)
[ 72.830000] [<c0057d2c>] (run_timer_softirq+0x0/0x1d4) from [<c005236c>] (__do_softirq+0xa4/0x164)
[ 72.830000] r8:c046f6c0 r7:00000100 r6:c0406000 r5:c046f708 r4:00000001
[ 72.830000] [<c00522c8>] (__do_softirq+0x0/0x164) from [<c00527bc>] (irq_exit+0x48/0x94)
[ 72.830000] [<c0052774>] (irq_exit+0x0/0x94) from [<c000ed10>] (handle_IRQ+0x6c/0x8c)
[ 72.830000] [<c000eca4>] (handle_IRQ+0x0/0x8c) from [<c0008530>] (gic_handle_irq+0x40/0x58)
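Afterwards I searched on Baidu and found that this error mainly has the following causes:
1. The driver dereferences a NULL pointer. The kernel's paging mechanism cannot map that address to a physical address, so the processor raises a page fault to the operating system. If the address is invalid, the kernel cannot "page in" the missing address; when this happens while the processor is in supervisor mode, the kernel (usually) generates an oops.
2. A kernel configuration option that the driver depends on is missing; check whether you left out a required option.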
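Solution
In most cases this error is caused by the driver dereferencing an invalid (or NULL) pointer. Locate the faulting spot, either by adding debug prints step by step or from the error information printed by the kernel, and then fix the code.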
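Case study
To show more concretely how to track down this kind of problem, here is an example based on the error I ran into. The code is simplified so that it is easier to follow:
1. Suppose the following two statements are added to a driver's initialization function: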
static int __init disp_init(void)
{
    int *ptr = NULL;    /* ptr deliberately points at address 0 */
    *ptr = 0x123456;    /* writing through it triggers the oops */
    ...........
}
After the driver is loaded, the following error appears:
[ 101.650000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 101.660000] pgd = de0ac000
[ 101.660000] [00000000] *pgd=7f21e831, *pte=00000000, *ppte=00000000
[ 101.660000] Internal error: Oops: 817 [#1] ARM
[ 101.660000] Modules linked in: disp_tft(O+) prn_ltp02245(O) ope_gpio_tft(O) buzz(O) sec_mmap(O) [last unloaded: disp_tft]
[ 101.660000] CPU: 0 Tainted: G O (3.6.5 #56)
[ 101.660000] PC is at disp_init+0x28/0x818 [disp_tft]
[ 101.660000] LR is at do_one_initcall+0x9c/0x16c
[ 101.660000] pc : [<bf07d028>] lr : [<c0008658>] psr: 60000013
[ 101.660000] sp : de2bfe58 ip : de2bfeb0 fp : de2bfeac
[ 101.660000] r10: bf07d000 r9 : 00000000 r8 : 00000001
[ 101.660000] r7 : debd1080 r6 : bf079fe8 r5 : 00000000 r4 : bf07a0e0
[ 101.660000] r3 : 00123456 r2 : 00000000 r1 : 00000fff r0 : 18045000
[ 101.660000] Backtrace:
[ 101.660000] [<bf07d000>] (disp_init+0x0/0x818 [disp_tft]) from [<c0008658>] (do_one_initcall+0x9c/0x16c)
[ 101.660000] r8:00000001 r7:debd1080 r6:bf079fe8 r5:00000000 r4:bf079fa0
[ 101.660000] [<c00085bc>] (do_one_initcall+0x0/0x16c) from [<c0078968>] (sys_init_module+0x1590/0x171c)
[ 101.660000] [<c00773d8>] (sys_init_module+0x0/0x171c) from [<c000de20>] (ret_fast_syscall+0x0/0x30)
[ 101.660000] Code: e59f4774 e3001fff e1a02005 e59f076c (e5853000)
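From the error output above, we can see that the fault is a NULL pointer dereference inside disp_init(): the PC is at disp_init+0x28 in the disp_tft module. The register dump agrees with the code: the faulting instruction, shown in parentheses on the Code line (e5853000), is the ARM store str r3, [r5], with r3 = 00123456 (the value being written) and r5 = 00000000 (the NULL address).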
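For how to debug a driver from the Oops information in Linux, please read the following blog post carefully:
(Note: the blog post is by another author.)
That post describes in detail how to use the error information printed by the kernel to fix errors caused by invalid pointers in driver development. It was only after reading it that everything clicked for me. So I, too, am learning by standing on the shoulders of giants!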