Analyzing a system dump caused by memory corruption, using the Linux crash tool

About the crash tool
sudo mount system.img the-dir   // mount system.img on a directory and you can browse the system files directly; there is no need to search online for an unpacking method


Disassemble vmlinux:
/home/apuser/mywork/4.4-3.10-prime/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.6/bin/arm-linux-androideabi-objdump -D vmlinux > vmlinux.dis
The resulting assembly listing, vmlinux.dis, can be viewed with vi.


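Preparation:
1. Merge the dump files into one: cat sysdump.core.* > dump.bin
2. Find the crash command under vendor/xxxx/open-source/tools/sysdump. It is best to copy it into ~/bin so it is easy to run later.
3. Find the vmlinux file that matches the build that produced the dump.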
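
One thing to watch in step 1: the shell expands `sysdump.core.*` in lexicographic order, so if the part suffixes are plain numbers without zero padding (an assumption; the actual file names are not shown here), part 10 would be concatenated before part 2. The following is a minimal sketch of a merge that orders parts numerically instead. The helper names `natural_key` and `merge_parts` are hypothetical, not part of any sysdump tooling.

```python
import re
import shutil


def natural_key(name: str):
    """Sort key that compares digit runs as numbers, everything else as text."""
    tokens = re.split(r"(\d+)", name)
    # Tag each token so numbers and text are never compared with each other.
    return [(0, int(t), "") if t.isdigit() else (1, 0, t) for t in tokens]


def merge_parts(part_paths, out_path):
    """Concatenate dump parts into out_path in natural (numeric) order."""
    with open(out_path, "wb") as out:
        for path in sorted(part_paths, key=natural_key):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Hypothetical part names, only to show the ordering difference.
parts = ["sysdump.core.10", "sysdump.core.2", "sysdump.core.1"]
print(sorted(parts))                   # lexicographic: .1, .10, .2
print(sorted(parts, key=natural_key))  # numeric: .1, .2, .10
```

If the part names are zero-padded, the plain `cat` with a glob already produces the right order and this is unnecessary.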


Start crash:
4.  ./crash -m phys_base=0x80000000 dump.bin vmlinux


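Usage:
At the crash prompt, enter the "log" command to display the kernel log and check whether a stack trace was printed.
The p command prints a global variable, for example: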


crash> p sprdbl
sprdbl = $9 = {
  pwm_mode = normal_pwm, 
  pwm_index = 3, 
  bldev = 0xda992000, 
  suspend = 0, 
  clk = 0xc09a23c8, 
  sprd_early_suspend_desc = {
    link = {
      next = 0xc09ca484, 
      prev = 0xc09c6ea8
    }, 
    level = 50, 
    suspend = 0xc029c87c <sprd_backlight_earlysuspend>, 
    resume = 0xc029c830 <sprd_backlight_lateresume>
  }
}
crash> 




//===================================
A system dump analysis:


First, get the last log messages.
 
crash> 
crash> log


[ 3067.675155] Unable to handle kernel NULL pointer dereference at virtual address 00000002
[ 3067.675194] pgd = c0004000
[ 3067.675220] [00000002] *pgd=00000000
[ 3067.675263] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[ 3067.726707] Modules linked in: sprdwl goodix_ts trout_fm mali(O)
[ 3067.733073] CPU: 3 PID: 3473 Comm: kworker/u8:1 Tainted: G        W  O 3.10.17-00002-gd176c92 #1
[ 3067.742215] Workqueue: goodix_wq goodix_ts_work_func [goodix_ts]
[ 3067.865490] task: d7d6db00 ti: c329e000 task.ti: c329e000
[ 3067.871319] PC is at memset+0x74/0xe0
[ 3067.875311] LR is at gtp_i2c_test+0x34/0x94 [goodix_ts]
[ 3067.880852] pc : [<c0273834>]    lr : [<bf078594>]    psr: 000f0013
               sp : c329fe18  ip : 00000002  fp : c329fe54
[ 3067.893016] r10: 00000000  r9 : 00000000  r8 : 0000000c
[ 3067.898652] r7 : 00000005  r6 : dafea600  r5 : 00000002  r4 : 00000000
[ 3067.905491] r3 : 00000002  r2 : ffffffff  r1 : 00000000  r0 : 00000002
[ 3067.912431] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 3067.920163] Control: 10c53c7d  Table: 8a1a806a  DAC: 00000015
[ 3067.926219] 
               PC: 0xc02737b4:
[ 3067.931166] 37b4  00000000 00000000 00000000 e2103003 e1a0c000 1a00001e e1811401 e1811801
[ 3067.939771] 37d4  e1a03001 e3520010 ba00000f e92d4100 e1a08001 e1a0e001 e2522040 a8ac410a
[ 3067.948499] 37f4  a8ac410a a8ac410a a8ac410a cafffff9 08bd8100 e3120020 18ac410a 18ac410a
[ 3067.957228] 3814  e3120010 18ac410a e8bd4100 e3120008 18ac000a e3120004 148c1004 e3120002
[ 3067.965945] 3834  14cc1001 14cc1001 e3120001 14cc1001 e1a0f00e e2522004 bafffff7 e3530002
[ 3067.974553] 3854  b4cc1001 d4cc1001 e4cc1001 e0822003 eaffffd8 e320f000 e320f000 e320f000
[ 3067.983256] 3874  e320f000 e320f000 e320f000 00000000 e2511004 ba00001d e3530002 b4c02001
[ 3067.991858] 3894  d4c02001 e4c02001 e0811003 e3a02000 e2103003 1afffff5 e3510010 ba00000f
[ 3068.000571] 
               SP: 0xc329fd98:
[ 3068.005414] fd98  dafea408 dafea418 c329fdbc c0273834 000f0013 ffffffff c0273834 000f0013
[ 3068.014130] fdb8  ffffffff c329fe04 c329fe54 c329fdd0 c000f6dc c000916c 00000002 00000000
[ 3068.022734] fdd8  ffffffff 00000002 00000000 00000002 dafea600 00000005 0000000c 00000000
[ 3068.031453] fdf8  00000000 c329fe54 00000002 c329fe18 bf078594 c0273834 000f0013 ffffffff
[ 3068.040161] fe18  0000005d c0010002 c329fe8c 0001005d c065000a c329fe8e c329fe6c da16f440
[ 3068.048878] fe38  da16f400 db404400 ffffff81 00000000 c329fee4 c329fe58 bf078664 bf0784b0
[ 3068.057473] fe58  00000003 c09a5228 004e817c c329fe70 c0012b24 c00129dc c329fe8c c329fe80
[ 3068.066175] fe78  c0071bec c0012b18 c329fe9c c329fe90 c0071c38 00814e81 00ce0310 0000000b
[ 3068.074878] 
               FP: 0xc329fdd4:
[ 3068.079722] fdd4  00000000 ffffffff 00000002 00000000 00000002 dafea600 00000005 0000000c
[ 3068.088432] fdf4  00000000 00000000 c329fe54 00000002 c329fe18 bf078594 c0273834 000f0013
[ 3068.097144] fe14  ffffffff 0000005d c0010002 c329fe8c 0001005d c065000a c329fe8e c329fe6c
[ 3068.105744] fe34  da16f440 da16f400 db404400 ffffff81 00000000 c329fee4 c329fe58 bf078664
[ 3068.114455] fe54  bf0784b0 00000003 c09a5228 004e817c c329fe70 c0012b24 c00129dc c329fe8c
[ 3068.123059] fe74  c329fe80 c0071bec c0012b18 c329fe9c c329fe90 c0071c38 00814e81 00ce0310
[ 3068.131771] fe94  0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3068.140373] feb4  00000000 c006dc78 ce254b80 da16f440 db404400 db404400 da0b2500 00000000
[ 3068.149086] 
               R6: 0xdafea580:
[ 3068.153926] a580  00000000 00000000 00000001 64727073 6332692d 00000000 00000000 00000000
[ 3068.162629] a5a0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3068.171338] a5c0  00000000 00000000 00000000 00000001 00000000 dafea5d4 dafea5d4 00000000
[ 3068.180038] a5e0  00000000 dafea5e4 dafea5e4 00000000 f536c000 c09a2a68 00000000 00000000
[ 3068.188630] a600  005d0000 646f6f67 745f7869 00000073 00000000 00000000 dafea408 bf07ded4
[ 3068.197336] a620  dafea430 dafb0600 dafb2080 dafea83c dafea43c dafea438 dae5fcc0 c09bbb18
[ 3068.206037] a640  dafb2100 00000003 00000007 00000000 c09c43a8 00000001 00000000 dafea65c
[ 3068.214650] a660  dafea65c 00000000 00000000 c09c43f0 bf07def8 00000000 00000000 00000020
[ 3068.223369] Process kworker/u8:1 (pid: 3473, stack limit = 0xc329e238)
[ 3068.230317] Stack: (0xc329fe18 to 0xc32a0000)
[ 3068.234986] fe00:                                                       0000005d c0010002
[ 3068.243592] fe20: c329fe8c 0001005d c065000a c329fe8e c329fe6c da16f440 da16f400 db404400
[ 3068.252205] fe40: ffffff81 00000000 c329fee4 c329fe58 bf078664 bf0784b0 00000003 c09a5228
[ 3068.260706] fe60: 004e817c c329fe70 c0012b24 c00129dc c329fe8c c329fe80 c0071bec c0012b18
[ 3068.269315] fe80: c329fe9c c329fe90 c0071c38 00814e81 00ce0310 0000000b 00000000 00000000
[ 3068.277915] fea0: 00000000 00000000 00000000 00000000 00000000 00000000 c006dc78 ce254b80
[ 3068.286517] fec0: da16f440 db404400 db404400 da0b2500 00000000 00000000 c329ff24 c329fee8
[ 3068.295019] fee0: c005a8ec bf078600 c329ff0c c329fef8 c006dce4 c00caa7c c098fbf8 ce254b80
[ 3068.303634] ff00: db404400 c329e000 db404400 ce254b98 00000000 db404414 c329ff5c c329ff28
[ 3068.312250] ff20: c005afd0 c005a668 00000000 ce254b80 c005adac db467e78 00000000 ce254b80
[ 3068.320870] ff40: c005adac 00000000 00000000 00000000 c329ffac c329ff60 c0060cec c005adb8
[ 3068.329399] ff60: c065b860 00000000 c329ff94 ce254b80 00000000 00000000 c329ff78 c329ff78
[ 3068.338016] ff80: 00000000 00000000 c329ff88 c329ff88 db467e78 c0060c30 00000000 00000000
[ 3068.346628] ffa0: 00000000 c329ffb0 c000fc48 c0060c3c 00000000 00000000 00000000 00000000
[ 3068.355235] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3068.363738] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffbbd7e8 ffbbd7e8
[ 3068.372349] Backtrace: 
[ 3068.374961] [<bf0784a4>] (gtp_i2c_read+0x0/0xbc [goodix_ts]) from [<bf078664>] (goodix_ts_work_func+0x70/0x288 [goodix_ts])
[ 3068.386522]  r8:00000000 r7:ffffff81 r6:db404400 r5:da16f400 r4:da16f440
[ 3068.393667] [<bf0785f4>] (goodix_ts_work_func+0x0/0x288 [goodix_ts]) from [<c005a8ec>] (process_one_work+0x290/0x48c)
[ 3068.404725] [<c005a65c>] (process_one_work+0x0/0x48c) from [<c005afd0>] (worker_thread+0x224/0x370)
[ 3068.414216] [<c005adac>] (worker_thread+0x0/0x370) from [<c0060cec>] (kthread+0xbc/0xcc)
[ 3068.422741] [<c0060c30>] (kthread+0x0/0xcc) from [<c000fc48>] (ret_from_fork+0x14/0x20)
[ 3068.431065]  r7:00000000 r6:00000000 r5:c0060c30 r4:db467e78
[ 3068.437053] Code: 18ac000a e3120004 148c1004 e3120002 (14cc1001) 
[ 3068.443466] (sprd_debug_save_context) context saved(CPU:3)
[ 3068.449856] (sprd_debug_save_context) context saved(CPU:0)
[ 3068.455824] CPU0: stopping


My initial analysis:


[ 3067.871319] PC is at memset+0x74/0xe0
[ 3067.875311] LR is at gtp_i2c_test+0x34/0x94 [goodix_ts]
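PC points into memset, so the fault occurred inside memset.
LR points into gtp_i2c_test, which means gtp_i2c_test called memset.
But the source of gtp_i2c_test contains no memset call at all. So what is going on? To find out, I disassembled the vmlinux from that build: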
/home/apuser/mywork/4.4-3.10-prime/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.6/bin/arm-linux-androideabi-objdump -D vmlinux > vmlinux.dis
Opening the resulting vmlinux.dis in vi shows the assembly:
c03b0fb4 <gtp_i2c_test>:
c03b0fb4:   e1a0c00d    mov ip, sp
c03b0fb8:   e92dd8f0    push    {r4, r5, r6, r7, fp, ip, lr, pc}
c03b0fbc:   e24cb004    sub fp, ip, #4
c03b0fc0:   e24dd008    sub sp, sp, #8
c03b0fc4:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!)
c03b0fc8:   ebf17b34    bl  c000fca0 <__gnu_mcount_nc>
c03b0fcc:   e24b501f    sub r5, fp, #31
c03b0fd0:   e1a07000    mov r7, r0
c03b0fd4:   e3a01000    mov r1, #0
c03b0fd8:   e1a00005    mov r0, r5
c03b0fdc:   e3a02003    mov r2, #3
c03b0fe0:   e3a04000    mov r4, #0
c03b0fe4:   ebfb09f5    bl  c02737c0 <memset>
c03b0fe8:   e3e0307f    mvn r3, #127    ; 0x7f
c03b0fec:   e54b301f    strb    r3, [fp, #-31]
c03b0ff0:   e3a03047    mov r3, #71 ; 0x47
c03b0ff4:   e54b301e    strb    r3, [fp, #-30]
c03b0ff8:   e1a00007    mov r0, r7
c03b0ffc:   e1a01005    mov r1, r5
c03b1000:   e3a02003    mov r2, #3




At the C source level, gtp_i2c_test() definitely contains no explicit call to memset.
Yet the assembly above clearly shows that gtp_i2c_test() does call memset.
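
The "symbol+offset" values in the oops can be cross-checked against the listing with simple address arithmetic. The sketch below uses only addresses that appear in the log and in vmlinux.dis. One assumption to keep in mind: the log shows the driver loaded as a module (LR = 0xbf078594), while the listing has gtp_i2c_test inside vmlinux at 0xc03b0fb4, so the comparison assumes both builds produced the same function body. The helper name `symbol_plus_offset` is made up for illustration.

```python
# Addresses copied from the vmlinux.dis listing and the oops log.
MEMSET_START = 0xC02737C0        # target of "bl c02737c0 <memset>"
GTP_I2C_TEST_START = 0xC03B0FB4  # "c03b0fb4 <gtp_i2c_test>:"
BL_MEMSET_ADDR = 0xC03B0FE4      # address of the bl instruction itself
ARM_INSN_SIZE = 4                # the log reports "ISA ARM": 4-byte instructions


def symbol_plus_offset(start: int, offset: int) -> int:
    """Turn a 'symbol+0xNN' reference into an absolute address."""
    return start + offset


# "PC is at memset+0x74" should match the raw pc value in the register dump.
pc = symbol_plus_offset(MEMSET_START, 0x74)

# "LR is at gtp_i2c_test+0x34", mapped onto the vmlinux listing.
return_addr = symbol_plus_offset(GTP_I2C_TEST_START, 0x34)

print(hex(pc))           # compare with "pc : [<c0273834>]"
print(hex(return_addr))  # compare with the instruction after the bl
print(return_addr == BL_MEMSET_ADDR + ARM_INSN_SIZE)
```

Under that assumption, offset 0x34 lands exactly on the instruction after `bl memset`, which is what a return address left in LR by that call would look like.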


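What is the reason?
Build experiments showed that the memset in the disassembly comes from the following statement in gtp_i2c_test(). (How did I find it? An educated guess: array initialization seemed the most likely cause, and sure enough it was.)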
u8 test[3] = {GTP_REG_CONFIG_DATA >> 8, GTP_REG_CONFIG_DATA & 0xff};


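If the initializer is removed from that statement, memset no longer appears in the disassembly.
So the compiler may implement an array initializer by inserting a memset call: in the listing it zeroes all three bytes with memset and then stores the two explicit values with strb. Lesson learned :)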
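
To make the generated code concrete, here is a small emulation of the instructions that follow the prologue in the listing: a 3-byte memset to zero, then two byte stores. It is a sketch of what the disassembly does, not of the compiler's rules in general. The function name `emulate_array_init` is hypothetical, and reading the two stored bytes as the high and low byte of GTP_REG_CONFIG_DATA is an inference from the C statement, not something the listing states.

```python
def emulate_array_init() -> bytes:
    """Replay the byte-level effect of the listed instructions on test[3]."""
    buf = bytearray(b"\xaa\xaa\xaa")  # stand-in for uninitialized stack bytes

    # mov r1, #0 ; mov r0, r5 ; mov r2, #3 ; bl memset
    for i in range(3):
        buf[i] = 0

    # mvn r3, #127 ; strb r3, [fp, #-31]  -> low byte of ~0x7f
    buf[0] = (~127) & 0xFF

    # mov r3, #71 ; strb r3, [fp, #-30]
    buf[1] = 71

    return bytes(buf)


result = emulate_array_init()
print(result.hex())  # three bytes: two initializers plus one zero-filled element
```

The third element has no initializer in the C statement, so it must be zero; zeroing the whole array first and then overwriting the explicit elements is one way for the compiler to guarantee that.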


In addition, following the code paths in gt9xx.c, gtp_i2c_test() is called only from probe and from goodix_ts_late_resume. It has no other callers.


[ 3067.875311] c3 LR is at gtp_i2c_test+0x34/0x94 [goodix_ts]
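Here LR shows that execution was inside gtp_i2c_test.
However, the Backtrace below shows that the last function called from goodix_ts_work_func was gtp_i2c_read, and there is no gtp_i2c_test anywhere on the goodix_ts_work_func path.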


[ 3068.372349] c0 Backtrace: 
[ 3068.374961] c3 [<bf0784a4>] (gtp_i2c_read+0x0/0xbc [goodix_ts]) from [<bf078664>] (goodix_ts_work_func+0x70/0x288 [goodix_ts])


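So there is a contradiction: PC and LR indicate that the last function entered was gtp_i2c_test(), while the backtrace shows that it was gtp_i2c_read.
I therefore suspected memory corruption.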


Based on the following line, use crash to inspect the task information at the time of the crash:
[ 3067.865490] task: d7d6db00 ti: c329e000 task.ti: c329e000


From task.ti: c329e000 we know that the current struct thread_info starts at c329e000.
The command crash> struct thread_info c329e000 therefore decodes the data at that address, as follows.


crash> 
crash> 
crash> struct thread_info c329e000
struct thread_info {
  flags = 2, 
  preempt_count = 0, 
  addr_limit = 0, 
  task = 0xd7d6db00, 
  exec_domain = 0xc09a53f8, 
  cpu = 3, 
  cpu_domain = 21, 
  cpu_context = {
    r4 = 3237478720, 
    r5 = 3621182208, 
    r6 = 3274301440, 
    r7 = 3404384512, 
    r8 = 3659349440, 
    r9 = 3659349440, 
    sl = 0, 
    fp = 3274309396, 
    sp = 3274309256, 
    pc = 3227886688, 
    extra = {0, 0}
  }, 
  syscall = 0, 
  used_cp = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
  tp_value = 0, 
  fpstate = {
    hard = {
      save = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }, 
    soft = {
      save = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }
  }, 
  vfpstate = {
    hard = {
      fpregs = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 
      fpexc = 0, 
      fpscr = 0, 
      fpinst = 0, 
      fpinst2 = 0, 
      cpu = 4
    }
  }, 
  restart_block = {
    fn = 0xc00520c8 <do_no_restart_syscall>, 
    {
      futex = {
        uaddr = 0x0, 
        val = 0, 
        flags = 0, 
        bitset = 0, 
        time = 0, 
        uaddr2 = 0x0
      }, 
      nanosleep = {
        clockid = 0, 
        rmtp = 0x0, 
        expires = 0
      }, 
      poll = {
        ufds = 0x0, 
        nfds = 0, 
        has_timeout = 0, 
        tv_sec = 0, 
        tv_nsec = 0
      }
    }
  }
}






The struct thread_info contents look normal.
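
Two quick checks support that reading. On 32-bit ARM kernels of this generation, thread_info sits at the bottom of the kernel stack, so masking a stack pointer with the stack size should give the thread_info address. crash also prints cpu_context in decimal, which is easier to judge in hex. The sketch below assumes 8 KiB kernel stacks (THREAD_SIZE = 8192), which is consistent with the log line "Stack: (0xc329fe18 to 0xc32a0000)"; the function name `thread_info_base` is made up for illustration.

```python
THREAD_SIZE = 8192  # assumption: 8 KiB kernel stacks on this ARM build


def thread_info_base(sp: int, thread_size: int = THREAD_SIZE) -> int:
    """thread_info lives at the lowest address of the kernel stack."""
    return sp & ~(thread_size - 1)


# sp from the oops register dump: "sp : c329fe18"
print(hex(thread_info_base(0xC329FE18)))

# cpu_context values copied from the crash output (printed there in decimal).
cpu_context = {
    "r4": 3237478720,
    "r5": 3621182208,
    "r6": 3274301440,
    "r7": 3404384512,
    "r8": 3659349440,
    "r9": 3659349440,
    "fp": 3274309396,
    "sp": 3274309256,
    "pc": 3227886688,
}
for name, value in cpu_context.items():
    print(f"{name} = {value:#010x}")

# The saved sp should fall inside the same stack as the faulting sp.
print(thread_info_base(cpu_context["sp"]) == thread_info_base(0xC329FE18))
```

The saved sp and fp both land inside the same 8 KiB stack as the faulting sp, and task and cpu in the struct match the oops header, which fits the conclusion that thread_info itself was not damaged.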
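From task: d7d6db00 we know that the current struct task_struct is at 0xd7d6db00.
The command crash> struct task_struct 0xd7d6db00 interprets the memory at 0xd7d6db00 as a struct task_struct. The result is as follows (the first two attempts misspell the type name as tast_struct and are rejected):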


crash> struct tast_struct 0xd7d6db00
struct: invalid data structure reference: tast_struct
crash> struct tast_struct d7d6db00
struct: invalid data structure reference: tast_struct
crash> struct task_struct d7d6db00
struct task_struct {
  state = 1767861102, 
  stack = 0x2c333a64, 
  usage = {
    counter = 980248947
  }, 
  flags = 1701588016, 
  ptrace = 741948014, 
  wake_entry = {
    next = 0x656e696c
  }, 
  on_cpu = 726942010, 
  on_rq = 1397772099, 
  prio = -1879044801, 
  static_prio = 1090532352, 
  normal_prio = 540689236, 
  rt_priority = 1598248001, 
  sched_class = 0x4e636552, 
  se = {
    load = {
      weight = 1767073134, 
      inv_weight = 1344285799
    }, 
    run_node = {
      __rb_parent_color = 1702064737, 
      rb_right = 0x20544120, 
      rb_left = 0x20646d63
    }, 
    group_node = {
      next = 0x656e696c, 
      prev = 0x63757320
    }, 
    on_rq = 1936942435, 
    exec_start = 4683809585584406574, 
    sum_exec_runtime = 6864422895733064532, 
    vruntime = 7521983764486120772, 
    prev_sum_exec_runtime = 7236549166089137475, 
    nr_migrations = 2897004667659187295, 
    parent = 0x78652d31, 
    cfs_rq = 0x2d322c65, 
    my_q = 0x2c746573, 
    avg = {
      runnable_avg_sum = 942433377, 
      runnable_avg_period = 1936028717, 
      last_runnable_update = 7854334927154719092, 
      decay_count = 8387188085530784366, 
      load_avg_contrib = 1852664912
    }
  }, 
  rt = {
    run_list = {
      next = 0x6f4d7463, 
crash> 
crash> 


The member values in the struct task_struct above look very strange, which suggests that this memory has been overwritten.


The following command reads 100 words starting at that address. From the ASCII rendering we can tell roughly what overwrote this memory.


crash> rd d7d6db00 100
d7d6db00:  695f6b6e 2c333a64 3a6d6973 656c2c30   nk_id:3,sim:0,le
d7d6db10:  2c393a6e 656e696c 2b54413a 53504f43   n:9,line:AT+COPS
d7d6db20:  90000d3f 41003400 203a4354 5f435441   ?....4.ATC: ATC_
d7d6db30:  4e636552 694c7765 6953656e 50202c67   RecNewLineSig, P
d7d6db40:  65737261 20544120 20646d63 656e696c   arse AT cmd line
d7d6db50:  63757320 73736563 9054002e 41003c00    success..T..<.A
d7d6db60:  203a4354 5f435441 70736544 68637461   TC: ATC_Despatch
d7d6db70:  2c646d43 646d6320 7079745f 28343a65   Cmd, cmd_type:4(
d7d6db80:  78652d31 2d322c65 2c746573 65722d34   1-exe,2-set,4-re
d7d6db90:  382c6461 7365742d 90002974 6d003400   ad,8-test)...4.m
d7d6dba0:  6673726e 74654720 6e6d6c50 656c6553   nrsf GetPlmnSele
d7d6dbb0:  6f4d7463 305b6564 65722c5d 2c313d74   ctMode[0],ret=1,
d7d6dbc0:  5f766e20 3d6c6176 6d2c3538 3d65646f    nv_val=85,mode=
d7d6dbd0:  90900031 41001400 203a4354 64695f4c   1......ATC: L_id
d7d6dbe0:  5f533e2d 30206469 904d7400 41004400   ->S_id 0.tM..D.A
d7d6dbf0:  203a4354 6c697542 666e4964 7073526f   TC: BuildInfoRsp
d7d6dc00:  696c202c 695f6b6e 2c333a64 6d697320   , link_id:3, sim
d7d6dc10:  202c303a 69727473 203a676e 504f432b   :0, string: +COP
d7d6dc20:  30203a53 222c322c 30303634 362c2231   S: 0,2,"46001",6
d7d6dc30:  90000000 4d002800 575f5855 65746972   .....(.MUX_Write
d7d6dc40:  61747320 6c207472 696b6e69 2c323a64    start linkid:2,
d7d6dc50:  74616420 6e656c61 20383120 9000000a    datalen 18 ....
d7d6dc60:  73007400 2e667562 73203a63 5f667562   .t.sbuf.c: sbuf_
d7d6dc70:  74697277 68633a65 656e6e61 7369206c   write:channel is
d7d6dc80:  202c3620 69667562 73692064 202c3220    6, bufid is 2,
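
The ASCII column of rd can be reproduced by hand, which also makes the link to the earlier struct dump explicit: the task_struct `state` value 1767861102 is simply the first word of this text. A minimal sketch, assuming 32-bit little-endian words as on this ARM target; `words_to_ascii` is a hypothetical helper name.

```python
import struct


def words_to_ascii(words):
    """Render 32-bit little-endian words the way rd's ASCII column does."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    # Printable ASCII is kept, everything else is shown as a dot.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# First row of the rd output at d7d6db00.
first_row = [0x695F6B6E, 0x2C333A64, 0x3A6D6973, 0x656C2C30]
print(words_to_ascii(first_row))

# task_struct.state as printed by crash (decimal) is the same first word.
state = 1767861102
print(hex(state), words_to_ascii([state]))
```

Running the same conversion on other odd-looking fields from the struct dump, for example wake_entry.next = 0x656e696c, turns them back into fragments of the same log text.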