[HOWTO]: Common Debugging Tools for Linux/Android

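This article describes some commonly used Linux/Android debugging tools and how to use them. It serves as a personal memo and is updated from time to time.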

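Note: most of this material is not my own work. It was collected from various places, and I have not traced every original author, so sources are not cited. If this causes any offense, please leave a comment or send me a private message and I will fix it as soon as possible.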

FIQ-Debugger

The FIQ debugger is a system debugging facility built into the kernel.

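On the ARM architecture, FIQ plays a role similar to an NMI. The FIQ debugger registers the serial port interrupt as an FIQ and implements a set of system debugging commands inside the serial port's FIQ handler.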

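Normally the serial port runs as an ordinary console. In minicom, pressing "Ctrl + A" followed by "F" (which sends a serial break) switches the serial port into FIQ debugger mode.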

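Because the kernel normally does not mask FIQs when it disables ordinary interrupts, an FIQ behaves as a non-maskable interrupt in practice. This makes the FIQ debugger well suited to cases where a CPU is hung: while it is hung, you can still print the CPU's state at the point of failure. The most commonly used command is sysrq.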

To use the FIQ debugger, enable the following kernel configuration options:

CONFIG_FIQ_DEBUGGER                         // enable the FIQ debugger
CONFIG_FIQ_DEBUGGER_CONSOLE                 // allow switching between the FIQ debugger and the console
CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE  // serial port starts in console mode at boot
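
To confirm that a given kernel was actually built with these options, something like the following could be used. This is only a sketch: the function name check_fiq_config is made up for illustration, and it assumes you have the kernel's config file at hand (for example the .config from the build tree).

```shell
# Report whether each FIQ debugger option is enabled in a kernel config file.
# Usage: check_fiq_config <path-to-config>
# (check_fiq_config is a hypothetical helper name, not an existing tool.)
check_fiq_config() {
    config_file=$1
    for opt in CONFIG_FIQ_DEBUGGER \
               CONFIG_FIQ_DEBUGGER_CONSOLE \
               CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE; do
        # An enabled boolean option appears as "OPTION=y" on a line of its own.
        # Anchoring both ends keeps CONFIG_FIQ_DEBUGGER from matching the
        # longer option names that share its prefix.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}
```

On a running device the config is only readable if the kernel was built with CONFIG_IKCONFIG_PROC; in that case `zcat /proc/config.gz | grep FIQ_DEBUGGER` gives a quick view of the same options.
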

FIQ debugger commands:

debug> help
FIQ Debugger commands:
 pc            PC status
 regs          Register dump
 allregs       Extended Register dump
 bt            Stack trace
 reboot [<c>]  Reboot with command <c>
 reset [<c>]   Hard reset with command <c>
 irqs          Interupt status
 sleep         Allow sleep while in FIQ
 nosleep       Disable sleep while in FIQ
 console       Switch terminal to console
 cpu           Current CPU
 cpu <number>  Switch to CPU<number>
 ps            Process list
 sysrq         sysrq options
 sysrq <param> Execute sysrq with <param>

SysRq

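When tracking down a system hang, you sometimes hit this situation: the system is stuck but does not reset. Without a reset, you cannot get the fault stack traces that would have been printed just before it. In this case, as long as interrupts are still enabled, you can use the SysRq key combination to dump the system's stack information on demand.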

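To enable SysRq, build the kernel with the CONFIG_MAGIC_SYSRQ option. On a kernel that supports SysRq, /proc/sys/kernel/sysrq controls whether SysRq is enabled. For more details on SysRq, see the kernel document Documentation/sysrq.txt (Documentation/admin-guide/sysrq.rst in newer kernel trees).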
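
As a rough illustration of how the value in /proc/sys/kernel/sysrq is interpreted (0 disables SysRq, 1 enables every function, and larger values are a bitmask of permitted functions), here is a small sketch. The helper name describe_sysrq is hypothetical, and the bitmask is not decoded bit by bit.

```shell
# Describe a value read from /proc/sys/kernel/sysrq.
# (describe_sysrq is a hypothetical helper name used only for this example.)
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1: only selected functions enabled" ;;
    esac
}

# Typical use on a target (writing requires root):
#   describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
#   echo 1 > /proc/sys/kernel/sysrq      # enable all SysRq functions
```

According to the kernel documentation, this value only restricts SysRq invoked from the keyboard; root can always trigger an operation by writing to /proc/sysrq-trigger.
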

The SysRq debugging commands are listed below:

*  What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'b'     - Will immediately reboot the system without syncing or unmounting your disks.
'c'    - Will perform a system crash by a NULL pointer dereference. A crashdump will be taken if configured.
'd'    - Shows all locks that are held.
'e'     - Send a SIGTERM to all processes, except for init.
'f'    - Will call oom_kill to kill a memory hog process.
'g'    - Used by kgdb (kernel debugger)
'h'     - Will display help (actually any other key than those listed here will display help. but 'h' is easy to remember :-)
'i'     - Send a SIGKILL to all processes, except for init.
'j'     - Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
'k'     - Secure Access Key (SAK) Kills all programs on the current virtual console. NOTE: See important comments below in SAK section.
'l'     - Shows a stack backtrace for all active CPUs.
'm'     - Will dump current memory info to your console.
'n'    - Used to make RT tasks nice-able
'o'     - Will shut your system off (if configured and supported).
'p'     - Will dump the current registers and flags to your console.
'q'     - Will dump per CPU lists of all armed hrtimers (but NOT regular timer_list timers) and detailed information about all
          clockevent devices.
'r'     - Turns off keyboard raw mode and sets it to XLATE.
's'     - Will attempt to sync all mounted filesystems.
't'     - Will dump a list of current tasks and their information to your console.
'u'     - Will attempt to remount all mounted filesystems read-only.
'v'    - Forcefully restores framebuffer console
'v'    - Causes ETM buffer dump [ARM-specific]
'w'    - Dumps tasks that are in uninterruptable (blocked) state.
'x'    - Used by xmon interface on ppc/powerpc platforms.
'y'    - Show global CPU Registers [SPARC-64 specific]
'z'    - Dump the ftrace buffer
'0'-'9' - Sets the console log level, controlling which kernel messages will be printed to your console. ('0', for example would make
          it so that only emergency messages like PANICs or OOPSes would make it to your console.)
*  Okay, so what can I use them for?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
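For example, when debugging a hang (the system is up but unresponsive), we need to find the processes in state D. These processes are in uninterruptible sleep and do not respond to any signal, which means kill cannot terminate them: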

echo w > /proc/sysrq-trigger

P.S. The process/thread states are:

"R (running)",  /*   0 */
"S (sleeping)",  /*   1 */
"D (disk sleep)", /*   2 */
"T (stopped)",  /*   4 */
"t (tracing stop)", /*   8 */
"Z (zombie)",  /*  16 */
"X (dead)",  /*  32 */
"x (dead)",  /*  64 */
"K (wakekill)",  /* 128 */
"W (waking)",  /* 256 */

A typical process spends most of its time in state S (sleeping). If you find processes in states such as D (disk sleep), T (stopped), or Z (zombie), examine them carefully.
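
When SysRq output is not available, the same state letters can be read from /proc/<pid>/stat. The sketch below is one possible way to list processes in a given state; the function names stat_state and list_by_state are made up for this example.

```shell
# Print the state letter from one line of /proc/<pid>/stat.
# The command name (field 2) is wrapped in parentheses and may itself contain
# spaces or ')', so strip everything up to the last ") " before reading it.
stat_state() {
    printf '%s\n' "$1" | sed 's/^.*) //' | cut -c1
}

# List PID, state, and name of every process whose state letter equals $1.
list_by_state() {
    for f in /proc/[0-9]*/stat; do
        # A process may exit between the glob and the read; skip it if so.
        line=$(cat "$f" 2>/dev/null) || continue
        [ "$(stat_state "$line")" = "$1" ] || continue
        pid=${f#/proc/}
        pid=${pid%/stat}
        echo "$pid $1 $(cat "/proc/$pid/comm" 2>/dev/null)"
    done
}

# Example: list_by_state D    # processes in uninterruptible sleep
```
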

debuggerd

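debuggerd is an Android daemon that dumps a process's runtime information for analysis when the process crashes. The crash dumps it generates are plain text and are stored under /data/tombstones/ (an apt name, since a tombstone marks where a process died). Up to 10 files are kept; once there are more than 10, the oldest file is overwritten. Starting with Android 4.2, debuggerd is also a utility: it can print the native stack of a running process without interrupting it. Usage: debuggerd -b <pid>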

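This helps us analyze a process's runtime behavior, but where it is most useful is in pinpointing, with very little effort, the code location of a deadlock or of an infinite loop caused by faulty logic in a native process.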
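
One way this might be wrapped for repeated use is sketched below. It assumes it runs in a shell on the device (for example inside adb shell, as root) where pidof is available; the function name dump_native_stacks and the process name in the usage line are examples only.

```shell
# Dump the native stacks of every PID that belongs to a process name.
# (dump_native_stacks is a hypothetical helper name.)
dump_native_stacks() {
    for pid in $(pidof "$1"); do
        echo "=== $1 (pid $pid) ==="
        debuggerd -b "$pid"
    done
}

# Example: dump_native_stacks mediaserver
```

Taking two dumps a few seconds apart and comparing them is a simple way to spot a stuck thread: a thread that shows the same frames in both dumps is a likely candidate for the deadlock or the loop.
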

devmem

busybox includes devmem, a tool for reading and writing physical memory directly:

devmem is a small program that reads and writes from physical memory using /dev/mem.

Usage: devmem ADDRESS [WIDTH [VALUE]]

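For example, suppose we need to inspect the configuration of some GPIO pins. The GPIO configuration registers are mapped into a special memory region, the SFRs (Special Function Registers), so we only need to read the corresponding memory address. The following reads the value at 0x13470000 and then writes 0x0 to it: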

# busybox devmem 0x13470000 32
0x00022222
# busybox devmem 0x13470000 32 0x0
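
Writing 0x0 clears every field in the register. When only one pin's field should change, a read-modify-write is usually safer. The sketch below computes the new value; the helper name set_field is made up, and the 4-bit field at bit offset 4 used in the usage lines is a hypothetical layout, so check the SoC datasheet for the real one.

```shell
# Compute a new 32-bit register value with one bit field replaced.
# Usage: set_field <old> <shift> <mask> <value>
#   old   - current register value (e.g. 0x00022222)
#   shift - bit offset of the field
#   mask  - field mask before shifting (e.g. 0xF for a 4-bit field)
#   value - new field value
# (set_field is a hypothetical helper name.)
set_field() {
    old=$1
    shift_bits=$2
    mask=$3
    val=$4
    # Clear the field, then OR in the new value, masked to the field width.
    new=$(( ($old & ~($mask << $shift_bits)) | (($val & $mask) << $shift_bits) ))
    printf '0x%08X\n' $(( new & 0xFFFFFFFF ))
}

# Possible use on the target (assumed field layout):
#   old=$(busybox devmem 0x13470000 32)
#   busybox devmem 0x13470000 32 "$(set_field "$old" 4 0xF 0x1)"
```
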

--to be continued...