1. 程式人生 > >手機錄影出現arn無響應

手機錄影出現arn無響應

個人總結,僅供參考
手機錄影卡死出現anr,由於公司外卡通常比較低廉,經常出現儲存相關問題,分析過程僅供參考。。。

kernel log info:
// NOT RESPONDING: com.android.camera2 (pid 3221) Time Longth Since Monkey Start: 19814808
ANR in com.android.camera2 (com.android.camera2/com.android.camera.CameraLauncher)
PID: 3221
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 1.)
Load: 12.48 / 7.55 / 6.01
CPU usage from 0ms to 13632ms later:
54% 556/system_server: 21% user + 33% kernel / faults: 14242 minor 327 major
39% 186/mediaserver: 16% user + 22% kernel / faults: 11250 minor 21 major
32% 2982/slog: 5.2% user + 27% kernel / faults: 138 minor 9 major
31% 3221/com.android.camera2: 17% user + 13% kernel / faults: 9363 minor 77 major
21% 320/logcat: 4.1% user + 17% kernel / faults: 67 minor
15% 163/surfaceflinger: 3.6% user + 11% kernel / faults: 553 minor 3 major
10% 54/kswapd0: 0% user + 10% kernel
14% 3059/com.android.dialer: 7.7% user + 6.6% kernel / faults: 11417 minor 119 major
0% 162/debuggerd: 0% user + 0% kernel / faults: 6513 minor 68 major
7.1% 1513/com.android.phone: 2.5% user + 4.5% kernel / faults: 5366 minor 74 major
8.8% 1249/com.android.systemui: 3.5% user + 5.3% kernel / faults: 5360 minor 58 major
7.6% 1861/sdcard: 0.2% user + 7.4% kernel / faults: 84 minor 4 major
3.9% 1553/android.process.acore: 1.3% user + 2.5% kernel / faults: 4840 minor 20 major
2.5% 1900/com.android.launcher3: 1.4% user + 1.1% kernel / faults: 4944 minor 68 major
3.7% 1837/android.process.media: 1.4% user + 2.2% kernel / faults: 2154 minor 6 major
2.8% 3/ksoftirqd/0: 0% user + 2.8% kernel
2.4% 19/ksoftirqd/2: 0% user + 2.4% kernel
2.3% 86/mmcqd/0: 0% user + 2.3% kernel
0% 1755/com.android.modemassert: 0% user + 0% kernel / faults: 2296 minor 214 major
0% 1734/com.spreadst.validator: 0% user + 0% kernel / faults: 1971 minor 218 major
1.9% 3007/com.android.commands.monkey: 0.8% user + 1.1% kernel / faults: 658 minor 9 major
1.9% 14/ksoftirqd/1: 0% user + 1.9% kernel
1.9% 24/ksoftirqd/3: 0% user + 1.9% kernel
1.1% 81/ion_heap_system: 0% user + 1.1% kernel
0.3% 9557/com.android.launcher3:wallpaper_chooser: 0.1% user + 0.1% kernel / faults: 2187 minor 142 major
0.8% 159/servicemanager: 0% user + 0.8% kernel / faults: 49 minor
0.8% 21808/kworker/0:2: 0% user + 0.8% kernel
0.4% 130/mmcqd/1: 0% user + 0.4% kernel
0.3% 141/jbd2/mmcblk0p21: 0% user + 0.3% kernel
0.2% 1688/com.android.inputmethod.latin: 0.1% user + 0.1% kernel / faults: 405 minor 1 major
0.3% 4348/com.sprd.note: 0% user + 0.2% kernel / faults: 535 minor 177 major
0.2% 185/drmserver: 0% user + 0.2% kernel / faults: 127 minor
0.2% 19806/kworker/u8:4: 0% user + 0.2% kernel
0.2% 8/rcu_preempt: 0% user + 0.2% kernel
0.1% 133/kworker/0:1H: 0% user + 0.1% kernel
0.1% 2219/com.android.mms: 0.1% user + 0% kernel / faults: 279 minor
0.1% 2449/com.android.gallery3d: 0% user + 0.1% kernel / faults: 402 minor 4 major
0% 22032/kworker/u8:0: 0% user + 0% kernel
0.1% 155/sdcard: 0% user + 0.1% kernel / faults: 42 minor 1 major
0% 158/lmkd: 0% user + 0% kernel / faults: 30 minor
0.1% 2150/com.android.deskclock: 0% user + 0% kernel / faults: 402 minor 3 major
0% 189/wcnd: 0% user + 0% kernel / faults: 102 minor
0% 2111/com.android.quicksearchbox: 0% user + 0% kernel / faults: 221 minor
0% 2372/kworker/1:2: 0% user + 0% kernel
0% 2723/com.android.musicfx: 0% user + 0% kernel / faults: 529 minor
0% 2827/com.android.documentsui: 0% user + 0% kernel / faults: 300 minor
0% 2856/kworker/2:2: 0% user + 0% kernel
0% 3155/com.android.music: 0% user + 0% kernel / faults: 348 minor 6 major
0% 3529/com.android.contacts: 0% user + 0% kernel / faults: 222 minor
0% 6896/com.android.systemui:screenshot: 0% user + 0% kernel / faults: 106 minor
0% 7218/kworker/3:2: 0% user + 0% kernel
0% 18186/wpa_supplicant: 0% user + 0% kernel / faults: 26 minor
+0% 22747/logcat: 0% user + 0% kernel
% TOTAL: 26% user + 56% kernel + 3.8% iowait + 1% softirq

framework層callstack
at libcore.io.Posix.fsync(Native method)
  at libcore.io.BlockGuardOs.fsync(BlockGuardOs.java:143)
  at java.io.FileDescriptor.sync(FileDescriptor.java:74)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:122)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:149)
  at android.media.MediaRecorder.prepare(MediaRecorder.java:744)
  at com.android.camera.VideoModule.initializeRecorder(VideoModule.java:1268)
  at com.android.camera.VideoModule.access$2200(VideoModule.java:93)
  at com.android.camera.VideoModule$11.onStorageUpdateDone(VideoModule.java:1460)
  at com.android.camera.CameraActivity$19.onPostExecute(CameraActivity.java:2196)
  at com.android.camera.CameraActivity$19.onPostExecute(CameraActivity.java:2180)
  at android.os.AsyncTask.finish(AsyncTask.java:636)
  at android.os.AsyncTask.access$500(AsyncTask.java:177)
  at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:653)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:135)
  at android.app.ActivityThread.main(ActivityThread.java:5345)
  at java.lang.reflect.Method.invoke!(Native method)
  at java.lang.reflect.Method.invoke(Method.java:372)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:947)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:742)

分析log發現ANR發生在MediaRecorder.java呼叫RandomAccessFile建立檔案時,以下是出錯程式碼位置,應該跟檔案系統有關:
public RandomAccessFile(String fileName, String mode) throws FileNotFoundException {
149        this(new File(fileName), mode);
150    }
即MediaRecorder.prepare的地方呼叫了fsync等函式,造成了超時。

察看kernel 的D狀態的執行緒呼叫棧
01-26 14:02:44.059 <4>[20111.019622] c3 [<c065cce0>] (__schedule+0x398/0x5e0) from [<c065cfb8>] (schedule+0x90/0x94)
01-26 14:02:44.059 <4>[20111.019622] c3 [<c065cfb8>] (schedule+0x90/0x94) from [<c065d204>] (io_schedule+0x78/0xa4)
01-26 14:02:44.059 <4>[20111.019653] c3 [<c065d204>] (io_schedule+0x78/0xa4) from [<c013ef84>] (sleep_on_buffer+0x18/0x20)
01-26 14:02:44.059 <4>[20111.019683] c3 [<c013ef84>] (sleep_on_buffer+0x18/0x20) from [<c065b454>] (__wait_on_bit_lock+0x64/0xac)
01-26 14:02:44.059 <4>[20111.019683] c3 [<c065b454>] (__wait_on_bit_lock+0x64/0xac) from [<c065b524>] (out_of_line_wait_on_bit_lock+0x88/0x94)
01-26 14:02:44.059 <4>[20111.019714] c3 [<c065b524>] (out_of_line_wait_on_bit_lock+0x88/0x94) from [<c013efcc>] (__lock_buffer+0x40/0x48)
01-26 14:02:44.059 <4>[20111.019714] c3 [<c013efcc>] (__lock_buffer+0x40/0x48) from [<c014285c>] (write_dirty_buffer+0x38/0x8c)
01-26 14:02:44.059 <4>[20111.019744] c3 [<c014285c>] (write_dirty_buffer+0x38/0x8c) from [<c0142990>] (sync_mapping_buffers+0xe0/0x274)
01-26 14:02:44.059 <4>[20111.019775] c3 [<c0142990>] (sync_mapping_buffers+0xe0/0x274) from [<c01f56dc>] (fat_file_fsync+0x4c/0x5c)
01-26 14:02:44.059 <4>[20111.019775] c3 [<c01f56dc>] (fat_file_fsync+0x4c/0x5c) from [<c013d588>] (vfs_fsync_range+0x44/0x54)
01-26 14:02:44.059 <4>[20111.019805] c3 [<c013d588>] (vfs_fsync_range+0x44/0x54) from [<c013d5cc>] (vfs_fsync+0x34/0x3c)
01-26 14:02:44.059 <4>[20111.019805] c3 [<c013d5cc>] (vfs_fsync+0x34/0x3c) from [<c013d79c>] (do_fsync+0x38/0x60)
01-26 14:02:44.059 <4>[20111.019836] c3 [<c013d79c>] (do_fsync+0x38/0x60) from [<c013da64>] (SyS_fsync+0x1c/0x20)
01-26 14:02:44.059 <4>[20111.019866] c3 [<c013da64>] (SyS_fsync+0x1c/0x20) from [<c000f880>] (ret_fast_syscall+0x0/0x48)

分析結果如下:
sync_mapping_buffers函式作用是同步inode間接快的資料,間接快儲存的是檔案定址需要的資訊,比如對於fat32,間接塊儲存的是一些簇資訊,方便write或read時通過指定的偏移量
來計算在物理塊上的位置,再比如,對於ext檔案系統,磁碟索引節點ext4_inode的成員i_block是一個有15個元素且包含邏輯塊號的陣列。最初的12個元素產生的邏輯塊號與檔案最初的12個
塊號相當於應,如果檔案大小大於12個塊,第13個元素指向的塊當作間接塊,裡面存放4K/4個元素(加入快大小4k),這些元素又分別指向另外的邏輯塊,檔案塊號最大為11+4K/4
。依次類推,第14個元素為2級索引,15個元素為3級索引,通過三級索引可以計算ext的檔案最大不超過4TB。

cpu使用率不高,IO wait不高,快取不高。但綜合可以分析出:
__lock_buffer應該是針對該buffer的IO操作已經進入隊列了,但是沒有完成,所以fsync會一直等待IO完成並unlock_buffer後才能返回。

從出現ANR的java callstack看,阻塞點是在開始錄影時剛剛建立好檔案,這時候進行fsync操作一般只是同步檔案元資料(因為還沒有檔案資料)。 通常情況下,如果系統沒有什麼IO壓力,單個檔案元資料的同步應該很快的,因為資料量小,只有在IO壓力較高的情況下,才會排隊等待可能消耗較長時間。

我們看到snapshot中的dirty+writeback不高,是因為獲取meminfo的時間點已經距離ANR過去了好幾秒了,包括CPU usage也是,IO WAIT 不高,是因為cpu 佔用率較高cpu idle少的緣故。 
所以fsync耗時長 應該是IO壓力導致的,從mainlog看,ANR是在上一次錄影剛結束時馬上進行下一次錄影時產生的,IO壓力應該來自於上一次的錄影資料。如果T卡的速度足夠快的話,在錄影的任何階段都不應該有很高的IO壓力,只有當T卡速度不滿足錄影需求(低速T卡,或者是T卡碎片化),才會在錄影結束時累積髒頁。從log可以看到flushFileCache超時這樣的情況。

另外,fsync操作本身其實是會降低IO效能的,因為它會強制刷出page cache的資料到T卡,這樣往往不利於IO請求的合併,降低速率。 所以對於T卡一般不推薦使用fsync操作,除非是至關重要的如資料庫資料。 在4.4版本上對視訊檔案並沒有這樣的操作。

從log上並不能看出為什麼T卡寫速度慢,可能是本身效能就不好,或者是碎片化了,碎片化的T卡效能會下降到2MB/s以下,有時候甚至到1MB/s, 顯然是不能滿足錄影的寫需求的。

對於已經慢速或者碎片化的T卡,系統是沒有辦法解決其速度問題的。請嘗試用格式化的高速T卡進行復測。

根據測試結果
讀寫速度:
T卡(已用記憶體大於bug報出時)測試結果(可能存在誤差)如下:
68 1-1.2
寫速度:2.55M/s
讀速度:136M/s

2.55MB/s的寫速度對於開啟slog同時錄影是不夠的。

建議:
換新T卡跟蹤測試。若再現,估計得從應用程式方面進行優化,在錄影開始就在主程序中進行fsync這樣會被IO阻塞的操作,並不是一個良好的設計。