ANR原因分析及解決方法
內容目錄
1 ANR是如何產生的
1.1 ANR產生條件
ANR的產生需要同時滿足三個條件:
- 主執行緒:只有應用程式程序的主執行緒響應超時才會產生ANR;
- 超時時間:產生ANR的上下文不同,超時時間也不同,但只要超過這個時間上限沒有響應就會產生ANR;
- 輸入事件/特定操作:輸入事件是指按鍵、觸屏等裝置輸入事件,特定操作是指BroadcastReceiver和Service的生命週期中的各個函式呼叫。
產生ANR的上下文不同,導致ANR原因也不同,主要有以下三種情況:
- 應用程序的主執行緒對輸入事件在5s內沒有處理完畢;
- 應用程序的主執行緒在執行BroadcastRecevier的onReceive函式時10s內沒有處理完畢;
- 應用程序的主執行緒在執行Service的各個生命週期函式時20s內沒有處理完畢;
1.2 引起ANR的根本原因
引起ANR的根本原因,總的來說可以歸納為兩類:
- 應用程序自身引起的,比如:主執行緒阻塞、掛起、死迴圈,執行耗時操作等;
- 其他程序引起的,比如:其他程序CPU佔用率過高,導致當前應用程序無法搶佔到CPU時間片。常見的問題如檔案讀寫頻繁,io程序CPU佔用率過高,導致當前應用出現ANR;
2 ANR原因分析方法
2.1 通過logcat日誌分析原因
舉一個開發過程中的ANR例項,《從雲盤上下載視訊到本地,進入下載中心刪除任務和檔案,進入檔案管理提示‘檔案管理無響應’》,在logcat日誌中輸出了以下ANR資訊:
01-21 23:58:40.265 975 991 E ActivityManager: ANR in com.letv.filemanager 01-21 23:58:40.265 975 991 E ActivityManager: Reason: Executing service com.letv.filemanager/.netstorage.service.DlnaService 01-21 23:58:40.265 975 991 E ActivityManager: Load: 5.62 / 4.3 / 3.55 01-21 23:58:40.265 975 991 E ActivityManager: CPU usage from 30331ms to 0ms ago: 01-21 23:58:40.265 975 991 E ActivityManager: 26% 1760/com.letv.airplay: 24% user + 1.6% kernel / faults: 1094 minor 01-21 23:58:40.265 975 991 E ActivityManager: 15% 787/cdnclient: 0% user + 15% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 13% 298/kswapd0: 0% user + 13% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 9.5% 1437/com.letv.dmr: 9% user + 0.5% kernel / faults: 572 minor 01-21 23:58:40.265 975 991 E ActivityManager: 9.1% 953/flush-179:0: 0% user + 9.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.8% 788/collector: 0.8% user + 4.9% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5% 419/mmcqd: 0% user + 5% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 2.9% 814/ld-linux.so.3: 1.1% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 2.5% 822/dd: 0.2% user + 2.2% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.4% 2533/com.letv.threeScreen: 0.7% user + 0.7% kernel / faults: 134 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.5% 817/logcat: 0% user + 0.4% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.4% 7/events/0: 0% user + 0.4% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.4% 1254/com.letv.t2.launcher: 0.4% user + 0% kernel / faults: 20 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.2% 975/system_server: 0% user + 0.1% kernel / faults: 34 minor 1 major 01-21 23:58:40.265 975 991 E ActivityManager: 0.2% 1452/com.letv.downloads: 0.1% user + 0.1% kernel / faults: 40 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.1% 808/virtualkeypad: 0% user + 0.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.1% 1292/RTW_CMD_THREAD: 0% user + 0.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 753/jbd2/mmcblk9-8: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 8/events/1: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 177/bdi-default: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 786/letvmanager: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 1472/com.letv.filemanager: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 1737/com.letv.medialog: 0% user + 0% kernel / faults: 48 minor 01-21 23:58:40.265 975 991 E ActivityManager: 62% TOTAL: 7.2% user + 10% kernel + 43% iowait + 1% softirq 01-21 23:58:40.265 975 991 E ActivityManager: CPU usage from 3195ms to 3733ms later: 01-21 23:58:40.265 975 991 E ActivityManager: 15% 975/system_server: 3.7% user + 11% kernel / faults: 1 minor 01-21 23:58:40.265 975 991 E ActivityManager: 15% 991/ActivityManager: 3.7% user + 11% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 7.5% 1760/com.letv.airplay: 7.5% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.6% 2615/SocketListener(: 5.6% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.5% 419/mmcqd: 0% user + 5.5% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 298/kswapd0: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 788/collector: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 866/collector: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 865/collector: 0% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 814/ld-linux.so.3: 1.8% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 1009/814Linux hotplu: 1.8% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 1013/IRQThread_157: 0% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.3% 1254/com.letv.t2.launcher: 0.3% user + 0% kernel / faults: 1 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.3% 1254/etv.t2.launcher: 0.3% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 54% TOTAL: 2.9% user + 5.9% kernel + 45% iowait |
我們可以通過ActivityManagerService的appNotResponding方法看一下ANR資訊是如何記錄到logcat日誌中的:
.\framework\base\services\Java\com\Android\server\am\ActivityManagerService.java
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { ...... synchronized (this) { // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down. if (mShuttingDown) { //重啟時忽略ANR Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation); return; } else if (app.notResponding) { //之前已處理過則不再處理 Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation); return; } else if (app.crashing) { //已經Crash的程序則不再處理ANR Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation); return; } //設定ANR標誌,用於防止重複處理 // In case we come through here for the same app before completing // this one, mark as anring now so we will bail out. app.notResponding = true; // Log the ANR to the event log. EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid, app.processName, app.info.flags, annotation); //將發生ANR的應用pid放入firstPids集合中 // Dump thread traces as quickly as we can, starting with "interesting" processes. firstPids.add(app.pid); //將父PID也放入firstPids集合 int parentPid = app.pid; if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid; if (parentPid != app.pid) firstPids.add(parentPid); //將主PID放入firstPids集合 if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID); //將persistent程序放入firstPids集合 for (int i = mLruProcesses.size() - 1; i >= 0; i--) { ProcessRecord r = mLruProcesses.get(i); if (r != null && r.thread != null) { int pid = r.pid; if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) { if (r.persistent) { firstPids.add(pid); } else { lastPids.put(pid, Boolean.TRUE); } } } } } // 將ANR資訊寫入log // Log the ANR to the main log. StringBuilder info = new StringBuilder(); info.setLength(0); info.append("ANR in ").append(app.processName); if (activity != null && activity.shortComponentName != null) { info.append(" (").append(activity.shortComponentName).append(")"); } info.append("\n"); info.append("PID: ").append(app.pid).append("\n"); if (annotation != null) { info.append("Reason: ").append(annotation).append("\n"); } if (parent != null && parent != activity) { info.append("Parent: ").append(parent.shortComponentName).append("\n"); } final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true); File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST); String cpuInfo = null; if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); synchronized (mProcessCpuThread) { cpuInfo = mProcessCpuTracker.printCurrentState(anrTime); } info.append(processCpuTracker.printCurrentLoad()); info.append(cpuInfo); } info.append(processCpuTracker.printCurrentState(anrTime)); Slog.e(TAG, info.toString()); ...... } |
AMS在發生ANR時,會按以下格式將ANR資訊記錄到logcat日誌中:
ActivityManager: ANR in [Process Name]([short component name]) ActivityManager: PID: [Application Pid] ActivityManager: Reason: [Annotation] ActivityManager: Parent: [short component name of parent] [Process CPU state] |
其中的Reason資訊可以給出ANR產生原因的一些詳細資訊,
· 輸入事件處理引起的ANR,提示資訊格式為:“Inputdispatching timed out [anr reason]”
· Service處理引起的ANR,提示資訊格式為:“Executingservice [Package name]/[Short class name]”
· Broadcast處理引起的ANR,提示資訊格式為:“Broadcast of [Intent focused byBroadcast receiver]”
其中的Process CPU state 資訊格式如下:
ActivityManager: Load: [Load1] / [Load5] / [Load15] ActivityManager: CPU usage from [上次取樣與現在的時間差] ms to [當前取樣與現在的時間差] ms ago/later: ActivityManager: [總的CPU時間佔用率]% [PID]/[Process Name]: [使用者CPU時間佔用率]% user + [系統CPU時間佔用率]% kernel + [IO等待CPU時間佔用率]% iowait + [硬中斷CPU時間佔用率]% irq + [軟中斷CPU時間佔用率]% softirq / faults: [次要頁錯誤/主要頁錯誤]minor/major ActivityManager: ..... ActivityManager: [CPU時間佔用率合計統計] |
其中Load1, Load5, Load15分別為CPU 1分鐘平均任務負載數,5分鐘平均任務負載數,15分鐘平均任務負載數,平均任務負載數和CPU佔用率並沒有必然聯絡,可以作為參考資訊。可以通過分析各程序的CPU時間佔用率,來判斷是否為某些程序長期佔用CPU導致該程序無法獲取到足夠的CPU處理時間,而導致發生ANR。
這裡需要重點關注的是Load1,各程序總的CPU時間佔用率,使用者CPU時間佔用率,系統CPU時間佔用率,以及iowait CPU時間佔用率。
2.2 通過trace檔案分析原因
ActivityManagerService的appNotResponding方法在寫入logcat日誌的同時,還會將ANR時的stack trace資訊寫入到trace檔案,具體程式碼流程如下:
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { ...... File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST); ...... } public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { //判斷系統屬性,確認trace檔案路徑 String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null); if (tracesPath == null || tracesPath.length() == 0) { return null; } //若trace檔案不存在則建立,並設定可讀寫屬性 File tracesFile = new File(tracesPath); try { File tracesDir = tracesFile.getParentFile(); if (!tracesDir.exists()) { tracesFile.mkdirs(); if (!SELinux.restorecon(tracesDir)) { return null; } } FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1); // drwxrwxr-x if (clearTraces && tracesFile.exists()) tracesFile.delete(); tracesFile.createNewFile(); FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw- } catch (IOException e) { Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e); return null; } //執行dump操作 dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs); return tracesFile; } private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { //設定file observer // Use a FileObserver to detect when traces finish writing. // The order of traces is considered important to maintain for legibility. FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) { @Override public synchronized void onEvent(int event, String path) { notify(); } }; try { observer.startWatching(); //向firstPids程序集合傳送關閉訊號,虛擬機器接收到SIGNAL_QUIT訊號後會將程序中各個執行緒的函//數堆疊資訊輸出到traces.txt檔案中 // First collect all of the stacks of the most important pids. if (firstPids != null) { try { int num = firstPids.size(); for (int i = 0; i < num; i++) { synchronized (observer) { Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT); observer.wait(200); // Wait for write-close, give up after 200msec } } } catch (InterruptedException e) { Log.wtf(TAG, e); } } // Next collect the stacks of the native pids if (nativeProcs != null) { int[] pids = Process.getPidsForCommands(nativeProcs); if (pids != null) { for (int pid : pids) { Debug.dumpNativeBacktraceToFile(pid, tracesPath); } } } //進行CPU佔用率取樣,並將佔用率排在前幾位的程序的stack trace資訊dump到檔案 // Lastly, measure CPU usage. if (processCpuTracker != null) { processCpuTracker.init(); System.gc(); processCpuTracker.update(); try { synchronized (processCpuTracker) { processCpuTracker.wait(500); // measure over 1/2 second. } } catch (InterruptedException e) { } processCpuTracker.update(); // We'll take the stack crawls of just the top apps using CPU. final int N = processCpuTracker.countWorkingStats(); int numProcs = 0; for (int i=0; i<N && numProcs<5; i++) { ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i); if (lastPids.indexOfKey(stats.pid) >= 0) { numProcs++; try { synchronized (observer) { Process.sendSignal(stats.pid, Process.SIGNAL_QUIT); observer.wait(200); // Wait for write-close, give up after 200msec } } catch (InterruptedException e) { Log.wtf(TAG, e); } } } } } finally { observer.stopWatching(); } } |
其中firstPids的Pid內容可以參考appNotResponding方法中的實現邏輯。
3 ANR解決方法
因為ANR主要是因為主執行緒由於耗時操作被阻塞而產生的,所以常見的解決方法是不在主執行緒做耗時操作,具體實現時需要注意以下幾點:
· 主執行緒需要做耗時操作時,比如網路訪問、資料庫操作及點陣圖變換等,必須啟動一子執行緒處理,並利用handler來更新UI;
· 子執行緒儘量使用Android提供的API,比如HandlerThread,AsyncTask,AsyncQueryHandler等,這些API都提供了對於執行緒的系統級管理。如果應用直接使用Thread實現的話,則需要對這些子執行緒進行顯式管理,比如執行緒池及執行緒週期的控制,以防止系統資源和記憶體洩漏;
· Broadcast Receiver中如果有耗時操作,可以放到service中來處理;
· 在後臺子執行緒處理耗時操作時,為了提高使用者體驗,可以在前臺介面顯示某些動畫或者progress bar;