1. 程式人生 > >Android ANR 問題第二彈------Input超時實戰問題解析上

Android ANR 問題第二彈------Input超時實戰問題解析上

在前面的Android ANR 問題第二彈一文中,我們簡訴了Android Input超時的原因,我們瞭解到系統Input系統分發Input的事件時如果有5s超時會觸發應用ANR。在實際開發測試中,我們也會經常遇到Input超時導致的ANR問題,那麼現在我們就以實際問題分析一下Input超時的ANR問題,描述一下實際開發測試中我們應該如何定位問題的原因。本文基於AndroidO(8.1)系統。

實戰一:
Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.),手動模擬復現無視窗,有應用型別的ANR問題。直接上Log

05-28 11:24:49.798  1668  3141 I ActivityManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.example.anrtestdemo/.MainActivity} from uid 10007 on display 0 //從手機Launcher介面點選啟動測試demo
//我們可以從11:24:55.536  1668  2070 I InputDispatcher中推算這裡應該發生一次Input事件,而實際測試中,我卻是在這個時間左右多次點選back鍵
05-28 11:24:49.849 23293 23293 D abm     : onStart //測試demo MainActivity執行onStart方法
05-28 11:24:55.536  1668  2070 I InputDispatcher: Application is not responding: AppWindowToken{d262c54 token=Token{cca1066 ActivityRecord{2f3c0c1 u0 com.example.anrtestdemo/.MainActivity t18}}}.  It has been 5018.6ms since event, 5006.2ms since wait started.  Reason: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.//Input ANR超時第一時間點
05-28 11:24:55.642  1668  1713 I am_anr  : [0,23293,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]//Android ActivityManagerService ANR Log列印,本例am_anr Log和InputDispatcher之間相差時間較少,有時也會出現很大的時間差
05-28 11:24:59.849 23293 23293 D abm     : onStart return time is: 10000 //模擬耗時的操作
05-28 11:24:59.852 23293 23293 D abm     : onResume //測試demo MainActivity執行onResume方法
05-28 11:24:59.852 23293 23293 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY]
05-28 11:25:00.420  1668  1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity)
05-28 11:25:00.420  1668  1713 E ActivityManager: PID: 23293
05-28 11:25:00.420  1668  1713 E ActivityManager: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
05-28 11:25:00.420  1668  1713 E ActivityManager: Load: 5.72 / 5.5 / 5.29
05-28 11:25:00.420  1668  1713 E ActivityManager: CPU usage from 98992ms to 0ms ago (2018-05-28 11:23:16.593 to 2018-05-28 11:24:55.585):
05-28 11:25:00.420  1668  1713 E ActivityManager:   14% 1668/system_server: 8.8% user + 5.2% kernel / faults: 42792 minor 28 major
05-28 11:25:00.420  1668  1713 E ActivityManager:   9.1% 558/surfaceflinger: 3.5% user + 5.6% kernel / faults: 1609 minor 2 major
05-28 11:25:00.420  1668  1713 E ActivityManager:   5.7% 878/adbd: 0.5% user + 5.2% kernel / faults: 2680 minor
05-28 11:25:00.420  1668  1713 E ActivityManager:   5.4% 2490/com.android.systemui: 3.6% user + 1.8% kernel / faults: 15618 minor 99 major
05-28 11:25:00.420  1668  1713 E ActivityManager:   3.7% 8331/com.blackberry.blackberrylauncher: 2.8% user + 0.8% kernel / faults: 37990 minor 9 major

從adb log中,我們可以知道ANR型別,並且依據該型別的ANR,具體分析trace Log

DALVIK THREADS (12):
"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00
  | sysTid=23293 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
  | state=R schedstat=( 16106974966 17948853 313 ) utm=1450 stm=160 core=5 HZ=100
  | stack=0x5c73865000-0x5c73867000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22)
  at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1249)
  at android.app.Activity.performStart(Activity.java:6872)
  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2666)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2764)
  at android.app.ActivityThread.-wrap12(ActivityThread.java:-1)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1510)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:154)
  at android.app.ActivityThread.main(ActivityThread.java:6157)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)

我們一般檢視tracelog的時候,一般都是檢視main執行緒的堆疊,但不併是意味著其他的執行緒就不據參考性,只是本例中無需檢視,實際遇到的其他問題,我們還是要仔細檢視tarce檔案中顯示的各個執行緒的堆疊,因為我們可以從trace檔案中看出,ANR發生時的某一時刻系統的狀態,並且我們可以依據那一刻系統的狀態,並結合adb log大膽的去想像anr發生時手機在做什麼操作。本例可以從trace log中看出anr發生時,我的這個測試應用在呼叫at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22)方法時存在耗時。搜尋程式碼,檢視是否存在耗時。

    @Override
    protected void onStart() {
        super.onStart();
        Log.d("abm", "onStart");
        long currentTimeMillis = System.currentTimeMillis();//記錄系統當前時間
        while (true){//此處迴圈便是模擬的耗時操作,不斷獲取時間10s之後結束。
            long timeMillis = System.currentTimeMillis();
            if ((timeMillis - currentTimeMillis) >= 10000) {
                Log.d("abm", "onStart return time is: " + (timeMillis - currentTimeMillis));
                return;
            }
        }
    }

問題原因找到了,我們的MainActivity複寫了MainActivity的onStart方法並且做了耗時操作,導致Input事件分發超時。那麼問題來了,為什麼我這邊想要給大家介紹無視窗,有應用型別的ANR,為什麼測試demo在onStart或者onCreate方法中做了耗時操作就能復現了呢該型別的ANR問題?那是因為,Android應用啟動的時候都是先建立Application,建立號Application之後,才會執行應用啟動的主Activity,再執行該Activity的onCreate,onStart,onResume方法,然後我們的應用視窗便是在onResume中才去向WindowManager添加註冊的。因此在註冊新增視窗之前,application或者啟動的Activity的生命週期onCreate,onStart的任意方法,做了耗時操作,或者他們載入一起的執行時間過長,都是能夠導致無視窗,有應用型別的Input ANR問題發生的。這樣一來,我們實際開發應用的時候,就要儘可能的把耗時的操作,非同步處理。具體非同步實現思路你可以使用new thread + handler,Asynctask,HandlerThread等等,這裡推薦使用HandlerThread,因為google封裝的介面,使用起來簡單。分析完本例演示的問題,突然發現依據adb log trace可以輕輕鬆鬆的定位問題的原因,但是實際遇到的問題,卻並不如此的簡單,要是遇到如下的trace檔案,就比較尷尬了。

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74374ee8 self=0x149d8c3a00
  | sysTid=8720 nice=0 cgrp=default sched=0/0 handle=0x15226569a8
  | state=S schedstat=( 0 0 0 ) utm=7673 stm=891 core=0 HZ=100
  | stack=0x46803c3000-0x46803c5000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/8720/stack)
  native: #00 pc 000000000006a660  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001fca4  /system/lib64/libc.so (epoll_pwait+52)
  native: #02 pc 0000000000015d08  /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
  native: #03 pc 0000000000015bf0  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
  native: #04 pc 00000000001112e0  /system/lib64/libandroid_runtime.so (???)
  native: #05 pc 00000000001e166c  /system/framework/arm64/boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:325)
  at android.os.Looper.loop(Looper.java:142)
  at android.app.ActivityThread.main(ActivityThread.java:6558)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:469)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:826)

你會發現trace檔案都看得自己懷疑人生了,根據trace資訊,咋一看,都要懷疑係統的looper messagequeue有問題,其實這是完全不可能的,要是系統looper都出現問題了,那整個系統都估計要掛了。但是這個從這個trace Log中我們也是可以獲取的一些有用資訊的,如果我們熟悉系統looper messagequeue的機制的話,我們不難知道當應用主執行緒的looper messagequeue中沒有訊息的時候,我們會休眠於MessageQueue.nativePollOnce方法,實際是執行底層的Looper的pollInner進入epoll_wait等待。當有訊息來臨的時候,此looper便會被wake,執行訊息。因此我們可以大膽的猜測,ANR問題發生時,應用主執行緒中無可執行的Message,那麼什麼時候沒有訊息呢?一般執行完onResume完之後,並且介面沒有更新,應用UI主執行緒的looper會進入MessageQueue.nativePollOnce。那麼此時我們又應該如何繼續分析呢?前面我們所說trace檔案,我們一般看main thread的堆疊,那麼此時是時候看一波應用其他的執行緒執行情況了。先搜尋關鍵字“Binder:8720_”(這個8720是當前ANR程序的主程序號,系統一般都是按Binder:程序號_,進行拼接binger執行緒名)至於原因,系統程式碼,就是這麼定的,程式碼如下:

String8 ProcessState::makeBinderThreadName() {
    int32_t s = android_atomic_add(1, &mThreadPoolSeq);
    pid_t pid = getpid();
    String8 name;
    name.appendFormat("Binder:%d_%X", pid, s);
    return name;
}

搜尋binder執行緒,看應用是否存在binder呼叫,然後更具trace,結合程式碼,猜想ANR發生時,應用到底在做什麼操作。實際遇到此問題中,說實在的我們並不能從trace中發現什麼問題,此時需要我們多看一些adb log,去看看ANR問題發生之前,系統都在幹嘛,做了哪些操作,或者查出系統的執行狀態等等的一些有用資訊,我們要養成多看Log的習慣,這樣在分析ANR問題的時候,才能更加準確的推演ANR問題發生時的整個場景。

實戰二:手動復現Input事件超時,waitqueue不為空問題

螢幕按鍵:Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0. Wait queue length: 1.

實體按鍵:Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5507.1ms.
沒什麼說的,直接上Log

05-28 15:16:36.753  1668  1702 I ActivityManager: Start proc 27398:com.example.anrtestdemo/u0a120 for activity com.example.anrtestdemo/.MainActivity
05-28 15:16:36.873  1668 11856 I am_proc_bound: [0,27398,com.example.anrtestdemo]
05-28 15:16:36.880  1668 11856 I am_restart_activity: [0,223351388,31,com.example.anrtestdemo/.MainActivity]
05-28 15:16:36.953 27398 27398 D abm     : onResume
05-28 15:16:36.955 27398 27398 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY]
05-28 15:16:37.090  1668  1720 I am_activity_launch_time: [0,223351388,com.example.anrtestdemo/.MainActivity,365,365]
05-28 15:16:37.090  1668  1720 I ActivityManager: Displayed com.example.anrtestdemo/.MainActivity: +365ms
05-28 15:16:45.279  1668  2070 I InputDispatcher: Application is not responding: Window{ea02cdb u0 com.example.anrtestdemo/com.example.anrtestdemo.MainActivity}.  It has been 5006.4ms since event, 5005.8ms since wait started.  Reason: Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5507.1ms.
05-28 15:16:45.343  1668  1713 I am_anr  : [0,27398,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5507.1ms.)]
05-28 15:16:48.456  1668  1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity)
05-28 15:16:48.456  1668  1713 E ActivityManager: PID: 27398
05-28 15:16:48.456  1668  1713 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5507.1ms.)
05-28 15:16:48.456  1668  1713 E ActivityManager: Load: 6.11 / 5.38 / 4.77
05-28 15:16:48.456  1668  1713 E ActivityManager: CPU usage from 36255ms to 0ms ago (2018-05-28 15:16:09.036 to 2018-05-28 15:16:45.291):
05-28 15:16:48.456  1668  1713 E ActivityManager:   10% 1668/system_server: 4.6% user + 6% kernel / faults: 5194 minor 7 major
05-28 15:16:48.456  1668  1713 E ActivityManager:   6.2% 558/surfaceflinger: 2.7% user + 3.4% kernel / faults: 480 minor
05-28 15:16:48.456  1668  1713 E ActivityManager:   3.2% 434/logd: 2.3% user + 0.9% kernel / faults: 3 minor
05-28 15:16:48.456  1668  1713 E ActivityManager:   3.1% 23788/adbd: 0.4% user + 2.6% kernel / faults: 561 minor
05-28 15:16:48.456  1668  1713 E ActivityManager:   2.7% 2490/com.android.systemui: 1.5% user + 1.1% kernel / faults: 588 minor
05-28 15:16:48.456  1668  1713 E ActivityManager:   1.2% 26217/kworker/u16:2: 0% user + 1.2% kernel
實戰一中,已經詳細介紹瞭如何看Log,如何看trace檔案了,這邊就不再贅述,直接貼取關鍵Log,

trace log

DALVIK THREADS (13):
"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00
  | sysTid=27398 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
  | state=R schedstat=( 6436596993 5748639 166 ) utm=581 stm=62 core=4 HZ=100
  | stack=0x5c73865000-0x5c73867000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0000000000479994  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
  native: #01 pc 0000000000479990  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
  native: #02 pc 000000000044dfa0  /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+472)
  native: #03 pc 00000000004658fc  /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
  native: #04 pc 000000000044eeac  /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
  native: #05 pc 0000000000547928  /system/lib64/libart.so (_ZN3art14JniMethodStartEPNS_6ThreadE+168)
  native: #06 pc 00000000000b22f4  /system/framework/arm64/boot.oat (Java_java_lang_System_currentTimeMillis__+96)
  at java.lang.System.currentTimeMillis(Native method)
  at com.example.anrtestdemo.MainActivity$MyHandler$1.run(MainActivity.java:70)
  at android.os.Handler.handleCallback(Handler.java:751)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:154)
  at android.app.ActivityThread.main(ActivityThread.java:6157)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)

定位問題程式碼

    @Override
    protected void onResume() {
        super.onResume();
        Log.d("abm", "onResume");
        final MyHandler handler = new MyHandler();
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(2000);
                    handler.obtainMessage(666).sendToTarget();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();

    }

    class MyHandler extends Handler {
        @Override
        public void handleMessage(Message msg) {
            super.handleMessage(msg);
            if (msg.what == 666) {
                new Handler().post(new Runnable() {
                    @Override
                    public void run() {
                        long currentTimeMillis = System.currentTimeMillis();//問題程式碼,這邊模擬10s耗時操作
                        while (true){
                            long timeMillis = System.currentTimeMillis();
                            if ((timeMillis - currentTimeMillis) >= 10000) {
                                Log.d("abm", "onResume return time is: " + (timeMillis - currentTimeMillis));
                                return;
                            }
                        }
                    }
                });
            }
        }
    }

這邊寫的程式碼時,當應用介面載入好的時候,使用Handler向UI主執行緒的looper messagequeue post一個runnable callback,這個runnable裡面便是執行的耗時操作,因為這個耗時操作是在UI主執行緒中執行的,此時當我們再點選手機實體按鍵或者螢幕時,如果5s超時,則觸發ANR發生。當然實際操作中,我們並不會直接這麼做如此顯而易見的耗時操作,這裡只是用類似這種簡單粗暴的方式告訴大家,把耗時的操作移到UI執行緒之外,因為UI執行緒負責與使用者互動介面更新,UI執行緒的卡噸直接影響使用者體驗,更會造成ANR問題的發生。

通過上面的兩個小例子,我們已經將最常見到的Input超時導致的ANR問題介紹了。由於篇幅原因,計劃將input各種型別的ANR問題分為兩個部分來寫,本篇先給大家介紹最為常見的兩種型別的Input超時ANR,下篇再介紹剩下的一些型別比較少見的ANR問題。