Android ANR 問題第二彈------Input超時實戰問題解析上
在前面的Android ANR 問題第二彈一文中,我們簡訴了Android Input超時的原因,我們瞭解到系統Input系統分發Input的事件時如果有5s超時會觸發應用ANR。在實際開發測試中,我們也會經常遇到Input超時導致的ANR問題,那麼現在我們就以實際問題分析一下Input超時的ANR問題,描述一下實際開發測試中我們應該如何定位問題的原因。本文基於AndroidO(8.1)系統。
實戰一:
Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.),手動模擬復現無視窗,有應用型別的ANR問題。直接上Log
05-28 11:24:49.798 1668 3141 I ActivityManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.example.anrtestdemo/.MainActivity} from uid 10007 on display 0 //從手機Launcher介面點選啟動測試demo //我們可以從11:24:55.536 1668 2070 I InputDispatcher中推算這裡應該發生一次Input事件,而實際測試中,我卻是在這個時間左右多次點選back鍵 05-28 11:24:49.849 23293 23293 D abm : onStart //測試demo MainActivity執行onStart方法 05-28 11:24:55.536 1668 2070 I InputDispatcher: Application is not responding: AppWindowToken{d262c54 token=Token{cca1066 ActivityRecord{2f3c0c1 u0 com.example.anrtestdemo/.MainActivity t18}}}. It has been 5018.6ms since event, 5006.2ms since wait started. Reason: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.//Input ANR超時第一時間點 05-28 11:24:55.642 1668 1713 I am_anr : [0,23293,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]//Android ActivityManagerService ANR Log列印,本例am_anr Log和InputDispatcher之間相差時間較少,有時也會出現很大的時間差 05-28 11:24:59.849 23293 23293 D abm : onStart return time is: 10000 //模擬耗時的操作 05-28 11:24:59.852 23293 23293 D abm : onResume //測試demo MainActivity執行onResume方法 05-28 11:24:59.852 23293 23293 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY] 05-28 11:25:00.420 1668 1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity) 05-28 11:25:00.420 1668 1713 E ActivityManager: PID: 23293 05-28 11:25:00.420 1668 1713 E ActivityManager: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.) 05-28 11:25:00.420 1668 1713 E ActivityManager: Load: 5.72 / 5.5 / 5.29 05-28 11:25:00.420 1668 1713 E ActivityManager: CPU usage from 98992ms to 0ms ago (2018-05-28 11:23:16.593 to 2018-05-28 11:24:55.585): 05-28 11:25:00.420 1668 1713 E ActivityManager: 14% 1668/system_server: 8.8% user + 5.2% kernel / faults: 42792 minor 28 major 05-28 11:25:00.420 1668 1713 E ActivityManager: 9.1% 558/surfaceflinger: 3.5% user + 5.6% kernel / faults: 1609 minor 2 major 05-28 11:25:00.420 1668 1713 E ActivityManager: 5.7% 878/adbd: 0.5% user + 5.2% kernel / faults: 2680 minor 05-28 11:25:00.420 1668 1713 E ActivityManager: 5.4% 2490/com.android.systemui: 3.6% user + 1.8% kernel / faults: 15618 minor 99 major 05-28 11:25:00.420 1668 1713 E ActivityManager: 3.7% 8331/com.blackberry.blackberrylauncher: 2.8% user + 0.8% kernel / faults: 37990 minor 9 major
從adb log中,我們可以知道ANR型別,並且依據該型別的ANR,具體分析trace Log
DALVIK THREADS (12): "main" prio=5 tid=1 Runnable | group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00 | sysTid=23293 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98 | state=R schedstat=( 16106974966 17948853 313 ) utm=1450 stm=160 core=5 HZ=100 | stack=0x5c73865000-0x5c73867000 stackSize=8MB | held mutexes= "mutator lock"(shared held) at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22) at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1249) at android.app.Activity.performStart(Activity.java:6872) at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2666) at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2764) at android.app.ActivityThread.-wrap12(ActivityThread.java:-1) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1510) at android.os.Handler.dispatchMessage(Handler.java:102) at android.os.Looper.loop(Looper.java:154) at android.app.ActivityThread.main(ActivityThread.java:6157) at java.lang.reflect.Method.invoke!(Native method) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)
我們一般檢視tracelog的時候,一般都是檢視main執行緒的堆疊,但不併是意味著其他的執行緒就不據參考性,只是本例中無需檢視,實際遇到的其他問題,我們還是要仔細檢視tarce檔案中顯示的各個執行緒的堆疊,因為我們可以從trace檔案中看出,ANR發生時的某一時刻系統的狀態,並且我們可以依據那一刻系統的狀態,並結合adb log大膽的去想像anr發生時手機在做什麼操作。本例可以從trace log中看出anr發生時,我的這個測試應用在呼叫at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22)方法時存在耗時。搜尋程式碼,檢視是否存在耗時。
@Override
protected void onStart() {
super.onStart();
Log.d("abm", "onStart");
long currentTimeMillis = System.currentTimeMillis();//記錄系統當前時間
while (true){//此處迴圈便是模擬的耗時操作,不斷獲取時間10s之後結束。
long timeMillis = System.currentTimeMillis();
if ((timeMillis - currentTimeMillis) >= 10000) {
Log.d("abm", "onStart return time is: " + (timeMillis - currentTimeMillis));
return;
}
}
}
問題原因找到了,我們的MainActivity複寫了MainActivity的onStart方法並且做了耗時操作,導致Input事件分發超時。那麼問題來了,為什麼我這邊想要給大家介紹無視窗,有應用型別的ANR,為什麼測試demo在onStart或者onCreate方法中做了耗時操作就能復現了呢該型別的ANR問題?那是因為,Android應用啟動的時候都是先建立Application,建立號Application之後,才會執行應用啟動的主Activity,再執行該Activity的onCreate,onStart,onResume方法,然後我們的應用視窗便是在onResume中才去向WindowManager添加註冊的。因此在註冊新增視窗之前,application或者啟動的Activity的生命週期onCreate,onStart的任意方法,做了耗時操作,或者他們載入一起的執行時間過長,都是能夠導致無視窗,有應用型別的Input ANR問題發生的。這樣一來,我們實際開發應用的時候,就要儘可能的把耗時的操作,非同步處理。具體非同步實現思路你可以使用new thread + handler,Asynctask,HandlerThread等等,這裡推薦使用HandlerThread,因為google封裝的介面,使用起來簡單。分析完本例演示的問題,突然發現依據adb log trace可以輕輕鬆鬆的定位問題的原因,但是實際遇到的問題,卻並不如此的簡單,要是遇到如下的trace檔案,就比較尷尬了。
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x74374ee8 self=0x149d8c3a00
| sysTid=8720 nice=0 cgrp=default sched=0/0 handle=0x15226569a8
| state=S schedstat=( 0 0 0 ) utm=7673 stm=891 core=0 HZ=100
| stack=0x46803c3000-0x46803c5000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/8720/stack)
native: #00 pc 000000000006a660 /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 000000000001fca4 /system/lib64/libc.so (epoll_pwait+52)
native: #02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
native: #03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
native: #04 pc 00000000001112e0 /system/lib64/libandroid_runtime.so (???)
native: #05 pc 00000000001e166c /system/framework/arm64/boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:325)
at android.os.Looper.loop(Looper.java:142)
at android.app.ActivityThread.main(ActivityThread.java:6558)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:469)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:826)
你會發現trace檔案都看得自己懷疑人生了,根據trace資訊,咋一看,都要懷疑係統的looper messagequeue有問題,其實這是完全不可能的,要是系統looper都出現問題了,那整個系統都估計要掛了。但是這個從這個trace Log中我們也是可以獲取的一些有用資訊的,如果我們熟悉系統looper messagequeue的機制的話,我們不難知道當應用主執行緒的looper messagequeue中沒有訊息的時候,我們會休眠於MessageQueue.nativePollOnce方法,實際是執行底層的Looper的pollInner進入epoll_wait等待。當有訊息來臨的時候,此looper便會被wake,執行訊息。因此我們可以大膽的猜測,ANR問題發生時,應用主執行緒中無可執行的Message,那麼什麼時候沒有訊息呢?一般執行完onResume完之後,並且介面沒有更新,應用UI主執行緒的looper會進入MessageQueue.nativePollOnce。那麼此時我們又應該如何繼續分析呢?前面我們所說trace檔案,我們一般看main thread的堆疊,那麼此時是時候看一波應用其他的執行緒執行情況了。先搜尋關鍵字“Binder:8720_”(這個8720是當前ANR程序的主程序號,系統一般都是按Binder:程序號_,進行拼接binger執行緒名)至於原因,系統程式碼,就是這麼定的,程式碼如下:
String8 ProcessState::makeBinderThreadName() {
int32_t s = android_atomic_add(1, &mThreadPoolSeq);
pid_t pid = getpid();
String8 name;
name.appendFormat("Binder:%d_%X", pid, s);
return name;
}
搜尋binder執行緒,看應用是否存在binder呼叫,然後更具trace,結合程式碼,猜想ANR發生時,應用到底在做什麼操作。實際遇到此問題中,說實在的我們並不能從trace中發現什麼問題,此時需要我們多看一些adb log,去看看ANR問題發生之前,系統都在幹嘛,做了哪些操作,或者查出系統的執行狀態等等的一些有用資訊,我們要養成多看Log的習慣,這樣在分析ANR問題的時候,才能更加準確的推演ANR問題發生時的整個場景。
實戰二:手動復現Input事件超時,waitqueue不為空問題
實體按鍵:Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.
沒什麼說的,直接上Log
05-28 15:16:36.753 1668 1702 I ActivityManager: Start proc 27398:com.example.anrtestdemo/u0a120 for activity com.example.anrtestdemo/.MainActivity
05-28 15:16:36.873 1668 11856 I am_proc_bound: [0,27398,com.example.anrtestdemo]
05-28 15:16:36.880 1668 11856 I am_restart_activity: [0,223351388,31,com.example.anrtestdemo/.MainActivity]
05-28 15:16:36.953 27398 27398 D abm : onResume
05-28 15:16:36.955 27398 27398 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY]
05-28 15:16:37.090 1668 1720 I am_activity_launch_time: [0,223351388,com.example.anrtestdemo/.MainActivity,365,365]
05-28 15:16:37.090 1668 1720 I ActivityManager: Displayed com.example.anrtestdemo/.MainActivity: +365ms
05-28 15:16:45.279 1668 2070 I InputDispatcher: Application is not responding: Window{ea02cdb u0 com.example.anrtestdemo/com.example.anrtestdemo.MainActivity}. It has been 5006.4ms since event, 5005.8ms since wait started. Reason: Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.
05-28 15:16:45.343 1668 1713 I am_anr : [0,27398,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.)]
05-28 15:16:48.456 1668 1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity)
05-28 15:16:48.456 1668 1713 E ActivityManager: PID: 27398
05-28 15:16:48.456 1668 1713 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.)
05-28 15:16:48.456 1668 1713 E ActivityManager: Load: 6.11 / 5.38 / 4.77
05-28 15:16:48.456 1668 1713 E ActivityManager: CPU usage from 36255ms to 0ms ago (2018-05-28 15:16:09.036 to 2018-05-28 15:16:45.291):
05-28 15:16:48.456 1668 1713 E ActivityManager: 10% 1668/system_server: 4.6% user + 6% kernel / faults: 5194 minor 7 major
05-28 15:16:48.456 1668 1713 E ActivityManager: 6.2% 558/surfaceflinger: 2.7% user + 3.4% kernel / faults: 480 minor
05-28 15:16:48.456 1668 1713 E ActivityManager: 3.2% 434/logd: 2.3% user + 0.9% kernel / faults: 3 minor
05-28 15:16:48.456 1668 1713 E ActivityManager: 3.1% 23788/adbd: 0.4% user + 2.6% kernel / faults: 561 minor
05-28 15:16:48.456 1668 1713 E ActivityManager: 2.7% 2490/com.android.systemui: 1.5% user + 1.1% kernel / faults: 588 minor
05-28 15:16:48.456 1668 1713 E ActivityManager: 1.2% 26217/kworker/u16:2: 0% user + 1.2% kernel
實戰一中,已經詳細介紹瞭如何看Log,如何看trace檔案了,這邊就不再贅述,直接貼取關鍵Log,trace log
DALVIK THREADS (13):
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00
| sysTid=27398 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
| state=R schedstat=( 6436596993 5748639 166 ) utm=581 stm=62 core=4 HZ=100
| stack=0x5c73865000-0x5c73867000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 0000000000479994 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
native: #01 pc 0000000000479990 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
native: #02 pc 000000000044dfa0 /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+472)
native: #03 pc 00000000004658fc /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
native: #04 pc 000000000044eeac /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
native: #05 pc 0000000000547928 /system/lib64/libart.so (_ZN3art14JniMethodStartEPNS_6ThreadE+168)
native: #06 pc 00000000000b22f4 /system/framework/arm64/boot.oat (Java_java_lang_System_currentTimeMillis__+96)
at java.lang.System.currentTimeMillis(Native method)
at com.example.anrtestdemo.MainActivity$MyHandler$1.run(MainActivity.java:70)
at android.os.Handler.handleCallback(Handler.java:751)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:154)
at android.app.ActivityThread.main(ActivityThread.java:6157)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)
定位問題程式碼
@Override
protected void onResume() {
super.onResume();
Log.d("abm", "onResume");
final MyHandler handler = new MyHandler();
new Thread(new Runnable() {
@Override
public void run() {
try {
Thread.sleep(2000);
handler.obtainMessage(666).sendToTarget();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}).start();
}
class MyHandler extends Handler {
@Override
public void handleMessage(Message msg) {
super.handleMessage(msg);
if (msg.what == 666) {
new Handler().post(new Runnable() {
@Override
public void run() {
long currentTimeMillis = System.currentTimeMillis();//問題程式碼,這邊模擬10s耗時操作
while (true){
long timeMillis = System.currentTimeMillis();
if ((timeMillis - currentTimeMillis) >= 10000) {
Log.d("abm", "onResume return time is: " + (timeMillis - currentTimeMillis));
return;
}
}
}
});
}
}
}
這邊寫的程式碼時,當應用介面載入好的時候,使用Handler向UI主執行緒的looper messagequeue post一個runnable callback,這個runnable裡面便是執行的耗時操作,因為這個耗時操作是在UI主執行緒中執行的,此時當我們再點選手機實體按鍵或者螢幕時,如果5s超時,則觸發ANR發生。當然實際操作中,我們並不會直接這麼做如此顯而易見的耗時操作,這裡只是用類似這種簡單粗暴的方式告訴大家,把耗時的操作移到UI執行緒之外,因為UI執行緒負責與使用者互動介面更新,UI執行緒的卡噸直接影響使用者體驗,更會造成ANR問題的發生。
通過上面的兩個小例子,我們已經將最常見到的Input超時導致的ANR問題介紹了。由於篇幅原因,計劃將input各種型別的ANR問題分為兩個部分來寫,本篇先給大家介紹最為常見的兩種型別的Input超時ANR,下篇再介紹剩下的一些型別比較少見的ANR問題。