What you think is the timeout is not necessarily the user's timeout [Repost]
Reposted from https://zhuanlan.zhihu.com/p/31640388
小樓一夜聽春雨
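Introduction
I have recently been helping a business team track down some hard-to-diagnose problems. One of them: some users reported that a particular operation occasionally kept loading for a very long time, as if the timeout had no effect, even though the business developers had clearly set the network request timeouts to 30s. Why? Is there a bug in okhttp? Or are users doing something wrong?
It took 3 days to finally find the root cause.
Keywords first: okio, timeout mechanism, weak network, key parameters
1. Confirming the problem
The user feedback collected by the product manager was vague, so to pin the problem down we needed data. The tracking data for this request showed that a few dozen users had indeed spent more than 30s on it, some as long as 90s, which is a very poor experience.
Could the business developers have set the timeouts incorrectly when initializing OkHttpClient? Here is the initialization code: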
OkHttpClient.Builder httpClientBuilder = new OkHttpClient.Builder()
    .readTimeout(30, TimeUnit.SECONDS)
    .connectTimeout(30, TimeUnit.SECONDS)
    .writeTimeout(30, TimeUnit.SECONDS)
    .addInterceptor(new HeaderInterceptor())
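All three timeouts are clearly set to 30s, so nothing is wrong here. That leaves two suspects: a bug in okhttp, or incorrect use of okhttp on our side.
2. Where the okhttp source uses the timeouts
When are the timeouts set on OkHttpClient actually used?
readTimeout, connectTimeout and writeTimeout are used in two places: StreamAllocation and Http2Codec. Since this request uses HTTP/1.1, Http2Codec can be ignored.
2.1 Parameter passing
In StreamAllocation's newStream() method, the timeouts are used as follows: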
public HttpCodec newStream(OkHttpClient client, boolean doExtensiveHealthChecks) {
int connectTimeout = client.connectTimeoutMillis();
int readTimeout = client.readTimeoutMillis();
int writeTimeout = client.writeTimeoutMillis();
boolean connectionRetryEnabled = client.retryOnConnectionFailure();
try {
RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
writeTimeout, connectionRetryEnabled, doExtensiveHealthChecks);
HttpCodec resultCodec;
if (resultConnection.http2Connection != null) {
resultCodec = new Http2Codec(client, this, resultConnection.http2Connection);
} else {
resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
client, this, resultConnection.source, resultConnection.sink);
}
synchronized (connectionPool) {
codec = resultCodec;
return resultCodec;
}
} catch (IOException e) {
throw new RouteException(e);
}
}
All three timeouts are passed into connection setup. First, look at findHealthyConnection():
/**
* Finds a connection and returns it if it is healthy. If it is unhealthy the process is repeated
* until a healthy connection is found.
*/
private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
int writeTimeout, boolean connectionRetryEnabled, boolean doExtensiveHealthChecks)
throws IOException {
while (true) {
RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
connectionRetryEnabled);
// If this is a brand new connection, we can skip the extensive health checks.
synchronized (connectionPool) {
if (candidate.successCount == 0) {
return candidate;
}
}
// Do a (potentially slow) check to confirm that the pooled connection is still good. If it
// isn't, take it out of the pool and start again.
if (!candidate.isHealthy(doExtensiveHealthChecks)) {
noNewStreams();
continue;
}
return candidate;
}
}
This method simply calls findConnection() in a loop until it finds a healthy connection. findConnection() looks like this:
/**
* Returns a connection to host a new stream. This prefers the existing connection if it exists,
* then the pool, finally building a new connection.
*/
private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
boolean connectionRetryEnabled) throws IOException {
Route selectedRoute;
synchronized (connectionPool) {
if (released) throw new IllegalStateException("released");
if (codec != null) throw new IllegalStateException("codec != null");
if (canceled) throw new IOException("Canceled");
RealConnection allocatedConnection = this.connection;
if (allocatedConnection != null && !allocatedConnection.noNewStreams) {
return allocatedConnection;
}
// Attempt to get a connection from the pool.
RealConnection pooledConnection = Internal.instance.get(connectionPool, address, this);
if (pooledConnection != null) {
this.connection = pooledConnection;
return pooledConnection;
}
selectedRoute = route;
}
if (selectedRoute == null) {
selectedRoute = routeSelector.next();
synchronized (connectionPool) {
route = selectedRoute;
refusedStreamCount = 0;
}
}
RealConnection newConnection = new RealConnection(selectedRoute);
synchronized (connectionPool) {
acquire(newConnection);
Internal.instance.put(connectionPool, newConnection);
this.connection = newConnection;
if (canceled) throw new IOException("Canceled");
}
newConnection.connect(connectTimeout, readTimeout, writeTimeout, address.connectionSpecs(),
connectionRetryEnabled);
routeDatabase().connected(newConnection.route());
return newConnection;
}
The three timeouts are only used in the call to RealConnection's connect() method, shown here:
public void connect(int connectTimeout, int readTimeout, int writeTimeout,
List<ConnectionSpec> connectionSpecs, boolean connectionRetryEnabled) {
if (protocol != null) throw new IllegalStateException("already connected");
RouteException routeException = null;
ConnectionSpecSelector connectionSpecSelector = new ConnectionSpecSelector(connectionSpecs);
if (route.address().sslSocketFactory() == null) {
if (!connectionSpecs.contains(ConnectionSpec.CLEARTEXT)) {
throw new RouteException(new UnknownServiceException(
"CLEARTEXT communication not enabled for client"));
}
String host = route.address().url().host();
if (!Platform.get().isCleartextTrafficPermitted(host)) {
throw new RouteException(new UnknownServiceException(
"CLEARTEXT communication to " + host + " not permitted by network security policy"));
}
}
while (protocol == null) {
try {
if (route.requiresTunnel()) {
buildTunneledConnection(connectTimeout, readTimeout, writeTimeout,
connectionSpecSelector);
} else {
buildConnection(connectTimeout, readTimeout, writeTimeout, connectionSpecSelector);
}
} catch (IOException e) {
closeQuietly(socket);
closeQuietly(rawSocket);
socket = null;
rawSocket = null;
source = null;
sink = null;
handshake = null;
protocol = null;
if (routeException == null) {
routeException = new RouteException(e);
} else {
routeException.addConnectException(e);
}
if (!connectionRetryEnabled || !connectionSpecSelector.connectionFailed(e)) {
throw routeException;
}
}
}
}
When the route does not need a proxy tunnel (route.requiresTunnel() is false), buildConnection() is called:
/** Does all the work necessary to build a full HTTP or HTTPS connection on a raw socket. */
private void buildConnection(int connectTimeout, int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
connectSocket(connectTimeout, readTimeout);
establishProtocol(readTimeout, writeTimeout, connectionSpecSelector);
}
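The parameters split here: connectTimeout and readTimeout are used for the socket connection, while readTimeout and writeTimeout are passed on to establishProtocol(), which handles the TLS and HTTP/2 setup.
2.2 Analysis of connectSocket()
First, the connectSocket() method: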
private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
Proxy proxy = route.proxy();
Address address = route.address();
rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
? address.socketFactory().createSocket()
: new Socket(proxy);
rawSocket.setSoTimeout(readTimeout);
try {
Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
} catch (ConnectException e) {
ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
ce.initCause(e);
throw ce;
}
source = Okio.buffer(Okio.source(rawSocket));
sink = Okio.buffer(Okio.sink(rawSocket));
}
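Here we can see:
- readTimeout ends up in rawSocket.setSoTimeout(). setSoTimeout() limits how long a read() on the socket's InputStream may block once the connection is established, which is why readTimeout is used here
- connectTimeout is applied in a platform-specific way. On Android it ends up in AndroidPlatform's connectSocket() method: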
@Override public void connectSocket(Socket socket, InetSocketAddress address,
int connectTimeout) throws IOException {
try {
socket.connect(address, connectTimeout);
} catch (AssertionError e) {
if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
throw e;
} catch (SecurityException e) {
// Before android 4.3, socket.connect could throw a SecurityException
// if opening a socket resulted in an EACCES error.
IOException ioException = new IOException("Exception in connect");
ioException.initCause(e);
throw ioException;
}
}
This sets the socket's connect timeout, which is why connectTimeout is used.
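To make the difference between the two socket-level timeouts concrete, here is a minimal sketch using only java.net. It is not okhttp code: SocketTimeoutDemo and readTimesOut are hypothetical names, and the "server" is a local ServerSocket that never sends anything. The connect timeout only bounds the connection attempt, while SO_TIMEOUT bounds each blocking read().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of okhttp.
final class SocketTimeoutDemo {
  /**
   * Connects to a local server socket that never sends data and returns true
   * if read() gives up with SocketTimeoutException.
   */
  static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
         Socket socket = new Socket()) {
      // SO_TIMEOUT: limits how long each blocking read() may wait.
      socket.setSoTimeout(readTimeoutMs);
      // Connect timeout: limits only the TCP connection attempt.
      socket.connect(
          new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
          connectTimeoutMs);
      try {
        socket.getInputStream().read(); // the peer is silent, so this blocks
        return false;
      } catch (SocketTimeoutException e) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("read timed out: " + readTimesOut(1000, 200));
  }
}
```

Neither value limits the total duration of a request: a peer that keeps sending a byte just before each read deadline would never trigger SO_TIMEOUT.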
2.3 Analysis of establishProtocol()
Back in RealConnection's buildConnection(): after connectSocket() returns, establishProtocol() is called:
private void establishProtocol(int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
if (route.address().sslSocketFactory() != null) {
connectTls(readTimeout, writeTimeout, connectionSpecSelector);
} else {
protocol = Protocol.HTTP_1_1;
socket = rawSocket;
}
if (protocol == Protocol.HTTP_2) {
socket.setSoTimeout(0); // Framed connection timeouts are set per-stream.
Http2Connection http2Connection = new Http2Connection.Builder(true)
.socket(socket, route.address().url().host(), source, sink)
.listener(this)
.build();
http2Connection.start();
// Only assign the framed connection once the preface has been sent successfully.
this.allocationLimit = http2Connection.maxConcurrentStreams();
this.http2Connection = http2Connection;
} else {
this.allocationLimit = 1;
}
}
For an https connection, connectTls() is called:
private void connectTls(int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
Address address = route.address();
SSLSocketFactory sslSocketFactory = address.sslSocketFactory();
boolean success = false;
SSLSocket sslSocket = null;
try {
// Create the wrapper over the connected socket.
sslSocket = (SSLSocket) sslSocketFactory.createSocket(
rawSocket, address.url().host(), address.url().port(), true /* autoClose */);
// Configure the socket's ciphers, TLS versions, and extensions.
ConnectionSpec connectionSpec = connectionSpecSelector.configureSecureSocket(sslSocket);
if (connectionSpec.supportsTlsExtensions()) {
Platform.get().configureTlsExtensions(
sslSocket, address.url().host(), address.protocols());
}
// Force handshake. This can throw!
sslSocket.startHandshake();
Handshake unverifiedHandshake = Handshake.get(sslSocket.getSession());
// Verify that the socket's certificates are acceptable for the target host.
if (!address.hostnameVerifier().verify(address.url().host(), sslSocket.getSession())) {
X509Certificate cert = (X509Certificate) unverifiedHandshake.peerCertificates().get(0);
throw new SSLPeerUnverifiedException("Hostname " + address.url().host() + " not verified:"
+ "\n certificate: " + CertificatePinner.pin(cert)
+ "\n DN: " + cert.getSubjectDN().getName()
+ "\n subjectAltNames: " + OkHostnameVerifier.allSubjectAltNames(cert));
}
// Check that the certificate pinner is satisfied by the certificates presented.
address.certificatePinner().check(address.url().host(),
unverifiedHandshake.peerCertificates());
// Success! Save the handshake and the ALPN protocol.
String maybeProtocol = connectionSpec.supportsTlsExtensions()
? Platform.get().getSelectedProtocol(sslSocket)
: null;
socket = sslSocket;
source = Okio.buffer(Okio.source(socket));
sink = Okio.buffer(Okio.sink(socket));
handshake = unverifiedHandshake;
protocol = maybeProtocol != null
? Protocol.get(maybeProtocol)
: Protocol.HTTP_1_1;
success = true;
} catch (AssertionError e) {
if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
throw e;
} finally {
if (sslSocket != null) {
Platform.get().afterHandshake(sslSocket);
}
if (!success) {
closeQuietly(sslSocket);
}
}
}
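This call performs the handshake and certificate verification, and at the end the socket member is actually an SSLSocket object. Note also that neither readTimeout nor writeTimeout is used here, so there is no real need to pass them in.
3. Timeout settings for socket, source and sink
3.1 The main flow of timeout configuration
Back in StreamAllocation's newStream(): because we are on HTTP/1.1, the findHealthyConnection() call only ends up using readTimeout and connectTimeout; writeTimeout is not used.
After that, the following code runs: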
resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
client, this, resultConnection.source, resultConnection.sink);
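1) From the walk-through above, readTimeout was set on rawSocket (a java.net.Socket) via setSoTimeout() in RealConnection's connectSocket(), and connectTimeout was applied in AndroidPlatform's connectSocket(). Here, resultConnection.socket() returns the socket member rather than rawSocket. For https connections the two differ: socket is an SSLSocket, so this setSoTimeout() call does not duplicate the earlier one.
2) Where is source created? As analyzed above, in RealConnection's connectSocket() method: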
private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
Proxy proxy = route.proxy();
Address address = route.address();
rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
? address.socketFactory().createSocket()
: new Socket(proxy);
rawSocket.setSoTimeout(readTimeout);
try {
Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
} catch (ConnectException e) {
ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
ce.initCause(e);
throw ce;
}
source = Okio.buffer(Okio.source(rawSocket));
sink = Okio.buffer(Okio.sink(rawSocket));
}
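So source wraps rawSocket's input stream with Okio.buffer(), and sink wraps rawSocket's output stream the same way (for https, connectTls() in section 2.3 rebuilds both on top of the SSLSocket). Here is Okio.source():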
public static Source source(Socket socket) throws IOException {
if(socket == null) {
throw new IllegalArgumentException("socket == null");
} else {
AsyncTimeout timeout = timeout(socket);
Source source = source((InputStream)socket.getInputStream(), (Timeout)timeout);
return timeout.source(source);
}
}
An AsyncTimeout object is created here and used to implement the timeout. How exactly does it work? See the next subsection.
3.2 How AsyncTimeout works
The timeout() method in Okio that source() relies on:
private static AsyncTimeout timeout(final Socket socket) {
return new AsyncTimeout() {
protected IOException newTimeoutException(IOException cause) {
InterruptedIOException ioe = new SocketTimeoutException("timeout");
if(cause != null) {
ioe.initCause(cause);
}
return ioe;
}
protected void timedOut() {
try {
socket.close();
} catch (Exception var2) {
Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var2);
} catch (AssertionError var3) {
if(!Okio.isAndroidGetsocknameError(var3)) {
throw var3;
}
Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var3);
}
}
};
}
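This creates an AsyncTimeout that overrides newTimeoutException() and timedOut(), both defined in AsyncTimeout. The former supplies the exception thrown on timeout (the default is an InterruptedIOException); the latter is the callback invoked when the timeout fires, where the actual cleanup happens (here, closing the socket).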
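Why closing the socket works as a timeout action can be shown with a small stand-alone sketch. It does not use okio: CloseOnTimeoutDemo and readAbortedByClose are hypothetical names, and a ScheduledExecutorService stands in for the watchdog. A thread blocked in read() is released with an exception as soon as another thread closes the socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a stand-in for the idea behind timedOut(), not okio code.
final class CloseOnTimeoutDemo {
  /** Returns true if a blocked read() was aborted because another thread closed the socket. */
  static boolean readAbortedByClose(long timeoutMs) {
    ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
         Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
      // No SO_TIMEOUT is set, so read() on its own would block indefinitely.
      watchdog.schedule(() -> {
        try {
          socket.close(); // the "timedOut()" action
        } catch (IOException ignored) {
          // nothing useful to do in this sketch
        }
      }, timeoutMs, TimeUnit.MILLISECONDS);
      try {
        socket.getInputStream().read();
        return false;
      } catch (IOException e) {
        return socket.isClosed(); // read() failed because the socket was closed
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      watchdog.shutdownNow();
    }
  }
}
```

This approach also interrupts writes and TLS handshakes that SO_TIMEOUT does not cover, which may be one reason okio layers it on top of the socket option.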
So how does AsyncTimeout implement the timeout, and could the bug be hiding in there?
The call chain is Sink.write()/Source.read() -> AsyncTimeout.enter() -> AsyncTimeout.scheduleTimeout(). scheduleTimeout() is the key method:
private static synchronized void scheduleTimeout(
AsyncTimeout node, long timeoutNanos, boolean hasDeadline) {
// Start the watchdog thread and create the head node when the first timeout is scheduled.
if (head == null) {
head = new AsyncTimeout();
new Watchdog().start();
}
long now = System.nanoTime();
if (timeoutNanos != 0 && hasDeadline) {
// Compute the earliest event; either timeout or deadline. Because nanoTime can wrap around,
// Math.min() is undefined for absolute values, but meaningful for relative ones.
node.timeoutAt = now + Math.min(timeoutNanos, node.deadlineNanoTime() - now);
} else if (timeoutNanos != 0) {
node.timeoutAt = now + timeoutNanos;
} else if (hasDeadline) {
node.timeoutAt = node.deadlineNanoTime();
} else {
throw new AssertionError();
}
// Insert the node in sorted order (this is where the list is kept sorted).
long remainingNanos = node.remainingNanos(now);
for (AsyncTimeout prev = head; true; prev = prev.next) {
if (prev.next == null || remainingNanos < prev.next.remainingNanos(now)) {
node.next = prev.next;
prev.next = node;
if (prev == head) {
AsyncTimeout.class.notify(); // Wake up the watchdog when inserting at the front.
}
break;
}
}
}
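This method does three things:
- When the first AsyncTimeout is scheduled, it starts the Watchdog thread
- All AsyncTimeout objects form a linked list, ordered by remaining time from shortest to longest
- If the new node was inserted at the front of the list, it calls notify() to wake the waiting thread
Who is the waiting thread? It is Watchdog itself, as its definition shows: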
private static final class Watchdog extends Thread {
public Watchdog() {
super("Okio Watchdog");
setDaemon(true);
}
public void run() {
while (true) {
try {
AsyncTimeout timedOut = awaitTimeout();
// Didn't find a node to interrupt. Try again.
if (timedOut == null) continue;
// Close the timed out node.
timedOut.timedOut();
} catch (InterruptedException ignored) {
}
}
}
}
And awaitTimeout() looks like this:
private static synchronized AsyncTimeout awaitTimeout() throws InterruptedException {
// Get the next eligible node.
AsyncTimeout node = head.next;
// The queue is empty. Wait for something to be enqueued.
if (node == null) {
AsyncTimeout.class.wait();
return null;
}
long waitNanos = node.remainingNanos(System.nanoTime());
// The head of the queue hasn't timed out yet. Await that.
if (waitNanos > 0) {
// Waiting is made complicated by the fact that we work in nanoseconds,
// but the API wants (millis, nanos) in two arguments.
long waitMillis = waitNanos / 1000000L;
waitNanos -= (waitMillis * 1000000L);
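// Note added in this post: waitNanos is split in two here, e.g. 1000003 ns becomes 1 ms and 3 ns.
// waitNanos / 1000000L and waitNanos % 1000000L would also work; the subtraction form is presumably cheaper.
AsyncTimeout.class.wait(waitMillis, (int) waitNanos);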
return null;
}
// The head of the queue has timed out. Remove it.
head.next = node.next;
node.next = null;
return node;
}
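Putting the two methods together: the Watchdog thread runs an infinite loop. On each iteration it looks at the first node in the list (head.next; head itself is only a sentinel). If that node has not timed out yet, the thread waits; otherwise it unlinks the node from the list and returns it, and since the node has already timed out, its timedOut() method is called right away.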
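The list handling described above can be modelled without threads. The following is a simplified, single-threaded sketch, not the okio implementation: TimeoutQueueSketch, schedule, pollExpired and order are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, single-threaded model of the sorted list; all names are hypothetical.
final class TimeoutQueueSketch {
  static final class Node {
    final String name;
    final long timeoutAt; // absolute time in nanoseconds
    Node next;

    Node(String name, long timeoutAt) {
      this.name = name;
      this.timeoutAt = timeoutAt;
    }
  }

  private final Node head = new Node("head", 0); // sentinel, never times out

  /** Inserts a node so the list stays ordered by remaining time, shortest first. */
  void schedule(String name, long now, long timeoutNanos) {
    Node node = new Node(name, now + timeoutNanos);
    for (Node prev = head; ; prev = prev.next) {
      if (prev.next == null || node.timeoutAt - now < prev.next.timeoutAt - now) {
        node.next = prev.next;
        prev.next = node;
        return;
      }
    }
  }

  /** Unlinks and returns the first node's name if it has expired at `now`, else null. */
  String pollExpired(long now) {
    Node first = head.next;
    if (first == null || first.timeoutAt - now > 0) return null;
    head.next = first.next;
    first.next = null;
    return first.name;
  }

  /** Names in list order, for inspection. */
  List<String> order() {
    List<String> names = new ArrayList<>();
    for (Node n = head.next; n != null; n = n.next) names.add(n.name);
    return names;
  }
}
```

Because only the first node is ever checked, each watchdog wake-up costs constant time; the price is a linear scan on insertion.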
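3.3 System.nanoTime()
One thing to note here is the difference between System.nanoTime() and System.currentTimeMillis():
- System.nanoTime() returns nanoseconds counted from an arbitrary origin. The value may even be negative, because the origin may lie in the future. It is therefore not meant for absolute time but for measuring intervals, such as how long a piece of code runs, how long it takes to obtain a database connection, or how long a network call takes. Also, nanoTime offers nanosecond precision, but the values are not necessarily accurate to the nanosecond.
- System.currentTimeMillis() returns milliseconds since 00:00 UTC on January 1, 1970. new Date() is equivalent to new Date(System.currentTimeMillis()), and the Date(long date) constructor takes exactly such a millisecond count since January 1, 1970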
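Using System.nanoTime() to measure intervals is therefore a good choice in Okio: it is precise enough and is unaffected by changes to the system clock. With System.currentTimeMillis(), a clock change during the wait could make the timeout fire early or late, which would clearly be unreliable.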
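A small sketch of interval measurement with nanoTime-style values follows. ElapsedTimeSketch and hasTimedOut are hypothetical names. The point it illustrates is that two readings should be compared by subtracting them, which stays correct even if the counter wraps around, in line with the remark about relative values in the scheduleTimeout() comments.

```java
// Hypothetical helper illustrating interval checks on nanoTime-style counters.
final class ElapsedTimeSketch {
  /** True once at least timeoutNanos have passed between startNanos and nowNanos. */
  static boolean hasTimedOut(long startNanos, long nowNanos, long timeoutNanos) {
    // Compare the difference, never the absolute values: the subtraction
    // remains correct even if the counter wrapped past Long.MAX_VALUE.
    return nowNanos - startNanos >= timeoutNanos;
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    Thread.sleep(50);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
    System.out.println("elapsed ms: " + elapsedMs);
  }
}
```

A check written as `nowNanos >= startNanos + timeoutNanos` would give the wrong answer near the wrap-around point, because the sum can overflow.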
3.4 Summary of okhttp timeouts
Going back to the code at the start of section 3.1: the timeout() method it calls is defined in the Timeout class:
public Timeout timeout(long timeout, TimeUnit unit) {
if (timeout < 0) throw new IllegalArgumentException("timeout < 0: " + timeout);
if (unit == null) throw new IllegalArgumentException("unit == null");
this.timeoutNanos = unit.toNanos(timeout);
return this;
}
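This simply converts the given duration to nanoseconds; timeoutNanos is later used in scheduleTimeout().
Combining the previous three subsections, we can conclude:
- Timeouts on Source and Sink objects are implemented by AsyncTimeout, a subclass of Timeout
- All AsyncTimeout objects form a linked list
- Each AsyncTimeout is inserted at the position that matches its remaining time
- A daemon thread named Watchdog maintains the list. If the first node has not timed out yet, the thread waits; otherwise it removes the node and calls its timedOut() method, which performs the cleanup, for example socket.close()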
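So far the timeout mechanism in okhttp and okio looks reliable and accurate, with no bug in sight. We have to look elsewhere.
4. The default parameters turn out to be the culprit
Since okhttp's timeout mechanism is fine, let's start from the business code that calls okhttp. It calls Call.enqueue() from Retrofit, so we start there.
Readers of the Retrofit source analysis on my blog will know that this Call is actually an OkHttpCall, a class created to bridge Retrofit and okhttp. Its enqueue() method: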
@Override public void enqueue(final Callback<T> callback) {
if (callback == null) throw new NullPointerException("callback == null");
okhttp3.Call call;
Throwable failure;
synchronized (this) {
if (executed) throw new IllegalStateException("Already executed.");
executed = true;
call = rawCall;
failure = creationFailure;
if (call == null && failure == null) {
try {
call = rawCall = createRawCall();
} catch (Throwable t) {
failure = creationFailure = t;
}
}
}
if (failure != null) {
callback.onFailure(this, failure);
return;
}
if (canceled) {
call.cancel();
}
call.enqueue(new okhttp3.Callback() {
@Override public void onResponse(okhttp3.Call call, okhttp3.Response rawResponse)
throws IOException {
Response<T> response;
try {
response = parseResponse(rawResponse);
} catch (Throwable e) {
callFailure(e);
return;
}
callSuccess(response);
}
@Override public void onFailure(okhttp3.Call call, IOException e) {
try {
callback.onFailure(OkHttpCall.this, e);
} catch (Throwable t) {
t.printStackTrace();
}
}
private void callFailure(Throwable e) {
try {
callback.onFailure(OkHttpCall.this, e);
} catch (Throwable t) {
t.printStackTrace();
}
}
private void callSuccess(Response<T> response) {
try {
callback.onResponse(OkHttpCall.this, response);
} catch (Throwable t) {
t.printStackTrace();
}
}
});
}
The main job of this method is to call okhttp3.Call's enqueue() and translate okhttp3.Call's callbacks into Retrofit callbacks. The call here is actually an okhttp3.RealCall (createRawCall() in OkHttpCall calls serviceMethod.callFactory.newCall(); callFactory is the OkHttpClient, and OkHttpClient's newCall() returns a RealCall). RealCall's enqueue() method:
@Override public void enqueue(Callback responseCallback) {
synchronized (this) {
if (executed) throw new IllegalStateException("Already Executed");
executed = true;
}
captureCallStackTrace();
client.dispatcher().enqueue(new AsyncCall(responseCallback));
}
This method creates an AsyncCall and hands it to the dispatcher (Dispatcher), whose enqueue() is:
synchronized void enqueue(AsyncCall call) {
if (runningAsyncCalls.size() < maxRequests && runningCallsForHost(call) < maxRequestsPerHost) {
runningAsyncCalls.add(call);
executorService().execute(call);
} else {
readyAsyncCalls.add(call);
}
}
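This method matters a great deal, because it is where the risk of users waiting longer than the timeout hides. Note the two conditions:
- First, the number of running requests must be below maxRequests; otherwise the call goes into the waiting queue. maxRequests defaults to 64
- Second, runningCallsForHost(call) must be below maxRequestsPerHost: the number of running requests to the same host as this call must be below maxRequestsPerHost, otherwise the call goes into the waiting queue first. maxRequestsPerHost defaults to a very small value, 5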
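The two admission conditions can be modelled in a few lines. This is a simplified sketch, not the okhttp Dispatcher: DispatcherSketch is a hypothetical class that tracks only host names, and the host names used with it are placeholders.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of the admission check; not the real okhttp Dispatcher.
final class DispatcherSketch {
  private final int maxRequests;
  private final int maxRequestsPerHost;
  private final List<String> running = new ArrayList<>(); // hosts of running calls
  private final Deque<String> ready = new ArrayDeque<>(); // hosts of queued calls

  DispatcherSketch(int maxRequests, int maxRequestsPerHost) {
    this.maxRequests = maxRequests;
    this.maxRequestsPerHost = maxRequestsPerHost;
  }

  /** Returns true if the call starts right away, false if it has to wait in the queue. */
  boolean enqueue(String host) {
    long sameHost = running.stream().filter(host::equals).count();
    if (running.size() < maxRequests && sameHost < maxRequestsPerHost) {
      running.add(host);
      return true;
    }
    ready.add(host);
    return false;
  }

  /** Number of calls currently waiting for a slot. */
  int queued() {
    return ready.size();
  }
}
```

With the defaults of 64 and 5, the sixth concurrent call to one host is queued even though the global limit is nowhere near reached, while a call to a different host still starts immediately.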
Now look at how the dispatcher creates its thread pool:
public synchronized ExecutorService executorService() {
if (executorService == null) {
executorService = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp Dispatcher", false));
}
return executorService;
}
The dispatcher's thread pool is effectively unbounded, and under normal conditions the default maxRequests of 64 is enough.
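That a pool configured this way never queues work can be checked with the JDK alone. In the sketch below, UnboundedPoolDemo and threadsCreatedFor are hypothetical names; the pool uses the same core size, maximum size, keep-alive and SynchronousQueue as the dispatcher code above, but the default thread factory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of a pool configured like the dispatcher's executor.
final class UnboundedPoolDemo {
  /** Submits `tasks` blocking tasks and returns how many threads the pool is running. */
  static int threadsCreatedFor(int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            release.await(); // keep the worker busy until the demo is done
          } catch (InterruptedException ignored) {
            // exit the task
          }
        });
      }
      // A SynchronousQueue holds no tasks, so every busy worker forces a new thread.
      return pool.getPoolSize();
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }
}
```

The limit on concurrency therefore comes from the dispatcher's maxRequests and maxRequestsPerHost checks, not from the executor.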
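But! There is always a but!
What if the network is weak, requests are dense, and the timeout is set fairly large?
Then the following can happen:
- The number of running requests exceeds maxRequests within a short time (to take an extreme case, within 3s). Requests issued after those 3s have to enter the waiting queue first. If the network is bad enough that every connection is only closed when its timeout fires, such a request waits at least timeout-3s in the queue, plus its own timeout on top. The user's total wait is timeout-3s+timeout, which is clearly far more than timeout (57s for a 30s timeout)
- Even if requests are not dense overall, a burst of requests to the same host within a short window (again, say 3s) is enough: requests to that host after those 3s also enter the waiting queue, and the user again waits at least timeout-3s+timeout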
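The arithmetic in the two cases above can be checked with a small simulation. It is a simplified model rather than okhttp code: QueueWaitSketch and userWaits are hypothetical names, all requests go to one host, and every request is assumed to run for exactly the full timeout before failing.

```java
import java.util.PriorityQueue;

// Hypothetical simulation of queueing behind a fixed number of slots.
final class QueueWaitSketch {
  /**
   * arrivals: request times in seconds, in non-decreasing order, all for one host.
   * Every request is assumed to occupy a slot for exactly timeoutSec.
   * Returns, per request, the seconds between its arrival and its failure.
   */
  static long[] userWaits(long[] arrivals, int slots, long timeoutSec) {
    PriorityQueue<Long> slotFreeAt = new PriorityQueue<>();
    for (int i = 0; i < slots; i++) slotFreeAt.add(0L);
    long[] waits = new long[arrivals.length];
    for (int i = 0; i < arrivals.length; i++) {
      // A request starts when it arrives or when the earliest slot frees up.
      long start = Math.max(arrivals[i], slotFreeAt.poll());
      long end = start + timeoutSec;
      slotFreeAt.add(end);
      waits[i] = end - arrivals[i];
    }
    return waits;
  }

  public static void main(String[] args) {
    long[] waits = userWaits(new long[] {0, 0, 0, 0, 0, 3}, 5, 30);
    System.out.println("sixth request waits " + waits[5] + "s");
  }
}
```

With 5 slots and a 30s timeout, a request issued at 3s behind five running ones waits 57s, matching timeout-3s+timeout. Under the same assumptions an eleventh request would wait 87s, which is in the range of the 90s figures seen in the tracking data, although the original analysis does not confirm that this is what happened.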
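The business initialization code does not customize maxRequestsPerHost on the Dispatcher, so at most 5 requests per host can run at the same time. The host serving the request under investigation handles many requests, so this is very likely the cause. The business developers also made a basic mistake here: when the user dismisses the loading dialog, the corresponding request is not cancelled, which wastes request slots.
To verify this conclusion, I went through the logs of more than 10 users whose waits far exceeded the timeout. In every case the problem occurred when their network had switched to 2G or dropped entirely and requests were dense within some time window, which supports the conclusion.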
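5. Fix and recommendations for using okhttp
With the cause identified, the fix is simple, and it doubles as advice for using okhttp:
- When initializing okhttp, set maxRequests and maxRequestsPerHost on the Dispatcher higher than the defaults
- When the user dismisses the loading dialog, cancel the corresponding request promptly
- Keep timeouts fairly small (for example 10s); this reduces the load on the phone on weak networks and also improves the user experience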
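A configuration along these lines might look like the sketch below. It assumes okhttp3 is on the classpath; HttpClientFactory is a hypothetical helper, and the limits of 128 and 16 are illustrative values, not numbers from the original investigation.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

// Hypothetical factory; the concrete limits are illustrative assumptions.
final class HttpClientFactory {
  static OkHttpClient create() {
    Dispatcher dispatcher = new Dispatcher();
    dispatcher.setMaxRequests(128);       // default is 64
    dispatcher.setMaxRequestsPerHost(16); // default is 5
    return new OkHttpClient.Builder()
        .dispatcher(dispatcher)
        .connectTimeout(10, TimeUnit.SECONDS)
        .readTimeout(10, TimeUnit.SECONDS)
        .writeTimeout(10, TimeUnit.SECONDS)
        .build();
  }
}
```

For the second recommendation, keeping a reference to the Retrofit Call returned for the request and calling its cancel() method when the loading dialog is dismissed should free the dispatcher slot instead of letting the request run until it times out.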