The timeout you set is not necessarily the timeout your users experience [repost]

Reposted from https://zhuanlan.zhihu.com/p/31640388

Seasoninthesun

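Introduction

Recently I have been helping a business team work through some hard-to-diagnose problems. In one of them, users reported that a particular operation occasionally kept loading for a very long time, as if the timeout had no effect, even though the developers had clearly set the network request timeout to 30s. Why? Is there a bug in okhttp, or are users doing something wrong?

After three days of digging, I finally found the root cause.

Keywords: okio, timeout mechanism, weak network, key parameters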

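1. Confirming the problem

The user feedback collected by the product manager was vague, so to pin the problem down we needed data. I looked at the tracking data for this request and found that several dozen users had indeed spent more than 30s on it, some as long as 90s. That is a very poor experience.

Could the developers have set the timeouts incorrectly when initializing OkHttpClient? I checked the initialization code: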

OkHttpClient.Builder httpClientBuilder = new OkHttpClient.Builder()
                .readTimeout(30, TimeUnit.SECONDS)
                .connectTimeout(30, TimeUnit.SECONDS)
                .writeTimeout(30, TimeUnit.SECONDS)
                .addInterceptor(new HeaderInterceptor());

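All three timeouts are set to 30s, so there is nothing wrong here. That leaves either a bug in okhttp or a mistake in how we use it.

2. How okhttp uses the timeouts in its source

When are the timeouts that were set on the OkHttpClient actually used?

readTimeout, connectTimeout and writeTimeout are used in two places: StreamAllocation and Http2Codec. Our request goes over HTTP/1.1, so Http2Codec can be ignored.

2.1 Parameter passing

In StreamAllocation.newStream(), the timeouts are used as follows: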

public HttpCodec newStream(OkHttpClient client, boolean doExtensiveHealthChecks) {
   int connectTimeout = client.connectTimeoutMillis();
   int readTimeout = client.readTimeoutMillis();
   int writeTimeout = client.writeTimeoutMillis();
   boolean connectionRetryEnabled = client.retryOnConnectionFailure();
   try {
     RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
         writeTimeout, connectionRetryEnabled, doExtensiveHealthChecks);
     HttpCodec resultCodec;
     if (resultConnection.http2Connection != null) {
       resultCodec = new Http2Codec(client, this, resultConnection.http2Connection);
     } else {
       resultConnection.socket().setSoTimeout(readTimeout);
       resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
       resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
       resultCodec = new Http1Codec(
           client, this, resultConnection.source, resultConnection.sink);
     }
     synchronized (connectionPool) {
       codec = resultCodec;
       return resultCodec;
     }
   } catch (IOException e) {
     throw new RouteException(e);
   }
 }

All three timeouts are passed on to connection setup. First, findHealthyConnection():

/**
 * Finds a connection and returns it if it is healthy. If it is unhealthy the process is repeated
 * until a healthy connection is found.
 */
private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
    int writeTimeout, boolean connectionRetryEnabled, boolean doExtensiveHealthChecks)
    throws IOException {
  while (true) {
    RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
        connectionRetryEnabled);
    // If this is a brand new connection, we can skip the extensive health checks.
    synchronized (connectionPool) {
      if (candidate.successCount == 0) {
        return candidate;
      }
    }
    // Do a (potentially slow) check to confirm that the pooled connection is still good. If it
    // isn't, take it out of the pool and start again.
    if (!candidate.isHealthy(doExtensiveHealthChecks)) {
      noNewStreams();
      continue;
    }
    return candidate;
  }
}

This method simply calls findConnection() in a loop until it gets a healthy connection. findConnection() looks like this:

/**
  * Returns a connection to host a new stream. This prefers the existing connection if it exists,
  * then the pool, finally building a new connection.
  */
 private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
     boolean connectionRetryEnabled) throws IOException {
   Route selectedRoute;
   synchronized (connectionPool) {
     if (released) throw new IllegalStateException("released");
     if (codec != null) throw new IllegalStateException("codec != null");
     if (canceled) throw new IOException("Canceled");
     RealConnection allocatedConnection = this.connection;
     if (allocatedConnection != null && !allocatedConnection.noNewStreams) {
       return allocatedConnection;
     }
     // Attempt to get a connection from the pool.
     RealConnection pooledConnection = Internal.instance.get(connectionPool, address, this);
     if (pooledConnection != null) {
       this.connection = pooledConnection;
       return pooledConnection;
     }
     selectedRoute = route;
   }
   if (selectedRoute == null) {
     selectedRoute = routeSelector.next();
     synchronized (connectionPool) {
       route = selectedRoute;
       refusedStreamCount = 0;
     }
   }
   RealConnection newConnection = new RealConnection(selectedRoute);
   synchronized (connectionPool) {
     acquire(newConnection);
     Internal.instance.put(connectionPool, newConnection);
     this.connection = newConnection;
     if (canceled) throw new IOException("Canceled");
   }
   newConnection.connect(connectTimeout, readTimeout, writeTimeout, address.connectionSpecs(),
       connectionRetryEnabled);
   routeDatabase().connected(newConnection.route());
   return newConnection;
 }

The three timeouts are only used in the call to RealConnection.connect():

public void connect(int connectTimeout, int readTimeout, int writeTimeout,
      List<ConnectionSpec> connectionSpecs, boolean connectionRetryEnabled) {
    if (protocol != null) throw new IllegalStateException("already connected");
    RouteException routeException = null;
    ConnectionSpecSelector connectionSpecSelector = new ConnectionSpecSelector(connectionSpecs);
    if (route.address().sslSocketFactory() == null) {
      if (!connectionSpecs.contains(ConnectionSpec.CLEARTEXT)) {
        throw new RouteException(new UnknownServiceException(
            "CLEARTEXT communication not enabled for client"));
      }
      String host = route.address().url().host();
      if (!Platform.get().isCleartextTrafficPermitted(host)) {
        throw new RouteException(new UnknownServiceException(
            "CLEARTEXT communication to " + host + " not permitted by network security policy"));
      }
    }
    while (protocol == null) {
      try {
        if (route.requiresTunnel()) {
          buildTunneledConnection(connectTimeout, readTimeout, writeTimeout,
              connectionSpecSelector);
        } else {
          buildConnection(connectTimeout, readTimeout, writeTimeout, connectionSpecSelector);
        }
      } catch (IOException e) {
        closeQuietly(socket);
        closeQuietly(rawSocket);
        socket = null;
        rawSocket = null;
        source = null;
        sink = null;
        handshake = null;
        protocol = null;
        if (routeException == null) {
          routeException = new RouteException(e);
        } else {
          routeException.addConnectException(e);
        }
        if (!connectionRetryEnabled || !connectionSpecSelector.connectionFailed(e)) {
          throw routeException;
        }
      }
    }
  }

When no proxy tunnel is required (route.requiresTunnel() returns false), buildConnection() is called:

/** Does all the work necessary to build a full HTTP or HTTPS connection on a raw socket. */
  private void buildConnection(int connectTimeout, int readTimeout, int writeTimeout,
      ConnectionSpecSelector connectionSpecSelector) throws IOException {
    connectSocket(connectTimeout, readTimeout);
    establishProtocol(readTimeout, writeTimeout, connectionSpecSelector);
  }

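Here the paths split: connectTimeout and readTimeout are used for the socket connection, while readTimeout and writeTimeout are passed on to establishProtocol(), which handles the TLS and HTTP/2 setup.

2.2 connectSocket()

First, connectSocket():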

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
    Proxy proxy = route.proxy();
    Address address = route.address();
    rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
        ? address.socketFactory().createSocket()
        : new Socket(proxy);
    rawSocket.setSoTimeout(readTimeout);
    try {
      Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
    } catch (ConnectException e) {
      ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
      ce.initCause(e);
      throw ce;
    }
    source = Okio.buffer(Okio.source(rawSocket));
    sink = Okio.buffer(Okio.sink(rawSocket));
  }

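Two things to note:

  • readTimeout ends up in rawSocket.setSoTimeout(). setSoTimeout() limits how long a read() on the socket's InputStream may block once the connection is established, which is why readTimeout is used here.
  • connectTimeout is applied in a platform-specific way. On Android it ends up in AndroidPlatform.connectSocket():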
@Override public void connectSocket(Socket socket, InetSocketAddress address,
     int connectTimeout) throws IOException {
   try {
     socket.connect(address, connectTimeout);
   } catch (AssertionError e) {
     if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
     throw e;
   } catch (SecurityException e) {
     // Before android 4.3, socket.connect could throw a SecurityException
     // if opening a socket resulted in an EACCES error.
     IOException ioException = new IOException("Exception in connect");
     ioException.initCause(e);
     throw ioException;
   }
 }

This sets the socket's connect timeout, hence connectTimeout.
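
To make the difference between the two socket-level timeouts concrete, here is a minimal JDK-only sketch. It is not okhttp code: it assumes a local server that completes the TCP connection but never sends a byte, and the class and method names are made up for illustration. The connect timeout bounds socket.connect(), while the value passed to setSoTimeout() bounds each blocking read().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Illustration only: connect timeout vs. SO_TIMEOUT on a plain java.net.Socket. */
final class SocketTimeoutDemo {

    /** Returns true if read() timed out while talking to a server that never sends data. */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        // The server socket listens on an ephemeral loopback port and stays silent.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            socket.setSoTimeout(readTimeoutMs);                                // bounds each read()
            socket.connect(server.getLocalSocketAddress(), connectTimeoutMs);  // bounds the connect
            try {
                socket.getInputStream().read(); // blocks: no data ever arrives
                return false;
            } catch (SocketTimeoutException e) {
                return true;                    // SO_TIMEOUT fired
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

Note that SO_TIMEOUT restarts for every read() call, so it limits the gap between chunks of data rather than the duration of a whole response.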

2.3 establishProtocol()

Back in RealConnection.buildConnection(): after connectSocket() returns, establishProtocol() is called:

private void establishProtocol(int readTimeout, int writeTimeout,
     ConnectionSpecSelector connectionSpecSelector) throws IOException {
   if (route.address().sslSocketFactory() != null) {
     connectTls(readTimeout, writeTimeout, connectionSpecSelector);
   } else {
     protocol = Protocol.HTTP_1_1;
     socket = rawSocket;
   }
   if (protocol == Protocol.HTTP_2) {
     socket.setSoTimeout(0); // Framed connection timeouts are set per-stream.
     Http2Connection http2Connection = new Http2Connection.Builder(true)
         .socket(socket, route.address().url().host(), source, sink)
         .listener(this)
         .build();
     http2Connection.start();
     // Only assign the framed connection once the preface has been sent successfully.
     this.allocationLimit = http2Connection.maxConcurrentStreams();
     this.http2Connection = http2Connection;
   } else {
     this.allocationLimit = 1;
   }
 }

For an https connection, connectTls() is called:

private void connectTls(int readTimeout, int writeTimeout,
      ConnectionSpecSelector connectionSpecSelector) throws IOException {
    Address address = route.address();
    SSLSocketFactory sslSocketFactory = address.sslSocketFactory();
    boolean success = false;
    SSLSocket sslSocket = null;
    try {
      // Create the wrapper over the connected socket.
      sslSocket = (SSLSocket) sslSocketFactory.createSocket(
          rawSocket, address.url().host(), address.url().port(), true /* autoClose */);
      // Configure the socket's ciphers, TLS versions, and extensions.
      ConnectionSpec connectionSpec = connectionSpecSelector.configureSecureSocket(sslSocket);
      if (connectionSpec.supportsTlsExtensions()) {
        Platform.get().configureTlsExtensions(
            sslSocket, address.url().host(), address.protocols());
      }
      // Force handshake. This can throw!
      sslSocket.startHandshake();
      Handshake unverifiedHandshake = Handshake.get(sslSocket.getSession());
      // Verify that the socket's certificates are acceptable for the target host.
      if (!address.hostnameVerifier().verify(address.url().host(), sslSocket.getSession())) {
        X509Certificate cert = (X509Certificate) unverifiedHandshake.peerCertificates().get(0);
        throw new SSLPeerUnverifiedException("Hostname " + address.url().host() + " not verified:"
            + "\n    certificate: " + CertificatePinner.pin(cert)
            + "\n    DN: " + cert.getSubjectDN().getName()
            + "\n    subjectAltNames: " + OkHostnameVerifier.allSubjectAltNames(cert));
      }
      // Check that the certificate pinner is satisfied by the certificates presented.
      address.certificatePinner().check(address.url().host(),
          unverifiedHandshake.peerCertificates());
      // Success! Save the handshake and the ALPN protocol.
      String maybeProtocol = connectionSpec.supportsTlsExtensions()
          ? Platform.get().getSelectedProtocol(sslSocket)
          : null;
      socket = sslSocket;
      source = Okio.buffer(Okio.source(socket));
      sink = Okio.buffer(Okio.sink(socket));
      handshake = unverifiedHandshake;
      protocol = maybeProtocol != null
          ? Protocol.get(maybeProtocol)
          : Protocol.HTTP_1_1;
      success = true;
    } catch (AssertionError e) {
      if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
      throw e;
    } finally {
      if (sslSocket != null) {
        Platform.get().afterHandshake(sslSocket);
      }
      if (!success) {
        closeQuietly(sslSocket);
      }
    }
  }

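This call performs the handshake and certificate verification, and at the end the socket field is actually an SSLSocket. Note that readTimeout and writeTimeout are not used here at all, so there was no need to pass them in.

3. Timeout settings on socket, source and sink

3.1 The main flow

Back in StreamAllocation.newStream(): since we are on HTTP/1.1, the findHealthyConnection() call only really used readTimeout and connectTimeout. writeTimeout was not used.

Next, the following code runs: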

resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
    client, this, resultConnection.source, resultConnection.sink);

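1) As we just saw, readTimeout was set on rawSocket (a java.net.Socket) in connectSocket(), and connectTimeout was applied to it in AndroidPlatform. resultConnection.socket(), however, returns the socket field rather than rawSocket. For https connections the two differ, because socket is the SSLSocket, so this setSoTimeout() call does not duplicate the earlier one.

2) Where is source created? We have already seen it, in RealConnection.connectSocket():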

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
    Proxy proxy = route.proxy();
    Address address = route.address();
    rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
        ? address.socketFactory().createSocket()
        : new Socket(proxy);
    rawSocket.setSoTimeout(readTimeout);
    try {
      Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
    } catch (ConnectException e) {
      ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
      ce.initCause(e);
      throw ce;
    }
    source = Okio.buffer(Okio.source(rawSocket));
    sink = Okio.buffer(Okio.sink(rawSocket));
  }

source takes rawSocket's input stream and wraps it with Okio.buffer(); sink does the same with rawSocket's output stream. Here is Okio.source():

public static Source source(Socket socket) throws IOException {
       if(socket == null) {
           throw new IllegalArgumentException("socket == null");
       } else {
           AsyncTimeout timeout = timeout(socket);
           Source source = source((InputStream)socket.getInputStream(), (Timeout)timeout);
           return timeout.source(source);
       }
   }

So an AsyncTimeout object is created here and used to implement the timeout. How exactly? See the next subsection.

3.2 How AsyncTimeout works

The timeout() method in Okio that source() relies on:

private static AsyncTimeout timeout(final Socket socket) {
     return new AsyncTimeout() {
         protected IOException newTimeoutException(IOException cause) {
             InterruptedIOException ioe = new SocketTimeoutException("timeout");
             if(cause != null) {
                 ioe.initCause(cause);
             }
             return ioe;
         }
         protected void timedOut() {
             try {
                 socket.close();
             } catch (Exception var2) {
                 Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var2);
             } catch (AssertionError var3) {
                 if(!Okio.isAndroidGetsocknameError(var3)) {
                     throw var3;
                 }
                 Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var3);
             }
         }
     };
 }

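This creates an AsyncTimeout that overrides newTimeoutException() and timedOut(), both defined in AsyncTimeout. The former supplies the exception to throw on timeout (the default is an InterruptedIOException); the latter is a callback invoked when the timeout fires, which here closes the socket.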
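
Why close the socket in timedOut()? Closing a socket from another thread is the standard way to break a thread out of a blocking read(). The following JDK-only sketch shows just that effect. It is not Okio code; the class and method names are hypothetical, and the "closer" thread stands in for the Watchdog.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

/** Illustration only: closing a socket from another thread unblocks a pending read(). */
final class CloseToUnblockDemo {

    /** Returns true if the blocked read() was aborted by the close from the other thread. */
    static boolean closeUnblocksRead(long closeAfterMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            socket.connect(server.getLocalSocketAddress(), 1000);

            // Stand-in for the watchdog: after a delay, close the socket.
            Thread closer = new Thread(() -> {
                try {
                    Thread.sleep(closeAfterMillis);
                    socket.close();
                } catch (InterruptedException | IOException ignored) {
                    // Nothing to clean up in this sketch.
                }
            });
            closer.setDaemon(true);
            closer.start();

            try {
                socket.getInputStream().read(); // no SO_TIMEOUT set: would block forever
                return false;
            } catch (SocketException e) {
                return true;                    // the close aborted the read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read aborted by close: " + closeUnblocksRead(100));
    }
}
```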

So how does AsyncTimeout implement the timeout? Could the bug be in here?

The call chain is Sink.write()/Source.read() -> AsyncTimeout.enter() -> AsyncTimeout.scheduleTimeout(). scheduleTimeout() is the key method:

private static synchronized void scheduleTimeout(
      AsyncTimeout node, long timeoutNanos, boolean hasDeadline) {
    // Start the watchdog thread and create the head node when the first timeout is scheduled.
    if (head == null) {
      head = new AsyncTimeout();
      new Watchdog().start();
    }
    long now = System.nanoTime();
    if (timeoutNanos != 0 && hasDeadline) {
      // Compute the earliest event; either timeout or deadline. Because nanoTime can wrap around,
      // Math.min() is undefined for absolute values, but meaningful for relative ones.
      node.timeoutAt = now + Math.min(timeoutNanos, node.deadlineNanoTime() - now);
    } else if (timeoutNanos != 0) {
      node.timeoutAt = now + timeoutNanos;
    } else if (hasDeadline) {
      node.timeoutAt = node.deadlineNanoTime();
    } else {
      throw new AssertionError();
    }
    // Insert the node in sorted order. The sorting happens here.
    long remainingNanos = node.remainingNanos(now);
    for (AsyncTimeout prev = head; true; prev = prev.next) {
      if (prev.next == null || remainingNanos < prev.next.remainingNanos(now)) {
        node.next = prev.next;
        prev.next = node;
        if (prev == head) {
          AsyncTimeout.class.notify(); // Wake up the watchdog when inserting at the front.
        }
        break;
      }
    }
  }

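This method does three things:

  • When the first timeout is scheduled, it starts the Watchdog thread
  • It keeps all AsyncTimeout objects in a linked list, sorted by remaining time from shortest to longest
  • It calls notify() to wake the waiting thread when the new node is inserted at the front of the list

Who is the waiting thread? It is the Watchdog, as its definition shows: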

private static final class Watchdog extends Thread {
   public Watchdog() {
     super("Okio Watchdog");
     setDaemon(true);
   }
   public void run() {
     while (true) {
       try {
         AsyncTimeout timedOut = awaitTimeout();
         // Didn't find a node to interrupt. Try again.
         if (timedOut == null) continue;
         // Close the timed out node.
         timedOut.timedOut();
       } catch (InterruptedException ignored) {
       }
     }
   }
 }

And awaitTimeout():

private static synchronized AsyncTimeout awaitTimeout() throws InterruptedException {
    // Get the next eligible node.
    AsyncTimeout node = head.next;
    // The queue is empty. Wait for something to be enqueued.
    if (node == null) {
      AsyncTimeout.class.wait();
      return null;
    }
    long waitNanos = node.remainingNanos(System.nanoTime());
    // The head of the queue hasn't timed out yet. Await that.
    if (waitNanos > 0) {
      // Waiting is made complicated by the fact that we work in nanoseconds,
      // but the API wants (millis, nanos) in two arguments.
      long waitMillis = waitNanos / 1000000L;
      waitNanos -= (waitMillis * 1000000L);
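      // Split waitNanos into (millis, nanos): for example 1000003 ns becomes 1 ms and 3 ns.
      // waitNanos / 1000000L and waitNanos % 1000000L would give the same result;
      // the subtraction form just avoids a second division.
      AsyncTimeout.class.wait(waitMillis, (int) waitNanos);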
      return null;
    }
    // The head of the queue has timed out. Remove it.
    head.next = node.next;
    node.next = null;
    return node;
  }

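Putting the two together: the Watchdog thread runs an infinite loop. On each iteration it looks at the first node in the list (head.next, since head is only a sentinel) and checks whether it has timed out. If not, it waits and returns null so that the loop tries again. If so, it unlinks that node and returns it, and since the node has timed out the Watchdog calls its timedOut() method directly.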
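
The interplay of scheduleTimeout(), awaitTimeout() and the Watchdog can be sketched in a much smaller form. The class below is my own simplified illustration, not Okio's implementation: MiniTimeoutQueue, its Node type and firingOrder() are hypothetical names, it is per-instance rather than static, and it leaves out deadlines and cancellation. It keeps the same ideas, though: a sorted singly linked list behind a sentinel head, and one daemon thread that sleeps until the first node is due.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Simplified illustration of a sorted timeout list served by one watchdog thread. */
final class MiniTimeoutQueue {
    private static final class Node {
        final long timeoutAt;     // deadline on the System.nanoTime() scale
        final Runnable onTimeout; // plays the role of timedOut()
        Node next;

        Node(long timeoutAt, Runnable onTimeout) {
            this.timeoutAt = timeoutAt;
            this.onTimeout = onTimeout;
        }
    }

    private final Node head = new Node(0L, null); // sentinel; the first real node is head.next
    private boolean watchdogStarted;

    synchronized void schedule(long timeoutMillis, Runnable onTimeout) {
        if (!watchdogStarted) { // start the watchdog when the first timeout is scheduled
            Thread watchdog = new Thread(this::runWatchdog, "Mini Watchdog");
            watchdog.setDaemon(true);
            watchdog.start();
            watchdogStarted = true;
        }
        long timeoutAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Node node = new Node(timeoutAt, onTimeout);
        // Insert in sorted order, earliest deadline first.
        for (Node prev = head; ; prev = prev.next) {
            if (prev.next == null || node.timeoutAt - prev.next.timeoutAt < 0) {
                node.next = prev.next;
                prev.next = node;
                if (prev == head) notifyAll(); // new earliest deadline: wake the watchdog
                break;
            }
        }
    }

    private synchronized Node awaitTimeout() throws InterruptedException {
        Node node = head.next;
        if (node == null) {
            wait(); // empty list: sleep until something is scheduled
            return null;
        }
        long waitNanos = node.timeoutAt - System.nanoTime();
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.timedWait(this, waitNanos); // first node is not due yet
            return null;
        }
        head.next = node.next; // first node is due: unlink and return it
        node.next = null;
        return node;
    }

    private void runWatchdog() {
        while (true) {
            try {
                Node timedOut = awaitTimeout();
                if (timedOut != null) timedOut.onTimeout.run();
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    /** Schedules the given timeouts and returns them in the order in which they fired. */
    static List<Long> firingOrder(long... timeoutsMillis) {
        MiniTimeoutQueue queue = new MiniTimeoutQueue();
        List<Long> fired = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(timeoutsMillis.length);
        for (long t : timeoutsMillis) {
            queue.schedule(t, () -> { fired.add(t); done.countDown(); });
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) throw new IllegalStateException("timeouts did not fire");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return fired;
    }
}
```

Calling MiniTimeoutQueue.firingOrder(300, 100, 200) schedules the timeouts out of order; the callbacks still fire shortest first, because the list is sorted at insertion time and the watchdog only ever looks at the first node.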

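3.3 System.nanoTime()

Note the difference between System.nanoTime() and System.currentTimeMillis():

  • System.nanoTime() returns nanoseconds measured from an arbitrary origin. The value may even be negative, because the origin may lie in the future. nanoTime is therefore not for wall-clock time but for measuring intervals: how long a piece of code runs, how long it takes to get a database connection, how long a network call takes, and so on. It offers nanosecond precision, but the values are not necessarily accurate to the nanosecond.
  • System.currentTimeMillis() returns the number of milliseconds since 00:00 on January 1, 1970 (UTC). new Date() is equivalent to new Date(System.currentTimeMillis()), since Date also has a Date(long date) constructor that takes the number of milliseconds since that epoch.

So using System.nanoTime() to measure intervals is a good choice in Okio: it is precise enough and is not affected by the system clock. With System.currentTimeMillis(), a change to the system time during the wait could make the timeout fire early or late, which would clearly be unreliable.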
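
Because nanoTime() values can wrap around, deadlines on that scale should be compared through their difference, never directly. That is what the comment in scheduleTimeout() about Math.min() being "meaningful for relative values" refers to. A small sketch with hypothetical helper names shows the difference:

```java
/** Illustration only: comparing System.nanoTime()-style deadlines safely. */
final class NanoDeadline {

    /** Overflow-safe: looks at the sign of the difference, as the nanoTime() Javadoc recommends. */
    static boolean hasExpired(long deadlineNanos, long nowNanos) {
        return nowNanos - deadlineNanos >= 0;
    }

    /** Naive absolute comparison: gives wrong answers once the values wrap around. */
    static boolean hasExpiredNaive(long deadlineNanos, long nowNanos) {
        return nowNanos >= deadlineNanos;
    }

    public static void main(String[] args) {
        long start = Long.MAX_VALUE - 5; // pretend the clock is about to wrap
        long deadline = start + 10;      // overflows to a large negative number

        System.out.println(hasExpired(deadline, start));      // false: 10 ns still remain
        System.out.println(hasExpiredNaive(deadline, start)); // true: wrong, fooled by the wrap
        System.out.println(hasExpired(deadline, start + 20)); // true: 10 ns past the deadline
    }
}
```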

3.4 Summary of okhttp timeouts

Back to the start of section 3.1: the timeout() calls there go to the Timeout class:

public Timeout timeout(long timeout, TimeUnit unit) {
   if (timeout < 0) throw new IllegalArgumentException("timeout < 0: " + timeout);
   if (unit == null) throw new IllegalArgumentException("unit == null");
   this.timeoutNanos = unit.toNanos(timeout);
   return this;
 }

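This just converts the given time to nanoseconds; timeoutNanos is later used by scheduleTimeout().

From the three subsections above we can conclude:

  • Timeouts on Source and Sink are implemented by AsyncTimeout, a subclass of Timeout
  • All AsyncTimeout objects form a linked list
  • Each AsyncTimeout is inserted at the position determined by its remaining time
  • A daemon thread called Watchdog maintains the list. If the first node has not timed out yet, it waits; otherwise it removes the node and calls its timedOut() method, which performs the actual work, such as socket.close()

So far the timeout mechanism in okhttp and okio looks reliable and accurate, with no bug in sight. We have to look elsewhere.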

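4. The default parameters are to blame

Since okhttp's timeout mechanism is fine, let's start from the business code that calls okhttp. It calls Retrofit's Call.enqueue(), so we begin there.

If you have read the Retrofit source analysis on my blog, you know that this Call is really an OkHttpCall, a class that bridges Retrofit and okhttp. Its enqueue() method: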

@Override public void enqueue(final Callback<T> callback) {
    if (callback == null) throw new NullPointerException("callback == null");
    okhttp3.Call call;
    Throwable failure;
    synchronized (this) {
      if (executed) throw new IllegalStateException("Already executed.");
      executed = true;
      call = rawCall;
      failure = creationFailure;
      if (call == null && failure == null) {
        try {
          call = rawCall = createRawCall();
        } catch (Throwable t) {
          failure = creationFailure = t;
        }
      }
    }
    if (failure != null) {
      callback.onFailure(this, failure);
      return;
    }
    if (canceled) {
      call.cancel();
    }
    call.enqueue(new okhttp3.Callback() {
      @Override public void onResponse(okhttp3.Call call, okhttp3.Response rawResponse)
          throws IOException {
        Response<T> response;
        try {
          response = parseResponse(rawResponse);
        } catch (Throwable e) {
          callFailure(e);
          return;
        }
        callSuccess(response);
      }
      @Override public void onFailure(okhttp3.Call call, IOException e) {
        try {
          callback.onFailure(OkHttpCall.this, e);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
      private void callFailure(Throwable e) {
        try {
          callback.onFailure(OkHttpCall.this, e);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
      private void callSuccess(Response<T> response) {
        try {
          callback.onResponse(OkHttpCall.this, response);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
    });
  }

The main job of this method is to call okhttp3.Call.enqueue() and translate okhttp3.Call's callbacks into Retrofit callbacks. The call here is an okhttp3.RealCall: OkHttpCall.createRawCall() calls serviceMethod.callFactory.newCall(), callFactory is the OkHttpClient, and OkHttpClient.newCall() returns a RealCall. RealCall.enqueue() is:

@Override public void enqueue(Callback responseCallback) {
   synchronized (this) {
     if (executed) throw new IllegalStateException("Already Executed");
     executed = true;
   }
   captureCallStackTrace();
   client.dispatcher().enqueue(new AsyncCall(responseCallback));
 }

This creates an AsyncCall and hands it to the dispatcher returned by dispatcher():

synchronized void enqueue(AsyncCall call) {
    if (runningAsyncCalls.size() < maxRequests && runningCallsForHost(call) < maxRequestsPerHost) {
      runningAsyncCalls.add(call);
      executorService().execute(call);
    } else {
      readyAsyncCalls.add(call);
    }
  }

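This method matters a great deal, because it is where a user's wait can end up longer than the timeout. Note the two conditions:

  • The number of running requests must be below maxRequests, otherwise the call goes into the ready queue. maxRequests defaults to 64
  • runningCallsForHost(call) must be below maxRequestsPerHost, that is, the number of running requests to this call's host must be below maxRequestsPerHost, otherwise the call also goes into the ready queue. maxRequestsPerHost defaults to just 5

Now the thread pool the dispatcher creates: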

public synchronized ExecutorService executorService() {
    if (executorService == null) {
      executorService = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
          new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp Dispatcher", false));
    }
    return executorService;
  }

The thread pool is effectively unbounded, and in normal conditions the default maxRequests of 64 is also plenty.
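
To see why the pool itself is not the bottleneck, here is a small JDK-only check with the same pool parameters (the class and method names are mine, and it uses the default thread factory instead of okhttp's). With a core size of 0, a maximum of Integer.MAX_VALUE and a SynchronousQueue, a task is never queued: if no idle thread is waiting to take it, a new thread is created.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustration only: a 0..MAX_VALUE pool with a SynchronousQueue grows to one thread per busy task. */
final class PoolGrowthDemo {

    /** Submits taskCount tasks that all block, then reports how many threads the pool created. */
    static int poolSizeWithBlockedTasks(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(taskCount);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // keep the worker busy, like a stalled request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!started.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not start");
            }
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeWithBlockedTasks(8));
    }
}
```

So any limit on concurrency comes from the checks in Dispatcher.enqueue(), not from the executor.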

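But there is always a but.

What about a weak network, a burst of requests, and a fairly large timeout?

Then the following can happen:

  • The number of running requests reaches maxRequests within a short window (to take an extreme case, 3s). Requests issued after those 3s have to wait in the ready queue. If the network is bad enough that every connection only ends when its timeout fires, a request issued right after the 3s mark waits about timeout-3s in the queue and then up to its own timeout on top. The user waits about timeout-3s+timeout, far more than timeout (57s for a 30s timeout)
  • Overall traffic is light, but requests to one host happen to bunch up in a short window (again, say 3s). Requests to that host issued after the 3s mark also go into the ready queue, and the user again waits about timeout-3s+timeout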
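
The arithmetic above can be checked with a small simulation. This is a sketch under stated assumptions, not a model of okhttp itself: every request is assumed to run for exactly one timeout (the worst case on a dead network), queued requests start in arrival order as soon as a slot frees up, and maxConcurrent stands for whichever limit is hit (maxRequests or maxRequestsPerHost). The class and method names are hypothetical.

```java
import java.util.PriorityQueue;

/** Illustration only: user-perceived wait when every request runs until its timeout. */
final class DispatcherQueueSim {

    /**
     * @param arrivalsMillis times at which requests are enqueued, in ascending order
     * @param maxConcurrent  how many requests may run at once
     * @param timeoutMillis  how long each request runs before it times out
     * @return for each request, the time from enqueue to failure
     */
    static long[] perceivedWaitMillis(long[] arrivalsMillis, int maxConcurrent, long timeoutMillis) {
        // Finish times of the requests currently holding a slot, earliest first.
        PriorityQueue<Long> slotFreeTimes = new PriorityQueue<>();
        long[] waits = new long[arrivalsMillis.length];
        for (int i = 0; i < arrivalsMillis.length; i++) {
            long arrival = arrivalsMillis[i];
            // Release slots whose request had already timed out when this one arrived.
            while (!slotFreeTimes.isEmpty() && slotFreeTimes.peek() <= arrival) {
                slotFreeTimes.poll();
            }
            long start = arrival;
            if (slotFreeTimes.size() >= maxConcurrent) {
                start = slotFreeTimes.poll(); // queued until the earliest slot frees up
            }
            long finish = start + timeoutMillis;
            slotFreeTimes.add(finish);
            waits[i] = finish - arrival;
        }
        return waits;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 500, 1000, 1500, 2000, 3000};
        for (long wait : perceivedWaitMillis(arrivals, 5, 30_000)) {
            System.out.println(wait + " ms");
        }
    }
}
```

With five requests in the first 2s and a sixth at 3s, the sixth waits 57000 ms, which is the timeout-3s+timeout case. If eleven requests to the same host arrive at once, the last one sits behind two full rounds and waits 90000 ms under these assumptions, which would be consistent with the 90s outliers in the tracking data.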

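The business initialization code does not customize the Dispatcher's maxRequestsPerHost, so at most 5 requests can run against any one host at a time. The host behind the request I was analyzing serves many other requests, so this is very likely the cause. The developers had also made a basic mistake: when the user dismisses the loading dialog, the corresponding request is not cancelled, which wastes request slots.

To verify this, I went through the logs of more than ten users whose wait was far longer than the timeout. In every case the problem occurred after their network had switched to 2G or dropped entirely, during a period of dense requests. That supports the conclusion.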

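5. Fixes and recommendations for using okhttp

With the cause known, the fix is simple. These are also general recommendations for using okhttp:

  • When initializing okhttp, set the Dispatcher's maxRequests and maxRequestsPerHost higher than the defaults
  • When the user dismisses the loading dialog, cancel the corresponding request promptly
  • Keep timeouts fairly small (for example 10s). This reduces load on the phone in weak networks and also improves the user experience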
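
As a rough sketch of what these three points could look like in code: the class below is hypothetical, the limits 128 and 16 and the 10s timeouts are placeholder values rather than tuned recommendations, and it needs okhttp3 on the classpath. Dispatcher.setMaxRequests(), Dispatcher.setMaxRequestsPerHost(), OkHttpClient.Builder.dispatcher() and Call.cancel() are the okhttp calls involved; everything else is made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.Call;
import okhttp3.Callback;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;
import okhttp3.Request;

/** Hypothetical wiring; the limits and timeouts are placeholder values. */
final class LoadingScreenRequests {
    private final OkHttpClient client;
    private Call inFlight; // the request behind the loading dialog, if any

    LoadingScreenRequests() {
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(128);       // okhttp default: 64
        dispatcher.setMaxRequestsPerHost(16); // okhttp default: 5
        client = new OkHttpClient.Builder()
            .dispatcher(dispatcher)
            .connectTimeout(10, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .writeTimeout(10, TimeUnit.SECONDS)
            .build();
    }

    /** Starts the request that the loading dialog is waiting for. */
    void load(Request request, Callback callback) {
        inFlight = client.newCall(request);
        inFlight.enqueue(callback);
    }

    /** To be called from the (hypothetical) handler that runs when the user dismisses the dialog. */
    void onLoadingDialogDismissed() {
        if (inFlight != null) {
            inFlight.cancel(); // fail fast instead of holding a dispatcher slot until the timeout
            inFlight = null;
        }
    }
}
```

This sketch ignores thread safety around inFlight. With Retrofit the same idea applies, since retrofit2.Call also has a cancel() method.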