
HttpClient Timeout Mechanisms (Security Hardening: Controlling Access to Very Large Files)

Background

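I have been working on a project recently. One of its features fetches the content of an external web page as a string. The URL is supplied entirely by the user, so there is an obvious security risk.

Testing showed that if the URL points to something like a movie file, the connection cannot be closed until the whole stream has been read. A few dozen such requests would probably be enough to bring the application down.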

Note: the project uses HttpClient version 3.0.1.

Testing

A typical HttpClient usage example:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.commons.httpclient.methods.GetMethod;

    MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient(manager);
    client.setConnectionTimeout(30000);
    client.setTimeout(30000);

    GetMethod get = new GetMethod("http://download.jboss.org/jbossas/7.0/jboss-7.0.0.Alpha1/jboss-7.0.0.Alpha1.zip");
    try {
        client.executeMethod(get); // send the request
        String result = get.getResponseBodyAsString(); // read the response body
    } catch (Exception e) {
        // ignored in this test
    } finally {
        get.releaseConnection(); // release the connection
    }

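The URL used here points to a download of nearly 20 MB, and it quickly became clear that the thread blocks for a long time. What to do? Some kind of timeout mechanism is needed. The thread dump shows where the thread is stuck: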

    "main" prio=10 tid=0x0899e800 nid=0x4010 runnable [0xb7618000..0xb761a1c8]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked <0xb23a4c30> (a java.io.BufferedInputStream)
        at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:156)
        at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170)
        at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:338)
        at org.apache.commons.httpclient.ContentLengthInputStream.close(ContentLengthInputStream.java:104)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:179)
        at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:143)
        at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1341)

Analysis

HttpClient 3.1 currently supports only three timeout settings:

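  1. connectionTimeout: the timeout for establishing the socket connection. HttpClient creates the socket on a separate thread, and this timeout bounds that step.
  2. timeoutInMilliseconds: the timeout for a socket read, i.e. socket.setSoTimeout(timeout).
  3. httpConnectionTimeout: when MultiThreadedHttpConnectionManager is used, the timeout for obtaining a connection from the connection pool.

Looking at the problem, what we actually need is a timeout on the entire HttpClient exchange: the sum of the time spent sending the request, parsing the HTTP headers, and reading the response stream. None of the three settings above covers that.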
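
Besides an overall time limit, the "very large file" side of the problem can also be approached by reading the body stream yourself and giving up once a byte cap or a wall-clock deadline is exceeded. The following is only a minimal sketch of that idea using plain JDK streams; it is not what the article's test code does, and `BoundedReader`, `readBounded`, `isRejected`, and the limits are hypothetical names and values chosen for illustration. The deadline is checked only between reads, so a single blocked read is still bounded only by the socket read timeout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class BoundedReader {

    /** Reads at most maxBytes from in and gives up once timeoutMillis of wall-clock time has passed. */
    static byte[] readBounded(InputStream in, int maxBytes, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) >= 0) {
            if (out.size() + n > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
            // Checked between reads only; a read that never returns is not caught here.
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("response body not read within " + timeoutMillis + " ms");
            }
        }
        return out.toByteArray();
    }

    /** Demo helper (hypothetical): true if a body of this size is rejected by the cap. */
    static boolean isRejected(byte[] body, int maxBytes) {
        try {
            readBounded(new ByteArrayInputStream(body), maxBytes, 5000);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(new byte[100], 4096));    // false: small body is accepted
        System.out.println(isRejected(new byte[10_000], 4096)); // true: oversized body is rejected
    }
}
```

With HttpClient 3.x the stream would presumably come from `get.getResponseBodyAsStream()`. Stopping early does not by itself free the connection, so the connection teardown discussed in the rest of this article still applies.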

The goal is clear. Here is the revised test code:

    final MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
    final HttpClient client = new HttpClient(manager);
    client.setConnectionTimeout(30000);
    client.setTimeout(30000);
    final GetMethod get = new GetMethod(
            "http://download.jboss.org/jbossas/7.0/jboss-7.0.0.Alpha1/jboss-7.0.0.Alpha1.zip");

    // Run the request on its own thread so the caller can stop waiting for it.
    Thread t = new Thread(new Runnable() {

        @Override
        public void run() {
            try {
                client.executeMethod(get);
                String result = get.getResponseBodyAsString();
            } catch (Exception e) {
                // ignore
            }
        }
    }, "Timeout guard");
    t.setDaemon(true);
    t.start();
    try {
        t.join(5000L); // wait at most 5 s for the request to finish
    } catch (InterruptedException e) {
        System.out.println("out finally start");
        ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
        System.out.println("out finally end");
    }
    if (t.isAlive()) {
        // Still running after the timeout: shut down the connection manager to cut the read short.
        System.out.println("out finally start");
        ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
        System.out.println("out finally end");
        t.interrupt();
        // throw new TimeoutException();
    }
    System.out.println("done");

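Here Thread.join is used to set a timeout of 5000 ms, which is the older way of doing it. If you are familiar with the java.util.concurrent package, you can use Future and ThreadPoolExecutor directly for the asynchronous work and let the pool cache the threads.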

    ExecutorService service = Executors.newCachedThreadPool();
    Future<String> future = service.submit(new Callable<String>() {

        @Override
        public String call() throws Exception {

            try {
                client.executeMethod(get);
                return get.getResponseBodyAsString();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                System.out.println("future finally start");
                ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
                System.out.println("future finally end");
            }

            return "";
        }

    });

    try {
        future.get(5000, TimeUnit.MILLISECONDS); // wait at most 5 s for the result
    } catch (Exception e) {
        // Timed out or failed: shut down the connection manager to cut the read short.
        System.out.println("out finally");
        e.printStackTrace();
        ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
        System.out.println("out finally end");
    }

    service.shutdown();
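
The Future-based pattern can be factored into a small reusable helper that takes the cleanup action as a parameter. This is a sketch using only the JDK, written with Java 8 lambdas; `TimeoutGuard`, `callWithTimeout`, and the `fallback` parameter are hypothetical names, and the `onTimeout` hook stands in for whatever actually frees the blocked resource (in this article, the connection manager shutdown).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutGuard {

    /**
     * Runs task on executor and waits up to timeoutMillis for its result.
     * On timeout, onTimeout is run and fallback is returned.
     */
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> task, long timeoutMillis,
                                 Runnable onTimeout, T fallback) {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; not enough for a blocking socket read
            onTimeout.run();     // so also release the resource the worker is blocked on
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the task itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            onTimeout.run();
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            String fast = callWithTimeout(executor, () -> "ok", 1000, () -> { }, "timeout");
            String slow = callWithTimeout(executor, () -> { Thread.sleep(5000); return "ok"; }, 100,
                                          () -> System.out.println("cleanup"), "timeout");
            System.out.println(fast + " " + slow); // prints "cleanup", then "ok timeout"
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Passing the cleanup as a `Runnable` keeps the helper independent of any HTTP library; the caller decides what "give up" means for its own resource.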

Note: why is the connection not released with get.releaseConnection() here?

Look at the implementation of releaseConnection:

    public void releaseConnection() {

        if (responseStream != null) {
            try {
                // FYI - this may indirectly invoke responseBodyConsumed.
                responseStream.close(); // closes the stream first
            } catch (IOException e) {
                // the connection may not have been released, let's make sure
                ensureConnectionRelease();
            }
        } else {
            // Make sure the connection has been released. If the response
            // stream has not been set, this is the only way to release the
            // connection.
            ensureConnectionRelease();
        }
    }

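  1. It closes responseStream first, and that is where the problem lies.
  2. The responseStream is created in the method readResponseBody(HttpConnection conn). For an ordinary HTML page, it is typically a ContentLengthInputStream.
  3. When close() is called on a ContentLengthInputStream, it uses ChunkedInputStream.exhaustInputStream to read all of the remaining data in the stream: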
        public void close() throws IOException {
            if (!closed) {
                try {
                    ChunkedInputStream.exhaustInputStream(this);
                } finally {
                    // close after above so that we don't throw an exception trying
                    // to read after closed!
                    closed = true;
                }
            }
        }
    

     
  4. The code of ChunkedInputStream.exhaustInputStream:
        static void exhaustInputStream(InputStream inStream) throws IOException {
            // read and discard the remainder of the message
            byte buffer[] = new byte[1024];
            while (inStream.read(buffer) >= 0) {
                ;
            }
        }
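
To make the cost of this drain-on-close behaviour concrete, here is a small JDK-only sketch that mimics it on an in-memory stream and counts how many bytes close() has to read. `DrainOnCloseDemo`, `DrainingStream`, and `bytesDrainedOnClose` are hypothetical names; the class only imitates the pattern shown above and is not HttpClient code.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class DrainOnCloseDemo {

    /** Stand-in for a stream whose close() reads and discards everything that is left. */
    static class DrainingStream extends FilterInputStream {
        long drained = 0;
        private boolean closed = false;

        DrainingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                return;
            }
            try {
                byte[] buffer = new byte[1024];
                int n;
                while ((n = in.read(buffer)) >= 0) {
                    drained += n; // on a real socket, each of these reads can block
                }
            } finally {
                closed = true;
            }
        }
    }

    /** Reads readFirst bytes of a totalBytes body, closes the stream, and returns what close() had to read. */
    static long bytesDrainedOnClose(int totalBytes, int readFirst) {
        try {
            DrainingStream stream = new DrainingStream(new ByteArrayInputStream(new byte[totalBytes]));
            stream.read(new byte[readFirst], 0, readFirst);
            stream.close();
            return stream.drained;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 20 MB body of which the caller consumed only 100 bytes.
        System.out.println(bytesDrainedOnClose(20 * 1024 * 1024, 100)); // 20971420
    }
}
```

In memory this finishes instantly, but over a network each of those remaining bytes still has to arrive before close() returns, which matches the stack trace shown earlier.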
    

     
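 Notes:
  • Methods other than sleep- or park-style blocking calls do not respond to interruption, so the Thread.interrupt() issued after an ordinary Future timeout has no effect on a thread blocked in a socket read.
  • The default SimpleHttpConnectionManager does not support this kind of operation, so MultiThreadedHttpConnectionManager.shutdown() is used to forcibly close the input and output streams of the socket underneath each HttpConnection.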
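
Both notes can be observed with plain JDK sockets. The sketch below blocks a regular (platform) thread in a socket read on the loopback interface, interrupts it, and then closes the socket from another thread. It assumes loopback networking is available; `BlockedReadDemo` and `run` are hypothetical names, and the sleep and join durations are arbitrary values picked for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class BlockedReadDemo {

    /**
     * Returns {stillBlockedAfterInterrupt, finishedAfterClose} for a thread
     * blocked in a socket read whose peer never sends anything.
     */
    static boolean[] run() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) { // peer never writes, so the client read blocks

            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read();
                } catch (IOException e) {
                    // expected once the socket is closed from another thread
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            Thread.sleep(200);   // give the reader time to enter read()
            reader.interrupt();  // sets the interrupt flag but does not unblock the read
            reader.join(500);
            boolean stillBlocked = reader.isAlive();

            client.close();      // closing the socket makes the blocked read fail
            reader.join(2000);
            boolean finished = !reader.isAlive();

            return new boolean[] { stillBlocked, finished };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("still blocked after interrupt: " + result[0]);
        System.out.println("finished after close: " + result[1]);
    }
}
```

Closing the socket is the JDK-level equivalent of what the connection manager shutdown achieves in the article: the blocked read fails and the worker thread can exit.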

Summary

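  1. It helps to understand why HttpClient is designed this way: socket reuse, keep-alive support, and so on require that leftover data from the previous response cannot affect the next request.
  2. Thread.interrupt() only wakes a thread that is in a sleep or wait state (per the API documentation). Calling it does not automatically raise an InterruptedException; in general you have to check Thread.isInterrupted() yourself and then throw the exception. Some of the JDK concurrent classes we use today, such as future.cancel, are handled in a similar way.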
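
The second point can be illustrated with a loop that never sleeps or waits: it only stops because it polls the interrupt flag itself and turns it into an InterruptedException. This is a minimal JDK-only sketch; `CooperativeInterruptDemo`, `countUntilInterrupted`, and `stopsOnInterrupt` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CooperativeInterruptDemo {

    /** Busy loop that checks the interrupt flag and converts it into an InterruptedException. */
    static long countUntilInterrupted() throws InterruptedException {
        long iterations = 0;
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("interrupted after " + iterations + " iterations");
            }
            iterations++;
        }
    }

    /** Starts the loop on a worker, interrupts it, and reports whether it stopped via InterruptedException. */
    static boolean stopsOnInterrupt() {
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                countUntilInterrupted();
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
            }
        }, "cooperative-worker");
        worker.setDaemon(true);
        worker.start();
        worker.interrupt(); // only sets the flag; the loop has to notice it
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive() && sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println(stopsOnInterrupt());
    }
}
```

Without the isInterrupted() check the loop would simply keep running after interrupt(), which is the situation a blocking socket read is in.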