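HttpClient timeout mechanisms (handling a security issue: controlling access to very large files)
阿新 • Published: 2019-01-05
Background
I've been working on a project where one feature fetches the page content of an external website as a string. The URL comes entirely from user input, so this is a security risk.
Testing shows that if the URL points to, say, a movie file, the connection cannot be closed until the whole stream has been read. A few dozen requests like that would probably bring the application down.
Note: the project uses HttpClient 3.0.1.
Test
A typical HttpClient usage example: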
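import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(manager);
client.setConnectionTimeout(30000); // connect timeout, in ms
client.setTimeout(30000);           // socket read timeout, in ms

GetMethod get = new GetMethod("http://download.jboss.org/jbossas/7.0/jboss-7.0.0.Alpha1/jboss-7.0.0.Alpha1.zip");
try {
    client.executeMethod(get);                      // send the request
    String result = get.getResponseBodyAsString();  // read the whole body
} catch (Exception e) {
    e.printStackTrace();
} finally {
    get.releaseConnection();                        // release the connection
}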
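The URL I used here is a download of nearly 20 MB, and it quickly became obvious that the thread waits for a very long time. So what to do? Add a timeout mechanism. The thread dump while it waits: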
"main" prio=10 tid=0x0899e800 nid=0x4010 runnable [0xb7618000..0xb761a1c8]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0xb23a4c30> (a java.io.BufferedInputStream)
    at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:156)
    at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:338)
    at org.apache.commons.httpclient.ContentLengthInputStream.close(ContentLengthInputStream.java:104)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:179)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:143)
    at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1341)
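Analysis
HttpClient 3.1 currently supports only three timeout settings:
- connectionTimeout: the timeout for establishing the socket connection. HttpClient creates the socket on a separate thread and applies this timeout to that step.
- timeoutInMilliseconds: the timeout for reading data from the socket, i.e. socket.setSoTimeout(timeout). It limits how long a single read may block, not the total time spent reading the response.
- httpConnectionTimeout: when MultiThreadedHttpConnectionManager is used, the timeout for obtaining a connection from the connection pool.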
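None of these three settings caps the total time spent on a response. To make that concrete, here is a minimal sketch using only plain JDK sockets (no HttpClient involved). The class name PerReadTimeoutDemo and the local drip server are made up for illustration. The server sends one byte every 50 ms, so each individual read returns well inside the 500 ms SO_TIMEOUT, yet the whole read takes about a second.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of HttpClient.
class PerReadTimeoutDemo {

    /** Reads a slowly dripped response to EOF and returns the total read time in ms. */
    static long readSlowResponse(int chunks, long gapMillis, int soTimeoutMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            // Local "server" that writes one byte, pauses, and repeats.
            Thread writer = new Thread(() -> {
                try (Socket peer = server.accept(); OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < chunks; i++) {
                        out.write('x');
                        out.flush();
                        Thread.sleep(gapMillis);
                    }
                } catch (IOException | InterruptedException ignored) {
                    // demo only: nothing to clean up
                }
            });
            writer.setDaemon(true);
            writer.start();

            long start = System.nanoTime();
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                socket.setSoTimeout(soTimeoutMillis); // bounds each read() call, not the whole loop
                InputStream in = socket.getInputStream();
                byte[] buffer = new byte[1024];
                while (in.read(buffer) >= 0) {
                    // discard the data, like a drain loop would
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws Exception {
        // 20 bytes, 50 ms apart, with a 500 ms read timeout.
        System.out.println("total read time (ms): " + readSlowResponse(20, 50, 500));
    }
}
```

A large download behaves the same way: data keeps arriving, so the read timeout never fires, and only an external deadline can stop it.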
The goal is clear. Here is the revised test code:
final MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
final HttpClient client = new HttpClient(manager);
client.setConnectionTimeout(30000);
client.setTimeout(30000);
final GetMethod get = new GetMethod(
        "http://download.jboss.org/jbossas/7.0/jboss-7.0.0.Alpha1/jboss-7.0.0.Alpha1.zip");

Thread t = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            client.executeMethod(get);
            String result = get.getResponseBodyAsString();
        } catch (Exception e) {
            // ignore: shutting down the manager makes the pending read fail
        }
    }
}, "Timeout guard");
t.setDaemon(true);
t.start();
try {
    t.join(5000L); // wait at most 5 s for the request to finish
} catch (InterruptedException e) {
    System.out.println("out finally start");
    ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
    System.out.println("out finally end");
}
if (t.isAlive()) {
    // still running after the deadline: force the connections closed
    System.out.println("out finally start");
    ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
    System.out.println("out finally end");
    t.interrupt();
    // throw new TimeoutException();
}
System.out.println("done");
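This uses Thread.join with a timeout of 5000 ms, which is the older way of doing it. If you are familiar with the java.util.concurrent package, you can use Future and ThreadPoolExecutor directly for the asynchronous work and let the pool cache the threads.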
ExecutorService service = Executors.newCachedThreadPool();
Future<String> future = service.submit(new Callable<String>() {

    @Override
    public String call() throws Exception {
        try {
            client.executeMethod(get);
            return get.getResponseBodyAsString();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            System.out.println("future finally start");
            ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
            System.out.println("future finally end");
        }
        return "";
    }
});

try {
    future.get(5000, TimeUnit.MILLISECONDS); // wait at most 5 s for the result
} catch (Exception e) {
    // timeout (or failure): force the connections closed
    System.out.println("out finally");
    e.printStackTrace();
    ((MultiThreadedHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
    System.out.println("out finally end");
}

service.shutdown();
Note: why isn't the connection released with get.releaseConnection() here?
Look at how releaseConnection is implemented:
public void releaseConnection() {

    if (responseStream != null) {
        try {
            // FYI - this may indirectly invoke responseBodyConsumed.
            responseStream.close(); // closes the stream first
        } catch (IOException e) {
            // the connection may not have been released, let's make sure
            ensureConnectionRelease();
        }
    } else {
        // Make sure the connection has been released. If the response
        // stream has not been set, this is the only way to release the
        // connection.
        ensureConnectionRelease();
    }
}
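- It closes the responseStream first, and that is where the problem lies.
- The responseStream is set in the method readResponseBody(HttpConnection conn). For an ordinary HTML page it is typically a ContentLengthInputStream object.
- When close is called on a ContentLengthInputStream, it uses ChunkedInputStream.exhaustInputStream to read all the remaining data in the stream: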
public void close() throws IOException {
    if (!closed) {
        try {
            ChunkedInputStream.exhaustInputStream(this);
        } finally {
            // close after above so that we don't throw an exception trying
            // to read after closed!
            closed = true;
        }
    }
}
- The code of ChunkedInputStream.exhaustInputStream:
static void exhaustInputStream(InputStream inStream) throws IOException {
    // read and discard the remainder of the message
    byte buffer[] = new byte[1024];
    while (inStream.read(buffer) >= 0) {
        ;
    }
}
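- Methods other than sleep, wait and park do not respond to interruption, and a blocking socket read is one of them. So the Thread.interrupt() triggered by an ordinary Future timeout (for example via future.cancel(true)) has no effect on the drain loop.
- The default SimpleHttpConnectionManager does not support this kind of operation, which is why I chose MultiThreadedHttpConnectionManager.shutdown(): it forcibly closes the socket input and output streams of the underlying HttpConnection.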
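The last two points can be reproduced with plain JDK sockets. This is only a sketch: CloseToUnblockDemo is a made-up class, and it assumes an ordinary (platform) thread, which is what the code above uses. A thread blocked in a socket read ignores interrupt(), but closing the socket from another thread makes the read fail right away, which is roughly the effect manager.shutdown() relies on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of HttpClient.
class CloseToUnblockDemo {

    /** Returns { stillBlockedAfterInterrupt, finishedAfterClose }. */
    static boolean[] run() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket peer = server.accept()) { // the peer never writes, so the read blocks

            Thread reader = new Thread(() -> {
                try {
                    InputStream in = socket.getInputStream();
                    in.read(); // blocks waiting for data that never arrives
                } catch (IOException expected) {
                    // reached when another thread closes the socket
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            Thread.sleep(200);   // give the reader time to block in read()
            reader.interrupt();  // only sets the interrupt flag
            reader.join(500);
            boolean stillBlocked = reader.isAlive();

            socket.close();      // the blocked read() now throws a SocketException
            reader.join(2000);
            boolean finished = !reader.isAlive();

            return new boolean[] { stillBlocked, finished };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run();
        System.out.println("still blocked after interrupt: " + result[0]);
        System.out.println("finished after close: " + result[1]);
    }
}
```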
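Summary
- It helps to understand why HttpClient is designed this way: socket reuse, keep-alive support and so on. Draining the stream ensures that leftover data from the previous response cannot affect a new request.
- Thread.interrupt() only wakes a thread that is in sleep or wait (per the API description). Calling it does not automatically raise an InterruptedException anywhere else; code generally has to check Thread.isInterrupted() itself and then throw. Some of the JDK concurrent classes we use, such as Future.cancel, work in a similar way.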
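A small sketch of that last point, again with a made-up class name (InterruptFlagDemo). A worker that never sleeps or waits only notices the interrupt because it polls the flag itself. A sleeping worker gets an InterruptedException.

```java
// Hypothetical demo class illustrating cooperative interruption.
class InterruptFlagDemo {

    /** A busy loop stops only because it checks the interrupt flag itself. */
    static boolean busyWorkerSeesFlag() throws InterruptedException {
        final boolean[] sawFlag = { false };
        Thread worker = new Thread(() -> {
            long spins = 0;
            // No sleep/wait here, so no InterruptedException is ever thrown.
            while (!Thread.currentThread().isInterrupted()) {
                spins++;
            }
            sawFlag[0] = true;
        });
        worker.setDaemon(true);
        worker.start();
        worker.interrupt();
        worker.join(2000);
        return !worker.isAlive() && sawFlag[0];
    }

    /** A sleeping thread is woken with an InterruptedException. */
    static boolean sleepingWorkerIsWoken() throws InterruptedException {
        final boolean[] woken = { false };
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                woken[0] = true;
            }
        });
        worker.setDaemon(true);
        worker.start();
        worker.interrupt();
        worker.join(2000);
        return !worker.isAlive() && woken[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("busy worker saw flag: " + busyWorkerSeesFlag());
        System.out.println("sleeping worker woken: " + sleepingWorkerIsWoken());
    }
}
```

The drain loop in exhaustInputStream does neither of these things, so an interrupt alone cannot stop it.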