Submitting historical resource records to Baidu Xiongzhanghao (熊掌號) in Java
阿新 • Published: 2019-01-23
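I recently had a requirement to submit a large number of historical URLs to the Baidu Xiongzhanghao (熊掌號) resource search platform. Xiongzhanghao does provide a manual submission tool, but that route is slow and laborious, and it is clearly inefficient when there are many URLs to submit, so submitting through the provided API is the better option.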
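I. Obtain a Baidu Xiongzhanghao account (you can apply for one yourself on Baidu)
II. The screenshot above is the official API description (you must log in to your own account to see it). With that, it is already fairly clear how to submit data in batches, but two points are worth spelling out: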
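1) For batch submission, the type parameter in the URL must be set to batch.
2) A single request may carry at most 2000 URLs; beyond that, the API responds that the submission limit has been exceeded.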
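The two points above can be sketched as follows: cut the URL list into chunks of at most 2000 and send each chunk to an endpoint whose type parameter is batch. This is only an illustration. The class name UrlBatcher, its method names, and the assumption that the endpoint already carries a query string are all hypothetical; take the real endpoint and credentials from the official API page.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlBatcher {
    // Per-request limit described in point 2
    static final int MAX_PER_REQUEST = 2000;

    // Hypothetical helper: assumes the endpoint already has a query string
    // (for example the account credentials) and appends type=batch to it.
    static String batchEndpoint(String endpointWithQuery) {
        return endpointWithQuery + "&type=batch";
    }

    // Splits the full list into consecutive chunks of at most MAX_PER_REQUEST URLs.
    static List<List<String>> split(List<String> urls) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += MAX_PER_REQUEST) {
            int end = Math.min(start + MAX_PER_REQUEST, urls.size());
            batches.add(new ArrayList<>(urls.subList(start, end)));
        }
        return batches;
    }

    // Builds a plain-text request body with one URL per line.
    static String toBody(List<String> batch) {
        return String.join("\n", batch);
    }
}
```

With 4500 URLs, split returns three chunks of 2000, 2000, and 500 URLs, so three requests are needed.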
III. Code implementation
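import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.StatusLine;
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
// The code below also uses CollectionUtils, JSONObject and a logger, whose imports
// are not shown in the original. The parseObject(String, Class) call matches
// fastjson's com.alibaba.fastjson.JSONObject, but that is an assumption.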
// Create the client, with automatic retries disabled
private static final CloseableHttpClient client = HttpClientBuilder.create().disableAutomaticRetries().build();
// URL, maxPushSize, successSize, logger and saveAsFile are members of the
// enclosing class; their declarations are not shown here.
private void urlsPush(List<String> urlList, String type) {
    if (CollectionUtils.isEmpty(urlList)) {
        return;
    }
    // Submit in groups of at most 2000 URLs
    int times = urlList.size() % 2000 == 0 ? (urlList.size() / 2000) : (urlList.size() / 2000 + 1);
    for (int i = 0; i < times; i++) {
        int end = Math.min((i + 1) * 2000, urlList.size());
        List<String> subList = urlList.subList(i * 2000, end);
        // The request body is plain text, one URL per line
        StringBuilder sb = new StringBuilder();
        for (String url : subList) {
            sb.append(url).append("\r\n");
        }
        String params = sb.toString();
        // Decide whether to submit or to save to a file: once the remaining
        // quota is no more than one full batch, stop submitting
        if (maxPushSize <= 2000) {
            saveAsFile(params, type);
            continue;
        }
        HttpPost request = new HttpPost(URL);
        request.setHeader("content-type", "text/plain");
        request.setEntity(new StringEntity(params, Charset.defaultCharset()));
        String respStr = "";
        logger.info("Pushing data: {} URLs in this request, content type: {}", subList.size(), type);
        long singlePushStart = System.currentTimeMillis();
        // try-with-resources closes the response on every path, so the
        // connection is always released back to the pool
        try (CloseableHttpResponse response = client.execute(request)) {
            logger.info("Single push finished in {} ms", System.currentTimeMillis() - singlePushStart);
            StatusLine statusLine = response.getStatusLine();
            HttpEntity responseEntity = response.getEntity();
            if (statusLine.getStatusCode() != 200 || responseEntity == null) {
                logger.info("Unexpected response from the API");
                saveAsFile(params, type);
                continue;
            }
            try {
                respStr = EntityUtils.toString(responseEntity);
            } catch (Exception ex) {
                logger.error("Failed to read the response body", ex);
            }
        } catch (IOException e) {
            logger.error("Failed to push data", e);
            saveAsFile(params, type);
            continue;
        }
        if (StringUtils.isNotBlank(respStr)) {
            try {
                PushUrlsResponse result = JSONObject.parseObject(respStr, PushUrlsResponse.class);
                // Track the remaining quota and the running total of accepted URLs
                this.maxPushSize = result.getRemain_batch();
                this.successSize += result.getSuccess_batch();
            } catch (Exception e) {
                logger.info("Could not parse the response, response body: {}", respStr);
                saveAsFile(params, type);
            }
        }
    }
}
/**
 * Response to a URL submission
 */
private static class PushUrlsResponse {
    /**
     * Number of URLs submitted successfully
     */
    private int success_batch;
    /**
     * Number of URLs that can still be submitted
     */
    private int remain_batch;

    public int getSuccess_batch() {
        return success_batch;
    }

    public void setSuccess_batch(int success_batch) {
        this.success_batch = success_batch;
    }

    public int getRemain_batch() {
        return remain_batch;
    }

    public void setRemain_batch(int remain_batch) {
        this.remain_batch = remain_batch;
    }
}
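To make the role of the two fields concrete, here is a small sketch that pulls success_batch and remain_batch out of a response body. It assumes the body is a flat JSON object whose keys match the field names of PushUrlsResponse; the sample values are made up. The class PushResponseFields and its method are hypothetical, and the regular expression is not a real JSON parser: it only stands in for the JSONObject.parseObject call used above so that the example needs no external library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PushResponseFields {
    // Returns the integer value of the named numeric field, or -1 if the
    // field does not appear in the given text.
    static int intField(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(-?\\d+)");
        Matcher m = p.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Made-up sample body, shaped after the PushUrlsResponse fields
        String sample = "{\"success_batch\":2000,\"remain_batch\":498000}";
        System.out.println("accepted:  " + intField(sample, "success_batch"));
        System.out.println("remaining: " + intField(sample, "remain_batch"));
    }
}
```

In urlsPush, the remaining count is what feeds maxPushSize, which in turn decides whether the next chunk is submitted or written to a file.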
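URLs that were not submitted successfully are written to a file so they are not lost, which is why the code calls a save method:
saveAsFile
The parameter param is the URL content that was to be submitted, and type labels the kind of content in the file (my URLs fall into quite a few categories, so a type is needed to mark what each file contains).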
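The body of saveAsFile is not shown in this post. A minimal sketch of what such a method might look like, assuming one append-only text file per type inside a configurable directory, is given below. The class name FailedUrlStore, the directory field, and the type + ".txt" naming scheme are assumptions for illustration, not the original implementation (which, judging by the imports, may have used commons-io FileUtils instead of java.nio).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FailedUrlStore {
    // Directory that holds one file per content type (assumed layout)
    private final Path dir;

    FailedUrlStore(Path dir) {
        this.dir = dir;
    }

    // Appends the unsent URL block to <dir>/<type>.txt, creating the
    // directory and the file on first use.
    void saveAsFile(String param, String type) {
        try {
            Files.createDirectories(dir);
            Path file = dir.resolve(type + ".txt");
            Files.write(file, param.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            // Unchecked, so callers such as urlsPush need no extra try/catch
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each chunk already ends every URL with a line break, appending chunks one after another keeps the file at one URL per line, ready to be read back and resubmitted later.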