[Writing a crawler with httpclient] POSTing JSON data and ordinary key-value pairs
阿新 • Published: 2019-01-01
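Preface
When developing a crawler, you will sooner or later have to deal with POST submissions.
This article compares two ways of submitting data and uses httpclient to simulate a website's POST submission for each.
I have come across two kinds of POST submission:
- ordinary key-value pairs;
- JSON data.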
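To make the difference concrete, here is a minimal sketch, using only the JDK, of what the request body looks like in each case. The class name `PostBodyFormats` and its helper methods are hypothetical, and the hand-written JSON builder only handles flat string fields; a real crawler would normally rely on a JSON library instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class PostBodyFormats {

    // Percent-encodes one key or value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds an application/x-www-form-urlencoded body: key=value pairs joined by '&'.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Wraps a string in double quotes, escaping only backslashes and quotes
    // (control characters are NOT handled in this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a flat JSON object whose values are all strings.
    static String jsonBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "vip");
        fields.put("password", "secret");
        System.out.println(formBody(fields)); // username=vip&password=secret
        System.out.println(jsonBody(fields)); // {"username":"vip","password":"secret"}
    }
}
```

The first body is normally sent with `Content-Type: application/x-www-form-urlencoded`, the second with `Content-Type: application/json`. The server uses that header to decide how to interpret the bytes it receives.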
The httpclient version I use
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.2</version>
</dependency>
Submitting ordinary key-value pairs
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpPost httpPost = new HttpPost("http://targethost/login");
// Form fields, sent as application/x-www-form-urlencoded
List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("username", "vip"));
nvps.add(new BasicNameValuePair("password", "secret"));
// Pass the charset explicitly; the one-argument constructor falls back to ISO-8859-1
httpPost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
CloseableHttpResponse response2 = httpclient.execute(httpPost);
try {
    System.out.println(response2.getStatusLine());
    HttpEntity entity2 = response2.getEntity();
    // do something useful with the response body
    // and ensure it is fully consumed
    EntityUtils.consume(entity2);
} finally {
    response2.close();
}
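The charset used to percent-encode form values matters as soon as a value contains non-ASCII characters. In httpclient 4.x, `new UrlEncodedFormEntity(nvps)` without a charset argument encodes with ISO-8859-1. The sketch below uses the JDK's `URLEncoder` rather than httpclient itself to show the effect, so it illustrates the encoding rule and is not a trace of what httpclient sends. The class name `FormCharsetDemo` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
public class FormCharsetDemo {

    // Percent-encodes a form value with the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5f35"; // a single CJK character

        // UTF-8 can represent the character, so its three bytes are percent-encoded.
        System.out.println(encode(name, "UTF-8"));      // %E5%BC%B5

        // ISO-8859-1 cannot represent it, so it degrades to '?' (0x3F).
        System.out.println(encode(name, "ISO-8859-1")); // %3F
    }
}
```

If the target site expects UTF-8, which is the common case, passing the charset explicitly when building the form entity avoids silently losing such characters.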
Submitting JSON data
Data to submit
{
"username" : "vip",
"password" : "secret"
}
Code
import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.IOException;
/**
* Created by CarlZhang on 2017/1/1.
*/
public class PostJsonTest {
    public static void main(String[] args) {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpPost httpPost = new HttpPost("http://targethost/login");
            // JSON data: {"username":"vip","password":"secret"}
            String jsonStr = "{\"username\":\"vip\",\"password\":\"secret\"}";
            StringEntity se = new StringEntity(jsonStr, Consts.UTF_8);
            // The charset belongs in Content-Type; Content-Encoding is for
            // compression such as gzip, so it is not set here.
            se.setContentType("application/json; charset=UTF-8");
            httpPost.setEntity(se);
            CloseableHttpResponse response2 = httpclient.execute(httpPost);
            try {
                System.out.println(response2.getStatusLine());
                HttpEntity entity2 = response2.getEntity();
                // Read the response body as text; this also fully consumes the entity.
                // UTF-8 is used only if the response does not declare a charset.
                String res = EntityUtils.toString(entity2, Consts.UTF_8);
                System.out.println(res);
            } finally {
                response2.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                httpclient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
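Example: JSON submission
Below is what I saw when I submitted a post on a certain website with Chrome's developer tools open (press F12): the POST request I sent is visible there. The Form Data entry is the JSON data submitted via POST.
How does the back end get hold of this data?
The site uses the open-source library latke; the relevant code is in its Requests class.
As you can see, it reads the request body as a stream through a Reader object.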
/**
 * Gets the request json object with the specified request.
 *
 * @param request the specified request
 * @param response the specified response, sets its content type with "application/json"
 * @return a json object
 * @throws ServletException servlet exception
 * @throws IOException io exception
 */
public static JSONObject parseRequestJSONObject(final HttpServletRequest request, final HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("application/json");
    final StringBuilder sb = new StringBuilder();
    BufferedReader reader;
    final String errMsg = "Can not parse request[requestURI=" + request.getRequestURI() + ", method=" + request.getMethod()
            + "], returns an empty json object";
    try {
        try {
            reader = request.getReader();
        } catch (final IllegalStateException illegalStateException) {
            reader = new BufferedReader(new InputStreamReader(request.getInputStream()));
        }
        String line = reader.readLine();
        while (null != line) {
            sb.append(line);
            line = reader.readLine();
        }
        reader.close();
        String tmp = sb.toString();
        if (Strings.isEmptyOrNull(tmp)) {
            tmp = "{}";
        }
        return new JSONObject(tmp);
    } catch (final Exception ex) {
        LOGGER.log(Level.ERROR, errMsg, ex);
        return new JSONObject();
    }
}
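The same idea can be tried locally without a servlet container. The following is a minimal sketch under stated assumptions: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and `HttpURLConnection` instead of latke and httpclient, and the class name `RawBodyEcho`, its methods, and the `/login` path on a throwaway local port are all hypothetical. The server reads the raw request body line by line through a reader, in the same way as the method above, and echoes it back.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class RawBodyEcho {

    // Reads a character stream line by line; line terminators are dropped.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line = reader.readLine();
        while (line != null) {
            sb.append(line);
            line = reader.readLine();
        }
        return sb.toString();
    }

    // Starts a local server on a free port that echoes the raw request body.
    static HttpServer startEchoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", exchange -> {
            String body;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(exchange.getRequestBody(), StandardCharsets.UTF_8))) {
                body = readAll(reader);
            }
            byte[] out = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=UTF-8");
            exchange.sendResponseHeaders(200, out.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(out);
            }
        });
        server.start();
        return server;
    }

    // POSTs a JSON string to the given URL and returns the response body.
    static String postJson(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return readAll(reader);
        } finally {
            conn.disconnect();
        }
    }

    // Starts the server, sends one request, and shuts the server down again.
    static String roundTrip(String json) throws IOException {
        HttpServer server = startEchoServer();
        try {
            int port = server.getAddress().getPort();
            return postJson("http://127.0.0.1:" + port + "/login", json);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("{\"username\":\"vip\",\"password\":\"secret\"}"));
    }
}
```

Because a JSON body is not a list of form fields, the server side has to read the body stream and parse it itself, which is what the latke method does before handing the text to `JSONObject`.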
In addition, the front-end JS code submits the JSON data through the jQuery library.