阿新 · Published: 2018-12-11
## Using requests to Scrape Product Categories from 易物天下 (i1515.com): A Hands-on Example
### Step 1: Decide what to scrape
The target is the industry category list (行業分類) on the site's home page.
### Step 2: Find the data source
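Fetching the page with `requests.get` first shows that the returned HTML does not contain the industry categories:

```python
import requests

# url, qs and headers hold the page URL, the query parameters and the request headers
response = requests.get(url, params=qs, headers=headers)
```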
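So the data is probably loaded through Ajax. Opening the site in a browser, right-clicking, choosing Inspect and switching to the Network tab confirms it: the categories are requested via Ajax.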
### Step 3: Scrape the data
Because the data comes from an Ajax request, I simply copied every field under Request Headers in the browser and turned them into a dictionary:

```python
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '4',  # requests recomputes this from the actual request body
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
    'Host': 'www.i1515.com',
    'Origin': 'http://www.i1515.com',
    'Referer': 'http://www.i1515.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
```
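Copying each header into a dict by hand is tedious. The following is a minimal sketch, assuming the headers were copied from the browser's developer tools as raw text with one `Name: value` pair per line. The helper name `headers_from_raw` is hypothetical and is not part of requests.

```python
def headers_from_raw(raw: str) -> dict:
    """Turn 'Name: value' lines copied from DevTools into a headers dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # skip blank lines, lines without a colon, and HTTP/2 pseudo-headers such as ':authority'
        if not line or line.startswith(':') or ':' not in line:
            continue
        # split on the first colon only, because values such as URLs contain colons
        name, value = line.split(':', 1)
        headers[name.strip()] = value.strip()
    return headers


raw = """
Accept: application/json, text/javascript, */*; q=0.01
Origin: http://www.i1515.com
X-Requested-With: XMLHttpRequest
"""
print(headers_from_raw(raw))
```

The resulting dict can be passed directly as `headers=` to `requests.post`.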
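Next, check whether the request carries any fields under Form Data. If it does, turn them into a dictionary as well:

```python
data = {"id": "1"}
```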
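When a dict is passed as `data=`, requests sends it form-encoded (`application/x-www-form-urlencoded`), matching the copied `Content-Type`. The standard library's `urllib.parse.urlencode` produces the same encoding. This small illustration, assuming `id` is the only form field, also explains the copied `Content-Length: '4'`: `id=1` is 4 bytes, but a two-digit id produces a 5-byte body. requests recomputes `Content-Length` from the real body, so the copied value is harmless, but it should not be relied on.

```python
from urllib.parse import urlencode

# Encode the form field the same way requests encodes a dict passed as data=
bodies = {i: urlencode({"id": str(i)}) for i in (1, 10)}
for i, body in bodies.items():
    # Content-Length is the byte length of the encoded body
    print(body, len(body.encode("utf-8")))
```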
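Finally, I found that the site sends 12 Ajax requests in total, each with a different `id`. That means a loop is enough to send them all, and the results are stored temporarily in JSON files. Note that `range(1, 12)` yields ids 1 through 11, so if the ids really run from 1 to 12, the loop should use `range(1, 13)`.

```python
import json

import requests

url = 'http://www.i1515.com/v2/category/getOtherCategory.html'

# headers and data are the dictionaries built above
for i in range(1, 12):  # yields ids 1-11
    data["id"] = str(i)
    try:
        response = requests.post(url=url, headers=headers, data=data)
        result = response.json()  # parse the JSON body once
        print(i, type(result))
        # only save responses that decode to a JSON object
        if isinstance(result, dict):
            print(result)
            # the with block closes the file, so no explicit f.close() is needed
            with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False)
    except Exception as ex:
        print(ex)
```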
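The `ensure_ascii=False` argument matters because the category names are Chinese. Here is a quick comparison using a made-up category name (`服裝` is only an example, not taken from the site's response):

```python
import json

category = {"name": "服裝"}  # hypothetical example record

escaped = json.dumps(category)                       # default: non-ASCII written as \uXXXX escapes
readable = json.dumps(category, ensure_ascii=False)  # keeps the Chinese characters as-is

print(escaped)
print(readable)
```

Both strings decode to the same object. `ensure_ascii=False` only makes the saved files human-readable, which is also why they are opened with `encoding='utf-8'`.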
### Step 4: Save the data from the JSON files to the database
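Loop over each file. The files are read from `myspiders/`, presumably the directory the spider was run from, since the spider writes them to its working directory:

```python
import json

for index in range(1, 12):
    with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
        data = json.load(f)
```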
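The spider only writes a file when the response is a JSON object, so some ids may have no file. Rather than letting `open` fail, the loop can check for each file first. This is a minimal sketch: `load_category_files` is a hypothetical helper, and the directory and id range are taken from the code above.

```python
import json
from pathlib import Path


def load_category_files(directory, ids):
    """Load type{id}.json files from directory, skipping ids that have no file."""
    categories = []
    for index in ids:
        path = Path(directory) / 'type{}.json'.format(index)
        if not path.exists():
            print('missing', path)  # no file was saved for this id
            continue
        with path.open('r', encoding='utf-8') as f:
            categories.append(json.load(f))
    return categories
```

For example, `categories = load_category_files('myspiders', range(1, 12))` returns the parsed contents of every file that exists.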
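Open the database connection:

```python
import pymysql

# password/database are the current names of the older passwd/db arguments
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                       database='orsp', charset='utf8')
```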
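Finally, insert the data. First look up the id of the first-level category by its name in `product_type`. Then insert each second-level category into `product_type_two`, and insert its third-level categories into `product_type_three` using the id of the second-level row just inserted.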
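This step can be sketched with parameterized queries. The driver binds each value instead of pasting it into the SQL string, so a name containing a quote cannot break the statement. To keep the sketch runnable without a MySQL server, it uses Python's built-in `sqlite3` as a stand-in; with pymysql the placeholder is `%s` instead of `?`. The table layouts, the helper `insert_category` and the sample record are assumptions modelled on the table and field names used in `write_data.py`.

```python
import sqlite3

# Stand-in schema (assumed); the real MySQL tables in the orsp database may differ
SCHEMA = """
CREATE TABLE product_type (id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE product_type_two (id INTEGER PRIMARY KEY,
    product_type_one_id INTEGER, type_two_name TEXT);
CREATE TABLE product_type_three (id INTEGER PRIMARY KEY,
    product_type_two_id INTEGER, type_three_name TEXT);
"""


def insert_category(conn, data):
    """Insert the second- and third-level categories from one JSON file."""
    cur = conn.cursor()
    # look up the first-level id by name; the value is bound, not formatted into the SQL
    cur.execute("SELECT id FROM product_type WHERE product_type = ?", (data["name"],))
    row = cur.fetchone()
    if row is None:
        raise ValueError("unknown first-level category: {}".format(data["name"]))
    for two in data["sCate"]:
        cur.execute(
            "INSERT INTO product_type_two (product_type_one_id, type_two_name) VALUES (?, ?)",
            (row[0], two["name"]),
        )
        two_id = cur.lastrowid  # id of the second-level row just inserted
        for three in two["tCategorys"]:
            cur.execute(
                "INSERT INTO product_type_three (product_type_two_id, type_three_name) VALUES (?, ?)",
                (two_id, three["name"]),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO product_type (product_type) VALUES (?)", ("Example",))
# hypothetical record shaped like the files the spider saves; the quote is deliberate
sample = {"name": "Example", "sCate": [
    {"name": "Kid's wear", "tCategorys": [{"name": "Shoes"}, {"name": "Hats"}]},
]}
insert_category(conn, sample)
print(conn.execute("SELECT * FROM product_type_three").fetchall())
```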
## Source Code
```python
import json

import requests

url = 'http://www.i1515.com/v2/category/getOtherCategory.html'
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '4',  # requests recomputes this from the actual request body
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
    'Host': 'www.i1515.com',
    'Origin': 'http://www.i1515.com',
    'Referer': 'http://www.i1515.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
data = {"id": "1"}

for i in range(1, 12):  # yields ids 1-11
    data["id"] = str(i)
    try:
        response = requests.post(url=url, headers=headers, data=data)
        result = response.json()  # parse the JSON body once
        print(i, type(result))
        # only save responses that decode to a JSON object
        if isinstance(result, dict):
            print(result)
            with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False)
    except Exception as ex:
        print(ex)
```
`write_data.py`, which writes the data into the database:
```python
import json

import pymysql

# open one connection for all files instead of a new one per file
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                       database='orsp', charset='utf8')
try:
    for index in range(1, 12):
        try:
            with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
                data = json.load(f)
            print(data["name"])
            # create a cursor object; it is closed when the with block ends
            with conn.cursor() as cursor:
                # first look up the id that belongs to this first-level name
                cursor.execute('SELECT id FROM product_type WHERE product_type = %s',
                               (data["name"],))
                res_id = cursor.fetchone()[0]
                print(res_id)
                # then insert the second-level categories
                for two in data["sCate"]:
                    cursor.execute(
                        'INSERT INTO `product_type_two` (`product_type_one_id`, `type_two_name`) '
                        'VALUES (%s, %s)', (res_id, two["name"]))
                    insert_id = conn.insert_id()  # id of the row just inserted
                    print("two_type", two["name"], "insert_id", insert_id)
                    # and the third-level categories under it
                    for three in two["tCategorys"]:
                        cursor.execute(
                            'INSERT INTO `product_type_three` (`product_type_two_id`, `type_three_name`) '
                            'VALUES (%s, %s)', (insert_id, three["name"]))
            conn.commit()
        except Exception as ex:
            conn.rollback()  # discard a partially inserted file
            print(ex)
finally:
    conn.close()
```