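Python Web Scraping Study Notes (21): Fetching and Parsing JD Product JSON Data
Axin • Published: 2018-12-13
1. Requirement: We have a JSON link for JD (Jingdong) products, obtained by capturing network traffic. The goal is to parse the JSON content and extract the price p of the product with a specific id. The JSON content is as follows: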
jQuery923933([{"op":"7599.00","m":"9999.00","id":"J_5089253","p":"7099.00"}, {"op":"48.00","m":"96.00","id":"J_16463451903","p":"38.00"}, {"op":"59.00","m":"229.00","id":"J_33440061157","p":"59.00"}, {"op":"79.00","m":"80.00","id":"J_6027746","p":"79.00"}, {"op":"32.90","m":"59.00","id":"J_33183063203","p":"32.90"}, {"op":"169.00","m":"699.00","id":"J_33341525798","p":"169.00"}, {"op":"228.00","m":"399.00","id":"J_30639439257","p":"228.00"}, {"op":"188.00","m":"199.00","id":"J_25539002541","tpp":"130.00","up":"tpp","p":"138.00"}, {"op":"55.00","m":"99.00","id":"J_3136674","p":"39.90"}, {"op":"25.90","m":"55.90","id":"J_5338456","p":"22.50"}, {"op":"50.00","m":"50.00","id":"J_11170365589","p":"50.00"}]);
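Note that this JSON content is an array, enclosed in square brackets [ ], not an object enclosed in curly braces { }. It is also wrapped in a JSONP callback, jQuery923933( ... );, which has to be stripped before the text can be parsed as JSON.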
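The array/object distinction matters because it decides which Python type json.loads returns. The following is a minimal sketch using a shortened copy of the payload above (two entries only); the variable names are chosen for illustration and are not part of the original notes.

```python
import json

# Shortened sample in the same shape as the payload above.
array_text = (
    '[{"op":"7599.00","m":"9999.00","id":"J_5089253","p":"7099.00"}, '
    '{"op":"48.00","m":"96.00","id":"J_16463451903","p":"38.00"}]'
)
object_text = '{"op":"7599.00","m":"9999.00","id":"J_5089253","p":"7099.00"}'

parsed_array = json.loads(array_text)    # JSON array  -> Python list
parsed_object = json.loads(object_text)  # JSON object -> Python dict

print(type(parsed_array).__name__)   # list
print(type(parsed_object).__name__)  # dict

# A list is indexed by position first, then each element by key.
print(parsed_array[0]["p"])          # 7099.00
```

Because the top level is a list, the items have to be iterated (or indexed by position) before any key such as "id" or "p" can be read.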
2. Writing the code:
import urllib.request
import json

# Fetch the raw response (JSONP text) from the price endpoint
url = "https://p.3.cn/prices/mgets?callback=jQuery923933&type=1&area=1&pdtk=&pduid=15374502312291140901533&pdpin=&pin=null&pdbp=0&skuIds=J_5089253%2CJ_16463451903%2CJ_33440061157%2CJ_6027746%2CJ_33183063203%2CJ_33341525798%2CJ_30639439257%2CJ_25539002541%2CJ_3136674%2CJ_5338456%2CJ_11170365589&ext=11100000&source=item-pc"
data = urllib.request.urlopen(url).read()
# Decode the bytes into a string (str(data) would yield the "b'...'" repr instead of the actual text)
str1 = data.decode("utf-8")
# Strip the useless parts of the string: here, everything before the first "(" and after the last ")"
str1 = str1[(str1.find('(') + 1):str1.rfind(')')]
# Convert the JSON text into Python data; jdata is a list here
jdata = json.loads(str1)
# Walk through the list and print the p value of the entry with the given id
for jdataObj in jdata:
    if jdataObj["id"] == "J_5089253":
        print(jdataObj["p"])
3. Additional notes:
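The same parsing steps can be tried offline, without calling the JD endpoint. The sketch below is one possible variation, not the method used in these notes: it strips the JSONP wrapper with a regular expression instead of find/rfind, and builds a dict keyed by id so that a price can be looked up directly. The helper names strip_jsonp and prices_by_id are hypothetical, and the sample string is a shortened copy of the payload from section 1.

```python
import json
import re


def strip_jsonp(text):
    """Return the text between the first "(" and the last ")" of a JSONP response."""
    # Greedy ".*" with DOTALL spans from the first "(" to the last ")".
    match = re.search(r"\((.*)\)", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSONP wrapper found")
    return match.group(1)


def prices_by_id(jsonp_text):
    """Map each product id to its price string p (hypothetical helper)."""
    items = json.loads(strip_jsonp(jsonp_text))
    return {item["id"]: item["p"] for item in items}


# Shortened copy of the sample response shown in section 1.
sample = (
    'jQuery923933([{"op":"7599.00","m":"9999.00","id":"J_5089253","p":"7099.00"}, '
    '{"op":"48.00","m":"96.00","id":"J_16463451903","p":"38.00"}]);'
)

prices = prices_by_id(sample)
print(prices["J_5089253"])        # 7099.00
print(prices.get("J_not_there"))  # None, because that id is not in the sample
```

A dict lookup avoids scanning the list once per id, which may be convenient when several prices are needed from the same response. Note that the prices remain strings, exactly as they appear in the JSON.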