[python] Scraping grades with requests
阿新 • Published: 2018-12-17
I'm a beginner, so I may have misunderstood a few things; please bear with me.
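Watching the experts around me scrape the academic affairs system and write Python programs to grab course slots made me want to learn Python too. But a novice is a novice: there is a lot I don't understand at all, and I muddled my way to the program below, which still has plenty of gaps to fill.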
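Here is how I think about it. As I understand it, browsing a web page means sending a request from the local machine and then receiving data from the server, which the browser displays. So we can use the requests module to build POST and GET requests and simulate a visit.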
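To make that idea concrete, here is a minimal sketch of what a GET and a POST look like inside requests before anything is sent. It only prepares the requests and never touches the network. The URL `http://example.com/login` and the field names are placeholders I made up for illustration, not the real site's.

```python
import requests

# Placeholder URL and fields, purely for illustration (assumptions).
URL = 'http://example.com/login'


def build_get(url, query):
    """Prepare (but do not send) a GET request with query parameters."""
    return requests.Request('GET', url, params=query).prepare()


def build_post(url, form):
    """Prepare (but do not send) a POST request with a form-encoded body."""
    return requests.Request('POST', url, data=form).prepare()


get_req = build_get(URL, {'page': '1'})
post_req = build_post(URL, {'user': 'alice', 'pw': 'secret'})

print(get_req.method, get_req.url)       # query string is appended to the URL
print(post_req.method, post_req.body)    # form fields travel in the body
print(post_req.headers['Content-Type'])  # set automatically for form data
```

The difference that matters for scraping is where the data goes: a GET carries it in the URL, a POST carries it in the body, which is why a tool like Fiddler is needed to see what a login form actually submits.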
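First comes the simulated login. Using Fiddler, I saw that in the request we send, the student ID is unchanged but the password has turned into a strange string. I looked it up and learned that this is an MD5 hash, often loosely called MD5 encryption (I only know the name, nothing about how it works). The password is hashed with hashlib; note that it must first be converted to bytes.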
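import hashlib
import requests

def login():
    student_id = input('Enter student ID: ')
    password = input('Enter password: ')
    url = 'http://bkjws.sdu.edu.cn/b/ajaxLogin'
    # hashlib works on bytes, so encode the password before hashing
    code = password.encode(encoding="utf-8")
    # MD5 hash of the password
    m = hashlib.md5()
    m.update(code)
    # Body of the POST request. The original listing masked the j_username
    # value; the student ID read above is assumed here.
    payload = {'j_username': student_id, 'j_password': m.hexdigest()}
    headers = {'cookie': 'index=***; JSESSIONID=*******',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko'}
    r = requests.post(url, data=payload, headers=headers)
    print(r.text)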
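If r.text reports success, the simulated login worked. The JSESSIONID in the cookie can apparently be changed; at any rate, I changed it and nothing went wrong.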
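The hashing step can be tried on its own, without contacting the server. The sketch below assumes, as the Fiddler capture suggests, that the site expects the plain MD5 hex digest of the UTF-8 password. The helper name `md5_hex` is my own. MD5 is a one-way hash rather than reversible encryption, so the same input always gives the same 32-character string.

```python
import hashlib


def md5_hex(password):
    """Return the MD5 digest of a text password as 32 lowercase hex characters."""
    # hashlib only accepts bytes, hence the explicit UTF-8 encoding
    return hashlib.md5(password.encode('utf-8')).hexdigest()


digest = md5_hex('hello')
print(digest)       # 5d41402abc4b2a76b9719d911017c592
print(len(digest))  # 32
```

Comparing this output against the string seen in Fiddler for a known password is a quick way to confirm that the site really uses unsalted MD5.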
Next, request the main page.
def next():
    # Note: this name shadows Python's built-in next()
    url1 = 'http://bkjws.sdu.edu.cn/f/common/main'
    headers1 = {'cookie': 'index=*****; JSESSIONID=***********',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko'}
    r1 = requests.get(url1, headers=headers1)
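Each function here pastes the cookie header in by hand. A possible alternative is `requests.Session`, which keeps one cookie jar and one set of default headers across calls. The sketch below only prepares a request to show what would be sent, and it uses a placeholder session ID. I have not tested whether the site accepts this flow, so treat it as an assumption.

```python
import requests

# A Session shares one cookie jar and one set of default headers across
# every request made through it.
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# Placeholder value (assumption). After a real login through the same
# session, the server's Set-Cookie response would fill the jar instead.
session.cookies.set('JSESSIONID', 'placeholder-session-id')

# Prepare, without sending, the main-page request to inspect what goes out.
req = requests.Request('GET', 'http://bkjws.sdu.edu.cn/f/common/main')
prepared = session.prepare_request(req)

print(prepared.headers['Cookie'])
print(prepared.headers['User-Agent'])
```

With this approach the login, main-page, and grade requests would all go through `session.post(...)` and `session.get(...)`, and the cookie strings would no longer need to be copied from Fiddler.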
Then send a POST request to get the grades. The captured request body was URL-encoded, so I first decoded it with an online tool.
import json

def getscore():
    url3 = 'http://bkjws.sdu.edu.cn/b/cj/cjcx/xs/lscx'
    # URL-decoded request body captured with Fiddler
    k = '[{"name":"sEcho","value":1},{"name":"iColumns","value":10},{"name":"sColumns","value":""},{"name":"iDisplayStart","value":0},{"name":"iDisplayLength","value":20},{"name":"mDataProp_0","value":"xnxq"},{"name":"mDataProp_1","value":"kch"},{"name":"mDataProp_2","value":"kcm"},{"name":"mDataProp_3","value":"kxh"},{"name":"mDataProp_4","value":"xf"},{"name":"mDataProp_5","value":"kssj"},{"name":"mDataProp_6","value":"kscjView"},{"name":"mDataProp_7","value":"wfzjd"},{"name":"mDataProp_8","value":"wfzdj"},{"name":"mDataProp_9","value":"kcsx"},{"name":"iSortCol_0","value":5},{"name":"sSortDir_0","value":"desc"},{"name":"iSortingCols","value":1},{"name":"bSortable_0","value":false},{"name":"bSortable_1","value":false},{"name":"bSortable_2","value":false},{"name":"bSortable_3","value":false},{"name":"bSortable_4","value":false},{"name":"bSortable_5","value":true},{"name":"bSortable_6","value":false},{"name":"bSortable_7","value":false},{"name":"bSortable_8","value":false},{"name":"bSortable_9","value":false}]'
    payload = {'aoData': k}
    headers = {'X-Requested-With': 'XMLHttpRequest',
               'cookie': 'index=****; j_username=***********; j_password=*********;JSESSIONID=************',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko',
               'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01'}
    r3 = requests.post(url3, data=payload, headers=headers)
    # Parse the JSON response into a dict and hand it back to the caller
    q = json.loads(r3.text)
    return q
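The URL-decoding of the captured body can also be done locally instead of with an online tool. This sketch uses the standard library's `urllib.parse.parse_qs`. The `raw_body` string is a shortened, made-up capture in the same shape as the real one, and `decode_form_body` is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs

# Shortened, made-up capture in the same shape as the real request body.
raw_body = 'aoData=%5B%7B%22name%22%3A%22sEcho%22%2C%22value%22%3A1%7D%5D'


def decode_form_body(body):
    """Decode a URL-encoded form body into a dict of field name -> text."""
    # parse_qs returns a list per field; keep the first value of each
    return {name: values[0] for name, values in parse_qs(body).items()}


fields = decode_form_body(raw_body)
print(fields['aoData'])

# The decoded value is itself JSON, so it can be inspected as Python data.
params = json.loads(fields['aoData'])
print(params[0]['name'], params[0]['value'])
```

Once the body is ordinary Python data, it is easier to see which entries (for example `iDisplayLength`) control how many rows the server returns.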
The response is a JSON string, so first convert it to a dictionary and then process the data.
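def display(q):
    q1 = q['object']
    # Renamed from sum to avoid shadowing the built-in sum()
    total = q1['iTotalRecords']
    print('Total courses: %s' % total)
    a = q1['aaData']
    for v in a:
        print('Course: %s' % v['kcm'])
        # Fixed: the last value was the list ['sycj'] instead of v['sycj']
        print('Exam grade: %s  Final exam: %s  Coursework: %s  Lab: %s' % (v['kscj'], v['qmcj'], v['pscj'], v['sycj']))
        print('Grade level: %s  Grade point: %s' % (v['wfzdj'], v['wfzjd']))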
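The parsing logic can be tried without logging in by feeding it a hand-written response. The sample JSON below is invented and mirrors only some of the keys the code above reads (`object`, `iTotalRecords`, `aaData`, `kcm`, `kscj`, `wfzdj`, `wfzjd`). The real response probably contains more fields, and `summarize` is a hypothetical helper name.

```python
import json

# Invented sample mirroring only keys used by the display code; not real data.
sample = '''{"object": {"iTotalRecords": 1, "aaData": [
  {"kcm": "Example Course", "kscj": "90", "wfzdj": "A", "wfzjd": "4.0"}
]}}'''


def summarize(response_text):
    """Parse the JSON text and return (total, list of one-line summaries)."""
    data = json.loads(response_text)['object']
    lines = []
    for row in data['aaData']:
        # .get() avoids a KeyError if a field is absent from a row
        lines.append('%s: %s (%s, %s)' % (
            row.get('kcm', '?'), row.get('kscj', '?'),
            row.get('wfzdj', '?'), row.get('wfzjd', '?')))
    return data['iTotalRecords'], lines


total, lines = summarize(sample)
print(total)
for line in lines:
    print(line)
```

Returning the lines instead of printing inside the loop makes the function easy to check against a known sample before pointing it at live data.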
That completes the program. It is crude, but I still think it's pretty good.