[python] Scraping grades with requests

I'm a beginner, so I may have misunderstood some things; please bear with me.

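Watching the experts around me scrape the academic affairs system and write Python programs to grab course slots, I decided to learn Python too. But a novice is a novice: there is a lot I don't understand at all, I muddled my way to this program, and it still has many holes to fill.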

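Here is my thinking. As I understand it, browsing a web page means sending a request from the local machine to the server, then receiving the server's data and displaying it in the browser. So we can use the requests module to build POST and GET requests and simulate a visit.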
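To make that idea concrete, here is a minimal sketch that builds one GET and one POST request with requests and inspects them without sending anything. The example.com URLs and the field names are placeholders made up for illustration; they are not part of the grade system.

```python
import requests

# Hypothetical URLs and fields, used only for illustration.
get_req = requests.Request('GET', 'http://example.com/search',
                           params={'q': 'python'}).prepare()
post_req = requests.Request('POST', 'http://example.com/login',
                            data={'user': 'alice', 'pw': 'secret'}).prepare()

# A GET request carries its parameters in the URL's query string.
print(get_req.method, get_req.url)
# A POST request carries form data in the body, URL-encoded.
print(post_req.method, post_req.body)
print(post_req.headers['Content-Type'])
```

Preparing a request this way is handy for comparing what the script would send against what Fiddler shows for the browser.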

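The first step is the simulated login. With Fiddler I saw that the student ID in the request is sent unchanged, but the password has turned into a strange string. Looking it up, I learned that this is MD5 (I only know the name and nothing about how it works; strictly speaking it is a hash rather than encryption). I use hashlib to hash the password. Note that the password must first be converted to bytes.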
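The hashing step can be tried on its own first. This is a small sketch; md5_hex is a helper name I made up, and 'hello' is just a sample string, not a real password.

```python
import hashlib

def md5_hex(password):
    """Return the MD5 hex digest of a password string."""
    # hashlib works on bytes, so the str is encoded first
    return hashlib.md5(password.encode('utf-8')).hexdigest()

digest = md5_hex('hello')  # sample input for illustration only
print(digest)              # a string of hexadecimal characters
print(len(digest))         # MD5 hex digests are 32 characters long
```

Passing a str straight to hashlib.md5() raises a TypeError, which is why the encode step matters.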

import hashlib
import json  # used later in getscore()
import requests

def login():
    student_id = input('Enter student ID: ')
    password = input('Enter password: ')
    url = 'http://bkjws.sdu.edu.cn/b/ajaxLogin'
    code = password.encode(encoding="utf-8")  # hashlib needs bytes, not str
    # MD5 hash of the password
    m = hashlib.md5()
    m.update(code)
    # Body of the POST request
    payload = {'j_username': student_id, 'j_password': m.hexdigest()}
    headers = {'cookie': 'index=***; JSESSIONID=*******',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko'}
    r = requests.post(url, data=payload, headers=headers)
    print(r.text)
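If r.text reports success, the simulated login worked. Also, the JSESSIONID in the cookie can apparently be changed; at least, I changed mine and nothing broke.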
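Instead of copying JSESSIONID from Fiddler into every header by hand, requests.Session can keep cookies between requests. The following is only a sketch. It assumes the server sets its session cookie through Set-Cookie on the login response, which I have not verified for this site, and build_payload and login_with_session are hypothetical helper names.

```python
import hashlib
import requests

LOGIN_URL = 'http://bkjws.sdu.edu.cn/b/ajaxLogin'
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; '
              'LCTE; rv:11.0) like Gecko')

def build_payload(student_id, password):
    """Form fields for the login POST: the ID plus the MD5 hex digest."""
    digest = hashlib.md5(password.encode('utf-8')).hexdigest()
    return {'j_username': student_id, 'j_password': digest}

def login_with_session(student_id, password):
    """Log in and return (session, response text)."""
    session = requests.Session()
    session.headers.update({'User-Agent': USER_AGENT})
    r = session.post(LOGIN_URL,
                     data=build_payload(student_id, password),
                     timeout=10)
    # Assumption: any cookies the server set are now stored in session.cookies
    return session, r.text
```

If that assumption holds, later calls made through the same session object, such as session.get(...) for the home page, would send the stored cookies automatically.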

Next, send a request for the home page.

def next():
    url1 = 'http://bkjws.sdu.edu.cn/f/common/main'
    headers1 = {'cookie': 'index=*****; JSESSIONID=***********',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko'}
    r1 = requests.get(url1, headers=headers1)

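Then send a POST request to get the grades. The request body captured in Fiddler is URL-encoded, so I first decoded it with an online tool.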
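The decoding can also be done locally with the standard library. This is a sketch; raw_body is a shortened sample I made up in the same shape as the captured body, not the full capture.

```python
import json
from urllib.parse import parse_qs

# Shortened, made-up sample of a URL-encoded form body.
raw_body = 'aoData=%5B%7B%22name%22%3A%22sEcho%22%2C%22value%22%3A1%7D%5D'

fields = parse_qs(raw_body)    # decodes %XX escapes and splits key=value pairs
ao_data = fields['aoData'][0]  # parse_qs returns a list of values per key
print(ao_data)                 # the decoded JSON text
print(json.loads(ao_data))     # the same data as Python objects
```

When posting, requests URL-encodes the data dict again, so the decoded string is what goes into the payload.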

def getscore():
    url3 = 'http://bkjws.sdu.edu.cn/b/cj/cjcx/xs/lscx'
    k = '[{"name":"sEcho","value":1},{"name":"iColumns","value":10},{"name":"sColumns","value":""},{"name":"iDisplayStart","value":0},{"name":"iDisplayLength","value":20},{"name":"mDataProp_0","value":"xnxq"},{"name":"mDataProp_1","value":"kch"},{"name":"mDataProp_2","value":"kcm"},{"name":"mDataProp_3","value":"kxh"},{"name":"mDataProp_4","value":"xf"},{"name":"mDataProp_5","value":"kssj"},{"name":"mDataProp_6","value":"kscjView"},{"name":"mDataProp_7","value":"wfzjd"},{"name":"mDataProp_8","value":"wfzdj"},{"name":"mDataProp_9","value":"kcsx"},{"name":"iSortCol_0","value":5},{"name":"sSortDir_0","value":"desc"},{"name":"iSortingCols","value":1},{"name":"bSortable_0","value":false},{"name":"bSortable_1","value":false},{"name":"bSortable_2","value":false},{"name":"bSortable_3","value":false},{"name":"bSortable_4","value":false},{"name":"bSortable_5","value":true},{"name":"bSortable_6","value":false},{"name":"bSortable_7","value":false},{"name":"bSortable_8","value":false},{"name":"bSortable_9","value":false}]'
    payload = {'aoData': k}
    headers = {'X-Requested-With': 'XMLHttpRequest',
               'cookie': 'index=****; j_username=***********; j_password=*********;JSESSIONID=************',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; LCTE; rv:11.0) like Gecko',
               'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01'}
    r3 = requests.post(url3, data=payload, headers=headers)
    q = json.loads(r3.text)  # parse the JSON response text into a dict
    return q  # hand the parsed data to display()

The response is a JSON string, so it is first converted to a dict, and then the data is processed.

def display(q):
    q1 = q['object']
    total = q1['iTotalRecords']  # number of course records
    print('%s courses in total' % total)
    a = q1['aaData']
    for v in a:
        print('Course: %s' % v['kcm'])
        print('Exam score: %s   Final score: %s  Regular score: %s Lab score: %s' % (v['kscj'], v['qmcj'], v['pscj'], v['sycj']))
        print('Grade: %s  Grade point: %s' % (v['wfzdj'], v['wfzjd']))
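In case some records lack a field, a more tolerant variant can use dict.get with a default. This is a sketch under that assumption; format_scores is a hypothetical helper, and sample_text is made-up data in the shape the code above reads, not real grades.

```python
import json

# Made-up sample in the shape display() expects; not real grades.
sample_text = json.dumps({
    'object': {
        'iTotalRecords': 1,
        'aaData': [{'kcm': 'Sample Course', 'kscj': '90',
                    'wfzdj': 'A', 'wfzjd': 4.0}],
    }
})

def format_scores(q):
    """Return one line of text per course; missing fields become '-'."""
    lines = []
    for v in q['object']['aaData']:
        lines.append('%s: score %s, grade %s, grade point %s' % (
            v.get('kcm', '-'), v.get('kscj', '-'),
            v.get('wfzdj', '-'), v.get('wfzjd', '-')))
    return lines

for line in format_scores(json.loads(sample_text)):
    print(line)
```

Returning the lines instead of printing them directly also makes the formatting easy to check without logging in.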

That completes the program. It is crude, but I still think it's pretty good.