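Python: Scraping Seat Codes from "我去圖書館"
Background
Once upon a time, getting into the library meant swiping your campus card at a terminal, so you had to show up in person (some people did swipe cards on behalf of others, which was never really right). I haven't been to the library in a long time. These days libraries use the WeChat official account "我去圖書館": you reserve a seat there, then scan a code at a library terminal within a set time after reserving. You can also reserve for the next day, which means nobody has to get up early and queue. Yet around this seemingly great feature, seat-grabbing programs have appeared. Next-day reservations in particular get snapped up instantly, which is rather disgusting. Still, as the saying goes, technology itself is innocent; the "guilt" lies with the people who use it. As it happens, a classmate introduced me to another classmate who had gotten hold of a seat-grabbing program. It is written in Python but was handed down from a university in Nanjing, so it doesn't work for whut. I took a careful look at it. The first thing to solve is getting the WeChat sessionID, which can already be done by packet capture; see https://fanjiajia.cn/2018/11/21/Mac%E4%B8%8B%E4%BD%BF%E7%94%A8Charles%E6%8A%93%E5%8C%85Android/
The next step is to obtain the library's seat codes, that is, to work out how the library's seats are numbered.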
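Scraping the seat page
First, of course, we need the html of the seat page. Using the same packet-capture tool as above, I captured the url, but this link cannot be opened directly in a browser: the browser says to open it in the WeChat client. If the crawler sent the request directly, the headers it wraps would have to identify the same client that WeChat uses. So what to do? No rush: in the packet capture, take the Text of the response directly, which is the returned html. Copy it out, save it locally, then read the local file and parse it.
- An excerpt of the seat-layout portion of the html file: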
<div class="grid_cell grid_1" data-key="5,7" style="left:280px;top:210px;"> <em>3</em> </div>
<div class="grid_cell grid_active" data-key="5,8" style="left:315px;top:210px;"> <em>2</em> </div>
<div class="grid_cell grid_8" data-key="5,11" style="left:420px;top:210px;"> <em></em> </div>
<div class="grid_cell grid_8" data-key="5,12" style="left:455px;top:210px;"> <em></em> </div>
<div class="grid_cell grid_8" data-key="5,13" style="left:490px;top:210px;"> <em></em> </div>
<div class="grid_cell grid_8" data-key="5,14" style="left:525px;top:210px;"> <em></em> </div>
<div class="grid_cell grid_active" data-key="5,17" style="left:630px;top:210px;"> <em>29</em> </div>
Looking at the code above, each div is one seat, and the seat's number is the text content of its em. What we mainly need is the seat's data-key.
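To make that structure concrete, here is a rough sketch that pulls the data-key, the pixel offsets, and the em text out of three cells from the excerpt. The names CELL_RE and iter_cells are made up for illustration, and a regular expression is used only because the excerpt is so regular; it is not how the rest of this post parses the page.

```python
import re

# Three cells copied from the excerpt above.
EXCERPT = (
    '<div class="grid_cell grid_1" data-key="5,7" style="left:280px;top:210px;"> <em>3</em> </div> '
    '<div class="grid_cell grid_8" data-key="5,11" style="left:420px;top:210px;"> <em></em> </div> '
    '<div class="grid_cell grid_active" data-key="5,17" style="left:630px;top:210px;"> <em>29</em> </div>'
)

# Hypothetical pattern: it assumes the attribute order seen in the excerpt.
CELL_RE = re.compile(
    r'data-key="(\d+),(\d+)" style="left:(\d+)px;top:(\d+)px;">\s*<em>(\d*)</em>'
)


def iter_cells(html):
    """Yield (data_key, left, top, seat_number) for each cell found in html."""
    for a, b, left, top, seat in CELL_RE.findall(html):
        yield f"{a},{b}", int(left), int(top), seat


cells = list(iter_cells(EXCERPT))
for key, left, top, seat in cells:
    # An empty em means the cell is not a numbered seat.
    print(key, left, top, seat if seat else "(no seat)")
```

In these cells the second number of data-key moves in step with the left offset, 35px per unit (7 → 280, 11 → 420, 17 → 630), which suggests data-key is a grid coordinate. The excerpt only shows one row, so that reading is a guess.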
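Parsing the page
The code above makes it clear: wherever an em tag's text is not empty, we need the data-key attribute of its parent tag.
- Step 1: import the required library: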
from bs4 import BeautifulSoup
BeautifulSoup is all we need for the parsing here.
- Step 2: define a function that gets the seats of one room (one html file)
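def getOneRoomSeats(url):
    # url is the path of one locally saved html file (one room)
    with open(url, 'r') as wb_data:
        soup = BeautifulSoup(wb_data, 'lxml')
    # print(soup.prettify())
    seats = {}
    # Loop over every em tag; an em with text holds a seat number
    for tag in soup.find_all('em'):
        if tag.string is not None:
            # Key: seat number, value: data-key of the parent div
            seats[tag.string] = tag.parent.get('data-key')
    return seats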
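The seats are stored in a dictionary. For example, {'29':'5,17'} holds the information for seat 29. In the end, every seat in a room is turned into a dictionary entry of this form.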
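To check this logic without saved files or third-party packages, here is a minimal sketch that uses html.parser from the standard library instead of BeautifulSoup. The names SeatParser and parse_seats are hypothetical, and the sketch assumes every em sits directly inside the div that carries its data-key, as in the excerpt.

```python
from html.parser import HTMLParser


class SeatParser(HTMLParser):
    """Collect {seat number: data-key} from grid cells (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seats = {}
        self._key = None     # data-key of the most recently opened div
        self._in_em = False  # True while inside an <em> element

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._key = dict(attrs).get("data-key")
        elif tag == "em":
            self._in_em = True

    def handle_endtag(self, tag):
        if tag == "em":
            self._in_em = False

    def handle_data(self, data):
        # Only non-empty text inside an <em> counts as a seat number.
        text = data.strip()
        if self._in_em and text and self._key is not None:
            self.seats[text] = self._key


def parse_seats(html):
    parser = SeatParser()
    parser.feed(html)
    parser.close()
    return parser.seats


sample = (
    '<div class="grid_cell grid_1" data-key="5,7" style="left:280px;top:210px;"> <em>3</em> </div>'
    '<div class="grid_cell grid_8" data-key="5,11" style="left:420px;top:210px;"> <em></em> </div>'
    '<div class="grid_cell grid_active" data-key="5,17" style="left:630px;top:210px;"> <em>29</em> </div>'
)
print(parse_seats(sample))  # {'3': '5,7', '29': '5,17'}
```

The cell with the empty em is skipped, which matches the "text is not empty" condition described above.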
- Step 3: get the seat information for all rooms
def getAllRoomSeats(urls):
    AllLibSeats = {}
    a = 1
    for url in urls:
        seats = getOneRoomSeats(url)
        AllLibSeats[a] = seats
        a = a + 1
    return AllLibSeats
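urls is a list holding the paths of all the locally saved files captured above. The result is one dictionary that stores the seat information of every room, keyed by room number starting from 1.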
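The post does not say how the nested dictionary was turned into the R1_SEATTABLE, R2_SEATTABLE, ... constants listed at the end. One possible way, as a sketch only: format each room's dictionary with repr. The function name format_seat_tables and the sample data are hypothetical stand-ins for the real return value of getAllRoomSeats.

```python
def format_seat_tables(all_seats):
    """Return one 'R<n>_SEATTABLE={...}' line per room, in room order."""
    return [
        f"R{room}_SEATTABLE={seats!r}"
        for room, seats in sorted(all_seats.items())
    ]


# Stand-in for the return value of getAllRoomSeats (two tiny rooms).
example = {1: {'3': '5,7', '2': '5,8'}, 2: {'32': '5,11'}}
lines = format_seat_tables(example)
for line in lines:
    print(line)
# R1_SEATTABLE={'3': '5,7', '2': '5,8'}
# R2_SEATTABLE={'32': '5,11'}
```

Each printed line is valid Python, so it can be pasted straight into another script as a constant.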
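Finally
- OK, that gets us all the seat information. Here are the seat codes for the library of my school, WHUT, in "我去圖書館". That said, the seat-grabbing program I looked at no longer seems to work, because a url with any ending comes back as "reservation successful".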
R1_SEATTABLE={'3': '5,7', '2': '5,8', '29': '5,17', '30': '5,18', '4': '7,7', '1': '7,8', '31': '7,17', '32': '7,18', '5': '9,7', '6': '9,8', '7': '11,7', '8': '11,8', '10': '13,7', '9': '13,8', '33': '13,17', '34': '13,18', '11': '15,7', '12': '15,8', '35': '15,17', '36': '15,18', '13': '17,7', '14': '17,8', '37': '17,17', '38': '17,18', '15': '19,7', '16': '19,8', '39': '19,17', '40': '19,18', '17': '21,7', '18': '21,8', '41': '21,17', '42': '21,18', '19': '23,7', '20': '23,8', '43': '23,17', '44': '23,18', '21': '25,7', '22': '25,8', '23': '27,7', '24': '27,8', '25': '29,7', '26': '29,8', '45': '29,17', '46': '29,18', '27': '31,7', '28': '31,8', '47': '31,17', '48': '31,18'}
# Second floor (electronic reading room)
R2_SEATTABLE={'32': '5,11', '33': '5,12', '26': '5,17', '27': '5,18', '31': '6,10', '34': '6,13', '25': '6,16', '28': '6,19', '36': '7,11', '35': '7,12', '30': '7,17', '29': '7,18', '38': '9,11', '39': '9,12', '20': '9,17', '21': '9,18', '37': '10,10', '40': '10,13', '19': '10,16', '22': '10,19', '42': '11,11', '41': '11,12', '24': '11,17', '23': '11,18', '44': '13,11', '45': '13,12', '14': '13,17', '15': '13,18', '43': '14,10', '46': '14,13', '13': '14,16', '16': '14,19', '48': '15,11', '47': '15,12', '18': '15,17', '17': '15,18', '50': '17,11', '51': '17,12', '8': '17,17', '9': '17,18', '49': '18,10', '52': '18,13', '7': '18,16', '10': '18,19', '54': '19,11', '53': '19,12', '12': '19,17', '11': '19,18', '56': '21,11', '57': '21,12', '2': '21,17', '3': '21,18', '55': '22,10', '58': '22,13', '1': '22,16', '4': '22,19', '60': '23,11', '59': '23,12', '6': '23,17', '5': '23,18'}
# Third floor
R3_SEATTABLE={'1': '5,8', '2': '5,9', '5': '5,11', '6': '5,12', '9': '5,14', '10': '5,15', '3': '7,8', '4': '7,9', '7': '7,11', '8': '7,12', '11': '7,14', '12': '7,15', '13': '9,8', '14': '9,9', '17': '9,11', '18': '9,12', '21': '9,14', '22': '9,15', '15': '11,8', '16': '11,9', '19': '11,11', '20': '11,12', '23': '11,14', '24': '11,15', '25': '13,8', '26': '13,9', '29': '13,11', '30': '13,12', '33': '13,14', '34': '13,15', '27': '15,8', '28': '15,9', '31': '15,11', '32': '15,12', '35': '15,14', '36': '15,15', '37': '17,8', '38': '17,9', '41': '17,11', '42': '17,12', '45': '17,14', '46': '17,15', '39': '19,8', '40': '19,9', '43': '19,11', '44': '19,12', '47': '19,14', '48': '19,15', '49': '21,8', '50': '21,9', '53': '21,11', '54': '21,12', '57': '21,14', '58': '21,15', '51': '23,8', '52': '23,9', '55': '23,11', '56': '23,12', '59': '23,14', '60': '23,15', '61': '25,8', '62': '25,9', '65': '25,11', '66': '25,12', '69': '25,14', '70': '25,15', '63': '27,8', '64': '27,9', '67': '27,11', '68': '27,12', '71': '27,14', '72': '27,15', '73': '29,8', '74': '29,9', '77': '29,11', '78': '29,12', '81': '29,14', '82': '29,15', '75': '31,8', '76': '31,9', '79': '31,11', '80': '31,12', '83': '31,14', '84': '31,15', '85': '33,8', '86': '33,9', '89': '33,11', '90': '33,12', '93': '33,14', '94': '33,15', '87': '35,8', '88': '35,9', '91': '35,11', '92': '35,12', '95': '35,14', '96': '35,15', '97': '37,8', '98': '37,9', '101': '37,11', '102': '37,12', '105': '37,14', '106': '37,15', '99': '39,8', '100': '39,9', '103': '39,11', '104': '39,12', '107': '39,14', '108': '39,15', '109': '41,8', '110': '41,9', '113': '41,11', '114': '41,12', '117': '41,14', '118': '41,15', '111': '43,8', '112': '43,9', '115': '43,11', '116': '43,12', '119': '43,14', '120': '43,15'}
# Fourth floor
R4_SEATTABLE={'1': '5,8', '2': '5,9', '5': '5,12', '6': '5,13', '9': '5,16', '10': '5,17', '3': '7,8', '4': '7,9', '7': '7,12', '8': '7,13', '11': '7,16', '12': '7,17', '13': '9,8', '14': '9,9', '17': '9,12', '18': '9,13', '21': '9,16', '22': '9,17', '15': '11,8', '16': '11,9', '19': '11,12', '20': '11,13', '23': '11,16', '24': '11,17', '25': '13,8', '26': '13,9', '29': '13,12', '30': '13,13', '33': '13,16', '34': '13,17', '27': '15,8', '28': '15,9', '31': '15,12', '32': '15,13', '35': '15,16', '36': '15,17', '37': '17,8', '38': '17,9', '41': '17,12', '42': '17,13', '45': '17,16', '46': '17,17', '39': '19,8', '40': '19,9', '43': '19,12', '44': '19,13', '47': '19,16', '48': '19,17', '49': '21,8', '50': '21,9', '53': '21,12', '54': '21,13', '57': '21,16', '58': '21,17', '51': '23,8', '52': '23,9', '55': '23,12', '56': '23,13', '59': '23,16', '60': '23,17', '61': '25,8', '62': '25,9', '65': '25,12', '66': '25,13', '69': '25,16', '70': '25,17', '63': '27,8', '64': '27,9', '67': '27,12', '68': '27,13', '71': '27,16', '72': '27,17', '73': '29,8', '74': '29,9', '77': '29,12', '78': '29,13', '81': '29,16', '82': '29,17', '75': '31,8', '76': '31,9', '79': '31,12', '80': '31,13', '83': '31,16', '84': '31,17', '85': '33,8', '86': '33,9', '89': '33,12', '90': '33,13', '93': '33,16', '94': '33,17', '87': '35,8', '88': '35,9', '91': '35,12', '92': '35,13', '95': '35,16', '96': '35,17', '97': '37,8', '98': '37,9', '101': '37,12', '102': '37,13', '105': '37,16', '106': '37,17', '99': '39,8', '100': '39,9', '103': '39,12', '104': '39,13', '107': '39,16', '108': '39,17', '109': '41,8', '110': '41,9', '113': '41,12', '114': '41,13', '117': '41,16', '118': '41,17', '111': '43,8', '112': '43,9', '115': '43,12', '116': '43,13', '119': '43,16', '120': '43,17'}
# Fifth floor
R5_SEATTABLE={'1': '5,7', '2': '5,8', '5': '5,9', '6': '5,10', '9': '5,12', '10': '5,13', '3': '7,7', '4': '7,8', '7': '7,9', '8': '7,10', '11': '7,12', '12': '7,13', '13': '9,7', '14': '9,8', '17': '9,9', '18': '9,10', '21': '9,12', '22': '9,13', '15': '11,7', '16': '11,8', '19': '11,9', '20': '11,10', '23': '11,12', '24': '11,13', '25': '13,7', '26': '13,8', '29': '13,9', '30': '13,10', '33': '13,12', '34': '13,13', '27': '15,7', '28': '15,8', '31': '15,9', '32': '15,10', '35': '15,12', '36': '15,13', '37': '17,7', '38': '17,8', '41': '17,9', '42': '17,10', '45': '17,12', '46': '17,13', '39': '19,7', '40': '19,8', '43': '19,9', '44': '19,10', '47': '19,12', '48': '19,13', '49': '21,7', '50': '21,8', '53': '21,9', '54': '21,10', '57': '21,12', '58': '21,13', '51': '23,7', '52': '23,8', '55': '23,9', '56': '23,10', '59': '23,12', '60': '23,13', '61': '25,7', '62': '25,8', '65': '25,9', '66': '25,10', '69': '25,12', '70': '25,13', '63': '27,7', '64': '27,8', '67': '27,9', '68': '27,10', '71': '27,12', '72': '27,13', '73': '29,7', '74': '29,8', '77': '29,9', '78': '29,10', '81': '29,12', '82': '29,13', '75': '31,7', '76': '31,8', '79': '31,9', '80': '31,10', '83': '31,12', '84': '31,13', '85': '33,7', '86': '33,8', '89': '33,9', '90': '33,10', '93': '33,12', '94': '33,13', '87': '35,7', '88': '35,8', '91': '35,9', '92': '35,10', '95': '35,12', '96': '35,13', '97': '37,7', '98': '37,8', '101': '37,9', '102': '37,10', '105': '37,12', '106': '37,13', '99': '39,7', '100': '39,8', '103': '39,9', '104': '39,10', '107': '39,12', '108': '39,13', '109': '41,7', '110': '41,8', '113': '41,9', '114': '41,10', '117': '41,12', '118': '41,13', '111': '43,7', '112': '43,8', '115': '43,9', '116': '43,10', '119': '43,12', '120': '43,13'}
# Sixth floor
R6_SEATTABLE={'1': '5,7', '2': '5,8', '5': '5,9', '6': '5,10', '9': '5,12', '10': '5,13', '3': '7,7', '4': '7,8', '7': '7,9', '8': '7,10', '11': '7,12', '12': '7,13', '13': '9,7', '14': '9,8', '17': '9,9', '18': '9,10', '21': '9,12', '22': '9,13', '15': '11,7', '16': '11,8', '19': '11,9', '20': '11,10', '23': '11,12', '24': '11,13', '25': '13,7', '26': '13,8', '29': '13,9', '30': '13,10', '33': '13,12', '34': '13,13', '27': '15,7', '28': '15,8', '31': '15,9', '32': '15,10', '35': '15,12', '36': '15,13', '37': '17,7', '38': '17,8', '41': '17,9', '42': '17,10', '45': '17,12', '46': '17,13', '39': '19,7', '40': '19,8', '43': '19,9', '44': '19,10', '47': '19,12', '48': '19,13', '49': '21,7', '50': '21,8', '53': '21,9', '54': '21,10', '57': '21,12', '58': '21,13', '51': '23,7', '52': '23,8', '55': '23,9', '56': '23,10', '59': '23,12', '60': '23,13', '61': '25,7', '62': '25,8', '65': '25,9', '66': '25,10', '69': '25,12', '70': '25,13', '63': '27,7', '64': '27,8', '67': '27,9', '68': '27,10', '71': '27,12', '72': '27,13', '73': '29,7', '74': '29,8', '77': '29,9', '78': '29,10', '81': '29,12', '82': '29,13', '75': '31,7', '76': '31,8', '79': '31,9', '80': '31,10', '83': '31,12', '84': '31,13', '85': '33,7', '86': '33,8', '89': '33,9', '90': '33,10', '93': '33,12', '94': '33,13', '87': '35,7', '88': '35,8', '91': '35,9', '92': '35,10', '95': '35,12', '96': '35,13', '97': '37,7', '98': '37,8', '101': '37,9', '102': '37,10', '105': '37,12', '106': '37,13', '99': '39,7', '100': '39,8', '103': '39,9', '104': '39,10', '107': '39,12', '108': '39,13', '109': '41,7', '110': '41,8', '113': '41,9', '114': '41,10', '117': '41,12', '118': '41,13', '111': '43,7', '112': '43,8', '115': '43,9', '116': '43,10', '119': '43,12', '120': '43,13'}
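As a usage sketch for tables like these, the snippet below looks up a seat's data-key and builds the reverse mapping, which also doubles as a check that no two seats share a data-key. R1_SAMPLE is just a few entries copied from R1_SEATTABLE; seat_key and invert are hypothetical helper names.

```python
# A few entries copied from R1_SEATTABLE above.
R1_SAMPLE = {'3': '5,7', '2': '5,8', '29': '5,17', '30': '5,18'}


def seat_key(table, seat_number):
    """Return the data-key for a seat number, or None if the seat is unknown."""
    # The tables use string keys, so accept ints as well.
    return table.get(str(seat_number))


def invert(table):
    """Map data-key back to seat number, refusing duplicate data-keys."""
    inverted = {}
    for seat, key in table.items():
        if key in inverted:
            raise ValueError(
                f"data-key {key!r} is used by seats {inverted[key]} and {seat}"
            )
        inverted[key] = seat
    return inverted


print(seat_key(R1_SAMPLE, 29))   # 5,17
print(invert(R1_SAMPLE)['5,8'])  # 2
print(seat_key(R1_SAMPLE, 999))  # None
```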
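- I still hope this kind of software can be designed better. It is commercial software after all, yet the source was easy to obtain and a lot of information is left unencrypted. That is overconfident.
- Finally, I hope everyone makes good use of the library's resources instead of occupying a seat and not using it.
Yours sincerely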