對動態網頁進行逆向工程
阿新 • • 發佈:2019-01-05
通過搜尋字母表的每個字母,然後遍歷json響應的結果頁面,來抓取所有國家資訊。其產生結果將會儲存在表格當中。
import string from downloader import Downloader import json D = Downloader() template_url = 'http://example.webscraping.com/ajax/search.json?page={}&page_size=10&search_term={}' countries = set() for letter in string.lowercase: page = 0while True: html = D(template_url.format(page, letter)) try: ajax = json.loads(html)#將json格式的資料解析成一個字典 except ValueError as e: print e ajax = None else: for record in ajax['records']: countries.add(record['country']) page += 1 if ajax is None or page >= ajax['num_pages']: break open('countries.txt', 'w').write('\n'.join(sorted(countries)))