Scraping reviews from Kongfz (孔夫子舊書網)
阿新 · Published: 2018-11-25
This post scrapes the reviews of a bookstore on Kongfz and writes them both to a txt file and to a MySQL database.
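The script relies on the `json` module.
The `json` module converts between JSON text and Python objects. The site returns its review list as JSON, and `json` parses that response into a Python dictionary that is easy to work with. The module provides four main functions: `load()`, `loads()`, `dump()` and `dumps()`.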
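- `loads()` parses a JSON string into Python objects. A JSON object becomes a dict.
- `dumps()` serializes a Python dict (or another JSON-compatible object) into a JSON string.
- `load()` and `dump()` do the same conversions, but they read from or write to a file object instead of a string.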
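The following example shows all four functions on a made-up review record. The field names mirror two of the fields the script reads, but the values are invented. `io.StringIO` stands in for a real file so the example needs no disk access.

```python
import io
import json

# Hypothetical review record; the values are made up for illustration.
text = '{"itemName": "論語", "reviewTime": "2018-11-20"}'

# loads(): JSON string -> Python dict
record = json.loads(text)
print(record['itemName'])  # 論語

# dumps(): Python dict -> JSON string.
# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
back = json.dumps(record, ensure_ascii=False)
print(back)  # {"itemName": "論語", "reviewTime": "2018-11-20"}

# dump()/load(): the same conversions, writing to / reading from a file object.
buf = io.StringIO()
json.dump(record, buf, ensure_ascii=False)
buf.seek(0)  # rewind before reading back
same = json.load(buf)
print(same == record)  # True
```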
Full code:
```python
# -*- coding: utf-8 -*-
# Python 3 version. The original targeted Python 2 (print statements,
# reload(sys) / sys.setdefaultencoding), none of which exists in Python 3.
import json

import MySQLdb  # provided by the mysqlclient package
import requests
from bs4 import BeautifulSoup

print('Connecting to the MySQL server...')
conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='123mysql',
                       db='onefive', charset='utf8')
print('Connected!')
cur = conn.cursor()

# IF NOT EXISTS lets the script be re-run without a "table already exists" error.
# The CHAR(10)/CHAR(100) columns of the original are widened: under MySQL's
# default strict mode, longer store names or reviews would fail with "Data too long".
cur.execute("""CREATE TABLE IF NOT EXISTS comment(
    store    VARCHAR(100),
    book     VARCHAR(255),
    comment  TEXT,
    time     VARCHAR(30),
    reviewer VARCHAR(50))""")
conn.commit()

url1 = 'http://book.kongfz.com/256332/935885648/'
user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134')
headers = {'User-Agent': user_agent}

# Pass raw bytes so BeautifulSoup detects the page encoding itself
html1 = requests.get(url1, headers=headers).content
soup = BeautifulSoup(html1, 'html.parser')
article = soup.find('div', attrs={'class': 'main-box'})

# Bookstore name
store = article.find('div', attrs={'class': 'shop_top_title'}).get_text().strip()

with open('comment2.txt', 'a', encoding='utf-8') as f:
    f.write(store + '\n')

# Parameterized query: the driver escapes the value, so quotes in the text
# cannot break the SQL statement (and SQL injection is avoided).
cur.execute("INSERT INTO comment(store) VALUES (%s)", (store,))
conn.commit()

# Review details, fetched page by page from the JSON endpoint (pages 1 and 2)
for page in range(1, 3):
    url2 = ('http://book.kongfz.com/Pc/Ajax/getShopReviewList?userId=1710684'
            '&itemId=&page=' + str(page) + '&needEmpty=0&rating=all')
    html2 = requests.get(url2, headers=headers).content
    b = json.loads(html2)  # JSON text -> Python dict
    dic = b['result']
    for k in dic['reviewList']:
        book = k['itemName']
        comment = k['content']
        time = k['reviewTime']
        reviewer = k['appraiserNickname']
        print(book, comment, time, reviewer, sep='\n')

        end = book + '\n' + comment + '\n' + time + '\n' + reviewer + '\n'
        with open('comment2.txt', 'a', encoding='utf-8') as f:
            f.write(end)

        cur.execute("INSERT INTO comment(book, comment, time, reviewer) "
                    "VALUES (%s, %s, %s, %s)",
                    (book, comment, time, reviewer))
        conn.commit()

conn.close()
```
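The network requests and the MySQL server make the script hard to try in isolation. The sketch below runs the same parse-and-store step offline, under two assumptions. First, the response body is assumed to have the shape `{"result": {"reviewList": [...]}}` with the four field names the script reads; the sample values are invented. Second, Python's built-in `sqlite3` stands in for MySQL, so the sketch runs anywhere. The helper `parse_reviews` is hypothetical. The first sample comment contains a single quote to show that a parameterized insert stores it intact.

```python
import json
import sqlite3

# Hypothetical response body: the structure follows the fields the scraper reads,
# but every value here is invented for illustration.
sample = '''{"result": {"reviewList": [
  {"itemName": "論語譯注", "content": "書況很好, it's like new",
   "reviewTime": "2018-11-20", "appraiserNickname": "reader1"},
  {"itemName": "孟子", "content": "發貨快",
   "reviewTime": "2018-11-21", "appraiserNickname": "reader2"}
]}}'''


def parse_reviews(body):
    """Hypothetical helper: one page of JSON -> (book, comment, time, reviewer) tuples."""
    data = json.loads(body)
    return [(r['itemName'], r['content'], r['reviewTime'], r['appraiserNickname'])
            for r in data['result']['reviewList']]


rows = parse_reviews(sample)

# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE comment(book TEXT, comment TEXT, time TEXT, reviewer TEXT)')
# sqlite3 uses ? placeholders (MySQLdb uses %s); in both cases the driver escapes
# the values, so the single quote in the first comment does not break the statement.
conn.executemany('INSERT INTO comment VALUES (?, ?, ?, ?)', rows)
conn.commit()

# The same four-line block per review that the scraper appends to comment2.txt
text = ''.join('\n'.join(row) + '\n' for row in rows)
print(text)

count = conn.execute('SELECT COUNT(*) FROM comment').fetchone()[0]
stored = conn.execute('SELECT comment FROM comment WHERE reviewer = ?',
                      ('reader1',)).fetchone()[0]
print(count)   # 2
print(stored)  # 書況很好, it's like new
conn.close()
```

Switching back to MySQL mainly means replacing `sqlite3.connect` with `MySQLdb.connect` and the `?` placeholders with `%s`.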