Tools and Libraries Used for Python Web Scraping
阿新 • Published: 2019-01-08
Tools and libraries to install
Development tools
Built-in standard libraries
urllib, re
>>> from urllib.request import urlopen
>>> response = urlopen("http://www.baidu.com")
>>> response
<http.client.HTTPResponse object at 0x1032edb38>
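The heading above lists re alongside urllib, but only urllib is shown. A minimal sketch of how the two are often combined follows: a Request object is prepared with a custom header, and a regular expression pulls link targets out of an HTML string. The User-Agent value, the helper names build_request and extract_links, and the sample HTML are assumptions made for illustration; nothing is sent over the network here.

```python
import re
from urllib.request import Request

# Hypothetical User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; demo-crawler/0.1)"

# Matches href="..." or href='...' inside an <a> tag.
LINK_PATTERN = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def build_request(url):
    """Create a Request with a custom User-Agent header (nothing is sent yet)."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return LINK_PATTERN.findall(html)


# Sample HTML standing in for a downloaded page.
sample_html = (
    '<p><a href="https://www.python.org">Python</a> '
    '<a class="x" href=\'/docs\'>Docs</a></p>'
)

links = extract_links(sample_html)
request = build_request("http://www.baidu.com")

print(links)
print(request.get_header("User-agent"))
```

Passing the prepared request to urlopen would perform the actual download. Regular expressions are fine for quick extraction like this, but for nested or messy markup a real parser is usually the safer choice.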
HTTP request library
>>> import requests
>>> response = requests.get("http://www.baidu.com")
>>> response
<Response [200]>
Browser tools
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("http://www.baidu.com")
>>> driver.get("https://www.python.org")
>>> html = driver.page_source
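>>> # PhantomJS is a headless browser; Selenium deprecated its PhantomJS
>>> # driver and removed it in Selenium 4, so this needs an older release.
>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.get("http://www.baidu.com")
>>> html = driver.page_source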
Web page parsing libraries
>>> from bs4 import BeautifulSoup as BS
>>> html = "<html><h1></h1></html>"
>>> soup = BS(html, "lxml")
>>> soup.h1
<h1></h1>
>>> from pyquery import PyQuery as pq
>>> html = "<html><h1>title</h1></html>"
>>> doc = pq(html)
>>> doc("html").text()
'title'
>>> doc("h1").text()
'title'
Databases
Database packages:
pymysql (https://pypi.org/project/PyMySQL/)
>>> import pymysql
>>> conn = pymysql.connect(host="localhost",
...                        user="root", password="123456",
...                        port=3306, database="demo")
>>> cursor = conn.cursor()
>>> sql = "select * from mytable"
>>> cursor.execute(sql)
3
>>> cursor.fetchone()
(1, datetime.date(2018, 4, 14))
>>> cursor.close()
>>> conn.close()
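When a crawler writes scraped text into a SQL database, the values are normally passed as query parameters rather than pasted into the SQL string. The sketch below illustrates that pattern using the standard-library sqlite3 module as a stand-in, so that it runs without a MySQL server; this is a substitution, not the article's setup. The table name pages, its columns, and the helper save_page are hypothetical. With pymysql the same pattern applies, but the placeholder is %s instead of ?, and the upsert syntax differs.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table for scraped pages.
cursor.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT)")


def save_page(cursor, url, title):
    """Insert or overwrite one scraped page using query parameters."""
    # Placeholders keep scraped text out of the SQL string itself.
    # sqlite3 uses "?" as the placeholder; pymysql uses "%s".
    cursor.execute(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
        (url, title),
    )


save_page(cursor, "https://www.python.org", "Welcome to Python.org")
save_page(cursor, "https://example.com/o'reilly", "Title with 'quotes'")
conn.commit()

cursor.execute("SELECT title FROM pages WHERE url = ?", ("https://www.python.org",))
row = cursor.fetchone()

cursor.execute("SELECT COUNT(*) FROM pages")
count = cursor.fetchone()[0]

print(row, count)

cursor.close()
conn.close()
```

The second insert contains single quotes in both values, which would break a query built by string concatenation but is handled safely by the placeholders.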
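pymongo
>>> import pymongo
>>> client = pymongo.MongoClient("localhost")
>>> db = client["newtestdb"]
>>> # insert() is deprecated and was removed in PyMongo 4; use insert_one()
>>> db["table"].insert_one({"name": "Tom"}).inserted_id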
ObjectId('5adcb250d7696c839a251658')
>>> db["table"].find_one({"name": "Tom"})
{'_id': ObjectId('5adcb250d7696c839a251658'), 'name': 'Tom'}
redis
>>> import redis
>>> r = redis.Redis("localhost", 6379)
>>> r.set("name", "Tom")
True
>>> r.get("name")
b'Tom'
Install all of the libraries above (plus flask, django, and jupyter) with one command
pip install requests selenium beautifulsoup4 pyquery pymysql pymongo redis flask django jupyter
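After running the install command, it can be handy to confirm that the packages are importable. The following is a small sketch using only the standard library; the helper name missing_modules and the MODULES list are assumptions for illustration. Note that some import names differ from the PyPI names (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# Import names of the libraries shown in this article.
# beautifulsoup4 is installed under the import name "bs4".
MODULES = ["requests", "selenium", "bs4", "pyquery", "pymysql", "pymongo", "redis"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All libraries are installed.")
```

find_spec only locates a module without importing it, so the check is quick and has no side effects from the libraries themselves.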