Learning Notes: urllib
阿新 • Published: 2018-05-15
Step 1: basic requests
GET
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠標
from urllib import request

url = 'http://news.sina.com.cn/guide/'
response = request.urlopen(url)             # returns an HTTP response object
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # response status code
print(web_status, web_data)
POST
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠標
from urllib import request, parse

url = 'http://news.sina.com.cn/guide/'
# form fields submitted with the POST request
data = [
    ('name', 'xiaoshubiao'),
    ('pwd', 'xiaoshubiao'),
]
login_data = parse.urlencode(data).encode('utf-8')
response = request.urlopen(url, data=login_data)  # returns an HTTP response object
web_data = response.read().decode('utf-8')        # response body
web_status = response.status                      # response status code
print(web_status, web_data)
Step 2: impersonating a browser
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠標
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# set browser-like request headers so the server does not reject us as a script
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # response status code
print(web_status, web_data)
Step 3: using a proxy IP
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠標
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# route requests through a proxy IP
proxy = request.ProxyHandler({'http': '221.207.29.185:80'})
opener = request.build_opener(proxy, request.HTTPHandler)
request.install_opener(opener)  # all subsequent urlopen calls use the proxy
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # response status code
print(web_status, web_data)
Step 4: parsing the content
You can use the ready-made BeautifulSoup library, or match with re regular expressions; the underlying idea is much the same.
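As a minimal sketch of the regex approach, the snippet below extracts links from a small hand-written HTML sample standing in for the web_data fetched above (the sample markup and URLs are illustrative only, not taken from the real page):

```python
# -*- coding: utf-8 -*-
import re

# sample HTML standing in for web_data fetched in the earlier steps
web_data = '''
<ul>
  <li><a href="http://news.sina.com.cn/china/">China</a></li>
  <li><a href="http://news.sina.com.cn/world/">World</a></li>
</ul>
'''

# capture each link's href and text; for anything beyond simple,
# predictable markup, BeautifulSoup is more robust than regexes
links = re.findall(r'<a href="([^"]+)">([^<]+)</a>', web_data)
for href, text in links:
    print(href, text)
```

The same pattern can be applied to the web_data string produced by any of the earlier examples.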