A urllib Example in Python 3

阿新 • Published: 2019-02-03

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
import time
import sys
import gzip
import socket
import urllib.request, urllib.parse, urllib.error
import http.cookiejar
 
class HttpTester:
    def __init__(self, timeout=10, addHeaders=True):
        socket.setdefaulttimeout(timeout)   # set the default timeout (in seconds) for new sockets
 
        self.__opener = urllib.request.build_opener()
        urllib.request.install_opener(self.__opener)
 
        if addHeaders: self.__addHeaders()
 
    def __error(self, e):
        '''Error handling: print the exception.'''
        print(e)
 
    def __addHeaders(self):
        '''Add the default headers.'''
        self.__opener.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'),
                                    ('Connection', 'keep-alive'),
                                    ('Cache-Control', 'no-cache'),
                                    ('Accept-Language', 'zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3'),
                                    ('Accept-Encoding', 'gzip'),    # __decode only handles gzip, so deflate is not advertised
                                    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')]
 
    def __decode(self, webPage, charset):
        '''Decompress gzip data if present, then decode the page with the given charset.'''
        if webPage.startswith(b'\x1f\x8b'):
            return gzip.decompress(webPage).decode(charset)
        else:
            return webPage.decode(charset)
 
    def addCookiejar(self):
        '''Add a cookiejar handler to self.__opener.'''
        cj = http.cookiejar.CookieJar()
        self.__opener.add_handler(urllib.request.HTTPCookieProcessor(cj))
 
    def addProxy(self, host, type='http'):
        '''Set a proxy.'''
        proxy = urllib.request.ProxyHandler({type: host})
        self.__opener.add_handler(proxy)
 
    def addAuth(self, url, user, pwd):
        '''Add HTTP basic authentication.'''
        pwdMsg = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        pwdMsg.add_password(None, url, user, pwd)
        auth = urllib.request.HTTPBasicAuthHandler(pwdMsg)
        self.__opener.add_handler(auth)
 
    def get(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP GET method.'''
        if params: url += '?' + urllib.parse.urlencode(params)
        request = urllib.request.Request(url)
        for k, v in (headers or {}).items(): request.add_header(k, v)    # add the given headers to this request only
 
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as e:
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)
 
    def post(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP POST method.'''
        params = urllib.parse.urlencode(params or {})
        request = urllib.request.Request(url, data=params.encode(charset))  # a request that carries data is sent as POST
        for k, v in (headers or {}).items(): request.add_header(k, v)
 
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as e:
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)
 
    def download(self, url, savefile):
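        '''Download a file or web page to savefile.'''
        # Temporarily drop the Accept-Encoding header so the saved file is not gzip-compressed.
        # A filtered copy is used instead of removing items from the list while iterating over it.
        original_headers = self.__opener.addheaders
        self.__opener.addheaders = [h for h in original_headers if 'Accept-Encoding' not in h]

        perLen = 0
        def reporthook(a, b, c):    # a: blocks transferred so far; b: block size in bytes; c: total size of the remote file
            if c > 1000000:
                nonlocal perLen
                per = (100.0 * a * b) / c
                if per > 100: per = 100
                per = '{:.2f}%'.format(per)
                print('\b'*perLen, per, end='')     # overwrite the previous percentage with the new one
                sys.stdout.flush()
                perLen = len(per)+1

        print('--> {}\t'.format(url), end='')
        try:
            urllib.request.urlretrieve(url, savefile, reporthook)   # reporthook is a callback used to display download progress
        except urllib.error.HTTPError as e:
            self.__error(e)
        finally:
            self.__opener.addheaders = original_headers     # restore the original headers, whether or not one was dropped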
            print()
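
The three small pieces of logic the class relies on (building the GET query string, gzip-aware decoding, and the progress percentage passed to `reporthook`) can be tried without any network access. The following is only a minimal sketch: the function names `build_get_url`, `decode_body` and `percent_done`, and the `example.com` URL, are hypothetical and do not appear in the original listing.

```python
import gzip
import urllib.parse


def build_get_url(url, params=None):
    """Append URL-encoded query parameters to url, the way get() does."""
    if params:
        url += '?' + urllib.parse.urlencode(params)
    return url


def decode_body(raw, charset='UTF-8'):
    """Decompress raw if it starts with the gzip magic bytes, then decode it."""
    if raw.startswith(b'\x1f\x8b'):
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def percent_done(block_count, block_size, total_size):
    """Progress percentage from urlretrieve's reporthook arguments, capped at 100."""
    return min(100.0, 100.0 * block_count * block_size / total_size)


if __name__ == '__main__':
    # Hypothetical inputs, chosen only for illustration.
    print(build_get_url('http://example.com/search', {'q': 'urllib', 'page': 2}))
    print(decode_body(gzip.compress('你好, urllib'.encode('UTF-8'))))
    print('{:.2f}%'.format(percent_done(1, 8192, 16384)))
```

The cap in `percent_done` matters because `urlretrieve` reports whole blocks, so the last block usually pushes `block_count * block_size` past the real file size.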
