Using the urllib.parse module in the urllib library

阿新 • Published: 2018-12-29

The urllib.parse module parses, splits, and assembles URLs. Its most commonly used functions are urllib.parse.urlparse, urllib.parse.urlunparse, urllib.parse.urljoin, and urllib.parse.urlencode.

1. Using urlparse()

The urlparse function splits a URL into six parts and returns them as a tuple of six strings: scheme, network location, path, parameters, query, and fragment. Its signature is:

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

1.1. urlparse() with only the urlstring argument

from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(type(result), result)

'''Result:
<class 'urllib.parse.ParseResult'> ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
'''

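As the output shows, scheme is the protocol, netloc is the network location (here the domain name), path is the path on the server, params holds the parameters attached to the last path segment (the part after ;), query is the query string (the part after ?), and fragment is the anchor (the part after #).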
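Since the return value is a named tuple, each part can be read by attribute name, by index, or by unpacking. The following is a minimal sketch that reuses the URL from the example above. The hostname attribute is an extra convenience on the result object that the original example does not use.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')

# ParseResult is a named tuple: fields can be read by name or by index.
print(result.scheme)    # http
print(result[1])        # www.baidu.com
print(result.hostname)  # www.baidu.com

# Unpacking works as with any 6-item tuple.
scheme, netloc, path, params, query, fragment = result
print(path, params, query, fragment)  # /index.html user id=5 comment
```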

1.2. urlparse() with the scheme argument (the default scheme)

from urllib.parse import urlparse

result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)

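'''The input URL has no scheme, so the default scheme https is applied.
There is no leading //, so netloc is empty and the host ends up in path:
ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')'''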

2. If the input URL already includes a scheme, that scheme is used. Below, https is passed as the default, but the URL is still parsed as http.
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
'''
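The empty netloc in the first scheme example comes from the missing // in front of the host: urlparse only recognises a network location when it is introduced by //. The sketch below compares the two spellings. The shortened URLs are chosen for illustration and are not from the original examples.

```python
from urllib.parse import urlparse

# Without a leading '//' the host is treated as part of the path.
no_slashes = urlparse('www.baidu.com/index.html', scheme='https')

# With a leading '//' the host is recognised as netloc,
# and the default scheme is still applied.
with_slashes = urlparse('//www.baidu.com/index.html', scheme='https')

print(repr(no_slashes.netloc), no_slashes.path)      # '' www.baidu.com/index.html
print(repr(with_slashes.netloc), with_slashes.path)  # 'www.baidu.com' /index.html
```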

1.3. The allow_fragments argument of urlparse

# Demo 1:
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5#comment', fragment='')
'''

# Demo 2:
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html#comment', params='', query='', fragment='')
'''
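In both demos the # part is no longer split off when allow_fragments=False. It stays attached to whichever component precedes it (the query in demo 1, the path in demo 2), and fragment is returned as an empty string. A side-by-side sketch, using a shortened URL chosen for illustration:

```python
from urllib.parse import urlparse

url = 'http://www.baidu.com/index.html?id=5#comment'

default = urlparse(url)                      # allow_fragments defaults to True
kept = urlparse(url, allow_fragments=False)  # '#comment' is not split off

print(default.query, default.fragment)       # id=5 comment
print(kept.query, repr(kept.fragment))       # id=5#comment ''
```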

2. urlunparse: the inverse of urlparse

# 1. Parse a page URL with urlparse

from urllib.parse import urlparse

result = urlparse('https://www.baidu.com/s?wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse')
print(type(result), result)

'''Parse result:

<class 'urllib.parse.ParseResult'> ParseResult(scheme='https', netloc='www.baidu.com', path='/s', params='', query='wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse', fragment='')

'''

# 2. Rebuild the URL from the parts parsed above with urlunparse
from urllib.parse import urlunparse

data = ['https', 'www.baidu.com', '/s', '', 'wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse', '']
print(urlunparse(data))

'''urlunparse result:

https://www.baidu.com/s?wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse


'''
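urlunparse accepts any iterable of exactly six items, so the ParseResult returned by urlparse can be passed back directly. The sketch below, with a shortened query string chosen for illustration, round-trips a URL and then swaps one component with the named tuple's _replace method before rebuilding.

```python
from urllib.parse import urlparse, urlunparse

url = 'https://www.baidu.com/s?wd=urlparse&ie=utf-8'
parts = urlparse(url)

# Rebuilding from the unchanged parts gives back the original string here.
print(urlunparse(parts) == url)  # True

# ParseResult is a named tuple, so _replace returns a modified copy.
changed = parts._replace(query='wd=urljoin&ie=utf-8')
print(urlunparse(changed))       # https://www.baidu.com/s?wd=urljoin&ie=utf-8
```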

3. urljoin: combining a base URL with a second URL

The second URL takes precedence: any component it provides is kept, and any component it lacks is filled in from the first (base) URL.

from urllib.parse import urljoin

print(urljoin('http://www.baidu.com', 'FAQ.html'))
print(urljoin('http://www.baidu.com', 'https://cuiqingcai.com/FAQ.html'))
print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html'))
print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html?question=2'))
print(urljoin('http://www.baidu.com?wd=abc', 'https://cuiqingcai.com/index.php'))
print(urljoin('http://www.baidu.com', '?category=2#comment'))
print(urljoin('www.baidu.com', '?category=2#comment'))
print(urljoin('www.baidu.com#comment', '?category=2'))

'''Results:
http://www.baidu.com/FAQ.html
https://cuiqingcai.com/FAQ.html
https://cuiqingcai.com/FAQ.html
https://cuiqingcai.com/FAQ.html?question=2
https://cuiqingcai.com/index.php
http://www.baidu.com?category=2#comment
www.baidu.com?category=2#comment
www.baidu.com?category=2
'''
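A typical use of this rule is turning the links found on a page into absolute URLs. The following is a rough sketch: page_url and the entries in links are hypothetical values chosen to show a plain relative path, a root-relative path, a parent-directory path, and an already absolute URL.

```python
from urllib.parse import urljoin

# Hypothetical page address and links, for illustration only.
page_url = 'http://www.baidu.com/docs/index.html'
links = [
    'FAQ.html',                         # relative to the page's directory
    '/about.html',                      # relative to the site root
    '../img/logo.png',                  # one directory up
    'https://cuiqingcai.com/FAQ.html',  # already absolute, replaces the base
]

resolved = [urljoin(page_url, link) for link in links]
for absolute in resolved:
    print(absolute)
```

The four lines printed are http://www.baidu.com/docs/FAQ.html, http://www.baidu.com/about.html, http://www.baidu.com/img/logo.png, and https://cuiqingcai.com/FAQ.html.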

4. urlencode: converting a dict into GET request parameters

from urllib.parse import urlencode

params = {
    'name': 'germey',
    'age': 22
}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)

'''Result:
http://www.baidu.com?name=germey&age=22
'''
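urlencode also percent-encodes values that are not safe in a query string, which the plain ASCII example above does not show. The sketch below adds two made-up parameters (wd and q are illustrative names) and uses parse_qs, another function in urllib.parse, to decode the string back into a dict.

```python
from urllib.parse import urlencode, parse_qs

# 'wd' and 'q' are illustrative parameters, not part of the example above.
params = {'name': 'germey', 'wd': '中文', 'q': 'a b'}

query = urlencode(params)
print(query)            # name=germey&wd=%E4%B8%AD%E6%96%87&q=a+b

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(query)
print(decoded)          # {'name': ['germey'], 'wd': ['中文'], 'q': ['a b']}
```

Non-ASCII text is encoded as UTF-8 bytes by default, and spaces become + signs.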
