Usage of pyquery, a jQuery-like library for Python

阿新 • Published: 2020-01-09

This article uses examples to show how to use pyquery, a jQuery-like library for Python. It is shared here for your reference. Details follow:

pyquery: a jQuery-like library for Python

pyquery lets you make jQuery queries on XML documents, and its API is kept as close to jQuery's as possible. pyquery uses lxml for fast XML and HTML manipulation.

This is not (or at least not yet) a library for producing JavaScript code or interacting with it. The author simply liked the jQuery API and reimplemented it in Python.

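The project is hosted in a GitHub repository and is under active development. The author gives push access to anyone who wants to contribute and reviews their changes. If you want to contribute, email the project author.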

Bugs can be reported through the GitHub issue tracker.

Quickstart

You can use the PyQuery class to load an XML document from a string, an lxml document, a file, or a URL:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...        opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)

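Now d behaves like $ in jQuery. The following examples assume the loaded document contains <p id="hello" class="hello">Hello world !</p>: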

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> print(p.html())
Hello world !
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> print(p.html())
you know <a href="http://python.org/">Python</a> rocks
>>> print(p.text())
you know Python rocks

You can also use some pseudo-classes that are available in jQuery but are not standard CSS, such as :first, :last, :even, :odd, :eq, :lt, :gt, :checked, :selected and :file:

>>> d('p:first')
[<p#hello.hello>]

See http://pyquery.rtfd.org/ for the full documentation.

CSS

You can add, toggle, and remove CSS classes like this:

>>> p.addClass("toto")
[<p#hello.hello.toto>]
>>> p.toggleClass("titi toto")
[<p#hello.hello.titi>]
>>> p.removeClass("titi")
[<p#hello.hello>]

Or manipulate CSS styles:

>>> p.css("font-size", "15px")
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 15px'
>>> p.css({"font-size": "17px"})
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 17px'

The same thing can be done in a more Pythonic way ('_' characters are converted to '-'):

>>> p.css.font_size = "16px"
>>> p.attr.style
'font-size: 16px'
>>> p.css['font-size'] = "15px"
>>> p.attr.style
'font-size: 15px'
>>> p.css(font_size="16px")
[<p#hello.hello>]
>>> p.attr.style
'font-size: 16px'
>>> p.css = {"font-size": "17px"}
>>> p.attr.style
'font-size: 17px'

Using pseudo-classes:

:button

Matches all button input elements and all button elements

:checkbox

Matches all checkbox input elements

:checked

Matches all checked input elements

:child

The right-hand element is an immediate child of the left-hand one (as in the child combinator left > right)

:contains()

Matches all elements that contain the given text

:descendant

The right-hand element is a child, grandchild, or further descendant of the left-hand one (as in the descendant combinator left right)

:disabled

Matches all disabled elements

:empty

Matches all elements that do not contain other elements

:enabled

Matches all enabled elements

:eq()

Matches a single element by its index

:even

Matches even elements, zero-indexed

:file

Matches all input elements of type file

:first

Matches the first selected element

:gt()

Matches all elements with an index greater than the given one

:header

Matches all header elements (h1, ..., h6)

:image

Matches all image input elements

:input

Matches all input elements

:last

Matches the last selected element

:lt()

Matches all elements with an index below the given one

:odd

Matches odd elements, zero-indexed

:parent

Matches all elements that contain other elements

:password

Matches all password input elements

:radio

Matches all radio input elements

:reset

Matches all reset input elements

:selected

Matches all selected elements

:submit

Matches all submit input elements

:text

Matches all text input elements

Manipulation

You can also append content to the end of a tag:

>>> d = pq('<p class="hello" id="hello">you know Python rocks</p>')
>>> d('p').append(' check out <a href="http://reddit.com/r/python"><span>reddit</span></a>')
[<p#hello.hello>]
>>> print(d)
<p class="hello" id="hello">you know Python rocks check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>

Or prepend it to the beginning:

>>> p = d('p')
>>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>')
[<p#hello.hello>]
>>> print(p.html())
check out <a href="http://reddit.com/r/python">reddit</a>you know ...

Prepend or append an element into another element:

>>> d = pq('<html><body><div id="test"><a href="http://python.org">python</a> !</div></body></html>')
>>> p.prependTo(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<p class="hello" ...

Insert an element after another element:

>>> p.insertAfter(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<a href="http://python.org">python</a> !

Or insert it before another element:

>>> p.insertBefore(d('#test'))
[<p#hello.hello>]
>>> print(d('body').html())
<p class="hello" id="hello">...

Do something with each element:

>>> p.each(lambda i, e: pq(e).addClass('hello2'))
[<p#hello.hello.hello2>]

Remove an element:

>>> d = pq('<html><body><p id="id">Yeah!</p><p>python rocks !</p></body></html>')
>>> d.remove('p#id')
[<html>]
>>> d('p#id')
[]

Remove the contents of the selected elements:

>>> d('p').empty()
[<p>]

You can get the modified HTML back:

>>> print(d)
<html><body><p/></body></html>

You can generate HTML fragments:

>>> from pyquery import PyQuery as pq
>>> print(pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>'))
<div class="myclass">Yeah !</div><b>cool</b>

Remove all namespaces:

>>> d = pq('<foo xmlns="http://example.com/foo"></foo>')
>>> d
[<{http://example.com/foo}foo>]
>>> d.remove_namespaces()
[<foo>]

Traversing

Some jQuery traversal methods are also supported. Here are a few examples.

You can filter the selection list using a string selector:

>>> d = pq('<p id="hello" class="hello"><a/></p><p id="test"><a/></p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]

You can select a single element with eq:

>>> d('p').eq(0)
[<p#hello.hello>]

You can find nested elements:

>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]

You can also use end() to step back out of one level of traversal:

>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]

Web scraping

pyquery can also load an HTML document from a URL:

>>> pq(your_url)
[<html>]

By default it uses Python's urllib.

If requests is installed, pyquery uses it instead, and you can pass most of requests' arguments:

>>> pq(your_url, headers={'user-agent': 'pyquery'})
[<html>]
>>> pq(your_url, {'q': 'foo'}, method='post', verify=True)
[<html>]

pyquery – the complete PyQuery API is documented at http://pyquery.readthedocs.org/en/latest/api.html

pyquery.ajax – the PyQuery AJAX extension

If WebOb is installed (it is not a pyquery dependency), you can query WSGI apps. In this example, the test app returns a simple input at / and a submit button at /submit:

>>> d = pq('<form></form>', app=input_app)
>>> d.append(d.get('/'))
[<form>]
>>> print(d)
<form><input name="youyou" type="text" value=""/></form>

The app is also available in new nodes:

>>> d.get('/').app is d.app is d('form').app
True

You can also request another path:

>>> d.append(d.get('/submit'))
[<form>]
>>> print(d)
<form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form>

If restkit is installed, you can fetch a URL directly through a HostProxy app:

>>> a = d.get(your_url)
>>> a
[<html>]

You can access the app's response:

>>> print(a.response.status)
200 OK

Tips

You can make links absolute, which can be useful for screen scraping:

>>> d = pq(url=your_url, parser='html')
>>> d('form').attr('action')
'/form-submit'
>>> d.make_links_absolute()
[<html>]

Using different parsers

By default, pyquery uses the lxml XML parser and, if that fails, falls back to the HTML parser from lxml.html. The XML parser can sometimes be problematic when parsing XHTML pages, because it does not raise an error but produces an unusable tree (on w3c.org, for example).

You can also specify explicitly which parser to use:

>>> pq('<html><body><p>toto</p></body></html>', parser='xml')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
[<p>]

The html and html_fragments parsers both come from lxml.html.

I hope this article helps you with your Python programming.
