PyQuery Library Explained
阿新 • Published: 2018-06-16
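PyQuery is a powerful and flexible web page parsing library. If you find regular expressions too tedious to write and BeautifulSoup's syntax too hard to remember, but you already know jQuery syntax, PyQuery is an excellent choice.
Installation: pip3 install pyquery
Initialization
String initialization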
from pyquery import PyQuery as pq

html = '''
<div>
<ul>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link3.html'><span class='bold'>third item</span></a></li>
</ul>
</div>
'''
doc = pq(html)
print(doc('li'))
# Selection works the same way as CSS selectors: prefix a class with ".", an id with "#", and a tag name with nothing.

Output:
<li class="item-0">first item</li>
<li class="item-1"><a href="link3.html"><span class="bold">third item</span></a></li>
URL initialization
from pyquery import PyQuery as pq

doc = pq(url='http://www.baidu.com')
print(doc('head'))

Output:
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta content="always" name="referrer"/><link rel="stylesheet" type="text/css" href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css"/><title>百度一下,你就知道</title></head>

When a url is passed in, PyQuery requests that URL automatically and builds a PyQuery object from the page source it receives.
File initialization
from pyquery import PyQuery as pq

doc = pq(filename='1.html')
print(doc('ul'))

Output:
<ul>
<li class="item-0">first item</li>
<li class="item-1"><a href="link3.html"><span class="bold">third item</span></a></li>
</ul>
------------------------
Contents of 1.html:
<div>
<ul>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link3.html'><span class='bold'>third item</span></a></li>
</ul>
</div>
Basic CSS selectors:
from pyquery import PyQuery as pq

html = '''
<div id='container'>
<ul class='list'>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link2.html'>second item</a></li>
<li class='item-0 active'><a href='link3.html'><span class='bold'>third item</span></a></li>
<li class='item-1 active'><a href='link4.html'>fourth item</a></li>
<li class='item-0'><a href='link5.html'>fifth item</a></li>
</ul>
</div>
'''
doc = pq(html)
print(doc('#container .list li'))

Output:
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>

In a CSS selector, an id is prefixed with "#", a class is prefixed with ".", and a tag name takes no prefix. Parts separated by spaces select descendants, so '#container .list li' matches every li inside an element with class list inside the element with id container.
Finding elements
Finding child elements
Finding parent elements