Python Web Scraping: Using PyQuery
阿新 • Published: 2018-12-18
html = '''
<html>
<body>
<ul class="mh-col">
    <li class="g-ellipsis1">
        <a class="g-a-noline1" data-md='{"b":"list","p":"1-1"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E6%AD%8C%E6%89%8B%E9%AB%98%E7%A9%BA%E6%8B%8DMV%E5%9D%A0%E4%BA%A1&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> 歌手高空拍MV墜亡 </a>
        <span class="mh-ico-up"> </span>
    </li>
    <li class="g-ellipsis2">
        <a class="g-a-noline2" data-md='{"b":"list","p":"1-2"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E4%BD%9B%E7%A5%96%E6%9C%B1%E9%BE%99%E5%B9%BF%E9%87%91%E5%A9%9A&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D&fr=hao_360so_history_b" target="_blank"> 佛祖朱龍廣金婚 </a>
        <span class="mh-ico-down"> </span>
    </li>
    <li class="g-ellipsis3">
        <a class="g-a-noline3" data-md='{"b":"list","p":"1-3"}' href="http://www.so.com/link?url=http%3A%2F%2Fbaike.so.com%2Fzt%2Fzufangfangpian.html%3Fsrc%3Dreci&q=%E5%91%A8%E6%9D%B0%E4%BC%A6%E7%9A%84%E6%AD%8C&ts=1540365398&t=c6890eee0e669832ca96d5223582d6e" target="_blank"> 常見租房陷阱 </a>
        <span class="mh-ico-up"> </span>
    </li>
    <li class="g-ellipsis4">
        <a class="g-a-noline4" data-md='{"b":"list","p":"1-4"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E9%9D%B3%E4%B8%9C%E5%9B%9E%E5%BA%94%E5%8F%91%E9%94%99%E8%AF%97%E8%AF%8D&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> 靳東迴應發錯詩詞 </a>
    </li>
    <li class="g-ellipsis5">
        <a class="g-a-noline5" data-md='{"b":"list","p":"1-5"}'
           href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E7%85%A4%E8%80%81%E6%9D%BF%E4%BB%AC%E7%9A%84%E5%BD%B1%E8%A7%86%E6%B1%9F%E6%B9%96&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> 煤老闆們的影視江湖 </a>
    </li>
    <li class="g-ellipsis6">
        <a class="g-a-noline6" data-md='{"b":"list","p":"1-6"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=1024%E7%A8%8B%E5%BA%8F%E5%91%98%E8%8A%82&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> 1024程式設計師節 </a>
    </li>
    <li class="g-ellipsis7">
        <a class="g-a-noline7" data-md='{"b":"list","p":"1-7"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E7%BE%8E%E7%9A%84%E5%90%88%E5%B9%B6%E5%B0%8F%E5%A4%A9%E9%B9%85&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> 美的合併小天鵝 </a>
    </li>
    <li class="g-ellipsis8">
        <a class="g-a-noline8" data-md='{"b":"list","p":"1-8"}' href="https://www.so.com/s?ie=utf-8&src=know_side_nlp_sohot&q=%E4%BA%AC%E6%98%86%E9%AB%98%E9%80%9F4%E8%BD%A6%E7%9B%B8%E6%92%9E&ob_ext=%7B%22rsv_cq%22%3A%22%5Cu5468%5Cu6770%5Cu4f26%5Cu7684%5Cu6b4c%22%2C%22rt%22%3A%22%5Cu5b9e%5Cu65f6%5Cu70ed%5Cu641c%22%2C%22rclken%22%3Anull%7D" target="_blank"> <p>新聞</p> 京昆高速4車相撞 </a>
    </li>
</ul>
</body>
</html>
'''

from pyquery import PyQuery as pq

doc = pq(html)
items = doc('.mh-col')

# .find(): look up nested (descendant) elements
alist = items.find('li a')
print(alist)

# Look up all direct child elements
alist2 = items.children()
print(alist2)

# Look up only the child elements that match a selector
alist3 = items.children('.g-ellipsis1')
print(alist3)

# Look up the parent element
# Note: an element has exactly one parent
body = items.parent()
print(body)

# Look up all ancestor elements
content = items.parents()
print(content)

# Look up sibling elements
li = doc('.mh-col .g-ellipsis1')
print(li.siblings())

# Iterate over the matches one element at a time
# Here: loop over every a tag; .items() yields each match as its own PyQuery object
alist = doc('.mh-col li a').items()
for a in alist:
    print(a)