【整理】【轉載】爬蟲相關

阿新 • • 發佈：2018-12-20

（1）抓取小說--轉

import requests
import re
from bs4 import BeautifulSoup

if __name__=='__main__':
    headers={"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0'}
   # url='http://z8.cnzz.com/stat.htm?id=1273371515&r=&lg=zh-cn&ntime=1543752090&cnzz_eid=1445241018-1543752090-&showp=1280x720&t=最強惡魔妖孽系統_最強惡魔妖孽系統最新章節_最強惡魔妖孽系統最新章節列表_全書網&umuuid=1676ee2c2530-045ac19542698d8-4c312979-e1000-1676ee2c25463&h=1&rnd=1980725067'
    url='http://www.shushu8.com/shaolinbajue/'
    r= requests.get(url,headers=headers)#.content
    r.encoding=r.apparent_encoding

    #print(r.text)
    d=BeautifulSoup(r.text,'lxml')
    t=d.select('.clearfix > ul > li > a')
   # print(t)
    for i in t:
        deta={'href': i.get('href'),
              '標題': i.get_text()}
        urlk='http://www.shushu8.com'+deta['href']
        jsu=requests.get(urlk)
        jsu.encoding=jsu.apparent_encoding
        di=BeautifulSoup(jsu.text,'lxml')
        ti=di.select('div.page-content')

        for k in ti:
            print(k.get_text())

（2）抓取網頁圖片

https://blog.csdn.net/caozewei/article/details/82497388

1、根據給定的網址獲取網頁原始碼

2、利用正則表示式把原始碼中的圖片地址過濾出來

3、根據過濾出來的圖片地址下載網路圖片

import re
import urllib.request
def gethtml(url):
    page=urllib.request.urlopen(url)
    html=page.read()
    return html
def getimg(html):
    reg = r'src="(.*?\.jpg)"'
    img=re.compile(reg)
    html=html.decode('utf-8')#python3
    imglist=re.findall(img,html)
    x = 0
    for imgurl in imglist:
        urllib.request.urlretrieve(imgurl,'%s.jpg'%x)
        x = x+1
html=gethtml("http://news.ifeng.com/a/20161115/50258273_0.shtml")
print(getimg(html))

把程式碼直接匯入直譯器，可直接執行抓取圖片

（3）其他

https://blog.csdn.net/sinat_37390744/article/details/55533360

https://blog.csdn.net/sinat_37390744/article/details/55670553

https://blog.csdn.net/qq_32252957/article/details/78997021

https://blog.csdn.net/qq_32252957/article/details/78441293

https://blog.csdn.net/qq_32252957/article/details/78961867

【整理】【轉載】爬蟲相關

（1）抓取小說--轉

（2）抓取網頁圖片

【個人總結 | 個人轉載】——Java時間工具類

IOS開發系列——APP間相互呼叫專題【整理，部分原創】

【整理】【轉載】爬蟲相關

【轉載】張鑫旭對知乎前端相關問題的十問十答

【轉載】國外程序員整理的Java資源大全

python爬蟲beautifulsoup4系列3【轉載】

【轉載】WebService相關概念

【轉載】koa相關知識（來自官網）

【轉載】CPU相關總結

【轉載】【python3.x爬蟲】設定IP代理

【轉載】Linux記憶體管理與相關概念

【轉載】CentOS7為firewalld新增開放埠及相關操作

【轉載】Power Query-大量複雜資料的整理彙總

【整理】【轉載】python3一些問題：如安裝Python3報錯解決，使用IDLE等

thoughtworks筆試整理【轉載】

【整理】Android螢幕適配相關

【整理轉載】四月股票市場分析與覆盤

【爬蟲相關】爬蟲爬取拉勾網的安卓招聘資訊

【轉載】SVN常見問題及相關原因，供各位查閱

寫論文的時候會經常使用到的技巧、網站、工具整理【轉載】

【整理】【轉載】爬蟲相關

（1） 抓取小說--轉

（2） 抓取網頁圖片

相關推薦

（1）抓取小說--轉

（2）抓取網頁圖片