A simple Python scraper: fetching a page's string content and saving it
阿新 • Published: 2019-02-04
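I recently wanted to try out Python's scraping libraries, so I picked a page that contains nothing but string data to scrape. The URL is the one passed to getHtml in the code below.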
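Opening it shows song names, hashes, and other information. The goal is to store them in a file as hash|filename lines. Here is the code first (written for Python 2):

    #coding=utf-8
    import urllib
    import re
    import os

    def getHtml(url):
        # urllib.urlopen is the Python 2 API; Python 3 uses urllib.request.urlopen
        page = urllib.urlopen(url)
        html = page.read()
        return html

    def getHash(html):
        # First pass: write every hash, followed by "|", to a temporary file
        reg = r'"hash":"(.+?)",'
        pattern = re.compile(reg)
        hashlist = re.findall(pattern, html)
        # Binary mode so that "\r\n" is written as-is on every platform
        with open('1.txt', 'wb') as f:
            for h in hashlist:
                f.write(h + "|" + "\r\n")

    def getName(html):
        # Second pass: append the n-th filename to the n-th line of 1.txt
        reg = r'"filename":"(.+?)",'
        pattern = re.compile(reg)
        namelist = re.findall(pattern, html)
        with open('1.txt', 'rb') as fr:
            with open('2.txt', 'wb') as fw:
                for name in namelist:
                    line = fr.readline()
                    fw.write(line.replace(b'\r\n', name + b'\r\n'))

    html = getHtml("http://mobilecdn.kugou.com/api/v3/special/song?plat=0&page=1&pagesize=-1&version=7993&with_res_tag=1&specialid=26430")
    getHash(html)
    getName(html)
    # The result is in 2.txt; 1.txt was only an intermediate file
    os.remove('1.txt')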
It is fairly simple: fetch the HTML page, run two regular-expression passes over its content, and save the results to a txt file.
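
The two regex passes and the intermediate file can probably be folded into one step. Below is a minimal Python 3 sketch of that idea, not the original code. It reuses the same two patterns and assumes the n-th hash belongs to the n-th filename, as the original does. The names fetch_text, extract_pairs and save_pairs, the UTF-8 decoding, and the sample string are all assumptions made for illustration.

```python
import re
import urllib.request

# Same patterns as in the post's code.
HASH_RE = re.compile(r'"hash":"(.+?)",')
NAME_RE = re.compile(r'"filename":"(.+?)",')


def fetch_text(url, encoding="utf-8"):
    """Download a page and decode it to str (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode(encoding)


def extract_pairs(text):
    """Return (hash, filename) tuples, pairing the n-th hash with the n-th filename."""
    hashes = HASH_RE.findall(text)
    names = NAME_RE.findall(text)
    # zip stops at the shorter list, so an unmatched trailing item is dropped
    return list(zip(hashes, names))


def save_pairs(pairs, path):
    """Write one 'hash|filename' line per pair, ending each line with CRLF."""
    # newline="" stops Python from translating the explicit "\r\n"
    with open(path, "w", encoding="utf-8", newline="") as f:
        for file_hash, filename in pairs:
            f.write(file_hash + "|" + filename + "\r\n")


if __name__ == "__main__":
    # Made-up sample in the shape the patterns expect; not real API output.
    sample = ('{"hash":"abc123","filename":"Artist A - Song One","size":1},'
              '{"hash":"def456","filename":"Artist B - Song Two","size":2}')
    print(extract_pairs(sample))
```

To run it against the real page, pass the result of fetch_text(url) to extract_pairs and then call save_pairs(pairs, '2.txt'). Pairing by position is only safe if every entry in the response carries both fields in the same order; that is worth checking on the actual data.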