Base64 encoding and bs4

阿新 • Published: 2017-10-19


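BeautifulSoup usage:

    r = requests.get('http://www.qq.com/').text
    soup = BeautifulSoup(r, 'lxml')

Example:

    t = soup.find_all(class_='aabbcc', text=re.compile(r'\w'))[1].string.strip()

Inside find, if only the text argument is given (newer bs4 releases also accept it under the name string), the matching text nodes are returned; if a tag name or an attribute filter is given as well, the tags are returned. For the class attribute you can write the value xxx directly after the tag name, or add an underscore and write class_='xxx', or pass a dict {'class': 'xxx'}. Attribute names that contain - or other punctuation also use the dict form.

'a' searches for the tag a, while a='xxx' filters on the attribute a. The accepted filter forms are str, regular expression, list, and True. For example, a tags that have no src attribute: soup.find_all('a', src=False). All a tags plus every tag whose name starts with t: soup.find_all(['a', re.compile('^t')]). Without the [ ], soup.find_all('a', 'xxx') instead selects a tags whose class is xxx.

find is the default method: with no attribute filters, soup.find('div') can be shortened to soup.div. attrs is the default attribute of the object that find returns, so x.attrs['href'] equals x['href'].

string only returns a tag's own single text node; it cannot collect text from several descendants and returns None when the tag has more than one child. stripped_strings and the method get_text('\n', strip=True) both collect all the text under the object; the former returns a generator, the latter a str. soup.select(), which locates elements by CSS selector, works on any of these objects.

All tags that have a class attribute but no id attribute:

    def has_class_but_no_id(tag):
        return tag.has_attr('class') and not tag.has_attr('id')

    soup.find_all(has_class_but_no_id)

w = tagx.tagy.extract() removes y from x and assigns it to w, similar to the pop method of a list.

    t = soup.find_all(class_='login-container')[0]
    t1 = t.find('a', class_='item login').extract().get_text().strip()
    print(t.get_text().strip())
    print(t1)

****************************************divider****************************************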
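The BeautifulSoup search forms described above can be tried offline with a small sketch. The markup below is made up for illustration, and the built-in 'html.parser' is used instead of 'lxml' only so that no extra parser needs to be installed; neither choice comes from the original post.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only to exercise the filters.
HTML = (
    '<div class="box" id="main">'
    '<a href="/x" class="nav">Home</a><a data-role="tab">Tab</a></div>'
    '<table><tr><td class="note"> first <b>bold</b> </td></tr></table>'
)
soup = BeautifulSoup(HTML, "html.parser")

# Tag name followed by a bare string: a tags whose class is "nav".
nav_links = soup.find_all("a", "nav")

# List filter mixing a str and a regex: a tags plus tags whose name starts with "t".
names = [tag.name for tag in soup.find_all(["a", re.compile("^t")])]

# attribute=False: a tags that do not carry an href attribute.
no_href = soup.find_all("a", href=False)

# Dict form, needed for attribute names that contain "-".
tabs = soup.find_all("a", {"data-role": True})

# Function filter: class present, id absent.
def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")

class_only = [tag.name for tag in soup.find_all(has_class_but_no_id)]

# td holds text plus a <b> child, so .string is None while get_text collects everything.
td = soup.find("td")
td_string = td.string
td_text = td.get_text("\n", strip=True)

# extract() detaches <b> from td and hands it back, much like list.pop.
bold = td.b.extract()
remaining = td.get_text(strip=True)
```

After running it, nav_links[0]['href'] and nav_links[0].attrs['href'] give the same value, and td no longer contains the extracted b tag.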

Base64 encoding:

    # The src of a captcha image commonly uses the Data URI scheme:
    from io import BytesIO
    from PIL import Image
    import base64, requests

    url = 'https://my.fengjr.com/api/v2/captcha?_ts=35045549418.92857'
    sourceCode = requests.get(url).json()['captcha']
    parseCode = sourceCode.replace('data:image/png;base64,', '')
    imgData = BytesIO(base64.b64decode(parseCode))
    # print(imgData.getvalue().decode())  # for displaying base64-encoded JS, CSS, or HTML code
    Image.open(imgData).show()
    # Use BytesIO to read and write bytes in memory, and StringIO for str

    # data:,                            text data
    # data:text/plain,                  text data
    #
    # data:text/css,                    CSS code
    # data:text/css;base64,             base64-encoded CSS code
    # data:text/html,                   HTML code
    # data:text/html;base64,            base64-encoded HTML code
    # data:text/javascript,             JavaScript code
    # data:text/javascript;base64,      base64-encoded JavaScript code
    #
    # data:image/gif;base64,            base64-encoded GIF image data
    # data:image/png;base64,            base64-encoded PNG image
    # data:image/jpeg;base64,           base64-encoded JPEG image
    # data:image/x-icon;base64,         base64-encoded icon image

****************************************divider****************************************
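The Data URI prefixes listed above all follow the same shape, data:[media type][;base64],payload, so the prefix can be parsed instead of stripped with a hard-coded replace. The following is a minimal sketch using only the standard library; the helper name decode_data_uri and the sample CSS string are illustrative assumptions, not part of the original script.

```python
import base64
from io import BytesIO
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data URI into (media_type, payload_bytes). Hypothetical helper."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    is_base64 = header.endswith(";base64")
    media_type = header[:-len(";base64")] if is_base64 else header
    if is_base64:
        data = base64.b64decode(payload)
    else:
        # Non-base64 payloads are percent-encoded text.
        data = unquote_to_bytes(payload)
    # RFC 2397 default when the media type is omitted.
    return media_type or "text/plain;charset=US-ASCII", data


# Made-up sample: a base64-encoded CSS snippet.
sample = "data:text/css;base64," + base64.b64encode(b"body{margin:0}").decode("ascii")
media_type, data = decode_data_uri(sample)
buffer = BytesIO(data)  # same in-memory wrapper the captcha example hands to PIL
print(media_type, buffer.getvalue().decode())
```

For an image payload the returned bytes could be wrapped in BytesIO and passed to Image.open exactly as in the captcha code.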

tkinter window:

    from tkinter import *
    from tkinter import messagebox
    import requests, re
    from io import BytesIO
    from PIL import Image

    def download():
        startUrl = 'http://www.uustv.com/'
        name = entry.get()
        if not name:
            messagebox.showinfo('Notice', 'Please enter a name!')
            return  # nothing to submit without a name
        data = {'word': name, 'sizes': '60', 'fonts': 'jfcs.ttf', 'fontcolor': '#000000'}
        response = requests.post(url=startUrl, data=data)
        response.encoding = 'utf8'
        # Raw string with an escaped dot so that only real .gif paths match
        pic = re.findall(r'tmp/\d+?\.gif', response.text)[0]
        imgUrl = startUrl + pic
        print(imgUrl)
        imgData = requests.get(imgUrl).content
        imgData = BytesIO(imgData)
        Image.open(imgData).show()
        # Wait 1 second so the pressed button can pop back up before closing the window

    root = Tk()
    root.title('Custom signature')
    root.geometry('480x360+600+300')
    Label(root, text='hello,python', font=('華文中宋', 20), background='yellow').grid()
    entry = Entry(root, font=('微軟雅黑', 20))
    entry.grid(row=0, column=1)
    Button(root, text='Design signature', font=20, width=20, height=1, command=download).grid(row=2, column=1)
    root.mainloop()

****************************************divider****************************************
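In the tkinter script, the image path is pulled out of the response with a regular expression. The short sketch below, run on a made-up HTML fragment rather than a real response from the site, shows why the dot before gif is worth escaping: an unescaped dot matches any character.

```python
import re

# Hypothetical fragment; the second path is deliberately not a .gif file.
sample_html = '<img src="tmp/150837612345.gif"> <a href="tmp/123xgif">'

# Unescaped dot: "." matches any character, so "tmp/123xgif" slips through.
loose = re.findall(r"tmp/\d+?.gif", sample_html)

# Escaped dot: only paths that really end in ".gif" match.
strict = re.findall(r"tmp/\d+?\.gif", sample_html)

print(loose)
print(strict)
```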

Scraping jokes from Qiushibaike (糗事百科):

    import requests
    from bs4 import BeautifulSoup

    page = 1
    url = 'http://www.qiushibaike.com/text/page/' + str(page)
    headers = {'User-Agent': 'Mozilla/4.0'}
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'lxml')
    jokes = soup.select('.content > span')
    with open('糗事百科.txt', 'w', encoding='utf8') as f:
        for item in jokes:
            f.write('\n'.join(item.stripped_strings))
            # f.write(item.get_text('\n', strip=True))
            f.write('\n-----------divider---------\n')

****************************************divider****************************************

Downloading the public chapters of a novel on Qidian (起點):

    import requests, re
    from bs4 import BeautifulSoup

    def getSourceCode(url):
        html = requests.get(url).text
        html = BeautifulSoup(html, 'lxml')
        return html

    def getBookName(html):
        author = html.find('a', href=re.compile('//me.qidian.com/authorIndex.+')).text
        bookName = html.find('a', 'act').text  # the attribute name can be omitted when it is class
        return author + '：' + bookName

    def getContent(html):
        title = html.find('h3', 'j_chapterName').text
        content = html.find('div', 'read-content j_readContent').get_text('\n')
        content = content.replace('　　　　', '　　').replace(' ', '')
        return title + content

    def getNextChapter(html):
        # Attribute names containing - or other punctuation need the {} dict form
        return 'https:' + html.find('div', {'data-nurl': True})['data-nurl']

    def main(url):
        html = getSourceCode(url)
        content = getContent(html)
        bookName = getBookName(html)
        with open('%s.txt' % bookName, 'a', encoding='utf8') as f:
            # Qidian VIP chapters expose only two or three hundred characters, so stop there
            while len(content) > 500:
                print(len(html), len(content), content.split('\n')[0])
                f.write(content)
                nextChapterUrl = getNextChapter(html)
                html = getSourceCode(nextChapterUrl)
                content = getContent(html)

    if __name__ == '__main__':
        u = 'https://read.qidian.com/chapter/6JSeRXxo8g01/sLuDHoqD3wIex0RJOkJclQ2'
        main(u)  # start crawling from the URL of the novel's first chapter
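The chapter loop in the Qidian script (keep following the next-chapter link until the text drops to 500 characters or fewer) can be separated from the network code so it is testable offline. This is a sketch under assumptions: crawl_chapters, the fetch callback, and the fake pages dict are hypothetical names introduced here, and the guard against revisiting a URL is an addition the original script does not have.

```python
def crawl_chapters(start_url, fetch, min_length=500):
    """Yield chapter texts, following next links until a short preview chapter.

    fetch is a hypothetical callback: fetch(url) -> (content, next_url or None).
    """
    url = start_url
    seen = set()
    while url and url not in seen:  # also stops on a missing or repeated link
        seen.add(url)
        content, next_url = fetch(url)
        if len(content) <= min_length:  # same cutoff as the 500-character check above
            break
        yield content
        url = next_url


# Made-up pages standing in for real chapter URLs.
fake_pages = {
    "c1": ("a" * 600, "c2"),
    "c2": ("b" * 700, "c3"),
    "c3": ("c" * 200, "c4"),  # short preview, treated as a VIP chapter
    "c4": ("d" * 900, None),
}

chapters = list(crawl_chapters("c1", fake_pages.__getitem__))
print(len(chapters))
```

In the real script, fetch would wrap getSourceCode, getContent, and getNextChapter, and the caller would write each yielded chapter to the output file.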
