
Using BeautifulSoup to generate blog post summaries and filter malicious tags (XSS attacks)


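I. The BeautifulSoup module
II. Blog post summaries
III. Filtering malicious tags

I. The BeautifulSoup module

```shell
pip install bs4  # installs Beautiful Soup (the official PyPI name is beautifulsoup4; bs4 is a placeholder package that depends on it)
```

```python
from bs4 import BeautifulSoup  # import the BeautifulSoup class
```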
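II. Blog post summaries

Strip the HTML tags from the post body and use the first few characters of the remaining text as the summary:

```python
from bs4 import BeautifulSoup

content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(content, 'html.parser')

# soup.text is the text with all tags removed: 'I linked to example.com'
overview = soup.text[0:9]  # the first 9 characters become the summary
print(overview)
```

III. Filtering malicious tags

Parse the submitted HTML, then delete every tag whose name is on a blacklist (here `script` and `link`) together with its contents:

```python
from bs4 import BeautifulSoup

content = '<a href="http://example.com/">I linked to <i>example.com</i></a><div><img src=""></img>image</div><a>link</a><script>alert(123)</script>'
soup = BeautifulSoup(content, 'html.parser')
print(soup)  # the <script> tag and its code are still here

# 'link' matches <link> elements; the <a>link</a> above is an <a> tag and is kept
for tag in soup.find_all():
    if tag.name in ['script', 'link']:
        tag.decompose()  # remove the tag and everything inside it

print(soup)  # the <script> tag and its code have been removed
```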
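The blacklist in section III removes only the tags it names. Many XSS payloads avoid `<script>` entirely, for example `<img src=x onerror=...>` or `<a href="javascript:...">`, and these pass through it unchanged. A common alternative is a whitelist that keeps only known tags and attributes and drops everything else. The sketch below illustrates the idea under assumed policy choices. The names `sanitize`, `ALLOWED`, `DROP_ENTIRELY`, `URL_ATTRS` and `SAFE_SCHEMES` are hypothetical. The policy keeps only `http://`/`https://` URLs and unwraps unknown tags instead of deleting their text. It is not a complete sanitizer, and for production use a dedicated, maintained HTML-sanitizing library is the safer choice.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical whitelist policy (an assumption; adjust per site):
# tag name -> set of attributes allowed on that tag.
ALLOWED = {
    "a": {"href"},
    "b": set(),
    "i": set(),
    "p": set(),
    "div": set(),
    "img": {"src"},
}
# Tags removed together with everything inside them.
DROP_ENTIRELY = ["script", "style"]
# Attributes that hold URLs; only these schemes are kept (assumed policy).
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = ("http://", "https://")


def sanitize(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")

    # 1. Remove script/style blocks including their contents.
    for tag in soup.find_all(DROP_ENTIRELY):
        tag.decompose()

    # 2. Remove HTML comments.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    # 3. Unwrap tags not on the whitelist (keep their text) and
    #    strip every attribute the policy does not allow.
    for tag in soup.find_all(True):
        allowed_attrs = ALLOWED.get(tag.name)
        if allowed_attrs is None:
            tag.unwrap()  # drops the tag itself, keeps its children
            continue
        kept = {}
        for name, value in tag.attrs.items():
            if name not in allowed_attrs:
                continue  # e.g. onclick, onerror, style
            if name in URL_ATTRS and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript:, data:, empty or relative URLs
            kept[name] = value
        tag.attrs = kept

    return str(soup)
```

Run on the section III sample, this keeps the `<a>`, `<i>`, `<div>` and `<img>` elements. It drops the `<script>` block and the empty `src`. Because only absolute `http(s)` URLs survive, relative links such as `/posts/1` are also removed, which is a deliberate trade-off of this policy.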
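Slicing `soup.text[0:9]` in section II counts characters. On English text it can therefore stop in the middle of a word, and it gives no sign that the text was shortened. The helper below is a hedged sketch of one refinement: it collapses whitespace, cuts at the last space within the limit, and appends `...`. If the window contains no space, as with Chinese text, it falls back to a plain character cut. The name `make_summary`, the default `limit`, and the `...` suffix are illustrative choices, not part of the original.

```python
from bs4 import BeautifulSoup


def make_summary(html: str, limit: int = 20) -> str:
    """Return at most `limit` characters of the post's text, plus '...' if it was cut."""
    soup = BeautifulSoup(html, "html.parser")
    # Join text fragments with spaces and collapse runs of whitespace.
    text = " ".join(soup.get_text(" ", strip=True).split())
    if len(text) <= limit:
        return text
    # Look one character past the limit so a word ending exactly at the limit is kept.
    window = text[: limit + 1]
    space = window.rfind(" ")
    if space > 0:
        cut = window[:space]  # end at a word boundary
    else:
        cut = text[:limit]  # no space available (e.g. Chinese): plain character cut
    return cut + "..."
```

On the section II sample, `make_summary(content, 9)` returns `'I linked...'`, while a limit larger than the text returns the whole string unchanged.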
