Using BeautifulSoup to generate post summaries and filter malicious tags (XSS attacks)
阿新 • Published: 2018-09-11
1. The BeautifulSoup module
2. Post summaries
3. Filtering malicious tags
1. The BeautifulSoup module
pip install beautifulsoup4  # install BeautifulSoup (the package is imported as bs4)
from bs4 import BeautifulSoup  # import the BeautifulSoup class
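A quick way to confirm the install works, and to see the basic navigation API used in the rest of this post, is to parse a small fragment. This is a minimal sketch. The HTML string is only sample input, and "html.parser" is Python's built-in parser, so no extra dependency such as lxml is needed.

```python
from bs4 import BeautifulSoup

# Parse a small HTML fragment with Python's built-in parser.
html = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(html, 'html.parser')

link = soup.a            # first <a> tag in the document
print(link['href'])      # attribute access by key: http://example.com/
print(soup.i.string)     # text inside the <i> tag: example.com
print(soup.get_text())   # all text with the markup stripped
```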
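2. Post summaries
A post summary is simply the first few characters of the post's text with all HTML tags stripped:
from bs4 import BeautifulSoup
content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(content, 'html.parser')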
overview = soup.text[0:9]  # soup.text is the text without tags; keep the first 9 characters
print(overview)  # prints "I linked "
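Slicing soup.text counts raw characters, so any newlines or indentation left over from the markup end up in the summary, and a reader gets no hint that the text was cut. The variant below collapses whitespace and appends an ellipsis when it truncates. It is a sketch under assumptions: the helper name `make_summary`, the default limit of 50 and the trailing "..." are illustrative choices and do not come from the original post.

```python
from bs4 import BeautifulSoup


def make_summary(html: str, limit: int = 50) -> str:
    """Return at most `limit` characters of the post's plain text.

    Hypothetical helper: name, default limit and "..." suffix are assumptions.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # get_text() drops all tags; split()/join collapses runs of whitespace
    # (newlines, indentation) left behind by the markup.
    text = ' '.join(soup.get_text().split())
    if len(text) <= limit:
        return text
    # Cut at the limit, trim a dangling space, and mark the truncation.
    return text[:limit].rstrip() + '...'


post = '<p>I linked to <i>example.com</i></p>\n<p>Second paragraph.</p>'
print(make_summary(post, limit=12))
```

With limit=12 the cut falls just after "to ", so rstrip() removes the trailing space and the summary reads "I linked to...".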
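3. Filtering malicious tags
HTML submitted by users may contain a <script> tag. If that HTML is rendered as-is, the script runs in other visitors' browsers (a cross-site scripting, or XSS, attack). The code below removes such tags before the HTML is saved or displayed: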
from bs4 import BeautifulSoup
content = '<a href="http://example.com/">I linked to <i>example.com</i></a><div><img src=""></img>image</div><a>link</a><script>alert(123)</script>'
soup = BeautifulSoup(content, 'html.parser')
print(soup)  # the output still contains the <script> tag
# Remove every <script> and <link> tag, including everything inside it.
for tag in soup.find_all(['script', 'link']):
    tag.decompose()
print(soup)  # the <script> tag and its code are gone
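The code above uses a blocklist: it removes only the tags it names. A blocklist misses payloads that live in attributes, for example an onerror handler on an <img> or a javascript: URL in a link. The sketch below swaps in a different technique, an allow-list, where only listed tags and attributes survive. The policy tables (`ALLOWED_ATTRS`, `DROP_WITH_CONTENT`) and the helper name `sanitize` are hypothetical choices made for this example. The sketch is not a complete XSS defense. Production code should rely on a maintained HTML-sanitizer library and on escaping output.

```python
from bs4 import BeautifulSoup

# Hypothetical allow-list policy: tag name -> attributes it may keep.
# Any tag not listed is unwrapped (the tag is removed, its text is kept).
ALLOWED_ATTRS = {
    'a': {'href'},
    'b': set(),
    'i': set(),
    'p': set(),
    'div': set(),
    'img': {'src', 'alt'},
}
# Tags deleted together with everything inside them.
DROP_WITH_CONTENT = ['script', 'style', 'iframe']


def sanitize(html: str) -> str:
    soup = BeautifulSoup(html, 'html.parser')

    # 1) Remove dangerous elements outright. Searching the live tree again
    #    after each decompose() avoids touching already-destroyed tags.
    tag = soup.find(DROP_WITH_CONTENT)
    while tag is not None:
        tag.decompose()
        tag = soup.find(DROP_WITH_CONTENT)

    # 2) Walk a snapshot of the remaining tags and enforce the allow-list.
    for tag in list(soup.find_all(True)):
        allowed = ALLOWED_ATTRS.get(tag.name)
        if allowed is None:
            tag.unwrap()  # drop the unknown tag but keep its children
            continue
        # Keep only allow-listed attributes (drops onclick, onerror, style, ...).
        tag.attrs = {k: v for k, v in tag.attrs.items() if k in allowed}
        # Reject javascript: URLs (a simple prefix check, not exhaustive).
        for attr in ('href', 'src'):
            value = tag.get(attr)
            if isinstance(value, str) and value.strip().lower().startswith('javascript:'):
                del tag[attr]

    return str(soup)


dirty = ('<a href="javascript:alert(1)" onclick="steal()">click</a>'
         '<img src="a.png" onerror="alert(1)">'
         '<span>kept text</span><script>alert(123)</script>')
print(sanitize(dirty))
```

For the `dirty` string, the link loses both its javascript: URL and its onclick handler. The image keeps only src. The <span> wrapper is dropped while its text stays, and the script disappears entirely.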