A Rookie's Python Crawler, Part 1 (TF)
阿新 • Published: 2018-12-13
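This is a record of a beginner learning to write web crawlers.
The code below is very simple: it crawls the names of the popular lipsticks listed on the TF official website.
It uses only the most basic libraries, BeautifulSoup and requests.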
# A simple script for crawling information about the popular TF lipsticks
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.tom-ford.cn/'
data = {}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.get(url, headers=headers)
html_doc = response.content  # raw bytes of the TF home page
# print(response.status_code)  # status code
# print(response.content.decode("utf-8"))  # page content
soup = BeautifulSoup(
    html_doc,
    'html.parser',
    from_encoding='utf-8'  # encoding of the HTML document
)
# Product links are the <a> tags whose href contains "goods-"
TF_type = soup.find_all('a', href=re.compile(r"goods-"))
for tf_type in TF_type:
    # print(tf_type.name, tf_type['href'], tf_type.get_text())
    print(tf_type.get_text())
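To see what the `find_all` call with `href=re.compile(r"goods-")` actually matches, it can help to run the same filter on a small piece of HTML without touching the network. The sketch below is only an illustration: `SAMPLE_HTML`, the product names in it, and the helper `extract_goods` are all made up for this example, and the real page markup may look quite different. It also strips whitespace and skips repeated links, which the original script does not do.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example;
# the structure of the real site may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="/goods-101.html">Sample Lipstick A</a></li>
  <li><a href="/goods-102.html">  Sample Lipstick B </a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/goods-101.html">Sample Lipstick A</a></li>
</ul>
"""


def extract_goods(html):
    """Return unique (name, href) pairs for links whose href contains 'goods-'."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    results = []
    # A compiled regex passed as an attribute filter is matched with search(),
    # so "goods-" may appear anywhere in the href.
    for link in soup.find_all("a", href=re.compile(r"goods-")):
        name = link.get_text(strip=True)  # drop surrounding whitespace
        href = link["href"]
        if name and href not in seen:  # skip empty names and repeated links
            seen.add(href)
            results.append((name, href))
    return results


if __name__ == "__main__":
    for name, href in extract_goods(SAMPLE_HTML):
        print(name, href)
```

The same helper could be fed `response.content` from the script above; whether the deduplication is wanted depends on how the real page repeats its links.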