Scraping web page information with Python
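I. A quick introduction to HTML pages
1. Recommended browser:
Use Chrome. Its Inspect (Inspect Element) panel shows the page's HTML code and CSS styles.
2. What a web page is made of:
A web page has three main parts: JavaScript provides the behavior, HTML the structure, and CSS the styling. A static page saved locally usually consists of HTML + images + CSS.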
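3. Common tags and structure
<div></div>: divides the page into sections
<div class="aasdf"></div>: the class attribute ties the element to a CSS style
<p>wowiji</p>: a paragraph of text
<li></li>: a list item
<img>: an image
<h1></h1> ... <h6></h6>: six levels of headings, each with a different font size
<a href=""></a>: a hyperlink
Tags can be nested inside one another.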
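To see nesting concretely, here is a small sketch that uses Python's built-in html.parser module (not BeautifulSoup) to print each start tag indented by its nesting depth. TagTreePrinter and VOID_TAGS are illustrative names, and the void-tag set is deliberately incomplete.

```python
from html.parser import HTMLParser

# Void elements such as <img> have no closing tag, so they must not increase the depth.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class TagTreePrinter(HTMLParser):
    """Records every start tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs_text = "".join(f' {k}="{v}"' for k, v in attrs)
        self.lines.append("  " * self.depth + f"<{tag}{attrs_text}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


snippet = """
<div class="header">
  <img src="images/blah.png">
  <ul class="nav">
    <li><a href="#">Home</a></li>
  </ul>
</div>
"""

printer = TagTreePrinter()
printer.feed(snippet)
print("\n".join(printer.lines))
```

The indentation in the printed result mirrors the nesting: the <a> sits inside <li>, which sits inside <ul>, which sits inside <div>.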
4. Hands-on: building a web page
Tool: PyCharm
Files: sample.html and main.css
Main layout: header (title bar + navigation bar), content (main body), footer
5. Rendered page
6. HTML source code
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>The blah</title>
<link rel="stylesheet" type="text/css" href="main.css">
</head>
<body>
<div class="header">
<img src="images/blah.png">
<ul class="nav">
<li><a href="#">Home</a></li>
<li><a href="#">Site</a></li>
<li><a href="#">Other</a></li>
</ul>
</div>
<div class="main-content">
<h2>Article</h2>
<ul class="article">
<li>
<img src="images/0001.jpg" width="100" height="90">
<h3><a href="#">The blah</a></h3>
<p>Say something</p>
</li>
<li>
<img src="images/0002.jpg" width="100" height="90">
<h3><a href="#">The blah</a></h3>
<p>Say something</p>
</li>
<li>
<img src="images/0003.jpg" width="100" height="90">
<h3><a href="#">The blah</a></h3>
<p>Say something</p>
</li>
<li>
<img src="images/0004.jpg" width="100" height="90">
<h3><a href="#">The blah</a></h3>
<p>Say something</p>
</li>
</ul>
</div>
<div class="footer">
<p>@xumeng</p>
</div>
</body>
</html>
7. CSS source code
body {
padding: 0 0 0 0;
background-color: #ffffff;
background-image: url(images/bg3-dark.jpg);
background-position: top left;
background-repeat: no-repeat;
background-size: cover;
font-family: Helvetica, Arial, sans-serif;
}
.main-content {
width: 500px;
padding: 20px 20px 20px 20px;
border: 1px solid #dddddd;
border-radius:25px;
margin: 30px auto 0 auto;
background: #f1f1f1;
-webkit-box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
-moz-box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
}
.main-content p {
line-height: 26px;
}
.main-content h2 {
color: dimgray;
}
.nav {
padding-left: 0;
margin: 5px 0 20px 0;
text-align: center;
}
.nav li {
display: inline;
padding-right: 10px;
}
.nav li:last-child {
padding-right: 0;
}
.header {
padding: 10px 10px 10px 10px;
}
.header a {
color: #ffffff;
}
.header img {
display: block;
margin: 0 auto 0 auto;
}
.header h1 {
text-align: center;
}
.article {
list-style-type: none;
padding: 0;
}
.article li {
border: 1px solid #f6f8f8;
background-color: #ffffff;
height: 90px;
}
.article h3 {
border-bottom: 0;
margin-bottom: 5px;
}
.article a {
color: #37a5f0;
text-decoration: none;
}
.article img {
float: left;
padding-right: 11px;
}
.footer {
margin-top: 20px;
}
.footer p {
color: #aaaaaa;
text-align: center;
font-weight: bold;
font-size: 12px;
font-style: italic;
text-transform: uppercase;
}
.post {
padding-bottom: 2em;
}
.post-title {
font-size: 2em;
color: #222;
margin-bottom: 0.2em;
}
.post-avatar {
border-radius: 50px;
float: right;
margin-left: 1em;
}
.post-description {
font-family: Georgia, "Cambria", serif;
color: #444;
line-height: 1.8em;
}
.post-meta {
color: #999;
font-size: 90%;
margin: 0;
}
.post-category {
margin: 0 0.1em;
padding: 0.3em 1em;
color: #fff;
background: #999;
font-size: 80%;
}
.post-category-design {
background: #5aba59;
}
.post-category-pure {
background: #4d85d1;
}
.post-category-yui {
background: #8156a7;
}
.post-category-js {
background: #df2d4f;
}
.post-images {
margin: 1em 0;
}
.post-image-meta {
margin-top: -3.5em;
margin-left: 1em;
color: #fff;
text-shadow: 0 1px 1px #333;
}
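8. Notes:
There are ten images in total. Mind the relative paths: the CSS file, the HTML file, and the images folder must be in the same directory.
Note to self: this project is at F:\Python實戰:四周實現爬蟲系統\作業程式碼\第一週\上課_1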
II. Parsing elements in a local file
1. HTML source of the file to parse
<html>
<head>
<link rel="stylesheet" type="text/css" href="new_blah.css">
</head>
<body>
<div class="header">
<img src="images/blah.png">
<ul class="nav">
<li><a href="#">Home</a></li>
<li><a href="#">Site</a></li>
<li><a href="#">Other</a></li>
</ul>
</div>
<div class="main-content">
<h2>Article</h2>
<ul class="articles">
<li>
<img src="images/0001.jpg" width="100" height="91">
<div class="article-info">
<h3><a href="www.sample.com">Sardinia's top 10 beaches</a></h3>
<p class="meta-info">
<span class="meta-cate">fun</span>
<span class="meta-cate">Wow</span>
</p>
<p class="description">white sands and turquoise waters</p>
</div>
<div class="rate">
<span class="rate-score">4.5</span>
</div>
</li>
<li>
<img src="images/0002.jpg" width="100" height="91">
<div class="article-info">
<h3><a href="www.sample.com">How to get tanned</a></h3>
<p class="meta-info">
<span class="meta-cate">butt</span><span class="meta-cate">NSFW</span>
</p>
<p class="description">hot bikini girls on beach</p>
</div>
<div class="rate">
<img src="images/Fire.png" width="18" height="18">
<span class="rate-score">5.0</span>
</div>
</li>
<li>
<img src="images/0003.jpg" width="100" height="91">
<div class="article-info">
<h3><a href="www.sample.com">How to be an Aussie beach bum</a></h3>
<p class="meta-info">
<span class="meta-cate">sea</span>
</p>
<p class="description">To make the most of your visit</p>
</div>
<div class="rate">
<span class="rate-score">3.5</span>
</div>
</li>
<li>
<img src="images/0004.jpg" width="100" height="91">
<div class="article-info">
<h3><a href="www.sample.com">Summer's cheat sheet</a></h3>
<p class="meta-info">
<span class="meta-cate">bay</span>
<span class="meta-cate">boat</span>
<span class="meta-cate">beach</span>
</p>
<p class="description">choosing a beach in Cape Cod</p>
</div>
<div class="rate">
<span class="rate-score">3.0</span>
</div>
</li>
</ul>
</div>
<div class="footer">
<p>© Mugglecoding</p>
</div>
</body>
</html>
2. CSS file of the page to parse
body {
padding: 0 0 0 0;
background-color: #ffffff;
background-image: url(images/bg3-dark.jpg);
background-position: top left;
background-repeat: no-repeat;
background-size: cover;
font-family: Helvetica, Arial, sans-serif;
}
.main-content {
width: 500px;
padding: 20px 20px 20px 20px;
border: 1px solid #dddddd;
border-radius:15px;
margin: 30px auto 0 auto;
background: #fdffff;
-webkit-box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
-moz-box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
box-shadow: 0 0 22px 0 rgba(50, 50, 50, 1);
}
.main-content p {
line-height: 26px;
}
.main-content h2 {
color: #585858;
}
.articles {
list-style-type: none;
padding: 0;
}
.articles img {
float: left;
padding-right: 11px;
}
.articles li {
border-top: 1px solid #F1F1F1;
background-color: #ffffff;
height: 90px;
clear: both;
}
.articles h3 {
margin: 0;
}
.articles a {
color:#585858;
text-decoration: none;
}
.articles p {
margin: 0;
}
.article-info {
float: left;
display: inline-block;
margin: 8px 0 8px 0;
}
.rate {
float: right;
display: inline-block;
margin:35px 20px 35px 20px;
}
.rate-score {
font-size: 18px;
font-weight: bold;
color: #585858;
}
.rate-score-hot {
}
.meta-info {
}
.meta-cate {
margin: 0 0.1em;
padding: 0.1em 0.7em;
color: #fff;
background: #37a5f0;
font-size: 20%;
border-radius: 10px ;
}
.description {
color: #cccccc;
}
.nav {
padding-left: 0;
margin: 5px 0 20px 0;
text-align: center;
}
.nav li {
display: inline;
padding-right: 10px;
}
.nav li:last-child {
padding-right: 0;
}
.header {
padding: 10px 10px 10px 10px;
}
.header a {
color: #ffffff;
}
.header img {
display: block;
margin: 0 auto 0 auto;
}
.header h1 {
text-align: center;
}
.footer {
margin-top: 20px;
}
.footer p {
color: #aaaaaa;
text-align: center;
font-weight: bold;
font-size: 12px;
font-style: italic;
text-transform: uppercase;
}
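3. Parsing steps
(1) Parse the page with BeautifulSoup
(2) Describe where the elements to scrape are located
(3) Extract the information from the tags and store it in a container so it is easy to query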
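A minimal sketch of the three steps on an inline HTML string, assuming beautifulsoup4 is installed. It uses Python's built-in html.parser so that lxml is not required; the snippet is a trimmed copy of the page above, not the full file.

```python
from bs4 import BeautifulSoup

html = """
<ul class="articles">
  <li><h3><a href="www.sample.com">Sardinia's top 10 beaches</a></h3>
      <span class="rate-score">4.5</span></li>
  <li><h3><a href="www.sample.com">How to get tanned</a></h3>
      <span class="rate-score">5.0</span></li>
</ul>
"""

# Step 1: parse the document (html.parser ships with Python, so nothing extra to install).
soup = BeautifulSoup(html, "html.parser")

# Step 2: describe where the wanted elements are, using CSS selectors.
titles = soup.select("ul.articles > li > h3 > a")
scores = soup.select("ul.articles > li > span.rate-score")

# Step 3: pull the text out of the tags and store it in a container (a list of dicts).
articles = [
    {"title": t.get_text(), "score": float(s.get_text())}
    for t, s in zip(titles, scores)
]
print(articles)
```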
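4. Parsing the page with BeautifulSoup
(1) Scraping code
The standard call is soup = BeautifulSoup(html, 'lxml'). The first argument is the page content (a string or an open file); the second selects the parser. The parsers BeautifulSoup supports are html.parser (built into Python), lxml (lxml's HTML parser), lxml-xml or xml (lxml's XML parser), and html5lib.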
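from bs4 import BeautifulSoup
# Open the local HTML file; encoding='utf-8' assumes the file is saved as UTF-8
with open('F:/Python實戰:四周實現爬蟲系統/作業程式碼/第一週/上課_2/web/new_index.html', 'r', encoding='utf-8') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
    print(Soup)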
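(2) Error 1:
can't import beautifulsoup (e.g. ModuleNotFoundError: No module named 'bs4')
Cause: the BeautifulSoup library is not installed. Fix: run in cmd:
pip install beautifulsoup4
(3) Error 2:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Cause: the lxml parser is not installed. Fix: run in cmd:
pip install lxml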
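If lxml cannot be installed, one option is to fall back to the built-in parser. This sketch catches the same FeatureNotFound error shown above; make_soup is a hypothetical helper name, and beautifulsoup4 itself is assumed to be installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested tree builder (here lxml) is not available.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p class='description'>white sands and turquoise waters</p>")
print(soup.select_one("p.description").get_text())
```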
(4) Output
<html>
<head>
<link href="new_blah.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div class="header">
<img src="images/blah.png"/>
<ul class="nav">
<li><a href="#">Home</a></li>
<li><a href="#">Site</a></li>
<li><a href="#">Other</a></li>
</ul>
</div>
<div class="main-content">
<h2>Article</h2>
<ul class="articles">
<li>
<img height="91" src="images/0001.jpg" width="100"/>
<div class="article-info">
<h3><a href="www.sample.com">Sardinia's top 10 beaches</a></h3>
<p class="meta-info">
<span class="meta-cate">fun</span>
<span class="meta-cate">Wow</span>
</p>
<p class="description">white sands and turquoise waters</p>
</div>
<div class="rate">
<span class="rate-score">4.5</span>
</div>
</li>
<li>
<img height="91" src="images/0002.jpg" width="100"/>
<div class="article-info">
<h3><a href="www.sample.com">How to get tanned</a></h3>
<p class="meta-info">
<span class="meta-cate">butt</span><span class="meta-cate">NSFW</span>
</p>
<p class="description">hot bikini girls on beach</p>
</div>
<div class="rate">
<img height="18" src="images/Fire.png" width="18"/>
<span class="rate-score">5.0</span>
</div>
</li>
<li>
<img height="91" src="images/0003.jpg" width="100"/>
<div class="article-info">
<h3><a href="www.sample.com">How to be an Aussie beach bum</a></h3>
<p class="meta-info">
<span class="meta-cate">sea</span>
</p>
<p class="description">To make the most of your visit</p>
</div>
<div class="rate">
<span class="rate-score">3.5</span>
</div>
</li>
<li>
<img height="91" src="images/0004.jpg" width="100"/>
<div class="article-info">
<h3><a href="www.sample.com">Summer's cheat sheet</a></h3>
<p class="meta-info">
<span class="meta-cate">bay</span>
<span class="meta-cate">boat</span>
<span class="meta-cate">beach</span>
</p>
<p class="description">choosing a beach in Cape Cod</p>
</div>
<div class="rate">
<span class="rate-score">3.0</span>
</div>
</li>
</ul>
</div>
<div class="footer">
<p>© Mugglecoding</p>
</div>
</body>
</html>
5. Describing where to scrape
Locations are described with CSS selectors. To get one in Chrome: select the element -> right-click -> Inspect -> right-click the highlighted node -> Copy -> Copy selector
# Source code
from bs4 import BeautifulSoup
with open('F:/Python實戰:四周實現爬蟲系統/作業程式碼/第一週/上課_2/web/new_index.html', 'r', encoding='utf-8') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
# print(Soup)
print("First image")
# images = Soup.select('body > div.main-content > ul > li:nth-child(1) > img')
# The selector copied from Chrome (above) uses :nth-child, which BeautifulSoup versions before 4.7 reject;
# as the error message suggests, use :nth-of-type instead
image1 = Soup.select('body > div.main-content > ul > li:nth-of-type(1) > img')
print(image1)
print("All images")
# To get every image, drop the positional part from the selector
images = Soup.select('body > div.main-content > ul > li > img')
# Select the other fields the same way
title = Soup.select('body > div.main-content > ul > li > div.article-info > h3 > a')
score = Soup.select('body > div.main-content > ul > li > div.rate > span')
selector = Soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info > span')
description = Soup.select('body > div.main-content > ul > li > div.article-info > p.description')
print(images, title, score, selector, description, sep='\n----------------------------------\n')
# Output
First image
[<img height="91" src="images/0001.jpg" width="100"/>]
All images
[<img height="91" src="images/0001.jpg" width="100"/>, <img height="91" src="images/0002.jpg" width="100"/>, <img height="91" src="images/0003.jpg" width="100"/>, <img height="91" src="images/0004.jpg" width="100"/>]
----------------------------------
[<a href="www.sample.com">Sardinia's top 10 beaches</a>, <a href="www.sample.com">How to get tanned</a>, <a href="www.sample.com">How to be an Aussie beach bum</a>, <a href="www.sample.com">Summer's cheat sheet</a>]
----------------------------------
[<span class="rate-score">4.5</span>, <span class="rate-score">5.0</span>, <span class="rate-score">3.5</span>, <span class="rate-score">3.0</span>]
----------------------------------
[<span class="meta-cate">fun</span>, <span class="meta-cate">Wow</span>, <span class="meta-cate">butt</span>, <span class="meta-cate">NSFW</span>, <span class="meta-cate">sea</span>, <span class="meta-cate">bay</span>, <span class="meta-cate">boat</span>, <span class="meta-cate">beach</span>]
----------------------------------
[<p class="description">white sands and turquoise waters</p>, <p class="description">hot bikini girls on beach</p>, <p class="description">To make the most of your visit</p>, <p class="description">choosing a beach in Cape Cod</p>]
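Copied selectors can be tried out on a small inline document before running them on the real file. This sketch (assuming beautifulsoup4 is installed; the HTML is a cut-down stand-in for the page above) contrasts select(), which always returns a list, with select_one(), which returns the first match or None.

```python
from bs4 import BeautifulSoup

html = """
<div class="main-content"><ul>
  <li><img src="images/0001.jpg"></li>
  <li><img src="images/0002.jpg"></li>
  <li><img src="images/0003.jpg"></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list even when the selector can match only one element.
first_as_list = soup.select("div.main-content > ul > li:nth-of-type(1) > img")

# select_one() returns the first matching Tag (or None), handy for a single element.
first = soup.select_one("div.main-content > ul > li > img")

# Without the positional pseudo-class, every <li> matches.
all_srcs = [img["src"] for img in soup.select("div.main-content > ul > li > img")]
print(first_as_list[0]["src"], first["src"], all_srcs)
```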
6. Filtering the relevant information
# Print every field for each article
from bs4 import BeautifulSoup
with open('F:/Python實戰:四周實現爬蟲系統/作業程式碼/第一週/上課_2/web/new_index.html', 'r', encoding='utf-8') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
    images = Soup.select('body > div.main-content > ul > li > img')
    titles = Soup.select('body > div.main-content > ul > li > div.article-info > h3 > a')
    scores = Soup.select('body > div.main-content > ul > li > div.rate > span')
    # selecs = Soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info > span')
    selecs = Soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info')
    descrs = Soup.select('body > div.main-content > ul > li > div.article-info > p.description')
    for title, image, desc, selec, score in zip(titles, images, descrs, selecs, scores):
        data = {
            # 'selec': selec.get_text(),
            'selec': list(selec.stripped_strings),  # every text string under the child tags, whitespace stripped
            'title': title.get_text(),
            'image': image.get('src'),
            'desc': desc.get_text(),
            'score': score.get_text()
        }
        print(data)
# Output
{'selec': ['fun', 'Wow'], 'title': "Sardinia's top 10 beaches", 'image': 'images/0001.jpg', 'desc': 'white sands and turquoise waters', 'score': '4.5'}
{'selec': ['butt', 'NSFW'], 'title': 'How to get tanned', 'image': 'images/0002.jpg', 'desc': 'hot bikini girls on beach', 'score': '5.0'}
{'selec': ['sea'], 'title': 'How to be an Aussie beach bum', 'image': 'images/0003.jpg', 'desc': 'To make the most of your visit', 'score': '3.5'}
{'selec': ['bay', 'boat', 'beach'], 'title': "Summer's cheat sheet", 'image': 'images/0004.jpg', 'desc': 'choosing a beach in Cape Cod', 'score': '3.0'}
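Zipping five separate lists relies on every article having every field: if one article lacked, say, its description, the later items would be paired with the wrong descriptions, and zip stops at the shortest list. A more robust sketch loops over each <li> and selects the fields inside it. The inline HTML is a trimmed, modified copy of the page above (the second item's categories are deliberately empty), and html.parser is used instead of lxml.

```python
from bs4 import BeautifulSoup

html = """
<ul class="articles">
  <li>
    <img src="images/0001.jpg">
    <div class="article-info">
      <h3><a href="www.sample.com">Sardinia's top 10 beaches</a></h3>
      <p class="meta-info"><span class="meta-cate">fun</span><span class="meta-cate">Wow</span></p>
    </div>
    <div class="rate"><span class="rate-score">4.5</span></div>
  </li>
  <li>
    <img src="images/0003.jpg">
    <div class="article-info">
      <h3><a href="www.sample.com">How to be an Aussie beach bum</a></h3>
      <p class="meta-info"></p>
    </div>
    <div class="rate"><img src="images/Fire.png"><span class="rate-score">3.5</span></div>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

info = []
for li in soup.select("ul.articles > li"):
    # Every lookup below searches only inside the current <li>,
    # so all fields in one dict belong to the same article.
    score = li.select_one("span.rate-score")
    info.append({
        "title": li.select_one("div.article-info > h3 > a").get_text(),
        # recursive=False: only the <img> that is a direct child of <li>, not Fire.png
        "image": li.find("img", recursive=False)["src"],
        "cates": [s.get_text() for s in li.select("span.meta-cate")],
        "score": float(score.get_text()) if score else None,
    })
print(info)
```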
# Print the articles scored above 3
from bs4 import BeautifulSoup
info = []
with open('F:/Python實戰:四周實現爬蟲系統/作業程式碼/第一週/上課_2/web/new_index.html', 'r', encoding='utf-8') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
    images = Soup.select('body > div.main-content > ul > li > img')
    titles = Soup.select('body > div.main-content > ul > li > div.article-info > h3 > a')
    scores = Soup.select('body > div.main-content > ul > li > div.rate > span')
    # selecs = Soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info > span')
    selecs = Soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info')
    descrs = Soup.select('body > div.main-content > ul > li > div.article-info > p.description')
    for title, image, desc, selec, score in zip(titles, images, descrs, selecs, scores):
        data = {
            # 'selec': selec.get_text(),
            'selec': list(selec.stripped_strings),  # every text string under the child tags, whitespace stripped
            'title': title.get_text(),
            'image': image.get('src'),
            'desc': desc.get_text(),
            'score': score.get_text()
        }
        info.append(data)
for i in info:
    if float(i['score']) > 3:
        print(i['title'], i['score'])
# Output:
Sardinia's top 10 beaches 4.5
How to get tanned 5.0
How to be an Aussie beach bum 3.5
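The same filter can also rank the results. This sketch hard-codes the titles and scores from the output above instead of re-parsing the file; the score stays as text, as in the scraped data, and is converted with float() for both the comparison and the sort key.

```python
# Records in the same shape as the dicts built above (values copied from the printed output).
info = [
    {"title": "Sardinia's top 10 beaches", "score": "4.5"},
    {"title": "How to get tanned", "score": "5.0"},
    {"title": "How to be an Aussie beach bum", "score": "3.5"},
    {"title": "Summer's cheat sheet", "score": "3.0"},
]

# Keep scores above 3, then sort from best to worst.
good = sorted(
    (item for item in info if float(item["score"]) > 3),
    key=lambda item: float(item["score"]),
    reverse=True,
)
for item in good:
    print(item["title"], item["score"])
```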
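III. Scraping a real web page
Requests + BeautifulSoup: scraping TripAdvisor
1. How the server and the local client exchange data
(1) The HTTP protocol
Clicking a link: the browser sends a request to the server
# GET:
GET /page_one.html HTTP/1.1
Host: www.sample.com
Displaying the page: the server sends back a response (including a status_code)
To watch this: right-click -> Inspect -> Network
HTTP/1.0 methods: GET, POST, HEAD
HTTP/1.1 methods: GET, POST, HEAD, OPTIONS, PUT, DELETE, TRACE, CONNECT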
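Python's standard urllib.request.Request can show how a URL maps onto the request line and Host header without sending anything over the network. This is a standard-library illustration, not the requests library used below; the URL is the sample.com example above.

```python
from urllib.request import Request

# Build (but do not send) the GET request from the example above.
req = Request("http://www.sample.com/page_one.html", headers={"User-Agent": "Mozilla/5.0"})
print(req.get_method())  # GET, because no request body was given
print(req.host)          # www.sample.com (sent as the Host header)
print(req.selector)      # /page_one.html (the path in the request line)

# Supplying a request body switches the method to POST.
post_req = Request("http://www.sample.com/page_one.html", data=b"q=beach")
print(post_req.get_method())  # POST
```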
(2) Installing requests
pip install requests
2. Steps for parsing a real web page
(1) Send the request with requests
(2) Fetch the whole page
from bs4 import BeautifulSoup
import requests
url='https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
wb_data=requests.get(url,timeout = 500)
soup=BeautifulSoup(wb_data.text,'lxml')
print(soup)
(3) Describe the position of the element to scrape
# Scrape one title using its copied selector
from bs4 import BeautifulSoup
import requests
url='https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
wb_data=requests.get(url,timeout=500)
soup=BeautifulSoup(wb_data.text,'lxml')
titles=soup.select('#taplc_attraction_coverpage_attraction_0 > div:nth-of-type(4) > div > div > div.shelf_item_container > div:nth-of-type(1) > div.poi > div > div.item.name > a')
print(titles)
Result:
[<a class="poiTitle" data-tpact="shelf_item_click" data-tpatt="4|poi|272517" data-tpid="20" data-tpp="Attractions" href="/Attraction_Review-g60763-d272517-Reviews-Conservatory_Garden-New_York_City_New_York.html" onclick="widgetEvCall('handlers.shelfItemClick', event, this)" target="_blank">溫室花園</a>]
(4) Describe all matching elements: get every image of a given size
# Scrape all images with a given width
from bs4 import BeautifulSoup
import requests
url='https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
wb_data=requests.get(url,timeout=500)
soup=BeautifulSoup(wb_data.text,'lxml')
imgs=soup.select('img[width="200"]')
print(imgs)
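Besides exact matches such as img[width="200"], CSS attribute selectors support other operators. A sketch on inline HTML, assuming beautifulsoup4 is installed; the file names are made up for the example.

```python
from bs4 import BeautifulSoup

html = """
<img src="a.jpg" width="200">
<img src="b.jpg" width="160">
<img src="c.png" width="200">
"""
soup = BeautifulSoup(html, "html.parser")

# [attr="value"] matches an exact attribute value (compared as a string).
wide = [img["src"] for img in soup.select('img[width="200"]')]

# $= means "ends with": here, every image whose src ends in .jpg.
jpgs = [img["src"] for img in soup.select('img[src$=".jpg"]')]
print(wide, jpgs)
```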
(5) Iterating with dictionaries
# Pair up titles and images and print each pair as a dict
from bs4 import BeautifulSoup
import requests
url='https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
wb_data=requests.get(url,timeout=500)
soup=BeautifulSoup(wb_data.text,'lxml')
imgs=soup.select('img[width="200"]')
titles=soup.select('#taplc_attraction_coverpage_attraction_0 > div > div > div > div.shelf_item_container > div:nth-of-type(1) > div.poi > div > div.item.name > a')
for title, img in zip(titles, imgs):
    data = {
        'title': title.get_text(),
        'img': img.get('src'),
    }
    print(data)
3. Skipping the login step: sending login information (headers) with the request
from bs4 import BeautifulSoup
import requests
import time

url_saves = 'http://www.tripadvisor.com/Saves#37685322'
url = 'https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
urls = ['https://cn.tripadvisor.com/Attractions-g60763-Activities-oa{}-New_York_City_New_York.html#ATTRACTION_LIST'.format(str(i)) for i in range(30, 930, 30)]
# Fill in your own User-Agent and Cookie, copied from the request headers of a logged-in browser session
headers = {
    'User-Agent': '',
    'Cookie': ''
}

def get_attractions(url, data=None):
    wb_data = requests.get(url)
    time.sleep(4)  # pause between requests to avoid hammering the server
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('div.property_title > a[target="_blank"]')
    imgs = soup.select('img[width="160"]')
    cates = soup.select('div.p13n_reasoning_v2')
    if data is None:
        for title, img, cate in zip(titles, imgs, cates):
            data = {
                'title': title.get_text(),
                'img': img.get('src'),
                'cate': list(cate.stripped_strings),
            }
            print(data)

def get_favs(url, data=None):
    # The saved-items page needs a logged-in session, so the Cookie is sent in the headers
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('a.location-name')
    imgs = soup.select('div.photo > div.sizedThumb > img.photo_image')
    metas = soup.select('span.format_address')
    if data is None:
        for title, img, meta in zip(titles, imgs, metas):
            data = {
                'title': title.get_text(),
                'img': img.get('src'),
                'meta': list(meta.stripped_strings)
            }
            print(data)

for single_url in urls:
    get_attractions(single_url)
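The urls list above encodes the list offset in the URL (oa30, oa60, ...). This sketch rebuilds it to check how many pages it covers, and uses urllib.parse.urldefrag to show that the #ATTRACTION_LIST part is a fragment: HTTP clients do not send fragments to the server, so it does not change the response.

```python
from urllib.parse import urldefrag

# Same URL pattern as the urls list above: the offset after "oa" grows by 30 per page.
template = ("https://cn.tripadvisor.com/Attractions-g60763-Activities-oa{}"
            "-New_York_City_New_York.html#ATTRACTION_LIST")
urls = [template.format(offset) for offset in range(30, 930, 30)]

print(len(urls))  # 30
print(urls[0])
print(urls[-1])

# urldefrag() splits off the "#..." fragment, which never reaches the server.
print(urldefrag(urls[0]).url)
print(urldefrag(urls[0]).fragment)  # ATTRACTION_LIST
```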
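4. Anti-scraping measures
A simple workaround: Inspect -> switch to the mobile device view -> parse the mobile page instead (its protection is often less strict)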
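With requests, the mobile view can often be requested by sending a mobile browser's User-Agent header, e.g. requests.get(url, headers={'User-Agent': MOBILE_UA}); whether the site then serves a simpler page depends on the site. The sketch below only builds the request with the standard library's urllib.request.Request and does not send it. MOBILE_UA and mobile_request are hypothetical names, and the UA string is an example value; copy a current one from DevTools' device toolbar.

```python
from urllib.request import Request

# Example mobile User-Agent string (an assumption; copy a real one from DevTools).
MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1")


def mobile_request(url):
    """Build a GET request that identifies itself as a mobile browser."""
    return Request(url, headers={"User-Agent": MOBILE_UA})


req = mobile_request(
    "https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html"
)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```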
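IV. Fetching dynamically (asynchronously) loaded data
1. Asynchronous loading
New content keeps loading without switching pages.
JavaScript keeps requesting data in the background, separately from the initial HTML, and loads it in batches.
2. Finding the asynchronous data
Inspect -> Network -> XHR
Name: each time more content loads, a new successful request appears with a page number -> this is the dynamic request URL (page=x)
Response: returns a batch of div tags, including the links
3. Code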
from bs4 import BeautifulSoup
import requests
import time
url = 'https://knewone.com/discover?page='
def get_page(url, data=None):
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    imgs = soup.select('a.cover-inner > img')
    titles = soup.select('section.content > h4 > a')
    links = soup.select('section.content > h4 > a')
    if data is None:
        for img, title, link in zip(imgs, titles, links):
            data = {
                'img': img.get('src'),
                'title': title.get('title'),
                'link': link.get('href')
            }
            print(data)
# Fetch a chosen range of pages
def get_more_pages(start, end):
    for one in range(start, end):
        get_page(url + str(one))
        time.sleep(2)
get_more_pages(1, 10)
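get_more_pages requests a fixed range of pages. A sketch of a crawler that instead stops at the first empty batch is below. crawl, parse_page, BASE_URL and the fetch parameter are hypothetical names; fetch is passed in (for example lambda u: requests.get(u).text) so the logic can be tried offline with fake pages. It assumes beautifulsoup4 is installed and reuses the section.content > h4 > a selector from the code above.

```python
import time

from bs4 import BeautifulSoup

BASE_URL = "https://knewone.com/discover?page="


def parse_page(html):
    """Extract items from one XHR response fragment."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": a.get("title"), "link": a.get("href")}
        for a in soup.select("section.content > h4 > a")
    ]


def crawl(fetch, max_pages=10, delay=2.0):
    """Request page=1, 2, ... until a page has no items or max_pages is reached."""
    results = []
    for page in range(1, max_pages + 1):
        items = parse_page(fetch(BASE_URL + str(page)))
        if not items:  # an empty batch usually means there are no more pages
            break
        results.extend(items)
        time.sleep(delay)  # be polite between requests
    return results


# Offline demo: a fake fetch function that serves two pages, then empty responses.
FAKE_PAGES = {
    BASE_URL + "1": '<section class="content"><h4><a title="A" href="/a">A</a></h4></section>',
    BASE_URL + "2": '<section class="content"><h4><a title="B" href="/b">B</a></h4></section>',
}
items = crawl(lambda url: FAKE_PAGES.get(url, ""), delay=0)
print(items)
```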
V. Assignment: scraping product listings
from bs4 import BeautifulSoup
import requests
import time
# A sample detail page, kept for reference (the functions below fetch their own pages)
url = 'http://bj.58.com/pingbandiannao/24604629984324x.shtml'

def get_links_from(who_sells):
    # who_sells: 0 = individual sellers, otherwise merchants (see the 'cate' field below)
    urls = []
    list_view = 'http://bj.58.com/pbdn/{}/pn2/'.format(str(who_sells))
    wb_data = requests.get(list_view)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    for link in soup.select('td.t a.t'):
        urls.append(link.get('href').split('?')[0])
    return urls

def get_views_from(url):
    # The listing ID is the number before 'x.shtml'; split() is used because
    # strip('x.shtml') removes a set of characters, not a suffix
    info_id = url.split('/')[-1].split('x.shtml')[0]
    api = 'http://jst1.58.com/counter?infoid={}'.format(info_id)
    # This is 58.com's query API; if you are new to APIs, see the introduction to the Sina Weibo API
    js = requests.get(api)
    views = js.text.split('=')[-1]
    return views

def get_item_info(who_sells=0):
    urls = get_links_from(who_sells)
    for url in urls:
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        data = {
            'title': soup.title.text,
            'price': soup.select('.price')[0].text,
            'area': list(soup.select('.c_25d')[0].stripped_strings) if soup.find_all('span', 'c_25d') else None,
            'date': soup.select('.time')[0].text,
            'cate': 'individual' if who_sells == 0 else 'merchant',
            # 'views': get_views_from(url)
        }
        print(data)

# get_links_from(1)
# Pass 0 (individual sellers) or 1 (merchants), not a URL: who_sells is inserted into the list-page URL
get_item_info(0)
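Extracting the listing ID from a detail URL is fragile when done with str.strip(), which removes any of the given characters from both ends rather than a suffix. A regular expression is one alternative; info_id_from is a hypothetical helper, and the pattern assumes detail URLs end in digits followed by x.shtml, as in the sample URL.

```python
import re


def info_id_from(url):
    """Return the numeric listing ID from a 58.com detail URL, or None if absent."""
    # Assumed URL shape: .../24604629984324x.shtml (digits, then 'x.shtml').
    match = re.search(r"/(\d+)x\.shtml", url)
    return match.group(1) if match else None


url = "http://bj.58.com/pingbandiannao/24604629984324x.shtml"
print(info_id_from(url))  # 24604629984324

# strip() treats its argument as a set of characters to remove from both ends:
print("shop1x.shtml".strip("x.shtml"))  # op1 (the leading "sh" is stripped as well)
```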