Scraping the Xiaozhu short-term rental site (xiaozhu.com) with XPath and requests

阿新 • Published: 2018-12-19

import time

import requests
from lxml import etree

# Browser-like User-Agent sent with every request.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/70.0.3538.77 Safari/537.36'
}

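def page_url(url):
    """Fetch one search-result page and parse every listing it links to."""
    # A timeout keeps the script from hanging forever on a stalled connection.
    res = requests.get(url, headers=headers, timeout=10)
    html = etree.HTML(res.text)
    # Selector kept exactly as in the original ('resule_img_a');
    # it has to match the class name used in the page markup.
    links = html.xpath('.//a[@class="resule_img_a"]/@href')
    for link in links:
        parse_page(link)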

def parse_page(url):
    """Fetch one listing detail page and print its fields as a dict."""
    res = requests.get(url, headers=headers, timeout=10)
    html = etree.HTML(res.text)
    # Each XPath query returns a list of matching strings.
    titles = html.xpath('.//div[@class="pho_info"]/h4/em//text()')
    addresses = html.xpath('.//span[@class="pr5"]//text()')
    prices = html.xpath('.//div[@class="day_l"]/span//text()')
    imgs = html.xpath('.//div[@class="member_pic"]/a//img/@src')
    names = html.xpath('.//div[@class="w_240"]/h6/a//text()')
    # zip() stops at the shortest list, so a page with any field
    # missing produces no output at all.
    for title, address, price, img, name in zip(titles, addresses, prices, imgs, names):
        data = {
            'title': title.strip(),
            'address': address.strip(),
            'price': price,
            'image': img,
            'host_name': name
        }
        print(data)



if __name__ == '__main__':
    # Search-result pages 1 through 13 on the bj (Beijing) subdomain.
    urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(x)
            for x in range(1, 14)]
    for url in urls:
        page_url(url)
        time.sleep(2)  # pause between result pages
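
One detail of the script above is worth a closer look: the field lists in parse_page are combined with zip(), which stops at the shortest input. If any one XPath query comes back empty (for example, no image was found), the loop body never runs and the listing is silently skipped. The sketch below shows one possible alternative using itertools.zip_longest from the standard library. The helper name build_records, the English dictionary keys, and the sample values are assumptions made for illustration; they are not part of the original script.

```python
from itertools import zip_longest


def build_records(titles, addresses, prices, imgs, names):
    """Combine parallel field lists into dicts, padding missing values with None."""
    records = []
    # zip_longest runs to the longest list and fills the gaps with None,
    # whereas zip would stop at the shortest list.
    for title, address, price, img, name in zip_longest(
        titles, addresses, prices, imgs, names, fillvalue=None
    ):
        records.append({
            'title': title.strip() if title else None,
            'address': address.strip() if address else None,
            'price': price,
            'image': img,
            'host_name': name,
        })
    return records


if __name__ == '__main__':
    # Made-up values standing in for XPath results; the image list is empty.
    rows = build_records(['  Cozy room  '], ['Beijing '], ['328'], [], ['Host A'])
    print(rows)
```

With this approach a listing that lacks one field is still reported, with None in place of the missing value, which makes gaps visible instead of hiding them. Whether padding or skipping is preferable depends on how the data will be used afterwards.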
