Fetching the 8-15 day city weather forecast from http://www.weather.com.cn with Python

阿新 • Published: 2017-05-10


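I started learning to write crawlers with BeautifulSoup by adapting an example from a more experienced developer's code to fetch weather data. The original fetched the 7-day forecast; I noticed that the page also offers an 8-15 day forecast next to it, so I modified the code to fetch that instead. Almost nothing else changed; only the tag-extraction part is different. I did run into a problem reading one span, though. In my case the HTML for each day looks like this: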

<li class="t">
<span class="time">周五（19日）</span>
<big class="png30 d301"></big>
<big class="png30 n301"></big>
<span class="wea">雨</span>
<span class="tem"><em>36℃</em>/22℃</span>
<span class="wind">東南風</span>
<span class="wind1">微風</span>
</li>

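Of the span tags above, the date, weather and wind values can all be read with BeautifulSoup by matching the tag. Only the temperature could not: the value came back as None, which puzzled me for a long time. (The temperature span has two children, an <em> tag and a text node, and BeautifulSoup's .string returns None whenever a tag has more than one child.) span.em gives 36℃, but that is only part of the value and not what I need. In the end I had no choice but to take the whole span,

<span class="tem"><em>36℃</em>/22℃</span>

and use string replacement to strip the surplus markup, leaving 36℃/22℃.

That result is stored in a variable and written to the CSV file.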
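To illustrate why the lookup returns None, here is a minimal sketch built on the temperature span from the snippet above. It assumes BeautifulSoup 4 with the built-in html.parser. The get_text() call is shown only as a possible alternative to the string replacement; it is not what the code in this post does.

```python
from bs4 import BeautifulSoup

# The temperature span copied from the page source shown above.
html = '<li class="t"><span class="tem"><em>36℃</em>/22℃</span></li>'
span = BeautifulSoup(html, "html.parser").find("span", class_="tem")

# .string is None because the span has two children: <em> and a text node.
whole = span.string
# .em.string only reaches the text inside <em>.
high = span.em.string
# get_text() joins every piece of text inside the tag.
tem = span.get_text()

print(whole, high, tem)
```

With get_text() the three replace() calls would no longer be needed, and the result would not depend on the exact attribute text of the span.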

The complete code follows; corrections are welcome.

# coding: UTF-8
'''
Created on 2017-05-10

@author: bekey qq：402151718
'''

import requests
import csv
import random
import time
 
import socket
import http.client
#import urllib.request
from bs4 import BeautifulSoup


def get_content(url, data=None):
    # Request headers that imitate a desktop browser
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    }
    timeout = random.choice(range(80, 180))
    # Retry until the request succeeds, sleeping a random interval after each failure
    while True:
        try:
            rep = requests.get(url, headers=header, timeout=timeout)
            rep.encoding = 'utf-8'
            # req = urllib.request.Request(url, data, header)
            # response = urllib.request.urlopen(req, timeout=timeout)
            # html1 = response.read().decode('UTF-8', errors='ignore')
            # response.close()
            break
        # except urllib.request.HTTPError as e:
        #         print('1:', e)
        #         time.sleep(random.choice(range(5, 10)))
        #
        # except urllib.request.URLError as e:
        #     print('2:', e)
        #     time.sleep(random.choice(range(5, 10)))
        except socket.timeout as e:
            print('3:', e)
            time.sleep(random.choice(range(8, 15)))

        except socket.error as e:
            print('4:', e)
            time.sleep(random.choice(range(20, 60)))

        except http.client.BadStatusLine as e:
            print('5:', e)
            time.sleep(random.choice(range(30, 80)))

        except http.client.IncompleteRead as e:
            print('6:', e)
            time.sleep(random.choice(range(5, 15)))

    return rep.text
    # return html_text
    
    
def get_data(html_text):
    final = []
    bs = BeautifulSoup(html_text, "html.parser")  # create the BeautifulSoup object
    body = bs.body  # get the body
    data = body.find('div', {'id': '15d'})  # find the div whose id is 15d
    ul = data.find('ul')  # get the ul
    li = ul.find_all('li')  # get all li tags

    for day in li:  # iterate over the content of each li tag
        temp = []
        # print(day)
        span = day.find_all('span')  # find all span tags
        # print(span)
        date = span[0].string  # date
        temp.append(date)  # add to temp
        wea1 = span[1].string  # weather description
        temp.append(wea1)  # add to the list
        tem = str(span[2])  # .string is None here, so work on the raw markup
        tem = tem.replace('<span class="tem"><em>', '')
        tem = tem.replace('</span>', '')
        tem = tem.replace('</em>', '')
        # tem = tem.find('span').string  # get the temperature
        temp.append(tem)  # add the temperature to the list

        windy = span[3].string  # wind direction
        temp.append(windy)  # add to the list
        windy1 = span[4].string  # wind strength
        temp.append(windy1)  # add to the list
        final.append(temp)

    return final


def write_data(data, name):
    file_name = name
    # Append the rows to the CSV file
    with open(file_name, 'a', errors='ignore', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    url = 'http://www.weather.com.cn/weather15d/101180101.shtml'
    html = get_content(url)
    # print(html)
    result = get_data(html)
    # print(result)
    write_data(result, 'weather7.csv')

The result looks like this:

[Image]

Project repository: [email protected]:zhangbei59/weather_get.git
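The retry loop in get_content keeps trying forever if the site stays unreachable. As a rough sketch only, the same sleep-and-retry idea can be capped at a fixed number of attempts. The helper name retry and its parameters (attempts, delay_range, sleep) are hypothetical and do not appear in the code above; it catches OSError because the exceptions raised by requests derive from it.

```python
import random
import time


def retry(fetch, attempts=3, delay_range=(1, 3), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as e:  # socket errors and requests' exceptions are OSError subclasses
            last_error = e
            print('attempt', attempt, 'failed:', e)
            if attempt < attempts:
                # Wait a random interval before the next try, as get_content does
                sleep(random.uniform(*delay_range))
    # Every attempt failed: re-raise the last error instead of looping forever
    raise last_error
```

It could be used as retry(lambda: get_content_once(url)), where get_content_once is an assumed single-request function such as one wrapping requests.get(url, headers=header, timeout=timeout).text.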
