
Blog Visitor Statistics

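I wrote this code a long time ago and updated it today.

While updating it, I found that matching numbers with thousands separators (such as 1,234) was buggy.

In addition, the data on visitors from different regions was not crawled.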
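The thousands-separator bug has two parts. The sketch below assumes the counter page renders counts like `1,234`. First, a capture group of `\w+` cannot match a comma. Second, splitting once on `,` and multiplying the first part by 1000 silently gives the wrong value once a count reaches the millions. The table cell is hypothetical, and the helper names `parse_split_once` and `parse_count` are illustrative.

```python
import re

# Hypothetical table cell, shaped to fit the visitor pattern used in getHtml.py.
cell = '<font face=arial size=2>1,234</td><td>'

# \w matches letters, digits and underscore but not ',', so this finds nothing.
old_pattern = re.compile(r'<font face=arial size=2>(\w+)</td><td>')
# Allowing commas explicitly captures the whole number.
new_pattern = re.compile(r'<font face=arial size=2>([\d,]+)</td><td>')


def parse_split_once(text):
    """Original approach: only correct when there is a single thousands group."""
    if ',' in text:
        high, low = text.split(',')[:2]
        return int(high) * 1000 + int(low)
    return int(text)


def parse_count(text):
    """Remove every separator, then convert; works for any magnitude."""
    return int(text.replace(',', ''))


print(old_pattern.findall(cell))      # []
print(new_pattern.findall(cell))      # ['1,234']
print(parse_split_once('1,234,567'))  # 1234 (wrong)
print(parse_count('1,234,567'))       # 1234567
```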

Data source: http://s04.flagcounter.com/more7/XTPq/
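The script reads the counter history page by page, appending the page number to this URL. The sketch below shows that step without any network access. It uses the comma-aware `[\d,]+` visitor pattern in place of `\w+`. The HTML row is hypothetical: it was built only to fit the three regexes in getHtml.py and was not copied from the live site. The names `page_urls` and `extract_rows` are illustrative.

```python
import re

date_pt = re.compile(r'<font face=arial size=-1>(\w+ \d+, \d+)')
visitors_pt = re.compile(r'<font face=arial size=2>([\d,]+)</td><td>')
flagViews_pt = re.compile(r'<font face=arial size=2>(\S+)</font></td></tr>')


def page_urls(url, pages):
    """Page n of the history is assumed to live at url + str(n), n = 1..pages."""
    return [url + str(page) for page in range(1, pages + 1)]


def extract_rows(html):
    """Pair the date, visitor and flag-view matches found on one page."""
    dates = date_pt.findall(html)
    visitors = visitors_pt.findall(html)
    flag_views = flagViews_pt.findall(html)
    # zip truncates to the shortest list; the three lists must line up
    # row by row for the pairing to be correct.
    return list(zip(dates, visitors, flag_views))


# Hypothetical row in the shape the patterns expect.
sample = ('<tr><td><font face=arial size=-1>May 25, 2018</td>'
          '<td><font face=arial size=2>1,234</td>'
          '<td><font face=arial size=2>2,345</font></td></tr>')

print(page_urls('http://s04.flagcounter.com/more7/XTPq/', 2))
print(extract_rows(sample))  # [('May 25, 2018', '1,234', '2,345')]
```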

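Processing logic:

1. Crawl the data

2. Build the arrays: date, blog visitors, flag counter views

3. Save the data to a file

4. Save a pickle file

5. Generate line charts of the visits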
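Step 4 does not appear in the crawler excerpt. The sketch below shows one way to do it with the standard `pickle` module, assuming the three parallel lists are stored together in one dict. The file name `blog_data.pkl`, the dict keys, and the helpers `save_pickle`/`load_pickle` are hypothetical.

```python
import os
import pickle
import tempfile


def save_pickle(date, visitors, flagViews, path):
    """Store the three parallel lists in one dict so they are loaded together."""
    data = {'date': date, 'visitors': visitors, 'flagViews': flagViews}
    with open(path, 'wb') as f:  # pickle needs a binary-mode file
        pickle.dump(data, f)


def load_pickle(path):
    """Return the (date, visitors, flagViews) lists saved by save_pickle."""
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data['date'], data['visitors'], data['flagViews']


# Round trip through a temporary directory (hypothetical values).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'blog_data.pkl')
    save_pickle(['May 25, 2018'], [1234], [2345], path)
    print(load_pickle(path))  # (['May 25, 2018'], [1234], [2345])
```

Keeping the lists in one dict means a later plotting script can reload everything with a single `load_pickle` call instead of re-crawling.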

The crawler code is shown below. The full code is in the open-source repository: https://github.com/zpfbuaa/blogVisitors

```python
# -*- coding: utf-8 -*-
# @Time    : 2018/5/25 1:15 PM
# @Author  : 伊甸一點
# @FileName: getHtml.py
# @Software: PyCharm
# @Blog    : http://zpfbuaa.github.io

import os
import re
import time

import requests

# Raw strings avoid invalid-escape warnings for \w, \d and \S.
date_pt = re.compile(r'<font face=arial size=-1>(\w+ \d+, \d+)')
# [\d,]+ instead of \w+ so counts with thousands separators such as 1,234 match.
visitors_pt = re.compile(r'<font face=arial size=2>([\d,]+)</td><td>')
flagViews_pt = re.compile(r'<font face=arial size=2>(\S+)</font></td></tr>')


def getTotalBlog(url, pages):
    """Crawl pages 1..pages; collect dates, blog visitors and flag views."""
    date = []
    visitors = []
    flagViews = []
    for page in range(1, pages + 1):
        newUrl = url + str(page)
        print(newUrl)
        html = requests.get(newUrl).text
        date.extend(date_pt.findall(html))
        visitors.extend(visitors_pt.findall(html))
        flagViews.extend(flagViews_pt.findall(html))
    return date, visitors, flagViews


def parse_count(text):
    """Convert a count such as '1,234' or '1,234,567' to an int."""
    return int(str(text).replace(',', ''))


def change_data(date, visitors, flagViews):
    print(len(visitors))
    print(len(flagViews))
    # Stripping every comma handles any number of thousands groups,
    # unlike splitting once and multiplying the first part by 1000.
    visitors = [parse_count(v) for v in visitors]
    flagViews = [parse_count(f) for f in flagViews]
    return date, visitors, flagViews


def printData(date, visitors, flagViews):
    print('Date Visitors Flag Counter Views')
    for d, v, f in zip(date, visitors, flagViews):
        print(d, v, f)


def writeToFile(date, visitors, flagViews, data_root='data/'):
    """Write tab-separated rows to data_root/blog_YYYYMMDD."""
    today = time.strftime('%Y%m%d', time.localtime())
    os.makedirs(data_root, exist_ok=True)  # make sure the output directory exists
    data_file = os.path.join(data_root, 'blog_' + today)
    with open(data_file, 'w') as f:
        f.write('Date\tVisitors\tFlag Counter Views\n')
        for d, v, fv in zip(date, visitors, flagViews):
            f.write('%s\t%d\t%d\n' % (d, v, fv))
    return 1


if __name__ == '__main__':
    url = 'http://s04.flagcounter.com/more7/XTPq/'
    pages = 23
    date, visitors, flagViews = getTotalBlog(url, pages)
    # printData(date, visitors, flagViews)
    date, visitors, flagViews = change_data(date, visitors, flagViews)
    # printData(date, visitors, flagViews)
    flag = writeToFile(date, visitors, flagViews)
    print('Data Prepare Done!')
```
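For the charting step, the file written by `writeToFile` can be loaded back. The sketch below assumes the tab-separated layout produced by that function: one header line, then date, visitors and flag views on each row. `read_blog_file` is a hypothetical helper name, and the sample file contents are made up.

```python
import csv
import os
import tempfile


def read_blog_file(data_file):
    """Load a file produced by writeToFile back into three parallel lists."""
    date, visitors, flagViews = [], [], []
    with open(data_file, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader)  # skip the 'Date\tVisitors\tFlag Counter Views' header
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            date.append(row[0])
            visitors.append(int(row[1]))
            flagViews.append(int(row[2]))
    return date, visitors, flagViews


# Usage with a small hypothetical file in the same layout.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = os.path.join(tmp, 'blog_20190112')
    with open(sample_file, 'w') as f:
        f.write('Date\tVisitors\tFlag Counter Views\n')
        f.write('May 25, 2018\t1234\t2345\n')
    print(read_blog_file(sample_file))  # (['May 25, 2018'], [1234], [2345])
```

Because the delimiter is a tab, the comma inside dates such as `May 25, 2018` is not treated as a field separator.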

The charts below show the visits as of January 12, 2019.

Blog visits line chart
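A line chart needs the captured date strings as real dates in chronological order. The sketch below assumes strings in the form matched by `date_pt`, such as `May 25, 2018`. The post does not say whether the counter uses abbreviated or full month names, so both formats are tried. `parse_date` and `sort_by_date` are hypothetical helpers, and the sample values are made up.

```python
from datetime import datetime


def parse_date(text):
    """Parse 'May 25, 2018' or 'January 12, 2019' into a datetime.date."""
    for fmt in ('%b %d, %Y', '%B %d, %Y'):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    raise ValueError('unrecognised date: %r' % text)


def sort_by_date(date, visitors, flagViews):
    """Return the three series ordered oldest to newest (needs at least one row)."""
    rows = sorted(zip(map(parse_date, date), visitors, flagViews))
    days, v, f = zip(*rows)
    return list(days), list(v), list(f)


days, v, f = sort_by_date(['Jan 12, 2019', 'May 25, 2018'], [30, 10], [40, 15])
print(days)  # [datetime.date(2018, 5, 25), datetime.date(2019, 1, 12)]
print(v, f)  # [10, 30] [15, 40]
```

The sorted lists can then be handed to a plotting library such as matplotlib for the x-axis and the two y-series.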

Flag counter statistics for the visit entry point

Flag counter views

Difference (diff) between the two

Line chart of the visit difference
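The difference chart compares the two series day by day. The sketch below assumes the series are aligned by day, as returned by `change_data`. It also assumes the plotted value is flag counter views minus blog visitors; the post does not state the sign convention. `daily_diff` is a hypothetical helper.

```python
def daily_diff(visitors, flagViews):
    """Per-day difference flagViews - visitors for two aligned series."""
    if len(visitors) != len(flagViews):
        raise ValueError('series have different lengths')
    return [f - v for v, f in zip(visitors, flagViews)]


print(daily_diff([10, 30, 25], [15, 40, 25]))  # [5, 10, 0]
```

Raising on a length mismatch, instead of letting `zip` truncate, makes a missed regex match during crawling visible rather than silently shifting the chart.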