Counting PV, UV, and Hot Resources from the Nginx Access Log

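Requirement:

The monitoring page of the Alibaba Cloud CDN console shows statistics for PV, UV, and hot resources. So I wrote scripts of my own to produce the same figures.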


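Analysis:

PV (page views): the number of requests made to the site. Multiple requests from the same source IP are all counted.

UV (unique visitors): the number of distinct visitors to the site. Multiple requests from the same source IP are counted only once.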
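
To make the two definitions concrete, here is a minimal sketch on a made-up list of request IPs. The addresses and the name request_ips are illustrative assumptions, not data from the real log.

```python
# Hypothetical sample: one client IP per request, in arrival order.
request_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]

pv = len(request_ips)        # every request counts, repeats included
uv = len(set(request_ips))   # each distinct source IP counts once

print("PV is:", pv)
print("UV is:", uv)
```

Five requests from three distinct addresses give a PV of 5 and a UV of 3.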


Here is one line from the Nginx access log:

# head -1 access.log 
192.165.158.238 - - 2017-03-06T20:47:04+08:00 "GET http://download.helloworld.com/ HTTP/1.1" 200 851 425 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "-" 0.000 -

With space as the delimiter, the first column is the client's source IP, and the sixth column is the resource the user requested.
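
The column numbering can be checked by splitting the sample line on whitespace. The sketch below does this in Python; the variable names are illustrative. Note that whitespace splitting ignores the quotes, so the fifth field is `"GET` with its leading quote, and the sixth is the URL only because the URL itself contains no spaces.

```python
# The sample log line shown above, copied verbatim.
line = ('192.165.158.238 - - 2017-03-06T20:47:04+08:00 '
        '"GET http://download.helloworld.com/ HTTP/1.1" 200 851 425 "-" '
        '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "-" 0.000 -')

fields = line.split()    # split on runs of whitespace, like awk's default

client_ip = fields[0]    # awk's $1 (Python indexes from 0)
resource = fields[5]     # awk's $6

print(client_ip)
print(resource)
```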


Linux Shell implementation:

# awk 'END{print "PV is:",NR}' access.log 
PV is: 1881955
# awk '{s[$1]+=1} END{for(i in s){sum+=1}} END{print "UV is:",sum}' access.log 
UV is: 64953
# awk '{s[$6]+=1} END{for(i in s){print s[i],i}}' access.log  | sort -rn | head -10
# Print only the 10 most requested resources
92838 http://download.helloworld.com/hello/hello
88873 http://download.helloworld.com/world/hi/
57711 http://appy.helloworld.com/world/js/jquery-1.10.1.min.js
46980 http://download.helloworld.com/favicon.ico
38759 http://appy.helloworld.com/world/css/style.css?t=00001
38684 http://appy.helloworld.com/world/css/base.css
35404 http://appy.helloworld.com/favicon.ico
34907 http://download.helloworld.com/world/js/jquery-1.10.1.min.js
34882 http://appy.helloworld.com/world/img/hi.jpg
34445 http://download.helloworld.com/world/css/base.css


Python implementation:

# cat count.py 
from __future__ import print_function
from collections import Counter

ips = []                       # client source IPs, one entry per request
hot_resources = Counter()      # request count per resource
with open('access.log', 'r') as fin:
    for line in fin:
        fields = line.split()  # split once on whitespace
        if len(fields) < 6:    # skip blank or malformed lines
            continue
        ips.append(fields[0])
        hot_resources[fields[5]] += 1

print("PV is: {0:d}".format(len(ips)))
print("UV is: {0:d}".format(len(set(ips))))

# Counter.most_common(10) returns the 10 most requested resources
for key, val in hot_resources.most_common(10):
    print(val, key)
    
# python count.py 
PV is: 1881955
UV is: 64953
92838 http://download.helloworld.com/hello/hello
88873 http://download.helloworld.com/world/hi/
57711 http://appy.helloworld.com/world/js/jquery-1.10.1.min.js
46980 http://download.helloworld.com/favicon.ico
38759 http://appy.helloworld.com/world/css/style.css?t=00001
38684 http://appy.helloworld.com/world/css/base.css
35404 http://appy.helloworld.com/favicon.ico
34907 http://download.helloworld.com/world/js/jquery-1.10.1.min.js
34882 http://appy.helloworld.com/world/img/hi.jpg
34445 http://download.helloworld.com/world/css/base.css
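
The script above keeps one list entry per request, so the list grows with PV. As a possible variant, the sketch below counts PV with an integer and tracks only distinct IPs in a set. The function name summarize, the top_n parameter, the rule of skipping lines with fewer than six fields, and the three sample lines are all assumptions made for illustration.

```python
from collections import Counter

def summarize(lines, top_n=10):
    """Return (pv, uv, top resources) for an iterable of log lines."""
    pv = 0
    seen_ips = set()          # distinct IPs only, not one entry per request
    hot_resources = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:   # assumption: skip blank or malformed lines
            continue
        pv += 1
        seen_ips.add(fields[0])
        hot_resources[fields[5]] += 1
    return pv, len(seen_ips), hot_resources.most_common(top_n)

# Hypothetical sample lines using the same leading field layout as the log.
sample = [
    '10.0.0.1 - - 2017-03-06T20:47:04+08:00 "GET http://example.com/a HTTP/1.1" 200',
    '10.0.0.2 - - 2017-03-06T20:47:05+08:00 "GET http://example.com/a HTTP/1.1" 200',
    '10.0.0.1 - - 2017-03-06T20:47:06+08:00 "GET http://example.com/b HTTP/1.1" 200',
]
pv, uv, top = summarize(sample)
print("PV is: {0:d}".format(pv))
print("UV is: {0:d}".format(uv))
for resource, count in top:
    print(count, resource)
```

To run it on a real log, pass the open file object instead of the sample list, for example summarize(fin) inside a with open('access.log') as fin: block.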



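PS:

I am currently teaching myself Python, but our production services do not use it. So I reimplemented in Python what the Shell version already does, as practice.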

