Counting PV, UV, and Hot Resources from Nginx Access Logs
阿新 • Published: 2018-01-09
Requirement:
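The monitoring page of the Alibaba Cloud CDN management console shows statistics for PV, UV, and hot resources, so I wrote my own scripts to extract the same figures.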
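Analysis:
PV: the number of requests made to the site. Repeated requests from the same source IP are all counted.
UV: the number of unique visitors to the site. Repeated requests from the same source IP are counted only once.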
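To make the two definitions concrete, here is a minimal sketch on a made-up list of source IPs, one entry per request. The sample addresses and the helper name `count_pv_uv` are assumptions for illustration only; they do not come from the real log.

```python
# Hypothetical sample: one source IP per request (not taken from the real log).
requests = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]


def count_pv_uv(ips):
    """Return (PV, UV): total number of requests and number of distinct source IPs."""
    ips = list(ips)
    return len(ips), len(set(ips))


pv, uv = count_pv_uv(requests)
print("PV is:", pv)  # every request counts
print("UV is:", uv)  # each source IP counts once
```

With this sample, the five requests come from three distinct addresses, so PV is 5 and UV is 3.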
Here is one entry from the Nginx access log:
# head -1 access.log
192.165.158.238 - - 2017-03-06T20:47:04+08:00 "GET http://download.helloworld.com/ HTTP/1.1" 200 851 425 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "-" 0.000 -
Splitting the line on spaces, the first column is the client's source IP and the sixth column is the resource the user requested.
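The column positions can be checked against the sample entry with a short sketch. The helper name `extract_ip_and_resource` is hypothetical. Note that awk numbers columns from 1 while Python indexes from 0, so `$1` and `$6` correspond to `fields[0]` and `fields[5]`.

```python
# The sample entry shown above, as a single string.
sample = (
    '192.165.158.238 - - 2017-03-06T20:47:04+08:00 '
    '"GET http://download.helloworld.com/ HTTP/1.1" 200 851 425 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "-" 0.000 -'
)


def extract_ip_and_resource(line):
    """Split on whitespace and return (source IP, requested resource).

    Returns None when the line has fewer than six columns.
    """
    fields = line.split()
    if len(fields) < 6:
        return None
    return fields[0], fields[5]


print(extract_ip_and_resource(sample))
```

This only holds for the log format shown here; a different `log_format` setting would move the columns.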
Linux shell implementation:
# awk 'END{print "PV is:",NR}' access.log
PV is: 1881955
# awk '{s[$1]+=1} END{for(i in s){sum+=1}} END{print "UV is:",sum}' access.log
UV is: 64953
# awk '{s[$6]+=1} END{for(i in s){print s[i],i}}' access.log | sort -rn | head -10    # print only the 10 most-requested entries
92838 http://download.helloworld.com/hello/hello
88873 http://download.helloworld.com/world/hi/
57711 http://appy.helloworld.com/world/js/jquery-1.10.1.min.js
46980 http://download.helloworld.com/favicon.ico
38759 http://appy.helloworld.com/world/css/style.css?t=00001
38684 http://appy.helloworld.com/world/css/base.css
35404 http://appy.helloworld.com/favicon.ico
34907 http://download.helloworld.com/world/js/jquery-1.10.1.min.js
34882 http://appy.helloworld.com/world/img/hi.jpg
34445 http://download.helloworld.com/world/css/base.css
Python implementation:
# cat count.py
from __future__ import print_function
from collections import Counter

ips = []  # list that stores the client source IPs
hot_resources = Counter()  # Counter that tallies requests per resource

with open('access.log', 'r') as fin:
    for line in fin:
        fields = line.split()
        if len(fields) < 6:  # skip blank or truncated lines instead of raising IndexError
            continue
        ips.append(fields[0])
        hot_resources[fields[5]] += 1

print("PV is: {0:d}".format(len(ips)))
print("UV is: {0:d}".format(len(set(ips))))

for key, val in hot_resources.most_common(10):  # Counter.most_common returns the 10 most frequent entries
    print(val, key)
# python count.py
PV is: 1881955
UV is: 64953
92838 http://download.helloworld.com/hello/hello
88873 http://download.helloworld.com/world/hi/
57711 http://appy.helloworld.com/world/js/jquery-1.10.1.min.js
46980 http://download.helloworld.com/favicon.ico
38759 http://appy.helloworld.com/world/css/style.css?t=00001
38684 http://appy.helloworld.com/world/css/base.css
35404 http://appy.helloworld.com/favicon.ico
34907 http://download.helloworld.com/world/js/jquery-1.10.1.min.js
34882 http://appy.helloworld.com/world/img/hi.jpg
34445 http://download.helloworld.com/world/css/base.css
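One possible variation, sketched here rather than taken from the original script: with about 1.9 million requests but only about 65 thousand distinct IPs, a running counter plus a set should need less memory than a list holding every IP. The function name `summarize`, its `top_n` parameter, and the demo lines are assumptions for illustration.

```python
from collections import Counter


def summarize(lines, top_n=10):
    """Return (PV, UV, top resources) for an iterable of log lines.

    Lines with fewer than six whitespace-separated columns are skipped,
    so they are not counted in PV.
    """
    pv = 0
    visitors = set()           # distinct source IPs
    hot_resources = Counter()  # requests per resource
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        pv += 1
        visitors.add(fields[0])
        hot_resources[fields[5]] += 1
    return pv, len(visitors), hot_resources.most_common(top_n)


# Made-up demo lines in the same column layout as the log above.
demo = [
    '10.0.0.1 - - t "GET http://example.com/a HTTP/1.1" 200',
    '10.0.0.2 - - t "GET http://example.com/a HTTP/1.1" 200',
    '10.0.0.1 - - t "GET http://example.com/b HTTP/1.1" 200',
]
pv, uv, top = summarize(demo)
print("PV is:", pv)
print("UV is:", uv)
for resource, hits in top:
    print(hits, resource)
```

Because `summarize` accepts any iterable of lines, an open file object can be passed in directly, for example `with open('access.log') as fin: summarize(fin)`.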
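PS:
I am currently teaching myself Python, but our production systems do not use it. So, for practice, I re-implemented in Python what the shell commands already do.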