
Python Log Generator



Preface
When doing real-time data collection, a common headache is that you may not have a live log data source at hand. A simple Python script can generate log data in real time, which solves the problem nicely.

Before writing any code, we need to know what our web server logs actually look like. Below is a snippet of real logs taken from an nginx server, as a sample:

223.104.25.1 - - [21/Nov/2017:20:34:16 +0800] "GET / HTTP/1.1" 200 94 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0 Mobile/14G60 Safari/602.1" "-"
223.104.25.1 - - [21/Nov/2017:20:34:16 +0800] "GET / HTTP/1.1" 200 94 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0 Mobile/14G60 Safari/602.1" "-"
156.151.199.137 - - [21/Nov/2017:20:34:19 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36" "-"

From the server logs above, the main fields are:
1. client IP address, e.g. 156.151.199.137
2. access time and time zone, e.g. [21/Nov/2017:20:34:19 +0800]
3. status code
4. user-agent string, etc.
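To make the field breakdown concrete, here is a minimal sketch that parses one of the sample lines above with a regular expression. The pattern is an assumption based on the common nginx "combined" log format; adjust it if your server uses a custom log_format.

```python
import re

# Regex for the nginx "combined" log format shown above (an assumption;
# tweak it if your server's log_format differs)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('156.151.199.137 - - [21/Nov/2017:20:34:19 +0800] "GET / HTTP/1.1" '
        '304 0 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36" "-"')

m = LOG_PATTERN.match(line)
print(m.group('ip'))      # 156.151.199.137
print(m.group('status'))  # 304
```

Once you can name the fields like this, it is clear what our generator has to fake: an IP, a timestamp, a request line, a status code, and a referer.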

Next, let's develop the simulated log generator.

Approach
The Python log generator will produce entries containing the request URL, IP address, referer, and status code. Here is the implementation, straight to the code:

#coding=UTF-8

import random
import time

url_paths = [
    "class/154.html",
    "class/128.html",
    "class/147.html",
    "class/116.html",
    "class/138.html",
    "class/140.html",
    "learn/828",
    "learn/521",
    "course/list"
]

ip_slices = [127,156,222,105,24,192,153,127,31,168,32,10,82,77,118,228]

http_referers = [
    "http://www.baidu.com/s?wd={query}",
    "https://www.sogou.com/web?query={query}",
    "http://cn.bing.com/search?q={query}",
    "https://search.yahoo.com/search?p={query}",
]

search_keyword = [
    "Spark 項目實戰",
    "Hadoop 項目實戰",
    "Storm 項目實戰",
    "Spark Streaming實戰",
    "古詩詞鑒賞"
]

status_codes = ["200","404","500","503","403"]

def sample_url():
    return random.sample(url_paths, 1)[0]

def sample_ip():
    # sample() picks 4 distinct octets, so no octet repeats within one IP
    octets = random.sample(ip_slices, 4)
    return ".".join(str(item) for item in octets)

def sample_referer():
    # about 80% of entries get no referer
    if random.uniform(0, 1) > 0.2:
        return "-"

    refer_str = random.sample(http_referers, 1)
    query_str = random.sample(search_keyword, 1)
    return refer_str[0].format(query=query_str[0])

def sample_status_code():
    return random.sample(status_codes, 1)[0]

def generate_log(count=10):
    time_str = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())

    # "w+" truncates access.log on every run (hence the "file truncated"
    # message from tail further down); use "a" if logs should accumulate
    with open("/home/hadoop/data/project/logs/access.log", "w+") as f:
        while count >= 1:
            query_log = "{ip}\t{local_time}\t\"GET /{url} HTTP/1.1\"\t{status_code}\t{referer}".format(
                url=sample_url(), ip=sample_ip(), referer=sample_referer(),
                status_code=sample_status_code(), local_time=time_str)
            f.write(query_log + "\n")
            count = count - 1

if __name__ == '__main__':
    generate_log(10)
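Note that sample_referer returns "-" whenever random.uniform(0, 1) > 0.2, i.e. roughly 80% of the time, so most generated lines carry no referer, which matches the test output below. A quick sketch to confirm that proportion (the helper here just mirrors that one branch, it is not part of the generator itself):

```python
import random

def sample_referer_flag():
    # mirrors the branch in sample_referer: "-" when uniform(0, 1) > 0.2
    return "-" if random.uniform(0, 1) > 0.2 else "referer"

random.seed(42)  # fixed seed so the run is repeatable
trials = 10000
dashes = sum(1 for _ in range(trials) if sample_referer_flag() == "-")
print(dashes / trials)  # close to 0.8 in expectation
```

Raise or lower the 0.2 threshold in sample_referer if you want more or fewer entries with referers.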



With this we can generate log data. Test:

[hadoop@hadoop000 logs]$ more access.log 
105.228.77.82   2017-11-21 06:38:01 "GET /learn/828 HTTP/1.1"   200 -
31.10.153.77    2017-11-21 06:38:01 "GET /class/138.html HTTP/1.1"  200 -
77.156.153.105  2017-11-21 06:38:01 "GET /class/140.html HTTP/1.1"  503 http://www.baidu.com/s?wd=Storm 項目實戰
222.32.228.77   2017-11-21 06:38:01 "GET /learn/521 HTTP/1.1"   404 https://www.sogou.com/web?query=Spark 項目實戰
# partial output

Now that the data can be generated, the next step is to generate it in real time, which is where the Linux crontab scheduler comes in. Anyone who has used Linux will know it; we just need to write a schedule.
A handy site for testing cron expressions:
https://tool.lu/crontab
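If cron is not available (say, when testing locally on another OS), a plain Python loop with time.sleep can achieve the same 10-second cadence. This is just an alternative sketch, not part of the original setup; run_periodically is a hypothetical helper, and in practice you would pass it the generate_log function from the script above:

```python
import time

def run_periodically(task, interval_secs=10, iterations=None):
    """Run `task` every `interval_secs` seconds.

    iterations=None loops forever (like the cron schedule below);
    passing a number is handy for testing.
    """
    n = 0
    while iterations is None or n < iterations:
        task()
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_secs)

# Usage (assuming generate_log is importable from the script above):
# run_periodically(generate_log, interval_secs=10)
```

The cron route below is still preferable on a server, since the scheduler survives logouts and reboots.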

1) First, write the script that the schedule will execute. Create a new .sh file (give it a shebang line and make it executable with chmod +x so cron can run it by path):

[hadoop@hadoop000 project]$ vim log_generator.sh 
#!/bin/bash
python /home/hadoop/data/project/generate_log.py

2) Once that's written, we can set up the schedule itself:

[hadoop@hadoop000 project]$ crontab -e
* * * * * /home/hadoop/data/project/log_generator.sh

* * * * * sleep 10; /home/hadoop/data/project/log_generator.sh

* * * * * sleep 20; /home/hadoop/data/project/log_generator.sh

* * * * * sleep 30; /home/hadoop/data/project/log_generator.sh

* * * * * sleep 40; /home/hadoop/data/project/log_generator.sh

* * * * * sleep 50; /home/hadoop/data/project/log_generator.sh

That completes our schedule. Because cron's minimum granularity is one minute, we stagger six entries with sleep offsets of 0 to 50 seconds, so the script runs every 10 seconds, i.e. ten log lines are produced every 10 seconds.

Verify:

[hadoop@hadoop000 logs]$ tail -f access.log 
222.153.118.82  2017-11-21 06:45:01 "GET /class/147.html HTTP/1.1"  403 -
127.192.168.31  2017-11-21 06:45:01 "GET /class/138.html HTTP/1.1"  200 -
77.31.153.127   2017-11-21 06:45:01 "GET /class/116.html HTTP/1.1"  403 https://search.yahoo.com/search?p=Spark Streaming實戰
153.10.82.192   2017-11-21 06:45:01 "GET /class/147.html HTTP/1.1"  404 -
168.32.153.222  2017-11-21 06:45:01 "GET /learn/828 HTTP/1.1"   503 -
118.153.222.192 2017-11-21 06:45:01 "GET /class/128.html HTTP/1.1"  503 -
192.32.156.31   2017-11-21 06:45:01 "GET /class/147.html HTTP/1.1"  500 https://search.yahoo.com/search?p=Spark 項目實戰
127.192.82.228  2017-11-21 06:45:01 "GET /class/154.html HTTP/1.1"  403 -
118.31.222.105  2017-11-21 06:45:01 "GET /learn/521 HTTP/1.1"   503 -
127.127.168.228 2017-11-21 06:45:01 "GET /class/140.html HTTP/1.1"  200 -
tail: access.log: file truncated
228.10.153.192  2017-11-21 06:56:01 "GET /class/147.html HTTP/1.1"  500 -
10.168.156.31   2017-11-21 06:56:01 "GET /course/list HTTP/1.1" 403 -
192.153.222.77  2017-11-21 06:56:01 "GET /class/154.html HTTP/1.1"  200 -
153.32.105.82   2017-11-21 06:56:01 "GET /course/list HTTP/1.1" 500 http://www.baidu.com/s?wd=Spark 項目實戰

The above is a partial capture; you can see that new log data appears every 10 seconds.

From here on, we can use this log generator to produce the log data we need in real time.
