Log collection with OpenResty + Lua + Kafka
Reposted from: https://www.cnblogs.com/gxyandwmm/p/11298912.html
********************* Deployment ************************
1: Scenario
High-traffic online services, and nginx services that need to report logs, produce a large volume of logs every day, and these logs are valuable: they can feed counting and reporting, user-behavior analysis, interface-quality checks, performance monitoring, and similar needs. With the traditional nginx logging approach, however, the data stays scattered across the individual nginx instances, and writing high-volume logs is itself a strain on the disks.
We want to collect and aggregate these nginx logs centrally; the collection process and its results must meet the following requirements:
support for different consumers of the data, e.g. monitoring, data analysis and statistics, and recommendation services;
a high-performance guarantee.
2: Technical solution
Thanks to the high performance of OpenResty and Kafka, we can meet these requirements in a very lightweight and efficient way; the architecture is as follows:
Solution description:
1: Online requests hit nginx, and Lua tidies the log entries: unifying the log format, filtering invalid requests, grouping, and so on.
2: The nginx logs of different business lines are split into different topics.
3: A Lua producer sends the logs asynchronously to the Kafka cluster.
4: Business groups interested in a given log stream consume it in real time.
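The four steps above boil down to a single log_by_lua hook in nginx.conf. A minimal sketch (the broker address 10.10.78.52 and topic test1 are the same placeholders used in the full configuration in section 5):

```nginx
# minimal sketch of the whole pipeline: build a JSON log line per request
# and push it to Kafka from the log phase
http {
    lua_package_path "/opt/openresty/lualib/kafka/?.lua;;";
    server {
        listen 80;
        location / {
            log_by_lua '
                local cjson    = require "cjson"
                local producer = require "resty.kafka.producer"
                -- the async producer buffers messages and flushes them in a timer
                local bp = producer:new({ { host = "10.10.78.52", port = 9092 } },
                                        { producer_type = "async" })
                local msg = cjson.encode({ uri = ngx.var.uri, status = ngx.var.status })
                -- nil key: messages rotate across partitions over time
                local ok, err = bp:send("test1", nil, msg)
                if not ok then
                    ngx.log(ngx.ERR, "kafka send err: ", err)
                end
            ';
        }
    }
}
```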
3: Related technologies
openresty:http://openresty.org
kafka:http://kafka.apache.org
lua-resty-kafka:https://github.com/doujiang24/lua-resty-kafka
4: Installation and configuration
To keep things simple and direct we deploy on a single machine; a cluster setup is similar.
1) Install OpenResty's dependencies:
apt-get install libreadline-dev libncurses5-dev libpcre3-dev libssl-dev perl make build-essential
# or
yum install readline-devel pcre-devel openssl-devel gcc
2) Build and install OpenResty:
# 1: install OpenResty:
cd /opt/nginx/ # directory holding the downloaded archives
wget https://openresty.org/download/openresty-1.9.7.4.tar.gz
tar -xzf openresty-1.9.7.4.tar.gz -C /opt/nginx/
cd openresty-1.9.7.4
# Configure:
# install under /opt/openresty (the default is /usr/local):
./configure --prefix=/opt/openresty \
--with-luajit \
--without-http_redis2_module \
--with-http_iconv_module
make
make install
3) Install lua-resty-kafka
# download lua-resty-kafka:
wget https://github.com/doujiang24/lua-resty-kafka/archive/master.zip -O lua-resty-kafka-master.zip
unzip lua-resty-kafka-master.zip -d /opt/nginx/
# copy lua-resty-kafka into OpenResty
mkdir -p /opt/openresty/lualib/kafka
cp -rf /opt/nginx/lua-resty-kafka-master/lib/resty /opt/openresty/lualib/kafka/
4) Install a single-node Kafka
cd /opt/nginx/
wget http://apache.fayea.com/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz
tar xvf kafka_2.10-0.9.0.1.tgz
cd kafka_2.10-0.9.0.1
# start a single-node zookeeper
nohup sh bin/zookeeper-server-start.sh config/zookeeper.properties > ./zk.log 2>&1 &
# bind the broker IP; this is required:
# edit host.name in config/server.properties
host.name={your_server_ip}
# start the kafka service
nohup sh bin/kafka-server-start.sh config/server.properties > ./server.log 2>&1 &
# create a test topic
sh bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test1 --partitions 1 --replication-factor 1
5: Configuration and running
Edit /opt/openresty/nginx/conf/nginx.conf so that nginx logs are written to Kafka; the source is as follows:
worker_processes 12;
events {
use epoll;
worker_connections 65535;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
keepalive_timeout 0;
gzip on;
gzip_min_length 1k;
gzip_buffers 4 8k;
gzip_http_version 1.1;
gzip_types text/plain application/x-javascript text/css application/xml application/X-JSON;
charset UTF-8;
# backend upstream servers
upstream rc{
server 10.10.*.15:8080 weight=5 max_fails=3;
server 10.10.*.16:8080 weight=5 max_fails=3;
server 10.16.*.54:8080 weight=5 max_fails=3;
server 10.16.*.55:8080 weight=5 max_fails=3;
server 10.10.*.113:8080 weight=5 max_fails=3;
server 10.10.*.137:8080 weight=6 max_fails=3;
server 10.10.*.138:8080 weight=6 max_fails=3;
server 10.10.*.33:8080 weight=4 max_fails=3;
# maximum number of idle keepalive connections
keepalive 32;
}
# path of the Lua dependency libraries
lua_package_path "/opt/openresty/lualib/kafka/?.lua;;";
server {
listen 80;
server_name localhost;
location /favicon.ico {
root html;
index index.html index.htm;
}
location / {
proxy_connect_timeout 8;
proxy_send_timeout 8;
proxy_read_timeout 8;
proxy_buffer_size 4k;
proxy_buffers 512 8k;
proxy_busy_buffers_size 8k;
proxy_temp_file_write_size 64k;
proxy_next_upstream http_500 http_502 http_503 http_504 error timeout invalid_header;
root html;
index index.html index.htm;
proxy_pass http://rc;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# hook the Lua code in with log_by_lua: the log_by_lua directive runs at the very end of the request and does not interfere with proxy_pass
log_by_lua '
-- load the required Lua modules
local cjson = require "cjson"
local producer = require "resty.kafka.producer"
-- Kafka broker list; the IP must match Kafka's host.name setting
local broker_list = {
{ host = "10.10.78.52", port = 9092 },
}
-- collect the log fields into a table for JSON encoding
local log_json = {}
log_json["uri"]=ngx.var.uri
log_json["args"]=ngx.var.args
log_json["host"]=ngx.var.host
log_json["request_body"]=ngx.var.request_body
log_json["remote_addr"] = ngx.var.remote_addr
log_json["remote_user"] = ngx.var.remote_user
log_json["time_local"] = ngx.var.time_local
log_json["status"] = ngx.var.status
log_json["body_bytes_sent"] = ngx.var.body_bytes_sent
log_json["http_referer"] = ngx.var.http_referer
log_json["http_user_agent"] = ngx.var.http_user_agent
log_json["http_x_forwarded_for"] = ngx.var.http_x_forwarded_for
log_json["upstream_response_time"] = ngx.var.upstream_response_time
log_json["request_time"] = ngx.var.request_time
-- encode the table as a JSON string
local message = cjson.encode(log_json);
-- create an asynchronous Kafka producer
local bp = producer:new(broker_list, { producer_type = "async" })
-- send the log message; send's second argument is the key, used for Kafka partition routing:
-- when the key is nil, data is written to the same partition for a period of time
-- when a key is given, the partition is chosen by the hash of the key
local ok, err = bp:send("test1", nil, message)
if not ok then
ngx.log(ngx.ERR, "kafka send err:", err)
return
end
';
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
}
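If related log lines should land in the same partition, the send call in the configuration above can pass a key instead of nil. A sketch (using the client IP as the key is an assumption for illustration; any stable string works):

```lua
-- route all log lines of one client to the same partition
-- by hashing the client IP as the partition key
local ok, err = bp:send("test1", ngx.var.remote_addr, message)
if not ok then
    ngx.log(ngx.ERR, "kafka send err:", err)
end
```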
6: Check & run
# check the configuration; this only validates the nginx config, Lua runtime errors end up in nginx's error.log
./nginx -t -c /opt/openresty/nginx/conf/nginx.conf
# start
./nginx -c /opt/openresty/nginx/conf/nginx.conf
# reload
./nginx -s reload
7: Testing
1: Send any HTTP request to this nginx.
2: Check that the upstream proxying works correctly.
3: Check that the corresponding Kafka topic receives the log messages, e.g.:
# consume the topic from the beginning
sh kafka-console-consumer.sh --zookeeper 10.10.78.52:2181 --topic test1 --from-beginning
Performance check:
4: ab load test
# nginx + upstream only:
ab -n 10000 -c 100 -k "http://10.10.34.15/m/personal/AC8E3BC7-6130-447B-A9D6-DF11CB74C3EF/rc/[email protected]&page=2&size=10"
# results
Server Software: nginx
Server Hostname: 10.10.34.15
Server Port: 80
Document Path: /m/personal/AC8E3BC7-6130-447B-A9D6-DF11CB74C3EF/rc/[email protected]
Document Length: 13810 bytes
Concurrency Level: 100
Time taken for tests: 2.148996 seconds
Complete requests: 10000
Failed requests: 9982
(Connect: 0, Length: 9982, Exceptions: 0)
Write errors: 0
Keep-Alive requests: 0
Total transferred: 227090611 bytes
HTML transferred: 225500642 bytes
Requests per second: 4653.34 [#/sec] (mean)
Time per request: 21.490 [ms] (mean)
Time per request: 0.215 [ms] (mean, across all concurrent requests)
Transfer rate: 103196.10 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 2
Processing: 5 20 23.6 16 701
Waiting: 4 17 20.8 13 686
Total: 5 20 23.6 16 701
Percentage of the requests served within a certain time (ms)
50% 16
66% 20
75% 22
80% 25
90% 33
95% 41
98% 48
99% 69
100% 701 (longest request)
# nginx + upstream + log_by_lua Kafka logging:
ab -n 10000 -c 100 -k "http://10.10.78.52/m/personal/AC8E3BC7-6130-447B-A9D6-DF11CB74C3EF/rc/[email protected]&page=2&size=10"
# results
Server Software: openresty/1.9.7.4
Server Hostname: 10.10.78.52
Server Port: 80
Document Path: /m/personal/AC8E3BC7-6130-447B-A9D6-DF11CB74C3EF/rc/[email protected]
Document Length: 34396 bytes
Concurrency Level: 100
Time taken for tests: 2.234785 seconds
Complete requests: 10000
Failed requests: 9981
(Connect: 0, Length: 9981, Exceptions: 0)
Write errors: 0
Keep-Alive requests: 0
Total transferred: 229781343 bytes
HTML transferred: 228071374 bytes
Requests per second: 4474.70 [#/sec] (mean)
Time per request: 22.348 [ms] (mean)
Time per request: 0.223 [ms] (mean, across all concurrent requests)
Transfer rate: 100410.10 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 3
Processing: 6 20 27.6 17 1504
Waiting: 5 15 12.0 14 237
Total: 6 20 27.6 17 1504
Percentage of the requests served within a certain time (ms)
50% 17
66% 19
75% 21
80% 23
90% 28
95% 34
98% 46
99% 67
100% 1004 (longest request)
********************* The key module ************************
The nginx configuration is as follows:
#user nobody;
worker_processes 1;
#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
#pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
#log_format main '$remote_addr - $remote_user [$time_local] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" "$http_x_forwarded_for"';
#access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
upstream myServer {
server 192.168.0.109:8080 weight=1;
}
lua_package_path "/opt/openresty/lualib/kafka/?.lua;;";
lua_need_request_body on;
server {
listen 80;
server_name localhost;
#charset koi8-r;
#access_log logs/host.access.log main;
location /test1 {
# forward the request to the custom upstream list
proxy_pass http://myServer;
}
location /test2 {
# hook the Lua code in with log_by_lua: the log_by_lua directive runs at the very end of the request and does not interfere with proxy_pass
log_by_lua '
-- load the required Lua modules
local topic = "test"
local cjson = require "cjson"
local producer = require "resty.kafka.producer"
-- Kafka broker list; the IPs must match Kafka's host.name setting
local broker_list = {
{ host = "192.168.0.109", port = 9092 },
{ host = "192.168.0.110", port = 9092 },
{ host = "192.168.0.101", port = 9092 }
}
-- collect the log fields into a table for JSON encoding
local log_json = {}
log_json["uri"]=ngx.var.uri
log_json["args"]=ngx.req.get_uri_args()
log_json["host"]=ngx.var.host
log_json["request_body"]=ngx.var.request_body
log_json["remote_addr"] = ngx.var.remote_addr
log_json["remote_user"] = ngx.var.remote_user
log_json["time_local"] = ngx.var.time_local
log_json["status"] = ngx.var.status
log_json["body_bytes_sent"] = ngx.var.body_bytes_sent
log_json["http_referer"] = ngx.var.http_referer
log_json["http_user_agent"] = ngx.var.http_user_agent
log_json["http_x_forwarded_for"] = ngx.var.http_x_forwarded_for
log_json["upstream_response_time"] = ngx.var.upstream_response_time
log_json["request_time"] = ngx.var.request_time
-- encode the table as a JSON string
local message = cjson.encode(log_json);
-- create an asynchronous Kafka producer
local bp = producer:new(broker_list, { producer_type = "async" })
-- send the log message; send's second argument is the key, used for Kafka partition routing:
-- when the key is nil, data is written to the same partition for a period of time
-- when a key is given, the partition is chosen by the hash of the key
local ok, err = bp:send(topic, nil, message)
if not ok then
ngx.log(ngx.ERR, "kafka send err:", err)
return
end
';
}
#error_page 404 /404.html;
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
# proxy the PHP scripts to Apache listening on 127.0.0.1:80
#
#location ~ \.php$ {
# proxy_pass http://127.0.0.1;
#}
# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
#
#location ~ \.php$ {
# root html;
# fastcgi_pass 127.0.0.1:9000;
# fastcgi_index index.php;
# fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
# include fastcgi_params;
#}
# deny access to .htaccess files, if Apache's document root
# concurs with nginx's one
#
#location ~ /\.ht {
# deny all;
#}
}
# another virtual host using mix of IP-, name-, and port-based configuration
#
#server {
# listen 8000;
# listen somename:8080;
# server_name somename alias another.alias;
# location / {
# root html;
# index index.html index.htm;
# }
#}
# HTTPS server
#
#server {
# listen 443 ssl;
# server_name localhost;
# ssl_certificate cert.pem;
# ssl_certificate_key cert.key;
# ssl_session_cache shared:SSL:1m;
# ssl_session_timeout 5m;
# ssl_ciphers HIGH:!aNULL:!MD5;
# ssl_prefer_server_ciphers on;
# location / {
# root html;
# index index.html index.htm;
# }
#}
}
********************* Pitfalls ************************
Problem summary:
When the Lua script in OpenResty nginx on server1 wrote to Kafka on server5, it failed with the error no resolver defined to resolve "xxxxx", where xxxxx is some machine's hostname. After a day of digging, we found the problem.
Root cause:
It turns out that OpenResty does not consult the hosts-file mappings. The Kafka client first connects to a broker by IP, then fetches the broker cluster metadata from zookeeper, where the broker address is recorded as a hostname (e.g. kafka236:1111). The Lua client therefore receives the hostname kafka236,
but cannot resolve it through the hosts file, and fails with the unresolvable-host error.
Solution:
If a DNS service is available (for example on your router), add a record for the hostname there and point nginx's resolver at that DNS server (otherwise you need to run a DNS service yourself).
nginx.conf configuration:
DNS configuration:
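On the nginx side, the fix amounts to a resolver directive in the http block. A sketch (the DNS server address 192.168.0.1 is an assumption; substitute your own DNS server):

```nginx
http {
    # let OpenResty resolve broker hostnames through your DNS server
    # (192.168.0.1 is an example address)
    resolver 192.168.0.1 valid=300s;
    resolver_timeout 5s;
}
```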
Notes:
1. If the Kafka server is configured with an IP or a hostname, a Kafka client on that same machine cannot connect via localhost (unless the server side is also configured as localhost).
2. If the Kafka server's listener is configured with an IP, zookeeper records an IP address;
if the listener is configured with a hostname, zookeeper records the hostname;
if advertised.listeners is set to a hostname, zookeeper records the hostname regardless of what the listener is set to.
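As a sketch, the relevant broker settings in config/server.properties look like this (the hostname kafka236 is taken from the example above; the values are illustrative):

```properties
# what the broker binds to, and what gets registered in zookeeper
listeners=PLAINTEXT://kafka236:9092
# when set, advertised.listeners overrides the address handed out to clients,
# regardless of what listeners is bound to
advertised.listeners=PLAINTEXT://kafka236:9092
```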
We later found that:
with the older openresty-1.7.10.2, Kafka was reachable whether it was configured with a hostname or an IP;
with the newer openresty-1.13.6.2, a hostname-configured Kafka was unreachable, only an IP worked, and configuring resolver did not help either.