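Building a MySQL Log Platform on ELK: Key Points and Common Errors
Part 1 Overview
ELK is a log analysis platform that combines distributed data storage, visual querying, and log parsing. ELK = Elasticsearch + Logstash + Kibana; each component has its own job, and the three work together to process log data. Their main functions are:
- Elasticsearch: data storage and full-text search;
- Logstash: log processing, the "porter" that moves data along;
- Kibana: data visualization and operations management.
When building the platform we also used Filebeat. Filebeat is a collector for log data in local files: it monitors log directories or specific log files (tail file) and forwards the data to Elasticsearch, Logstash, or other outputs.
In this case study, ELK is used to collect, manage, and search the slow query logs and error logs of MySQL instances.
In outline, the data flows as follows: MySQL slow query and error logs → Filebeat → Logstash → Elasticsearch → Kibana.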
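Part 2 Elasticsearch
2.1 Features and strengths of ES
- Distributed real-time document storage: every field can be indexed and made searchable.
- A distributed search engine with real-time analytics. Distributed means that an index is split into multiple shards, each shard can have zero or more replicas, and rebalancing and routing happen automatically in most cases.
- It scales out to hundreds of servers and handles petabytes of structured or unstructured data, and it also runs on a single PC.
- It supports a plugin mechanism: analysis (tokenizer) plugins, synchronization plugins, Hadoop plugins, visualization plugins, and so on.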
2.2 Main ES concepts
Elasticsearch | MySQL |
Index | Database |
Type [fixed to _doc since 7.0] | Table |
Document | Row |
Field | Column |
Mapping | Schema |
Everything is indexed | Index |
Query DSL [Domain Specific Language] | SQL |
GET http://... | Select * from table … |
PUT http://... | Update table set … |
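- A database in a relational database corresponds to an index in ES;
- A relational database holds N tables, which corresponds to one index holding N types (before 7.0; since 7.0 the type is always _doc);
- The data in a table consists of rows and columns (attributes), which corresponds to a type consisting of documents and fields;
- In a relational database, the schema defines the tables, the columns of each table, and the relationships between tables and columns. Correspondingly, in ES the mapping defines how the fields of a type under an index are handled: how the index is built, the field types, whether the original JSON document is stored, whether it is compressed, whether text is analyzed (tokenized), and how;
- The insert, delete, update, and select operations of a relational database correspond to PUT/POST, DELETE, _update, and GET in ES.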
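The correspondence in the last bullet can be sketched with curl against the Elasticsearch 7.x document APIs. This is a minimal sketch, not part of the original setup: the endpoint http://localhost:9200, the index name mysql-slow-demo, and the es helper are assumptions made for illustration. By default the helper only prints the commands (DRY_RUN=1) instead of contacting a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint; point ES_URL at a real node to use it.
ES_URL="${ES_URL:-http://localhost:9200}"
# DRY_RUN=1 (default) prints each curl command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# es METHOD PATH [JSON_BODY]: build one HTTP request to Elasticsearch.
es() {
  method="$1"; req_path="$2"; body="${3:-}"
  set -- curl -s -X "$method" "${ES_URL}${req_path}" -H 'Content-Type: application/json'
  if [ -n "$body" ]; then
    set -- "$@" -d "$body"
  fi
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# mysql-slow-demo is a made-up index name.
es PUT    /mysql-slow-demo/_doc/1    '{"query_time": 3.2}'            # like INSERT
es GET    /mysql-slow-demo/_doc/1                                     # like SELECT by primary key
es POST   /mysql-slow-demo/_update/1 '{"doc": {"query_time": 4.0}}'   # like UPDATE
es DELETE /mysql-slow-demo/_doc/1                                     # like DELETE
```

Setting DRY_RUN=0 sends the same four requests to the node named in ES_URL.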
2.3 File permission problem at startup
Error message
[usernimei@testes01 bin]$ Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.nio.file.AccessDeniedException: /data/elasticsearch/elasticsearch-7.4.2/config/elasticsearch.keystore
Likely root cause: java.nio.file.AccessDeniedException: /data/elasticsearch/elasticsearch-7.4.2/config/elasticsearch.keystore
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
    at java.base/java.nio.file.Files.newByteChannel(Files.java:374)
    at java.base/java.nio.file.Files.newByteChannel(Files.java:425)
    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
    at org.elasticsearch.common.settings.KeyStoreWrapper.load(KeyStoreWrapper.java:219)
    at org.elasticsearch.bootstrap.Bootstrap.loadSecureSettings(Bootstrap.java:234)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:305)
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159)
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
    at org.elasticsearch.cli.Command.main(Command.java:90)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
Refer to the log for complete error details
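Analysis
ES was first started by mistake under the root account, so the elasticsearch.keystore file created in the config directory ended up owned by root:
-rw-rw---- 1 root root 199 Mar 24 17:36 elasticsearch.keystore
Solution: switch to the root user and change the ownership of elasticsearch.keystore
back to the ES user (replace es_user and es_group with the account and group that run ES):
chown -R es_user:es_group elasticsearch.keystore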
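After an accidental start as root, other files may have been created with the wrong owner as well. As a hedged sketch, the helper below lists everything under a directory that is not owned by the expected user; find_foreign_owned is a hypothetical name, and the user name es in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# find_foreign_owned DIR USER: print every path under DIR that is NOT owned by USER.
# No output means the ownership is consistent.
find_foreign_owned() {
  find "$1" ! -user "$2" -print
}

# Example (the user name "es" is an assumption):
#   find_foreign_owned /data/elasticsearch/elasticsearch-7.4.2 es
```

Any path it prints is a candidate for the chown shown above.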
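Problem 2.4 The "maximum shards open" error
According to the official documentation, starting with Elasticsearch 7.0.0 each node in a cluster is limited to 1000 shards by default, so a cluster with 3 data nodes can hold at most 3000 shards. We run only one ES node, so our limit is 1000.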
[2019-05-11T11:05:24,650][WARN ][logstash.outputs.elasticsearch][main] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://qqelastic:[email protected]:55944/][Manticore::SocketTimeout] Read timed out {:url=>http://qqelastic:[email protected]:55944/, :error_message=>"Elasticsearch Unreachable: [http://qqelastic:[email protected]:55944/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2019-05-11T11:05:24,754][ERROR][logstash.outputs.elasticsearch][main] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://qqelastic:[email protected]:55944/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2019-05-11T11:05:25,158][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://qqelastic:[email protected]:55944/"}
[2019-05-11T11:05:26,763][WARN ][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"mysql-error-testqq-2019.05.11", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x65416fce>], :response=>{"index"=>{"_index"=>"mysql-error-qqweixin-2020.05.11", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}}}}
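The limit can be raised from Kibana, for example in the Dev Tools console.
Main command:
PUT /_cluster/settings
{
  "transient": {
    "cluster": {
      "max_shards_per_node": 10000
    }
  }
}
Notes:
A transient setting is lost after a full cluster restart; use "persistent" instead of "transient" to keep it.
It is advisable to restart the Logstash service after changing the setting.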
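The arithmetic behind the validation error can be sketched as follows. This only mirrors the numbers in the log message (1000 open shards, 2 new shards, limit 1000); it is not how Elasticsearch implements the check, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# cluster_shard_limit DATA_NODES MAX_SHARDS_PER_NODE: cluster-wide shard limit.
cluster_shard_limit() {
  echo $(( $1 * $2 ))
}

# would_exceed OPEN_SHARDS NEW_SHARDS LIMIT: succeeds when the request would be rejected.
would_exceed() {
  [ $(( $1 + $2 )) -gt "$3" ]
}

limit=$(cluster_shard_limit 1 1000)   # one data node, default setting
if would_exceed 1000 2 "$limit"; then
  echo "rejected: 1000 + 2 > $limit"
fi
```

With three data nodes the same call, cluster_shard_limit 3 1000, gives the 3000 mentioned above.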
Part 3 Filebeat
Problem 3.1 Filebeat does not read data from the log file
2019-03-23T19:24:41.772+0800 INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s
{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":30,"time":{"ms":2}},"total":{"ticks":80,"time":{"ms":4},"value":80},"user":{"ticks":50,"time":{"ms":2}}},"handles":{"limit":{"hard":1000000,"soft":1000000},"open":6},"info":{"ephemeral_id":"a4c61321-ad02-2c64-9624-49fe4356a4e9","uptime":{"ms":210031}},"memstats":{"gc_next":7265376,"memory_alloc":4652416,"memory_total":12084992},"runtime":{"goroutines":16}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":0,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0,"15":0.05,"5":0.01,"norm":{"1":0,"15":0.0125,"5":0.0025}}}}}}
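The metrics show "harvester":{"open_files":0,"running":0}, meaning Filebeat is not harvesting any file. The fix is to adjust the configuration parameters in filebeat.yml.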
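The text does not say which parameter was changed. One frequent cause, offered here only as an assumption, is that the log input in the stock Filebeat 7.x filebeat.yml ships with enabled: false, or that paths does not match the MySQL log files. The sketch below prints the input-related lines so they can be checked at a glance; show_input_settings is a hypothetical helper.

```shell
#!/usr/bin/env bash
# show_input_settings FILE: print (with line numbers) the lines of a filebeat.yml
# that declare an input type, its enabled flag, and its paths.
show_input_settings() {
  grep -nE '^[[:space:]]*(-[[:space:]]*type:|enabled:|paths:|-[[:space:]]*/)' "$1"
}

# Example:
#   show_input_settings /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml
```

If the output contains enabled: false under the log input, or the listed paths do not point at the slow query and error logs, Filebeat starts no harvester.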
Problem 3.2 Multiple service processes
2019-03-27T20:13:22.985+0800 ERROR logstash/async.go:256 Failed to publish events caused by: write tcp [::1]:48338->[::1]:5044: write: connection reset by peer
2019-03-27T20:13:23.985+0800 INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":130,"time":{"ms":11}},"total":{"ticks":280,"time":{"ms":20},"value":280},"user":{"ticks":150,"time":{"ms":9}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":7},"info":{"ephemeral_id":"a02ed909-a7a0-49ee-aff9-5fdab26ecf70","uptime":{"ms":150065}},"memstats":{"gc_next":10532480,"memory_alloc":7439504,"memory_total":19313416,"rss":806912},"runtime":{"goroutines":27}},"filebeat":{"events":{"active":1,"added":1},"harvester":{"open_files":1,"running":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"batches":1,"failed":1,"total":1},"write":{"errors":1}},"pipeline":{"clients":1,"events":{"active":1,"published":1,"total":1}}},"registrar":{"states":{"current":1}},"system":{"load":{"1":0.05,"15":0.11,"5":0.06,"norm":{"1":0.0063,"15":0.0138,"5":0.0075}}}}}}
2019-03-27T20:13:24.575+0800 ERROR pipeline/output.go:121 Failed to publish events: write tcp [::1]:48338->[::1]:5044: write: connection reset by peer
The cause was that several Logstash processes were running at the same time. Stop them all and start a single instance again.
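One way to spot duplicate Logstash processes is to filter the process list, sketched below. logstash_pids is a hypothetical helper; it matches any command line containing "logstash", so it can also match unrelated commands such as an editor opened on logstash.conf.

```shell
#!/usr/bin/env bash
# logstash_pids: read `ps -eo pid,args` output on stdin and print the PID of every
# process whose command line contains "logstash". The [l] bracket keeps this awk
# process itself from matching when it shows up in the ps listing.
logstash_pids() {
  awk 'NR > 1 && /[l]ogstash/ { print $1 }'
}

# Example:
#   ps -eo pid,args | logstash_pids
```

More than one PID in the output suggests that an old instance is still running and should be stopped before the service is restarted.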
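Problem 3.3 Managing Filebeat as a service
Directory that holds the filebeat service file:
/etc/systemd/system
Edit the filebeat.service file:
[Unit]
Description=filebeat.service
[Service]
User=root
ExecStart=/data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat -e -c /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml
[Install]
WantedBy=multi-user.target
Commands for managing the service:
systemctl daemon-reload                 # reload unit files after creating or editing filebeat.service
systemctl start filebeat                # start the filebeat service
systemctl enable filebeat               # enable start on boot
systemctl disable filebeat              # disable start on boot
systemctl status filebeat               # show the current status of the service
systemctl restart filebeat              # restart the service
systemctl list-units --type=service     # list all loaded service units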
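Problem 3.4 Filebeat service fails to start
Note the error:
Exiting: error loading config file: yaml: line 29: did not find expected key
The main cause: the formatting of filebeat.yml has been broken. Pay particular attention to the places that were modified or added, compare them with the surrounding lines, and check whether the indentation or structure has changed.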
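YAML does not allow tab characters for indentation, and a pasted tab is one common way to break the file. As a hedged sketch, the helper below reports the lines whose indentation contains a tab; yaml_tab_lines is a hypothetical name, and it does not catch lines that are mis-indented with spaces.

```shell
#!/usr/bin/env bash
# yaml_tab_lines FILE: print the line numbers whose leading whitespace contains a tab.
yaml_tab_lines() {
  awk '/^ *\t/ { print NR }' "$1"
}

# Example:
#   yaml_tab_lines /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml
```

Filebeat can also validate the whole file itself with filebeat test config -c /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml, which reports the same kind of parse error without starting the service.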
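Problem 3.5 Linux version too old to manage the filebeat service with systemctl
In this case the service can be managed with the service command instead: create a filebeat.service file in the init.d directory. The main script is as follows: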
#!/bin/bash

agent="/data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat"
args="-e -c /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml"

start() {
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "Starting filebeat: "
        nohup $agent $args >/dev/null 2>&1 &
        if [ $? == '0' ];then
            echo "start filebeat ok"
        else
            echo "start filebeat failed"
        fi
    else
        echo "filebeat is still running!"
        exit
    fi
}

stop() {
    echo -n $"Stopping filebeat: "
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is not running"
    else
        kill $pid
        echo "stop filebeat ok"
    fi
}

restart() {
    stop
    start
}

status(){
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is not running"
    else
        echo "filebeat is running"
    fi
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        exit 1
esac
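Notes
1. Make the file executable
chmod 755 filebeat.service
2. Enable start on boot
chkconfig --add filebeat.service
Adding the service above to the boot sequence reports an error.
Solution: add the two comment lines "# chkconfig: 2345 10 80" and "# description: ..." at the top of the service file, right after the shebang line.
The revised script is as follows: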
#!/bin/bash
# chkconfig: 2345 10 80
# description: filebeat is a tool for collecting log data

agent="/data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat"
args="-e -c /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml"

start() {
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "Starting filebeat: "
        nohup $agent $args >/dev/null 2>&1 &
        if [ $? == '0' ];then
            echo "start filebeat ok"
        else
            echo "start filebeat failed"
        fi
    else
        echo "filebeat is still running!"
        exit
    fi
}

stop() {
    echo -n $"Stopping filebeat: "
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is not running"
    else
        kill $pid
        echo "stop filebeat ok"
    fi
}

restart() {
    stop
    start
}

status(){
    pid=`ps -ef |grep /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat |grep -v grep |awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is not running"
    else
        echo "filebeat is running"
    fi
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        exit 1
esac
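Part 4 Logstash
Problem 4.1 Running Logstash as a service
The most common way to run Logstash is from the command line, ./bin/logstash -f logstash.conf, stopped with Ctrl+C. This is convenient to run but hard to manage, and if the server reboots the maintenance cost is higher. In production it is recommended to run Logstash as a service, which also lets systemctl start it automatically at boot.
(1) Edit startup.options in the config directory under the installation directory
Main items to change:
1. The service runs as user and group logstash by default; this can be changed to root;
2. Set LS_HOME to the Logstash installation directory, for example /data/logstash/logstash-7.6.0;
3. Set LS_SETTINGS_DIR to the directory that contains logstash.yml, for example /data/logstash/logstash-7.6.0/config;
4. Add the pipeline configuration file to LS_OPTS with the -f option, for example LS_OPTS="--path.settings ${LS_SETTINGS_DIR} -f /data/logstash/logstash-7.6.0/config/logstash.conf"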
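Put together, the four changes might look like the excerpt below. The paths are the example values from the list above; the variable names LS_USER and LS_GROUP are not given in the text and are taken from the stock startup.options file, so treat them as an assumption to check against your copy.

```shell
# Excerpt of config/startup.options after editing (a sketch, not the full file).
LS_HOME=/data/logstash/logstash-7.6.0
LS_SETTINGS_DIR=/data/logstash/logstash-7.6.0/config
# -f names the pipeline configuration the service should load.
LS_OPTS="--path.settings ${LS_SETTINGS_DIR} -f /data/logstash/logstash-7.6.0/config/logstash.conf"
# Default is logstash for both; changed to root as described in item 1.
LS_USER=root
LS_GROUP=root
```

Because LS_OPTS references ${LS_SETTINGS_DIR}, the assignment of LS_SETTINGS_DIR has to come before it in the file.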
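(2) Create the service by running the Logstash install command as root
Command that creates the service:
[installation directory]/bin/system-install
After the command runs, a logstash.service file is generated in /etc/systemd/system/.
(3) Managing the logstash service
Enable start on boot: systemctl enable logstash
Start the service: systemctl start logstash
Stop the service: systemctl stop logstash
Restart the service: systemctl restart logstash
Check the service status: systemctl status logstash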
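Problem 4.2 A JDK must be installed before installing the Logstash service
Installing the service fails with an error if no JDK is present.
Check the Java version (for example with java -version) to verify whether a JDK is installed.
In our case the output showed that none was installed, so we downloaded (or uploaded) the installation package to the server and installed it.
The install command:
yum localinstall jdk-8u211-linux-x64.rpm
After the installation succeeds, run the version check again to verify.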
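Problem 4.3 Linux version too old: the installed Logstash service does not work
Symptom
The service cannot be managed with systemctl. Checking the Linux release showed CentOS 6.5.
Cause: CentOS 6.5 does not support managing services with systemctl.
Solution and verification
CentOS 6.5 uses Upstart, so manage the service with initctl instead.
Related commands
1. Start: initctl start logstash
2. Check status: initctl status logstash
Notes:
The command that generates the service still has to be run:
./system-install
Otherwise the following error is reported:
initctl: Unknown job: logstash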
Problem 4.4 Index names defined in the configuration file must be lowercase
"Invalid index name [mysql-error-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-error-Test-2019.05.13"}}}}
May 13 13:36:33 hzvm1996 logstash[123194]: [2019-05-13T13:36:33,907][ERROR][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"mysql-slow-Test-2020.05.13", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1f0aedbc>], :response=>{"index"=>{"_index"=>"mysql-slow-Test-2019.05.13", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"invalid_index_name_exception", "reason"=>"Invalid index name [mysql-slow-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-slow-Test-2019.05.13"}}}}
May 13 13:38:50 hzvm1996 logstash[123194]: [2019-05-13T13:38:50,765][ERROR][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"mysql-error-Test-2020.05.13", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x4bdce1db>], :response=>{"index"=>{"_index"=>"mysql-error-Test-2019.05.13", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"invalid_index_name_exception", "reason"=>"Invalid index name [mysql-error-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-error-Test-2019.05.13"}}}}
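Elasticsearch rejects any index name that contains uppercase letters, which is what the invalid_index_name_exception entries in this section report. A candidate name can be normalized before it is written into the Logstash output configuration; to_index_name below is a hypothetical helper used only to illustrate this.

```shell
#!/usr/bin/env bash
# to_index_name NAME: lowercase a candidate index name.
to_index_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

to_index_name "mysql-error-Test-2019.05.13"   # prints mysql-error-test-2019.05.13
```

When part of the index name comes from an event field rather than a literal, the lowercase option of the Logstash mutate filter is one way to normalize that field before the output stage.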
Part 5 Kibana
Problem 5.1 Enabling password authentication
[root@testkibaba bin]# ./kibana-plugin install x-pack
Plugin installation was unsuccessful due to error "Kibana now contains X-Pack by default, there is no longer any need to install it as it is already present.
Note: recent versions of Elasticsearch and Kibana ship with X-Pack built in, so it does not need to be installed explicitly. Older versions still require a separate installation.
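Problem 5.2 Error when starting the application
[root@testkibana bin]# ./kibana
Error:
Kibana should not be run as root. Use --allow-root to continue.
Add a dedicated account:
useradd qqweixinkibaba                                              # add the account
chown -R qqweixinkibaba:hzdbakibaba kibana-7.4.2-linux-x86_64      # give the new account ownership of the Kibana directory (the group must already exist)
su qqweixinkibaba                                                   # switch to the account, then start Kibana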
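Problem 5.3 Error when logging in to Kibana
{"statusCode":403,"error":"Forbidden","message":"Forbidden"}
Cause: the login was attempted with the kibana account. That built-in account is meant for Kibana's own connection to Elasticsearch, not for signing in to the UI. Logging in as the elastic user works.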
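Problem 5.4 Implementing multi-tenancy
A company has several business lines and possibly several development teams. How can the collected data be opened only to the team it belongs to, so that each team sees only its own data? One approach is to build several ELK stacks, one per business line, but that wastes resources and increases the operations workload. The other approach is multi-tenancy.
When implementing it, note the following:
Under the elastic account, switch to the target space first, then set up the index pattern.
Create the role first (making sure it is associated with the space), and create the user last.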
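The role step can also be scripted. The sketch below only builds a request body in the shape accepted by the Kibana 7.x role API (PUT /api/security/role/NAME): read access to one index pattern plus read access to one space. The role name, index pattern, space id, Kibana URL, and the role_body helper are all assumptions for illustration; the original setup was done through the Kibana UI.

```shell
#!/usr/bin/env bash
# role_body INDEX_PATTERN SPACE_ID: print a JSON role definition that grants
# read on the index pattern and read-only Kibana access in the given space.
role_body() {
  index_pattern="$1"; space_id="$2"
  cat <<EOF
{
  "elasticsearch": {
    "indices": [
      { "names": ["${index_pattern}"], "privileges": ["read"] }
    ]
  },
  "kibana": [
    { "base": ["read"], "spaces": ["${space_id}"] }
  ]
}
EOF
}

# Example call (hypothetical names; run as the elastic user):
#   curl -u elastic -X PUT "http://localhost:5601/api/security/role/team_a_reader" \
#        -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
#        -d "$(role_body 'mysql-slow-teama-*' 'team-a')"
```

A user created afterwards and assigned only this role would see the team-a space and the matching indices, which follows the order given above: index pattern in the space, then role, then user.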