Syncing Data to ES with the Open-Source Tool go-mysql-elasticsearch
阿新 • Published: 2018-11-16
go-mysql-elasticsearch is an open-source tool for syncing MySQL data to an ES cluster.
Project GitHub address: https://github.com/siddontang/go-mysql-elasticsearch
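go-mysql-elasticsearch works in two phases. The first time the program starts, it uses the mysqldump tool to take a full copy of the source MySQL database and writes that data to ES through an Elasticsearch client. It then runs its own MySQL client, which connects to the source MySQL as a slave; the source, acting as master, sends every data change to the slave as binlog events. By parsing those events the tool obtains the changed rows and writes them to ES.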
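The two phases described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the tool's actual code: the "index" is a plain dict, the names `doc_id`, `full_sync` and `apply_event` are hypothetical, and joining primary-key values with ":" is an illustrative choice.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def doc_id(row: Row, pk_columns: List[str]) -> str:
    """Derive a document ID from the row's primary-key columns.

    Assumption: composite keys are joined with ":" purely for illustration.
    """
    return ":".join(str(row[col]) for col in pk_columns)


def full_sync(index: Dict[str, Row], rows: Iterable[Row], pk_columns: List[str]) -> None:
    """Phase 1: load a snapshot of the table (the role mysqldump plays)."""
    for row in rows:
        index[doc_id(row, pk_columns)] = dict(row)


def apply_event(index: Dict[str, Row], action: str, row: Row, pk_columns: List[str]) -> None:
    """Phase 2: apply one row change taken from a binlog event."""
    key = doc_id(row, pk_columns)
    if action in ("insert", "update"):
        # The sketch assumes the event carries the full row, so it overwrites.
        index[key] = dict(row)
    elif action == "delete":
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action}")


# Hypothetical data: one snapshot row, followed by an update from the binlog.
index: Dict[str, Row] = {}
full_sync(index, [{"id": 1, "name": "A"}], ["id"])
apply_event(index, "update", {"id": 1, "name": "B"}, ["id"])
```

Because every change is addressed by an ID derived from the primary key, a row without a primary key gives the update and delete steps nothing to look up.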
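The tool also keeps operation statistics: every insert, update, or delete increments the counter for that operation. At startup the program starts an HTTP service, and the counts can be read by calling its HTTP interface.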
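Reading those counters could look roughly like the sketch below. The post does not give the endpoint path or the response format, so both are assumptions here: the code guesses a `/stat` path and plain-text `key:value` lines, and the counter names in the sample call are hypothetical. The address matches the stat_addr value from the configuration file shown later.

```python
from typing import Dict, Union
from urllib.request import urlopen


def parse_stat(text: str) -> Dict[str, Union[int, str]]:
    """Parse 'key:value' lines; purely numeric values become ints."""
    stats: Dict[str, Union[int, str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key:value pairs
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def fetch_stat(stat_addr: str = "127.0.0.1:12800", path: str = "/stat") -> Dict[str, Union[int, str]]:
    """Fetch and parse the statistics page (the path is an assumption)."""
    with urlopen(f"http://{stat_addr}{path}", timeout=5) as resp:
        return parse_stat(resp.read().decode("utf-8"))


# Hypothetical response text, used only to exercise the parser.
sample = parse_stat("insert_num:3\nupdate_num:1\ndelete_num:0\n")
```

Calling `fetch_stat()` only works while the program is running and listening on the configured address.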
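Usage notes:
1. The MySQL binlog must be in ROW format.
2. Every table to be synced must have a primary key; tables without one are ignored. Without a primary key, UPDATE and DELETE operations cannot be synced because the corresponding document cannot be located in ES.
3. Changing the table structure while the program is running is not supported.
4. The account used to connect to MySQL must be granted the RELOAD, REPLICATION SLAVE, and SUPER privileges:
GRANT REPLICATION SLAVE ON *.* TO 'elastic'@'172.16.32.44';
GRANT RELOAD ON *.* TO 'elastic'@'172.16.32.44';
UPDATE mysql.user SET Super_Priv='Y' WHERE user='elastic' AND host='172.16.32.44';
Note that a direct UPDATE on mysql.user only takes effect after FLUSH PRIVILEGES or a server restart.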
Usage
git clone https://github.com/siddontang/go-mysql-elasticsearch
cd go-mysql-elasticsearch/src/github.com/siddontang/go-mysql-elasticsearch
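Edit the configuration file with vi etc/river.toml so that the webservice.building table in the database at 172.16.0.101:3306 is synced to the building index of the ES cluster at 172.16.32.64:9200 (see the project documentation for a more detailed description of the configuration file):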
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "172.16.0.101:3306"
my_user = "bellen"
my_pass = "Elastic_123"
my_charset = "utf8"
# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "172.16.32.64:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""
# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing.
# TODO: support other storage, like etcd.
data_dir = "./var"
# Inner Http status address
stat_addr = "127.0.0.1:12800"
# pseudo server id like a slave
server_id = 1001
# mysql or mariadb
flavor = "mariadb"
# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"
# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false
# minimal items to be inserted in one bulk
bulk_size = 128
# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"
# Ignore table without primary key
skip_no_pk_table = false
# MySQL data source
[[source]]
schema = "webservice"
tables = ["building"]
[[rule]]
schema = "webservice"
table = "building"
index = "building"
type = "buildingtype"
Create the building index in the ES cluster beforehand. The tool does not use ES's auto create index feature, so it reports an error if the index does not exist.
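One way to create the index is a plain HTTP PUT against the cluster. The sketch below uses only the Python standard library; the helper names are hypothetical, the optional request body is left to the reader, and it assumes the cluster is reachable over plain HTTP without authentication, as in the configuration above.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def build_create_index_request(es_addr: str, index: str, body: Optional[dict] = None) -> Request:
    """Build a PUT /<index> request; an optional body can carry settings or mappings."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        f"http://{es_addr}/{index}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(es_addr: str, index: str, body: Optional[dict] = None) -> dict:
    """Send the request and return the decoded JSON response from ES."""
    with urlopen(build_create_index_request(es_addr, index, body), timeout=10) as resp:
        return json.load(resp)


# Address and index name taken from the configuration in this post.
request = build_create_index_request("172.16.32.64:9200", "building")
```

`create_index("172.16.32.64:9200", "building")` sends the request; `urlopen` raises `urllib.error.HTTPError` if ES rejects it, for example when the index already exists.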
Run the command: ./bin/go-mysql-elasticsearch -config=./etc/river.toml