
Syncing Data to ES with the Open-Source Tool go-mysql-elasticsearch


go-mysql-elasticsearch is an open-source tool for syncing MySQL data to an ES cluster.
Project GitHub address: https://github.com/siddontang/go-mysql-elasticsearch

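go-mysql-elasticsearch works as follows. When the program starts for the first time, it uses the mysqldump tool to perform a one-time full sync of the source MySQL database, writing the data to ES through an Elasticsearch client. It then acts as a MySQL client that connects to the source MySQL as a slave. The source MySQL, as master, sends every data update to the slave as binlog events; by parsing those binlog events the tool obtains the changed data and writes it to ES.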
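The step from a parsed binlog event to an ES write can be sketched as follows. This is a minimal Python illustration, not the tool's actual Go code: the `RowEvent` type, the `to_bulk_lines` function, and the sample rows are hypothetical, the primary key column is assumed to be `id`, and the `_type` field that older ES versions expect is omitted for brevity. It shows how insert, update, and delete row events could be turned into the lines of an Elasticsearch bulk request body.

```python
import json
from dataclasses import dataclass


@dataclass
class RowEvent:
    """Hypothetical, simplified stand-in for a parsed row-based binlog event."""
    action: str  # "insert", "update" or "delete"
    table: str
    row: dict    # column values after the change (before it, for a delete)


def to_bulk_lines(event: RowEvent, index: str, pk: str = "id") -> list:
    """Translate one row event into NDJSON lines for the ES bulk API."""
    meta = {"_index": index, "_id": str(event.row[pk])}
    if event.action == "insert":
        # An "index" action is followed by the document source.
        return [json.dumps({"index": meta}), json.dumps(event.row)]
    if event.action == "update":
        # An "update" action is followed by a partial document.
        return [json.dumps({"update": meta}), json.dumps({"doc": event.row})]
    if event.action == "delete":
        # A "delete" action has no second line.
        return [json.dumps({"delete": meta})]
    raise ValueError(f"unsupported action: {event.action}")


# Illustrative events for the building table.
events = [
    RowEvent("insert", "building", {"id": 1, "name": "Tower A"}),
    RowEvent("update", "building", {"id": 1, "name": "Tower B"}),
    RowEvent("delete", "building", {"id": 1, "name": "Tower B"}),
]

# The bulk API expects newline-delimited JSON terminated by a newline.
body = "\n".join(line for e in events for line in to_bulk_lines(e, "building")) + "\n"
print(body)
```

Batching several events into one body mirrors the `bulk_size` and `flush_bulk_time` settings in the configuration file, which control how many pending items are sent together.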
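The tool also keeps operation statistics: each insert, update, or delete increments the counter for that operation. On startup the program starts an HTTP service, and calling its HTTP interface shows how many inserts, updates, and deletes have been performed.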
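A per-operation counter exposed over HTTP might look roughly like the sketch below. The `SyncStats` class, the `make_handler` helper, and the plain-text output format are all hypothetical; the tool's real endpoint and response format are not described here, so treat this only as an illustration of the idea.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler


class SyncStats:
    """Hypothetical thread-safe tally of synced operations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, action: str) -> None:
        """Add 1 to the counter for one insert, update or delete."""
        with self._lock:
            self._counts[action] += 1

    def render(self) -> str:
        """Return the counters as plain text, one operation per line."""
        with self._lock:
            return "\n".join(
                f"{name}: {self._counts[name]}"
                for name in ("insert", "update", "delete")
            )


def make_handler(stats: SyncStats):
    """Build a request handler class that reports the given counters."""

    class StatHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            payload = stats.render().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return StatHandler


stats = SyncStats()
for action in ("insert", "insert", "delete"):
    stats.record(action)
print(stats.render())
```

To serve the counters, the handler class could be passed to `http.server.HTTPServer` bound to the address given as `stat_addr` in the configuration file (127.0.0.1:12800), followed by a call to `serve_forever()`.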
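Usage notes:
    1. The MySQL binlog must be in ROW format.
    2. Every MySQL table to be synced must have a primary key; otherwise it is ignored. Without a primary key, UPDATE and DELETE operations cannot locate the corresponding document in ES, so they cannot be synced.
    3. Changing the table structure while the program is running is not supported.
    4. The account used to connect to MySQL must be granted the RELOAD, REPLICATION SLAVE, and SUPER privileges:
       GRANT REPLICATION SLAVE ON *.* TO 'elastic'@'172.16.32.44';
       GRANT RELOAD ON *.* TO 'elastic'@'172.16.32.44';
       UPDATE mysql.user SET Super_Priv='Y' WHERE user='elastic' AND host='172.16.32.44';
       -- reload the grant tables so the direct UPDATE above takes effect
       FLUSH PRIVILEGES;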
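The primary-key requirement in note 2 can be illustrated with a short sketch. The `document_id` function and the sample rows below are hypothetical, and joining composite key values with ":" is an assumption made for illustration. The point is that the document `_id` has to be derived from the primary key so that a later UPDATE or DELETE addresses the same document.

```python
from typing import Optional, Sequence


def document_id(row: dict, pk_columns: Sequence[str]) -> Optional[str]:
    """Build a document _id from the row's primary-key values.

    Returns None for a table without a primary key: there is no stable
    identifier, so updates and deletes could not find the document.
    """
    if not pk_columns:
        return None
    return ":".join(str(row[col]) for col in pk_columns)


row = {"id": 42, "name": "Tower A"}
print(document_id(row, ["id"]))                              # 42
print(document_id({"city": "SZ", "no": 7}, ["city", "no"]))  # SZ:7
print(document_id(row, []))                                  # None
```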

Usage

git clone https://github.com/siddontang/go-mysql-elasticsearch
cd go-mysql-elasticsearch/src/github.com/siddontang/go-mysql-elasticsearch
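vi etc/river.toml and edit the configuration file to sync the webservice.building table from the database at 172.16.0.101:3306 to the building index of the ES cluster at 172.16.32.64:9200 (see the project documentation for a more detailed description of the configuration file):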
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "172.16.0.101:3306"
my_user = "bellen"
my_pass = "Elastic_123"
my_charset = "utf8"

# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "172.16.32.64:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""

# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing.
# TODO: support other storage, like etcd.
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mariadb"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"

# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false

# minimal items to be inserted in one bulk
bulk_size = 128

# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"

# Ignore table without primary key
skip_no_pk_table = false

# MySQL data source
[[source]]
schema = "webservice"
tables = ["building"]

[[rule]]
schema = "webservice"
table = "building"
index = "building"
type = "buildingtype"
Create the building index in the ES cluster. The tool does not use ES's auto create index feature, so it reports an error if the index does not exist.
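One way to create the index ahead of time is a PUT request to the cluster, sketched below with Python's standard library. The `build_create_index_request` helper is hypothetical, and the shard and replica counts are illustrative values rather than settings taken from this setup. The request is only built and printed here; the commented lines at the end show how it could be sent to a reachable cluster.

```python
import json
import urllib.request


def build_create_index_request(es_addr: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an ES index."""
    # Illustrative settings only; adjust them for the actual cluster.
    settings = {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
    return urllib.request.Request(
        url=f"http://{es_addr}/{index}",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("172.16.32.64:9200", "building")
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable cluster):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```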

Run the command: ./bin/go-mysql-elasticsearch -config=./etc/river.toml