Real-Time E-commerce Data Warehouse (10): Data Collection (9): Database Data Collection (4): Getting Started with and Installing Maxwell
1 Maxwell
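Maxwell is a real-time MySQL change-capture tool written in Java and open-sourced by the US company Zendesk. Like Canal, it works by reading the MySQL binlog.
1.1 Tool comparison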
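1. Maxwell has no server + client mode like Canal. It runs as a single server that sends data to a message queue or to Redis.
2. Maxwell has one standout feature. Canal can only capture new changes and cannot handle historical data that already exists, whereas Maxwell's bootstrap feature can export the complete historical data for initialization, which is very handy.
3. Maxwell does not directly support HA, but it does support resuming from a checkpoint: once an error has been fixed and Maxwell is restarted, it continues reading from the position where it left off.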
4. Maxwell supports only JSON as its output format.
5. Maxwell is more lightweight than Canal.
1.2 Installing Maxwell
Extract maxwell-1.25.0.tar.gz into a directory of your choice.
1.3 Preparation before use
Create a database named maxwell in MySQL to store Maxwell's metadata.
CREATE DATABASE maxwell ;
Then create an account that is allowed to operate on this database:
GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY '123123';
Grant this account the privileges it needs to monitor the other databases:
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'maxwell'@'%';
1.4 Using Maxwell to monitor and capture MySQL data
Create a maxwell.properties file in any location:
producer=kafka
kafka.bootstrap.servers=hadoop1:9092,hadoop2:9092,hadoop3:9092
kafka_topic=ODS_DB_GMALL2020_M
host=hadoop2
user=maxwell
password=123123
client_id=maxwell_1
Start the program:
/ext/maxwell-1.25.0/bin/maxwell --config /xxx/xxxx/maxwell.properties >/dev/null 2>&1 &
1.5 Modify or insert MySQL data and consume from Kafka to observe the output
/ext/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --topic ODS_DB_GMALL2020_M --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --partitions 12 --replication-factor 1
Run a test statement:
INSERT INTO z_user_info VALUES(30,'zhang3','13810001010'),(31,'li4','1389999999');
Comparison of the output:
canal:
{"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"},{"id":"31","user_name":"li4","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385314000,"id":2,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385314116,"type":"INSERT"}
maxwell:
{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"xoffset":0,"data":{"id":30,"user_name":"zhang3","tel":"13810001010"}}
{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"commit":true,"data":{"id":31,"user_name":"li4","tel":"1389999999"}}
Run an update:
UPDATE z_user_info SET user_name='wang55' WHERE id IN(30,31)
canal:
{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385508000,"id":3,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":[{"user_name":"zhang3"},{"user_name":"li4"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385508676,"type":"UPDATE"}
maxwell:
{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}
{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}
Run a delete:
DELETE FROM z_user_info WHERE id IN(30,31)
canal:
{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385644000,"id":4,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385644829,"type":"DELETE"}
maxwell:
{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"}}
{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"}}
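To make the Maxwell output above more concrete, here is a minimal Python sketch of how a downstream consumer might replay these row events onto an in-memory copy of the table. The function name apply_event and the choice of `id` as the key are assumptions made for illustration. The events are abbreviated copies of the samples above and are hard-coded here rather than read from Kafka.

```python
import json

def apply_event(table, event):
    """Apply one Maxwell row event to `table`, a dict keyed by primary key."""
    row = event["data"]
    key = row["id"]  # assumption: the primary key column is `id`, as in z_user_info
    kind = event["type"]
    if kind in ("insert", "update"):
        table[key] = row       # in the samples, `data` is the full row after the change
    elif kind == "delete":
        table.pop(key, None)   # in the samples, `data` is the row that was removed
    return table

# Abbreviated Maxwell events taken from the samples above (one JSON object per row).
MAXWELL_LINES = [
    '{"type":"insert","xid":82982,"data":{"id":30,"user_name":"zhang3","tel":"13810001010"}}',
    '{"type":"insert","xid":82982,"data":{"id":31,"user_name":"li4","tel":"1389999999"}}',
    '{"type":"update","xid":83206,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}',
    '{"type":"update","xid":83206,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}',
    '{"type":"delete","xid":83367,"data":{"id":30,"user_name":"wang55","tel":"13810001010"}}',
]

table = {}
for line in MAXWELL_LINES:
    apply_event(table, json.loads(line))

print(table)
```

Only the first delete event is replayed here, so row 31 remains in the dictionary with its updated user_name.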
Summary of how the two outputs differ:
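I. Log structure
canal: each SQL statement produces one log record. If the statement affects multiple rows, they are still collected as an array inside that one record (even a single row is wrapped in an array).
maxwell: log records are produced per affected row, so every affected row yields its own record. To find out whether several records were produced by the same SQL statement, compare their xid. The xid is the transaction id, so records with the same xid come from the same transaction, which in the auto-committed tests above means the same SQL statement.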
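The structural difference can be sketched in Python as follows. This is an illustration only: the function names group_by_xid and split_canal_record are made up for this example, and the sample records are trimmed copies of the update output shown earlier, keeping just the fields the code reads.

```python
import json
from collections import defaultdict

def group_by_xid(maxwell_lines):
    """Group Maxwell row events by xid, keeping arrival order within each group."""
    groups = defaultdict(list)
    for line in maxwell_lines:
        event = json.loads(line)
        groups[event["xid"]].append(event["data"])
    return dict(groups)

def split_canal_record(canal_line):
    """Turn one canal record into one (type, data, old) tuple per affected row."""
    record = json.loads(canal_line)
    rows = record["data"]
    # `old` is null for INSERT and DELETE in the samples, so pad it with None.
    olds = record["old"] or [None] * len(rows)
    return [(record["type"].lower(), row, old) for row, old in zip(rows, olds)]

# Trimmed copies of the update samples above.
maxwell_update = [
    '{"type":"update","xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55"},"old":{"user_name":"zhang3"}}',
    '{"type":"update","xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55"},"old":{"user_name":"li4"}}',
]
canal_update = (
    '{"data":[{"id":"30","user_name":"wang55"},{"id":"31","user_name":"wang55"}],'
    '"old":[{"user_name":"zhang3"},{"user_name":"li4"}],"type":"UPDATE"}'
)

print(group_by_xid(maxwell_update))      # both rows end up under xid 83206
print(split_canal_record(canal_update))  # one tuple per affected row
```

Grouping by xid recovers the per-statement view that canal provides out of the box, while splitting the canal record recovers the per-row view that maxwell emits.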
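II. Numeric types
maxwell: when the source column is numeric, the value keeps its original type. No double quotes are added, so it is not turned into a string.
canal: every value is converted to a string.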
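III. Original field definitions
canal records carry the table structure (mysqlType, sqlType, pkNames) along with the data. maxwell records are more compact.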
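Because canal ships the column definitions with every record, a consumer can use them to undo the string conversion. The sketch below is one possible approach, not part of either tool: cast_canal_row and INTEGER_PREFIXES are hypothetical names, only integer column types are handled, and the sample is a trimmed copy of the canal insert record shown earlier.

```python
import json

# MySQL integer type names; a mysqlType such as "bigint(20)" starts with one of these.
INTEGER_PREFIXES = ("tinyint", "smallint", "mediumint", "int", "bigint")

def cast_canal_row(row, mysql_types):
    """Convert canal's string values back to int where mysqlType says the column is an integer."""
    typed = {}
    for column, value in row.items():
        mysql_type = mysql_types.get(column, "")
        if value is not None and mysql_type.startswith(INTEGER_PREFIXES):
            typed[column] = int(value)
        else:
            typed[column] = value  # varchar and other types are left untouched
    return typed

# Trimmed copy of the canal insert sample above.
canal_insert = json.loads(
    '{"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"}],'
    '"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},'
    '"type":"INSERT"}'
)

row = cast_canal_row(canal_insert["data"][0], canal_insert["mysqlType"])
print(row)
```

After casting, id is an integer again while tel stays a string because its column is varchar(20), which matches the shape of the data field in the corresponding maxwell record.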