Real-Time E-commerce Data Warehouse (10): Data Collection (9), Database Data Collection (4): Maxwell Introduction and Installation

1 Maxwell

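Maxwell is an open-source tool from the US company Zendesk, written in Java, that captures MySQL data changes in real time. Like Canal, it works by reading the MySQL binlog.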

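1.1 Tool comparison

1. Maxwell has no server + client mode like Canal's. It runs as a single server that sends data to a message queue or Redis.

2. Maxwell's standout feature is bootstrap. Canal can only capture new changes and cannot process historical data that already exists, whereas Maxwell's bootstrap function can export the complete historical data for initialization, which is very handy.

3. Maxwell does not directly support HA, but it does support resuming from a checkpoint: once the error is fixed and Maxwell is restarted, it continues reading from the position where it stopped.

4. Maxwell only outputs JSON, whereas Canal, when used in server + client mode, lets you define a custom format.

5. Maxwell is more lightweight than Canal.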

1.2 Installing Maxwell

Extract maxwell-1.25.0.tar.gz into a directory of your choice.

1.3 Preparation before use

Create a database named maxwell in MySQL to store Maxwell's metadata.

CREATE DATABASE maxwell ;

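Then create an account that can operate on this database. (The IDENTIFIED BY clause inside GRANT works on MySQL 5.x. MySQL 8.0 removed it, so there the account has to be created with CREATE USER first.)

GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY '123123';

Grant this account the privileges it needs to monitor the other databases:

GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'maxwell'@'%';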

1.4 Using Maxwell to monitor and capture MySQL data

Create a maxwell.properties file in any location:

producer=kafka
kafka.bootstrap.servers=hadoop1:9092,hadoop2:9092,hadoop3:9092
kafka_topic=ODS_DB_GMALL2020_M

host=hadoop2
user=maxwell
password=123123

client_id=maxwell_1

Start the program:

/ext/maxwell-1.25.0/bin/maxwell --config /xxx/xxxx/maxwell.properties >/dev/null 2>&1 &

1.5 Modify or insert MySQL data and consume from Kafka to observe

Create the Kafka topic:

/ext/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --topic ODS_DB_GMALL2020_M --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --partitions 12 --replication-factor 1

Run a test statement:

INSERT INTO z_user_info VALUES(30,'zhang3','13810001010'),(31,'li4','1389999999');

Compare the output (the first message below is from Canal, the two after it are from Maxwell):

{"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"},{"id":"31","user_name":"li4","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385314000,"id":2,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385314116,"type":"INSERT"}

{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"xoffset":0,"data":{"id":30,"user_name":"zhang3","tel":"13810001010"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"commit":true,"data":{"id":31,"user_name":"li4","tel":"1389999999"}}

Run an update:

UPDATE z_user_info SET user_name='wang55' WHERE id IN(30,31);

Output (the first message is from Canal, the two after it are from Maxwell):

{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385508000,"id":3,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":[{"user_name":"zhang3"},{"user_name":"li4"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385508676,"type":"UPDATE"}

{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}

Run a delete:

DELETE FROM z_user_info WHERE id IN(30,31);

Output (the first message is from Canal, the two after it are from Maxwell):

{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385644000,"id":4,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385644829,"type":"DELETE"}

{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"}}

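Summary of the data characteristics:

Log structure

Canal: each SQL statement produces one message. If the statement affects multiple rows, they are collected into an array within that single message (the data field is an array even when only one row is affected).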
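Because a Canal message bundles all affected rows, a consumer that wants one event per row has to split it. A minimal sketch of that step, using the UPDATE message above with the schema fields trimmed for brevity; the helper name split_canal_rows and the shape of the per-row event are assumptions made for illustration, not part of Canal:

```python
import json

# Canal message for the UPDATE above (mysqlType/sqlType/etc. trimmed for brevity).
canal_msg = json.loads("""
{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},
         {"id":"31","user_name":"wang55","tel":"1389999999"}],
 "database":"gmall-2020-04","table":"z_user_info","type":"UPDATE",
 "old":[{"user_name":"zhang3"},{"user_name":"li4"}]}
""")


def split_canal_rows(msg):
    """Hypothetical helper: turn one Canal message into one event per row."""
    rows = msg.get("data") or []
    # "old" is null for INSERT/DELETE, so pad it to line up with the rows.
    olds = msg.get("old") or [None] * len(rows)
    events = []
    for row, old in zip(rows, olds):
        event = {
            "database": msg["database"],
            "table": msg["table"],
            "type": msg["type"].lower(),
            "data": row,
        }
        if old is not None:
            event["old"] = old
        events.append(event)
    return events


events = split_canal_rows(canal_msg)
for event in events:
    print(json.dumps(event))
```

The values stay strings here, since the sketch only changes the structure, not the types.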

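Maxwell: messages are produced per affected row, so every changed row yields its own message. To tell whether several messages came from the same operation, compare their xid: messages with the same xid belong to the same transaction (for a single auto-committed statement, that means the same SQL), and the last message of the transaction carries "commit":true.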
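The xid and commit fields can be used to regroup Maxwell's per-row messages on the consumer side. The following is a rough sketch under the assumption that the messages of one transaction are read in order; group_by_xid is a hypothetical helper, and the two input lines are the UPDATE messages shown earlier:

```python
import json
from collections import defaultdict

# The two Maxwell messages produced by the UPDATE above.
lines = [
    '{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}',
    '{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}',
]


def group_by_xid(raw_lines):
    """Hypothetical helper: buffer rows per xid until "commit": true is seen."""
    pending = defaultdict(list)   # xid -> rows seen so far
    completed = []                # (xid, rows) for finished transactions
    for line in raw_lines:
        event = json.loads(line)
        xid = event["xid"]
        pending[xid].append(event)
        if event.get("commit"):
            completed.append((xid, pending.pop(xid)))
    # Whatever is left in pending has not been committed yet.
    return completed, dict(pending)


completed, pending = group_by_xid(lines)
for xid, rows in completed:
    print(xid, [row["data"]["id"] for row in rows])  # prints: 83206 [30, 31]
```

Rows still in pending when the input ends belong to transactions whose commit message has not arrived, so a real consumer would keep them buffered rather than discard them.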

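Numeric types

When the source column is numeric, Maxwell preserves the original type and does not wrap the value in double quotes to turn it into a string.

Canal converts every value to a string.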

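Field definitions of the source table

Canal messages include the table structure (mysqlType, sqlType, pkNames). Maxwell messages are more concise.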
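One practical use of the structure Canal ships with each message is restoring numeric types that were turned into strings. A minimal sketch, assuming only MySQL integer columns need converting; cast_row and the INTEGER_TYPES tuple are illustrative names, and the message is the INSERT example above reduced to one row:

```python
import json

# One row of the Canal INSERT message above, with its mysqlType map.
canal_msg = json.loads(
    '{"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"}],'
    '"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"}}'
)

# Assumption: only these integer types are converted back to int.
INTEGER_TYPES = ("tinyint", "smallint", "mediumint", "int", "bigint")


def cast_row(row, mysql_types):
    """Hypothetical helper: convert integer columns from str to int."""
    typed = {}
    for column, value in row.items():
        # "bigint(20) unsigned" -> "bigint"
        parts = mysql_types.get(column, "").split("(")[0].split()
        base_type = parts[0].lower() if parts else ""
        if value is not None and base_type in INTEGER_TYPES:
            typed[column] = int(value)
        else:
            typed[column] = value
    return typed


typed_rows = [cast_row(row, canal_msg["mysqlType"]) for row in canal_msg["data"]]
print(typed_rows)
```

After the cast, id is an integer as in the Maxwell output, while tel stays a string because its column is varchar(20).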