MongoShake: a data migration and synchronization tool for MongoDB
阿新 · Published: 2021-12-01
##########
$ cat collector.conf
# if you have any problem, please visit https://github.com/alibaba/MongoShake/wiki/FAQ
# for the detailed explanation of each parameter, please visit xxxx

# current configuration version, do not modify.
conf.version = 2

# --------------------------- global configuration ---------------------------
# collector name; the id is used for the pid file and similar output.
id = mongoshake

# high availability option.
# enable master election if set to true. only one mongoshake can become master
# and do the sync, the others wait, and at most one of them becomes master once
# the previous master dies. The master information is stored in the `mongoshake`
# db in the source database by default.
# This option is useless when only one mongoshake is running, but it should be
# enabled when a primary/standby pair of mongoshake processes pulls the same source.
master_quorum = false

# http api interface. Users can use this api to monitor mongoshake:
# `curl 127.0.0.1:9100`.
# We also provide a restful tool named "mongoshake-stat" to
# print ack, lsn, checkpoint and qps information based on this api.
# usage: `./mongoshake-stat --port=9100`
# restful monitoring ports for the full and incremental stages; see the wiki for details.
full_sync.http_port = 9101
incr_sync.http_port = 9100
# profiling port on net/http/pprof, used to inspect the internal go stacks.
system_profile_port = 9200

# global log level: debug, info, warning, error. lower level messages will be filtered.
log.level = info
# log directory. the log and pid files will be stored in this directory.
# if not set, the default is "./logs/".
log.dir =
# log file name.
log.file = collector.log
# log flush switch. if set to false, logs are buffered and may not all be
# printed on exit; if set to true, every log line is flushed immediately, which
# hurts performance badly. set it to true only for debugging.
log.flush = false

# sync mode: all/full/incr. default is incr.
# all means full synchronization + incremental synchronization.
# full means full synchronization only.
# incr means incremental synchronization only.
#sync_mode = incr
sync_mode = all

# connect source mongodb, set username and password if authentication is enabled.
# Please note: the password shouldn't contain '@'; "username:password@" can be
# omitted when authentication is disabled.
# split by comma(,) if using multiple instances of one replica-set:
#   mongodb://username1:password1@primaryA,secondaryB,secondaryC
# split by semicolon(;) if sharding is enabled:
#   mongodb://username1:password1@primaryA,secondaryB,secondaryC;mongodb://username2:password2@primaryX,secondaryY,secondaryZ
mongo_urls = mongodb://igoodful:<password>@<host>:27000,10.10.10.12:27000

# please fill in the source config server url if the source mongodb is sharding.
mongo_cs_url =
# please give one mongos address if using change stream to fetch data in the incremental stage.
mongo_s_url =
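# Note: the next few lines are not part of the stock template; they are a
# hedged sketch of how one might sanity-check the source url above before
# starting the collector. the credentials and first host reuse the sanitized
# placeholders from mongo_urls, so substitute your own values.
#   $ mongo "mongodb://igoodful:<password>@<host>:27000,10.10.10.12:27000" \
#       --eval 'rs.status().members.forEach(function(m) { print(m.name, m.stateStr); })'
# a healthy replica set prints one PRIMARY plus one or more SECONDARY members,
# which matches the secondaryPreferred fetch mode configured below.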
# tunnel pipeline type. now we support rpc, file, kafka, mock, direct.
tunnel = direct

# tunnel target resource url.
# for rpc: the remote receiver socket address.
# for tcp: the remote receiver socket address.
# for file: the file path, for instance "data".
# for kafka: the topic and brokers address separated by comma, for
#   instance topic@brokers1,brokers2; the default topic is "mongoshake".
# for mock: this is useless.
# for direct: the target mongodb address, in the same format as `mongo_urls`.
#   If the target is sharding, this should be the mongos address.
# direct mode writes straight into MongoDB; the other modes serve analysis or
# long-distance transport scenarios. Note that non-direct tunnels must be
# parsed by a receiver; see the FAQ document for details.
#tunnel.address = mongodb://user:password@host:port
tunnel.address = mongodb://igoodful:<password>@<host>:27000

# the message format in the tunnel, only used when the tunnel is kafka or file.
# "raw": batched raw data format which has good performance but is encoded, so
#        users should parse it with the receiver. this is the default.
# "json": one oplog per message in json, easy for users to read directly.
# "bson": one oplog per message in binary bson.
tunnel.message = raw

# connect mode:
# primary: fetch data from the primary.
# secondaryPreferred: fetch data from a secondary if available, otherwise the
#   primary (default and recommended).
# standalone: fetch data from one given node, no matter primary, secondary or
#   hidden. only supported when the tunnel type is direct.
mongo_connect_mode = secondaryPreferred

# filter db or collection namespaces. at most one of these two parameters can be given.
# if filter.namespace.black is not empty, the given namespaces are filtered
# while all others pass.
# if filter.namespace.white is not empty, the given namespaces pass while all
# others are filtered.
# all namespaces pass if neither condition is given.
# db and collection are joined by a dot(.); different namespaces are split by a
# semicolon(;), and each namespace can be a db or a db.collection. regular
# expressions are not supported yet.
# e.g., filterDbName1.filterCollectionName1;filterDbName2
#filter.namespace.black =
filter.namespace.white = mktact.themis_template_field;mktact.themis_template_module;mktact.goods_allow_list

# some databases like "admin", "local", "mongoshake", "config", "system.views"
# are filtered by default; users can re-enable them here for special needs,
# split by semicolon(;), e.g., admin;mongoshake. normally this parameter should
# stay empty.
# pay attention: collections such as "admin.xxx" are not supported, except "system.views".
filter.pass.special.db =

# only transfer oplog commands for syncing, i.e. oplog.op in "i","d","u".
# DDL such as create index, drop database, and transactions in mongodb 4.0 will
# be transferred if this is enabled. enabling is not supported yet when the
# source is sharding, and if the target is sharding the applyOps command
# (including transactions) is not supported yet.
filter.ddl_enable = false

# checkpoint info, used for resuming from a break point.
# checkpoint.storage.url marks the database that stores the checkpoint, e.g.,
# mongodb://127.0.0.1:20070. if not set, the checkpoint is written into the
# source mongodb when the source is a replica-set (db=mongoshake); when the
# source is sharding, it is written into the config-server (db=admin).
checkpoint.storage.url =
# checkpoint db's name.
checkpoint.storage.db = mongoshake
# checkpoint collection's name. if several mongoshake instances pull the same
# source, change this collection name to avoid conflicts.
checkpoint.storage.collection = ckpt_default
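# Note: another hedged sketch, not part of the stock template. with the
# defaults above, the incremental checkpoint lands in the
# mongoshake.ckpt_default collection on the source replica set, so it can be
# inspected with a plain mongo shell query (connection string placeholders as
# in mongo_urls):
#   $ mongo "mongodb://igoodful:<password>@<host>:27000" \
#       --eval 'db.getSiblingDB("mongoshake").ckpt_default.find().forEach(printjson)'
# deleting this document is also how checkpoint.start_position below is forced
# to take effect again, per the FAQ.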
# real checkpoint: the oplog position to start fetching from.
# pay attention: this is UTC time, which is 8 hours behind CST time. this
# variable is only used when no checkpoint exists yet; if a checkpoint already
# exists (at the storage location above) this parameter is ignored, and forcing
# a new start position requires deleting the old checkpoint first (see FAQ).
# if no checkpoint exists and the value is 1970-01-01T00:00:00Z, all oplogs
# currently on the source are fetched.
# if no checkpoint exists and the value is not 1970-01-01T00:00:00Z, mongoshake
# checks whether the oldest oplog on the source is newer than the given time
# and exits with an error if so.
checkpoint.start_position = 1970-01-01T00:00:00Z

# transform a source db or collection namespace into a dest db or collection namespace.
# transform: fromDbName1.fromCollectionName1:toDbName1.toCollectionName1;fromDbName2:toDbName2
# e.g., a.b is renamed to c.d on the target. enable with caution: it costs performance.
#transform.namespace = ucenter.award:award.award;ucenter.award_config:award.award_config;ucenter.award_order:award.award_order;ucenter.address_info:award.address_info
#transform.namespace = ucenter:award
#transform.namespace = ucenter.athena_user_task_v2:athena.athena_user_task_v3
transform.namespace =

# --------------------------- full sync configuration ---------------------------
# the maximum number of collections fetched concurrently, e.g., 6 means shake
# pulls at most 6 collections at the same time.
full_sync.reader.collection_parallel = 16
# the number of document writer threads per collection, e.g., 8 means 8 threads
# write into the same collection concurrently.
full_sync.reader.write_document_parallel = 32
# the batch size written to the target, e.g., 128 means one thread aggregates
# 128 documents and writes them in one round.
full_sync.reader.document_batch_size = 10240
full_sync.reader.read_document_count = 0

# controls whether a collection with the same name already existing in the dest
# mongodb is dropped before full synchronization.
full_sync.collection_exist_no_drop = true

# create indexes after the data sync finishes in the full sync stage: none
# means don't create indexes, foreground creates foreground indexes, background
# creates background indexes.
full_sync.create_index = background

# convert an insert to an update when a duplicate key is found, i.e. when the
# _id already exists in the target.
full_sync.executor.insert_on_dup_update = true
# filter orphan documents when the source is sharding.
full_sync.executor.filter.orphan_document = false
# enable majority write in full sync.
# the performance will degrade if enabled.
full_sync.executor.majority_enable = false
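# Note: one more hedged sketch outside the stock template. after the full stage
# finishes, a cheap sanity check is to compare document counts for the
# whitelisted collections on both ends; mktact.themis_template_field is one of
# the namespaces from filter.namespace.white, and the placeholders are as above.
#   $ mongo "mongodb://igoodful:<password>@<host>:27000" \
#       --eval 'print(db.getSiblingDB("mktact").themis_template_field.count())'
# run the same command against the source and the target (tunnel.address); the
# two numbers should match once full sync is done.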
# --------------------------- incremental sync configuration ---------------------------
# fetch method:
# oplog: fetch the oplog from the source mongodb (default).
# change_stream: use change streams to receive change events from the source
#   mongodb, supported for MongoDB >= 4.0.
incr_sync.mongo_fetch_method = oplog

# global id, used in active-active replication to prevent replication loops.
# this parameter is not supported in the current open-source version and is
# only available on Alibaba Cloud MongoDB; contact Alibaba Cloud after-sales or
# vinllen to enable it. for sharding, split multiple gids by semicolon(;).
incr_sync.oplog.gids =

# distribute data to different workers by hash key to run in parallel.
# [auto] decide by whether the collections have a unique index:
#   use `collection` if there is a unique index, otherwise use `id`.
# [id] shard by ObjectId; handle oplogs in sequence by unique _id.
# [collection] shard by ns; handle oplogs in sequence by unique ns.
# if there is no unique index, `id` gives very high sync performance; otherwise
# choose `collection`.
incr_sync.shard_key = auto

# oplog transmit worker concurrency; increase the worker count if the machine
# is powerful enough.
# if the source is sharding, the worker number must equal the shard number.
incr_sync.worker = 128

# batched oplogs carry a block-level crc32 checksum, and a compressor can be
# used to compress the content of the oplog entries.
# supported compressors are: gzip, zlib, deflate.
# do not enable this option when the tunnel type is "direct"; for non-direct
# tunnels, compression reduces network bandwidth usage.
incr_sync.worker.oplog_compressor = none

# memory queue configuration, please see the FAQ document for more details.
# do not modify these variables if the current performance and resource usage
# already meet your needs.
incr_sync.worker.batch_queue_size = 64
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 256

# --- direct tunnel only begin ---
# if the tunnel type is direct, all the variables below should be set.

# change an update oplog into an insert when the update hits a document that
# does not exist on the target (_id or unique-index).
incr_sync.executor.upsert = false
# change an insert oplog into an update when the insert hits a duplicated key
# on the target (_id or unique-index).
incr_sync.executor.insert_on_dup_update = false
# where to record conflicting documents on write conflicts:
# db. write duplicated logs to mongoshake_conflict.
# sdk. write duplicated logs to the sdk.
incr_sync.conflict_write_to = none

# enable majority write in incremental sync.
# the performance will degrade if enabled.
incr_sync.executor.majority_enable = false
# --- direct tunnel only end ---
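With collector.conf filled in, starting the collector and watching it run is straightforward. The commands below are a hedged sketch: the `collector.linux` binary name comes from the MongoShake release tarball, so adjust the path to your own layout; the monitoring ports are the ones configured above.

$ ./collector.linux -conf=collector.conf &
$ curl 127.0.0.1:9100                 # incremental-stage metrics (incr_sync.http_port)
$ curl 127.0.0.1:9101                 # full-stage metrics (full_sync.http_port)
$ ./mongoshake-stat --port=9100       # ack / lsn / checkpoint / qps summary
$ tail -f logs/collector.log          # log.dir is empty, so logs land in ./logs/

Since sync_mode = all, the collector first copies the three whitelisted mktact collections in full, then switches to tailing the oplog from the stored checkpoint.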
###################