MongoDB Cluster Disaster Recovery: Step-by-Step Plan

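MongoDB replica sets
Advantages and features
Support for large data volumes, high scalability, high performance, a flexible data model, and high availability.
Synchronization mechanism
The purpose of data replication is to maximize data availability and to keep a single point of failure from taking the whole site offline. In a MongoDB replica set only one server accepts writes at any given time. Replication is asynchronous: each secondary pulls the operation log from the primary and replays the recorded operations on itself in exactly the same order (the log does not record queries). This log is the oplog.rs collection in the local database. On 64-bit machines it is fairly large by default, about 5% of free disk space. Its size can be set with a startup parameter, for example --oplogSize 1000, where the unit is MB.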
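To get a feel for what --oplogSize buys, the sketch below estimates how long a secondary could stay offline before the operations it still needs have been overwritten in the oplog. The function name and the write rate are illustrative assumptions, not measurements from this cluster; the real window depends on the actual volume of oplog entries.

```javascript
// Rough estimate of the oplog window: how many hours of writes fit in the oplog.
// oplogSizeMB corresponds to the value passed to --oplogSize (megabytes).
// writeRateMBPerHour is an assumed, hypothetical average volume of oplog entries.
function oplogWindowHours(oplogSizeMB, writeRateMBPerHour) {
  if (!(oplogSizeMB > 0) || !(writeRateMBPerHour > 0)) {
    throw new RangeError("both arguments must be positive numbers");
  }
  return oplogSizeMB / writeRateMBPerHour;
}

// Example: --oplogSize 1000 with an assumed 125 MB of oplog entries per hour.
console.log(oplogWindowHours(1000, 125)); // 8
```

A secondary that is down for longer than this window can no longer catch up from the oplog and needs a full resync, which is one reason to size the oplog generously in a two-site setup.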

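Because unexpected situations can arise in a two-data-center disaster recovery setup, this plan relies on manual intervention for failover and recovery and does not add an arbiter. Data center A is the primary site, with one Primary and two Secondary nodes; data center B is the disaster recovery site, with two Secondary nodes. In the worst case, when the primary site goes down, a node in data center B is brought up as Primary, with priorities deciding which one, so that service can continue. Since B holds only two of the five votes, this promotion does not happen automatically and requires a forced reconfiguration.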
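The reason data center B cannot take over by itself comes down to simple vote arithmetic. The following sketch illustrates it; the function names are made up for this example, and it assumes every member has exactly one vote.

```javascript
// A primary can be elected only while a strict majority of the voting
// members is reachable.
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

function canElectPrimary(reachableVotes, totalVotes) {
  return reachableVotes >= majority(totalVotes);
}

// Layout from this plan: 3 voting members in data center A, 2 in data center B.
console.log(majority(5));           // 3
console.log(canElectPrimary(3, 5)); // true  (B is down, A keeps a primary)
console.log(canElectPrimary(2, 5)); // false (A is down, B cannot elect one)
```

This is why losing B is harmless while losing A leaves the two remaining nodes as secondaries until someone intervenes.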
Environment plan
Data center A     Role                                Data center B     Role
192.168.70.214    Primary                             192.168.71.214    Secondary 3 (replica set node 3)
192.168.70.215    Secondary 1 (replica set node 1)    192.168.71.215    Secondary 4 (replica set node 4)
192.168.70.216    Secondary 2 (replica set node 2)

Architecture diagram
The lower part of the diagram shows the failover when the primary data center loses power and network.

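Installation and configuration
All nodes use the same directory layout to simplify management and maintenance; each node's role is determined from its configuration file.
Create directories
-- Create the software, data, and log directories for MongoDB; in this plan the data is stored in /mgdata
# mkdir -p /mgdb/mongodbtest/replset/data
# mkdir /mgdata
# mkdir /mglog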

Upload the installation package
sftp> cd /mgdb
sftp> put mongodb-linux-x86_64-2.2.3.tgz.tar

Extract the package
# cd /mgdb
# tar -xvf mongodb-linux-x86_64-2.2.3.tgz.tar
# mv mongodb-linux-x86_64-2.2.3 mongodb

Start the service
Run the following on every node
cd /root/mongodb/bin

192.168.70.214
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27011 --oplogSize 2048 --logpath /root/log27011/log27011.log &
./mongo 192.168.70.214:27017

192.168.70.215
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27012 --oplogSize 2048 --logpath /root/log27012/log27012.log &
./mongo 192.168.70.215:27017

192.168.70.216
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27013 --oplogSize 2048 --logpath /root/log27013/log27013.log &
./mongo 192.168.70.216:27017

192.168.71.214
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27017 --oplogSize 2048 --logpath /root/log27017/log27017.log &
./mongo 192.168.71.214:27017

192.168.71.215
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27018 --oplogSize 2048 --logpath /root/log27018/log27018.log &
./mongo 192.168.71.215:27017

On each node, follow the log file, for example with tail -f /root/log27011/log27011.log, to observe how the node is running.
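The five start commands above differ only in their data and log paths, so they can be generated from a small per-node description instead of being typed by hand. This is only a sketch: the function and field names are hypothetical, and it uses just the flags that already appear in the commands above.

```javascript
// Build the mongod start command for one node from a small description.
// Flags used: --replSet, --port, --dbpath, --oplogSize, --logpath
// (all taken from the commands above). Names of fields are hypothetical.
function mongodCommand(node) {
  return [
    "/root/mongodb/bin/mongod",
    "--replSet", node.replSet,
    "--port", String(node.port),
    "--dbpath", node.dbpath,
    "--oplogSize", String(node.oplogSizeMB),
    "--logpath", node.logpath
  ].join(" ");
}

// Example: the node 192.168.70.214 from this plan.
const startCommand = mongodCommand({
  replSet: "repset",
  port: 27017,
  dbpath: "/root/data27011",
  oplogSizeMB: 2048,
  logpath: "/root/log27011/log27011.log"
});
console.log(startCommand);
```

Generating the commands this way keeps the replica set name and oplog size identical on every node, which is what the manual commands rely on.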
Replica set configuration
Log in to any one of the mongod instances, switch to the admin database, define the config object, and assign each member its priority.
# pwd
/root/mongodb/bin
# ./mongo 192.168.70.214:27017
MongoDB shell version: 2.2.3
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
> use admin
switched to db admin
> config = { _id:"repset", members:[
... {_id:0,host:"192.168.70.214:27017",priority:10},
... {_id:1,host:"192.168.70.215:27017",priority:7},
... {_id:2,host:"192.168.70.216:27017",priority:6},
... {_id:3,host:"192.168.71.214:27017",priority:9},
... {_id:4,host:"192.168.71.215:27017",priority:8}]
... }
{
"_id" : "repset",
"members" : [
{
"_id" : 0,
"host" : "192.168.70.214:27017",
"priority" : 10
},
{
"_id" : 1,
"host" : "192.168.70.215:27017",
"priority" : 7
},
{
"_id" : 2,
"host" : "192.168.70.216:27017",
"priority" : 6
},
{
"_id" : 3,
"host" : "192.168.71.214:27017",
"priority" : 9
},
{
"_id" : 4,
"host" : "192.168.71.215:27017",
"priority" : 8
}
]
}
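Writing the config document by hand makes it easy to repeat an _id or a host, and each member of a replica set needs a unique _id. As a hedged alternative, the document can be built from a host list. The helper name below is hypothetical; the hosts and priorities are the ones used in this plan.

```javascript
// Build a replica set configuration document with sequential, unique _id values.
// hosts is a list of { host, priority } entries; duplicates are rejected.
function buildReplSetConfig(setName, hosts) {
  const seen = {};
  const members = hosts.map(function (h, index) {
    if (seen[h.host]) {
      throw new Error("duplicate host: " + h.host);
    }
    seen[h.host] = true;
    return { _id: index, host: h.host, priority: h.priority };
  });
  return { _id: setName, members: members };
}

const generatedConfig = buildReplSetConfig("repset", [
  { host: "192.168.70.214:27017", priority: 10 },
  { host: "192.168.70.215:27017", priority: 7 },
  { host: "192.168.70.216:27017", priority: 6 },
  { host: "192.168.71.214:27017", priority: 9 },
  { host: "192.168.71.215:27017", priority: 8 }
]);

// Print the generated _id values.
console.log(JSON.stringify(generatedConfig.members.map(function (m) { return m._id; }))); // [0,1,2,3,4]
```

The resulting object has the same shape as the config variable in the transcript and would be passed to rs.initiate() in the same way.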
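-- View the configuration (the sample output below appears to come from a set that was already running, hence the PRIMARY prompt, the different _id values, and the high version number)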
repset:PRIMARY> rs.conf()
{
"_id" : "repset",
"version" : 38349,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 9
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 8
},
{
"_id" : 6,
"host" : "192.168.70.214:27017",
"priority" : 10
},
{
"_id" : 7,
"host" : "192.168.70.215:27017",
"priority" : 7
},
{
"_id" : 8,
"host" : "192.168.70.216:27017",
"priority" : 6
}
]
}
Initialize the replica set configuration
> rs.initiate(config);
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
The initial synchronization takes a little while.
Check the status of the cluster nodes
repset:PRIMARY> rs.status()
{
"set" : "repset",
"date" : ISODate("2018-11-09T07:55:04Z"),
"myState" : 1,
"members" : [
{
"_id" : 4,
"name" : "192.168.71.214:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1104,
"optime" : Timestamp(1541749003000, 1),
"optimeDate" : ISODate("2018-11-09T07:36:43Z"),
"lastHeartbeat" : ISODate("2018-11-09T07:55:03Z"),
"pingMs" : 0
},
{
"_id" : 5,
"name" : "192.168.71.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1104,
"optime" : Timestamp(1541749003000, 1),
"optimeDate" : ISODate("2018-11-09T07:36:43Z"),
"lastHeartbeat" : ISODate("2018-11-09T07:55:03Z"),
"pingMs" : 0
},
{
"_id" : 6,
"name" : "192.168.70.214:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1680,
"optime" : Timestamp(1541749003000, 1),
"optimeDate" : ISODate("2018-11-09T07:36:43Z"),
"self" : true
},
{
"_id" : 7,
"name" : "192.168.70.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1104,
"optime" : Timestamp(1541749003000, 1),
"optimeDate" : ISODate("2018-11-09T07:36:43Z"),
"lastHeartbeat" : ISODate("2018-11-09T07:55:03Z"),
"pingMs" : 0
},
{
"_id" : 8,
"name" : "192.168.70.216:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1104,
"optime" : Timestamp(1541749003000, 1),
"optimeDate" : ISODate("2018-11-09T07:36:43Z"),
"lastHeartbeat" : ISODate("2018-11-09T07:55:03Z"),
"pingMs" : 0
}
],
"ok" : 1
}
repset:PRIMARY>
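Reading a full rs.status() document by eye gets tedious with five members. The sketch below condenses a document of that shape into the three facts that matter during a drill: how many members exist, how many are healthy, and which one is primary. The function name is hypothetical and the sample input is a trimmed copy of the output above, keeping only the fields the function reads.

```javascript
// Summarize a status document shaped like the rs.status() output above.
function summarizeStatus(status) {
  const healthy = status.members.filter(function (m) { return m.health === 1; });
  const primaries = status.members.filter(function (m) { return m.stateStr === "PRIMARY"; });
  return {
    total: status.members.length,
    healthy: healthy.length,
    // null when there is no primary (or, unexpectedly, more than one)
    primary: primaries.length === 1 ? primaries[0].name : null
  };
}

const sampleStatus = {
  set: "repset",
  members: [
    { name: "192.168.71.214:27017", health: 1, stateStr: "SECONDARY" },
    { name: "192.168.71.215:27017", health: 1, stateStr: "SECONDARY" },
    { name: "192.168.70.214:27017", health: 1, stateStr: "PRIMARY" },
    { name: "192.168.70.215:27017", health: 1, stateStr: "SECONDARY" },
    { name: "192.168.70.216:27017", health: 1, stateStr: "SECONDARY" }
  ]
};

console.log(summarizeStatus(sampleStatus));
```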

Check the background log
# tail -f /mgdata/mongodb/log27017/mongod.log
Verify data consistency across the replica set
First connect to the mongod on the primary and insert some data.

repset:PRIMARY> use dinpay
switched to db dinpay
repset:PRIMARY> db.dinpay.insert({"test1108":"xiawu1"})
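
Then verify the data on one of the secondaries (reads on a secondary must first be enabled with setSlaveOk()):
repset:SECONDARY> db.getMongo().setSlaveOk();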
repset:SECONDARY> db.dinpay.find()
{ "_id" : ObjectId("5bd676e97e238f7b0dddfb0d"), "MongoDB TEST" : "dinpay" }
{ "_id" : ObjectId("5bd823b65b237ec32e664db2"), "mdbtest" : "zgy20181030" }
{ "_id" : ObjectId("5be53e2c60074628c8509830"), "test1108" : "xiawu1" }
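Comparing find() results by hand works as a smoke test. A complementary check is to compare the optimeDate that rs.status() reports for each member: members that have applied the same operations report the same value. The sketch below computes how far each member trails the most recent one. The function name is hypothetical, and the second timestamp in the sample is an invented value used only to show a non-zero lag.

```javascript
// For each member, compute how many seconds its optimeDate trails the newest one.
// members is an array shaped like the "members" field of rs.status(),
// with optimeDate given as a JavaScript Date.
function lagSeconds(members) {
  const times = members.map(function (m) { return m.optimeDate.getTime(); });
  const newest = Math.max.apply(null, times);
  const lag = {};
  members.forEach(function (m) {
    lag[m.name] = (newest - m.optimeDate.getTime()) / 1000;
  });
  return lag;
}

const sampleMembers = [
  { name: "192.168.70.214:27017", optimeDate: new Date("2018-11-09T07:36:43Z") },
  // Hypothetical lagging member, three seconds behind:
  { name: "192.168.71.214:27017", optimeDate: new Date("2018-11-09T07:36:40Z") }
];

console.log(lagSeconds(sampleMembers));
```

A lag of zero on every member, as in the status output earlier in this document, means all nodes have applied the same last operation.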

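Simulating power and network outages
Power outage: kill the mongod process directly
Network outage: enable the firewall in one data center to block traffic between the data centers
Data center B loses power and network
Node status after 192.168.71.214 and 192.168.71.215 lose power and network: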
repset:PRIMARY> rs.status()
{
"set" : "repset",
"date" : ISODate("2018-11-09T08:02:36Z"),
"myState" : 1,
"members" : [
{
"_id" : 4,
"name" : "192.168.71.214:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1541750316000, 1),
"optimeDate" : ISODate("2018-11-09T07:58:36Z"),
"lastHeartbeat" : ISODate("2018-11-09T08:01:59Z"),
"pingMs" : 0,
"errmsg" : "socket exception [CONNECT_ERROR] for 192.168.71.214:27017"
},
{
"_id" : 5,
"name" : "192.168.71.215:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1541750316000, 1),
"optimeDate" : ISODate("2018-11-09T07:58:36Z"),
"lastHeartbeat" : ISODate("2018-11-09T08:01:57Z"),
"pingMs" : 0,
"errmsg" : "socket exception [CONNECT_ERROR] for 192.168.71.215:27017"
},
{
"_id" : 6,
"name" : "192.168.70.214:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 2132,
"optime" : Timestamp(1541750316000, 1),
"optimeDate" : ISODate("2018-11-09T07:58:36Z"),
"self" : true
},
{
"_id" : 7,
"name" : "192.168.70.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1556,
"optime" : Timestamp(1541750316000, 1),
"optimeDate" : ISODate("2018-11-09T07:58:36Z"),
"lastHeartbeat" : ISODate("2018-11-09T08:02:35Z"),
"pingMs" : 0
},
{
"_id" : 8,
"name" : "192.168.70.216:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1556,
"optime" : Timestamp(1541750316000, 1),
"optimeDate" : ISODate("2018-11-09T07:58:36Z"),
"lastHeartbeat" : ISODate("2018-11-09T08:02:35Z"),
"pingMs" : 0
}
],
"ok" : 1
}
repset:PRIMARY>

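Conclusion: data center A keeps running normally, because its three nodes still form a majority of the five members.
Data center A loses power and network
Nodes taken down: 192.168.70.214 (primary), 192.168.70.215, 192.168.70.216

Log in to any node in data center B and force a reconfig to restore the replica set, keeping only the nodes that are still alive.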
repset:SECONDARY> use admin
switched to db admin
-- View the current configuration; the three nodes on the 70 subnet are all down
repset:SECONDARY> cfg=rs.conf()
{
"_id" : "repset",
"version" : 79,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
},
{
"_id" : 7,
"host" : "192.168.70.214:27017",
"priority" : 11
},
{
"_id" : 8,
"host" : "192.168.70.215:27017",
"priority" : 6
},
{
"_id" : 13,
"host" : "192.168.70.216:27017",
"priority" : 5
}
]
}
-- Keep only the surviving nodes
repset:SECONDARY> cfg.members = [cfg.members[0], cfg.members[1]]
[
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
}
]
-- Force the reconfiguration; a new PRIMARY is elected and the two nodes form a replica set
repset:SECONDARY> rs.reconfig(cfg, {force :true })
{ "ok" : 1 }
repset:SECONDARY> rs.status()
{
"set" : "repset",
"date" : ISODate("2018-11-09T03:45:29Z"),
"myState" : 1,
"members" : [
{
"_id" : 4,
"name" : "192.168.71.214:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 69133,
"optime" : Timestamp(1541663971000, 1),
"optimeDate" : ISODate("2018-11-08T07:59:31Z"),
"self" : true
},
{
"_id" : 5,
"name" : "192.168.71.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 8,
"optime" : Timestamp(1541663971000, 1),
"optimeDate" : ISODate("2018-11-08T07:59:31Z"),
"lastHeartbeat" : ISODate("2018-11-09T03:45:29Z"),
"pingMs" : 0
}
],
"ok" : 1
}
repset:PRIMARY>
-- Check the data; the earlier data is still there
repset:PRIMARY> use test
switched to db test
repset:PRIMARY> show collections
system.indexes
test
Testdb
-- Query the configuration
repset:PRIMARY> rs.conf()
{
"_id" : "repset",
"version" : 38342,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
}
]
}

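Conclusion: after the forced reconfiguration, data center B becomes a new replica set.
Although a new replica set has formed, data loss is still possible: writes that the old PRIMARY accepted at the moment of the outage may not yet have been replicated to the secondaries.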
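The forced reconfiguration above picks the survivors by array index (cfg.members[0] and cfg.members[1]), which only works when the operator knows the member order. A more defensive variant selects the survivors from the health flags in the status output. This is a sketch with a hypothetical helper name, operating on plain objects shaped like rs.conf() and rs.status() output.

```javascript
// Return a copy of cfg that keeps only the members reported healthy in status.
// cfg is shaped like rs.conf() output, status like rs.status() output.
function keepHealthyMembers(cfg, status) {
  const healthyHosts = {};
  status.members.forEach(function (m) {
    if (m.health === 1) {
      healthyHosts[m.name] = true;
    }
  });
  const members = cfg.members.filter(function (m) {
    return healthyHosts[m.host] === true;
  });
  if (members.length === 0) {
    throw new Error("no healthy members left");
  }
  return { _id: cfg._id, version: cfg.version, members: members };
}

// Configuration and health flags as seen from data center B after A went down.
const fullCfg = { _id: "repset", version: 79, members: [
  { _id: 4, host: "192.168.71.214:27017", priority: 10 },
  { _id: 5, host: "192.168.71.215:27017", priority: 9 },
  { _id: 7, host: "192.168.70.214:27017", priority: 11 },
  { _id: 8, host: "192.168.70.215:27017", priority: 6 },
  { _id: 13, host: "192.168.70.216:27017", priority: 5 }
] };
const outageStatus = { members: [
  { name: "192.168.71.214:27017", health: 1 },
  { name: "192.168.71.215:27017", health: 1 },
  { name: "192.168.70.214:27017", health: 0 },
  { name: "192.168.70.215:27017", health: 0 },
  { name: "192.168.70.216:27017", health: 0 }
] };

const survivorCfg = keepHealthyMembers(fullCfg, outageStatus);
console.log(survivorCfg.members.map(function (m) { return m.host; }));
```

In the shell, the resulting document would take the place of cfg in rs.reconfig(cfg, {force : true}).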

Restoring the initial state
Kill and restart mongod on each node in data center A

192.168.70.214
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27011 --oplogSize 2048 --logpath /root/log27011/log27011.log &
192.168.70.215
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27012 --oplogSize 2048 --logpath /root/log27012/log27012.log &
192.168.70.216
/root/mongodb/bin/mongod --replSet repset --port 27017 --dbpath /root/data27013 --oplogSize 2048 --logpath /root/log27013/log27013.log &

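Add the data center A nodes to the new replica set (the one in data center B), then raise the priority of one A node so that it becomes the new PRIMARY, restoring the state from before the outage.
repset:PRIMARY> use admin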
switched to db admin
repset:PRIMARY> cfg=rs.conf()
repset:PRIMARY> cfg.members[XX].priority = 8
8
repset:PRIMARY> rs.reconfig(cfg)
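Setting priorities by array index requires looking up each member's position first. As a hedged alternative, priorities can be assigned by host name. The helper below is hypothetical and works on a plain object shaped like rs.conf() output; the hosts and priority values follow the layout of this cluster and are meant as an illustration.

```javascript
// Return a copy of cfg in which the members listed in priorityByHost get the
// given priority; all other members are copied unchanged.
function setPriorities(cfg, priorityByHost) {
  const members = cfg.members.map(function (m) {
    const copy = { _id: m._id, host: m.host };
    if (m.priority !== undefined) {
      copy.priority = m.priority;
    }
    if (Object.prototype.hasOwnProperty.call(priorityByHost, m.host)) {
      copy.priority = priorityByHost[m.host];
    }
    return copy;
  });
  return { _id: cfg._id, version: cfg.version, members: members };
}

// Configuration right after the A nodes were re-added without a priority.
const rejoinedCfg = { _id: "repset", version: 63044, members: [
  { _id: 4, host: "192.168.71.214:27017", priority: 9 },
  { _id: 5, host: "192.168.71.215:27017", priority: 8 },
  { _id: 6, host: "192.168.70.214:27017" },
  { _id: 7, host: "192.168.70.215:27017" },
  { _id: 8, host: "192.168.70.216:27017" }
] };

const restoredCfg = setPriorities(rejoinedCfg, {
  "192.168.70.214:27017": 11,
  "192.168.70.215:27017": 6,
  "192.168.70.216:27017": 5
});
console.log(restoredCfg.members.map(function (m) { return m.host + " -> " + m.priority; }));
```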


////////////////////////////////////////////////////////////////////////////////
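Test: the primary data center fails and is then restored (a more detailed version of the recovery above)
-- Data center A (the primary site: 192.168.70.214 as primary, 192.168.70.215, 192.168.70.216) lost power and network, and data center B became a new cluster after the forced reconfiguration. To bring both data centers back to the initial state, first confirm which role each node had before the outage.
-- Add the nodes
-- Set the priorities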
repset:PRIMARY> use admin
switched to db admin
repset:PRIMARY> rs.add("192.168.70.214:27017")
{ "ok" : 1 }
repset:PRIMARY> rs.add("192.168.70.215:27017")
{ "ok" : 1 }
repset:PRIMARY> rs.add("192.168.70.216:27017")
{ "ok" : 1 }
repset:PRIMARY>
repset:PRIMARY> cfg=rs.conf()
{
"_id" : "repset",
"version" : 63044,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 9
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 8
},
{
"_id" : 6,
"host" : "192.168.70.214:27017"
},
{
"_id" : 7,
"host" : "192.168.70.215:27017"
},
{
"_id" : 8,
"host" : "192.168.70.216:27017"
}
]
}
repset:PRIMARY> cfg.members[2].priority = 11
11
repset:PRIMARY> cfg.members[3].priority = 6
6
repset:PRIMARY> cfg.members[4].priority = 5
5
repset:PRIMARY> rs.reconfig(cfg)
Mon Nov 12 16:04:10 DBClientCursor::init call() failed
Mon Nov 12 16:04:10 query failed : admin.$cmd { replSetReconfig: { _id: "repset", version: 63045, members: [ { _id: 4, host: "192.168.71.214:27017", priority: 9.0 }, { _id: 5, host: "192.168.71.215:27017", priority: 8.0 }, { _id: 6, host: "192.168.70.214:27017", priority: 11.0 }, { _id: 7, host: "192.168.70.215:27017", priority: 6.0 }, { _id: 8, host: "192.168.70.216:27017", priority: 5.0 } ] } } to: 192.168.71.214:27017
Mon Nov 12 16:04:10 trying reconnect to 192.168.71.214:27017
Mon Nov 12 16:04:10 reconnect 192.168.71.214:27017 ok
reconnected to server after rs command (which is normal)

repset:PRIMARY>
Mon Nov 12 16:04:29 Socket recv() errno:104 Connection reset by peer 192.168.71.214:27017
Mon Nov 12 16:04:29 SocketException: remote: 192.168.71.214:27017 error: 9001 socket exception [1] server [192.168.71.214:27017]
Mon Nov 12 16:04:29 DBClientCursor::init call() failed
Mon Nov 12 16:04:29 query failed : admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 } to: 192.168.71.214:27017
>
Mon Nov 12 16:04:37 trying reconnect to 192.168.71.214:27017
Mon Nov 12 16:04:37 reconnect 192.168.71.214:27017 ok
repset:SECONDARY>
repset:SECONDARY> rs.conf()
{
"_id" : "repset",
"version" : 63045,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 9
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 8
},
{
"_id" : 6,
"host" : "192.168.70.214:27017",
"priority" : 11
},
{
"_id" : 7,
"host" : "192.168.70.215:27017",
"priority" : 6
},
{
"_id" : 8,
"host" : "192.168.70.216:27017",
"priority" : 5
}
]
}
repset:SECONDARY>
repset:SECONDARY> rs.status()
{
"set" : "repset",
"date" : ISODate("2018-11-12T08:05:16Z"),
"myState" : 2,
"syncingTo" : "192.168.70.214:27017",
"members" : [
{
"_id" : 4,
"name" : "192.168.71.214:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 19426,
"optime" : Timestamp(1542009850000, 1),
"optimeDate" : ISODate("2018-11-12T08:04:10Z"),
"errmsg" : "syncing to: 192.168.70.214:27017",
"self" : true
},
{
"_id" : 5,
"name" : "192.168.71.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 50,
"optime" : Timestamp(1542009850000, 1),
"optimeDate" : ISODate("2018-11-12T08:04:10Z"),
"lastHeartbeat" : ISODate("2018-11-12T08:05:14Z"),
"pingMs" : 0,
"errmsg" : "syncing to: 192.168.70.214:27017"
},
{
"_id" : 6,
"name" : "192.168.70.214:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 64,
"optime" : Timestamp(1542009850000, 1),
"optimeDate" : ISODate("2018-11-12T08:04:10Z"),
"lastHeartbeat" : ISODate("2018-11-12T08:05:14Z"),
"pingMs" : 1
},
{
"_id" : 7,
"name" : "192.168.70.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 64,
"optime" : Timestamp(1542009850000, 1),
"optimeDate" : ISODate("2018-11-12T08:04:10Z"),
"lastHeartbeat" : ISODate("2018-11-12T08:05:14Z"),
"pingMs" : 0,
"errmsg" : "syncing to: 192.168.70.214:27017"
},
{
"_id" : 8,
"name" : "192.168.70.216:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 64,
"optime" : Timestamp(1542009850000, 1),
"optimeDate" : ISODate("2018-11-12T08:04:10Z"),
"lastHeartbeat" : ISODate("2018-11-12T08:05:14Z"),
"pingMs" : 1,
"errmsg" : "syncing to: 192.168.70.214:27017"
}
],
"ok" : 1
}
repset:SECONDARY>
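In the transcript above the shell prompt changes from PRIMARY to SECONDARY after the reconfig: 192.168.71.214 steps down and 192.168.70.214, which now has the highest priority, takes over. The sketch below predicts that outcome from a configuration document. It assumes all members are healthy and caught up, treats a missing priority as 1, and uses a hypothetical function name.

```javascript
// Return the host with the highest priority in a configuration shaped like
// rs.conf() output. A member without an explicit priority counts as 1.
function highestPriorityHost(cfg) {
  let best = null;
  cfg.members.forEach(function (m) {
    const p = m.priority === undefined ? 1 : m.priority;
    if (best === null || p > best.priority) {
      best = { host: m.host, priority: p };
    }
  });
  return best === null ? null : best.host;
}

// Final configuration from the transcript above.
const finalCfg = { _id: "repset", version: 63045, members: [
  { _id: 4, host: "192.168.71.214:27017", priority: 9 },
  { _id: 5, host: "192.168.71.215:27017", priority: 8 },
  { _id: 6, host: "192.168.70.214:27017", priority: 11 },
  { _id: 7, host: "192.168.70.215:27017", priority: 6 },
  { _id: 8, host: "192.168.70.216:27017", priority: 5 }
] };

console.log(highestPriorityHost(finalCfg)); // 192.168.70.214:27017
```

The result matches the rs.status() output above, where 192.168.70.214 is PRIMARY and the other four members sync from it.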