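Setting Up Redis Cluster
I. About Redis Cluster
1. Current state of Redis Cluster
Redis Cluster was introduced in Redis 3.0; the latest release at the time of writing is 3.2.0.
Cluster features currently supported by Redis (tested personally):
1) Automatic node discovery
2) Slave->master election for cluster fault tolerance
3) Hot resharding: resharding while the cluster is online
4) Cluster management: the cluster xxx commands
5) Cluster management based on configuration (nodes-port.conf)
6) ASK and MOVED redirection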
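To make the redirection mechanism a little more concrete, here is a minimal sketch of how a script might split a redirection reply into its parts. A redirection reply has the form `MOVED <slot> <host>:<port>` (or `ASK` with the same layout); the function name `parse_redirect` is hypothetical and only for illustration.

```sh
# parse_redirect "MOVED 12182 10.10.34.14:6381"
# Prints "<kind> <slot> <host> <port>"; returns 1 if the reply is not a redirect.
parse_redirect() {
  # Word-split the reply into positional parameters: kind, slot, host:port.
  set -- $1
  case $1 in
    MOVED|ASK) ;;
    *) return 1 ;;
  esac
  # Split host:port at the last colon.
  echo "$1 $2 ${3%:*} ${3##*:}"
}
```

A client that receives MOVED should send this and later requests for that slot to the new node; after ASK it should redirect only the current request.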
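2. Redis Cluster architecture
1) Redis Cluster architecture diagram
Architecture details:
(1) All Redis nodes are interconnected (PING-PONG mechanism) and use a binary protocol internally to optimize transfer speed and bandwidth.
(2) A node is marked as failed only when more than half of the nodes in the cluster detect the failure.
(3) Clients connect directly to Redis nodes, with no intermediate proxy layer. A client does not need to connect to every node in the cluster; connecting to any one available node is enough.
(4) Redis Cluster maps all physical nodes onto the slots [0-16383], and the cluster maintains the node<->slot<->value mapping.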
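The slot for a key is CRC16(key) mod 16384, using the XMODEM variant of CRC16. The sketch below computes it in plain shell so the mapping can be checked by hand. The function name `hash_slot` is hypothetical, and the sketch assumes ASCII keys and ignores hash tags (`{...}`).

```sh
# hash_slot KEY: print the cluster slot (0-16383) for an ASCII key.
hash_slot() {
  local key crc rest ch byte i
  key=$1
  crc=0
  while [ -n "$key" ]; do
    rest=${key#?}          # key without its first character
    ch=${key%"$rest"}      # the first character
    key=$rest
    byte=$(printf '%d' "'$ch")
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$(( i + 1 ))
    done
  done
  echo $(( crc % 16384 ))
}

hash_slot foo   # prints 12182
```

On a running cluster the result can be compared with `redis-cli cluster keyslot foo`.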
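2) Redis Cluster election: fault tolerance
(1) All masters in the cluster take part in the election. If more than half of the masters cannot communicate with a given master for longer than cluster-node-timeout, that master is considered down.
(2) When does the whole cluster become unavailable (cluster_state:fail)? While it is unavailable, every operation against the cluster fails with the error ((error) CLUSTERDOWN The cluster is down).
a: If any master goes down and it has no slave, the cluster enters the fail state. Put another way, the cluster enters the fail state whenever the slot mapping [0-16383] is incomplete.
b: If more than half of the masters go down, the cluster enters the fail state regardless of whether they have slaves.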
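One way to watch for the fail state described above is to read the output of `redis-cli cluster info`, which reports `cluster_state` and `cluster_slots_assigned`. The following is a minimal sketch; the function name `cluster_is_healthy` is hypothetical, and the check is deliberately limited to those two fields.

```sh
# Reads `redis-cli cluster info` output on stdin.
# Succeeds only if cluster_state is ok and all 16384 slots are assigned.
cluster_is_healthy() {
  # redis-cli output lines may end in CR, so strip it before parsing.
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    END { exit !(state == "ok" && assigned == 16384) }
  '
}
```

Possible usage, assuming a node at 10.10.34.14:6380: `redis-cli -h 10.10.34.14 -p 6380 cluster info | cluster_is_healthy || echo "cluster is down"`.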
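II. Using Redis Cluster
1. Installing Redis Cluster
1) Install the Redis Cluster dependencies. The dependency libraries have compatibility problems in use and cause various errors during reshard, so install the versions specified here.
(1) Make sure zlib is installed on the system; otherwise gem install fails with (no such file to load -- zlib)
- # download: zlib-1.2.6.tar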
- ./configure
- make
- make install
(2) Install Ruby: version 1.9.2
- # ruby1.9.2
- cd /path/ruby
- ./configure --prefix=/usr/local/ruby
- make
- make install
- sudo cp ruby /usr/local/bin
- # rubygems-1.8.16.tgz
- cd /path/gem
- sudo ruby setup.rb
- sudo cp bin/gem /usr/local/bin
- gem install redis --version 3.0.0
- # The download may fail because of the gem source; in that case download the gem manually and install it
- # download URL: http://rubygems.org/gems/redis/versions/3.0.0
- gem install -l /data/soft/redis-3.0.0.gem
2) Install Redis Cluster
- cd /path/redis
- make
- sudo cp /opt/redis/src/redis-server /usr/local/bin
- sudo cp /opt/redis/src/redis-cli /usr/local/bin
- sudo cp /opt/redis/src/redis-trib.rb /usr/local/bin
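2. Configuring Redis Cluster
1) Redis configuration file layout:
Use include to separate the common configuration from the node-specific configuration, which makes maintenance easier.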
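As an illustration of this split, the sketch below generates one small per-node file that includes a shared file and then sets only the node-specific directives. The helper name `write_node_conf`, the shared file name `redis-common.conf`, and the timeout value are assumptions, not taken from the original setup; the directives themselves (`include`, `port`, `cluster-enabled`, `cluster-config-file`, `cluster-node-timeout`) are standard redis.conf options.

```sh
# write_node_conf CONF_DIR PORT: write CONF_DIR/redis-PORT.conf
write_node_conf() {
  local conf_dir port
  conf_dir=$1
  port=$2
  cat >"$conf_dir/redis-$port.conf" <<EOF
# Shared settings (hypothetical file name); lines below override it.
include $conf_dir/redis-common.conf
# Node-specific settings.
port $port
cluster-enabled yes
cluster-config-file nodes-$port.conf
cluster-node-timeout 15000
EOF
}
```

For the six nodes used later in this article, it could be called as `for port in 6380 6381 6382 7380 7381 7382; do write_node_conf /opt/redis/conf "$port"; done`.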
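3. Redis Cluster operations
1) Initialize and build the cluster
(1) Start the cluster nodes (they must be empty nodes), specifying the configuration file and the output log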
- redis-server /opt/redis/conf/redis-6380.conf > /opt/redis/logs/redis-6380.log 2>&1 &
- redis-server /opt/redis/conf/redis-6381.conf > /opt/redis/logs/redis-6381.log 2>&1 &
- redis-server /opt/redis/conf/redis-6382.conf > /opt/redis/logs/redis-6382.log 2>&1 &
- redis-server /opt/redis/conf/redis-7380.conf > /opt/redis/logs/redis-7380.log 2>&1 &
- redis-server /opt/redis/conf/redis-7381.conf > /opt/redis/logs/redis-7381.log 2>&1 &
- redis-server /opt/redis/conf/redis-7382.conf > /opt/redis/logs/redis-7382.log 2>&1 &
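(2) Build the cluster with the bundled Ruby tool (redis-trib.rb)
- # Build the cluster with the create subcommand of redis-trib.rb
- # --replicas specifies how many slave nodes each master node in the cluster gets
- # Node roles are determined by order: masters first, then slaves (for easy identification, each slave's port is 1000 higher than its master's)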
- redis-trib.rb create --replicas 1 10.10.34.14:6380 10.10.34.14:6381 10.10.34.14:6382 10.10.34.14:7380 10.10.34.14:7381 10.10.34.14:7382
- # Verify the cluster with the check subcommand of redis-trib.rb
- # ip:port can be any node in the cluster
- redis-trib.rb check 10.10.34.14:6380
- [OK] All nodes agree about slots configuration.
- >>> Check for open slots...
- >>> Check slots coverage...
- [OK] All 16384 slots covered.
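The ordering rule used by the create step above (masters first, then slaves) can be sketched as follows. This only mirrors the rule as stated in this article for nodes on a single host; redis-trib.rb may choose differently when the nodes are spread over several hosts. The function name `plan_roles` is hypothetical.

```sh
# plan_roles REPLICAS NODE...: print "master <node>" or "slave <node>" per node.
plan_roles() {
  local replicas masters i node
  replicas=$1
  shift
  # Each master needs REPLICAS slaves, so masters = nodes / (replicas + 1).
  masters=$(( $# / (replicas + 1) ))
  if [ "$masters" -lt 3 ]; then
    echo "Redis Cluster needs at least 3 master nodes" >&2
    return 1
  fi
  i=0
  for node in "$@"; do
    if [ "$i" -lt "$masters" ]; then
      echo "master $node"
    else
      echo "slave $node"
    fi
    i=$(( i + 1 ))
  done
}
```

With `--replicas 1` and the six addresses from the create command, the first three nodes come out as masters and the last three as slaves.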
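III. Restarting Redis Cluster
1. Restarting some of the Redis nodes
Command: /usr/local/bin/redis-cli shutdown
This command stops the Redis node listening on port 6379 on the current server.
The node status of the cluster then becomes: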
67d89b4f617b9cb83899bb5631167ec577c00827 10.86.45.137:6379 myself,slave 337241090f64e602e5f917ef840809ed52025565 0 0 10 connected
9899dae2ea13c350c5ad0dd562a76c32fdf1522d 10.86.45.136:6379 slave,fail 9e2ebd2d76d708045f7117dd8a4d922d988dfa9d 1435577082442 1435577078233 17 disconnected
7b86c086200b5078b6a97fa6152bbabd97569d5f 10.86.45.138:6379 master - 0 1435577092279 18 connected 10000-16383
9e2ebd2d76d708045f7117dd8a4d922d988dfa9d 10.86.41.39:6379 master - 0 1435577095290 0 connected 0-4999
337241090f64e602e5f917ef840809ed52025565 10.86.41.40:6379 master - 0 1435577093285 15 connected 5000-9999
1ba8070f2057e312517dc002660957e66431d676 10.86.41.41:6379 slave 7b86c086200b5078b6a97fa6152bbabd97569d5f 0 1435577094286 18 connected
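Look at node 9899dae2ea13c350c5ad0dd562a76c32fdf1522d: it is now in the disconnected state, which means the node has dropped out of the Redis cluster.
Run the following command:
sudo /usr/local/bin/redis-server /etc/redis.conf
The node status of the cluster then becomes: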
67d89b4f617b9cb83899bb5631167ec577c00827 10.86.45.137:6379 myself,slave 337241090f64e602e5f917ef840809ed52025565 0 0 10 connected
9899dae2ea13c350c5ad0dd562a76c32fdf1522d 10.86.45.136:6379 slave 9e2ebd2d76d708045f7117dd8a4d922d988dfa9d 0 1435577341917 17 connected
7b86c086200b5078b6a97fa6152bbabd97569d5f 10.86.45.138:6379 master - 0 1435577342418 18 connected 10000-16383
9e2ebd2d76d708045f7117dd8a4d922d988dfa9d 10.86.41.39:6379 master - 0 1435577338910 0 connected 0-4999
337241090f64e602e5f917ef840809ed52025565 10.86.41.40:6379 master - 0 1435577339913 15 connected 5000-9999
1ba8070f2057e312517dc002660957e66431d676 10.86.41.41:6379 slave 7b86c086200b5078b6a97fa6152bbabd97569d5f 0 1435577342919 18 connected
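Look at node 9899dae2ea13c350c5ad0dd562a76c32fdf1522d: it is now in the connected state, which means the node has rejoined the Redis cluster.
Note: while the cluster is running normally, the node ID of a node does not change when some of the nodes are restarted.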
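Rather than scanning the node list by eye, the same check can be scripted. In the `cluster nodes` output shown above, the third field holds the flags and the eighth holds the link state. The sketch below prints every node that is flagged as failed or whose link is disconnected; the function name `unhealthy_nodes` is hypothetical.

```sh
# Reads `redis-cli cluster nodes` output on stdin.
# Field layout: id addr flags master ping-sent pong-recv config-epoch link-state [slots...]
# Prints "<id> <addr> <flags>" for nodes flagged fail (or fail?) or disconnected.
unhealthy_nodes() {
  tr -d '\r' | awk '$3 ~ /fail/ || $8 == "disconnected" { print $1, $2, $3 }'
}
```

Possible usage: `redis-cli cluster nodes | unhealthy_nodes`. An empty result means no node is reported as failed or disconnected.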
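2. Restarting the whole Redis cluster
Important: when the whole cluster is restarted, node IDs may change, which causes the cluster to fail to start.
How to repair a failed start:
1. cluster nodes: check the current state of the cluster nodes and find each node's new ID
2. cluster meet: join nodes that are not yet in the cluster; skip this step if every node has already joined
3. cluster forget: remove nodes that no longer have an IP
4. cluster replicate: manually assign a node as the slave of another node
5. cluster addslots: assign slots to each master (cluster slots only displays the current assignment)
After these five steps, check the cluster state; it should be back to normal.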
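The five steps can be sketched as a dry run that prints the redis-cli commands for review before anything is executed. Every value below is a placeholder (the addresses and the master ID are borrowed from the node listing earlier in this article, and `STALE_ID` must be replaced with a real ID from step 1); the helper names `rcli` and `recovery_plan` are hypothetical. Slots are assigned with `cluster addslots`, which only works for slots that are currently unassigned.

```sh
MASTER=10.86.41.39:6379                               # master that should own the slots
MASTER_ID=9e2ebd2d76d708045f7117dd8a4d922d988dfa9d    # its current ID, from step 1
SLAVE=10.86.45.136:6379                               # node to attach as its slave
STALE_ID=REPLACE_WITH_STALE_NODE_ID                   # entry to remove, from step 1
FIRST_SLOT=0
LAST_SLOT=4999

# rcli HOST:PORT ARGS...: print the command when DRY_RUN=1 (default), else run it.
rcli() {
  local host port
  host=${1%:*}
  port=${1##*:}
  shift
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "redis-cli -h $host -p $port $*"
  else
    redis-cli -h "$host" -p "$port" "$@"
  fi
}

recovery_plan() {
  rcli "$MASTER" cluster nodes                               # 1. find the new node IDs
  rcli "$MASTER" cluster meet "${SLAVE%:*}" "${SLAVE##*:}"   # 2. join a missing node
  rcli "$MASTER" cluster forget "$STALE_ID"                  # 3. repeat on every remaining node
  rcli "$SLAVE" cluster replicate "$MASTER_ID"               # 4. sent to the slave itself
  rcli "$MASTER" cluster addslots $(seq "$FIRST_SLOT" "$LAST_SLOT")   # 5. sent to the master
}
```

Running `recovery_plan` prints the commands; setting `DRY_RUN=0` executes them instead, which should only be done after checking each value against the real `cluster nodes` output.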