
Upgrading Elasticsearch from 2.x to 6.x and migrating the data

Official documentation for this approach: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html

Official rules for version upgrade paths: https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html#rolling-upgrades

Rolling upgrades can be performed between minor versions. Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier 5.x versions requires a full cluster restart. You must reindex to upgrade from versions prior to 5.x.

To upgrade an Elasticsearch 5.x cluster that contains indices created in 2.x, you must reindex or delete them before upgrading to 6.x. For more information, see Reindex in place.

To upgrade an Elasticsearch cluster running 2.x, you have two options:

  • Perform a full cluster restart upgrade to 5.6, reindex the 2.x indices, then perform a rolling upgrade to 6.x. If your Elasticsearch 2.x cluster contains indices that were created before 2.x, you must either delete or reindex them before upgrading to 5.6. For more information about upgrading from 2.x to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
  • Create a new 6.x cluster and reindex from remote to import indices directly from the 2.x cluster.

To upgrade an Elasticsearch 1.x cluster, you have two options:

  • Perform a full cluster restart upgrade to Elasticsearch 2.4.x and reindex or delete the 1.x indices. Then, perform a full cluster restart upgrade to 5.6 and reindex or delete the 2.x indices. Finally, perform a rolling upgrade to 6.x. For more information about upgrading from 1.x to 2.4, see Upgrading Elasticsearch in the Elasticsearch 2.4 Reference. For more information about upgrading from 2.4 to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
  • Create a new 6.x cluster and reindex from remote to import indices directly from the 1.x cluster.

The old ES cluster is version 2.4.6, with three nodes:

127.0.0.1:9201,127.0.0.1:9301
127.0.0.1:9202,127.0.0.1:9302
127.0.0.1:9203,127.0.0.1:9303

The new ES cluster is version 6.4.2, with three nodes:

127.0.0.1:19201,127.0.0.1:19301
127.0.0.1:19202,127.0.0.1:19302
127.0.0.1:19203,127.0.0.1:19303

First, start a single node of the new version, with the following configuration file:

# This must be set for this setup: it allows multiple instances on a single machine (not allowed by default)
node.max_local_storage_nodes: 32

# Address of the old cluster to reindex from
reindex.remote.whitelist: 127.0.0.1:9201

# Use a zone attribute for allocation awareness, controlling shard allocation and request routing
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone-1,zone-2,zone-3
node.attr.zone: zone-1

# Cluster name; must be the same on every node in the cluster
cluster.name: version6

# Node name; must be unique among nodes in the same cluster
node.name: node-6-1

# This node is master-eligible
node.master: true
# This node is a data node
node.data: true

### ES nodes fall into three roles: client (coordinating), master, and data.
### A client node has both node.master and node.data set to false; it routes requests and merges the results, much like nginx forwarding requests.
### Master-eligible nodes manage the cluster state (elections, index creation, shard allocation); to keep them responsive it is generally not recommended to make the same node both master and data.
### Data nodes hold the shards and do most of the search and aggregation work.

# Data directory
path.data: F:\\es\\data\\6.4.2\\node-6-1
# Log directory
path.logs: F:\\es\\logs\\6.4.2\\node-6-1

# IP this node binds to; 127.0.0.1 is fine for a local setup
network.host: 127.0.0.1
# HTTP port exposed to clients
http.port: 19201
# Internal transport port, used for unicast discovery
transport.tcp.port: 19301
# Disabling multicast was a 1.x setting; adding it on 5.x/6.x causes an error, so it stays commented out
# discovery.zen.ping.multicast.enabled: false
# Unicast discovery addresses. 127.0.0.1 works directly, or you can map several host names to 127.0.0.1 in the local hosts file (which is what I did; my setup has one client and three master nodes, so I configured four addresses)
# discovery.zen.ping.unicast.hosts: ["127.0.0.1:19301","127.0.0.1:19302","127.0.0.1:19303"]
discovery.zen.ping.unicast.hosts: ["127.0.0.1:19301"]

# CORS settings for cross-origin access
http.cors.enabled: true
http.cors.allow-origin: "*"

# Minimum number of master-eligible nodes that must agree in a master election
# Computed as (number of master-eligible nodes / 2) + 1, using integer division
# e.g. with 5 nodes where node.master is true: (5 / 2) + 1 = 3
# discovery.zen.minimum_master_nodes: 2
discovery.zen.minimum_master_nodes: 1
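After starting this first node (on Windows, for example, with bin\elasticsearch.bat), it is worth confirming it is up and has formed a one-node cluster before reindexing. A quick check against the HTTP port configured above:

# sanity check: the node should be reachable and report a one-node cluster
curl "127.0.0.1:19201/_cat/health?v"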

Then run the reindex. The example from the official docs:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://oldhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

Before running the reindex you can create the destination index in advance and set a few parameters that speed up the initial import; the index being imported here is test1:
PS: This is only one aspect of the tuning. Throughput can also be improved by adjusting the per-batch document count (batch size) and the concurrency of the import (slices), but both have to be tuned against your specific cluster; there is no universal value. A tuned variant is shown after the basic reindex command below.

# refresh_interval controls how often newly indexed documents become searchable (a new segment is created on each refresh; the default is 1s). Setting it to -1 disables automatic refresh, which speeds up the import.
# index.refresh_interval: -1
# number_of_replicas controls the number of replica copies per shard. Setting it to 0 during the initial import avoids writing every document to replicas as well, which also speeds things up.
# index.number_of_replicas: 0

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/test1" -d '{
    "settings":{
        "refresh_interval": "-1",
        "number_of_replicas": 0
    }
}'
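If you want to confirm these settings were applied, they can be read back before starting the import:

curl -X GET "127.0.0.1:19201/test1/_settings?pretty"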

Then run the reindex (how existing documents are handled can also be controlled, e.g. skipping docs that already exist in the destination and only creating new ones; see the variant after this command):

curl -X POST "127.0.0.1:19201/_reindex" -d '{
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1"
  },
  "dest": {
    "index": "test1"
  }
}'
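As mentioned above, the import can also be tuned in the request body itself: the size field under source sets the per-batch document count, and setting dest.op_type to create together with "conflicts": "proceed" makes the copy create-only, so documents that already exist in the destination are skipped instead of overwritten. (Note that reindex from a remote cluster does not support slices; slicing only applies when reindexing within the same cluster.) A sketch with illustrative values:

curl -X POST -H "Content-Type:application/json" "127.0.0.1:19201/_reindex" -d '{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1",
    "size": 1000
  },
  "dest": {
    "index": "test1",
    "op_type": "create"
  }
}'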

The settings above were changed to speed up the import; now change them back:

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/test1/_settings" -d '{
    "refresh_interval": "30s",
    "number_of_replicas":1
}'

Next, add the remaining nodes to the cluster one at a time. To join a node, set discovery.zen.ping.unicast.hosts in its configuration file to the nodes already in the cluster plus the node itself, as sketched below.
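For example, the second node's configuration only needs to change the node-specific values and include an existing cluster member in its unicast hosts (a sketch that mirrors the first node's config; names and paths are illustrative):

# values that differ from the first node's config (everything else stays the same)
node.name: node-6-2
node.attr.zone: zone-2
path.data: F:\\es\\data\\6.4.2\\node-6-2
path.logs: F:\\es\\logs\\6.4.2\\node-6-2
http.port: 19202
transport.tcp.port: 19302
# a node already in the cluster plus this node itself
discovery.zen.ping.unicast.hosts: ["127.0.0.1:19301","127.0.0.1:19302"]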

Then adjust the cluster settings. The first node was started with the minimum master-election count set to 1, which makes a three-node cluster prone to split-brain, so once all nodes have joined, raise the setting cluster-wide:

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/_cluster/settings" -d '{
  "persistent": {
    "discovery.zen.minimum_master_nodes":2
  }
}'
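You can verify that all three nodes have joined and that the persistent setting took effect:

curl "127.0.0.1:19201/_cat/nodes?v"
curl "127.0.0.1:19201/_cluster/settings?pretty"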

In production the indices may be large and the migration can take a long time; waiting on the request synchronously can then run into timeouts. Instead, run the reindex as a background task simply by adding the wait_for_completion parameter:

curl -X POST "127.0.0.1:19201/_reindex?wait_for_completion=false" -d '{
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1"
  },
  "dest": {
    "index": "test1"
  }
}'

The request then returns a task ID immediately:

{
    "task": "q270bwrnQAe3K4SBu0GW8w:5012"
}

We can then query that task ID to check the status of the request:

# GET _tasks/TASK_ID 

curl -X GET "127.0.0.1:19201/_tasks/q270bwrnQAe3K4SBu0GW8w:5012"

The response shows how the task is progressing:

{
    "completed": true,
    "task": {
        "node": "q270bwrnQAe3K4SBu0GW8w",
        "id": 5012,
        "type": "transport",
        "action": "indices:data/write/reindex",
        "status": {
            "total": 20753,
            "updated": 20753,
            "created": 0,
            "deleted": 0,
            "batches": 21,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
                "bulk": 0,
                "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
        },
        "description": "reindex from [host=127.0.0.1 port=9201 query={\n  \"match_all\" : {\n    \"boost\" : 1.0\n  }\n}][test1] to [test1]",
        "start_time_in_millis": 1541498674939,
        "running_time_in_nanos": 9599382168,
        "cancellable": true,
        "headers": {}
    },
    "response": {
        "took": 9595,
        "timed_out": false,
        "total": 20753,
        "updated": 20753,
        "created": 0,
        "deleted": 0,
        "batches": 21,
        "version_conflicts": 0,
        "noops": 0,
        "retries": {
            "bulk": 0,
            "search": 0
        },
        "throttled_millis": 0,
        "requests_per_second": -1,
        "throttled_until_millis": 0,
        "failures": []
    }
}
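When wait_for_completion=false is used, the reindex documentation notes that the task result is also stored as a document in the .tasks index, which you are expected to delete once you no longer need it; a running reindex can likewise be cancelled via the task ID. A sketch using the task ID from above:

# delete the stored task result once it is no longer needed
curl -X DELETE "127.0.0.1:19201/.tasks/task/q270bwrnQAe3K4SBu0GW8w:5012"
# cancel the reindex if it is still running and needs to be stopped
curl -X POST "127.0.0.1:19201/_tasks/q270bwrnQAe3K4SBu0GW8w:5012/_cancel"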

PS: Before the upgrade/migration you can put a proxy in front of the old ES cluster and have clients connect to the proxy address, then migrate the data; once the migration is complete, switch the proxy to point at the new cluster. To some extent this lets you upgrade and migrate without taking the service down.