Elasticsearch Shards, Replicas, and Routing

阿新 • Published: 2022-05-02

This article explains how to understand Elasticsearch's sharding, replica, and routing strategies.

1. Background

1) Shard

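An Elasticsearch cluster lets a system store more data than a single machine can hold, and it does so through sharding. Within an index, the documents are distributed (sharded) across multiple shards. Elasticsearch hides the complexity of managing the shards, so that together they look like one large index.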

2) Replica

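To handle the case where the request load is too heavy for a single machine to serve, an Elasticsearch cluster uses replicas. Replication creates redundant copies of every shard in an index, and a query can be served by a replica just as it would be by the primary shard. Replicas also provide high availability and data safety: when the machine hosting a shard goes down, Elasticsearch can recover from a replica and avoid data loss.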

3) Routing

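When a document is stored in Elasticsearch, it is assigned to one of the shards according to its identifier, _id; the placement algorithm only needs to spread documents evenly. When data is read, a search queries all shards and merges the results, so it does not need to know which shard holds which document. The cost is that every search has to visit every shard and merge the results, which hurts performance, and in the worst case the query fails on some shards, which makes the result incomplete. Routing was introduced to avoid this: a routing key given at index time places the document on a specific shard, and the same routing key given at query time tells Elasticsearch which shard to read it from.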

By default, the shard for a document is computed as follows:

shard_num = hash(_routing) % num_primary_shards

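The routing value defaults to the document's _id (or to the _parent value, for a child document). It is hashed and then taken modulo the number of primary shards, which gives the shard the document is assigned to. In other words, the default placement is hash-based, which keeps the amount of data on each shard roughly equal and avoids load imbalance. At search time, Elasticsearch by default searches all shards, then gathers and processes the results on the coordinating node (the node that received the request, not necessarily the master) before returning the final result.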
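The formula above can be sketched in a few lines of Python. This is only an illustration: Elasticsearch uses its own internal hash function, while the sketch substitutes zlib.crc32 as a stand-in, so the shard numbers it produces will not match a real cluster. The function name shard_for is hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    zlib.crc32 is a stand-in hash chosen for this sketch (an assumption);
    it is deterministic, which is the only property the example relies on.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Default case: the routing value is the document _id, so documents
# spread across all primary shards.
spread = Counter(shard_for(str(doc_id), 3) for doc_id in range(3000))
print(sorted(spread.items()))

# Custom routing: every document indexed with the same key lands on one shard.
print({shard_for("A", 3) for _ in range(3)})
```

Because the result depends on num_primary_shards, changing the shard count would move existing documents to different shard numbers, which is one way to see why the number of primary shards is fixed when the index is created.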

Suppose you have an index with 100 shards. What happens when a search request runs against the cluster?

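1. The search request is sent to one node.

2. The node that receives the request broadcasts the query to every shard of the index (either the primary or a replica of each shard).

3. Each shard executes the search and returns its results.

4. The results are merged and sorted on the node that received the request, and returned to the user.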
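The four steps above can be modelled with a small in-memory sketch. It is a toy under stated assumptions, not how Elasticsearch is implemented: shards are plain Python lists of (score, document id) pairs, and the names search_shard and scatter_gather are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def search_shard(shard: List[Hit], size: int) -> List[Hit]:
    """Step 3: one shard returns its own top `size` hits."""
    return heapq.nlargest(size, shard)


def scatter_gather(shards: Dict[int, List[Hit]], size: int) -> List[Hit]:
    """Steps 2 and 4: broadcast to every shard, then merge and sort."""
    partial = [search_shard(shard, size) for shard in shards.values()]
    merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(size, merged)


shards = {
    0: [(0.9, "a"), (0.2, "b")],
    1: [(0.7, "c")],
    2: [(0.8, "d"), (0.1, "e")],
}
print(scatter_gather(shards, size=3))
```

Every shard does work and the receiving node merges one partial list per shard, so the cost of a search grows with the number of shards; with 100 shards there are 100 partial results to collect.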

2. Number of shards and replicas

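When creating an index in Elasticsearch, it is best to specify the number of shards and replicas explicitly; otherwise the server's default configuration is used, which is shards=5 and replicas=1 in the version used here (Elasticsearch 2.1.1; from 7.0 on the default is 1 primary shard).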

index.number_of_shards: 5
index.number_of_replicas: 1

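For an index, number_of_shards can be set only once, at creation time, whereas number_of_replicas can be increased or decreased at any time through the index update-settings API.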
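As a minimal sketch of such an update, the following Python builds (but does not send) a PUT request to the index _settings endpoint. The host http://localhost:9200, the index name my_index, and the helper name replica_update_request are placeholders assumed for illustration.

```python
import json
import urllib.request


def replica_update_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT to the index update-settings endpoint."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name (assumptions, not from the article).
req = replica_update_request("http://localhost:9200", "my_index", 2)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# Sending it against a live cluster would be: urllib.request.urlopen(req)
```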

So how should the number of shards and replicas be chosen?

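As a rule of thumb, the ideal number of shards depends on the number of nodes. Suppose an index is configured with 10 shards and 1 replica: the total number of shard copies is then 10 * (1 + 1) = 20, so at most 20 Elasticsearch nodes can each hold a shard of this index; any node beyond that would hold none of its data.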

Maximum number of nodes = number of shards * (number of replicas + 1)
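The rule is simple enough to check by hand, but a small helper makes it easy to try other settings. The function name max_useful_nodes is hypothetical.

```python
def max_useful_nodes(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies: each primary shard plus its replicas."""
    return number_of_shards * (number_of_replicas + 1)


print(max_useful_nodes(10, 1))  # the example above: 10 * (1 + 1) = 20
print(max_useful_nodes(5, 1))   # shards=5, replicas=1 gives 10
```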

3. The routing feature

1) Install the Paramedic plugin

Elasticsearch offers many features through plugins. Paramedic gives a visual overview of how Elasticsearch distributes data across shards and replicas.

[bigdata-dw@bigdata-arch-client10 es2.1.1]$ ./bin/plugin install karmi/elasticsearch-paramedic
-> Installing karmi/elasticsearch-paramedic...
Trying https://github.com/karmi/elasticsearch-paramedic/archive/master.zip ...
Downloading ............................................................................................................................................................................................................................................DONE
Verifying https://github.com/karmi/elasticsearch-paramedic/archive/master.zip checksums if available ...
NOTE: Unable to verify checksum for downloaded plugin (unable to find .sha1 or .md5 file to verify)
Installed paramedic into /home/bigdata-dw/es2.1.1/plugins/paramedic

2) Create the index documents

Create the documents index with 3 shards and 1 replica.

[bigdata-dw@bigdata-arch-client10 es2.1.1]$ curl -XPUT http://10.93.21.21:8049/documents -d '{
> "settings": {
>  "number_of_replicas": 1,
>  "number_of_shards": 3
>  }
> }'
{"acknowledged":true}

3) Use routing when indexing documents

We create 3 documents.

id=1

curl -XPUT 'http://10.93.21.21:8049/documents/doc/1?routing=A' -d '{"title": "Document"}'
{"_index":"documents","_type":"doc","_id":"1","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}

id=2

curl -XPUT 'http://10.93.21.21:8049/documents/doc/2?routing=A' -d '{"title": "Document"}'
{"_index":"documents","_type":"doc","_id":"2","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}

id=3

curl -XPUT 'http://10.93.21.21:8049/documents/doc/3?routing=A' -d '{"title": "Document"}'
{"_index":"documents","_type":"doc","_id":"3","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}

Run a search, and you can see that each document carries the _routing key.

curl -XGET 'http://10.93.21.21:8049/documents/_search?pretty'
{
  "took" : 51,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "2",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "3",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    } ]
  }
}

View the result in Paramedic.

4) Use routing when searching

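Searching with routing key "A" returns _shards.total=1, which shows that only one shard was queried. That shard is the one computed from routing key "A", and it contains the documents we indexed with routing key "A".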

curl -XGET 'http://10.93.21.21:8049/documents/_search?pretty&q=*:*&routing=A'
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "2",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "3",
      "_score" : 1.0,
      "_routing" : "A",
      "_source":{
"title": "Document"}
    } ]
  }
}

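Searching with routing key "B" also returns _shards.total=1: only the shard selected by routing key "B" is queried. Here that is a different shard from the one selected by "A", so none of the documents indexed with routing key "A" are found.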

curl -XGET 'http://10.93.21.21:8049/documents/_search?pretty&q=*:*&routing=B'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
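The behaviour seen in the two routed searches can be reproduced with a toy in-memory model. It only imitates shard selection: the hash is zlib.crc32 as a stand-in for Elasticsearch's own hash function, and the class TinyIndex and its methods are hypothetical names, not part of any Elasticsearch client.

```python
import zlib
from typing import Dict, List, Optional, Tuple

NUM_SHARDS = 3


def shard_for(routing: str) -> int:
    # Stand-in hash (assumption): Elasticsearch uses its own hash function.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


class TinyIndex:
    """Toy in-memory index that only models shard selection."""

    def __init__(self) -> None:
        self.shards: Dict[int, List[dict]] = {n: [] for n in range(NUM_SHARDS)}

    def index(self, doc_id: str, source: dict, routing: Optional[str] = None) -> None:
        # Without an explicit routing key, the document _id is used.
        key = routing if routing is not None else doc_id
        doc = {"_id": doc_id, "_routing": routing, "_source": source}
        self.shards[shard_for(key)].append(doc)

    def search(self, routing: Optional[str] = None) -> Tuple[int, List[dict]]:
        """Return (number of shards queried, hits) for a match-all query."""
        targets = [shard_for(routing)] if routing is not None else list(self.shards)
        hits = [doc for n in targets for doc in self.shards[n]]
        return len(targets), hits


idx = TinyIndex()
for doc_id in ("1", "2", "3"):
    idx.index(doc_id, {"title": "Document"}, routing="A")

# Pick a key that hashes to a different shard than "A" in this toy model.
other = next(k for k in "BCDEFGH" if shard_for(k) != shard_for("A"))

for routing in (None, "A", other):
    queried, hits = idx.search(routing=routing)
    print(routing, queried, len(hits))
```

Routing selects which shard is queried; it does not filter by key. If two routing keys happen to hash to the same shard, a search routed with one key also sees documents stored under the other, which is why the sketch deliberately picks a key that maps to a different shard.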

To summarize, routing has the following advantages:

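1) Only one shard is queried, which avoids useless queries on the other shards and the merge step on the coordinating node, and so improves query efficiency.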

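2) In a large cluster with many nodes and shards, a query that spans many shards is more likely to fail on some of them, and after the merge on the coordinating node the completeness of the result cannot be relied on. Routing effectively avoids this. For example, if a search on an index with total=64 shards reports successful=60 and failed=4, we cannot guarantee that the merged data is complete.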
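A client can detect such partial results by inspecting the _shards section of the search response. The sketch below works on a plain dictionary shaped like the responses shown earlier; the helper name is_complete is hypothetical.

```python
def is_complete(response: dict) -> bool:
    """True only if every targeted shard answered successfully."""
    shards = response["_shards"]
    return shards["failed"] == 0 and shards["successful"] == shards["total"]


# The 64-shard example from the text, and a routed single-shard search.
partial = {"_shards": {"total": 64, "successful": 60, "failed": 4}}
routed = {"_shards": {"total": 1, "successful": 1, "failed": 0}}
print(is_complete(partial), is_complete(routed))  # False True
```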
