ES: How many shards should one index have? (continuously updated)

Simplified ES cluster diagram

Basic concepts:

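  • cluster: a set of nodes that share the same cluster name.
  • node: a single Elasticsearch instance.
  • index: a collection of documents, possibly made up of multiple shards.
  • shard: the basis of ES's distributed nature. An index is usually split into elements known as shards that are distributed across multiple nodes. Shards are transparent to the user: ES manages them automatically and rebalances their allocation as needed. If you really need to change the number of shards, you have to reindex.
  • replica: a copy of a shard. By default ES creates 5 shards and 1 replica per index (the default before version 7.0, which lowered it to 1 shard). Replicas provide high availability (failover, HA) and load balancing (LB).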

Differences between primary and replica shards:

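  • Only the primary shard can accept indexing requests. Both primary and replica shards serve query requests.
  • The number of primary shards is static and cannot be changed after the index is created; the number of replica shards is dynamic and can be modified.
  • Shards are a per-index concept, configured with the index setting number_of_shards.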

Replicas are primarily for search performance, and a user can add or remove them at any time.
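
The static/dynamic split above can be sketched as two request bodies. This is a minimal illustration, assuming a hypothetical index named `my_index`; the helper names are made up for this example. It only builds the JSON and does not talk to a cluster. The first body would be sent when the index is created, the second to the index settings endpoint at any later time.

```python
import json


def create_index_body(primary_shards: int, replicas: int) -> dict:
    """Body for `PUT /my_index` (index name is a placeholder).

    number_of_shards can only be chosen here, at creation time.
    """
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas: int) -> dict:
    """Body for `PUT /my_index/_settings`; only the replica count is touched."""
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    print(json.dumps(create_index_body(5, 1), indent=2))
    print(json.dumps(update_replicas_body(2), indent=2))
```

Changing number_of_shards on an existing index this way is not possible, which is why the shard count has to be planned up front.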

Too many or too few: aim for a moderate number

A little overallocation is good. A kagillion shards is bad.
What counts as too many shards depends on their size and how they are being used.

shard cost:

  1. A shard is essentially a Lucene index, so it consumes file handles, memory, and CPU resources.
  2. Each search request touches a copy of every shard in the index, which isn't a problem when the shards are spread across several nodes. Contention arises and performance decreases when the shards compete for the same hardware resources. Ideally, one shard per node.
  3. Elasticsearch uses term frequency statistics to calculate relevance, but these statistics correspond to individual shards. A small amount of data spread over many shards can therefore result in poor document relevance.
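
Point 3 can be illustrated with a toy calculation. The sketch below uses the textbook inverse document frequency, ln(N / df), which is an assumption chosen for simplicity and not Lucene's exact scoring formula. The shard contents are invented. It shows that the same term gets a different weight on each shard than it would get from statistics over the whole index.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    # Textbook IDF for illustration only; not Lucene's exact formula.
    return math.log(num_docs / doc_freq)


def shard_idf(shard: list, term: str) -> float:
    # Document frequency is counted only within the given shard.
    doc_freq = sum(1 for doc in shard if term in doc)
    return idf(len(shard), doc_freq)


# Hypothetical shards; each document is modelled as a set of terms.
shard_a = [{"quick", "fox"}, {"lazy", "dog"}, {"brown", "dog"}, {"red", "cat"}]
shard_b = [{"quick", "fox"}, {"quick", "dog"}, {"quick", "cat"}, {"slow", "cat"}]

local_a = shard_idf(shard_a, "quick")               # term is rare on shard A
local_b = shard_idf(shard_b, "quick")               # term is common on shard B
global_idf = shard_idf(shard_a + shard_b, "quick")  # statistics over all documents

if __name__ == "__main__":
    print(local_a, local_b, global_idf)
```

With only a few documents per shard the local values drift far from the global one; with large, evenly distributed shards the difference tends to shrink.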

There is therefore always a need for contingency planning.

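Rules of thumb: about 30 GB per shard / up to 2 billion documents per shard / one shard per data node.
Allocate 1.5 to 3 times as many shards as there are nodes in your initial configuration.
When nodes are added, shards rebalance automatically.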
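
The rules of thumb above can be combined into a rough estimate. This is a sketch under stated assumptions: the function name, its parameters, and the choice to take the larger of the size-based and node-based lower bounds are illustrative, not an official formula.

```python
import math


def suggest_primary_shards(total_data_gb: float, initial_nodes: int,
                           max_shard_gb: float = 30.0,
                           low_factor: float = 1.5,
                           high_factor: float = 3.0) -> dict:
    """Rough primary shard estimate from the rules of thumb (hypothetical helper)."""
    # Enough shards that none exceeds roughly max_shard_gb.
    min_for_size = math.ceil(total_data_gb / max_shard_gb)
    # 1.5 to 3 times the number of nodes in the initial configuration.
    node_low = math.ceil(initial_nodes * low_factor)
    node_high = math.floor(initial_nodes * high_factor)
    return {
        "min_for_size": min_for_size,
        "node_range": (node_low, node_high),
        # Assumption: satisfy both lower bounds by taking the larger one.
        "suggested": max(min_for_size, node_low),
    }


if __name__ == "__main__":
    # Example: 200 GB of data on a cluster that starts with 3 nodes.
    print(suggest_primary_shards(200, 3))
```

If the size-based minimum lands above the node-based range, that is a hint that the initial cluster is small for the data volume.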

One shard per index per node.
If you need only one replica, you'll need twice as many nodes. Two replicas would require three times the number of nodes.
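
The node arithmetic above works out as follows, assuming every shard copy (primary or replica) gets its own node. The function name is hypothetical.

```python
def nodes_needed(primary_shards: int, replicas: int) -> int:
    """Nodes required if each shard copy of an index sits on its own node."""
    # Each primary has `replicas` additional copies, and no two copies share a node.
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    for replica_count in (0, 1, 2):
        print(replica_count, nodes_needed(5, replica_count))
```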