
18 - Elasticsearch cluster health is yellow

1. Problem 1: after adding a document, the cluster turns from green to yellow

Description:
A single machine running Windows; the following commands are run in Kibana.
Create a new index:
Set the index to 1 primary shard and 1 replica

PUT /megacorp
{
  "settings":{
           "number_of_shards":1,     
           "number_of_replicas":1
  }
}

Add a document

POST /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25
, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }

The UI turns yellow and reports an UNASSIGNED shard


Check the status of all shards in the background:

GET /_cat/shards

Result:

.monitoring-alerts-6            0 p STARTED        1   6.1kb 127.0.0.1 6wPQIP5
megacorp                        0 p STARTED        3  16.4kb 127.0.0.1 6wPQIP5
megacorp                        0 r UNASSIGNED
.monitoring-kibana-6-2018.02.13 0 p STARTED     1824     1mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED      194  76.2kb 127.0.0.1 6wPQIP5
.watcher-history-7-2018.02.11   0 p STARTED      272 363.4kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.13     0 p STARTED    27920  19.7mb 127.0.0.1 6wPQIP5
.triggered_watches              0 p STARTED        0   5.1kb 127.0.0.1 6wPQIP5
.watches                        0 p STARTED        6  25.1kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11     0 p STARTED     4888   2.4mb 127.0.0.1 6wPQIP5
.security-6                     0 p STARTED        3   9.8kb 127.0.0.1 6wPQIP5
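Another way to see why the replica is unassigned is the allocation explain API (available since ES 5.0); a minimal sketch, using the index and shard number from the output above:

GET /_cluster/allocation/explain
{
  "index": "megacorp",
  "shard": 0,
  "primary": false
}

On a single-node cluster the explanation in the response will typically say that the replica cannot be allocated to the same node that already holds a copy of the shard.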

1.1 Attempted fix (not resolved)

Find the unique ID of the master node.
Command:

GET /_nodes/process?pretty

Result:

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "6wPQIP5sT_qEmEpyJm8i6w": {
      "name": "6wPQIP5",
      "transport_address": "127.0.0.1:9300",
      "host": "127.0.0.1",
      "ip": "127.0.0.1",
      "version": "6.2.1",
      "build_hash": "7299dc3",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 10892,
        "mlockall": false
      }
    }
  }
}

Run reroute (in several passes, changing the shard value to the shard numbers of the UNASSIGNED entries in the query result; the previous query returned 1 and 3)

curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "pv-2015.05.22",
                  "shard" : 1,
                  "node" : "AfUyuXmGTESHXpwi4OExxx",
                  "allow_primary" : true
              }
            }
        ]
    }'

I could not run this under Windows, so it has not been tested here yet.
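Beyond the Windows quoting issue, there is a likely reason this exact command would fail on this cluster anyway: since Elasticsearch 5.0 the plain allocate reroute command no longer exists (it was split into allocate_replica, allocate_stale_primary and allocate_empty_primary), and 6.x also insists on a Content-Type header. A minimal, untested sketch of the 6.x equivalent for the unassigned replica above, using the megacorp index and the 6wPQIP5 node name:

curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
  "commands" : [ {
    "allocate_replica" : {
      "index" : "megacorp",
      "shard" : 0,
      "node" : "6wPQIP5"
    }
  } ]
}'

On a single-node cluster this is still expected to be rejected, because a replica cannot live on the same node as its primary, which is exactly what the later sections run into.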

Batch script (for when there are many unassigned shards; remember to replace the node name):

#!/bin/bash
# Reroute every UNASSIGNED shard to the node named "Master" (replace with your own node name).

for index in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk '{print $1}' | sort | uniq); do
    for shard in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | grep "$index" | awk '{print $2}' | sort | uniq); do
        echo "$index" "$shard"

        # The request body must be valid JSON, i.e. double-quoted keys and values.
        curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d "{
            \"commands\" : [ {
                  \"allocate\" : {
                      \"index\" : \"$index\",
                      \"shard\" : $shard,
                      \"node\" : \"Master\",
                      \"allow_primary\" : true
                  }
                }
            ]
        }"

        sleep 5
    done
done

1.2 Attempted fix (not resolved)

1. Add the following setting to the file J:\elasticsearch\elasticsearch-6.2.1\config\elasticsearch.yml

cluster.routing.allocation.enable : all

Restart ES: still yellow, even though automatic allocation is already enabled. Strange, then, that this shard is not being assigned.
Add another setting:

index.number_of_replicas: 0

The following was printed, and the process exited immediately:

J:\elasticsearch\elasticsearch-6.2.1\bin>elasticsearch.bat
*************************************************************************************
Found index level settings on node level configuration.

Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments.In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgradeIndices created in the future should use index templates
to set default values.

Please ensure all required values are updated on all indices by executing:

curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.number_of_replicas" : "0"
}'
*************************************************************************************

[2018-02-13T23:16:03,368][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: node settings must not contain any index level settings
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.2.1.jar:6.2.1]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.2.1.jar:6.2.1]
Caused by: java.lang.IllegalArgumentException: node settings must not contain any index level settings
        at org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:128) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.node.Node.<init>(Node.java:331) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.1.jar:6.2.1]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.1.jar:6.2.1]
        ... 6 more

J:\elasticsearch\elasticsearch-6.2.1\bin>
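The error message itself points to the fix: since ES 5.x, index-level settings such as index.number_of_replicas may not be placed in elasticsearch.yml; they have to be applied per index through the _settings API, which is what the next section does. A sketch of the command the log suggests, adding the Content-Type header that 6.x expects:

curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -H 'Content-Type: application/json' -d '{
  "index.number_of_replicas" : "0"
}'

The index.number_of_replicas line also has to be removed from elasticsearch.yml again, otherwise the node will keep refusing to start.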

1.3 Attempted fix (not resolved)

Since all the primary shards are fine and only the replica shards have problems, why not force-remove the replica shards and let ES regenerate them?
First, set the number of replicas of the problematic index to 0.

Check the shards:

.watcher-history-7-2018.02.11   0 p STARTED      272 363.3kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11     0 p STARTED     4888   2.4mb 127.0.0.1 6wPQIP5
megacorp                        0 p STARTED        3  16.4kb 127.0.0.1 6wPQIP5
megacorp                        0 r UNASSIGNED                         
.watches                        0 p STARTED        6    25kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.13     0 p STARTED    24874  17.6mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.13 0 p STARTED     1649   1.1mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED      194  76.2kb 127.0.0.1 6wPQIP5
.triggered_watches              0 p STARTED        0   5.1kb 127.0.0.1 6wPQIP5
.monitoring-alerts-6            0 p STARTED        1   6.1kb 127.0.0.1 6wPQIP5
.security-6                     0 p STARTED        3   9.8kb 127.0.0.1 6wPQIP5

Set the number of replicas of the problematic index to 0:

PUT /megacorp/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}

Result:

{
  "acknowledged": true
}

Check the shards again

Command:
GET /_cat/shards

Result:
.monitoring-es-6-2018.02.13     0 p STARTED 27097  19.6mb 127.0.0.1 6wPQIP5
.monitoring-alerts-6            0 p STARTED     1   6.1kb 127.0.0.1 6wPQIP5
.watches                        0 p STARTED     6  25.1kb 127.0.0.1 6wPQIP5
.security-6                     0 p STARTED     3   9.8kb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED   194  76.2kb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.13 0 p STARTED  1766   1.1mb 127.0.0.1 6wPQIP5
megacorp                        0 p STARTED     3  16.4kb 127.0.0.1 6wPQIP5
.watcher-history-7-2018.02.11   0 p STARTED   272 363.4kb 127.0.0.1 6wPQIP5
.triggered_watches              0 p STARTED     0   5.1kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11     0 p STARTED  4888   2.4mb 127.0.0.1 6wPQIP5

There are no more replica shards.
Now restore the setting:

PUT /megacorp/_settings
{
    "index" : {
        "number_of_replicas" : 1
    }
}

Check:

megacorp                        0 p STARTED        3  16.4kb 127.0.0.1 6wPQIP5
megacorp                        0 r UNASSIGNED                     

Still not resolved.

1.4 Fix (resolved)

A cluster health of yellow means that all primary shards (the number_of_shards setting) are running normally, so the cluster can serve every request, but not all replica shards are in a normal state. In fact, on a single node, no matter how many replica shards you configure (the number_of_replicas setting), they all stay unassigned: none of them can be allocated to any node. Keeping the original data and its replica on the same node would be pointless, because losing that node would also lose every replica stored on it.

So when we only have one machine, a new index should be created like this:

PUT /website
{
  "settings":{
           "number_of_shards":1,     
           "number_of_replicas":0
  }
}

This creates the index website with 1 primary shard and no replica shards, which is all we need. Open http://localhost:9100/ and the cluster is now green.
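As a quick check without the elasticsearch-head UI, the cluster health API can also be queried directly:

GET /_cluster/health

In the response, "status" should now be "green" and "unassigned_shards" should be 0.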

1.5 Summary: yellow

Cause

Yellow means all primary shards are available but not all replica shards are. The most common scenario is a single node: ES defaults to 1 replica, and a primary shard and its replica cannot be placed on the same node, so the replica stays unassigned.

In other words, a single node cannot host any replicas.

Handling

To filter for all unassigned shards, run curl -s "http://10.19.22.142:9200/_cat/shards" | grep UNASSIGNED (see the example below). In the output, the first column is the index name, the second is the shard number, and in the third column p marks a primary shard and r a replica.
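For example, against the single-node setup used above (substitute your own host), the output looks roughly like:

curl -s "http://localhost:9200/_cat/shards" | grep UNASSIGNED
megacorp    0 r UNASSIGNED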

Once you know which shard of which index is affected, repair it manually by allocating it through the reroute API:

curl -XPOST '{ESIP}:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands" : [ {
          "allocate" : {
              "index" : "eslog1",
              "shard" : 4,
              "node" : "es1",
              "allow_primary" : true
          }
        }
    ]
}'

Pitfalls you may run into when allocating; things to note:

  • When allocating a replica you must pass the parameter "allow_primary" : true, otherwise it reports an error
  • When the nodes in the cluster run different ES versions, an unassigned shard generated by a higher version cannot be allocated to a lower-version node; the other way around works (a lower-version shard can go to a higher-version node). If you run into this, just upgrade ES on the lower-version nodes (you can check each node's version as shown below).
  • (See the official documentation for details on upgrading ES. I installed via apt on Ubuntu, so I upgraded directly with apt-get install
    elasticsearch; elasticsearch.yml did not need to be changed, but the memory setting ES_HEAP_SIZE=6G in /usr/share/elasticsearch/bin/elasticsearch had to be re-added manually, followed by an ES restart.)
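A quick way to check which version each node is running before deciding where a shard can go:

curl -s 'http://localhost:9200/_cat/nodes?h=name,version'

Shards from an index created on a newer node cannot be rerouted onto any node listed with an older version.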