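18 - Elasticsearch cluster health is yellow
1. Problem 1: after adding one document, the cluster turns from green to yellow
Description:
On Windows, with only a single machine, run the following commands in Kibana.
Create the index:
Set the number of primary shards for this index to 1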
PUT /megacorp
{
"settings":{
"number_of_shards":1,
"number_of_replicas":1
}
}
Add a document:
POST /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25 ,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
The UI turns yellow and reports UNASSIGNED.
Check the state of every shard from the backend:
GET /_cat/shards
Result:
.monitoring-alerts-6 0 p STARTED 1 6.1kb 127.0.0.1 6wPQIP5
megacorp 0 p STARTED 3 16.4kb 127.0.0.1 6wPQIP5
megacorp 0 r UNASSIGNED
.monitoring-kibana-6-2018.02.13 0 p STARTED 1824 1mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED 194 76.2kb 127.0.0.1 6wPQIP5
.watcher-history-7-2018.02.11 0 p STARTED 272 363.4kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.13 0 p STARTED 27920 19.7mb 127.0.0.1 6wPQIP5
.triggered_watches 0 p STARTED 0 5.1kb 127.0.0.1 6wPQIP5
.watches 0 p STARTED 6 25.1kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11 0 p STARTED 4888 2.4mb 127.0.0.1 6wPQIP5
.security-6 0 p STARTED 3 9.8kb 127.0.0.1 6wPQIP5
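With many indices the one problem row is easy to miss in a listing like the one above. A minimal sketch, assuming the default `_cat/shards` column order (index, shard, prirep, state, ...) and a hypothetical helper name `unassigned_shards`, that keeps only the unassigned rows. The sample rows are copied from the result above; the localhost:9200 address in the comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads `_cat/shards` text on stdin and prints
# "index shard prirep" for every row whose state column is UNASSIGNED.
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Sample rows copied from the listing above.
SAMPLE='megacorp 0 p STARTED 3 16.4kb 127.0.0.1 6wPQIP5
megacorp 0 r UNASSIGNED
.security-6 0 p STARTED 3 9.8kb 127.0.0.1 6wPQIP5'

printf '%s\n' "$SAMPLE" | unassigned_shards

# Against a live node (assumed to listen on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/shards' | unassigned_shards
```

Matching on the state column rather than grepping the whole line avoids false hits when an index name happens to contain the word UNASSIGNED.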
1.1 Attempted fix (did not work)
Look up the unique identifier of the master node.
Command:
GET /_nodes/process?pretty
Result:
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "elasticsearch",
"nodes": {
"6wPQIP5sT_qEmEpyJm8i6w": {
"name": "6wPQIP5",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1",
"version": "6.2.1",
"build_hash": "7299dc3",
"roles": [
"master",
"data",
"ingest"
],
"process": {
"refresh_interval_in_millis": 1000,
"id": 10892,
"mlockall": false
}
}
}
}
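Run reroute, once per shard, setting shard to each shard number reported as UNASSIGNED. The 1 and 3 mentioned for the previous query, like the index and node names in the command, come from a different cluster than the listing above, where the only unassigned shard is megacorp shard 0. Note that the allocate command with allow_primary is the syntax from before Elasticsearch 5.0; later versions replaced it with allocate_replica, allocate_stale_primary and allocate_empty_primary, so the 6.2.1 node used here would reject it.
curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [ {
"allocate" : {
"index" : "pv-2015.05.22",
"shard" : 1,
"node" : "AfUyuXmGTESHXpwi4OExxx",
"allow_primary" : true
}
}
]
}'
I could not run this under Windows, so it is untested for now.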
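For reference, Elasticsearch 5.0 and later assign a replica manually with the `allocate_replica` reroute command instead of `allocate`. A minimal sketch, assuming a node at localhost:9200 and a hypothetical helper name `reroute_replica_body`, using the index and node name from the shard listing earlier. On a single node the request is still expected to be refused, because a replica cannot be placed on the node that already holds its primary.

```shell
#!/bin/sh
# Hypothetical helper: prints a reroute body that asks the cluster to
# assign an unassigned replica of <index>/<shard> to <node>.
reroute_replica_body() {
    printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}\n' "$1" "$2" "$3"
}

BODY=$(reroute_replica_body megacorp 0 6wPQIP5)
echo "$BODY"

# To send it (6.x rejects bodies without a Content-Type header):
#   curl -XPOST 'http://localhost:9200/_cluster/reroute' \
#        -H 'Content-Type: application/json' -d "$BODY"
```

Building the body separately makes it easy to inspect the JSON before anything is sent to the cluster.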
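Batch script (for when there are many unassigned shards; remember to replace the node name)
#!/bin/bash
# Loop over every index that has UNASSIGNED shards, then over each such shard.
for index in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk '{print $1}' | sort | uniq); do
  # Match the index name exactly instead of grepping for a substring.
  for shard in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk -v idx="$index" '$1 == idx {print $2}' | sort | uniq); do
    echo "$index" "$shard"
    # JSON needs double quotes, and the index name must be a quoted string.
    curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d "{
      \"commands\" : [ {
        \"allocate\" : {
          \"index\" : \"$index\",
          \"shard\" : $shard,
          \"node\" : \"Master\",
          \"allow_primary\" : true
        }
      } ]
    }"
    sleep 5
  done
done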
1.2 Attempted fix (did not work)
1. Add the following setting to J:\elasticsearch\elasticsearch-6.2.1\config\elasticsearch.yml
cluster.routing.allocation.enable : all
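After restarting ES the cluster is still yellow, even though automatic allocation is now enabled. That is odd: why is the replica still not allocated on this node?
Add another setting: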
index.number_of_replicas: 0
ES prints the following and exits immediately:
J:\elasticsearch\elasticsearch-6.2.1\bin>elasticsearch.bat
*************************************************************************************
Found index level settings on node level configuration.
Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments.In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgradeIndices created in the future should use index templates
to set default values.
Please ensure all required values are updated on all indices by executing:
curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
"index.number_of_replicas" : "0"
}'
*************************************************************************************
[2018-02-13T23:16:03,368][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: node settings must not contain any index level settings
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.2.1.jar:6.2.1]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.2.1.jar:6.2.1]
Caused by: java.lang.IllegalArgumentException: node settings must not contain any index level settings
at org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:128) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.node.Node.<init>(Node.java:331) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.1.jar:6.2.1]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.1.jar:6.2.1]
... 6 more
J:\elasticsearch\elasticsearch-6.2.1\bin>
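The startup message points to two supported routes: the per-index `_settings` API for existing indices and an index template for indices created later. A minimal sketch of the template route, assuming the 6.x `_template` API with its `index_patterns` field, a node at localhost:9200, and a hypothetical template name `single_node_defaults`:

```shell
#!/bin/sh
# Hypothetical helper: prints a template body that gives every newly
# created index matching <pattern> zero replicas by default.
template_body() {
    printf '{"index_patterns":["%s"],"settings":{"number_of_replicas":0}}\n' "$1"
}

TEMPLATE=$(template_body '*')
echo "$TEMPLATE"

# To install it:
#   curl -XPUT 'http://localhost:9200/_template/single_node_defaults' \
#        -H 'Content-Type: application/json' -d "$TEMPLATE"
```

The `*` pattern is only an illustration; a narrower pattern limits which new indices are affected. Settings given explicitly in a create-index request still take precedence over the template, and a template does not change indices that already exist.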
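1.3 Attempted fix (did not work)
All primary shards are fine and only the replica shard has a problem, so why not force-delete the replica and let ES regenerate it? That should fix it.
The plan is to first set the replica count of the affected index to 0.
Check the shards: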
.watcher-history-7-2018.02.11 0 p STARTED 272 363.3kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11 0 p STARTED 4888 2.4mb 127.0.0.1 6wPQIP5
megacorp 0 p STARTED 3 16.4kb 127.0.0.1 6wPQIP5
megacorp 0 r UNASSIGNED
.watches 0 p STARTED 6 25kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.13 0 p STARTED 24874 17.6mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.13 0 p STARTED 1649 1.1mb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED 194 76.2kb 127.0.0.1 6wPQIP5
.triggered_watches 0 p STARTED 0 5.1kb 127.0.0.1 6wPQIP5
.monitoring-alerts-6 0 p STARTED 1 6.1kb 127.0.0.1 6wPQIP5
.security-6 0 p STARTED 3 9.8kb 127.0.0.1 6wPQIP5
Set the replica count of the affected index to 0:
PUT /megacorp/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
Result:
{
"acknowledged": true
}
Check the shards again.
Command:
GET /_cat/shards
Result:
.monitoring-es-6-2018.02.13 0 p STARTED 27097 19.6mb 127.0.0.1 6wPQIP5
.monitoring-alerts-6 0 p STARTED 1 6.1kb 127.0.0.1 6wPQIP5
.watches 0 p STARTED 6 25.1kb 127.0.0.1 6wPQIP5
.security-6 0 p STARTED 3 9.8kb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.11 0 p STARTED 194 76.2kb 127.0.0.1 6wPQIP5
.monitoring-kibana-6-2018.02.13 0 p STARTED 1766 1.1mb 127.0.0.1 6wPQIP5
megacorp 0 p STARTED 3 16.4kb 127.0.0.1 6wPQIP5
.watcher-history-7-2018.02.11 0 p STARTED 272 363.4kb 127.0.0.1 6wPQIP5
.triggered_watches 0 p STARTED 0 5.1kb 127.0.0.1 6wPQIP5
.monitoring-es-6-2018.02.11 0 p STARTED 4888 2.4mb 127.0.0.1 6wPQIP5
There are no replica shards any more.
Next, restore the original setting:
PUT /megacorp/_settings
{
"index" : {
"number_of_replicas" : 1
}
}
Check:
megacorp 0 p STARTED 3 16.4kb 127.0.0.1 6wPQIP5
megacorp 0 r UNASSIGNED
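Still not resolved.
1.4 Fix (resolved)
A cluster health of yellow means that all primary shards (the number_of_shards setting) are running normally, so the cluster can serve every request, but not all replica shards are in a healthy state. In fact, on a single node, however many replica shards are configured (the number_of_replicas setting), all of them are unassigned: none has been allocated to any node. Storing both the original data and its replica on the same node is pointless, because once that node is lost, every replica on it is lost as well.
So when there is only one machine, the index should be created like this: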
PUT /website
{
"settings":{
"number_of_shards":1,
"number_of_replicas":0
}
}
This creates the index website with 1 primary shard and no replica shards, which is all that is needed. Open http://localhost:9100/ and the cluster is now green.
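The colour can also be checked without a browser through the `_cluster/health` API. A minimal sketch, assuming a node at localhost:9200 and a hypothetical helper name `health_status`; the sample response is a trimmed, made-up illustration rather than output captured from this cluster.

```shell
#!/bin/sh
# Hypothetical helper: extracts the "status" value from a
# `_cluster/health` JSON response read on stdin.
health_status() {
    sed -n 's/.*"status"[ ]*:[ ]*"\([a-z]*\)".*/\1/p'
}

# Made-up, trimmed response used only to exercise the helper.
SAMPLE_HEALTH='{"cluster_name":"elasticsearch","status":"green","unassigned_shards":0}'

printf '%s\n' "$SAMPLE_HEALTH" | health_status

# Against a live node:
#   curl -s 'http://localhost:9200/_cluster/health' | health_status
```

The helper works line by line, so it also handles the multi-line response returned when `?pretty` is added to the request.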
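1.5 Summary: yellow
Cause
yellow means that all primary shards are available but not all replica shards are. The most common case is a single-node cluster: ES creates 1 replica by default, and a primary shard and its replica cannot live on the same node, so the replica stays unassigned.
In short, a single node cannot allocate replicas.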
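The rule can be written as simple arithmetic: every copy of a shard (1 primary plus N replicas) needs its own node. A minimal sketch with a hypothetical helper name `expected_health`, under the simplifying assumptions that all primaries are assigned and that no other allocation rule (disk watermarks, allocation filters) blocks a shard:

```shell
#!/bin/sh
# Hypothetical helper: given the number of data nodes and the configured
# number_of_replicas, prints the health to expect for that index.
expected_health() {
    nodes=$1
    replicas=$2
    # Each copy of a shard (1 primary + N replicas) needs its own node.
    if [ "$nodes" -ge $((replicas + 1)) ]; then
        echo green
    else
        echo yellow
    fi
}

expected_health 1 1   # the megacorp case: one node, one replica
expected_health 1 0   # the website case: one node, no replicas
expected_health 2 1   # a second node gives the replica somewhere to go
```

This is why the fix is either to set number_of_replicas to 0 or to add a second node, not to reroute the replica by hand.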
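Handling
To list every unassigned shard, filter the shard listing: curl -s "http://10.19.22.142:9200/_cat/shards" | grep UNASSIGNED. In the output, the first column is the index name, the second is the shard number, and the third is p for a primary shard or r for a replica.
Once you know which shard of which index is affected, repair it manually with the allocate command of reroute: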
curl -XPOST '{ESIP}:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [ {
"allocate" : {
"index" : "eslog1",
"shard" : 4,
"node" : "es1",
"allow_primary" : true
}
}
]
}'
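Pitfalls to watch for when allocating:
- The parameter "allow_primary" : true must be included when the unassigned shard is a primary, otherwise the request fails with an error. Forcing a primary this way can lose data.
- When the cluster mixes ES versions, an unassigned shard created by a newer version cannot be allocated to a node running an older version, although a shard from an older version can be allocated to a newer node. If you hit this, upgrade ES on the older node.
- (For the upgrade itself, see the official documentation. I am on Ubuntu with an apt install, so I upgraded with apt-get install elasticsearch. The elasticsearch.yml configuration file was left unchanged and needed no edits, but the memory setting ES_HEAP_SIZE=6G in /usr/share/elasticsearch/bin/elasticsearch had to be added again by hand, followed by an ES restart.)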