Shards, Replicas, Monitoring, Tuning, Backup, Analyzers 03

阿新 • Published: 2020-08-16

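Shards and replicas

In Elasticsearch, a shard is not tied to any particular node; the cluster decides where each shard lives and can relocate it.
The master node coordinates cluster-level work such as shard allocation; the data nodes store and serve the data.
Primary shards handle both reads and writes; replica shards are copies that can also serve reads and take over when a primary is lost.
The shard that stores a document is chosen by a routing algorithm that depends on the number of primary shards, so that number cannot be changed once the index exists. To increase the number of primary shards later, the data has to be stored again in a new index (moving and copying shards between nodes, by contrast, happens automatically).
The number of shards and replicas is configurable per index.
Every node can handle any request, and every node knows which node holds any given document, so it can forward a request to the node that has to serve it.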
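
The routing rule described above can be sketched roughly as follows. Elasticsearch picks the shard as a hash of the routing value (by default the document ID) modulo the number of primary shards. The real implementation uses a Murmur3 hash, so `zlib.crc32` below is only a stand-in, and `pick_shard` is a hypothetical helper name used for illustration.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard number for a document ID.

    Assumption: crc32 stands in for the Murmur3 hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Route the same documents with 5 and then 6 primary shards.
doc_ids = [f"doc-{i}" for i in range(1000)]
before = {d: pick_shard(d, 5) for d in doc_ids}
after = {d: pick_shard(d, 6) for d in doc_ids}

# Count documents whose target shard changes when the shard count changes.
moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents would land on a different shard, changing the primary shard count in place would make existing documents unfindable, which is why a reindex is required.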

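Logstash architecture:
	collect data --> filter --> process
Logstash is written in Ruby (it runs on JRuby), so it is slow to start, and it needs a script (its pipeline configuration) specified in order to run.
Logstash has many plugins:
	input
	filter: manipulate data, convert data types, parse data
	output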
	
	
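A shard is the 'unit' in which an Elasticsearch cluster distributes data. How fast Elasticsearch can move shards when it rebalances data, for example after a failure, depends on the size and number of the shards as well as on network and disk performance.
Cluster: consists of one or more nodes and is distinguished from other clusters by its cluster name.
Node: a single Elasticsearch instance. A node usually runs in an isolated container or virtual machine.
Index: in ES, an index is a 'collection of documents'.

Shard: because ES is a distributed search engine, 'an index is usually split into several parts', and 'those parts, spread across different nodes, are the shards'. ES manages and organizes shards automatically and rebalances shard data when necessary, so users rarely have to worry about the details of shard handling.
All 'data' in ES is stored, evenly balanced, in the shards on the nodes of the cluster.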

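Replica: 'by default' ES creates 5 primary shards for an index and one replica shard for each of them (this is the default before ES 7.0; from 7.0 on the default is 1 primary shard). In other words, every index consists of 5 primary shards, and each primary shard has one copy.
For a distributed search engine, the allocation of shards and replicas is at the core of high availability and fast search responses. 'Both primary and replica shards can serve query requests'; the only difference is that only primary shards can accept indexing requests. Replicas matter a great deal for search performance, and users can add or remove replicas at any time. Extra replicas provide more capacity, higher throughput, and better failure recovery.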

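    Note 1: Avoid very large shards, because they hurt the cluster's ability to recover from failures. There is no hard limit on shard size, but in practice many deployments keep shards under 50 GB.
    Small shards produce small segments, which increases overhead. The goal is to keep the average shard size between a few GB and a few tens of GB. For time-based data, shards are typically kept between 20 GB and 40 GB.

    Note 2: Once an index has been set up in an Elasticsearch cluster, its number of primary shards cannot be adjusted while the cluster is running. If you later find that you need a different shard count, you have to create a new index and reindex the data (reindexing takes time, but it at least avoids downtime).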
    
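SN (number of shards) = IS (index size, in GB) / 30
NN (number of nodes) = SN (number of shards) + MNN (number of master nodes [no data]) + NNN (number of load-balancing nodes)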
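
As a worked example of the two formulas, here is a minimal sketch. It assumes the index size is given in GB, rounds the shard count up, and never returns fewer than one shard; those choices and the function names are illustrative assumptions, not part of the original formulas.

```python
import math


def shard_count(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """SN = IS / 30, rounded up (assumption) with a floor of one shard."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def node_count(shards: int, master_nodes: int, load_balancing_nodes: int) -> int:
    """NN = SN + MNN + NNN."""
    return shards + master_nodes + load_balancing_nodes


sn = shard_count(300)      # a 300 GB index
nn = node_count(sn, 3, 2)  # 3 dedicated masters, 2 load-balancing nodes
print(f"shards={sn} nodes={nn}")
```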

Analyzers
standard analyzer
simple analyzer
whitespace analyzer
language analyzers

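I. Cluster modifications

1. Configure the default number of shards and replicas for ES

Number of shards per index, default 5:
#index.number_of_shards: 5

Number of replicas per index, default 1:
#index.number_of_replicas: 1

#Note: since ES 5.0, index-level settings like these are no longer accepted in elasticsearch.yml; set them per index or through an index template instead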

2. Change the number of replicas for a specific index

PUT /index/_settings
{
  "number_of_replicas": 2
}

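#The change in the replica count is visible in the head plugin
#The number of shards of an existing index cannot be changed

3. Change the number of replicas for all indices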

PUT _all/_settings
{
  "number_of_replicas": 2
}

4. Specify the number of shards and replicas when creating an index

PUT /qiudao
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
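
Outside the Kibana console, the same request can be issued from a script. The sketch below only builds the HTTP request with the standard library and does not send it; the host `10.0.0.51:9200` is taken from elsewhere in these notes, and `build_create_index_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index: str,
                               shards: int, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("http://10.0.0.51:9200", "qiudao", 3, 2)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```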

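#Notes:
1. More shards is not always better; every shard consumes resources
2. Every shard uses file handles (the limit here is 65535)
3. A query has to fetch data from the nodes selected by the routing algorithm; the fewer shards there are, the cheaper the query

5. Typical settings in production

1. Talk to the developers
2. Look at how many nodes there will be
    2 nodes: the defaults are fine
    3 nodes: 2 replicas and 5 shards for important data, 1 replica and 5 shards for less important data
3. At the start, a good rule of thumb is to create 1.5 to 3 times as many shards as you have nodes.
    For example, with 3 nodes, create at most 9 (3x3) shards.
4. Use more shards for indices that hold a lot of data and fewer shards for indices that hold little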
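
The 1.5x to 3x rule of thumb from point 3 can be written down as a small helper. This is only a sketch: rounding the lower bound up is an assumption, and `suggested_shard_range` is a hypothetical name.

```python
import math


def suggested_shard_range(node_count: int) -> tuple:
    """Return (low, high) shard counts: 1.5x to 3x the number of nodes."""
    low = math.ceil(node_count * 1.5)  # assumption: round the lower bound up
    high = node_count * 3
    return low, high


for nodes in (2, 3, 5):
    low, high = suggested_shard_range(nodes)
    print(f"{nodes} nodes: between {low} and {high} shards")
```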

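SN (number of shards) = IS (index size, in GB) / 30
NN (number of nodes) = SN (number of shards) + MNN (number of master nodes [no data]) + NNN (number of load-balancing nodes)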

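III. Cluster monitoring

1. What to monitor

1. Check the cluster health status
	GET _cat/health

2. List all nodes; piping the output through wc gives the number of running nodes
	GET _cat/nodes
	
#If either of the two changes, the cluster has a problem

#Show the master node
GET _cat/master
e7EDJ7X8TMq-zPrbkLF5ew 10.0.0.52 10.0.0.52 node-52

#List all indices
GET _cat/indices
green open  e SvafbN49QdGfTcBN89Zkqg 5 2     1   0  15.1kb     5kb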
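
For monitoring, the plain-text `_cat/indices` output is easier to work with once it is split into named fields. The sketch below assumes the default column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) and uses the sample line shown above; `parse_cat_indices` is a hypothetical helper.

```python
# Assumption: default _cat/indices columns, in this order.
CAT_INDICES_COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices(text: str) -> list:
    """Turn _cat/indices output into a list of dicts, one per index."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(CAT_INDICES_COLUMNS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(CAT_INDICES_COLUMNS, fields)))
    return rows


sample = "green open  e SvafbN49QdGfTcBN89Zkqg 5 2     1   0  15.1kb     5kb"
rows = parse_cat_indices(sample)
for row in rows:
    state = "ok" if row["health"] == "green" else "needs attention"
    print(f'{row["index"]}: {row["health"]} ({state})')
```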

2. Monitoring with a script

[root@db01 ~]# vim es_cluster_status.py
#!/usr/bin/env python3
# coding: utf-8
# Author: _syy_
# Date: 2017.02.12

import json
import subprocess

clusterip = "10.0.0.51"
# Ask the cluster health API; -s keeps curl quiet so stdout holds only the JSON body
obj = subprocess.Popen(
    ["curl", "-s", "-XGET", "http://" + clusterip + ":9200/_cluster/health"],
    stdout=subprocess.PIPE)
data = obj.stdout.read()
# Parse the response as JSON rather than eval(), which would execute whatever the server returned
data1 = json.loads(data)
status = data1.get("status")
if status == "green":
    print("\033[1;32m Cluster is healthy \033[0m")
elif status == "yellow":
    print("\033[1;33m Replica shards are unassigned \033[0m")
else:
    print("\033[1;31m Primary shards are unassigned \033[0m")
    
[root@db01 ~]# python3 es_cluster_status.py
 Cluster is healthy
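
A status check like this is usually paired with a notification. The following is a minimal sketch of building an alert email with the standard library; the addresses, the subject format, and the `build_alert` helper are all assumptions for illustration, and the message is only constructed, not sent.

```python
from email.mime.text import MIMEText
from email.utils import formataddr


def build_alert(status: str, sender: str, recipient: str) -> MIMEText:
    """Build an alert email for a non-green cluster status."""
    msg = MIMEText(f"Elasticsearch cluster status is {status}", "plain", "utf-8")
    msg["Subject"] = f"[ES alert] cluster status: {status}"
    msg["From"] = formataddr(("ES monitor", sender))
    msg["To"] = recipient
    return msg


# Placeholder addresses; replace them with real ones.
msg = build_alert("yellow", "monitor@example.com", "ops@example.com")
print(msg["Subject"])
# Sending is left out here; smtplib.SMTP(...).sendmail(...) would need a reachable mail server.
```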

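3. Monitoring plugin: x-pack

IV. ES tuning

1. Limit memory

1. Keep the ES heap below about 32 GB; above that the JVM can no longer use compressed object pointers and memory is used less efficiently
2. Give ES at most half of the server's memory as heap and leave the other half to Lucene (through the operating system's file cache)
3. Start with a smaller heap and raise it gradually
4. 'When memory runs short'
	1) Ask the developers to delete data
	2) Add ES nodes
	3) Upgrade the hardware of the ES servers
5. Disable swap (swapoff -a); ordinary application servers usually leave swap enabled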
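
The first two rules can be combined into a quick heap-size estimate. This is a sketch under stated assumptions: the 31 GB ceiling is a conservative value chosen to stay under the 32 GB boundary, and `suggested_heap_gb` is a hypothetical helper.

```python
def suggested_heap_gb(total_ram_gb: int, ceiling_gb: int = 31) -> int:
    """Half of the RAM, capped below the ~32 GB compressed-pointer boundary."""
    return max(1, min(total_ram_gb // 2, ceiling_gb))


for ram in (8, 64, 128):
    heap = suggested_heap_gb(ram)
    # -Xms and -Xmx are set to the same value so the heap does not resize at runtime
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```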

2. File descriptors

1. Configure the file descriptor limits
[root@db02 ~]# vim /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 131072
* hard nofile 131072
* - nofile 65535

2. Limits for regular users
[root@db02 ~]# vim /etc/security/limits.d/20-nproc.conf 
*          soft    nproc     65535
root       soft    nproc     unlimited

[root@db02 ~]# vim /etc/security/limits.d/90-nproc.conf 
*          soft    nproc     65535
root       soft    nproc     unlimited
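
After editing the limits files and logging in again, the effective limit can be verified from the account that runs ES. The sketch below reads the open-file limit of the current process with Python's `resource` module (Unix only); the 65535 threshold mirrors the value used above, and `nofile_limit_ok` is a hypothetical helper.

```python
import resource


def nofile_limit_ok(minimum: int = 65535):
    """Return (soft, hard, ok) for the open-file limit of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "unlimited", which always satisfies the minimum
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return soft, hard, ok


soft, hard, ok = nofile_limit_ok()
print(f"nofile soft={soft} hard={hard} ok={ok}")
```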

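3. Query optimization

1. For conditional queries, prefer term queries and cut down on range queries
2. When building an index, favour terms with a high hit rate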

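V. Data backup and restore

0. Install the npm environment

#Install npm (one node is enough; if nginx sits in front as a reverse proxy, it can be installed on every node)
[root@elkstack01 ~]# yum install -y npm
#Go to the directory where the head plugin code will be downloaded
[root@elkstack01 src]# cd /usr/local/
#Clone the code from GitHub (https is used because GitHub no longer serves the unauthenticated git:// protocol)
[root@elkstack01 local]# git clone https://github.com/mobz/elasticsearch-head.git
#After cloning, enter the elasticsearch-head directory
[root@elkstack01 local]# cd elasticsearch-head/
#Clear the cache
[root@elkstack01 elasticsearch-head]# npm cache clean -f
#Use npm to install the n module (different projects' JS scripts may need different node versions, hence a node version manager)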

1. Install the backup tool

[root@db01 ~]# npm install elasticdump -g

2. Backup commands

Documentation: https://github.com/elasticsearch-dump/elasticsearch-dump
[elasticsearch](https://www.cnblogs.com/JimShi/p/11244126.html)

1) Backup parameters

--input: the data source
--output: the destination that receives the data
--type: the kind of data to export (settings, analyzer, data, mapping, alias, template)
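
When elasticdump is driven from a script, it helps to assemble the command from these three parameters and reject an unknown `--type` early. A minimal sketch follows; `elasticdump_argv` is a hypothetical helper, and it uses only the flags and type values listed above.

```python
import shlex

# The types listed in the parameter description above
VALID_TYPES = {"settings", "analyzer", "data", "mapping", "alias", "template"}


def elasticdump_argv(source: str, target: str, dump_type: str) -> list:
    """Build the argument list for one elasticdump run."""
    if dump_type not in VALID_TYPES:
        raise ValueError(f"unknown elasticdump type: {dump_type}")
    return [
        "elasticdump",
        f"--input={source}",
        f"--output={target}",
        f"--type={dump_type}",
    ]


argv = elasticdump_argv(
    "http://10.0.0.51:9200/student", "/tmp/student_mapping.json", "mapping"
)
print(shlex.join(argv))
# To run it: subprocess.run(argv, check=True)
```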

2) Back up data to another ES cluster

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=http://100.10.0.51:9200/student \
  --type=analyzer
  
elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=http://100.10.0.51:9200/student \
  --type=mapping
  
elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=http://100.10.0.51:9200/student \
  --type=data

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=http://100.10.0.51:9200/student \
  --type=template

3) Back up data to local JSON files

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=/tmp/student_mapping.json \
  --type=mapping
  
elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=/tmp/student_data.json \
  --type=data
......

4) Compress the exported file

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=$ \
  | gzip > /data/student.json.gz

5) Back up only the data that matches a query

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=query.json \
  --searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}"
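
Escaping the JSON by hand inside a shell string is error-prone. One way around it, sketched below, is to build the query as a Python dict, serialize it with `json.dumps`, and pass the arguments as a list so that no shell quoting is involved. The query is the same term query as above.

```python
import json
import shlex

# The same term query as in the shell example above
query = {"query": {"term": {"username": "admin"}}}
search_body = json.dumps(query)

argv = [
    "elasticdump",
    "--input=http://10.0.0.51:9200/student",
    "--output=query.json",
    f"--searchBody={search_body}",
]

# shlex.join shows the equivalent, correctly quoted shell command
print(shlex.join(argv))
# To run it: subprocess.run(argv, check=True)
```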

3. Import commands

elasticdump \
  --input=./student_template.json \
  --output=http://10.0.0.51:9200 \
  --type=template
  
elasticdump \
  --input=./student_mapping.json \
  --output=http://10.0.0.51:9200 \
  --type=mapping
  
elasticdump \
  --input=./student_data.json \
  --output=http://10.0.0.51:9200 \
  --type=data
  
elasticdump \
  --input=./student_analyzer.json \
  --output=http://10.0.0.51:9200 \
  --type=analyzer

#When restoring, documents that already exist are overwritten

4. Backup script

#!/bin/bash
read -p 'Host to back up: ' host
read -p 'Indices to back up: ' index_name

# Create the output directory before anything is written to it
mkdir -p /data

for index in $index_name
do
    echo "start output index ${index}"
    elasticdump --input=http://${host}:9200/${index} --output=/data/${index}_alias.json --type=alias &> /dev/null
    elasticdump --input=http://${host}:9200/${index} --output=/data/${index}_analyzer.json --type=analyzer &> /dev/null
    elasticdump --input=http://${host}:9200/${index} --output=/data/${index}_data.json --type=data &> /dev/null
    # The mapping gets its own file so it no longer overwrites the alias dump
    elasticdump --input=http://${host}:9200/${index} --output=/data/${index}_mapping.json --type=mapping &> /dev/null
    elasticdump --input=http://${host}:9200/${index} --output=/data/${index}_template.json --type=template &> /dev/null
done

5. Import script

#!/bin/bash
read -p 'Host to restore to: ' host
read -p 'Indices to restore: ' index_name

for index in $index_name
do
    echo "start input index ${index}"
    # Restore analyzer, template and mapping before the data so documents are indexed with the intended mapping
    elasticdump --input=/data/${index}_analyzer.json --output=http://${host}:9200/${index} --type=analyzer &> /dev/null
    elasticdump --input=/data/${index}_template.json --output=http://${host}:9200/${index} --type=template &> /dev/null
    elasticdump --input=/data/${index}_mapping.json --output=http://${host}:9200/${index} --type=mapping &> /dev/null
    elasticdump --input=/data/${index}_data.json --output=http://${host}:9200/${index} --type=data &> /dev/null
    elasticdump --input=/data/${index}_alias.json --output=http://${host}:9200/${index} --type=alias &> /dev/null
done

VI. Chinese analyzer: ik

1. Insert data

POST /index/_doc/1
{"content":"美國留給伊拉克的是個爛攤子嗎"}

POST /index/_doc/2
{"content":"公安部：各地校車將享最高路權"}

POST /index/_doc/3
{"content":"中韓漁警衝突調查：韓警平均每天扣1艘中國漁船"}

POST /index/_doc/4
{"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}

2. Query data

POST /index/_search
{
  "query" : { "match" : { "content" : "中國" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}

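#The results contain every document that has either the character 中 or the character 國: the query term was split into single characters, which is why the ik Chinese analyzer is needed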
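
The effect can be reproduced offline with a rough simulation. The sketch below is not the real analyzer: `char_tokens` is a crude stand-in that emits one token per CJK character (roughly what the standard analyzer does with Chinese text), and `match_whole_word` uses a plain substring test as a stand-in for an analyzer that keeps 中國 as one token. The four documents are the ones inserted above.

```python
docs = {
    1: "美國留給伊拉克的是個爛攤子嗎",
    2: "公安部：各地校車將享最高路權",
    3: "中韓漁警衝突調查：韓警平均每天扣1艘中國漁船",
    4: "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首",
}


def char_tokens(text: str) -> set:
    """One token per CJK character (stand-in for the standard analyzer)."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def match_any(query: str, documents: dict) -> list:
    """A document matches if it shares at least one character token with the query."""
    wanted = char_tokens(query)
    return sorted(i for i, text in documents.items() if wanted & char_tokens(text))


def match_whole_word(query: str, documents: dict) -> list:
    """A document matches only if the query appears as one unit (substring stand-in)."""
    return sorted(i for i, text in documents.items() if query in text)


print(match_any("中國", docs))         # per-character matching
print(match_whole_word("中國", docs))  # whole-word matching
```

Per-character matching also returns document 1, which contains 國 only as part of 美國; whole-word matching returns documents 3 and 4 only.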

3. Configure the Chinese analyzer

1) Install the plugin

[root@db01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
[root@db02 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
[root@db03 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

#Unzip into the ES directory
[root@db01 ~]# unzip elasticsearch-analysis-ik-6.6.0.zip -d /etc/elasticsearch/

2) Create the index and mapping

curl -XPOST http://localhost:9200/news/text/_mapping -H 'Content-Type:application/json' -d'
{
    "properties": {
         "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        }
    }
}'

3) Edit the words we want to define

[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">/etc/elasticsearch/analysis-ik/my.dic</entry>
</properties>
    
[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/my.dic 
中國

[root@redis01 ~]# chown -R elasticsearch:elasticsearch /etc/elasticsearch/analysis-ik/my.dic

4) Re-insert the data

POST /news/text/1
{"content":"美國留給伊拉克的是個爛攤子嗎"}

POST /news/text/2
{"content":"公安部：各地校車將享最高路權"}

POST /news/text/3
{"content":"中韓漁警衝突調查：韓警平均每天扣1艘中國漁船"}

POST /news/text/4
{"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}

5) Query the data again

POST /news/_search
{
  "query" : { "match" : { "content" : "中國" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
   }
}
