Elasticsearch: A Distributed Search Engine

阿新 • Published: 2020-07-29

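1 Introduction to Elasticsearch

1.1 What Is Elasticsearch

Elasticsearch is a near-real-time distributed search and analytics engine that lets you process large volumes of data quickly. It is a search server built on Lucene, and it provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and released as open source under the terms of the Apache License; it is a popular enterprise search engine. Designed for cloud environments, it offers near-real-time search and is stable, reliable, fast, and easy to install and use.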

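1.2 Features of Elasticsearch

(1) It can run as a large distributed cluster (hundreds of servers) that handles petabytes of data for large companies, and it can also run on a single machine.
(2) It combines full-text search, data analytics, and distributed technology in one product.
(3) It works out of the box and is simple to deploy.
(4) It supports full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data sets.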

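1.3 Elasticsearch Architecture

The table below compares the logical structure concepts of Elasticsearch with those of a MySQL database.

Elasticsearch    MySQL
index            database
type             table
document         row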

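2 Deploying and Starting Elasticsearch

2.1 Installing the JDK

Elasticsearch is built on Lucene, so it needs a Java JDK to run, and a Java environment has to be installed first. Elasticsearch 5.x and later depend on JDK 1.8, so download and install JDK 1.8 or a later version.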

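2.2 Installing Elasticsearch

1. Download Elasticsearch from:

https://www.elastic.co/downloads/elasticsearch

2. Unpack the downloaded archive.

3. Go to the bin directory and double-click elasticsearch.bat to run it.

4. When "started" appears in the console output, the node has started successfully. Open http://localhost:9200 in a browser to test it.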

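2.3 Installing Elasticsearch as a Windows Service

1. The Elasticsearch bin directory contains a file named elasticsearch-service.bat.

2. In cmd, change to the bin directory and run: elasticsearch-service.bat install

3. Check the Windows services list; the Elasticsearch service is now present.

elasticsearch-service.bat also accepts the following commands:
install: install Elasticsearch as a service
remove: remove the installed Elasticsearch service (stopping it first if it is running)
start: start the Elasticsearch service (if it is installed)
stop: stop the service (if it is running)
manager: start a GUI for managing the installed service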

3 Calling the REST API with Postman

3.1 Creating an Index

For example, to create an index named articleindex, send a PUT request to http://127.0.0.1:9200/articleindex/
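
The same PUT request can also be issued from a script instead of Postman. The following is a minimal sketch that uses only Python's standard library. The helper name create_index_request is made up for illustration, and the code only builds the request, so nothing is sent until the request is passed to urllib.request.urlopen against a running node.

```python
import urllib.request

ES_BASE = "http://127.0.0.1:9200"  # node address used in this article


def create_index_request(index_name: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(f"{ES_BASE}/{index_name}/", method="PUT")


request = create_index_request("articleindex")
print(request.get_method(), request.full_url)

# To send it to a running node (assumption: Elasticsearch is listening on ES_BASE):
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```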

3.2 Creating a Document

To create a document, send a POST request to http://127.0.0.1:9200/articleindex/article
with the following body:

{
"title":"SpringBoot2.0", 
"content":"釋出啦" 
}
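
As a rough script equivalent of the Postman call above, the sketch below builds the POST request with the JSON body. The function name index_document_request is hypothetical, and the explicit Content-Type header is an assumption on my part (the article does not mention it, but it is the usual way to mark a JSON body). The request is only constructed, not sent.

```python
import json
import urllib.request


def index_document_request(base_url, index, doc_type, document):
    """Build a POST request that asks Elasticsearch to store `document`
    under index/doc_type and to generate the document ID itself."""
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = index_document_request(
    "http://127.0.0.1:9200",
    "articleindex",
    "article",
    {"title": "SpringBoot2.0", "content": "釋出啦"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running node.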

3.3 Querying All Documents

To retrieve all documents of a given type in an index, send a GET request to
http://127.0.0.1:9200/articleindex/article/_search
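
The matching documents come back nested under hits.hits in the JSON response, each with its _id and _source. The sketch below shows one way to pull them out. The sample response is hand-written to resemble a 5.x _search reply (it is not captured output), and extract_documents is a name invented for this example.

```python
import json

# Hand-written sample shaped like a _search response; not real captured output.
SAMPLE_RESPONSE = """
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "articleindex",
        "_type": "article",
        "_id": "AWPKrI4pFdLZnId5S_F7",
        "_score": 1.0,
        "_source": {"title": "SpringBoot2.0", "content": "釋出啦"}
      }
    ]
  }
}
"""


def extract_documents(response_text):
    """Return a list of (id, source) pairs from the hits.hits array."""
    response = json.loads(response_text)
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


documents = extract_documents(SAMPLE_RESPONSE)
for doc_id, source in documents:
    print(doc_id, source["title"])
```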

3.4 Updating a Document

Send a PUT request to the following address:

http://192.168.184.134:9200/articleindex/article/AWPKrI4pFdLZnId5S_F7
with the following body:

{
"title":"SpringBoot2.0正式版", 
"content":"釋出了嗎" 
}

If the ID given in the address does not exist, a new document is created with that ID.

3.5 Getting a Document by ID

Send a GET request to:

http://192.168.184.134:9200/articleindex/article/1

3.6 Basic Match Query

To query on a single field, send a GET request to the following address:

http://192.168.184.134:9200/articleindex/article/_search?q=title:hello

3.7 Wildcard Query

Use * to stand for any sequence of characters:

http://192.168.184.134:9200/articleindex/article/_search?q=title:*s*
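
Both the exact-match URL and the wildcard URL follow the same q=field:pattern shape, so they can be generated by one helper. The sketch below is illustrative only: search_url is a made-up name, and it percent-encodes the query value, which a browser or Postman would normally do for you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url, index, doc_type, field, pattern):
    """Build a _search URL whose q parameter is `field:pattern`."""
    query = urlencode({"q": f"{field}:{pattern}"})
    return f"{base_url}/{index}/{doc_type}/_search?{query}"


BASE = "http://192.168.184.134:9200"  # address used in this article

exact = search_url(BASE, "articleindex", "article", "title", "hello")
wildcard = search_url(BASE, "articleindex", "article", "title", "*s*")

print(exact)
print(wildcard)

# Decoding the query string again recovers the q value exactly as typed.
print(parse_qs(urlsplit(wildcard).query)["q"][0])
```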

3.8 Deleting a Document

To delete a document by ID, for example the document with ID 1, send a DELETE request to:

http://192.168.184.134:9200/articleindex/article/1
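
Updating, fetching, and deleting a document all target the same index/type/ID address and differ only in the HTTP method and whether a body is sent. The sketch below makes that explicit. The helper document_request is hypothetical, the Content-Type header is my assumption, and the requests are only built, not sent.

```python
import json
import urllib.request


def document_request(method, base_url, index, doc_type, doc_id, document=None):
    """Build a request against one document address.

    PUT carries a JSON body; GET and DELETE are sent without one.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = None
    headers = {}
    if document is not None:
        data = json.dumps(document, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed header
    return urllib.request.Request(url, data=data, headers=headers, method=method)


BASE = "http://192.168.184.134:9200"  # address used in this article

update = document_request(
    "PUT", BASE, "articleindex", "article", "AWPKrI4pFdLZnId5S_F7",
    {"title": "SpringBoot2.0正式版", "content": "釋出了嗎"},
)
fetch = document_request("GET", BASE, "articleindex", "article", "1")
delete = document_request("DELETE", BASE, "articleindex", "article", "1")

for request in (update, fetch, delete):
    print(request.get_method(), request.full_url)
```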

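4 Installing and Using the Head Plugin

4.1 Installing the Head Plugin

Working with Elasticsearch purely through REST requests is tedious and not very user-friendly. A graphical interface is normally used for day-to-day management of Elasticsearch, and the Head plugin is the most common choice.

Step 1:

Download the head plugin: https://github.com/mobz/elasticsearch-head

Step 2:

Unpack it to any directory, as long as it is separate from the Elasticsearch installation directory.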

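Step 3:

Install Node.js, then install cnpm:

npm install -g cnpm --registry=https://registry.npm.taobao.org

Step 4:

Install grunt as a global command. Grunt is a project build tool based on Node.js that automatically runs the tasks you configure.

npm install -g grunt-cli

Step 5: Install the dependencies

cnpm install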

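Step 6:

Change to the head directory and start head by entering the following command at the command prompt:

grunt server

Step 7:

Open a browser and go to http://localhost:9100

Step 8:

Clicking the Connect button gets no response, and pressing F12 reveals the following error:

No 'Access-Control-Allow-Origin' header is present on the requested resource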

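This error occurs because Elasticsearch does not allow cross-origin calls by default, and elasticsearch-head is a front-end project served from a different origin. To fix it, change the Elasticsearch configuration to allow cross-origin access. Edit the Elasticsearch configuration file elasticsearch.yml and add the following two lines:

http.cors.enabled: true
http.cors.allow-origin: "*"

These settings allow cross-origin access to Elasticsearch. Restart Elasticsearch, then click Connect to see the cluster information.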

4.2 Using the Head Plugin

4.2.1 Creating an Index

Select the "Indices" tab and click the "New Index" button.

Enter the index name and click OK.

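4.2.2 Creating or Updating a Document

On the "Any Request" tab, enter the address and the body, and set the method to PUT.

Click "Browser", then click the name of the index you want to query; the right-hand pane shows the documents.

Click a document to view its details:

Then return to the previous screen.

Change the data and submit the request again. Because the ID already exists this time, the request performs an update. Query the record again and its version is now 2; in other words, the version increases by 1 after every modification.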
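
The create-or-update behaviour and the version counter can be pictured with a toy in-memory model. This is not Elasticsearch code and makes no claim about its internals; TinyIndex and its methods are invented purely to illustrate the rule "a new ID creates version 1, every later write to the same ID adds 1".

```python
class TinyIndex:
    """In-memory stand-in for one index; illustrates PUT-by-ID semantics only."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source):
        """Create the document if the ID is new, otherwise replace it.

        Returns (result, version): version starts at 1 and grows by 1
        on every write to the same ID.
        """
        if doc_id in self._docs:
            version = self._docs[doc_id][0] + 1
            result = "updated"
        else:
            version = 1
            result = "created"
        self._docs[doc_id] = (version, dict(source))
        return result, version

    def get(self, doc_id):
        """Return (version, source) for a stored document."""
        return self._docs[doc_id]


index = TinyIndex()
first = index.put("1", {"title": "SpringBoot2.0", "content": "釋出啦"})
second = index.put("1", {"title": "SpringBoot2.0正式版", "content": "釋出了嗎"})
print(first, second)
print(index.get("1"))
```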

4.2.3 Searching Documents

4.2.4 Deleting Documents

5 The IK Analyzer

5.1 What Is the IK Analyzer

Entering http://127.0.0.1:9200/_analyze?analyzer=chinese&pretty=true&text=我是程序員 in the browser address bar produces the following output:

{
"tokens":[
{
"token":"我",
"start_offset":0,
"end_offset":1,
"type":"<IDEOGRAPHIC>",
"position":0
},
{
"token":"是",
"start_offset":1,
"end_offset":2,
"type":"<IDEOGRAPHIC>",
"position":1
},
{
"token":"程",
"start_offset":2,
"end_offset":3,
"type":"<IDEOGRAPHIC>",
"position":2
},
{
"token":"序",
"start_offset":3,
"end_offset":4,
"type":"<IDEOGRAPHIC>",
"position":3
},
{
"token":"員",
"start_offset":4,
"end_offset":5,
"type":"<IDEOGRAPHIC>",
"position":4
}
]
}

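The default Chinese tokenization treats every character as a separate word, which is clearly not what we want, so a Chinese analyzer has to be installed to solve the problem.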
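
To see why this output is of little use for search, note that per-character tokenization needs no knowledge of the language at all. The sketch below imitates it in a few lines; it is a simplification written for this article (the function name is invented, and it labels every character as <IDEOGRAPHIC>, which only makes sense for pure Chinese input), not the analyzer's real implementation.

```python
def tokenize_per_character(text):
    """Emit one token per character with the same fields as the output above."""
    return [
        {
            "token": character,
            "start_offset": position,
            "end_offset": position + 1,
            "type": "<IDEOGRAPHIC>",  # simplification: assumes pure Chinese input
            "position": position,
        }
        for position, character in enumerate(text)
    ]


tokens = tokenize_per_character("我是程序員")
for token in tokens:
    print(token["token"], token["start_offset"], token["end_offset"])
```

A query for the word 程序員 would then have to be matched character by character, which is why a dictionary-based analyzer is preferable.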

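IK is a relatively simple Chinese analyzer written by a Chinese developer. Although its original developer has not maintained it since 2012, IK remains fairly popular in production use.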

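5.2 Installing the IK Analyzer

Download version 5.6.8 from: https://github.com/medcl/elasticsearch-analysis-ik/releases
(1) Unpack the archive and rename the extracted elasticsearch folder to ik.
(2) Copy the ik folder into the elasticsearch/plugins directory.
(3) Restart Elasticsearch to load the IK analyzer.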

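5.3 Testing the IK Analyzer

IK provides two analyzers, ik_smart and ik_max_word.
ik_smart performs the coarsest segmentation (the fewest tokens), while ik_max_word performs the finest-grained segmentation.
(1) Coarsest segmentation: enter the following address in the browser address bar
http://127.0.0.1:9200/_analyze?analyzer=ik_smart&pretty=true&text=我是程序員
The output is: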

{
"tokens":[
{
"token":"我",
"start_offset":0,
"end_offset":1,
"type":"CN_CHAR",
"position":0
},
{
"token":"是",
"start_offset":1,
"end_offset":2,
"type":"CN_CHAR",
"position":1
},
{
"token":"程序員",
"start_offset":2,
"end_offset":5,
"type":"CN_WORD",
"position":2
}
]
}

(2) Finest-grained segmentation: enter the following address in the browser address bar
http://127.0.0.1:9200/_analyze?analyzer=ik_max_word&pretty=true&text=我是程序員
The output is:

{
"tokens":[
{
"token":"我",
"start_offset":0,
"end_offset":1,
"type":"CN_CHAR",
"position":0
},
{
"token":"是",
"start_offset":1,
"end_offset":2,
"type":"CN_CHAR",
"position":1
},
{
"token":"程序員",
"start_offset":2,
"end_offset":5,
"type":"CN_WORD",
"position":2
},
{
"token":"程序",
"start_offset":2,
"end_offset":4,
"type":"CN_WORD",
"position":3
},
{
"token":"員",
"start_offset":4,
"end_offset":5,
"type":"CN_CHAR",
"position":4
}
]
}
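
The two test addresses differ only in the analyzer parameter. When they are generated from a script rather than typed into a browser, the Chinese text has to be percent-encoded. The sketch below shows one way to do that; analyze_url is a hypothetical helper, and the URL form simply mirrors the query-parameter style used in this article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def analyze_url(base_url, analyzer, text):
    """Build an _analyze URL with the text percent-encoded as UTF-8."""
    query = urlencode({"analyzer": analyzer, "pretty": "true", "text": text})
    return f"{base_url}/_analyze?{query}"


urls = {
    name: analyze_url("http://127.0.0.1:9200", name, "我是程序員")
    for name in ("ik_smart", "ik_max_word")
}

for name, url in urls.items():
    print(name, url)

# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(urls["ik_smart"]).query))
```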

5.4 Custom Dictionary
Now test the phrase "傳智播客" by opening the following address in the browser; the result is shown below:
http://127.0.0.1:9200/_analyze?analyzer=ik_smart&pretty=true&text=傳智播客

{
"tokens":[
{
"token":"傳",
"start_offset":0,
"end_offset":1,
"type":"CN_CHAR",
"position":0
},
{
"token":"智",
"start_offset":1,
"end_offset":2,
"type":"CN_CHAR",
"position":1
},
{
"token":"播",
"start_offset":2,
"end_offset":3,
"type":"CN_CHAR",
"position":2
},
{
"token":"客",
"start_offset":3,
"end_offset":4,
"type":"CN_CHAR",
"position":3
}
]
}

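The default segmentation does not recognize "傳智播客" as a single word. To make the system treat it as one word, edit a custom dictionary.
Steps:
(1) Go to the elasticsearch/plugins/ik/config directory.
(2) Create a file named my.dic with the following content: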

傳智播客

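(3) Edit IKAnalyzer.cfg.xml (in the ik/config directory):

<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict">my.dic</entry>
<!-- Users can configure their own extension stop-word dictionary here -->
<entry key="ext_stopwords"></entry>
</properties>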
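
Because a malformed tag in this file is an easy mistake to make, it can help to confirm that the XML parses and that ext_dict points at the new dictionary before restarting. The following is a small sketch using Python's standard XML parser. The configuration is embedded as a string for illustration, and read_entries is an invented helper name; in practice you would read the real file from the ik/config directory.

```python
import xml.etree.ElementTree as ET

# Copy of the configuration shown above, embedded here for illustration.
CONFIG = """<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict">my.dic</entry>
<!-- Users can configure their own extension stop-word dictionary here -->
<entry key="ext_stopwords"></entry>
</properties>"""


def read_entries(xml_text):
    """Parse the config and return {key: value} for every <entry> element.

    ET.fromstring raises ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    return {entry.get("key"): (entry.text or "").strip() for entry in root.findall("entry")}


entries = read_entries(CONFIG)
print(entries)
```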

Restart Elasticsearch and test the segmentation in the browser again:

{
"tokens":[
{
"token":"傳智播客",
"start_offset":0,
"end_offset":4,
"type":"CN_WORD",
"position":0
}
]
}
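
The effect of adding a word to the dictionary can be illustrated with a toy segmenter. The sketch below uses greedy forward maximum matching, a textbook dictionary-based technique; it is explicitly not IK's actual algorithm, and the function name and the one-word base dictionary are assumptions made for the example.

```python
def segment(text, dictionary):
    """Greedy forward maximum matching.

    At each position take the longest dictionary word starting there;
    if none matches, fall back to a single character.
    """
    max_length = max((len(word) for word in dictionary), default=1)
    tokens = []
    index = 0
    while index < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_length, len(text) - index), 0, -1):
            candidate = text[index:index + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                index += size
                break
    return tokens


base_dictionary = {"程序員"}  # toy stand-in for a built-in dictionary
custom_dictionary = base_dictionary | {"傳智播客"}  # same, plus the my.dic entry

before = segment("傳智播客", base_dictionary)
after = segment("傳智播客", custom_dictionary)
print(before)
print(after)
```

Without the custom entry every character falls back to its own token; with it, the whole phrase is matched as one word, which mirrors the before and after results shown in this section.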