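Elasticsearch Basics: Study Notes
Preface
These notes record my process of learning Elasticsearch from scratch, following the official documentation and running some tests of my own.
Installation
As a beginner, it is enough to follow the official documentation for installation. My earlier post on getting started with ELK mainly covered the installation process, so I won't repeat it here.
Tutorial
I searched for a long time, and most of the material is fairly old. Even the official guide is based on 2.x, while the latest release on the official site has already reached 6. It is still fine for learning the basics, so the rest of these notes follow the official guide.
After installing Elasticsearch and Kibana, open localhost:5601 and select Dev Tools.
Indexing (storing) employee documents
The test data is a list of company employees. Each employee's information is called a document, and adding one is called indexing a document.
Enter the following in the console: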
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
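- megacorp is the index name
- employee is the type name
- 1 is the document id, which is also the employee's id
Place the cursor on the first line and click the green button to run it.
This is a shorthand for storing a document; underneath, it still goes through the REST API that ES provides. The same request can be sent with Postman or curl, using the ES address as the host, i.e. localhost:9200. Postman does not allow a body on GET requests; for requests that need one, POST works as well.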
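To make the REST call behind the console shorthand concrete, here is a minimal Python sketch that builds the same PUT request with the standard library. The helper name build_index_request and the constant ES_URL are made up for illustration, and the address assumes the default local node mentioned above. The request is only constructed, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed: default local ES address from the text


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
request = build_index_request("megacorp", "employee", 1, john)
print(request.get_method(), request.full_url)

# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the construction separate from the sending makes it easy to inspect the method, URL and body before touching a real cluster.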
That stores one document in ES. Next, store a few more:
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
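To check the data in Kibana, click Management, Index Patterns, Configure an index pattern, enter megacorp, and confirm.
Click Discover to see the information we stored.
Retrieving a document
With the data stored, we want to query it back. Look up the employee with id 1: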
GET /megacorp/employee/1
Response:
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_version": 5,
"found": true,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
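Compared with storing a record, only the HTTP method differs.
- PUT: add
- GET: retrieve
- DELETE: delete
- HEAD: check whether the document exists
- To update, simply PUT again (the whole document is replaced)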
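The list above can be summarized as a small lookup table. This is only an illustrative sketch: the operation names and the helper document_request are made up for this example, and the output uses the same "METHOD /index/type/id" form as the console.

```python
# Hypothetical operation names mapped to the HTTP methods listed above.
DOCUMENT_METHODS = {
    "index": "PUT",      # add a document, or replace it when the id already exists
    "get": "GET",        # retrieve a document
    "delete": "DELETE",  # delete a document
    "exists": "HEAD",    # check whether a document exists
}


def document_request(operation, index, doc_type, doc_id):
    """Return the console-style request line for one document operation."""
    method = DOCUMENT_METHODS[operation]
    return f"{method} /{index}/{doc_type}/{doc_id}"


for operation in DOCUMENT_METHODS:
    print(document_request(operation, "megacorp", "employee", 1))
```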
Search Lite
Besides find-by-id, the most common operation is a conditional query.
First, list everything:
GET /megacorp/employee/_search
By the way, the number of records can be retrieved with _count:
GET /megacorp/employee/_count
To find the employees whose last_name is Smith:
GET /megacorp/employee/_search?q=last_name:Smith
Add a q parameter and query in the form field:value.
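When the same search is issued from code rather than the console, the q parameter usually needs URL encoding. A minimal sketch with the Python standard library follows; the helper name query_string_search_path is made up for illustration.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, field, value):
    """Build the path and query string for a q=field:value search."""
    params = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{params}"


path = query_string_search_path("megacorp", "employee", "last_name", "Smith")
print(path)  # /megacorp/employee/_search?q=last_name%3ASmith
```

The colon is percent-encoded as %3A, which is the encoded form of the same query.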
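Query DSL
Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see Search Lite). Elasticsearch provides a rich, flexible query language called the query DSL, which supports building more complex and robust queries.
The domain-specific language (DSL) is specified as a JSON request body. We can rewrite the previous search for all Smiths like this: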
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
More complex queries
Continue by modifying the previous query:
GET /megacorp/employee/_search
{
"query" : {
"bool": {
"must": {
"match" : {
"last_name" : "smith"
}
},
"filter": {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
This adds a range filter that requires age to be greater than 30.
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
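To see why only Jane comes back, the logic of the bool query can be imitated in plain Python over the three sample employees. This is a rough stand-in and not how ES evaluates queries: the tokenization is simplified to lowercasing and splitting on whitespace, and the function names are made up for this sketch.

```python
# The three sample employees indexed earlier (only the fields used here).
EMPLOYEES = {
    1: {"first_name": "John", "last_name": "Smith", "age": 25},
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35},
}


def matches_term(text, term):
    """Simplified match: lowercase both sides and compare whole words."""
    return term.lower() in text.lower().split()


def smiths_older_than(employees, min_age):
    """Imitate bool { must: match last_name, filter: range age gt min_age }."""
    return [
        doc_id
        for doc_id, doc in employees.items()
        if matches_term(doc["last_name"], "smith")  # the "must" clause
        and doc["age"] > min_age                    # the "filter" clause (gt)
    ]


print(smiths_older_than(EMPLOYEES, 30))  # [2]
```

John is a Smith but is 25, and Douglas is older than 30 but is not a Smith, so only document 2 satisfies both clauses.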
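Full-text search
The searches so far have been simple: a single name, filtered by age. Now let's try something a bit more advanced, full-text search, a task that traditional databases really struggle with.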
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
Result:
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.53484553,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.53484553,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.26742277,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
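The results are sorted by a relevance score, _score. Note that a document matching only one of the words (rock) is returned as well. If we want to match the whole phrase exactly, we need a different kind of query.
match_phrase matches the exact phrase.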
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
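The difference between match and match_phrase can be sketched in plain Python over the three about fields. This only imitates the matching rule (any word versus consecutive words) with a simplified whitespace tokenizer; it does not reproduce ES analysis or scoring, and the function names are made up for illustration.

```python
# The "about" field of the three sample employees, keyed by document id.
ABOUT = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}


def tokenize(text):
    """Simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(text, query):
    """True when at least one query word occurs in the text."""
    tokens = set(tokenize(text))
    return any(term in tokens for term in tokenize(query))


def match_phrase(text, query):
    """True when all query words occur consecutively, in order."""
    tokens = tokenize(text)
    terms = tokenize(query)
    width = len(terms)
    return any(
        tokens[start:start + width] == terms
        for start in range(len(tokens) - width + 1)
    )


print([i for i, text in ABOUT.items() if match(text, "rock climbing")])         # [1, 2]
print([i for i, text in ABOUT.items() if match_phrase(text, "rock climbing")])  # [1]
```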
When we search on Baidu, the matched keywords are highlighted. ES can also return the highlighted snippets.
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.5753642,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
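The highlight fragment in this response wraps each matched word in em tags. A toy Python version of that idea is shown below; it is an assumption-laden sketch (whitespace tokenization, case-insensitive word comparison, a made-up function name) and not the ES highlighter.

```python
def highlight(text, query):
    """Wrap every word of text that appears in the query in <em> tags."""
    terms = set(query.lower().split())
    return " ".join(
        f"<em>{word}</em>" if word.lower() in terms else word
        for word in text.split()
    )


print(highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```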
Aggregations (group by)
SQL often involves statistical computations such as sum, count, and avg. In ES it looks like this:
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
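aggs declares the aggregations, all_interests is the name under which the result is returned, and terms groups the documents by each distinct value of a field and counts them. This request therefore counts the documents per value of interests. ES may then return: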
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "megacorp",
"node": "iqHCjOUkSsWM2Hv6jT-xUQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}
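This means that aggregating on a text field requires enabling the setting fielddata=true.
That requires changing the index mapping, which is comparable to altering a table schema in a relational database.
Updating the index mapping
First, look at the current mapping: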
GET /megacorp/employee/_mapping
Result:
{
"megacorp": {
"mappings": {
"employee": {
"properties": {
"about": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
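Put simply, the mapping defines the type of each field. To solve the problem above, one setting has to be added to the interests field: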
"fielddata": true
The update looks like this:
PUT /megacorp/employee/_mapping
{
"properties": {
"about": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": true
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Response:
{
"acknowledged": true
}
This means the update succeeded. We can now continue with the aggregation from before.
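The error message also mentions an alternative to fielddata: "use a keyword field instead". The mapping shown earlier already defines a keyword sub-field for interests, so the aggregation could target interests.keyword. The sketch below only builds and prints that request body in Python; the helper name terms_aggregation is made up, and I have not compared its results against the fielddata approach in these notes.

```python
import json


def terms_aggregation(name, field):
    """Build a search body that returns only a terms aggregation."""
    return {
        "size": 0,  # no hits, only the aggregation result
        "aggs": {name: {"terms": {"field": field}}},
    }


# Assumption: aggregate on the keyword sub-field from the mapping above.
body = terms_aggregation("all_interests", "interests.keyword")
print("GET /megacorp/employee/_search")
print(json.dumps(body, indent=2))
```

A keyword field stores each value as one untokenized term, so a multi-word interest would stay a single bucket instead of being split into words.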
Aggregation: group by count
In SQL this would look something like
select interests, count(*) from index_xxx
where last_name = 'smith'
group by interests
In ES the query can be written like this:
GET /megacorp/employee/_search
{
"_source": false,
"query": {
"match": {
"last_name": "smith"
}
},
"size": 0,
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
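"_source": false means the properties of each matched item are not returned; the default is true.
"size": 0 means no hits are returned. By default the first 10 hits are returned, which we do not need here because we only want the aggregation result.
aggs declares an aggregation.
all_interests is a user-defined name; anything will do.
terms groups the documents by each distinct value and counts them, here on the interests field.
Response: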
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"all_interests": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}
}
}
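The buckets in this response can be cross-checked with a few lines of plain Python over the sample data. This is only an imitation of what the terms aggregation computes, with a made-up function name and a simplified case-insensitive comparison for the last_name match.

```python
from collections import Counter

# The three sample employees (only the fields used here).
EMPLOYEES = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def interest_buckets(employees, last_name):
    """Count documents per interest among employees with the given last name."""
    counts = Counter()
    for employee in employees:
        if employee["last_name"].lower() == last_name.lower():
            # set() so that a document counts at most once per value
            counts.update(set(employee["interests"]))
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
    ]


print(interest_buckets(EMPLOYEES, "smith"))
# [{'key': 'music', 'doc_count': 2}, {'key': 'sports', 'doc_count': 1}]
```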
This gives the count for each value of the field. sum and avg can be computed in the same way.
GET /megacorp/employee/_search
{
"_source": false,
"size": 0,
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
},
"sum_age" : {
"sum" : { "field" : "age" }
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"avg_age": {
"value": 30.666666666666668
},
"sum_age": {
"value": 92
}
}
}
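As a sanity check, the same two numbers follow directly from the three sample ages. The function name age_stats is made up for this sketch.

```python
# Ages of the three sample employees: John, Jane, Douglas.
AGES = [25, 32, 35]


def age_stats(ages):
    """Return the sum and the average, like the sum_age and avg_age aggregations."""
    total = sum(ages)
    return {"sum_age": total, "avg_age": total / len(ages)}


stats = age_stats(AGES)
print(stats["sum_age"])  # 92
print(stats["avg_age"])
```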
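Summary
The above covers the first part of the official guide, the basics. I only copied the examples and ran through them once; there is nothing groundbreaking here, but I added my own understanding along the way. It is enough to know how ES is used at a basic level. More detailed syntax will come later.
References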
- https://www.elastic.co/guide/cn/elasticsearch/guide/current/_search_with_query_dsl.html