Elasticsearch Basics: Getting-Started Study Notes

阿新 • Published: 2020-02-20

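Preface

These notes record how I learned Elasticsearch from scratch: following the official documentation and running some tests of my own along the way.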

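Installation

As a beginner, simply follow the official documentation to install. My earlier post on getting started with ELK mainly covered the installation process, so I won't repeat it here.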

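Tutorial

I searched for a long time, and most of the documentation out there is fairly old. Even the official guide is written against 2.x, while the latest release on the official site has already reached 6. It is still fine for learning the basics, so the rest of these notes follow the official guide.

After installing Elasticsearch and Kibana, open localhost:5601 and select Dev Tools.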

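Indexing (Storing) Employee Documents

The test data is a list of a company's employee records. Each employee's record is called a document, and adding a record is called indexing a document.

Enter the following in the console: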

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

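megacorp is the index name
employee is the type name
1 is the document id, which is also the employee's id

Place the cursor on the first line and click the green button to run the request.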

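This is a shorthand form; underneath, the request still goes through the REST API that ES provides. The same request can be sent with postman or curl, using the ES address, localhost:9200, as the host. In postman the GET method does not allow a body, so for requests that need one, POST works as well.

This stores one document in ES. Next, store a few more: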
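
To illustrate that the console shorthand is just an HTTP call, the sketch below builds the same PUT request with Python's standard library. It assumes ES is listening on localhost:9200 as stated above. The helper name build_index_request is made up for this example, and the request is only constructed, not sent (passing it to urllib.request.urlopen would send it to a running node).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # ES address used in these notes


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # ES 6.x rejects request bodies without a JSON content type
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
request = build_index_request("megacorp", "employee", 1, employee)
print(request.get_method(), request.full_url)
```

The URL path carries the index, type, and id, exactly as in the console line PUT /megacorp/employee/1.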

PUT /megacorp/employee/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

Then we can look at the data: click Management, Index Patterns, Configure an index pattern, enter megacorp, and confirm.

Click Discover to see the records we stored.

Retrieving a Document

Once the data is stored, we want to read it back. Query the employee whose id is 1:

GET /megacorp/employee/1

Response:
{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "1",
  "_version": 5,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [
      "sports",
      "music"
    ]
  }
}

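Compared with storing a record, only the HTTP method differs.

PUT: add a document
GET: retrieve a document
DELETE: delete a document
HEAD: check whether a document exists
To update a document, PUT it again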
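
To show how a client might consume the GET response above, here is a small sketch that parses an abbreviated copy of that JSON and pulls out the stored document. The function name extract_source is an illustration only, and the response text is pasted in rather than fetched from a live node.

```python
import json

# Abbreviated copy of the GET /megacorp/employee/1 response shown above
response_text = """
{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "1",
  "_version": 5,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25
  }
}
"""


def extract_source(text):
    """Return the stored document, or None when the response says found=false."""
    response = json.loads(text)
    if not response.get("found"):
        return None
    return response["_source"]


doc = extract_source(response_text)
print(doc["first_name"], doc["last_name"], doc["age"])
```

The original document lives under _source; the other top-level keys are metadata about where and in which version it is stored.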

Search Lite

Besides findById, the most common operation is a conditional query.

First, list everything:

GET /megacorp/employee/_search

By the way, the number of records can be checked with count:

GET /megacorp/employee/_count

To find the employees whose last_name is Smith:

GET /megacorp/employee/_search?q=last_name:Smith

Add a q parameter and write the condition in the form field:value.
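
When the same search is issued from code rather than the console, the q parameter has to be URL-encoded. A minimal sketch, assuming the localhost:9200 address from earlier; the function name build_query_string_url is made up for this example, and nothing is sent.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # ES address used in these notes


def build_query_string_url(index, doc_type, field, value):
    """Build the URL for a q=field:value search; urlencode escapes the colon."""
    params = urlencode({"q": f"{field}:{value}"})
    return f"{ES_URL}/{index}/{doc_type}/_search?{params}"


url = build_query_string_url("megacorp", "employee", "last_name", "Smith")
print(url)
```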

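Query DSL

Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see Search Lite). Elasticsearch provides a rich, flexible query language called the query DSL, which supports building more complicated and robust queries.

The domain-specific language (DSL) is specified as a JSON request body. We can rewrite the previous search for all Smiths like this: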

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

More Complicated Searches

Continue modifying the query from the previous step:

GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

This adds a range filter that requires age to be greater than 30.

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}
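
The following is a plain-Python imitation of what the bool query above asks for, not how ES executes it. The must clause keeps documents whose last_name matches (compared case-insensitively here, since the default analyzer lowercases text at index time), and the filter clause keeps those with age greater than 30. The EMPLOYEES list repeats the three documents indexed earlier; the function name bool_search is made up.

```python
EMPLOYEES = [
    {"id": 1, "first_name": "John", "last_name": "Smith", "age": 25},
    {"id": 2, "first_name": "Jane", "last_name": "Smith", "age": 32},
    {"id": 3, "first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def bool_search(docs, last_name, min_age_exclusive):
    """Imitate must(match last_name) + filter(range age gt N) on a list."""
    return [
        doc
        for doc in docs
        if doc["last_name"].lower() == last_name.lower()  # "must" clause
        and doc["age"] > min_age_exclusive                # "filter" clause
    ]


hits = bool_search(EMPLOYEES, "smith", 30)
print([doc["first_name"] for doc in hits])  # ['Jane']
```

John is a Smith but is 25, and Douglas is over 30 but is not a Smith, so only Jane remains, matching the single hit in the result above.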

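Full-Text Search

The searches so far have been fairly simple: a single name, filtered by age. Now let's try something a little more advanced, full-text search, a task that traditional databases really struggle with.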

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

Result:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.53484553,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.53484553,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.26742277,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}

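The hits are sorted by relevance score, _score. Note that a document matching only one of the two words (rock) is returned as well. To match the exact phrase, use a different kind of query.

match_phrase matches only documents that contain the words as a phrase, adjacent and in the same order.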

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
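
To make the difference between match and match_phrase concrete, here is a rough local imitation in Python. It is a sketch under simplifying assumptions: tokenization is just lowercasing and splitting on whitespace, and there is no relevance scoring at all. The ABOUT dictionary repeats the about fields of the three employees; the function names are made up.

```python
ABOUT = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def match(docs, query):
    """Ids of documents containing at least one query word."""
    words = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if words & set(tokenize(text))]


def match_phrase(docs, query):
    """Ids of documents containing the query words adjacent and in order."""
    phrase = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        windows = (tokens[i:i + len(phrase)] for i in range(len(tokens)))
        if any(window == phrase for window in windows):
            hits.append(doc_id)
    return hits


print(match(ABOUT, "rock climbing"))         # [1, 2]
print(match_phrase(ABOUT, "rock climbing"))  # [1]
```

Document 2 contains rock but not climbing, so it survives match and is dropped by match_phrase.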

When we search on Baidu, the matched keywords are highlighted. ES can also return the matched fragments with highlighting:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}
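
The highlight section of the response wraps each matched word in <em> tags. The snippet below reproduces that output shape locally with a regular expression; it is only an illustration of the result format, not of how ES computes highlights, and the function name highlight is made up.

```python
import re


def highlight(text, words, tag="em"):
    """Wrap each whole-word occurrence of the given words in <tag>...</tag>."""
    pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, rf"<{tag}>\1</{tag}>", text, flags=re.IGNORECASE)


print(highlight("I love to go rock climbing", ["rock", "climbing"]))
```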

Aggregations: Group by

SQL often involves statistical calculations such as sum, count, and avg. In ES it can be done like this:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

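aggs declares an aggregation, all_interests is the name under which the result is returned, and terms groups documents by the distinct values of a field and counts each group. So this request counts documents per value of interests. ES may then return: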

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "megacorp",
        "node": "iqHCjOUkSsWM2Hv6jT-xUQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

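This means that aggregating on a text field requires setting fielddata=true on that field (the message also mentions using a keyword field as an alternative).

Enabling it requires changing the index mapping, which is comparable to altering a table schema in a relational database.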

Modifying the Index Mapping

First, look at the current mapping:

GET /megacorp/employee/_mapping

Result:

{
  "megacorp": {
    "mappings": {
      "employee": {
        "properties": {
          "about": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "age": {
            "type": "long"
          },
          "first_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "interests": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "last_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

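As you can see, the mapping simply defines the type of each field. To fix the problem above, one setting has to be added to the interests field:

"fielddata": true

Update the mapping as follows: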


PUT /megacorp/employee/_mapping
{
  "properties": {
    "about": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "age": {
      "type": "long"
    },
    "first_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "interests": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "fielddata": true
    },
    "last_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

{
  "acknowledged": true
}

This indicates that the update succeeded. Now we can go back to the aggregation from before.
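
The error message above also names an alternative to fielddata: aggregating on a keyword field. Since the mapping shown earlier already lists a keyword sub-field under interests, a request along the following lines might avoid the mapping change. This is an untested sketch that only builds the request body as a Python dict; the helper name terms_agg_body is made up.

```python
import json


def terms_agg_body(agg_name, field):
    """Build a search body that only runs one terms aggregation."""
    return {
        "size": 0,  # skip the hits, keep only the aggregation result
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


# "interests.keyword" is the keyword sub-field from the mapping above
body = terms_agg_body("all_interests", "interests.keyword")
print(json.dumps(body, indent=2))
```

A keyword field stores each value as a single unanalyzed term, so the bucket keys would be the exact original strings rather than lowercased tokens.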

Aggregations: group by count

The SQL equivalent would be something like:

select interests, count(*) from index_xxx
where last_name = 'smith'
group by interests;

In ES the query looks like this:

GET /megacorp/employee/_search
{
  "_source": false,
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "size": 0,
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

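_source=false suppresses the fields of the matching documents in the response; the default is true.

"size": 0 means that no hits are returned. By default a search returns the top 10 hits; we don't need them here, only the aggregation result.

aggs declares an aggregation.

all_interests is a user-defined name; anything will do.

terms counts the documents for each distinct value of the given field, here interests.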

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 2
        },
        {
          "key": "sports",
          "doc_count": 1
        }
      ]
    }
  }
}
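
The buckets above can be reproduced locally with a counter, which may make the group-by-count behaviour of terms easier to see. This is a plain-Python imitation under the assumption that each document counts once per distinct value; SMITHS repeats the two documents matched by the query, and the function name terms_buckets is made up.

```python
from collections import Counter

# The two employees whose last_name matches "smith"
SMITHS = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of a (possibly multi-valued) field."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # a document counts once per value
    # Highest doc_count first, like the bucket order in the response above
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


print(terms_buckets(SMITHS, "interests"))
```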

This gives the count for each value of the field. sum and avg can be computed the same way:



GET /megacorp/employee/_search
{
    "_source": false, 
    "size": 0, 
    "aggs" : {
        "avg_age" : {
            "avg" : { "field" : "age" }
        },
        "sum_age" : {
            "sum" : { "field" : "age" }
        }
    }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_age": {
      "value": 30.666666666666668
    },
    "sum_age": {
      "value": 92
    }
  }
}
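
As a sanity check on the numbers above, the same sum and average can be computed by hand from the three ages indexed earlier (25, 32, 35). The function name metric_aggs is made up; it simply mirrors the shape of the aggregations section.

```python
AGES = [25, 32, 35]  # ages of the three employees indexed earlier


def metric_aggs(values):
    """Return sum and average in the same shape as the aggregations above."""
    total = sum(values)
    return {
        "avg_age": {"value": total / len(values)},
        "sum_age": {"value": total},
    }


result = metric_aggs(AGES)
print(result["sum_age"]["value"], result["avg_age"]["value"])
```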

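Summary

The above covers the first section of the official guide, Getting Started. I only copied the examples and ran through them once; there is no real breakthrough here, but I added my own understanding along the way. It is enough to know the basics of using ES. More detailed syntax will come gradually later.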

References

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_search_with_query_dsl.html
