Elasticsearch 6.X: The New Join Type Explained in Depth
Posted by 阿新 • 2018-12-22
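0. How should one-to-many and many-to-many data be stored and queried in ES 6.X?
The problem:
In a certain "headline news" app, a news article and its comments have a one-to-many relationship.
In ES 6.X, how should this data be stored, and how can it be searched and aggregated efficiently?
This article answers those questions.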
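1. Why ES 6.X introduced the join type
In MySQL, related tables are combined with LEFT JOIN, JOIN and similar operations.
In ES 5.X, relationships between "tables" were modeled with parent/child documents, which worked much like a database JOIN. This relied on ES 5.X allowing multiple types within a single index.
ES 6.X allows only a single type per index, so how to implement joins in ES 6.X became a common question.
Fortunately, ES 6.X introduced the join field type, which addresses the kind of problem that multi-table joins solve in MySQL.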
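2. Introducing the ES 6.X join type
Everything still lives in a single index. Parent/child relationships inside that index provide something similar to a multi-table join in MySQL.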
3. The ES 6.X join type in practice
3.1 Defining the join mapping
The join type mapping looks like this.
Key points:
- 1) "my_join_field" is the name of the join field.
- 2) "question": "answer" means that question is the parent of answer.
PUT my_join_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}
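If you generate index bodies from application code, the mapping above is just a nested dictionary. The sketch below assumes a hypothetical helper, `join_index_body`, which is not part of any Elasticsearch client. It rebuilds the same body and keeps ES 6.x's single type name (`_doc`, as in the example) as a parameter.

```python
import json


def join_index_body(field, relations, doc_type="_doc"):
    """Build an ES 6.x create-index body containing a single join field.

    Hypothetical helper: `relations` maps a parent relation name to one child
    name or a list of child names, exactly as in the "relations" object.
    """
    return {
        "mappings": {
            doc_type: {  # ES 6.x still nests properties under one type name
                "properties": {
                    field: {"type": "join", "relations": relations}
                }
            }
        }
    }


body = join_index_body("my_join_field", {"question": "answer"})
print(json.dumps(body, indent=2))  # the same JSON as the PUT my_join_index body
```

The printed JSON can be sent as the body of `PUT my_join_index` with whatever HTTP client or Elasticsearch client you already use.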
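3.2 Defining parent documents
Let's go straight to a simplified form, which is easier to follow.
The requests below index two parent documents.
Their join relation is the parent type "question".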
PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

PUT my_join_index/_doc/2?refresh
{
  "text": "This is another question",
  "my_join_field": "question"
}
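A parent document may store just the relation name as a string, as above, or the equivalent object form `{"name": "question"}`. A child document must use the object form together with its parent's `_id`. The small sketch below captures both shapes. `join_value` and `parent_docs` are hypothetical names used only for illustration.

```python
def join_value(relation, parent=None, short_form=True):
    """Value for the join field of one document (hypothetical helper).

    Parents may use the short string form or {"name": ...}; children must use
    the object form and name their parent's _id.
    """
    if parent is None:
        return relation if short_form else {"name": relation}
    return {"name": relation, "parent": str(parent)}


# The two parent documents indexed above, as Python dicts.
parent_docs = {
    "1": {"text": "This is a question", "my_join_field": join_value("question")},
    "2": {"text": "This is another question", "my_join_field": join_value("question")},
}
print(parent_docs["1"])
# {'text': 'This is a question', 'my_join_field': 'question'}
```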
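3.3 Defining child documents
- The routing value is mandatory, because parent and child documents must be indexed on the same shard.
- "answer" is the join relation name of this child document.
- The parent document ID of this child is specified as 1.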
PUT my_join_index/_doc/3?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

PUT my_join_index/_doc/4?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
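Why must the routing be the parent's id? By default a document is routed by its own `_id`. In simplified form, the target shard is `hash(routing) % number_of_primary_shards`. Real Elasticsearch uses a Murmur3 hash and also honours settings such as `routing_partition_size`. The sketch below uses `zlib.crc32` purely as a stand-in hash. It shows that passing `?routing=1` places children 3 and 4 on the same shard as parent 1. `shard_for` and `effective_routing` are hypothetical helpers, and 5 primary shards is an assumption matching the `"_shards"` counts in the responses of this article.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumption: matches "_shards": {"total": 5} in the responses


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    # Stand-in hash only: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def effective_routing(doc_id, routing=None):
    # Without an explicit ?routing=..., the document's own _id is the routing value.
    return routing if routing is not None else doc_id


parent_shard = shard_for(effective_routing("1"))               # PUT .../_doc/1
child3_shard = shard_for(effective_routing("3", routing="1"))  # PUT .../_doc/3?routing=1
child4_shard = shard_for(effective_routing("4", routing="1"))  # PUT .../_doc/4?routing=1
print(parent_shard == child3_shard == child4_shard)  # True
```

Whatever hash function is used, equal routing strings always map to the same shard. That is the property the join type relies on.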
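4. Constraints of the ES 6.X join type
- Only one join field is allowed per index mapping.
- Parent and child documents must be indexed on the same shard. This means the same routing value must be supplied when getting, deleting, or updating a child document.
- A document can have multiple children but only one parent.
- A new relation can be added to an existing join field.
- A child relation can also be added to an existing element, but only if that element is already a parent.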
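The single-parent rule is built into the document format, because a child's join value names exactly one parent `_id`. The routing rule is easier to get wrong, so here is a hedged client-side sanity check. `check_routing` and the `docs` layout are hypothetical. The sketch only mirrors the rules above: a child must carry a routing value, and that value must equal the parent's routing, which defaults to the parent's `_id`. If a parent is not present in `docs`, the sketch assumes it uses default routing.

```python
def check_routing(docs):
    """Return the _ids of child documents that break the same-shard rule.

    docs maps _id -> {"routing": str or None, "join": str or dict}.
    """
    def routing_of(doc_id):
        doc = docs.get(doc_id)
        if doc is None or doc["routing"] is None:
            return doc_id  # assume default routing (the _id) when unknown
        return doc["routing"]

    bad = []
    for doc_id, doc in docs.items():
        join = doc["join"]
        if not isinstance(join, dict) or "parent" not in join:
            continue  # a parent document: nothing to check
        if doc["routing"] is None or doc["routing"] != routing_of(join["parent"]):
            bad.append(doc_id)  # missing or mismatched routing for a child
    return bad


docs = {
    "1": {"routing": None, "join": "question"},
    "3": {"routing": "1", "join": {"name": "answer", "parent": "1"}},
    "5": {"routing": None, "join": {"name": "answer", "parent": "1"}},  # no routing
    "6": {"routing": "2", "join": {"name": "answer", "parent": "1"}},   # wrong routing
}
print(check_routing(docs))  # ['5', '6']
```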
5. Searching and aggregating with the ES 6.X join type
5.1 Retrieving all documents
GET my_join_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": ["_id"]
}
The response is:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": null,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": null,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        },
        "sort": [
          "1"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
          "text": "This is another question",
          "my_join_field": "question"
        },
        "sort": [
          "2"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "3"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "4"
        ]
      }
    ]
  }
}
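The response mixes parents and children in one flat hit list. Only the join field, plus `_routing` on children, tells them apart. To regroup the hits on the client side, a minimal sketch could look like this. `group_children` is a hypothetical helper, and it assumes the two-level question/answer layout used here. In a deeper hierarchy, a middle-level document is both a parent and a child.

```python
from collections import defaultdict


def group_children(hits):
    """Map each parent _id to the _ids of its children found in the same hit list."""
    children = defaultdict(list)
    parents = []
    for hit in hits:
        join = hit["_source"]["my_join_field"]
        if isinstance(join, str) or "parent" not in join:
            parents.append(hit["_id"])  # parent, short or object form
        else:
            children[join["parent"]].append(hit["_id"])
    return {p: children.get(p, []) for p in parents}


# Hits trimmed from the match_all response above.
hits = [
    {"_id": "1", "_source": {"my_join_field": "question"}},
    {"_id": "2", "_source": {"my_join_field": "question"}},
    {"_id": "3", "_source": {"my_join_field": {"name": "answer", "parent": "1"}}},
    {"_id": "4", "_source": {"my_join_field": {"name": "answer", "parent": "1"}}},
]
print(group_children(hits))  # {'1': ['3', '4'], '2': []}
```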
5.2 Querying child documents by their parents (has_parent)
GET my_join_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": {
          "text": "This is"
        }
      }
    }
  }
}
Response:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}
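To make the has_parent semantics concrete, here is a toy in-memory sketch, not how Elasticsearch executes the query. First it finds the parents of type question that match the inner query, then it returns their children. Both questions match "This is", but only question 1 has children, so only 3 and 4 come back. `DOCS`, `tokens`, `match` and `has_parent` are hypothetical names. The tokenizer is only a crude stand-in for the standard analyzer.

```python
import re

# Hypothetical in-memory copy of documents 1-4 from my_join_index.
DOCS = {
    "1": {"text": "This is a question", "join": {"name": "question"}},
    "2": {"text": "This is another question", "join": {"name": "question"}},
    "3": {"text": "This is an answer", "join": {"name": "answer", "parent": "1"}},
    "4": {"text": "This is another answer", "join": {"name": "answer", "parent": "1"}},
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def match(text, query):
    # Like a match query with the default "or" operator: one shared token is enough.
    return bool(tokens(text) & tokens(query))


def has_parent(parent_type, query, docs=DOCS):
    """Return the _ids of children whose parent (of parent_type) matches the query."""
    matching_parents = {
        doc_id for doc_id, d in docs.items()
        if d["join"]["name"] == parent_type and match(d["text"], query)
    }
    return sorted(
        doc_id for doc_id, d in docs.items()
        if d["join"].get("parent") in matching_parents
    )


print(has_parent("question", "This is"))  # ['3', '4']
```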
5.3 Querying parent documents by their children (has_child)
GET my_join_index/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "text": "This is question"
        }
      }
    }
  }
}
Response:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        }
      }
    ]
  }
}
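has_child works in the opposite direction: find matching children of the given type, then return their parents. The answers do not contain the word "question", yet they still match "This is question". This is because a match query combines its terms with "or" by default. The result is question 1, the only parent with answers. As before, this is a toy in-memory sketch with hypothetical names (`DOCS`, `tokens`, `match`, `has_child`) and a crude tokenizer.

```python
import re

# Hypothetical in-memory copy of documents 1-4 from my_join_index.
DOCS = {
    "1": {"text": "This is a question", "join": {"name": "question"}},
    "2": {"text": "This is another question", "join": {"name": "question"}},
    "3": {"text": "This is an answer", "join": {"name": "answer", "parent": "1"}},
    "4": {"text": "This is another answer", "join": {"name": "answer", "parent": "1"}},
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def match(text, query):
    # Default "or" operator: any shared token counts as a match.
    return bool(tokens(text) & tokens(query))


def has_child(child_type, query, docs=DOCS):
    """Return the _ids of parents that have at least one matching child of child_type."""
    parent_ids = {
        d["join"]["parent"] for d in docs.values()
        if d["join"]["name"] == child_type and match(d["text"], query)
    }
    return sorted(parent_ids)


print(has_child("answer", "This is question"))  # ['1']
```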
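5.4 Aggregations on the join field in practice
The request below does three things:
- 1) parent_id is a dedicated query that retrieves the child documents of type answer whose parent is the document with id=1. hits.total gives their count.
- 2) It runs a terms aggregation on the parent-id field of the question relation (my_join_field#question).
- 3) It reads that same field in a script (script_fields).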
GET my_join_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question",
        "size": 10
      }
    }
  },
  "script_fields": {
    "parent": {
      "script": {
        "source": "doc['my_join_field#question']"
      }
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.13353139,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "parents": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 2
        }
      ]
    }
  }
}
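The response shows that, for each answer hit, `my_join_field#question` holds the id of its parent question (the script field returns `"1"`). The toy sketch below reproduces what the request computes. It filters answers whose parent is 1, then counts them per parent id in descending order of count, as a terms aggregation does. `JOIN`, `parent_id_query` and `parent_buckets` are hypothetical names. The aggregation part only models child hits, which are the only hits this query returns.

```python
from collections import Counter

# Hypothetical in-memory copy of the join values of documents 1-4.
JOIN = {
    "1": "question",
    "2": "question",
    "3": {"name": "answer", "parent": "1"},
    "4": {"name": "answer", "parent": "1"},
}


def parent_id_query(child_type, parent_id, join=JOIN):
    """Ids of children of type child_type whose parent is parent_id."""
    return sorted(
        doc_id for doc_id, value in join.items()
        if isinstance(value, dict)
        and value["name"] == child_type
        and value.get("parent") == parent_id
    )


def parent_buckets(hit_ids, join=JOIN):
    """Count child hits per parent id, like a terms agg on my_join_field#question."""
    counts = Counter(join[i]["parent"] for i in hit_ids if isinstance(join[i], dict))
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]


hits = parent_id_query("answer", "1")
print(hits)                  # ['3', '4']
print(parent_buckets(hits))  # [{'key': '1', 'doc_count': 2}]
```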
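6. One-to-many with the ES 6.X join type in practice
6.1 Defining one-to-many
The mapping below defines one parent relation, question, with multiple child relations, answer and comment.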
PUT join_ext_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"]
          }
        }
      }
    }
  }
}
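When a relation maps to a list, each listed child still has exactly one parent. Inverting the relations map is a quick way to check this while designing a mapping. `parent_of` is a hypothetical helper. Its duplicate check mirrors the "only one parent" rule from section 4, not any Elasticsearch API.

```python
def parent_of(relations):
    """Invert a join "relations" map into child -> parent.

    Values may be a single child name or a list of names, as in the mapping above.
    """
    result = {}
    for parent, children in relations.items():
        for child in ([children] if isinstance(children, str) else children):
            if child in result:  # one relation listed under two parents
                raise ValueError(f"{child!r} already has parent {result[child]!r}")
            result[child] = parent
    return result


print(parent_of({"question": ["answer", "comment"]}))
# {'answer': 'question', 'comment': 'question'}
```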
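6.2 Defining multiple levels (one-to-many-to-many)
The mapping below defines the three-generation (grandparent, parent, grandchild) relationship shown in this diagram: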
          question
          /      \
         /        \
    comment      answer
                   |
                   |
                  vote
PUT join_multi_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],
            "answer": "vote"
          }
        }
      }
    }
  }
}
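With a second level, the chain from a relation up to the root shows which documents must share a shard: every descendant is routed by its top-level document. The sketch below walks the child-to-parent links of the mapping above. `ancestors` and `RELATIONS` are hypothetical names, and the cycle check is only a defensive guard in this toy.

```python
def ancestors(relations, name):
    """Walk child -> parent links upward; returns [parent, grandparent, ...]."""
    parent_of = {}
    for parent, children in relations.items():
        for child in ([children] if isinstance(children, str) else children):
            parent_of[child] = parent
    chain = []
    while name in parent_of:
        name = parent_of[name]
        if name in chain:
            raise ValueError("cycle in relations")
        chain.append(name)
    return chain


RELATIONS = {"question": ["answer", "comment"], "answer": "vote"}
print(ancestors(RELATIONS, "vote"))      # ['answer', 'question']
print(ancestors(RELATIONS, "comment"))   # ['question']
print(ancestors(RELATIONS, "question"))  # []
```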
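Index a grandchild document as follows. This assumes that question 1 and its answer 2 (indexed with routing=1 and parent 1) already exist in join_multi_index: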
PUT join_multi_index/_doc/3?routing=1&refresh
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2"
  }
}
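Notes:
- The grandchild document must be on the same shard as its parent and grandparent, which is why it uses routing=1 (the grandparent's id).
- The parent id of the grandchild ("2") must point to its parent, an answer document.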
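Both notes can be checked mechanically. Follow the parent pointers up to the top-level document, then compare its `_id` with the routing used for each descendant. This hypothetical sketch assumes that question 1 uses default routing and that answer 2 was indexed with routing=1 and parent 1. `DOCS`, `root_id` and `routing_ok` are illustration-only names.

```python
# Hypothetical join_multi_index contents: _id -> (routing, parent _id or None).
DOCS = {
    "1": (None, None),  # question, default routing
    "2": ("1", "1"),    # answer, child of question 1
    "3": ("1", "2"),    # vote, child of answer 2
}


def root_id(doc_id, docs=DOCS):
    """Follow parent pointers to the top-level document."""
    while docs[doc_id][1] is not None:
        doc_id = docs[doc_id][1]
    return doc_id


def routing_ok(doc_id, docs=DOCS):
    """A descendant's routing must equal the root's _id (the root uses default routing)."""
    routing, parent = docs[doc_id]
    return parent is None or routing == root_id(doc_id, docs)


print(root_id("3"))                      # 1
print(all(routing_ok(d) for d in DOCS))  # True
```

Routing the vote by "2" (its direct parent) instead of "1" would fail this check, because answer 2 itself lives on the shard chosen by "1".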
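7. Summary
Still, typing everything out by hand and translating it really does refresh your understanding and deepen it.
Let's keep digging into the ELK Stack together!
March 31, 2018, 23:18, written at home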