Elasticsearch 6.X: An In-Depth Look at the New Join Type

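0. How should one-to-many and many-to-many data be stored and queried in ES 6.X?

The motivating question:

In a headline-news app, news articles and their comments form a one-to-many relationship.

How should such data be stored in ES 6.X, and how can it be searched and aggregated efficiently?

This article works through the answer.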

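1. Background of the New Join Type in ES 6.X

  • In MySQL, multi-table association is done with LEFT JOIN, JOIN, and similar clauses.

  • In ES 5.X, parent-child documents provided join-like functionality. That approach relied on ES 5.X supporting multiple types under a single index.

  • ES 6.X supports only a single type per index, so the 5.X approach no longer applies.

  • How to implement joins in ES 6.X therefore became a common question.

Fortunately, ES 6.X introduces the join type, which mainly addresses problems similar to multi-table association in MySQL.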

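2. Introduction to the ES 6.X Join Type

The join type still works within a single index and uses parent-child relationships to achieve something similar to multi-table association in MySQL.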

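3. The ES 6.X Join Type in Practice

3.1 Defining the Mapping for the Join Type

The mapping for a join field looks like this.

Key points:

  • 1) "my_join_field" is the name of the join field.

  • 2) "question": "answer" means that question is the parent of answer.
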
PUT my_join_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}
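
When a client application creates the index, the same mapping body can be assembled in code. The following is a minimal sketch, not an official client API: the helper name `build_join_mapping` is hypothetical, it uses only the Python standard library, and it only builds the request body without contacting a cluster. It also rejects a relations object in which one child name is listed under two parents.

```python
import json


def build_join_mapping(field_name, relations, doc_type="_doc"):
    """Return a create-index body that declares one join field.

    `relations` maps a parent relation name to a child name or a list of
    child names, mirroring the "relations" object in the mapping.
    (Hypothetical helper for illustration only.)
    """
    parent_of = {}
    for parent, children in relations.items():
        names = [children] if isinstance(children, str) else list(children)
        for child in names:
            # A relation name may have several children but only one parent.
            if child in parent_of:
                raise ValueError(
                    f"{child!r} already has parent {parent_of[child]!r}"
                )
            parent_of[child] = parent
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field_name: {"type": "join", "relations": relations}
                }
            }
        }
    }


body = build_join_mapping("my_join_field", {"question": "answer"})
print(json.dumps(body, indent=2))
```

The printed JSON is the body that would be sent with PUT my_join_index.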

3.2 Indexing Parent Documents

The simplified form below is the easiest to follow.

It indexes two parent documents.
Both belong to the parent relation "question".

PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

PUT my_join_index/_doc/2?refresh
{
  "text": "This is another question",
  "my_join_field": "question"
}

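3.3 Indexing Child Documents

  • A routing value is mandatory, because parent and child documents must be indexed on the same shard.
  • "answer" is the join name for this child document.
  • "parent" gives the ID of this child's parent document: 1.
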
PUT my_join_index/_doc/3?routing=1&refresh 
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

PUT my_join_index/_doc/4?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
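
Since forgetting the routing value is an easy mistake, a client can build child requests through one small helper. This is a sketch under stated assumptions: `child_index_request` is a hypothetical name, it defaults the routing value to the parent ID (appropriate for a direct child of a top-level parent indexed without custom routing), and it only returns the method, path, and body rather than sending anything. It writes `refresh=true`, which Elasticsearch treats the same as the bare `refresh` used above.

```python
from urllib.parse import urlencode


def child_index_request(index, doc_id, parent_id, relation, source,
                        join_field="my_join_field", routing=None):
    """Return (method, path, body) for indexing one child document.

    Hypothetical helper: routing falls back to the parent ID so the child
    is sent to the same shard as its parent.
    """
    routing = parent_id if routing is None else routing
    query = urlencode({"routing": routing, "refresh": "true"})
    body = dict(source)  # copy so the caller's dict is not modified
    body[join_field] = {"name": relation, "parent": parent_id}
    return "PUT", f"{index}/_doc/{doc_id}?{query}", body


method, path, body = child_index_request(
    "my_join_index", "3", "1", "answer", {"text": "This is an answer"}
)
print(method, path)
print(body)
```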

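4. Constraints of the ES 6.X Join Type

  1. Only one join field mapping is allowed per index.
  2. Parent and child documents must be indexed on the same shard. This means the same routing value must be supplied when getting, deleting, or updating a child document.
  3. A document can have multiple children but only one parent.
  4. New relations can be added to an existing join field.
  5. A child can be added to an existing relation element, but only if that element is already a parent.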
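
Constraint 2 follows from how a document is assigned to a shard: the shard is derived from a hash of the routing value, which defaults to the document ID. The sketch below illustrates the idea only. `pick_shard` is a hypothetical function, and `zlib.crc32` is a stand-in hash chosen because it is in the standard library; Elasticsearch uses its own hash function, so real shard numbers will differ.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumption: matches "_shards.total": 5 in this article's responses


def pick_shard(doc_id, routing=None, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard from the routing value, falling back to the document ID.

    Illustration only: crc32 stands in for the hash used by Elasticsearch.
    """
    key = doc_id if routing is None else routing
    return zlib.crc32(key.encode("utf-8")) % num_shards


parent_shard = pick_shard("1")              # parent routed by its own ID
child_shard = pick_shard("3", routing="1")  # child routed by the parent ID
print(parent_shard == child_shard)
```

Without the explicit routing value, the child would be hashed by its own ID and could land on a different shard from its parent, which is why get, update, and delete requests for a child need the same routing value that was used when indexing it.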

5. Searching and Aggregating with the ES 6.X Join Type

5.1 Retrieving All Documents

GET my_join_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": ["_id"]
}

The response is as follows:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": null,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": null,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        },
        "sort": [
          "1"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
          "text": "This is another question",
          "my_join_field": "question"
        },
        "sort": [
          "2"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "3"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "4"
        ]
      }
    ]
  }
}

5.2 Querying Child Documents by Parent

GET my_join_index/_search
{
    "query": {
        "has_parent" : {
            "parent_type" : "question",
            "query" : {
                "match" : {
                    "text" : "This is"
                }
            }
        }
    }
}

Response:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}

5.3 Querying Parent Documents by Child

GET my_join_index/_search
{
    "query": {
        "has_child" : {
            "type" : "answer",
            "query" : {
                "match" : {
                    "text" : "This is question"
                }
            }
        }
    }
}

Response:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        }
      }
    ]
  }
}
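
The direction of the two queries in 5.2 and 5.3 is easy to mix up: has_parent returns children, and has_child returns parents. The following in-memory sketch models that behavior on the four documents from this article. It is a simplification and an assumption throughout: `matches` is a rough stand-in for a match query (any term may match), scoring is ignored, and no cluster is involved.

```python
# The four documents indexed earlier, keyed by _id.
DOCS = {
    "1": {"text": "This is a question", "join": {"name": "question"}},
    "2": {"text": "This is another question", "join": {"name": "question"}},
    "3": {"text": "This is an answer", "join": {"name": "answer", "parent": "1"}},
    "4": {"text": "This is another answer", "join": {"name": "answer", "parent": "1"}},
}


def matches(text, query):
    """Rough stand-in for a match query: true if any query term appears."""
    terms = set(text.lower().split())
    return any(term in terms for term in query.lower().split())


def has_parent(parent_type, query):
    """IDs of children whose parent is of parent_type and matches the query."""
    hits = []
    for doc_id, doc in DOCS.items():
        parent = DOCS.get(doc["join"].get("parent"))
        if (parent is not None
                and parent["join"]["name"] == parent_type
                and matches(parent["text"], query)):
            hits.append(doc_id)
    return sorted(hits)


def has_child(child_type, query):
    """IDs of parents that have at least one matching child of child_type."""
    hits = set()
    for doc in DOCS.values():
        if doc["join"]["name"] == child_type and matches(doc["text"], query):
            hits.add(doc["join"]["parent"])
    return sorted(hits)


print(has_parent("question", "This is"))        # children of matching questions
print(has_child("answer", "This is question"))  # parents of matching answers
```

In this model the first call yields documents 3 and 4 and the second yields document 1, in line with the responses shown in 5.2 and 5.3. Document 2 is never returned by has_child because it has no answers.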

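5.4 Join Aggregations in Practice

The request below does the following:

  • 1) parent_id is a dedicated query that retrieves the child documents of type answer belonging to the parent document with id 1.
  • 2) The terms aggregation groups results by the parent ID field for the question relation, my_join_field#question.
  • 3) The script field reads that same parent ID field through doc['my_join_field#question'].
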
GET my_join_index/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question", 
        "size": 10
      }
    }
  },
  "script_fields": {
    "parent": {
      "script": {
         "source": "doc['my_join_field#question']" 
      }
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.13353139,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "parents": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 2
        }
      ]
    }
  }
}
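
To make the aggregation result concrete, the sketch below reproduces in plain Python what the terms aggregation on my_join_field#question computes for the two hits above: it counts child documents per parent ID. The function `parent_id_buckets` and the hand-written `children` list are hypothetical and exist only for illustration.

```python
from collections import Counter

# The two child documents returned by the parent_id query above.
children = [
    {"_id": "3", "my_join_field": {"name": "answer", "parent": "1"}},
    {"_id": "4", "my_join_field": {"name": "answer", "parent": "1"}},
]


def parent_id_buckets(hits, size=10):
    """Count hits per parent ID, shaped like terms-aggregation buckets."""
    counts = Counter(hit["my_join_field"]["parent"] for hit in hits)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


print(parent_id_buckets(children))
```

The single bucket with key "1" and a count of 2 corresponds to the "buckets" array in the response.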

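6. One-to-Many Relations with the ES 6.X Join Type

6.1 One Parent with Multiple Child Types

The mapping below defines one parent, question, with two child types, answer and comment.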

PUT join_ext_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"]  
          }
        }
      }
    }
  }
}

6.2 Multiple Levels of Parent-Child Relations

The mapping below defines the three-generation relationship shown in this diagram.

question
    /    \
   /      \
comment  answer
           |
           |
          vote
PUT join_multi_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],  
            "answer": "vote" 
          }
        }
      }
    }
  }
}

A grandchild document is indexed as follows:

PUT join_multi_index/_doc/3?routing=1&refresh 
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2" 
  }
}

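Notes:

- The grandchild document must be on the same shard as its parent and grandparent, which is why the routing value is 1 rather than the parent ID.
- "parent" is the ID of this document's direct parent, which must be an answer document.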
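7. Summary

Typing out each example and translating the material by hand genuinely refreshes and deepens one's understanding.

Let's dig into the ELK Stack together!

March 31, 2018, 23:18, at home by the bedside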