
Elasticsearch Tutorial: Search Basics, the Deep Paging Problem, and Its Solution

Search all indices

GET /_search

Response

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "doc",
        "_id": "config:6.4.0",
        "_score": 1,
        "_source": {
          "type": "config",
          "updated_at": "2018-09-18T09:30:18.949Z",
          "config": {
            "buildNum": 17929,
            "telemetry:optIn": true
          }
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "eTmX5mUBtZGWutGW0TNs",
        "_score": 1,
        "_source": {
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "J3fLFWYBBoLynJN1-kOG",
        "_score": 1,
        "_source": {
          "name": "test yagao",
          "desc": "youxiao fangzhu"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": "1",
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "KXfSFWYBBoLynJN1TUPo",
        "_score": 1,
        "_source": {
          "name": "test yagao2",
          "desc": "youxiao fangzhu2"
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "1",
        "_score": 1,
        "_source": {
          "content": "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "zhonghua yagao",
          "desc": "caoben zhiwu",
          "price": 40,
          "producer": "zhonghua producer",
          "tags": [
            "qingxin"
          ]
        }
      }
    ]
  }
}

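Meaning of the response fields

took: how many milliseconds the whole search request took.

hits.total: the total number of documents that matched the search (not the number of documents returned in this response).
hits.max_score: the highest relevance score among all matching documents. Every document gets a _score for the search; the more relevant the document, the higher its _score and the higher it ranks.
hits.hits: the matching documents with their full _source, sorted by _score in descending order. By default only the top 10 are returned.

_shards: how many shards the request touched and how many succeeded or failed. A shard counts as failed only when its primary and all of its replicas are down, and this does not affect the other shards. By default a search request is sent to every shard of the index; because each primary shard may have one or more replica shards, the request for a given shard can be served either by the primary or by any one of its replicas.

timeout: by default there is no timeout. When a search goes very deep and may take a long time, you can set a timeout; once it is reached, the search stops and returns the results collected so far, and the response has timed_out set to true.

Example values: timeout=10ms, timeout=1s, timeout=1m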
GET /_search?timeout=10m
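
To make the field descriptions above concrete, here is a minimal Python sketch that inspects an already-parsed search response. The function name summarize_response is hypothetical, and the sample dictionary is a trimmed, made-up response (not output from a real cluster) that assumes the 6.x response shape, where hits.total is a plain number.

```python
def summarize_response(resp):
    """Summarize a 6.x-style search response that was parsed from JSON."""
    shards = resp["_shards"]
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        # Partial means the timeout expired or at least one shard failed.
        "partial": resp["timed_out"] or shards["failed"] > 0,
        # Total matches, which can be far larger than the page returned.
        "total_matches": hits["total"],
        "returned": len(hits["hits"]),
        "failed_shards": shards["failed"],
    }


# Hypothetical sample: 8 matches, 2 returned, request hit its timeout.
sample = {
    "took": 10,
    "timed_out": True,
    "_shards": {"total": 16, "successful": 16, "skipped": 0, "failed": 0},
    "hits": {"total": 8, "max_score": 1, "hits": [{"_id": "1"}, {"_id": "2"}]},
}
summary = summarize_response(sample)
print(summary)
```

Checking timed_out and _shards.failed before trusting hits.total is a cheap way to notice that a result set may be incomplete.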

Search a specific index

GET /blog/_search

Search several specific indices

GET /.kibana,blog/_search

Match several indices with a wildcard

GET /*log/_search

Response

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "blog",
        "_type": "article",
        "_id": "eTmX5mUBtZGWutGW0TNs",
        "_score": 1,
        "_source": {
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": "1",
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      }
    ]
  }
}
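
The index-targeting forms shown so far (all indices, one index, a comma-separated list, a wildcard) differ only in the path in front of _search. As a rough sketch, a small Python helper could build these paths; search_path is a hypothetical name and is not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def search_path(indices=None, **params):
    """Build the path for a search request.

    indices: None (all indices), a single name or wildcard pattern,
    or a list of names that is joined with commas.
    """
    if indices is None:
        target = ""
    elif isinstance(indices, str):
        target = "/" + indices
    else:
        target = "/" + ",".join(indices)
    path = target + "/_search"
    if params:
        path += "?" + urlencode(params)
    return path


print(search_path())                     # /_search
print(search_path("blog"))               # /blog/_search
print(search_path([".kibana", "blog"]))  # /.kibana,blog/_search
print(search_path("*log"))               # /*log/_search
print(search_path(timeout="10ms"))       # /_search?timeout=10ms
```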

Search documents of a specific type in an index

GET /index1/type1/_search

Search documents of multiple types in an index

Since 6.0 an index can contain at most one type, so this form is meaningless in 6.0 and later.

GET /index1/type1,type2/_search

Search documents of multiple types across multiple indices

Since 6.0 an index can contain at most one type, so this form is meaningless in 6.0 and later.

GET /index1,index2/type1,type2/_search

Search documents of the specified types across all indices

Since 6.0 an index can contain at most one type, so this form is meaningless in 6.0 and later.

GET /_all/type1,type2/_search

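Paginated search

Append "?from=0&size=2" to the request.

Note: once the result set grows beyond about 50,000 documents, use the scroll approach described below instead. By default, Elasticsearch also rejects requests where from + size exceeds the index.max_result_window setting (10,000).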

GET /_search?from=0&size=2

Response

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "doc",
        "_id": "config:6.4.0",
        "_score": 1,
        "_source": {
          "type": "config",
          "updated_at": "2018-09-18T09:30:18.949Z",
          "config": {
            "buildNum": 17929,
            "telemetry:optIn": true
          }
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "eTmX5mUBtZGWutGW0TNs",
        "_score": 1,
        "_source": {
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      }
    ]
  }
}
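
Applications usually think in page numbers rather than offsets. The following Python sketch shows one way to turn a 1-based page number into the from and size parameters used above; page_params is a hypothetical helper name.

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size query parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1, size=2))      # {'from': 0, 'size': 2}
print(page_params(1001, size=10))  # {'from': 10000, 'size': 10}
```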

The deep paging problem

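Deep paging simply means paging very far into a result set. Suppose there are 60,000 documents spread over 3 primary shards, 20,000 per shard, with 10 documents per page. To fetch page 1001 (from=10000, size=10) you need documents 10001-10010 of the global ordering. How are they retrieved? The request may first land on a node that holds none of this index's shards; that node acts as the coordinating node and forwards the search to the nodes that hold the index's three shards.

No shard knows where its own documents fall in the global ordering, so each shard has to return its top 10,010 entries out of its 20,000 documents, not just 10. With 3 shards each returning 10,010 entries, the coordinating node receives 30,030 entries in total, sorts them, and keeps only the 10 that make up page 1001.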

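As an analogy, take 60 numbered balls (1 to 60) and distribute them at random over three baskets, with each basket keeping its own balls sorted. To get balls 10 to 12 of the overall order, you first have to take the first 12 balls from each basket (the distribution is random, so any basket might hold them), 36 balls in total, then sort those 36 and pick positions 10 to 12 from the result.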
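
The ball analogy can be simulated directly. The Python sketch below is an illustration only, not how Elasticsearch is implemented internally: the function names shard_top and coordinate are made up, and ascending ball numbers stand in for the ranking order.

```python
import heapq
import random


def shard_top(shard_docs, from_, size):
    """Each shard can only return its own first from_ + size entries."""
    return sorted(shard_docs)[: from_ + size]


def coordinate(shards, from_, size):
    """Merge the per-shard lists and cut out the requested page."""
    partials = [shard_top(docs, from_, size) for docs in shards]
    received = sum(len(p) for p in partials)
    merged = list(heapq.merge(*partials))
    return merged[from_: from_ + size], received


# 60 numbered balls placed at random into three baskets of 20.
random.seed(0)
balls = list(range(1, 61))
random.shuffle(balls)
baskets = [balls[0:20], balls[20:40], balls[40:60]]

# Balls 10 to 12 of the overall order: skip 9, take 3.
page, received = coordinate(baskets, from_=9, size=3)
print(page)      # [10, 11, 12]
print(received)  # 36
```

The coordinator handles 36 balls to hand back 3, and the gap widens as the requested page moves deeper.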

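Drawbacks

When paging very deep, the coordinating node has to hold a large amount of data, sort all of it, and only then extract the requested page. This costs network bandwidth, memory, and CPU. That is the performance problem of deep paging, and such requests should be avoided wherever possible.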
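
The cost can be put into a one-line formula: the coordinating node collects roughly shards × (from + size) entries per request. The Python sketch below assumes every shard holds at least from + size matching documents; coordinator_entries is a hypothetical name.

```python
def coordinator_entries(num_shards, from_, size):
    """Entries the coordinating node must collect and sort for one page."""
    return num_shards * (from_ + size)


for from_ in (0, 100, 10000):
    print(from_, coordinator_entries(3, from_, 10))
# 0 30
# 100 330
# 10000 30030
```

The page size stays at 10 throughout, yet the work grows linearly with the offset, which is why the 60,000-document example ends up moving 30,030 entries for 10 results.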

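Solution

To address this problem, Elasticsearch provides the scroll API. Each request returns a scroll_id, and the next page is fetched by passing that scroll_id back. You can think of the scroll_id as a cursor in a relational database. The drawback of scroll is that it cannot go back: you can only move forward to the next page, never to a previous one.

In practice, once a result set exceeds roughly 50,000 documents, users are unlikely to look at every document; what they want is the outcome of analyzing or processing the data. Below that size, from/size pagination is fine. So what is scroll for? Its main use is retrieving an entire result set in batches.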

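Steps

1. Fetch the first 2 documents and obtain a scroll_id. The 3s here is how long the scroll is kept alive: if the next page is not requested within 3 seconds, the scroll_id expires.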

GET /_search?scroll=3s&size=2

Response

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAPIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADyhZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA8sWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPJFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADzRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA8wWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9cWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPQFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAD0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9UWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAD0hZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9MWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPUFnhEVi1HVGViVFJxYzdlczBoRFI0clE=",
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "doc",
        "_id": "config:6.4.0",
        "_score": 1,
        "_source": {
          "type": "config",
          "updated_at": "2018-09-18T09:30:18.949Z",
          "config": {
            "buildNum": 17929,
            "telemetry:optIn": true
          }
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "eTmX5mUBtZGWutGW0TNs",
        "_score": 1,
        "_source": {
          "title": "New version of Elasticsearch released!",
          "content": "Version 1.0 released today!",
          "priority": 10,
          "tags": [
            "announce",
            "elasticsearch",
            "release"
          ]
        }
      }
    ]
  }
}

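2. Request the next page. Note that no index is specified here; only the scroll_id and the keep-alive time for this request are needed.

In other words, to reach page N you simply repeat the request N times. Every response carries a _scroll_id; it may or may not change between requests, so always pass the most recently returned one.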

GET /_search/scroll?scroll=3s&scroll_id=DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAWtFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFuxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABa4WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAWwFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFrxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbIWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAWxFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFvBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbMWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW0FnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFtRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbYWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW3FnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFuBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbkWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW6FnhEVi1HVGViVFJxYzdlczBoRFI0clE=

Or

POST /_search/scroll
{
 "scroll" : "3s",
 "scroll_id":"DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAXIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF1RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABccWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFyRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABcoWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXLFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABc0WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdAWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXSFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdQWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXTFnhEVi1HVGViVFJxYzdlczBoRFI0clE="
}

Delete a specific scroll_id

Once the search is finished, or the scroll has reached the end, you can delete the scroll_id to release its resources.

DELETE /_search/scroll/DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAXIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF1RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABccWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFyRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABcoWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXLFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABc0WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdAWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXSFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdQWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXTFnhEVi1HVGViVFJxYzdlczBoRFI0clE=

Delete all scroll_ids

DELETE /_search/scroll/_all
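
Putting the scroll steps together, here is a hedged Python sketch of the whole loop: start the scroll, keep requesting the next batch, and clear the scroll at the end. It is not tied to any client library. The send(method, path, body) callable is an assumption supplied by the caller, and fake_send_factory is an in-memory stand-in used only so the example runs without a cluster; its responses are simplified, not real Elasticsearch output.

```python
def scroll_all(send, index_path="", size=2, keep_alive="3s"):
    """Yield every hit of a search, one scroll batch at a time.

    send(method, path, body) is a caller-supplied (hypothetical) function
    that performs the HTTP request and returns the parsed JSON response.
    """
    resp = send("GET", f"{index_path}/_search?scroll={keep_alive}&size={size}", None)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # Always continue with the most recently returned id.
            scroll_id = resp["_scroll_id"]
    finally:
        # Clear the scroll instead of waiting for it to expire.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def fake_send_factory(docs):
    """In-memory stand-in for a cluster, for demonstration only."""
    state = {"pos": 0, "size": 0, "calls": []}

    def send(method, path, body):
        state["calls"].append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        if "scroll=" in path:  # the initial search request
            state["size"] = int(path.split("size=")[1])
            state["pos"] = 0
        batch = docs[state["pos"]: state["pos"] + state["size"]]
        state["pos"] += len(batch)
        return {"_scroll_id": "fake-id", "hits": {"hits": batch}}

    return send, state


send, state = fake_send_factory([{"_id": str(i)} for i in range(5)])
ids = [hit["_id"] for hit in scroll_all(send, size=2)]
print(ids)  # ['0', '1', '2', '3', '4']
```

The loop stops when a batch comes back empty, and the finally block clears the scroll even if the caller stops iterating early.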