Elasticsearch Tutorial: Search Basics, the Deep Paging Problem, and Its Solution
Searching all indices
GET /_search
Response
{ "took": 6, "timed_out": false, "_shards": { "total": 16, "successful": 16, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 1, "hits": [ { "_index": ".kibana", "_type": "doc", "_id": "config:6.4.0", "_score": 1, "_source": { "type": "config", "updated_at": "2018-09-18T09:30:18.949Z", "config": { "buildNum": 17929, "telemetry:optIn": true } } }, { "_index": "blog", "_type": "article", "_id": "eTmX5mUBtZGWutGW0TNs", "_score": 1, "_source": { "title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "priority": 10, "tags": [ "announce", "elasticsearch", "release" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "J3fLFWYBBoLynJN1-kOG", "_score": 1, "_source": { "name": "test yagao", "desc": "youxiao fangzhu" } }, { "_index": "blog", "_type": "article", "_id": "1", "_score": 1, "_source": { "id": "1", "title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "priority": 10, "tags": [ "announce", "elasticsearch", "release" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "KXfSFWYBBoLynJN1TUPo", "_score": 1, "_source": { "name": "test yagao2", "desc": "youxiao fangzhu2" } }, { "_index": "index", "_type": "fulltext", "_id": "1", "_score": 1, "_source": { "content": "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首" } }, { "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1, "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] } } ] } }
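Meaning of the response fields
took: how many milliseconds the whole search request took.
hits.total: the total number of documents that matched the search, not the number of documents returned in this response.
hits.max_score: the highest relevance score among all matches. Every document gets a _score for the search; the more relevant the document, the higher its _score and the higher it ranks.
hits.hits: the matching documents themselves, including their full _source. By default only the top 10 are returned, sorted by _score in descending order.
_shards: how many shards the request touched and how many succeeded or failed. A shard counts as failed only when its primary and all of its replicas are down, and a failed shard does not affect the others. By default a search request covers every shard of the index; since each primary shard may have one or more replica shards, the request for a given shard can be served by the primary or by any one of its replicas.
timeout: there is no timeout by default. When a search goes very deep and takes a long time, we can set a timeout; once it is reached, the search stops and returns whatever results have been collected so far, and timed_out in the response is true. Values look like timeout=10ms, timeout=1s, timeout=1m. For example:
GET /_search?timeout=10m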
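To make the field descriptions concrete, here is a minimal Python sketch that pulls those fields out of a parsed response. The helper name summarize_search_response and the trimmed sample document are made up for illustration. It assumes the 6.x response shape shown above, where hits.total is a plain integer; from 7.0 on, hits.total is an object with value and relation keys, so the helper would need adjusting there.

```python
import json


def summarize_search_response(response: dict) -> dict:
    """Collect the response fields discussed above into one flat dict."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": shards["failed"],
        # Total matches in the cluster, not the size of this page.
        "total_matches": hits["total"],
        # Number of documents actually returned in this response.
        "returned": len(hits["hits"]),
        "max_score": hits["max_score"],
    }


# A trimmed-down response in the 6.x shape (illustrative data only).
sample = json.loads("""
{"took": 6, "timed_out": false,
 "_shards": {"total": 16, "successful": 16, "skipped": 0, "failed": 0},
 "hits": {"total": 8, "max_score": 1,
          "hits": [{"_index": "blog", "_type": "article", "_id": "1",
                    "_score": 1, "_source": {}}]}}
""")

summary = summarize_search_response(sample)
print(summary)
```

Note how total_matches (8) and returned (1) differ: the first counts every match, the second only what this response carries.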
Searching a specific index
GET /blog/_search
Searching several specific indices
GET /.kibana,blog/_search
Matching multiple indices with a wildcard
GET /*log/_search
Response
{ "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "blog", "_type": "article", "_id": "eTmX5mUBtZGWutGW0TNs", "_score": 1, "_source": { "title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "priority": 10, "tags": [ "announce", "elasticsearch", "release" ] } }, { "_index": "blog", "_type": "article", "_id": "1", "_score": 1, "_source": { "id": "1", "title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "priority": 10, "tags": [ "announce", "elasticsearch", "release" ] } } ] } }
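As a rough illustration of why only the blog index appears in that response, the sketch below applies the same pattern to the index names seen earlier in this article. It uses Python's fnmatch module as a stand-in; Elasticsearch resolves wildcards on the server with its own rules, so treat this as an approximation that only covers the simple * case. The function name match_indices is hypothetical.

```python
from fnmatch import fnmatchcase


def match_indices(pattern: str, index_names: list[str]) -> list[str]:
    """Return the index names that a simple '*' pattern would select."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Index names that appear in the "search all indices" response above.
indices = [".kibana", "blog", "ecommerce", "index"]

matched = match_indices("*log", indices)
print(matched)  # ['blog']
```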
Searching a specific type within an index
GET /index1/type1/_search
Searching several types within an index
Indices created in 6.0 or later can contain at most one type, so this form serves no purpose from that version on.
GET /index1/type1,type2/_search
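Searching several types across several indices
Indices created in 6.0 or later can contain at most one type, so this form serves no purpose from that version on.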
GET /index1,index2/type1,type2/_search
Searching specific types across all indices
Indices created in 6.0 or later can contain at most one type, so this form serves no purpose from that version on.
GET /_all/type1,type2/_search
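Paginated search
Append "?from=0&size=2" to the request.
Note: once the data set grows past 50,000 documents, use the scroll approach described below instead. Also be aware that, by default, Elasticsearch rejects any request where from + size exceeds 10,000 (the index.max_result_window setting).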
GET /_search?from=0&size=2
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 16,
"successful": 16,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 1,
"hits": [
{
"_index": ".kibana",
"_type": "doc",
"_id": "config:6.4.0",
"_score": 1,
"_source": {
"type": "config",
"updated_at": "2018-09-18T09:30:18.949Z",
"config": {
"buildNum": 17929,
"telemetry:optIn": true
}
}
},
{
"_index": "blog",
"_type": "article",
"_id": "eTmX5mUBtZGWutGW0TNs",
"_score": 1,
"_source": {
"title": "New version of Elasticsearch released!",
"content": "Version 1.0 released today!",
"priority": 10,
"tags": [
"announce",
"elasticsearch",
"release"
]
}
}
]
}
}
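Applications usually think in page numbers rather than offsets, so a small conversion step is handy. The sketch below turns a 1-based page number into the from and size parameters used above; page_params and search_url are hypothetical helper names, and 1-based page numbering is an assumption of this example.

```python
from urllib.parse import urlencode


def page_params(page: int, size: int) -> dict:
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    return {"from": (page - 1) * size, "size": size}


def search_url(page: int, size: int) -> str:
    """Build the request path for one page of a search over all indices."""
    return "/_search?" + urlencode(page_params(page, size))


print(search_url(1, 2))  # /_search?from=0&size=2
print(search_url(3, 2))  # /_search?from=4&size=2
```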
The deep paging problem
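Put simply, deep paging means paging very far into the results. Suppose there are 60,000 documents in total, spread over 3 primary shards with 20,000 on each, and each page holds 10 documents. To fetch page 1001 you actually need documents 10001-10010 (from=10000, size=10). How is that done? The request may first land on a node that holds none of this index's shards; that node then acts as the coordinating node and forwards the search request to the nodes that hold the index's three shards.
To serve this page out of 60,000 documents, each shard has to pull its own top 10,010 entries out of its 20,000 documents: not 10, but 10,010. All 3 shards return 10,010 entries to the coordinating node, which therefore receives 30,030 entries in total, sorts them, and keeps only the 10 it needs, which are the 10 documents of page 1001.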
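As an analogy, take 60 balls numbered 1 to 60 and drop them at random into three baskets, where each basket keeps its own balls in sorted order. To get the 10th through 12th balls overall, you first have to take the first 12 balls out of every basket (the balls were distributed at random, so any basket could hold the ones you want), 36 balls in all, then sort that combined pile and pick balls 10 to 12 from the result.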
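The ball analogy can be simulated directly. The following Python sketch is a toy model, not Elasticsearch's actual implementation: it sorts plain integers in ascending order, whereas a real search ranks documents by _score in descending order, and the function names are invented for this example. Each "shard" hands back its own top from + size entries and the "coordinating node" merges them and keeps the requested window.

```python
import heapq
import random


def shard_top(shard_docs, from_, size):
    """Each shard returns its own first from_ + size entries, sorted."""
    return sorted(shard_docs)[: from_ + size]


def coordinator_page(shards, from_, size):
    """Merge the per-shard candidates, then keep only the requested window."""
    candidates = [shard_top(docs, from_, size) for docs in shards]
    gathered = sum(len(c) for c in candidates)
    merged = list(heapq.merge(*candidates))
    return merged[from_ : from_ + size], gathered


def entries_gathered(num_shards, from_, size):
    """Entries sent to the coordinator, assuming every shard holds
    at least from_ + size documents."""
    return num_shards * (from_ + size)


# 60 numbered balls dropped at random into three baskets of 20.
rng = random.Random(42)
balls = list(range(1, 61))
rng.shuffle(balls)
baskets = [balls[0:20], balls[20:40], balls[40:60]]

# Balls 10-12 correspond to from=9, size=3.
page, gathered = coordinator_page(baskets, from_=9, size=3)
print(page, gathered)                 # [10, 11, 12] 36

# The 60,000-document example: 3 shards, from=10000, size=10.
print(entries_gathered(3, 10000, 10))  # 30030
```

Only 3 of the 36 gathered balls end up in the answer; the other 33 are fetched and sorted just to be thrown away, and that waste grows linearly with from.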
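Drawbacks
When a search goes too deep, the coordinating node has to hold a large amount of data, sort all of it, and only then pick out the requested page. The process consumes network bandwidth, memory, and CPU. This is the performance problem of deep paging, and we should avoid deep paging operations wherever possible.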
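Solution
To address the problem above, Elasticsearch offers the scroll approach. Each query returns a scroll_id, and the next page is fetched using that scroll_id. You can think of the scroll_id as the cursor found in relational databases. The drawback of scroll is that it only moves forward: you can fetch the next page but never go back to a previous one.
In practice, once a result set exceeds 50,000 documents, users are unlikely to read every one of them; what they want is the outcome of analyzing and processing the data. Below 50,000 documents, paging with from and size is fine. So what scenario is scroll meant for? Most likely, retrieving an entire data set in batches.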
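Implementation steps
1. First fetch the first 2 documents and obtain a scroll_id. The 3s here is how long the scroll is kept alive: if the next page is not requested within 3 seconds, the scroll_id expires.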
GET /_search?scroll=3s&size=2
Response
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAPIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADyhZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA8sWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPJFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADzRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA8wWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAADzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9cWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPQFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAD0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9UWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAD0hZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAAA9MWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAPUFnhEVi1HVGViVFJxYzdlczBoRFI0clE=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 16,
"successful": 16,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 1,
"hits": [
{
"_index": ".kibana",
"_type": "doc",
"_id": "config:6.4.0",
"_score": 1,
"_source": {
"type": "config",
"updated_at": "2018-09-18T09:30:18.949Z",
"config": {
"buildNum": 17929,
"telemetry:optIn": true
}
}
},
{
"_index": "blog",
"_type": "article",
"_id": "eTmX5mUBtZGWutGW0TNs",
"_score": 1,
"_source": {
"title": "New version of Elasticsearch released!",
"content": "Version 1.0 released today!",
"priority": 10,
"tags": [
"announce",
"elasticsearch",
"release"
]
}
}
]
}
}
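2. Request the next page. Note that this request does not name an index; it only needs the scroll_id and the keep-alive time for this round.
In short, to reach page N, repeat the request N times. The scroll_id often stays the same from one request to the next, but that is not guaranteed, so always pass the _scroll_id from the most recent response.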
GET /_search/scroll?scroll=3s&scroll_id=DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAWtFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFuxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABa4WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAWwFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFrxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbIWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAWxFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFvBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbMWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW0FnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFtRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbYWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW3FnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFuBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABbkWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAW6FnhEVi1HVGViVFJxYzdlczBoRFI0clE=
Or
POST /_search/scroll
{
"scroll" : "3s",
"scroll_id":"DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAXIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF1RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABccWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFyRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABcoWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXLFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABc0WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdAWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXSFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdQWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXTFnhEVi1HVGViVFJxYzdlczBoRFI0clE="
}
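Deleting a specific scroll_id
When the search is finished, or the scroll has reached the end, we can delete the scroll_id explicitly instead of waiting for it to expire.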
DELETE /_search/scroll/DnF1ZXJ5VGhlbkZldGNoEAAAAAAAAAXIFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF1RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABccWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXWFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFyRZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABcoWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXLFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzBZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABc0WeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXOFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAFzxZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdAWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXSFnhEVi1HVGViVFJxYzdlczBoRFI0clEAAAAAAAAF0RZ4RFYtR1RlYlRScWM3ZXMwaERSNHJRAAAAAAAABdQWeERWLUdUZWJUUnFjN2VzMGhEUjRyUQAAAAAAAAXTFnhEVi1HVGViVFJxYzdlczBoRFI0clE=
Deleting all scroll_ids
DELETE /_search/scroll/_all
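Putting the scroll steps together, the loop below is a minimal Python sketch of the whole lifecycle: open the scroll, keep requesting the next batch with the most recent scroll ID, and delete the scroll at the end. To keep it runnable without a cluster, the HTTP layer is abstracted behind a hypothetical send(method, path, body) callable, and make_fake_send is an in-memory stand-in written only for this example; in real use, send would wrap whatever HTTP client you prefer. The request paths are the ones used in this article.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def scroll_all(send: Send, keep_alive: str = "3s", size: int = 2) -> Iterator[dict]:
    """Yield every hit batch by batch, then delete the scroll."""
    response = send("GET", f"/_search?scroll={keep_alive}&size={size}", None)
    scroll_id = response["_scroll_id"]
    try:
        # An empty hits array means the scroll has reached the end.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = send("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            # Always carry forward the most recently returned ID.
            scroll_id = response["_scroll_id"]
    finally:
        send("DELETE", f"/_search/scroll/{scroll_id}", None)


def make_fake_send(docs, size):
    """In-memory stand-in for a cluster, used only to exercise the loop."""
    state = {"pos": 0, "calls": []}

    def send(method, path, body):
        state["calls"].append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        batch = docs[state["pos"]: state["pos"] + size]
        state["pos"] += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}

    return send, state


docs = [{"_id": str(i)} for i in range(5)]
fake_send, state = make_fake_send(docs, size=2)
fetched = list(scroll_all(fake_send, keep_alive="3s", size=2))
print(len(fetched), state["calls"][-1])
```

Because the cleanup sits in a finally clause, the scroll is deleted even if the caller stops iterating early and closes the generator.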