Facet Statistics (Called Aggregations in Later Versions)

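Although the official site stresses that facets will be removed from Elasticsearch in a future release and recommends aggregations instead, I still use facets at work. While reading "Mastering Elasticsearch" I came across its introduction to facets, which is practical and easy to follow, so I translated an excerpt for anyone who needs it.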

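When using the Elasticsearch faceting mechanism, keep one thing in mind: facet results are computed only over the query results. If you add a filter outside the query element, that filter does not restrict the documents the facet statistics are computed on.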

Let's look at an example.

  First, use the following commands to index a few documents into the books index:

curl -XPUT 'localhost:9200/books/book/1' -d '{
    "id": "1", "title": "Test book 1", "category": "book",
    "price": 29.99
}'

curl -XPUT 'localhost:9200/books/book/2' -d '{
    "id": "2", "title": "Test book 2", "category": "book",
    "price": 39.99
}'

curl -XPUT 'localhost:9200/books/book/3' -d '{
    "id": "3", "title": "Test comic 1", "category": "comic",
    "price": 11.99
}'

curl -XPUT 'localhost:9200/books/book/4' -d '{
    "id": "4", "title": "Test comic 2", "category": "comic",
    "price": 15.99
}'

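Let's see how faceting works when a query and a filter are used together. We will run a simple query that returns every document in the books index. We also include a filter that limits the results to the book category, and a range facet on the price field to see how many documents cost less than 30 and how many cost 30 or more. The whole request looks like this: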

{
    "query": {
        "match_all": {}
    },
    "filter": {
        "term": {
            "category": "book"
        }
    },
    "facets": {
        "price": {
            "range": {
                "field": "price",
                "ranges": [
                    { "to": 30 },
                    { "from": 30 }
                ]
            }
        }
    }
}

Running it returns the following result:

{
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "books",
                "_type": "book",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "id": "1",
                    "title": "Test book 1",
                    "category": "book",
                    "price": 29.99
                }
            },
            {
                "_index": "books",
                "_type": "book",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "id": "2",
                    "title": "Test book 2",
                    "category": "book",
                    "price": 39.99
                }
            }
        ]
    },
    "facets": {
        "price": {
            "_type": "range",
            "ranges": [
                {
                    "to": 30.0,
                    "count": 3,
                    "min": 11.99,
                    "max": 29.99,
                    "total_count": 3,
                    "total": 57.97,
                    "mean": 19.323333333333334
                },
                {
                    "from": 30.0,
                    "count": 1,
                    "min": 39.99,
                    "max": 39.99,
                    "total_count": 1,
                    "total": 39.99,
                    "mean": 39.99
                }
            ]
        }
    }
}

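As the result shows, although the filter keeps only the documents whose category field is book, the facet was not computed on just those documents. It was computed on every document in the books index (because of the match_all query). In other words, the faceting mechanism ignores the filter when it computes its counts. But what if the filter is part of the query, as in a filtered query? Let's continue with the example.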

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": {
                    "category": "book"
                }
            }
        }
    },
    "facets": {
        "price": {
            "range": {
                "field": "price",
                "ranges": [
                    { "to": 30 },
                    { "from": 30 }
                ]
            }
        }
    }
}

The result:

{
    ...
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "books",
                "_type": "book",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "id": "1",
                    "title": "Test book 1",
                    "category": "book",
                    "price": 29.99
                }
            },
            {
                "_index": "books",
                "_type": "book",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "id": "2",
                    "title": "Test book 2",
                    "category": "book",
                    "price": 39.99
                }
            }
        ]
    },
    "facets": {
        "price": {
            "_type": "range",
            "ranges": [
                {
                    "to": 30.0,
                    "count": 1,
                    "min": 29.99,
                    "max": 29.99,
                    "total_count": 1,
                    "total": 29.99,
                    "mean": 29.99
                },
                {
                    "from": 30.0,
                    "count": 1,
                    "min": 39.99,
                    "max": 39.99,
                    "total_count": 1,
                    "total": 39.99,
                    "mean": 39.99
                }
            ]
        }
    }
}

This time the filter does limit the documents the facet is computed on.
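The difference between the two requests so far can be checked with a small in-memory model. The sketch below is plain Python, not Elasticsearch code: `BOOKS` mirrors the four indexed documents, and the helper names are invented for illustration. It assumes the range facet treats `to` as exclusive and `from` as inclusive.

```python
# In-memory stand-in for the four documents indexed above.
BOOKS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def is_book(doc):
    return doc["category"] == "book"


def price_range_counts(docs, boundary=30):
    """Count documents below the boundary and at or above it."""
    below = sum(1 for d in docs if d["price"] < boundary)
    return {"to_30": below, "from_30": len(docs) - below}


def search_with_top_level_filter(docs):
    # query = match_all, so the facet sees every document;
    # the top-level filter only trims the returned hits.
    query_results = list(docs)
    hits = [d for d in query_results if is_book(d)]
    return hits, price_range_counts(query_results)


def search_with_filtered_query(docs):
    # The filter is part of the query, so hits and facet share one document set.
    query_results = [d for d in docs if is_book(d)]
    return query_results, price_range_counts(query_results)


hits_a, facet_a = search_with_top_level_filter(BOOKS)
hits_b, facet_b = search_with_filtered_query(BOOKS)
print(len(hits_a), facet_a)
print(len(hits_b), facet_b)
```

Both variants return the same two hits; only the set of documents feeding the facet changes.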

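Now suppose we want to compute the facet only for books whose title field contains "2". We could add a second filter to the query, but that would also narrow the search results, which is not what we want. What we need is a facet filter.

Place facet_filter at the same level as the facet type (range in this example); it lets us restrict the documents the facet is computed on. For example, to restrict the facet calculation to documents whose title field contains "2", the request can be changed to: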

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": {
                    "category": "book"
                }
            }
        }
    },
    "facets": {
        "price": {
            "range": {
                "field": "price",
                "ranges": [
                    { "to": 30 },
                    { "from": 30 }
                ]
            },
            "facet_filter": {
                "term": {
                    "title": "2"
                }
            }
        }
    }
}

The result:

{
    ...
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "books",
                "_type": "book",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "id": "1",
                    "title": "Test book 1",
                    "category": "book",
                    "price": 29.99
                }
            },
            {
                "_index": "books",
                "_type": "book",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "id": "2",
                    "title": "Test book 2",
                    "category": "book",
                    "price": 39.99
                }
            }
        ]
    },
    "facets": {
        "price": {
            "_type": "range",
            "ranges": [
                {
                    "to": 30.0,
                    "count": 0,
                    "total_count": 0,
                    "total": 0.0,
                    "mean": 0.0
                },
                {
                    "from": 30.0,
                    "count": 1,
                    "min": 39.99,
                    "max": 39.99,
                    "total_count": 1,
                    "total": 39.99,
                    "mean": 39.99
                }
            ]
        }
    }
}

As shown above, the facet is now computed on a single document, while the query results are unchanged.
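The same kind of in-memory model illustrates why only one document reaches the facet. This is a sketch in plain Python, not Elasticsearch code. The `title_tokens` helper is a hypothetical and very rough approximation of text analysis (lowercase, split on whitespace), assumed here only so that the term "2" can match a title.

```python
# In-memory stand-in for the four documents indexed above.
BOOKS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def title_tokens(doc):
    # Rough approximation of analysis: lowercase and split on whitespace.
    return doc["title"].lower().split()


def search_with_facet_filter(docs, term="2", boundary=30):
    # Step 1: the filtered query decides which documents are hits.
    hits = [d for d in docs if d["category"] == "book"]
    # Step 2: facet_filter narrows the facet input further, without touching hits.
    facet_docs = [d for d in hits if term in title_tokens(d)]
    below = sum(1 for d in facet_docs if d["price"] < boundary)
    return hits, {"to_30": below, "from_30": len(facet_docs) - below}


hits, facet = search_with_facet_filter(BOOKS)
print([d["id"] for d in hits], facet)
```

In this model "Test comic 2" also contains the term "2", but it never reaches the facet because the filtered query has already excluded it.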

Now suppose we want to query only the documents whose category field is "book", but compute the facet over every document in the index. What should we do?

Here is the request:

{
    "query": {
        "term": {
            "category": "book"
        }
    },
    "facets": {
        "price": {
            "range": {
                "field": "price",
                "ranges": [
                    { "to": 30 },
                    { "from": 30 }
                ]
            },
            "global": true
        }
    }
}

The result:

{
    ...
    "hits": {
        "total": 2,
        "max_score": 0.30685282,
        "hits": [
            {
                "_index": "books",
                "_type": "book",
                "_id": "1",
                "_score": 0.30685282,
                "_source": {
                    "id": "1",
                    "title": "Test book 1",
                    "category": "book",
                    "price": 29.99
                }
            },
            {
                "_index": "books",
                "_type": "book",
                "_id": "2",
                "_score": 0.30685282,
                "_source": {
                    "id": "2",
                    "title": "Test book 2",
                    "category": "book",
                    "price": 39.99
                }
            }
        ]
    },
    "facets": {
        "price": {
            "_type": "range",
            "ranges": [
                {
                    "to": 30.0,
                    "count": 3,
                    "min": 11.99,
                    "max": 29.99,
                    "total_count": 3,
                    "total": 57.97,
                    "mean": 19.323333333333334
                },
                {
                    "from": 30.0,
                    "count": 1,
                    "min": 39.99,
                    "max": 39.99,
                    "total_count": 1,
                    "total": 39.99,
                    "mean": 39.99
                }
            ]
        }
    }
}
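The statistics in the first range of this response (count, min, max, total) can be recomputed by hand. The following is a plain Python sketch over an in-memory copy of the four documents, with invented helper names; it models a global facet as one that reads the whole index and ignores the query.

```python
# In-memory stand-in for the four documents indexed above.
BOOKS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def range_stats(prices):
    """Summarise one range bucket the way the facet response does."""
    if not prices:
        return {"count": 0, "total": 0.0, "mean": 0.0}
    total = sum(prices)
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "total": round(total, 2),
        "mean": total / len(prices),
    }


def search_with_global_facet(docs, boundary=30):
    # The term query decides the hits ...
    hits = [d for d in docs if d["category"] == "book"]
    # ... while a global facet is fed every document in the index.
    below = range_stats([d["price"] for d in docs if d["price"] < boundary])
    above = range_stats([d["price"] for d in docs if d["price"] >= boundary])
    return hits, below, above


hits, below, above = search_with_global_facet(BOOKS)
print(len(hits), below, above)
```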

That is the benefit global brings to facets: the facet is computed over the whole index, regardless of the query.
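For readers on later versions where facets are no longer available, the three scoping behaviours described in this post have rough counterparts in the aggregations DSL. The sketch below is not taken from the book. It builds the request bodies as Python dictionaries, based on my understanding of `post_filter`, the `filter` aggregation, and the `global` aggregation. The aggregation names `price`, `title_has_2`, and `all_docs` are arbitrary, and the exact syntax should be checked against the documentation for the version in use.

```python
import json

# Shared range definition, same boundaries as the facet examples.
PRICE_RANGES = {
    "range": {"field": "price", "ranges": [{"to": 30}, {"from": 30}]}
}

# 1) Filter applied to hits only: aggregations still see all query results.
post_filter_body = {
    "query": {"match_all": {}},
    "post_filter": {"term": {"category": "book"}},
    "aggs": {"price": PRICE_RANGES},
}

# 2) Counterpart of facet_filter: a filter aggregation wrapping the range.
filter_agg_body = {
    "query": {"bool": {"filter": {"term": {"category": "book"}}}},
    "aggs": {
        "title_has_2": {
            "filter": {"term": {"title": "2"}},
            "aggs": {"price": PRICE_RANGES},
        }
    },
}

# 3) Counterpart of "global": true: a global aggregation wrapping the range.
global_agg_body = {
    "query": {"term": {"category": "book"}},
    "aggs": {
        "all_docs": {"global": {}, "aggs": {"price": PRICE_RANGES}},
    },
}

for name, body in [
    ("post_filter", post_filter_body),
    ("filter aggregation", filter_agg_body),
    ("global aggregation", global_agg_body),
]:
    print(name)
    print(json.dumps(body, indent=2))
```

Each printed body could be sent to the `_search` endpoint of the books index in place of the facet requests above.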