
Python Elasticsearch: grouped statistics

Aggregations

    # company_id is assumed to be defined by the caller
    query = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"company_id": company_id}},
                    {"term": {"subject_type": 2}},
                ],
                "must_not": [],
                "should": [],
            }
        },
        "size": 0,  # number of hits to return; 0 returns only the aggregations
        "aggs": {
            "group_by_real_name": {
                "terms": {"field": "real_name.keyword"},
                "aggs": {  # nested sub-aggregation
                    "group_by_screen_id": {
                        "terms": {"field": "screen_id"},
                    }
                },
            }
        },
    }
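Nested terms aggregations come back as nested `buckets` lists under the `aggregations` key of the response. The following is a minimal sketch of flattening such a response into rows. The helper name `flatten_buckets` and the sample response are illustrative assumptions, not part of the original code; the sample only mimics the shape a search with the query above is expected to return.

```python
def flatten_buckets(response):
    """Turn the nested terms buckets into (real_name, screen_id, doc_count) rows."""
    rows = []
    outer = response["aggregations"]["group_by_real_name"]["buckets"]
    for name_bucket in outer:
        for screen_bucket in name_bucket["group_by_screen_id"]["buckets"]:
            rows.append(
                (name_bucket["key"], screen_bucket["key"], screen_bucket["doc_count"])
            )
    return rows


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "hits": {"hits": []},  # empty because the query sets "size": 0
    "aggregations": {
        "group_by_real_name": {
            "buckets": [
                {"key": "Alice", "doc_count": 3,
                 "group_by_screen_id": {"buckets": [
                     {"key": 1, "doc_count": 2},
                     {"key": 2, "doc_count": 1},
                 ]}},
                {"key": "Bob", "doc_count": 1,
                 "group_by_screen_id": {"buckets": [
                     {"key": 1, "doc_count": 1},
                 ]}},
            ]
        }
    },
}

for row in flatten_buckets(sample_response):
    print(row)
```

Note that a terms aggregation returns only the top 10 buckets by default; adding a `"size"` entry inside `"terms"` raises that limit.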

Setting fielddata=true:

Run this from a system terminal:

curl -i -H "Content-Type:application/json" -XPUT 127.0.0.1:9200/your_index/_mapping/your_type/?pretty  -d'{"your_type":{"properties":{"your_field_name":{"type":"text","fielddata":true}}}}'

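Replace the placeholders above (your_index, your_type and your_field_name) with your own index, type and field. On Linux this seems to work directly; on Windows you apparently need to install curl first. I have never actually set fielddata=true myself; I used the real_name.keyword approach shown above instead.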

curl -i -H "Content-Type:application/json" -XPUT 127.0.0.1:9200/event/_mapping/koala-index/?pretty  -d'{"koala-index":{"properties":{"real_name":{"type":"text","fielddata":true}}}}'
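If curl is not available (for example on Windows), the same request can be sent from Python's standard library. This is only a sketch: the helper name `build_fielddata_request` is hypothetical, and the URL simply mirrors the curl command above, which assumes an Elasticsearch version that still uses mapping types in the `_mapping` path. Newer versions dropped mapping types, so the path and body would need adjusting there.

```python
import json
import urllib.request


def build_fielddata_request(host, index, doc_type, field):
    """Build (but do not send) the PUT request that enables fielddata on a text field."""
    url = f"http://{host}/{index}/_mapping/{doc_type}/?pretty"
    body = {doc_type: {"properties": {field: {"type": "text", "fielddata": True}}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Same values as the concrete curl command above.
req = build_fielddata_request("127.0.0.1:9200", "event", "koala-index", "real_name")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (requires a reachable Elasticsearch node):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and JSON body before touching a live cluster.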
 

Distinct counts:

# company_id, page and size are assumed to be defined by the caller
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"company_id": company_id}},
            ],
            "must_not": [],
            "should": [],
        }
    },
    "aggs": {
        "group_by_screen_id": {
            "terms": {"field": "screen_id"},
            "aggs": {
                "group_by_subject_type": {
                    "terms": {"field": "subject_type"},
                    "aggs": {
                        "distinct_subject_ids": {  # distinct count
                            "cardinality": {"field": "subject_id"}
                        }
                    },
                }
            },
        }
    },
    "sort": [{"timestamp": {"order": "desc"}}],
    "from": (page - 1) * size,
    # Paginates the hits; a dict keeps only one "size" key, so set it to 0 to return aggregations only.
    "size": size,
}
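The distinct counts sit three levels deep in the response: one bucket per `screen_id`, inside it one bucket per `subject_type`, and inside that the `value` of the cardinality aggregation. A minimal sketch of reading them out follows; the helper name `distinct_counts` and the sample response are illustrative assumptions that only mimic the expected response shape.

```python
def distinct_counts(response):
    """Map (screen_id, subject_type) to the number of distinct subject_id values."""
    counts = {}
    for screen in response["aggregations"]["group_by_screen_id"]["buckets"]:
        for subject in screen["group_by_subject_type"]["buckets"]:
            key = (screen["key"], subject["key"])
            counts[key] = subject["distinct_subject_ids"]["value"]
    return counts


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "aggregations": {
        "group_by_screen_id": {
            "buckets": [
                {"key": 1, "doc_count": 10,
                 "group_by_subject_type": {"buckets": [
                     {"key": 0, "doc_count": 6, "distinct_subject_ids": {"value": 5}},
                     {"key": 2, "doc_count": 4, "distinct_subject_ids": {"value": 3}},
                 ]}},
                {"key": 2, "doc_count": 1,
                 "group_by_subject_type": {"buckets": [
                     {"key": 0, "doc_count": 1, "distinct_subject_ids": {"value": 1}},
                 ]}},
            ]
        }
    }
}

for key, value in distinct_counts(sample_response).items():
    print(key, value)
```

Keep in mind that the cardinality aggregation is approximate (it is based on HyperLogLog++), so on high-cardinality fields the reported value can differ slightly from an exact distinct count.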