Elasticsearch Search Module: Cross Cluster Search
Introduction to Cross Cluster Search
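The cross-cluster search feature allows any node to act as a federated client across multiple clusters. Unlike a tribe node, a cross-cluster search node does not join the remote cluster. Instead, it connects to the remote cluster in a lightweight fashion in order to execute federated search requests.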
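Remote cluster
Before using cross-cluster search, you need to understand remote clusters.
A remote cluster is referenced by a name and a list of seed nodes. When a remote cluster is registered, its cluster state is retrieved from one of the seed nodes so that, by default, up to three eligible nodes are selected as gateway nodes. Each node in the cluster that has remote clusters configured connects to one or more gateway nodes and uses them to federate search requests to the remote cluster.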
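Remote clusters can be specified globally through cluster settings (which can be updated dynamically), or locally on individual nodes in elasticsearch.yml.
If a node configures a remote cluster through its elasticsearch.yml file, the remote cluster can be reached only through that node. In other words, a federated search reaches the remote cluster only if it is sent to that node. Remote clusters registered through the cluster settings API can be reached from every node in the cluster (every node with cluster.remote.connect: true).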
Configuration through elasticsearch.yml:
cluster:
    remote:
        cluster_one:
            seeds: 127.0.0.1:9300
        cluster_two:
            seeds: 127.0.0.1:9301
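cluster_one and cluster_two are arbitrary cluster aliases representing the connection to each cluster. These names are later used to distinguish between local and remote indices.
Configuration through the cluster settings API: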
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [ "127.0.0.1:9300" ]
        },
        "cluster_two": {
          "seeds": [ "127.0.0.1:9301" ]
        },
        "cluster_three": {
          "seeds": [ "127.0.0.1:9302" ]
        }
      }
    }
  }
}
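The request body above follows a regular shape, so it can be generated from a mapping of alias to seed list. The following is a minimal sketch that only builds and prints the JSON body. The helper name build_remote_cluster_settings is hypothetical, and sending the body to a cluster is left out on purpose.

```python
import json


def build_remote_cluster_settings(seeds_by_alias):
    """Build a body for PUT _cluster/settings that registers remote clusters.

    seeds_by_alias maps a cluster alias to a list of "host:port" seed nodes.
    This helper is illustrative only and is not part of any Elasticsearch client.
    """
    remote = {alias: {"seeds": list(seeds)} for alias, seeds in seeds_by_alias.items()}
    return {"persistent": {"cluster": {"remote": remote}}}


body = build_remote_cluster_settings({
    "cluster_one": ["127.0.0.1:9300"],
    "cluster_two": ["127.0.0.1:9301"],
    "cluster_three": ["127.0.0.1:9302"],
})

# Print the JSON that would be sent as the request body.
print(json.dumps(body, indent=2))
```

Keeping the aliases in a single mapping makes it easier to keep the registered names and the names used later in index expressions in sync.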
Removing a remote cluster:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_three": {
          "seeds": null
        }
      }
    }
  }
}
This removes cluster_three and keeps cluster_one and cluster_two.
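When building the removal request from code, the detail to get right is that the seeds value must serialize as JSON null. A small sketch, assuming a hypothetical helper named build_remote_cluster_removal; Python's None is what json.dumps turns into null.

```python
import json


def build_remote_cluster_removal(alias):
    """Build a body for PUT _cluster/settings that removes one remote cluster.

    Setting "seeds" to None serializes to JSON null, which clears the setting.
    Illustrative helper, not part of any Elasticsearch client.
    """
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": None}}}}}


payload = json.dumps(build_remote_cluster_removal("cluster_three"))
print(payload)
# {"persistent": {"cluster": {"remote": {"cluster_three": {"seeds": null}}}}}
```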
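Remote cluster settings:
cluster.remote.connections_per_cluster
The number of gateway nodes to connect to per remote cluster. The default is 3.
cluster.remote.initial_connect_timeout
The time to wait for connections to the remote nodes to be established when the node starts. The default is 30s.
cluster.remote.node.attr
A node attribute that filters which nodes in the remote cluster are eligible as gateway nodes. For example, if cluster.remote.node.attr is set to gateway, only nodes with the node attribute node.attr.gateway: true are connected to.
cluster.remote.connect
By default, any node in the cluster can act as a federated client and connect to remote clusters. cluster.remote.connect can be set to false (the default is true) to prevent certain nodes from connecting to remote clusters.
cluster.remote.${cluster_alias}.skip_unavailable
A per-cluster boolean that allows a search to skip the cluster with the given alias when it is unavailable. The default is false.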
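To make the interaction between cluster.remote.node.attr and cluster.remote.connections_per_cluster concrete, here is a rough sketch of the selection rule as described in the text: keep only nodes whose attribute is "true", then stop at the configured limit. This is not Elasticsearch's actual selection code. The function name, the node dictionaries, and the node names are all assumptions made for illustration.

```python
def pick_gateway_nodes(nodes, node_attr=None, connections_per_cluster=3):
    """Pick gateway node names from a list of remote nodes.

    nodes: list of dicts with a "name" and an "attributes" dict (hypothetical shape).
    node_attr: attribute name to require, mirroring cluster.remote.node.attr.
    connections_per_cluster: upper bound on the number of gateway nodes.
    """
    selected = []
    for node in nodes:
        attributes = node.get("attributes", {})
        # When an attribute filter is set, skip nodes that do not carry it as "true".
        if node_attr is not None and attributes.get(node_attr) != "true":
            continue
        selected.append(node["name"])
        if len(selected) == connections_per_cluster:
            break
    return selected


# Hypothetical remote nodes; only some are tagged with node.attr.gateway: true.
remote_nodes = [
    {"name": "node-a", "attributes": {"gateway": "true"}},
    {"name": "node-b", "attributes": {}},
    {"name": "node-c", "attributes": {"gateway": "true"}},
    {"name": "node-d", "attributes": {"gateway": "true"}},
    {"name": "node-e", "attributes": {"gateway": "true"}},
]

print(pick_gateway_nodes(remote_nodes))                       # no filter, first three nodes
print(pick_gateway_nodes(remote_nodes, node_attr="gateway"))  # only tagged nodes, at most three
```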
Querying with cross-cluster search
To search the twitter index on the remote cluster cluster_one, separate the cluster alias and the index name with a colon:
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
In contrast to the tribe feature, cross-cluster search can also search indices with the same name on different clusters:
GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
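The index expression in these requests is a comma-separated list in which remote indices carry an alias prefix and local indices do not. A minimal sketch of composing such a path follows; the helper names qualify_index and search_path are hypothetical.

```python
def qualify_index(index, cluster_alias=None):
    """Return "alias:index" for a remote index, or the bare name for a local one."""
    return index if cluster_alias is None else f"{cluster_alias}:{index}"


def search_path(targets):
    """Build a _search path from (cluster_alias_or_None, index) pairs."""
    expression = ",".join(qualify_index(index, alias) for alias, index in targets)
    return f"/{expression}/_search"


# Remote twitter index on cluster_one plus the local twitter index.
print(search_path([("cluster_one", "twitter"), (None, "twitter")]))
# /cluster_one:twitter,twitter/_search
```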
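Search results are disambiguated the same way the indices are disambiguated in the request. Even if index names are identical, these indices are treated as different indices when the results are merged. All results retrieved from a remote index are prefixed with the name of the remote cluster: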
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
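Because remote hits carry the cluster alias as a prefix in _index, a client can separate hits by origin after the results have been merged. A small sketch, assuming that local index names contain no colon. The function names are hypothetical and the response is trimmed to the fields used.

```python
def split_hit_index(hit):
    """Split a hit's _index into (cluster_alias, index); the alias is None for local hits.

    Assumes local index names contain no colon.
    """
    prefix, separator, rest = hit["_index"].partition(":")
    if not separator:
        return None, prefix
    return prefix, rest


def group_hits_by_cluster(response):
    """Group (index, _id) pairs by cluster alias, using "(local)" for local hits."""
    groups = {}
    for hit in response["hits"]["hits"]:
        alias, index = split_hit_index(hit)
        groups.setdefault(alias or "(local)", []).append((index, hit["_id"]))
    return groups


# Trimmed-down version of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_index": "cluster_one:twitter", "_id": "0"},
            {"_index": "twitter", "_id": "0"},
        ]
    }
}

print(group_hits_by_cluster(response))
# {'cluster_one': [('twitter', '0')], '(local)': [('twitter', '0')]}
```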
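Skipping disconnected clusters:
By default, all remote clusters searched through cross-cluster search must be available when the search request is executed. Otherwise the whole request fails and no search results are returned, even if some of the clusters are available. A remote cluster can be made optional through the skip_unavailable setting, which defaults to false.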
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}
cluster_two is now optional.
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
This searches the twitter index in the local cluster, in cluster_one, and in cluster_two.
In the response, the _clusters section indicates that one cluster was unavailable and was skipped:
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
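With skip_unavailable enabled, a request can succeed while silently missing a cluster, so a client may want to check the _clusters counters before trusting the result set as complete. A minimal sketch; the function name summarize_clusters and the "partial" flag are illustrative choices, not part of the Elasticsearch response.

```python
def summarize_clusters(response):
    """Summarize the _clusters section and flag results that skipped a cluster."""
    clusters = response.get("_clusters", {})
    skipped = clusters.get("skipped", 0)
    return {
        "total": clusters.get("total", 0),
        "successful": clusters.get("successful", 0),
        "skipped": skipped,
        # "partial" is a derived flag added here for convenience.
        "partial": skipped > 0,
    }


# Only the _clusters section of the response above is needed.
response_with_skip = {"_clusters": {"total": 3, "successful": 2, "skipped": 1}}

summary = summarize_clusters(response_with_skip)
if summary["partial"]:
    print(f"{summary['skipped']} of {summary['total']} clusters were skipped")
```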