ES Learning Path: Document

1.Each index can have only one type

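Starting with Elasticsearch 6.0, multiple types per index are no longer supported: an index created in 6.0 or later has exactly one type. This differs from a relational database, where one database holds many tables. The official documentation explains the removal: comparing Elasticsearch to a relational database was a misleading analogy, because when several types in the same index share a field name, Lucene requires those fields to have the same data type. For details see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

Later Elasticsearch versions remove the type concept entirely.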

2.Index API: adds or updates a JSON document in an index and makes it searchable
PUT test/log/2
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
//Adds a document to the log type of the test index, with the document id set to 2
//Response:
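{
  "_index": "test",
  "_type": "log",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 3
}
//total: number of shard copies (primary and replicas) the index operation should be executed on
//successful: number of shard copies on which the operation succeeded
//failed: number of shard copies on which the operation failed
//For the operation to count as successful, successful must be at least 1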
2.Automatic index creation

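By default, the index operation creates the index automatically if it does not exist yet, and dynamic mapping adds any fields that are missing from the mapping. Automatic index creation can be disabled by setting action.auto_create_index to false, and automatic mapping creation by setting index.mapper.dynamic to false. If no id is given (using POST instead of PUT), Elasticsearch generates a unique id automatically.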

3.version

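Every document has its own version, and that version changes on every create, update, or delete. A version can be supplied when indexing, updating, or fetching a document.

If no version is supplied in the request, Elasticsearch performs no version check.

By default the version starts at 1 and, absent other intervention, increases by 1 with each update.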

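PUT test/log/2?version=2
{
  "message": "test for version"
}
//Tries to write the message of document 2 while expecting its current version to be 2
//The response reports a version mismatch and the operation is rejected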
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[log][2]: version conflict, current version [1] is different than the one provided [2]",
        "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
        "shard": "2",
        "index": "test"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[log][2]: version conflict, current version [1] is different than the one provided [2]",
    "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
    "shard": "2",
    "index": "test"
  },
  "status": 409
}
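
The version check above behaves like optimistic concurrency control. The following is a minimal in-memory sketch of that behaviour in Python, not Elasticsearch code: the names VersionedStore and VersionConflict are hypothetical and exist only for this illustration.

```python
class VersionConflict(Exception):
    """Raised when the supplied version does not match the stored one."""


class VersionedStore:
    """Toy in-memory model of per-document versions (not a real client)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # The version is only checked when the caller supplies one.
        if version is not None and version != current:
            raise VersionConflict(
                f"current version [{current}] is different "
                f"than the one provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = VersionedStore()
first = store.index("2", {"message": "trying out Elasticsearch"})

try:
    # The stored version is 1, so expecting version 2 is rejected.
    store.index("2", {"message": "test for version"}, version=2)
    conflict = False
except VersionConflict:
    conflict = True

# Supplying the correct current version succeeds and bumps it by one.
second = store.index("2", {"message": "test for version"}, version=1)
```

A caller that hits the conflict would normally re-read the document, take its current version, and try again.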
4.Operation Type

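The op_type parameter changes how the index operation behaves. By default a document is created if it does not exist and replaced if it does; with op_type=create the request fails when a document with that id already exists, as in this example: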

PUT test/log/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
//This example forces a create operation
//An alternative way to write it
PUT twitter/_doc/1/_create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
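
To make the difference between the default behaviour and op_type=create concrete, here is a small Python sketch under the assumption that a plain dict stands in for an index. The function index_doc and its return shape are hypothetical, chosen only to mirror the status codes a real request would give.

```python
def index_doc(docs, doc_id, source, op_type="index"):
    """Toy model of the index operation; docs is a dict standing in for an index."""
    if op_type == "create" and doc_id in docs:
        # With op_type=create an existing id is rejected instead of replaced.
        return {"status": 409, "result": "version_conflict"}
    result = "updated" if doc_id in docs else "created"
    docs[doc_id] = source
    return {"status": 201 if result == "created" else 200, "result": result}


docs = {}
r1 = index_doc(docs, "1", {"user": "kimchy"}, op_type="create")
r2 = index_doc(docs, "1", {"user": "kimchy"}, op_type="create")
r3 = index_doc(docs, "1", {"user": "someone else"})
```

The second call is rejected because the id is taken, while the third call uses the default op_type and simply replaces the document.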
5.Routing

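By default, the shard a document is placed on is determined by a hash of the document id. For explicit control, the routing parameter can be passed on a per-operation basis to set the value that is fed into the hash function used by the router, for example: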

POST test/log?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
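
The effect of a routing value can be sketched as "hash the value, then take it modulo the number of primary shards". The Python below is only an illustration: pick_shard is a hypothetical helper, and zlib.crc32 is a stand-in hash chosen for this sketch, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number.

    zlib.crc32 is a stand-in hash for this sketch; Elasticsearch uses its
    own hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Documents indexed with the same routing value land on the same shard.
shard_a = pick_shard("kimchy", 5)
shard_b = pick_shard("kimchy", 5)
```

Because the shard depends only on the routing value, the same value has to be supplied again when the document is fetched, updated, or deleted.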
6.Timeout

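The primary shard may not be available when the index operation runs. By default the operation waits up to 1 minute for it before failing; the timeout parameter sets how long to wait: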

PUT test/log/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
7.Get API

The Get API returns a document as JSON, looked up by its id, as follows:

GET /test/log/1
Response:
{
  "_index": "test",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "arr": [
      {
        "name": "1"
      },
      {
        "name": "2"
      }
    ]
  }
}

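By default the Get API is realtime and is not affected by the refresh rate of the index: if a document has been updated but not yet refreshed, the Get API refreshes in place so that the latest content is returned. To disable realtime GET, pass realtime=false.

As the call above shows, the API returns _source by default. To leave it out, pass _source=false: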

GET /test/log/1?_source=false

To return only one or more specific fields, filter the source like this:

GET test/log/1?_source_include=*.id&_source_exclude=entities
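//_source_exclude: fields to exclude from the result
//_source_include: fields to include in the result
//In later versions these parameters are named _source_excludes and _source_includes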
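
Include and exclude patterns can be thought of as filters over dotted field paths. The Python sketch below is a simplified approximation written for this article, not the server's implementation: filter_source is a hypothetical function, patterns are matched with fnmatch, and arrays of objects are treated as plain values.

```python
from fnmatch import fnmatch


def _prefixes(path):
    """'a.b.c' -> ['a', 'a.b', 'a.b.c'] so a pattern can match a parent object."""
    parts = path.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def _matches(path, patterns):
    return any(fnmatch(p, pat) for p in _prefixes(path) for pat in patterns)


def filter_source(source, includes=(), excludes=(), _path=""):
    """Keep leaves matched by includes (all leaves if empty) and not by excludes."""
    out = {}
    for key, value in source.items():
        path = f"{_path}.{key}" if _path else key
        if isinstance(value, dict):
            child = filter_source(value, includes, excludes, path)
            if child:
                out[key] = child
        elif (not includes or _matches(path, includes)) and not _matches(path, excludes):
            out[key] = value
    return out


doc = {"user": {"name": "kimchy", "location": "somewhere"}, "message": "hi"}
filtered = filter_source(doc, includes=["user"], excludes=["user.location"])
```

Here the include pattern keeps everything under user, and the exclude pattern then drops user.location from that selection.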

A version parameter can be supplied so that the document is returned only if its current version matches the given one:

GET /test/log/1?version=1
8.Delete API

Elasticsearch provides an API to delete the document with a given id:

DELETE /test/log/1
{
  "_index": "test",
  "_type": "log",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 4
}
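//The delete response contains _version, showing that the document has been changed; in Elasticsearch every create, update, and delete of a document changes its version

If routing was used when the document was indexed, the same routing value must be supplied to delete it: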

DELETE /test/log/1?routing=kimchy

After a delete, a refresh is needed before the change becomes visible to search.

The delete API also accepts a timeout:

DELETE /test/log/1?timeout=5m
9.Delete By Query API

Elasticsearch can also delete by query, that is, delete every document that matches a query:

POST /test/_delete_by_query
{
  "query": {
    "match": {
      "name": "mjlf"
    }
  }
}
//Response
{
  "took": 104,
  "timed_out": false,
  "total": 1,
  "deleted": 1,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

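_delete_by_query takes a snapshot of the index when it starts and deletes what it finds using internal versioning. This means a version conflict occurs if a document changes between the time the snapshot was taken and the time the delete request is processed. When the versions match, the document is deleted.

Because internal versioning does not support 0 as a valid version number, documents with a version equal to zero cannot be deleted with _delete_by_query, and the request fails.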

10.Index Document Update
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.name += params.name",
    "params": {
      "name": "love f"
    }
  }
}

//Append a new element to an array
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": {
      "tag": "name"
    }
  }
}

//Add a new field
POST /test/log/2/_update
{
    "script" : "ctx._source.new_field = 'value_of_new_field'"
}

//Remove a field
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.remove('new_field')"
  }
}

//Conditional: if tags contains name, delete the document; otherwise do nothing
POST /test/log/2/_update
{
  "script": {
    "source": "if(ctx._source.tags.contains(params.name)){ ctx.op = 'delete' } else { ctx.op = 'none' }",
    "params": {
      "name": "name"
    }
  }
}

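//The update API also accepts a partial document, which is merged into the existing one
//Here name updates an existing field, age adds a new field, and tags adds a new array
POST /test/log/2/_update
{
  "doc": {
    "name": "mjlf",
    "age": 12,
    "tags": [
      "tag1"
    ]
  }
}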

//A partial update that changes nothing when age is already 13
POST /test/log/2/_update
{
  "doc": {
    "age": 13
  }
}
//Response. When an update changes nothing, as above where age is already 13, no write is needed and result is noop; send "detect_noop": false in the request body to disable this check
{
  "_index": "test",
  "_type": "log",
  "_id": "2",
  "_version": 4,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  }
}
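
The merge and noop detection can be sketched in a few lines of Python. This is an illustrative model under the assumption that a document is a dict with _version and _source keys; merge and partial_update are hypothetical names, not client functions.

```python
import copy


def merge(target, partial):
    """Recursively merge partial into target; return True if anything changed."""
    changed = False
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            changed = merge(target[key], value) or changed
        elif key not in target or target[key] != value:
            target[key] = value
            changed = True
    return changed


def partial_update(doc, partial, detect_noop=True):
    """Apply a partial document; skip the write when nothing would change."""
    merged = copy.deepcopy(doc["_source"])
    changed = merge(merged, partial)
    if detect_noop and not changed:
        return {"result": "noop", "_version": doc["_version"]}
    doc["_source"] = merged
    doc["_version"] += 1
    return {"result": "updated", "_version": doc["_version"]}


doc = {"_version": 3, "_source": {"name": "mjlf", "age": 12}}
r1 = partial_update(doc, {"age": 13})                     # changes age
r2 = partial_update(doc, {"age": 13})                     # nothing to change
r3 = partial_update(doc, {"age": 13}, detect_noop=False)  # forced write
```

With detection on, the second call leaves the version untouched; with it off, the same request still writes and bumps the version.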

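//Updating a document that does not exist returns the error below; with the upsert parameter, the upsert content is inserted as a new document when none exists, and the update is applied when one does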
{
  "error": {
    "root_cause": [
      {
        "type": "document_missing_exception",
        "reason": "[log][4]: document missing",
        "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
        "shard": "2",
        "index": "test"
      }
    ],
    "type": "document_missing_exception",
    "reason": "[log][4]: document missing",
    "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
    "shard": "2",
    "index": "test"
  },
  "status": 404
}

POST /test/log/3/_update
{
  "doc": {
    "name": "mjlf"
  },
  "upsert": {
    "counter": 1
  }
}

//Alternatively, doc_as_upsert uses the contents of doc as the upsert value
POST /test/log/4/_update
{
  "doc": {
    "name": "new_name"
  },
  "doc_as_upsert": true
}
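
The three outcomes (update an existing document, insert the upsert content, insert doc itself) can be summarised in a short Python sketch. It assumes a dict stands in for the index and uses a shallow merge; the update function here is hypothetical and only models the decision logic.

```python
def update(docs, doc_id, doc=None, upsert=None, doc_as_upsert=False):
    """Toy model of update with upsert; docs is a dict standing in for an index."""
    if doc_id in docs:
        docs[doc_id].update(doc or {})  # a shallow merge is enough for this sketch
        return "updated"
    if doc_as_upsert:
        docs[doc_id] = dict(doc or {})  # the partial doc becomes the new document
        return "created"
    if upsert is not None:
        docs[doc_id] = dict(upsert)     # the upsert content becomes the new document
        return "created"
    raise KeyError(f"[{doc_id}]: document missing")


docs = {}
first = update(docs, "3", doc={"name": "mjlf"}, upsert={"counter": 1})
second = update(docs, "3", doc={"name": "mjlf"}, upsert={"counter": 1})
third = update(docs, "4", doc={"name": "new_name"}, doc_as_upsert=True)
```

Note that on the first call only the upsert content is stored; the doc part is applied from the second call on, once the document exists.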

The update operation supports the following query-string parameters:

retry_on_conflict: In between the get and indexing phases of the update, it is possible that another process might have already updated the same document. By default, the update will fail with a version conflict exception. The retry_on_conflict parameter controls how many times to retry the update before finally throwing an exception.
routing: Routing is used to route the update request to the right shard and sets the routing for the upsert request if the document being updated doesn't exist. Can't be used to update the routing of an existing document.
timeout: Timeout waiting for a shard to become available.
wait_for_active_shards: The number of shard copies required to be active before proceeding with the update operation.
refresh: Control when the changes made by this request are visible to search. See ?refresh.
_source: Controls if and how the updated source should be returned in the response. By default the updated source is not returned. See source filtering for details.
version: The update API uses Elasticsearch's versioning support internally to make sure the document doesn't change during the update. You can use the version parameter to specify that the document should only be updated if its version matches the one specified.
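
The retry_on_conflict behaviour described above amounts to a bounded retry loop around the get-then-index cycle. The Python sketch below shows that loop in isolation; update_with_retry, VersionConflict, and flaky_update are hypothetical names used to simulate two concurrent writes followed by a success.

```python
class VersionConflict(Exception):
    """Stands in for a version conflict raised during an update."""


def update_with_retry(apply_update, retry_on_conflict=0):
    """Call apply_update, retrying up to retry_on_conflict times on conflict."""
    attempts = 0
    while True:
        try:
            return apply_update()
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise  # retries exhausted: surface the conflict to the caller
            attempts += 1


calls = {"n": 0}


def flaky_update():
    """Simulated update that conflicts twice before succeeding."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise VersionConflict("simulated concurrent write")
    return "updated"


result = update_with_retry(flaky_update, retry_on_conflict=3)
```

With the default of 0 retries the first conflict is raised immediately, which matches the default behaviour described for the parameter.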
11.Update_By_Query

Elasticsearch can also update every document that matches a query, as follows:

POST test/log/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

12.Multi Get API

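The Multi Get API fetches multiple documents based on an index, a type (optional), and an id (and possibly routing). The response contains a docs array holding the fetched documents in the same order as the original requests (if a particular get fails, an object describing the error is included in its place). A successful entry has the same structure as a document returned by the Get API.

//Search across all indices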
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}

//Search within a given index
GET /test/_mget
{
    "docs" : [
        {
            "_type" : "_doc",
            "_id" : "1"
        },
        {
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}

//Search within a given index and type
GET /test/type/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

//The same request in a shorter form
GET /test/type/_mget
{
    "ids" : ["1", "2"]
}

//Per-document _source filtering: document 1 returns no _source, document 2 returns only field3 and field4,
//document 3 returns the fields matched by include, minus those matched by exclude
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "_source" : false
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "_source" : ["field3", "field4"]
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "3",
            "_source" : {
                "include": ["user"],
                "exclude": ["user.location"]
            }
        }
    ]
}

Specific stored fields can be requested for each document, similar to the stored_fields parameter of the Get API. For example:
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "stored_fields" : ["field1", "field2"]
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "stored_fields" : ["field3", "field4"]
        }
    ]
}

//Routing
GET /_mget?routing=key1
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "routing" : "key2"
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}
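13.Bulk API

The bulk API is a batch interface: a single request carries several operations, which are executed in order. See the official documentation for details.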
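
A bulk request body is newline-delimited JSON: one action line, followed by a source line for actions that carry a document, and a final newline at the end. The Python sketch below only builds such a body as a string; build_bulk_body is a hypothetical helper, and sending the body to the _bulk endpoint is left out.

```python
import json


def build_bulk_body(actions):
    """Build a newline-delimited bulk body.

    actions is a list of (action, metadata, source_or_None) tuples.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # a delete action has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("index", {"_index": "test", "_type": "log", "_id": "1"}, {"user": "kimchy"}),
    ("delete", {"_index": "test", "_type": "log", "_id": "2"}, None),
])
```

Each line is a complete JSON object, so the body must not be pretty-printed across several lines.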

14.Reindex API

This API is mostly used to copy the documents of one index into a new index, like this:

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
//Response
{
  "took" : 147,
  "timed_out": false,
  "created": 120,
  "updated": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "total": 120,
  "failures" : [ ]
}

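The example above copies the documents of one index into another. In that case the documents are versioned by the destination index, so their versions do not carry over from the source. The version_type setting in dest controls this: with "version_type": "internal" (the same as leaving it out) the destination index's own versioning is used and documents with the same type and id are overwritten; with "version_type": "external" the version from the source index is preserved, missing documents are created, and documents whose version in the destination is older than in the source are updated. For example: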

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "internal"
  }
}
//Uses the destination index's own versioning

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  }
}
//Preserves the version of the source documents
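
The external version rule can be modelled as "write only when the incoming version is greater than the stored one". The Python sketch below illustrates that rule with dicts standing in for the two indices; reindex_external is a hypothetical function, and the real API reports conflicts in more detail.

```python
def reindex_external(source, dest):
    """Copy documents while preserving source versions.

    Both dicts map id -> (version, document). Returns the number of
    documents skipped because the destination was not older.
    """
    conflicts = 0
    for doc_id, (version, doc) in source.items():
        if doc_id in dest and dest[doc_id][0] >= version:
            conflicts += 1  # destination is already as new or newer
            continue
        dest[doc_id] = (version, doc)  # keep the version from the source
    return conflicts


source = {"1": (3, {"user": "a"}), "2": (5, {"user": "b"}), "3": (1, {"user": "c"})}
dest = {"2": (2, {"user": "old"}), "3": (4, {"user": "newer"})}
skipped = reindex_external(source, dest)
```

Document 1 is created, document 2 is updated because the destination copy is older, and document 3 is left alone because the destination copy is newer.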

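With op_type set to create in dest, _reindex only creates documents that are missing from the destination index; any document that already exists there causes a version conflict. For example: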

POST /_reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test1",
    "op_type": "create"
  }
}

{
  "took": 8,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 3,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test1",
      "type": "log",
      "id": "2",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][2]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "2",
        "index": "test1"
      },
      "status": 409
    },
    {
      "index": "test1",
      "type": "log",
      "id": "3",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][3]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "4",
        "index": "test1"
      },
      "status": 409
    },
    {
      "index": "test1",
      "type": "log",
      "id": "4",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][4]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "2",
        "index": "test1"
      },
      "status": 409
    }
  ]
}

By default a version conflict aborts the _reindex process; setting "conflicts": "proceed" makes it count the conflicts and continue:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "op_type": "create"
  }
}
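
The combination of op_type create and conflicts proceed works like a put-if-absent copy that counts what it skipped. The Python sketch below models it with dicts standing in for indices; the reindex function is hypothetical, and it aborts on the first conflict as a simplification, whereas the real API works in batches.

```python
def reindex(source, dest, op_type="index", conflicts="abort"):
    """Toy reindex; source and dest are dicts of id -> document."""
    stats = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, doc in source.items():
        if op_type == "create" and doc_id in dest:
            stats["version_conflicts"] += 1
            if conflicts != "proceed":
                break  # simplification: stop at the first conflict
            continue
        stats["updated" if doc_id in dest else "created"] += 1
        dest[doc_id] = doc
    return stats


source = {"1": {"a": 1}, "2": {"a": 2}, "3": {"a": 3}}

dest_proceed = {"2": {"a": "existing"}}
proceed_stats = reindex(source, dest_proceed, op_type="create", conflicts="proceed")

dest_abort = {"2": {"a": "existing"}}
abort_stats = reindex(source, dest_abort, op_type="create")
```

With proceed, the existing document is left untouched and the remaining documents are still copied; without it, the copy stops and document 3 never arrives.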

A query can also be used to limit which documents of the source index are copied:

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "_doc",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Reindex from multiple sources
POST _reindex
{
  "source": {
    "index": ["twitter", "blog"],
    "type": ["_doc", "post"]
  },
  "dest": {
    "index": "all_together"
  }
}

//Limit the number of documents copied into the new index
POST _reindex
{
  "size": 1,
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Limit the number of documents, copying the newest first by sorting on date in descending order
POST _reindex
{
  "size": 10000,
  "source": {
    "index": "twitter",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Restrict which fields of the source documents are copied
POST _reindex
{
  "source": {
    "index": "twitter",
    "_source": ["user", "_doc"]
  },
  "dest": {
    "index": "new_twitter"
  }
}

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  },
  "script": {
    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
    "lang": "painless"
  }
}
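//A script can modify each document as it is reindexed; here foo is removed and the version is bumped

//Reindex from a remote cluster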
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}