[Original] Getting Started with Elasticsearch

阿新 • Published: 2018-12-10

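Introduction to ES

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License; it is one of the most popular enterprise search engines.

Installation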

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.0.tar.gz  
tar -xvzf elasticsearch-6.4.0.tar.gz  
cd elasticsearch-6.4.0/bin
./elasticsearch -d

Edit the configuration file (config/elasticsearch.yml)

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.141.129
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#

Starting ES again now fails with an error:

[2018-09-13T09:29:43,060][INFO ][o.e.b.BootstrapChecks    ] [7hyiUY2] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Solution:

vi /etc/security/limits.conf # add the two lines below, then reconnect over SSH
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536

vi /etc/sysctl.conf # add the line below
vm.max_map_count=262144
sysctl -p
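
Before restarting ES it can help to confirm that the new limits are actually in effect. The following is a minimal Python sketch of the two comparisons named in the error message above. The function names are made up for illustration, and reading the soft nofile limit of the current process is only an approximation of what the ES bootstrap check looks at.

```python
# Thresholds copied from the bootstrap-check error message above.
MIN_NOFILE = 65536
MIN_MAX_MAP_COUNT = 262144


def failed_checks(nofile, max_map_count):
    """Return one message per limit that is below its threshold."""
    failures = []
    if nofile < MIN_NOFILE:
        failures.append(
            f"max file descriptors [{nofile}] is too low, need [{MIN_NOFILE}]")
    if max_map_count < MIN_MAX_MAP_COUNT:
        failures.append(
            f"vm.max_map_count [{max_map_count}] is too low, need [{MIN_MAX_MAP_COUNT}]")
    return failures


def current_limits():
    """Read both limits for the current user and kernel (Linux only)."""
    import resource  # Unix-only standard-library module
    soft_nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_nofile == resource.RLIM_INFINITY:
        soft_nofile = float("inf")  # unlimited always passes the check
    with open("/proc/sys/vm/max_map_count") as fh:
        return soft_nofile, int(fh.read().strip())


# The values from the log above fail both checks.
for message in failed_checks(4096, 65530):
    print(message)
```

Run as the elasticsearch user, `failed_checks(*current_limits())` should return an empty list once limits.conf and sysctl.conf have been updated and the session has been reopened.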

Accessing ES in a browser

ES architecture

Basic concepts

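Near real time (NRT): Elasticsearch is a near-real-time search platform. There is a short delay (normally 1 second) between indexing a document and that document becoming searchable.
Cluster: A cluster consists of multiple nodes, one of which is the master node, chosen by election. The master/non-master distinction only matters inside the cluster. Seen from outside, ES is decentralized: there is no central node, the cluster is logically a single unit, and talking to any one node is equivalent to talking to the whole cluster.
Node: A node is a single running Elasticsearch instance that belongs to a cluster. By default, every node joins a cluster named "elasticsearch". Each node has its own elasticsearch.yml, so nodes can differ in their memory and resource settings.
Data node: Data nodes index documents and run searches against the indexed documents. Adding data nodes is the recommended way to improve performance or scale the cluster. A node becomes a data node through its elasticsearch.yml settings (node.data and node.master in 6.x).
Master node: The master node manages the cluster. Large clusters should have three dedicated master-eligible nodes (one elected master and two standbys) that act only as masters and neither store indices nor run searches. This role is also declared in elasticsearch.yml.
Routing node, also called a load-balancer node: These nodes act as neither master nor data nodes. They only balance load, route search requests, and forward documents to the appropriate node for indexing, which is useful under heavy search or indexing load.
Index: An Elasticsearch index is a collection of documents that share common characteristics. An index contains types, a type contains documents, and a document contains fields. Note that an index created in 6.x may hold only a single type. An index is made up of JSON documents, and a cluster can hold many indices.
Type [deprecated]: A type is a logical partition inside an index and represents a class of similar documents. Older versions allowed several types in one index, told apart by context.
Document: An Elasticsearch document is a JSON document stored in an index. Each document has a type and an ID, and the ID is unique within that index and type.
Mapping: A mapping describes each field of a document and its datatype, such as string, integer, float, double, or date. When documents are indexed, Elasticsearch automatically creates a mapping for their fields, and these mappings can easily be inspected or extended to meet specific requirements.
Shard: ES can split an index into several shards, so that one large index is divided into pieces spread across different nodes. This is what makes search distributed. The number of primary shards is fixed when the index is created and cannot be changed afterwards.
Replica: ES can keep several replica copies of each shard. Replicas serve two purposes: fault tolerance (when a shard on some node is damaged or lost, it can be recovered from a replica) and query throughput (ES automatically load-balances search requests across the copies).
River: A river was a data source for ES and a way to synchronize data from other stores (such as databases) into ES. It ran inside ES as a plugin service that read from the source and indexed the data. Official rivers existed for CouchDB, RabbitMQ, Twitter, and Wikipedia. Rivers were deprecated in 1.5 and removed in 2.0, so they do not apply to the 6.x version installed above.
Gateway: The gateway is the storage used for index snapshots. ES first keeps the index in memory and persists it to local disk when memory fills up. The gateway stores the index snapshot, and when the cluster is shut down and restarted, the index data is read back from the gateway. ES supported several gateway types: the local file system (default), distributed file systems, Hadoop HDFS, and Amazon S3. This describes early ES releases; the non-local gateway types were later removed, and backups are now handled by the snapshot and restore API.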
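
The Shard entry says the number of primary shards cannot change after creation. The reason is that a document's shard is derived from its routing value (the document ID by default) modulo the number of primary shards. Here is a small Python sketch of that idea. `pick_shard` is a made-up helper, and CRC32 is only a stand-in hash: ES itself uses a Murmur3-based hash.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value (the document ID by default) to a shard number."""
    # Stand-in hash for illustration; Elasticsearch does not use CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 9)]
with_3_shards = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_4_shards = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

# Documents whose shard number differs once the shard count changes.
moved = [d for d in doc_ids if with_3_shards[d] != with_4_shards[d]]
print("placement with 3 shards:", with_3_shards)
print("would land elsewhere with 4 shards:", moved)
```

If the shard count changed, lookups by ID would search the wrong shard for every document listed in `moved`, which is why changing it requires building a new index.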

GET /_cat	Description
/_cluster/stats	View cluster statistics
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes	List the nodes in the cluster
/_cat/tasks
/_cat/indices	View all indices
/_cat/indices/{index}	View the specified index
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health	View the health of the cluster
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
/_stats	View statistics for all indices

v: include a header row in the output
pretty: pretty-print JSON output
help: list the columns an endpoint can return
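
These parameters are appended to the request as a query string. A minimal Python sketch of building such URLs follows. `cat_url` is a hypothetical helper, and the base address is the one assumed from the network.host and http.port settings earlier.

```python
from urllib.parse import urlencode

BASE = "http://192.168.141.129:9200"  # assumed from the config shown earlier


def cat_url(base, endpoint, verbose=False, show_help=False, fmt=None):
    """Build a _cat URL with the optional v, help and format parameters."""
    params = {}
    if verbose:
        params["v"] = "true"      # ask for the header row
    if show_help:
        params["help"] = "true"   # list the available columns instead of data
    if fmt:
        params["format"] = fmt    # e.g. "json" instead of the text table
    query = "?" + urlencode(params) if params else ""
    return f"{base}/_cat/{endpoint}{query}"


print(cat_url(BASE, "health", verbose=True))
print(cat_url(BASE, "indices", verbose=True, fmt="json"))
```

The resulting URLs can be opened in a browser or passed to `urllib.request.urlopen` against a running node.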

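Health status values

Green - everything is good (cluster is fully functional). This is the best state.
Yellow - all data is available but some replicas are not yet allocated (cluster is fully functional). The data and the cluster are usable, but some replica shards are unassigned.
Red - some data is not available for whatever reason (cluster is partially functional). At least one primary shard is unassigned, so part of the data is unavailable, while the remaining shards can still serve requests.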
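
The three colors follow directly from which shard copies are assigned. The rule can be sketched in a few lines of Python. `cluster_health` is an illustrative function that works on a list of (is_primary, is_assigned) pairs, not a call into ES.

```python
def cluster_health(shards):
    """Derive green/yellow/red from (is_primary, is_assigned) pairs."""
    shards = list(shards)
    if any(primary and not assigned for primary, assigned in shards):
        return "red"      # some primary is missing: part of the data is unavailable
    if any(not assigned for _primary, assigned in shards):
        return "yellow"   # all primaries are up, but some replica is unassigned
    return "green"        # every primary and replica is assigned


# A single node holding an index with 3 primaries and 1 replica each:
# the replicas cannot be placed on the same node as their primaries.
single_node = [(True, True)] * 3 + [(False, False)] * 3
print(cluster_health(single_node))
```

This is why a fresh single-node installation commonly reports yellow as soon as an index with replicas exists.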

Index management

Create an index

Create directly

PUT twitter

settings

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    }
}

mappings

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    },
   "mappings" : {
        "_doc" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}
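
The same request can be issued from a script. The sketch below builds the PUT request with Python's standard library without sending it. `create_index_request` and `total_shard_copies` are hypothetical helpers, and the base address is assumed from the configuration above.

```python
import json
from urllib.request import Request

BASE = "http://192.168.141.129:9200"  # assumed from the config shown earlier


def create_index_request(base, index, shards, replicas, properties):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {"index": {"number_of_shards": shards,
                               "number_of_replicas": replicas}},
        "mappings": {"_doc": {"properties": properties}},
    }
    return Request(f"{base}/{index}",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="PUT")


def total_shard_copies(shards, replicas):
    """Each primary shard exists once plus once per replica."""
    return shards * (1 + replicas)


req = create_index_request(BASE, "twitter", 3, 2, {"field1": {"type": "text"}})
print(req.get_method(), req.full_url)
print("shard copies to allocate:", total_shard_copies(3, 2))
```

Passing `req` to `urllib.request.urlopen` would send it to a running node. With 3 primaries and 2 replicas the cluster has 9 shard copies to place, so it needs at least three nodes to reach green.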

View an index

GET /twitter/
GET /twitter/_search

Delete an index

DELETE /twitter

Mapping management

Core datatypes
String datatypes
    text and keyword
Numeric datatypes
    long, integer, short, byte, double, float, half_float, scaled_float
Date datatype
    date
Boolean datatype
    boolean
Binary datatype
    binary
Range datatypes
    integer_range, float_range, long_range, double_range, date_range

Complex datatypes
Array datatype
    Arrays are simply multiple values; no dedicated datatype is needed
Object datatype
    object: for single JSON objects
Nested datatype
    nested: for arrays of JSON objects

Geo datatypes
Geo-point datatype
    geo_point: for lat/lon points
Geo-shape datatype
    geo_shape: for complex shapes like polygons

Specialised datatypes
IP datatype
    ip: for IPv4 and IPv6 addresses
Completion datatype
    completion: to provide auto-complete suggestions
Token count datatype
    token_count: to count the number of tokens in a string
mapper-murmur3
    murmur3: to compute hashes of values at index-time and store them in the index
Percolator type
    percolator: accepts queries from the query-dsl
Join datatype
    join: defines parent/child relation for documents within the same index
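
When a field has no explicit mapping, ES picks one of these datatypes from the JSON value it sees first. The Python sketch below approximates the default rules. `dynamic_type` is an illustrative function, and the date check uses Python's ISO parser as a rough stand-in for ES date detection.

```python
from datetime import datetime


def dynamic_type(value):
    """Approximate the datatype dynamic mapping would choose for a JSON value."""
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):      # arrays take the type of their first non-null value
        first = next((v for v in value if v is not None), None)
        return dynamic_type(first)
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)   # rough stand-in for date detection
            return "date"
        except ValueError:
            return "text"            # ES also adds a keyword sub-field
    return None                      # null values add no field


doc = {"id": 1, "user": "kimchy", "post_date": "2009-11-15T14:12:12"}
print({field: dynamic_type(value) for field, value in doc.items()})
```

Comparing this guess with the output of `GET /twitter/_mapping` on a real index is a quick way to see where an explicit mapping is worth writing.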

Document management

Create

With an explicit ID
PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

With an auto-generated ID
POST twitter/_doc/
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Retrieve

HEAD twitter/_doc/11
GET twitter/_doc/1

Update

PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Delete

DELETE twitter/_doc/1

Bulk operations

POST _bulk
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
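
The bulk body is newline-delimited JSON: one action line, followed by a source line for every action except delete, and the body must end with a newline. A minimal Python sketch of producing that format follows. `bulk_body` is a hypothetical helper, not part of any client library.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) triples into a _bulk request body."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if action != "delete":            # delete has no source line
            if source is None:
                raise ValueError(f"{action} needs a source line")
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"        # the final newline is required


def meta(doc_id):
    return {"_index": "test", "_type": "_doc", "_id": doc_id}


body = bulk_body([
    ("index", meta("1"), {"field1": "value1"}),
    ("delete", meta("2"), None),
    ("create", meta("3"), {"field1": "value3"}),
    ("update", meta("1"), {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

The body is sent with the Content-Type header set to application/x-ndjson.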

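index: succeeds whether or not the document already exists (it creates or replaces the document)
create: reports an error if the document already exists
update: reports an error if the document does not exist
delete: reports not_found if the document does not exist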
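
These four rules can be tried out without a cluster. The sketch below replays the bulk request above against a plain Python dict. `apply_bulk` and its result strings are simplified illustrations, not the actual response format of ES.

```python
def apply_bulk(store, operations):
    """Apply (action, doc_id, source) triples to a dict, returning one result each."""
    results = []
    for action, doc_id, source in operations:
        exists = doc_id in store
        if action == "index":             # always succeeds: create or replace
            store[doc_id] = dict(source)
            results.append("updated" if exists else "created")
        elif action == "create":          # fails when the document exists
            if exists:
                results.append("error: document already exists")
            else:
                store[doc_id] = dict(source)
                results.append("created")
        elif action == "update":          # fails when the document is missing
            if exists:
                store[doc_id].update(source)
                results.append("updated")
            else:
                results.append("error: document missing")
        elif action == "delete":          # a missing document is only reported
            results.append("deleted" if store.pop(doc_id, None) is not None
                           else "not_found")
    return results


store = {}
results = apply_bulk(store, [
    ("index", "1", {"field1": "value1"}),
    ("delete", "2", None),
    ("create", "3", {"field1": "value3"}),
    ("update", "1", {"field2": "value2"}),
])
print(results)
print(store)
```

One failing item does not abort the rest of the batch, which matches how the bulk API reports a result per action.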

Structured search

Exact-value lookup with term

POST /my_store/_doc/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }

Querying a single field

GET my_store/_doc/_search
{
  "query": {
    "term": {
      "price": "30"
    }
  }
}

Combining filters

GET my_store/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "price": 20
          }
        },
        {
          "term": {
            "productID": "XHDK-A-1293-#fJ3"
          }
        }
      ],
      "must_not": {
        "term": {
          "price": 30
        }
      }
    }
  }
}

PUT my_store
{
    "mappings" : {
        "_doc" : {
            "properties" : {
                "productID" : {
                    "type" : "keyword"
                }
            }
        }
    }
}

GET /my_store/_analyze
{
  "field": "productID",
  "text": "XHDK-A-1293-#fJ3"
}
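
The keyword mapping matters because a term query looks up the exact string in the inverted index, while a text field stores only the analyzed tokens. The Python sketch below imitates both cases. The regex tokenizer is a rough approximation of the standard analyzer, and `build_index` is an illustrative helper.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: split on punctuation, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs, analyze):
    """Build a tiny inverted index: token -> set of document IDs."""
    index = {}
    for doc_id, value in docs.items():
        for token in analyze(value):
            index.setdefault(token, set()).add(doc_id)
    return index


product_ids = {
    1: "XHDK-A-1293-#fJ3",
    2: "KDKE-B-9947-#kL5",
    3: "JODL-X-1937-#pV7",
    4: "QQPX-R-3956-#aD8",
}
text_index = build_index(product_ids, standard_like_tokens)   # "text" field
keyword_index = build_index(product_ids, lambda v: [v])       # "keyword" field

term = "XHDK-A-1293-#fJ3"
print(standard_like_tokens(term))       # ['xhdk', 'a', '1293', 'fj3']
print(text_index.get(term, set()))      # set(): no token equals the whole ID
print(keyword_index.get(term, set()))   # {1}
```

The `_analyze` request above shows the real tokens for the field, so it is the quickest check of whether a term query can match.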

Highlighting

GET my_store/_doc/_search
{
  "query": {
    "match": {
      "productID": "b"
    }
  },
  "highlight": {
    "pre_tags" : ["<span class='hlt'>"],
    "post_tags" : ["</span>"],
    "fields": {
      "productID": {}
    }
  }
}
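
Conceptually, highlighting wraps each matched term in the configured pre and post tags. A simplified Python sketch of that step follows. `highlight` is an illustrative function that matches whole words case-insensitively, which only approximates what the ES highlighter does.

```python
import re


def highlight(text, term, pre_tag, post_tag):
    """Wrap whole-word, case-insensitive matches of term in the given tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


marked = highlight("KDKE-B-9947-#kL5", "b", "<span class='hlt'>", "</span>")
print(marked)  # KDKE-<span class='hlt'>B</span>-9947-#kL5
```

The original casing of the stored value is kept; only the tags are added around the match.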

Full-text search

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }

Match query

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}

GET /my_index/_analyze
{
  "field": "title",
  "text": "QUICK!"
}

Combining queries

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}
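
To see how the three clauses interact, the query can be replayed over the four sample titles in plain Python. This is a simplified model: `bool_search` is an illustrative function that ranks hits by the number of matching should clauses, whereas ES computes a relevance score.

```python
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def bool_search(docs, must, must_not, should):
    """Return (doc_id, matched_should_count) pairs, best match first."""
    hits = []
    for doc_id, title in docs.items():
        terms = set(title.lower().split())
        if not all(t in terms for t in must):
            continue                       # every must clause has to match
        if any(t in terms for t in must_not):
            continue                       # any must_not match excludes the document
        hits.append((doc_id, sum(t in terms for t in should)))
    return sorted(hits, key=lambda hit: -hit[1])


hits = bool_search(titles, must=["quick"], must_not=["lazy"],
                   should=["brown", "dog"])
print(hits)  # [(3, 2), (1, 1)]
```

Document 2 is dropped because of "lazy", document 4 because it lacks "quick", and document 3 ranks first because it matches both should clauses.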

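Analysis

character filter: preprocesses the raw text at the character level, for example by handling HTML tags, and passes the result to the tokenizer. An analyzer may contain zero or more character filters, applied in the configured order.
tokenizer: splits the text into tokens. An analyzer must contain exactly one tokenizer.
token filter: post-processes the tokens produced by the tokenizer, for example lowercasing, stop-word removal, or synonym handling. An analyzer may contain zero or more token filters, applied in the configured order.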
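
The three stages above form a simple pipeline, which can be imitated in Python. Every function below is a rough stand-in written for illustration: the regex-based filters only approximate html_strip, the standard tokenizer, lowercase, and asciifolding.

```python
import html
import re
import unicodedata


def html_strip(text):
    """Character filter: drop tags and decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def word_tokenizer(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    """Token filter: strip accents by removing combining marks."""
    return ["".join(c for c in unicodedata.normalize("NFKD", t)
                    if not unicodedata.combining(c)) for t in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    for char_filter in char_filters:       # zero or more, in order
        text = char_filter(text)
    tokens = tokenizer(text)               # exactly one
    for token_filter in token_filters:     # zero or more, in order
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("<b>Is this d\u00e9ja vu?</b>",
                 [html_strip], word_tokenizer, [lowercase, ascii_fold])
print(tokens)  # ['is', 'this', 'deja', 'vu']
```

Swapping the order of the filters or leaving one out shows quickly why the configured order matters.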

Testing an analyzer

POST _analyze
{
  "tokenizer": "standard",
  "char_filter":  [ "html_strip" ],
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déja vu?"
}

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "微知"
}

Built-in analyzers

Standard Analyzer
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Fingerprint Analyzer
Custom analyzers

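Built-in character filters

HTML Strip Character Filter (html_strip): strips HTML tags and decodes HTML entities such as &amp;.
Mapping Character Filter (mapping): replaces occurrences of the specified strings in the text with the given replacements.
Pattern Replace Character Filter (pattern_replace): performs regular-expression replacement.

Built-in tokenizers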

Standard Tokenizer
Letter Tokenizer
Lowercase Tokenizer
Whitespace Tokenizer
UAX URL Email Tokenizer
Classic Tokenizer
Thai Tokenizer
NGram Tokenizer
Edge NGram Tokenizer
Keyword Tokenizer
Pattern Tokenizer
Simple Pattern Tokenizer
Simple Pattern Split Tokenizer
Path Hierarchy Tokenizer

Example

PUT customer
{
  "mappings": {
    "_doc": {
      "properties": {
        "customerName": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart"
        },
        "companyId": {
          "type": "text"
        }
      }
    }
  }
}


POST /customer/_doc/_bulk
{ "index": { "_id": 1 }}
{ "companyId": "55", "customerName": "微知（上海）服務外包有限公司" }
{ "index": { "_id": 2 }}
{ "companyId": "55", "customerName": "上海微盟" }
{ "index": { "_id": 3 }}
{ "companyId": "55", "customerName": "上海知道廣告有限公司" }
{ "index": { "_id": 4 }}
{ "companyId": "55", "customerName": "微鯨科技有限公司" }
{ "index": { "_id": 5}}
{ "companyId": "55", "customerName": "北京微塵大業電子商務" }
{ "index": { "_id": 6}}
{ "companyId": "55", "customerName": "福建微衝企業諮詢有限公司" }
{ "index": { "_id": 7}}
{ "companyId": "55", "customerName": "上海知盛企業管理諮詢有限公司" }

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "知道"
    }
  }
}

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "微知"
    }
  }
}
