Elasticsearch 5.5 Mapping Explained

阿新 • Published: 2018-12-06

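Preface

1. Field datatypes

1.1 string type

Starting with Elasticsearch 5.x, the string field type is deprecated and replaced by text and keyword. A mapping that still uses string is accepted, but Elasticsearch returns a deprecation warning.

Test: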

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" }
      }
    }
  }
}

Result:

#! Deprecation: The [string] field is deprecated, please use [text] or [keyword] instead on [title]
{
  "acknowledged": true,
  "shards_acknowledged": true
}

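1.2 text type

text replaces string for full-text content. Use text for any field that needs full-text search, such as the body of an email or a product description. A text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting and are seldom used for aggregations (the significant terms aggregation is a notable exception).

The following mapping sets the full_name field to type text: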

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": { "type": "text" }
      }
    }
  }
}

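1.3 keyword type

The keyword type suits structured content such as email addresses, hostnames, status codes, and tags. Use it for fields that need filtering (for example, finding the blog posts whose status is published), sorting, or aggregations. A keyword field can only be found by its exact value.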
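The difference between text and keyword can be sketched in a few lines of Python. This is only an illustration: toy_analyze is a hypothetical stand-in for an analyzer (it lowercases and splits on non-alphanumeric characters), not Elasticsearch's standard analyzer, and the sets stand in for the terms of an inverted index.

```python
import re


def toy_analyze(value):
    """Toy analyzer (assumption): lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def index_text(value):
    # A text field is indexed as the individual terms produced by analysis.
    return set(toy_analyze(value))


def index_keyword(value):
    # A keyword field is indexed as a single unanalyzed term: the whole value.
    return {value}


text_terms = index_text("Quick Brown-Fox")
keyword_terms = index_keyword("Quick Brown-Fox")

print("fox" in text_terms)                 # True: one analyzed term matches
print("fox" in keyword_terms)              # False: only the exact value matches
print("Quick Brown-Fox" in keyword_terms)  # True
```

A search for a single word finds the text field, while the keyword field only answers to the complete original value, which is what filtering, sorting, and aggregations rely on.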

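1.4 Numeric types

Elasticsearch supports the following numeric types:

Type	Range
long	-2^63 to 2^63-1
integer	-2^31 to 2^31-1
short	-32,768 to 32,767
byte	-128 to 127
double	64-bit double-precision IEEE 754 floating point
float	32-bit single-precision IEEE 754 floating point
half_float	16-bit half-precision IEEE 754 floating point
scaled_float	Floating point with a fixed scaling factor (for example, a price only needs cent precision, so with a scaling factor of 100 a price of 57.34 is stored as 5734)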

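For double, float and half_float, -0.0 and +0.0 are different values. A term query for -0.0 does not match +0.0, and vice versa. The same holds for range queries: if the upper bound is -0.0, +0.0 does not match, and if the lower bound is +0.0, -0.0 does not match.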
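One way to see why the two zeros can be told apart is to look at their bit patterns. The sketch below assumes an index that stores floats in an order-preserving integer encoding; sortable_float_bits is a hypothetical helper written for this illustration, not a function taken from Elasticsearch.

```python
import struct


def sortable_float_bits(value):
    """Map a 32-bit float to a signed int whose order matches the float order."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    # Flip the lower 31 bits of negative values so that integer comparison
    # orders them the same way as the floats they encode.
    return bits ^ ((bits >> 31) & 0x7FFFFFFF)


print(-0.0 == 0.0)                 # True: plain float comparison sees no difference
print(sortable_float_bits(-0.0))   # -1
print(sortable_float_bits(0.0))    # 0
```

Under this encoding -0.0 sorts immediately below +0.0, so an upper bound of -0.0 excludes +0.0 and a lower bound of +0.0 excludes -0.0.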

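Points to consider when choosing among these numeric types:

Pick the smallest type that satisfies the requirement. For example, if a field never exceeds 100, byte is enough. The longest verified human lifespan is 122 years, so even byte (maximum 127) covers an age field, and short leaves ample headroom. The smaller the type, the more efficient indexing and searching are.
For floating-point data, prefer scaled_float when a fixed scaling factor fits the data.

Example: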

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": { "type": "integer" },
        "time_in_seconds": { "type": "float" },
        "price": { "type": "scaled_float", "scaling_factor": 100 }
      }
    }
  }
}
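The two rules of thumb above can be tried out with a short Python sketch. The helper names (smallest_integer_type, to_scaled_long, from_scaled_long) are made up for this example, and rounding half up is an assumption about how the scaled value is rounded.

```python
import math

# Integer types from the table above, smallest first.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the first (smallest) type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a long")


def to_scaled_long(value, scaling_factor):
    # Multiply by the scaling factor and round to the nearest whole number.
    return math.floor(value * scaling_factor + 0.5)


def from_scaled_long(stored, scaling_factor):
    # Divide by the scaling factor to recover the original value.
    return stored / scaling_factor


print(smallest_integer_type(0, 100))    # byte
print(smallest_integer_type(0, 40000))  # integer
print(to_scaled_long(57.34, 100))       # 5734
print(from_scaled_long(5734, 100))      # 57.34
```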

1.5 Object type

JSON is hierarchical by nature, so a document can contain nested objects:

PUT my_index/my_type/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": { "first": "John", "last": "Smith" }
  }
}

The document above is a JSON object that contains a manager object, which in turn contains a name object. Internally, the document is indexed as a flat list of key-value pairs:

{
  "region": "US",
  "manager.age": 30,
  "manager.name.first": "John",
  "manager.name.last": "Smith"
}

The mapping for this document structure is:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "region": { "type": "keyword" },
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": {
              "properties": {
                "first": { "type": "text" },
                "last": { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}
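The flattening of inner objects into dotted keys described in this section can be reproduced with a small recursive function. This is a sketch of the idea only; flatten is a name chosen for this example.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


document = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}

print(flatten(document))
```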

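1.6 date type

JSON has no date datatype, so a date in Elasticsearch can be any of the following:

A string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30".
A long number representing milliseconds-since-the-epoch.
An integer representing seconds-since-the-epoch.

Date formats can be customized. If no format is specified, the default is:

"strict_date_optional_time||epoch_millis"

Example: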

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": { "type": "date" }
      }
    }
  }
}

PUT my_index/my_type/1
{ "date": "2015-01-01" }

PUT my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" }

PUT my_index/my_type/3
{ "date": 1420070400001 }

GET my_index/_search
{
  "sort": { "date": "asc" }
}

The three indexed documents:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "date": "2015-01-01T12:10:30Z" } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "date": "2015-01-01" } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "date": 1420070400001 } }
    ]
  }
}

Sorted result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": null, "_source": { "date": "2015-01-01" }, "sort": [ 1420070400000 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": null, "_source": { "date": 1420070400001 }, "sort": [ 1420070400001 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": null, "_source": { "date": "2015-01-01T12:10:30Z" }, "sort": [ 1420114230000 ] }
    ]
  }
}
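The sort values in the result are epoch milliseconds in UTC. The sketch below reproduces them in Python to show why document 3 lands between documents 1 and 2. It handles only the two string patterns used in this example, not the full strict_date_optional_time format, and to_epoch_millis is a hypothetical helper.

```python
import calendar
from datetime import datetime

# Only the two string layouts used in the example documents (assumption).
DATE_PATTERNS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def to_epoch_millis(value):
    """Convert a date string or an epoch-millis number to epoch milliseconds (UTC)."""
    if isinstance(value, int):
        return value  # already milliseconds-since-the-epoch
    for pattern in DATE_PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue
        # timegm interprets the parsed fields as UTC.
        return calendar.timegm(parsed.timetuple()) * 1000
    raise ValueError("unsupported date format: " + value)


documents = {"1": "2015-01-01", "2": "2015-01-01T12:10:30Z", "3": 1420070400001}
sorted_ids = sorted(documents, key=lambda doc_id: to_epoch_millis(documents[doc_id]))

print(to_epoch_millis("2015-01-01"))            # 1420070400000
print(to_epoch_millis("2015-01-01T12:10:30Z"))  # 1420114230000
print(sorted_ids)                               # ['1', '3', '2']
```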

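1.7 Array type

Elasticsearch has no dedicated array type. By default, any field can contain one or more values, but all values in an array must be of the same datatype. For example:

An array of strings: [ "one", "two" ]
An array of integers: [ 1, 3 ]
An array of arrays: [ 1, [ 2, 3 ] ], which is equivalent to [ 1, 2, 3 ]
An array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]

Notes:

When a field is added dynamically, the first value in the array determines the field type.
Arrays with a mixture of datatypes are not supported, for example [ 1, "abc" ].
An array may contain null values; an empty array [ ] is treated as a missing field.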
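The array rules above can be mimicked in Python. This is a simplified model: it skips null values (assuming no null_value is configured for the field) and compares Python types rather than Elasticsearch datatypes. Both function names are invented for this sketch.

```python
def flatten_array(values):
    """Flatten nested arrays and drop nulls, e.g. [1, [2, 3]] -> [1, 2, 3]."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten_array(value))
        elif value is not None:
            flat.append(value)
    return flat


def has_single_type(values):
    """True when all non-null values share one type (an empty array also passes)."""
    return len({type(value) for value in flatten_array(values)}) <= 1


print(flatten_array([1, [2, 3]]))     # [1, 2, 3]
print(has_single_type(["one", "two"]))  # True
print(has_single_type([1, "abc"]))      # False
print(flatten_array([]))                # []: nothing to index, like a missing field
```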

1.8 binary type

The binary type accepts a Base64-encoded string. By default the field is not stored and is not searchable.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "type": "text" },
        "blob": { "type": "binary" }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg=="
}

Searching the blob field:

GET my_index/_search
{
  "query": {
    "match": { "blob": "test" }
  }
}

Result:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Binary fields do not support searching",
        "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
        "index": "my_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "3dQd1RRVTMiKdTckM68nPQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "Binary fields do not support searching",
          "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
          "index": "my_index"
        }
      }
    ]
  },
  "status": 400
}

Base64 encoding and decoding tool: http://www1.tc711.com/tool/BASE64.htm
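The blob value in the example is simply the Base64 encoding of the document's name. It can be produced and checked locally with Python's standard library, without an online tool:

```python
import base64

# Encode raw bytes to the Base64 string that a binary field expects.
encoded = base64.b64encode(b"Some binary blob").decode("ascii")

# Decode the value used in the example document back to bytes.
decoded = base64.b64decode("U29tZSBiaW5hcnkgYmxvYg==")

print(encoded)  # U29tZSBiaW5hcnkgYmxvYg==
print(decoded)  # b'Some binary blob'
```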

1.9 ip type

An ip field stores IPv4 or IPv6 addresses.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": { "type": "ip" }
      }
    }
  }
}

PUT my_index/my_type/1
{ "ip_addr": "192.168.1.1" }

GET my_index/_search
{
  "query": {
    "term": { "ip_addr": "192.168.0.0/16" }
  }
}
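The term query above uses CIDR notation, so it matches every address inside the 192.168.0.0/16 network. The membership test it implies can be checked with Python's ipaddress module; matches_cidr is a helper name chosen for this sketch.

```python
import ipaddress


def matches_cidr(address, cidr):
    """True when the address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(matches_cidr("192.168.1.1", "192.168.0.0/16"))  # True: the indexed document matches
print(matches_cidr("10.0.0.1", "192.168.0.0/16"))     # False
```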

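1.10 range types

The following range types are supported:

Type	Range
integer_range	-2^31 to 2^31-1
float_range	32-bit IEEE 754
long_range	-2^63 to 2^63-1
double_range	64-bit IEEE 754
date_range	64-bit integer, milliseconds since the epoch

Typical use cases for range types: a time-range picker or an age-range selector in a front-end form.
Example: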

PUT range_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "expected_attendees": { "type": "integer_range" },
        "time_frame": {
          "type": "date_range",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

PUT range_index/my_type/1
{
  "expected_attendees": { "gte": 10, "lte": 20 },
  "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
}

The requests above create the range_index index and index a document whose expected_attendees range is 10 to 20 and whose time_frame runs from 2015-10-31 12:00:00 to 2015-11-01.

Query:

POST range_index/_search
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2015-08-01",
        "lte": "2015-12-01",
        "relation": "within"
      }
    }
  }
}

Query result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "range_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "expected_attendees": { "gte": 10, "lte": 20 },
          "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
        }
      }
    ]
  }
}
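The relation parameter decides how the stored range is compared with the query range. A minimal Python model of the three relations (within, contains, intersects) follows; it treats both ranges as closed intervals and ignores date rounding, and range_matches is a hypothetical function for this sketch.

```python
from datetime import datetime


def range_matches(doc_gte, doc_lte, query_gte, query_lte, relation="intersects"):
    """Compare a stored closed range with a query closed range."""
    if relation == "within":
        # The stored range lies entirely inside the query range.
        return query_gte <= doc_gte and doc_lte <= query_lte
    if relation == "contains":
        # The stored range entirely covers the query range.
        return doc_gte <= query_gte and query_lte <= doc_lte
    if relation == "intersects":
        # The two ranges overlap in at least one point.
        return doc_gte <= query_lte and query_gte <= doc_lte
    raise ValueError("unknown relation: " + relation)


time_frame = (datetime(2015, 10, 31, 12, 0, 0), datetime(2015, 11, 1))
query = (datetime(2015, 8, 1), datetime(2015, 12, 1))

print(range_matches(*time_frame, *query, relation="within"))      # True
print(range_matches(*time_frame, *query, relation="contains"))    # False
print(range_matches(*time_frame, *query, relation="intersects"))  # True
```

The stored time_frame falls entirely between 2015-08-01 and 2015-12-01, which is why the within query returns the document.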

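1.11 nested type

The nested type is a specialized version of the object type that allows arrays of objects to be indexed and queried independently of each other. The plain object type can give surprising results. Take document my_index/my_type/1 with the following structure: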

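PUT my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John", "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

The user field is dynamically added as an object type.
Internally it is converted to the following flattened form:

{
  "group": "fans",
  "user.first": [ "alice", "john" ],
  "user.last": [ "smith", "white" ]
}

user.first and user.last are flattened into multi-value fields, and the association between Alice and White is lost. As a result, the document incorrectly matches the following query, even though no user named Alice Smith exists:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last": "Smith" }}
      ]
    }
  }
}

The nested field type addresses this shortcoming of the object type. In the requests below, the first search (Alice and Smith) no longer matches, while the second (Alice and White) matches and uses inner_hits to highlight the matching nested document: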

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": { "type": "nested" }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John", "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last": "Smith" }}
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last": "White" }}
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": { "user.first": {} }
        }
      }
    }
  }
}
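The difference between the two behaviours can be modelled in plain Python. This is a conceptual sketch, not how Elasticsearch executes queries: match_flattened checks each field against the pooled values of all users, while match_nested requires both conditions to hold within the same user object. Both function names are invented here.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def match_flattened(users, first, last):
    # Object type: values of all array elements are pooled per field,
    # so the two conditions may be satisfied by different users.
    firsts = {user["first"].lower() for user in users}
    lasts = {user["last"].lower() for user in users}
    return first.lower() in firsts and last.lower() in lasts


def match_nested(users, first, last):
    # Nested type: each user is checked on its own.
    return any(
        user["first"].lower() == first.lower() and user["last"].lower() == last.lower()
        for user in users
    )


print(match_flattened(users, "Alice", "Smith"))  # True: the false positive
print(match_nested(users, "Alice", "Smith"))     # False
print(match_nested(users, "Alice", "White"))     # True
```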

1.12 token_count type

A token_count field accepts a string, analyzes it, and indexes the number of tokens it contains:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "length": { "type": "token_count", "analyzer": "standard" }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "name": "John Smith" }

PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": { "name.length": 3 }
  }
}
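What name.length holds for each document can be approximated in Python. The sketch assumes a simple word tokenizer in place of the standard analyzer, which is close enough for these two names; count_tokens is a hypothetical helper.

```python
import re


def count_tokens(value):
    """Count word tokens, a rough stand-in for the standard analyzer."""
    return len(re.findall(r"\w+", value))


names = {"1": "John Smith", "2": "Rachel Alice Williams"}
token_counts = {doc_id: count_tokens(name) for doc_id, name in names.items()}

# The term query on name.length = 3 keeps only documents with three tokens.
matching_ids = [doc_id for doc_id, count in token_counts.items() if count == 3]

print(token_counts)  # {'1': 2, '2': 3}
print(matching_ids)  # ['2']
```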

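1.13 geo_point type

A geo_point field stores the latitude and longitude of a geographic location. A point can be given as an object, a "lat,lon" string, a geohash, or a [ lon, lat ] array (note the reversed order in the array form):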

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": { "lat": 41.12, "lon": -71.34 }
}

PUT my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34"
}

PUT my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86"
}

PUT my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ]
}

GET my_index/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": { "lat": 42, "lon": -72 },
        "bottom_right": { "lat": 40, "lon": -74 }
      }
    }
  }
}
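All four documents describe the same place. The following Python sketch normalizes each input form to a (lat, lon) pair to make the ordering differences explicit. It is an illustration under simplifying assumptions (no validation, standard geohash alphabet); to_lat_lon and decode_geohash are names invented for this example.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Decode a geohash to the (lat, lon) center of its cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    for char in geohash:
        index = GEOHASH_ALPHABET.index(char)
        for shift in range(4, -1, -1):
            bounds = lon_range if use_lon else lat_range
            middle = (bounds[0] + bounds[1]) / 2
            if (index >> shift) & 1:
                bounds[0] = middle  # keep the upper half
            else:
                bounds[1] = middle  # keep the lower half
            use_lon = not use_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


def to_lat_lon(location):
    """Normalize object, string, geohash, or array input to (lat, lon)."""
    if isinstance(location, dict):
        return (float(location["lat"]), float(location["lon"]))
    if isinstance(location, (list, tuple)):
        lon, lat = location  # arrays are [lon, lat]
        return (float(lat), float(lon))
    if "," in location:
        lat, lon = location.split(",")  # strings are "lat,lon"
        return (float(lat), float(lon))
    return decode_geohash(location)


print(to_lat_lon({"lat": 41.12, "lon": -71.34}))  # (41.12, -71.34)
print(to_lat_lon("41.12,-71.34"))                 # (41.12, -71.34)
print(to_lat_lon([-71.34, 41.12]))                # (41.12, -71.34)
print(to_lat_lon("drm3btev3e86"))                 # approximately the same point
```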

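2. Meta-fields

2.1 _all

The _all field is a catch-all field that concatenates the values of all other fields into one string, using a space as the delimiter. It is analyzed and indexed, but not stored. Use it when you want the documents that contain a keyword without specifying which field to search.
Example: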

PUT my_index/blog/1
{
  "title": "Master Java",
  "content": "learn java",
  "author": "Tom"
}

After analysis, the _all field contains the terms: [ "master", "java", "learn", "tom" ]

Search:

GET my_index/_search
{
  "query": {
    "match": { "_all": "Java" }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.39063013,
    "hits": [
      {
        "_index": "my_index",
        "_type": "blog",
        "_id": "1",
        "_score": 0.39063013,
        "_source": {
          "title": "Master Java",
          "content": "learn java",
          "author": "Tom"
        }
      }
    ]
  }
}
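How the _all value for this document comes about can be sketched in Python. The example assumes a toy analyzer (lowercase plus word splitting) in place of the real one; build_all and toy_analyze are hypothetical names.

```python
import re


def build_all(source):
    """Concatenate all field values into one string, separated by spaces."""
    return " ".join(str(value) for value in source.values())


def toy_analyze(text):
    """Toy analyzer (assumption): lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


document = {"title": "Master Java", "content": "learn java", "author": "Tom"}

all_value = build_all(document)
all_terms = toy_analyze(all_value)

print(all_value)                            # Master Java learn java Tom
print(sorted(set(all_terms)))               # ['java', 'learn', 'master', 'tom']
print(toy_analyze("Java")[0] in all_terms)  # True: the match query on _all finds the document
```

The query text goes through analysis as well, which is why searching for "Java" matches both "Java" in the title and "java" in the content.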

Building a custom _all-style field with copy_to:

PUT myindex { "mappings": { "mytype": { "properties": {

 
 
              
           
              
              
            