Elasticsearch Learning: Relevance Scoring with TF & IDF

阿新 (A Xin) • Published: 2017-06-26


Put simply, the relevance score algorithm computes how closely a piece of text in an index matches the search text.

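Elasticsearch scores relevance with the term frequency/inverse document frequency algorithm, TF/IDF for short. (Lucene's classic TF/IDF similarity was the default before Elasticsearch 5.0; from 5.0 on the default is BM25, which refines the same TF and IDF ideas.)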

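Term frequency (TF): how many times each term of the search text appears in the field being searched. The more often it appears, the more relevant the document.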

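Inverse document frequency (IDF): how many documents in the entire index contain each search term. The more documents contain it, the less a match on that term counts toward relevance.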
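The two definitions can be made concrete with a minimal Python sketch. It uses the plain textbook forms (a raw count for TF, log(N / df) for IDF); Lucene's actual similarity formulas add smoothing and normalization, so the function names and exact formulas here are illustrative assumptions, not Elasticsearch's implementation.

```python
import math


def term_frequency(term, field_tokens):
    """Raw TF: how many times `term` occurs in one field's token list."""
    return field_tokens.count(term)


def inverse_document_frequency(doc_count, doc_freq):
    """Textbook IDF: log(N / df). The fewer documents contain a term, the larger its weight."""
    return math.log(doc_count / doc_freq)


tokens = "hello hello world".split()
print(term_frequency("hello", tokens))  # 2
# A term found in 100 of 10,000 documents outweighs one found in 1,000 of them.
print(inverse_document_frequency(10_000, 100) > inverse_document_frequency(10_000, 1_000))  # True
```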

Example:

Search request: hello world

doc1: hello, today is very good
doc2: hi world, how are you

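Suppose the index holds 10,000 documents, "hello" appears in 1,000 of them and "world" in only 100. Because "world" is the rarer term, its IDF is higher and a match on it counts for more, so doc2 is more relevant.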
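To put numbers on the example, the sketch below scores both documents with the textbook sum of tf × idf over the query terms, treating the 1,000 and 100 figures as document frequencies. The tokenizer is a crude stand-in for Elasticsearch's standard analyzer, and the scores are illustrative, not what Elasticsearch would report.

```python
import math

N = 10_000                                   # documents in the index (from the example)
doc_freq = {"hello": 1_000, "world": 100}    # example figures, treated as document frequencies


def idf(term):
    # Textbook IDF: log(N / df); rarer terms weigh more.
    return math.log(N / doc_freq[term])


docs = {
    "doc1": "hello, today is very good",
    "doc2": "hi world, how are you",
}


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, drop commas, split on whitespace.
    return text.lower().replace(",", " ").split()


def score(text, query="hello world"):
    tokens = tokenize(text)
    # Sum tf * idf over every query term.
    return sum(tokens.count(t) * idf(t) for t in query.split())


for name, text in docs.items():
    print(name, round(score(text), 3))
```

This prints `doc1 2.303` and `doc2 4.605`: each document matches one query term once, but "world" is ten times rarer, so its IDF, and therefore doc2's score, is higher.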

Field-length norm: the longer the field, the weaker the relevance of a match in it.

doc1: { "title": "hello article", "content": "babaaba ... (10,000 words)" }
doc2: { "title": "my article", "content": "blablabala ... (10,000 words), hi world" }

Assume "hello" and "world" occur equally often across the whole index, so their IDF is the same.

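doc1 is more relevant: its match ("hello") is in the short title field, while doc2's match ("world") sits in a 10,000-word content field.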
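A sketch of the same comparison, assuming a classic Lucene-style length norm of 1/√(number of terms in the field) and equal IDF for both terms. The content length of 10,002 terms for doc2 (10,000 filler words plus "hi world") is an assumption, and the helper names are hypothetical.

```python
import math


def length_norm(num_terms):
    """Classic-style field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


def field_score(tf, idf, field_len):
    # Score of a single-term match in one field: tf * idf * norm.
    return tf * idf * length_norm(field_len)


IDF = 1.0  # assumed equal for "hello" and "world", as in the example
doc1 = field_score(tf=1, idf=IDF, field_len=2)       # "hello" in the 2-term title "hello article"
doc2 = field_score(tf=1, idf=IDF, field_len=10_002)  # "world" in the ~10,000-term content (assumed length)
print(doc1 > doc2)  # True
```

With TF and IDF held equal, the norm alone decides: about 0.707 for the two-term title versus about 0.01 for the long content field.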

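Analyzing how a document was matched: the _explain API returns the score breakdown of one document against a query. The request below uses the 5.x/6.x URL, which includes a type; from 7.0 on, mapping types are removed and the equivalent is GET /test_index/_explain/6.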

GET /test_index/test_type/6/_explain
{
    "query": {
        "match": {
            "test_field": "test hello"
        }
    }
}
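The _explain response nests the score as a tree of explanation nodes, each with a value, a description and a list of details. A small Python helper can walk that tree and print it indented. The response below is a hypothetical, trimmed example whose numbers and descriptions are made up; in practice you would parse the JSON body returned by the request above.

```python
import json


def walk(node, depth=0):
    """Yield (depth, value, description) for every node of an _explain 'explanation' tree."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from walk(child, depth + 1)


# Hypothetical, trimmed response: the value/description/details structure mirrors
# the _explain output, but the numbers and descriptions are invented for illustration.
response = json.loads("""
{
  "matched": true,
  "explanation": {
    "value": 1.2,
    "description": "sum of:",
    "details": [
      {"value": 0.7, "description": "weight(test_field:test in 0)", "details": []},
      {"value": 0.5, "description": "weight(test_field:hello in 0)", "details": []}
    ]
  }
}
""")

for depth, value, description in walk(response["explanation"]):
    print("  " * depth + f"{value:.3f}  {description}")
```

Each indentation level is one step deeper in the breakdown, so the per-term weights appear under the top-level "sum of:" node that adds them up.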
