ElasticSearch Best Practices for Beginners (39): Core Principles of the Inverted Index Revealed
阿新 • Published: 2018-11-07
1. Example: two pieces of text
doc1: I really liked my small dogs, and I think my mom also liked them
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
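2. Building a first inverted index
This demonstrates the simplest possible process for building an inverted index: split each document into words and record which documents each word appears in.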
word | doc1 | doc2 |
---|---|---|
I | * | * |
really | * | |
liked | * | * |
my | * | * |
small | * | |
dogs | * | * |
and | * | |
think | * | |
mom | * | * |
also | * | |
them | * | |
He | | * |
never | | * |
any | | * |
so | | * |
hope | | * |
that | | * |
will | | * |
not | | * |
expect | | * |
me | | * |
to | | * |
him | | * |
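As a rough illustration of how such a word-to-document mapping can be produced, here is a minimal Python sketch. The names `tokenize` and `build_index` are hypothetical helpers written for this example and are not part of Elasticsearch; the tokenizer simply splits on non-letters and does no other processing.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    # Keep runs of letters only; no lowercasing or stemming at this stage.
    return re.findall(r"[A-Za-z]+", text)

def build_index(documents):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

index = build_index(docs)
for word in ("I", "dogs", "small", "him"):
    print(word, sorted(index[word]))
```

Each entry of `index` corresponds to one row of the table: the key is the word and the set holds the documents marked with `*`.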
Search
A search for mother like little dog cannot return any results, because none of the four query terms appears in the index:
mother
like
little
dog
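Is this the search result we want? Absolutely not. From our point of view, is there any difference between mother and mom? They are synonyms; both mean mother. Is there any difference between like and liked? No, both mean to like; one is present tense and the other is past tense. Between little and small? Synonyms, both meaning small. Between dog and dogs? Both mean dog; one is singular and the other is plural.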
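The failed lookup can be reproduced with a few lines of Python. This is only a sketch of exact-match term lookup over the two example documents, not how Elasticsearch is implemented.

```python
import re

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Exact-match index: a term hits a document only if the identical string occurs in it.
index = {}
for doc_id, text in docs.items():
    for token in re.findall(r"[A-Za-z]+", text):
        index.setdefault(token, set()).add(doc_id)

query = "mother like little dog"
matches = {term: sorted(index.get(term, set())) for term in query.split()}
print(matches)  # {'mother': [], 'like': [], 'little': [], 'dog': []}
```

Every query term comes back with an empty list, even though the documents contain mom, liked, small, and dogs.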
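Normalization: when the inverted index is built, each word split out of the text goes through an extra processing step, which raises the chance that a later search finds the related documents.
The step covers tense conversion, singular/plural conversion, synonym conversion, and case conversion:
mother --> mom
liked --> like
small --> little
dogs --> dog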
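A minimal sketch of such a normalization step is shown below. The `STEMS` and `SYNONYMS` tables are hypothetical, hand-written for this example only; a real analyzer would rely on a stemmer and a synonym dictionary. The direction of a synonym mapping is arbitrary as long as indexing and searching use the same one; this sketch maps mother to mom, matching the index in section 3.

```python
import re

# Hypothetical rules covering only the words in this example.
STEMS = {"liked": "like", "dogs": "dog"}          # tense and plural forms
SYNONYMS = {"mother": "mom", "small": "little"}   # synonym pairs

def normalize(token):
    token = token.lower()               # case conversion
    token = STEMS.get(token, token)     # tense / singular-plural conversion
    return SYNONYMS.get(token, token)   # synonym conversion

def analyze(text):
    # Tokenize on runs of letters, then normalize every token.
    return [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]

print(analyze("mother like little dog"))   # ['mom', 'like', 'little', 'dog']
print(analyze("my mom liked small dogs"))  # ['my', 'mom', 'like', 'little', 'dog']
```

Both inputs end up with the same terms for the shared concepts, which is what lets a query match a document that uses different surface forms.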
3. Rebuild the inverted index with normalization applied; searching again for mother like little dog now finds results
word | doc1 | doc2 |
---|---|---|
I | * | * |
really | * | |
like | * | * |
my | * | * |
little | * | |
dog | * | * |
and | * | |
think | * | |
mom | * | * |
also | * | |
them | * | |
He | | * |
never | | * |
any | | * |
so | | * |
hope | | * |
that | | * |
will | | * |
not | | * |
expect | | * |
me | | * |
to | | * |
him | | * |
The query mother like little dog is tokenized and normalized in the same way:
mother --> mom
like --> like
little --> little
dog --> dog
Both doc1 and doc2 are returned.
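The whole flow can be sketched end to end in Python. This is an illustrative toy under stated assumptions, not Elasticsearch's implementation: `STEMS`, `SYNONYMS`, `analyze`, `build_index`, and `search` are hypothetical names, the rules cover only the words in this example, and a document counts as a hit if it contains at least one query term.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization rules for this example only.
STEMS = {"liked": "like", "dogs": "dog"}
SYNONYMS = {"mother": "mom", "small": "little"}

def analyze(text):
    # Tokenize, lowercase, then apply stemming and synonym rules.
    terms = []
    for token in re.findall(r"[A-Za-z]+", text):
        token = token.lower()
        token = STEMS.get(token, token)
        terms.append(SYNONYMS.get(token, token))
    return terms

def build_index(documents):
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # The query goes through the same analysis as the documents.
    hits = defaultdict(list)
    for term in analyze(query):
        for doc_id in index.get(term, ()):
            hits[doc_id].append(term)
    return dict(hits)

index = build_index(DOCS)
results = search(index, "mother like little dog")
for doc_id in sorted(results):
    print(doc_id, results[doc_id])
# doc1 ['mom', 'like', 'little', 'dog']
# doc2 ['mom', 'like', 'dog']
```

The key design point is that `analyze` is shared by indexing and searching, so the query terms and the indexed terms always meet in the same normalized form.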