999 - Elasticsearch Analysis 05 - Character Filter

HTML Strip Character Filter

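  • Strips HTML elements from the text and replaces HTML entities with their decoded values (for example, &amp; becomes &).
  • Example
POST _analyze
{
  "char_filter": [
    "html_strip"
  ],
  "tokenizer": "keyword",
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}

Produces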


I'm so happy!

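With the standard tokenizer in place of keyword, the result is [ I'm, so, happy ].

  • Configuration parameters
Parameter      Description
escaped_tags   HTML element names (written without angle brackets) that are kept in the text instead of being stripped

Example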

POST _analyze
{
  "char_filter": [
    {
      "type": "html_strip",
      "escaped_tags": ["b"]
    }
  ],
  "tokenizer": "keyword",
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}

Produces


I'm so <b>happy</b>!

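To try the two behaviours above without a cluster, the following is a rough local approximation in Python. It is only a sketch and not Lucene's implementation: the function name `html_strip` and the regular expression `TAG_RE` are assumptions made for illustration, and the real filter also emits line breaks in place of block-level tags such as <p>, which this sketch ignores.

```python
import html
import re

# Matches an opening or closing tag and captures its name, e.g. <p>, </b>, <a href="...">.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def html_strip(text, escaped_tags=()):
    """Rough local approximation of html_strip (hypothetical helper, not Lucene's code)."""
    keep = {tag.lower() for tag in escaped_tags}

    def replace_tag(match):
        # Keep tags listed in escaped_tags verbatim, drop every other tag.
        return match.group(0) if match.group(1).lower() in keep else ""

    without_tags = TAG_RE.sub(replace_tag, text)
    # Decode entities such as &apos; and &amp; into the characters they stand for.
    return html.unescape(without_tags)


sample = "<p>I&apos;m so <b>happy</b>!</p>"
print(html_strip(sample))
print(html_strip(sample, escaped_tags=["b"]))
```

The first call drops every tag, while the second keeps <b> because it is listed in escaped_tags, mirroring the two requests shown above.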

Mapping Character Filter

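  • Defines a set of key-value pairs. Wherever a key occurs in the text, it is replaced with the corresponding value.
  • Configuration parameters
Parameter       Description
mappings        Array of mappings, each written as key => value
mappings_path   Path to a file of mappings, either absolute or relative to the config directory. The file must be UTF-8 encoded, with one key => value mapping per line.

Example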

POST _analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0", "١ => 1", "٢ => 2", "٣ => 3", "٤ => 4",
        "٥ => 5", "٦ => 6", "٧ => 7", "٨ => 8", "٩ => 9"
      ]
    }
  ],
  "tokenizer": "keyword",
  "text": "My license plate is ٢٥٠١٥"
}

Produces [ My license plate is 25015 ]
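
Typing ten right-to-left digits by hand is easy to get wrong, so one option is to generate the mappings array. The sketch below builds the same request body in Python, assuming the keys are the Arabic-Indic digits U+0660 to U+0669 and that the keyword tokenizer is used. The helper names `arabic_indic_digit_mappings` and `build_analyze_body` are hypothetical.

```python
import json


def arabic_indic_digit_mappings():
    """Return the ten "key => value" rules for U+0660..U+0669 (assumed key range)."""
    return [f"{chr(0x0660 + d)} => {d}" for d in range(10)]


def build_analyze_body(text):
    """Assemble an _analyze request body; the keyword tokenizer keeps the text as one token."""
    return {
        "char_filter": [
            {"type": "mapping", "mappings": arabic_indic_digit_mappings()}
        ],
        "tokenizer": "keyword",
        "text": text,
    }


body = build_analyze_body("My license plate is \u0662\u0665\u0660\u0661\u0665")
# ensure_ascii=False keeps the digits readable instead of \uXXXX escapes.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted after `POST _analyze` in the console.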

The example above replaces single characters; keys and values can also be several characters long.

POST _analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        ":) => _happy_",
        ":( => _sad_"
      ]
    }
  ],
  "tokenizer": "keyword",
  "text": "I'm delighted about it :("
}

Produces [ I'm delighted about it _sad_ ]
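
With multi-character keys, one key can be a prefix of another. The Elasticsearch documentation describes matching as greedy, with the longest key that matches at a given position winning. The following Python sketch approximates that rule with a regular-expression alternation sorted longest-first. It is an illustration under that assumption, not the filter's actual implementation, and `apply_mappings` is a hypothetical name.

```python
import re


def apply_mappings(text, rules):
    """Apply "key => value" rules in one pass, preferring the longest matching key."""
    table = {}
    for rule in rules:
        key, value = rule.split("=>", 1)
        table[key.strip()] = value.strip()
    if not table:
        return text
    # Longer keys go first in the alternation so they win over their own prefixes.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


rules = [":) => _happy_", ":( => _sad_"]
print(apply_mappings("I'm delighted about it :(", rules))
# Longest match: "ab" is preferred over "a" at the same position.
print(apply_mappings("abc", ["a => 1", "ab => 2"]))
```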

Pattern Replace Character Filter

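  • Replaces text that matches a regular expression. The replacement string can refer to capture groups from the match.
  • Configuration parameters
Parameter     Description
pattern       A Java regular expression. Required.
replacement   The replacement string. It can reference capture groups with the $1..$9 syntax.
flags         Java regular expression flags, separated by |, for example "CASE_INSENSITIVE|COMMENTS".

Example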

POST _analyze
{
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "(\\d+)-(?=\\d)",
      "replacement": "$1_"
    }
  ],
  "tokenizer": "keyword",
  "text": "My credit card is 123-456-789"
}

Produces [ My credit card is 123_456_789 ]
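
The pattern matches a run of digits followed by a hyphen, and the lookahead (?=\d) requires another digit after the hyphen without consuming it, so only hyphens between digits are rewritten. The same substitution can be tried locally with Python's re module, as in the sketch below. This is an approximation: Python and Java regex dialects differ in places, the helper `pattern_replace` is hypothetical, and re.IGNORECASE is used here as a stand-in for Java's CASE_INSENSITIVE flag.

```python
import re


def pattern_replace(text, pattern, replacement, flags=0):
    """Approximate pattern_replace with Python's re module (hypothetical helper)."""
    # Java writes group references as $1..$9; Python's equivalent is \g<1>..\g<9>.
    py_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text, flags=flags)


print(pattern_replace("My credit card is 123-456-789", r"(\d+)-(?=\d)", "$1_"))
# A hyphen that is not followed by a digit is left alone.
print(pattern_replace("range 10-20 and a-b", r"(\d+)-(?=\d)", "$1_"))
# Case-insensitive matching, analogous to passing a flag to the filter.
print(pattern_replace("ID: abc", r"id", "key", flags=re.IGNORECASE))
```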