Search Keyword Highlighting with ElasticSearch

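I. Requirement: search for a book by its title and show the matched keywords highlighted in bold. (Run a full-text search on the text in title, display the results, and report the search time and the number of records returned.)

     1. Technologies used

            SpringBoot 2.0.5 RELEASE, ElasticSearch 6.4.1, Bootstrap, Thymeleaf 3.0.9 RELEASE, Maven 3.3.9, lombok, IDEA hot deployment. ES test data. A simple cluster with one master and two replicas was set up; cluster name: elasticsearch.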

     application.yml

server:
  port: 8082

     2. Maven dependencies

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.0.5.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<properties>
		<!-- Set the Thymeleaf version (note: the Thymeleaf 2.x default applies to Spring Boot 1.x; Spring Boot 2.0.x already defaults to Thymeleaf 3.0.x) -->
		<thymeleaf.version>3.0.9.RELEASE</thymeleaf.version>
		<thymeleaf-layout-dialect.version>2.1.1</thymeleaf-layout-dialect.version>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
		<java.version>1.8</java.version>
	</properties>
	<dependencies>
        <!-- ElasticSearch -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>
        <dependency>
            <groupId>org.elasticsearch.plugin</groupId>
            <artifactId>transport-netty3-client</artifactId>
            <version>5.6.10</version>
        </dependency>
		<!-- Spring Boot Web -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
        <!-- Thymeleaf -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-thymeleaf</artifactId>
		</dependency>
		<!-- lombok -->
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<version>1.16.18</version>
		</dependency>
		<!-- Hot deployment -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-devtools</artifactId>
			<optional>true</optional>
		</dependency>
		<!-- Spring Boot test -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

       3. Configuring ElasticSearch (using Java annotation-based configuration)

          The ElasticSearchConfig class is as follows.

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.net.InetAddress;

@Configuration
public class ElasticSearchConfig {
    private static final Logger logger = LoggerFactory.getLogger(ElasticSearchConfig.class);
    /** ES connection settings */
    private static final String CLUSTER_NAME="elasticsearch";
    private static final String HOST_NAME="localhost";
    private static final Integer PORT=9300;

    /** Returns the TransportClient */
   @Bean
    public TransportClient client() {
           logger.info("Initializing TransportClient...");
           TransportClient client = null;
           try {
               TransportAddress transportAddress = new InetSocketTransportAddress(InetAddress.getByName(HOST_NAME),PORT);
               // Cluster settings
               Settings esSetting = Settings.builder()
                       .put("cluster.name", CLUSTER_NAME)
                       .build();
               // Create the client with the custom Settings
               client= new PreBuiltTransportClient(esSetting);
               client.addTransportAddresses(transportAddress);
           } catch (Exception e) {
               logger.error("elasticsearch TransportClient create error!!!", e);
           }
           return client;
   }
}

4. The Novel entity class

import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import java.util.Date;

@Document(indexName = "book",type="novel")
@Setter
@Getter
public class Novel {
    @Id
    private String id;
    private String title;
    private String author;
    private Integer word_count;
    private Date publish_data;

    public Novel(){
      super();
    }
    

    @Override
    public String toString() {
        return "Novel{" +
                "id=" + id +
                ", title='" + title + '\'' +
                ", author='" + author + '\'' +
                ", word_count=" + word_count +
                ", publish_data=" + publish_data +
                '}';
    }
}

5. NovelController

import com.lx.search.elastic.entity.Novel;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.servlet.ModelAndView;
import java.time.Duration;
import java.time.Instant;
import java.util.*;

@Controller
@RequestMapping("/novel")
public class NovelController {
    private static final String PATH="searchtitle";

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Autowired
    private TransportClient client;

    /**
     * Search by title and highlight the matches
     */
    @RequestMapping("/search/title/{keyword}")
    public ModelAndView searchTitle(@PathVariable String keyword) {
        ModelAndView modelAndView=new ModelAndView();
        // Start timing the query
        Instant start = Instant.now();
        // Build the query; the keyword is analyzed with the standard analyzer
        QueryBuilder matchQuery = QueryBuilders.matchQuery("title", keyword).analyzer("standard").operator(Operator.OR);
        // Configure highlighting with the default highlighter
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field("title")
                .preTags("<span style=\"color:red;font-weight:bold;font-size:15px;\">")
                .postTags("</span>");
        // Run the search against the "book" index
        SearchResponse response = client.prepareSearch("book")
                .setQuery(matchQuery)
                .highlighter(highlightBuilder)
                // Maximum number of documents returned by one request
                .setSize(10)
                .get();
        // Search results
        SearchHits hits = response.getHits();
        // Stop timing
        Instant end = Instant.now();
        System.out.println("Found " + hits.getTotalHits() + " results in " + Duration.between(start, end).toMillis() + " ms");
        List<Map<Object, Object>> novel = new ArrayList<>();
        // Loop over the returned hits only: getTotalHits() can be larger than the
        // page size set by setSize(10), so indexing up to it may run out of bounds.
        for (SearchHit hit : hits.getHits()) {
            // Collect the fields of each hit in a LinkedHashMap (keeps insertion order)
            Map<Object, Object> map = new LinkedHashMap<>();
            // Document source as a JSON string
            map.put("Source As String", hit.getSourceAsString());
            System.out.println("Source As String:" + hit.getSourceAsString());
            // Document source as a Map
            map.put("Source As Map", hit.getSource());
            System.out.println("Source As Map:" + hit.getSource());
            // Index the document belongs to
            map.put("Index", hit.getIndex());
            System.out.println("Index:" + hit.getIndex());
            // Type the document belongs to
            map.put("Type", hit.getType());
            System.out.println("Type:" + hit.getType());
            // Document ID
            map.put("Id", hit.getId());
            System.out.println("Id:" + hit.getId());
            // A single field of the source, here the full title
            map.put("Title", hit.getSource().get("title"));
            System.out.println("title: " + hit.getSource().get("title"));
            // Relevance score (the map key stays "Scope" because the view reads novel['Scope'])
            map.put("Scope", hit.getScore());
            System.out.println("Score:" + hit.getScore());
            // Highlighted fragments of the title; guard against a hit without a highlight entry
            Text[] fragments = hit.getHighlightFields().containsKey("title")
                    ? hit.getHighlightFields().get("title").getFragments()
                    : null;
            StringBuilder highlight = new StringBuilder();
            if (fragments != null) {
                for (Text fragment : fragments) {
                    highlight.append(fragment.toString());
                    System.out.println(fragment.toString());
                }
            }
            map.put("Highlight", highlight.toString());
            novel.add(map);
        }
        modelAndView.addObject("resultlist", novel);
        modelAndView.addObject("count", "Found: "+"<span style=\"color:red;font-weight:bold;font-size:18px;\">"+hits.getTotalHits()+"</span>"+" records");
        modelAndView.addObject("time", ", took: "+"<span style=\"color:red;font-weight:bold;font-size:18px;\">"+Duration.between(start, end).toMillis() + "</span>"+ " ms");
        modelAndView.setViewName(PATH);
        return modelAndView;
    }
}

6. The view

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>Novel Title Keyword Search</title>
    <link rel="stylesheet" th:href="@{/css/bootstrap.min.css}"  media="all">
    <link rel="stylesheet" th:href="@{/css/index.css}" />
</head>
<body>
      <div style="width:100%;height:60px;" align="center">
           <h2 style="color:#985f0d;">Book Title Keyword Search</h2>
      </div>
      <br/>
      <div align="center">
          <span style="font-size: 18px;" th:utext="${count}"></span>
          <span style="font-size: 18px;" th:utext="${time}"></span>
      </div>
      <br/>
      <br/>
      <div class="bs-example" data-example-id="striped-table">
      <table class="table table-bordered table-hover">
             <thead>
                 <tr>
                     <th style="text-align:center;" scope="row">No.</th>
                     <th style="text-align:center;">Index</th>
                     <th style="text-align:center;">Type</th>
                     <th style="text-align:center;">ID</th>
                     <th style="text-align:center;">Title</th>
                     <th style="text-align:center;">Score</th>
                 </tr>
             </thead>
             <tbody>
                 <tr th:each="novel,stat:${resultlist}">
                     <th style="text-align:center;" th:text="${stat.count}"></th>
                     <th style="text-align:center;" th:text="${novel['Index']}"></th>
                     <th style="text-align:center;" th:text="${novel['Type']}"></th>
                     <th style="text-align:center;" th:text="${novel['Id']}"></th>
                     <th style="text-align:center;" th:utext="${novel['Highlight']}"></th>
                     <th style="text-align:center;" th:text="${novel['Scope']}"></th>
                </tr>
             </tbody>
      </table>
      </div>
</body>
</html>

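 Notes:
      - If an attribute value in the model contains HTML tags, render it with th:utext.

      - To get the row number while iterating over a list, either use ${stat.count}, which starts at 1, or use ${stat.index}, which starts at 0 (write ${stat.index+1} to start from 1).

      - To read a Map entry while iterating in Thymeleaf, use ${novel['key']}, where key is the name that was passed to map.put.

      - Keyword highlighting is implemented by combining the ES highlighter, HTML tags, and Thymeleaf's th:utext, which renders values containing HTML tags without escaping them.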
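Because th:utext writes the value without escaping, any HTML that happens to be in the title itself is rendered too. One possible way to keep the highlight tags while escaping everything else is sketched below. It assumes the highlighter is configured with plain marker strings as its pre and post tags; the markers @@HL_START@@ and @@HL_END@@, the class name, and both method names are hypothetical and do not appear in the original code.

```java
class HighlightEscapeSketch {

    // Hypothetical marker strings that would be passed to preTags(...) / postTags(...)
    static final String PRE_MARK = "@@HL_START@@";
    static final String POST_MARK = "@@HL_END@@";

    /** Escapes the characters that are significant in HTML. '&' must be handled first. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    /** Escapes the whole fragment, then turns the markers into the real tags. */
    static String toSafeHtml(String fragment, String openTag, String closeTag) {
        // The markers contain no HTML-special characters, so they survive escaping unchanged.
        return escapeHtml(fragment)
                .replace(PRE_MARK, openTag)
                .replace(POST_MARK, closeTag);
    }

    public static void main(String[] args) {
        String fragment = "<script>" + PRE_MARK + "MyBatis" + POST_MARK;
        // prints &lt;script&gt;<b>MyBatis</b>
        System.out.println(toSafeHtml(fragment, "<b>", "</b>"));
    }
}
```

With this approach the controller would store the result of toSafeHtml in the map under "Highlight", and the view could keep using th:utext for that column.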

7. Search results

  

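8. Analysis and summary of the search results.

    8.1 The analyzer.

    The standard analyzer is used. By default it splits Chinese text one character at a time. The query is a full-text search, and the analyzer can be specified explicitly.

   Using ik_max_word or ik_smart here would not suit this scenario.

// Build the query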
QueryBuilder matchQuery = QueryBuilders.matchQuery("title", keyword).analyzer("standard").operator(Operator.OR);

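     matchQuery analyzes the query string. After analysis, a document is returned as soon as any one of the query terms matches it. To return only documents that match all of the keywords, combine the terms with AND; to match any one of them, use OR. (This applies to multi-term queries.)

     This article explains it well: how the match query uses the bool query

     Using Operator.OR (a document is found as soon as it matches either of the two keywords below).

    Search keyword: mybatis 雲飛

     Using Operator.AND (a document is found only if it matches both of the keywords below).

     Search keyword: mybatis 雲飛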
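The difference between the two operators can be illustrated without a cluster. The sketch below is only a rough stand-in for what happens on the server: tokenize approximates the standard analyzer (lowercasing, one token per Chinese character, Latin letters and digits grouped into words), and matches applies OR or AND over the resulting terms. The class name, both method names, and the sample title are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MatchOperatorSketch {

    /** Rough approximation of the standard analyzer, for illustration only. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        String lower = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i < lower.length(); ) {
            int cp = lower.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);            // still inside a Latin/digit word
                continue;
            }
            if (word.length() > 0) {                 // the current word just ended
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                tokens.add(new String(Character.toChars(cp)));  // one token per Han character
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    /** requireAll = false behaves like OR, requireAll = true behaves like AND. */
    static boolean matches(String title, String keyword, boolean requireAll) {
        List<String> titleTerms = tokenize(title);
        List<String> queryTerms = tokenize(keyword);
        if (queryTerms.isEmpty()) {
            return false;
        }
        if (requireAll) {
            return titleTerms.containsAll(queryTerms);
        }
        for (String term : queryTerms) {
            if (titleTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String keyword = "mybatis 雲飛";
        String title = "MyBatis從入門到精通";                      // hypothetical book title
        System.out.println(tokenize(keyword));                // prints [mybatis, 雲, 飛]
        System.out.println(matches(title, keyword, false));   // prints true  (OR: "mybatis" matches)
        System.out.println(matches(title, keyword, true));    // prints false (AND: "雲" and "飛" are missing)
    }
}
```

The same keyword therefore finds more documents with OR than with AND, because each extra term widens the result set under OR and narrows it under AND.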

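    8.2 Highlighters.

    ES 5.x provides three highlighters: the default highlighter (the plain highlighter), the postings highlighter, and the fast-vector-highlighter. (In ES 6.x the postings highlighter was removed, and the unified highlighter became the default.) The plain highlighter re-analyzes the stored original document at query time, which makes it the slowest of the three, but it needs no additional storage.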
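The idea of re-analyzing the stored text can be sketched in a few lines. This is a toy illustration, not how the ES highlighters are implemented: it walks the stored title again, cuts it into tokens the way the standard analyzer roughly would, and wraps every token that equals a query term in the pre and post tags. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PlainHighlightSketch {

    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    /** Re-tokenizes storedText and wraps tokens found in queryTerms (lowercase) with the tags. */
    static String highlight(String storedText, Set<String> queryTerms, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < storedText.length()) {
            int start = i;
            int cp = storedText.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetterOrDigit(cp)) {
                out.appendCodePoint(cp);             // separators are copied through unchanged
                continue;
            }
            if (!isHan(cp)) {
                // Extend a Latin/digit word up to the next separator or Han character
                while (i < storedText.length()) {
                    int next = storedText.codePointAt(i);
                    if (!Character.isLetterOrDigit(next) || isHan(next)) {
                        break;
                    }
                    i += Character.charCount(next);
                }
            }
            String token = storedText.substring(start, i);
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append(preTag).append(token).append(postTag);
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("mybatis", "入", "門"));
        // prints <b>MyBatis</b>從<b>入</b><b>門</b>到精通
        System.out.println(highlight("MyBatis從入門到精通", terms, "<b>", "</b>"));
    }
}
```

The whole text has to be scanned again for every hit, which is where the extra query-time cost comes from; in exchange, nothing beyond the stored document is needed.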

  

That completes a simple ES full-text search that shows the matched keywords in the results highlighted in bold.