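Highlighting search keywords with ElasticSearch
I. Requirement: search for a book by its title and show the matched keywords highlighted in bold (run a full-text search against the text in title, display the results, and report the search time and the number of records returned).
1. Technologies used
Spring Boot 2.0.5.RELEASE, ElasticSearch 6.4.1, Bootstrap, Thymeleaf 3.0.9.RELEASE, Maven 3.3.9, Lombok, and hot deployment in IDEA. For the ES test data, a simple cluster with one master and two replica nodes was set up. The cluster name is elasticsearch.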
application.yml
server:
  port: 8082
2. Maven dependencies
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.5.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
<properties>
    <!-- Set the Thymeleaf version explicitly (Spring Boot 1.x defaults to Thymeleaf 2.x) -->
    <thymeleaf.version>3.0.9.RELEASE</thymeleaf.version>
    <thymeleaf-layout-dialect.version>2.1.1</thymeleaf-layout-dialect.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
</properties>
<dependencies>
    <!-- ElasticSearch -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.plugin</groupId>
        <artifactId>transport-netty3-client</artifactId>
        <version>5.6.10</version>
    </dependency>
    <!-- Spring Boot web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Thymeleaf -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.16.18</version>
    </dependency>
    <!-- Hot deployment -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- Spring Boot test -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
3. Configuring ElasticSearch (using Java annotation-based configuration)
The ElasticSearchConfig class is as follows.
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.net.InetAddress;

@Configuration
public class ElasticSearchConfig {

    private static final Logger logger = LoggerFactory.getLogger(ElasticSearchConfig.class);

    /** ES connection settings */
    private static final String CLUSTER_NAME = "elasticsearch";
    private static final String HOST_NAME = "localhost";
    private static final Integer PORT = 9300;

    /** Returns the TransportClient */
    @Bean
    public TransportClient client() {
        logger.info("Initializing the Elasticsearch TransportClient...");
        TransportClient client = null;
        try {
            TransportAddress transportAddress =
                    new InetSocketTransportAddress(InetAddress.getByName(HOST_NAME), PORT);
            // Cluster settings
            Settings esSetting = Settings.builder()
                    .put("cluster.name", CLUSTER_NAME)
                    .build();
            // Build the client from the custom Settings and register the node address
            client = new PreBuiltTransportClient(esSetting);
            client.addTransportAddresses(transportAddress);
        } catch (Exception e) {
            logger.error("elasticsearch TransportClient create error!!!", e);
        }
        return client;
    }
}
4. The Novel entity class
import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import java.util.Date;
@Document(indexName = "book", type = "novel")
@Setter
@Getter
public class Novel {

    @Id
    private String id;
    private String title;
    private String author;
    private Integer word_count;
    private Date publish_data;

    public Novel() {
        super();
    }

    @Override
    public String toString() {
        return "Novel{" +
                "id=" + id +
                ", title='" + title + '\'' +
                ", author='" + author + '\'' +
                ", word_count=" + word_count +
                ", publish_data=" + publish_data +
                '}';
    }
}
5. NovelController
import com.lx.search.elastic.entity.Novel;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.servlet.ModelAndView;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
@Controller
@RequestMapping("/novel")
public class NovelController {

    private static final String PATH = "searchtitle";

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Autowired
    private TransportClient client;

    /**
     * Search by title and highlight the matched keywords.
     */
    @RequestMapping("/search/title/{keyword}")
    public ModelAndView searchTitle(@PathVariable String keyword) {
        ModelAndView modelAndView = new ModelAndView();
        // Start timing the query
        Instant start = Instant.now();
        // Build the query, using the standard analyzer
        QueryBuilder matchQuery = QueryBuilders.matchQuery("title", keyword).analyzer("standard").operator(Operator.OR);
        // Configure highlighting, using the default highlighter
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field("title")
                .preTags("<span style=\"color:red;font-weight:bold;font-size:15px;\">")
                .postTags("</span>");
        // Run the search against the book index
        SearchResponse response = client.prepareSearch("book")
                .setQuery(matchQuery)
                .highlighter(highlightBuilder)
                // Number of documents returned per request
                .setSize(10)
                .get();
        // Search results
        SearchHits hits = response.getHits();
        // Stop timing
        Instant end = Instant.now();
        // Report milliseconds: integer division by 1000 would print 0 for any sub-second search
        long elapsedMillis = Duration.between(start, end).toMillis();
        System.out.println("Found " + hits.getTotalHits() + " results in " + elapsedMillis + " ms");
        List<Map<Object, Object>> novel = new ArrayList<>();
        // Iterate only over the hits actually returned (at most 10 here);
        // getTotalHits() can be larger than the page that came back.
        for (SearchHit hit : hits.getHits()) {
            // Collect the fields of each hit in an insertion-ordered map
            Map<Object, Object> map = new LinkedHashMap<>();
            // Document source as a String
            map.put("Source As String", hit.getSourceAsString());
            System.out.println("Source As String:" + hit.getSourceAsString());
            // Document source as a Map
            map.put("Source As Map", hit.getSource());
            System.out.println("Source As Map:" + hit.getSource());
            // Index the document belongs to
            map.put("Index", hit.getIndex());
            System.out.println("Index:" + hit.getIndex());
            // Type the document belongs to
            map.put("Type", hit.getType());
            System.out.println("Type:" + hit.getType());
            // Document ID
            map.put("Id", hit.getId());
            System.out.println("Id:" + hit.getId());
            // Content of a specific field, here the complete title
            map.put("Title", hit.getSource().get("title"));
            System.out.println("title: " + hit.getSource().get("title"));
            // Document score (the key is spelled "Scope" because the view reads novel['Scope'])
            map.put("Scope", hit.getScore());
            System.out.println("Score:" + hit.getScore());
            // Highlighted fragments of the title field; guard against a hit without one
            Text[] text = hit.getHighlightFields().containsKey("title")
                    ? hit.getHighlightFields().get("title").getFragments()
                    : null;
            StringBuilder hight = new StringBuilder();
            if (text != null) {
                for (Text str : text) {
                    hight.append(str);
                    System.out.println(str.toString());
                }
            }
            map.put("Highlight", hight.toString());
            novel.add(map);
        }
        modelAndView.addObject("resultlist", novel);
        modelAndView.addObject("count", "Found: " + "<span style=\"color:red;font-weight:bold;font-size:18px;\">" + hits.getTotalHits() + "</span>" + " records");
        modelAndView.addObject("time", ", elapsed: " + "<span style=\"color:red;font-weight:bold;font-size:18px;\">" + elapsedMillis + "</span>" + "ms");
        modelAndView.setViewName(PATH);
        return modelAndView;
    }
}
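The controller times the search with Instant and Duration. A search that finishes in under a second has toMillis() below 1000, so converting to seconds with integer division gives 0. The following is a minimal sketch of one way to format the elapsed time, using only the JDK. The class name ElapsedTimeSketch and the describe method are made up for this illustration and are not part of the project.

```java
import java.time.Duration;
import java.util.Locale;

public class ElapsedTimeSketch {

    // Formats a hit count and an elapsed Duration as milliseconds plus fractional seconds.
    // Dividing by 1000.0 (a double) keeps the fraction that integer division would drop.
    static String describe(long totalHits, Duration elapsed) {
        long millis = elapsed.toMillis();
        return String.format(Locale.ROOT, "%d hits in %d ms (%.3f s)",
                totalHits, millis, millis / 1000.0);
    }

    public static void main(String[] args) {
        Duration elapsed = Duration.ofMillis(42);
        // Integer division truncates: 42 / 1000 is 0
        System.out.println(elapsed.toMillis() / 1000);
        // prints "3 hits in 42 ms (0.042 s)"
        System.out.println(describe(3, elapsed));
    }
}
```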
6. The view
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>Novel title keyword search</title>
    <link rel="stylesheet" th:href="@{/css/bootstrap.min.css}" media="all">
    <link rel="stylesheet" th:href="@{/css/index.css}" />
</head>
<body>
<div style="width:100%;height:60px;" align="center">
    <h2 style="color:#985f0d;">Book title keyword search</h2>
</div>
<br/>
<div align="center">
    <span style="font-size: 18px;" th:utext="${count}"></span>
    <span style="font-size: 18px;" th:utext="${time}"></span>
</div>
<br/>
<br/>
<div class="bs-example" data-example-id="striped-table">
    <table class="table table-bordered table-hover">
        <thead>
        <tr>
            <th style="text-align:center;" scope="row">No.</th>
            <th style="text-align:center;">Index</th>
            <th style="text-align:center;">Type</th>
            <th style="text-align:center;">ID</th>
            <th style="text-align:center;">Title</th>
            <th style="text-align:center;">Score</th>
        </tr>
        </thead>
        <tbody>
        <tr th:each="novel,stat:${resultlist}">
            <th style="text-align:center;" th:text="${stat.count}"></th>
            <th style="text-align:center;" th:text="${novel['Index']}"></th>
            <th style="text-align:center;" th:text="${novel['Type']}"></th>
            <th style="text-align:center;" th:text="${novel['Id']}"></th>
            <th style="text-align:center;" th:utext="${novel['Highlight']}"></th>
            <th style="text-align:center;" th:text="${novel['Scope']}"></th>
        </tr>
        </tbody>
    </table>
</div>
</body>
</html>
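Notes:
① If an attribute value in the model contains HTML tags, render it with th:utext.
② To get the row number while iterating a list: option one, ${stat.count} starts at 1; option two, ${stat.index} starts at 0, so use ${stat.index+1} if you want it to start at 1.
③ To read a Map entry while iterating in Thymeleaf, use ${novel['key']}, where key is the key the entry was added under.
④ Keyword highlighting is implemented by combining the ES highlighter, HTML tags, and Thymeleaf's th:utext, which renders values containing HTML tags unescaped.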
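To make note ④ concrete, here is a rough JDK-only sketch of what a highlighted fragment looks like: matched terms wrapped in a pre tag and a post tag, ready to be written out with th:utext. This is not how the ES highlighter works internally. The class HighlightSketch and its methods are hypothetical names for this illustration. The sketch also HTML-escapes the text around the tags, which is an extra precaution assumed here because th:utext writes its value unescaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class HighlightSketch {

    // Minimal HTML escaping so that only the highlight tags reach the page as markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wraps every case-insensitive occurrence of any term in preTag/postTag.
    static String highlight(String text, List<String> terms, String preTag, String postTag) {
        if (terms.isEmpty()) {
            return escapeHtml(text);
        }
        // Quote each term so it is matched literally, then join them with "|"
        String alternation = terms.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append(preTag).append(escapeHtml(m.group())).append(postTag);
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "<b>MyBatis</b> &lt;in&gt; Action"
        System.out.println(highlight("MyBatis <in> Action", Arrays.asList("mybatis"), "<b>", "</b>"));
    }
}
```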
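7. Search results
8. Analysis and summary of the search results.
8.1 Analyzer issues.
The standard analyzer is used here. By default it splits Chinese text into single characters. The query is a full-text search, and the analyzer can be specified explicitly.
Neither ik_max_word nor ik_smart fits this scenario.
// Build the query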
QueryBuilder matchQuery = QueryBuilders.matchQuery("title", keyword).analyzer("standard").operator(Operator.OR);
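matchQuery analyzes the query string. After analysis, a document is returned as soon as any one term of the query matches it. To return only documents that match all of the keywords, combine the terms with AND; to accept documents that match just one, use OR. (This applies to multi-term queries.)
A good explanation of this is the article on how the match query uses the bool query.
With Operator.OR (a document is found if it matches either of the two terms below).
Search keywords: mybatis 雲飛
With Operator.AND (a document is found only if it matches both of the terms below).
Search keywords: mybatis 雲飛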
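The difference between OR and AND can be sketched as a toy model in plain Java. This is only an illustration of the matching rule, not ES's implementation, and it ignores scoring. The analyze method is a simplified stand-in for the standard analyzer (lowercase words, one term per CJK character), and the class name and sample titles are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MatchOperatorSketch {

    // Toy stand-in for the standard analyzer: lowercases words, splits on anything
    // that is not a letter or digit, and emits each CJK ideograph as its own term.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, terms);
                terms.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, terms);
            }
        }
        flush(word, terms);
        return terms;
    }

    // Moves the word collected so far (if any) into the term list.
    private static void flush(StringBuilder word, List<String> terms) {
        if (word.length() > 0) {
            terms.add(word.toString());
            word.setLength(0);
        }
    }

    // requireAll = true models Operator.AND, false models Operator.OR.
    static boolean matches(String title, String query, boolean requireAll) {
        Set<String> titleTerms = new HashSet<>(analyze(title));
        List<String> queryTerms = analyze(query);
        if (queryTerms.isEmpty()) {
            return false;
        }
        return requireAll
                ? titleTerms.containsAll(queryTerms)
                : queryTerms.stream().anyMatch(titleTerms::contains);
    }

    public static void main(String[] args) {
        String query = "mybatis 雲飛";
        System.out.println(analyze(query));                                 // [mybatis, 雲, 飛]
        System.out.println(matches("MyBatis in Action", query, false));     // true: OR needs one term
        System.out.println(matches("MyBatis in Action", query, true));      // false: AND needs all terms
        System.out.println(matches("雲飛 MyBatis notes", query, true));      // true: every term is present
    }
}
```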
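8.2 Highlighters.
ES provides three highlighters: the default highlighter, the postings-highlighter, and the fast-vector-highlighter. The default highlighter re-analyzes the stored original document at query time, so it is the slowest of the three, but it needs no extra storage. Note that this list describes ES 5.x: from ES 6.0 on, the postings highlighter was removed and the unified highlighter is the default, with plain and fvh as the other options.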