A First Experience Integrating Lucene
阿新 · Published 2018-11-07
Recently I needed to integrate a search engine into a project. I first tried integrating Solr into the Tomcat server, writing SQL in a configuration file to pull data straight from the database, but no data ever came back, and after a long investigation I still couldn't find the cause. Since time was tight, I switched to the relatively simpler Lucene. The general idea is a full scan: read all article records from the database and build a synchronized Lucene index from them.
```java
public static void Index(List<Article> rs, String lucenepath) {
    try {
        Directory directory = FSDirectory.open(new File(lucenepath));
        IndexWriter indexWriter = new IndexWriter(directory, LuceneUtils.analyzer, MaxFieldLength.LIMITED);
        // one Lucene Document per article
        for (Article article : rs) {
            Document doc = new Document();
            // note: identifier fields are usually indexed NOT_ANALYZED; ANALYZED is kept here as in the original
            doc.add(new Field("id", article.getId(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            if (article.getContent() != null) {
                doc.add(new Field("content", article.getContent(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            }
            doc.add(new Field("title", article.getTitle(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            doc.add(new Field("column_info_id", article.getColumnInfo().getId(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            indexWriter.addDocument(doc);
        }
        indexWriter.optimize();
        indexWriter.close();
    } catch (IOException e) {
        System.out.println(e);
    }
}
```
Once the index is built, the next step is searching:
```java
public static List<Article> searcher(String queryString, String lucenepath) {
    List<Article> articleList = new ArrayList<Article>();
    try {
        Directory directory = FSDirectory.open(new File(lucenepath));
        IndexSearcher is = new IndexSearcher(directory);
        // query both the title and content fields
        MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30,
                new String[]{"title", "content"}, LuceneUtils.analyzer);
        Query query = parser.parse(queryString);
        // return at most 100 hits
        TopDocs docs = is.search(query, 100);
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            int num = scoreDoc.doc;
            Document document = is.doc(num);
            Article article = DocumentUtils.document2Article(document);
            articleList.add(article);
        }
        is.close();
        // drop duplicate results (relies on Article.equals/hashCode)
        articleList = articleList.stream().distinct()
                .collect(Collectors.toList());
        articleList.forEach(System.out::println);
    } catch (Exception e) {
        System.out.println(e);
    }
    return articleList;
}
```
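The `distinct()` call above only removes duplicates if `Article` defines value equality; the real entity class is not shown in the post, so this is a minimal hypothetical sketch keyed on `id`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical minimal Article; the real entity class is not shown in the post.
class Article {
    private final String id;
    private final String title;

    Article(String id, String title) {
        this.id = id;
        this.title = title;
    }

    public String getId() { return id; }
    public String getTitle() { return title; }

    // Value equality on id, so stream().distinct() can drop duplicate hits.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Article)) return false;
        return Objects.equals(id, ((Article) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

public class DistinctDemo {
    public static void main(String[] args) {
        List<Article> hits = Arrays.asList(
                new Article("1", "lucene"),
                new Article("1", "lucene"),
                new Article("2", "solr"));
        List<Article> unique = hits.stream().distinct().collect(Collectors.toList());
        System.out.println(unique.size()); // prints 2
    }
}
```

Without the `equals`/`hashCode` override, `distinct()` would compare object references and never remove anything.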
With that, a simple Lucene setup is in place.
Of course, the dependencies still have to be declared in pom.xml; version conflicts cost a lot of time here.
```xml
<!-- lucene -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-memory</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>3.0.1</version>
</dependency>
<!-- mmseg4j analyzer -->
<dependency>
    <groupId>com.chenlb.mmseg4j</groupId>
    <artifactId>mmseg4j-core</artifactId>
    <version>1.10.0</version>
</dependency>
```
As for word segmentation, I weighed several different analyzers and eventually settled on Pangu.
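Chinese text has no spaces between words, which is why the choice of analyzer matters so much here. As a toy illustration only (not the behavior of Pangu or mmseg4j), this is a naive character-bigram splitter, the same basic idea behind Lucene's CJKAnalyzer:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramDemo {
    // Naive character-bigram tokenizer: every adjacent pair of characters
    // becomes one token. Dictionary-based analyzers like mmseg4j instead
    // try to cut at real word boundaries.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("全文檢索")); // [全文, 文檢, 檢索]
    }
}
```

Bigram splitting indexes more tokens than a dictionary-based segmenter, trading index size and some precision for simplicity.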
Once the data comes back, pagination is the next concern:
```java
// index of the first record on this page
int begin = DEFAULT_SIZE * (Integer.parseInt(pageStr) - 1);
// index one past the last record on this page
int end = Math.min(begin + DEFAULT_SIZE, articleList.size());
List<Article> articles = new ArrayList<Article>();
// copy just the records belonging to the requested page
for (int i = begin; i < end; i++) {
    articles.add(articleList.get(i));
}
Map pageMap = new HashMap<>();
pageMap.put("currentPage", pageStr);
pageMap.put("pageSize", sizeStr);
pageMap.put("totalCount", articleList.size());
pageMap.put("totalPage", getTotalPage(articleList.size(), Integer.parseInt(sizeStr)));
pageMap.put("pagination", getPagination(null));
pageMap.put("term", term);
getJspContext().setAttribute(var, articles);
getJspContext().setAttribute(varPage, pageMap);
getJspBody().invoke(null);
```
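The `getTotalPage` helper is called above but not shown in the post. A minimal sketch of how it might compute the page count (ceiling division), with the name and signature assumed from that call:

```java
public class PageMath {
    // Ceiling division: how many pages of size pageSize cover totalCount records.
    // Assumed signature, reconstructed from the getTotalPage call in the tag code.
    static int getTotalPage(int totalCount, int pageSize) {
        return (totalCount + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(getTotalPage(0, 10));  // 0
        System.out.println(getTotalPage(25, 10)); // 3
        System.out.println(getTotalPage(30, 10)); // 3
    }
}
```

The `+ pageSize - 1` trick avoids floating-point math and gives an extra page for any partial remainder.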
And with that, a basic search-with-pagination feature is working. There is of course plenty left to optimize, such as hit highlighting and faster retrieval.