First Experience Integrating Lucene

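I recently needed to integrate a search engine into a project. My first attempt was to deploy Solr on a Tomcat server and pull data directly from the database through SQL written in a configuration file. However, no data ever came back, and after a long investigation I still could not find the cause. Because time was tight, I switched to Lucene, which is comparatively simple to work with. The general idea is to run a full query, take all article data from the database, and build a Lucene index on disk that stays in sync with it.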

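import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public static void Index(List<Article> rs, String lucenepath) {
    IndexWriter indexWriter = null;
    try {
        Directory directory = FSDirectory.open(new File(lucenepath));
        // Creates the index if it does not exist; otherwise new documents are appended.
        indexWriter = new IndexWriter(directory, LuceneUtils.analyzer, MaxFieldLength.LIMITED);
        for (Article article : rs) {
            Document doc = new Document();

            doc.add(new Field("id", article.getId(), Store.YES, Field.Index.ANALYZED));
            // Field rejects null values, so the content field is only added when present.
            if (article.getContent() != null) {
                doc.add(new Field("content", article.getContent(), Store.YES, Field.Index.ANALYZED));
            }
            doc.add(new Field("title", article.getTitle(), Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("column_info_id", article.getColumnInfo().getId(), Store.YES,
                    Field.Index.ANALYZED));

            indexWriter.addDocument(doc);
        }
        // Merge segments once after the bulk load.
        indexWriter.optimize();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the writer even if indexing failed, so the index lock is released.
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}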

Once the index is built, the next step is searching.

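import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public static List<Article> seacher(String queryString, String lucenepath) {
    List<Article> articleList = new ArrayList<Article>();
    IndexSearcher is = null;

    try {
        Directory directory = FSDirectory.open(new File(lucenepath));
        is = new IndexSearcher(directory);
        // Search the title and content fields with the same analyzer used for indexing.
        MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30,
                new String[]{"title", "content"}, LuceneUtils.analyzer);
       /* QueryParser parser = new QueryParser(Version.LUCENE_30, "content",
                LuceneUtils.analyzer);*/
        Query query = parser.parse(queryString);
        // Fetch the top 100 hits
        TopDocs docs = is.search(query, 100);

        ScoreDoc[] scoreDocs = docs.scoreDocs;

        for (ScoreDoc scoreDoc : scoreDocs) {
            int num = scoreDoc.doc;
            Document document = is.doc(num);
            Article article = DocumentUtils.document2Article(document);
            articleList.add(article);
        }
        // Filter out duplicates (relies on Article.equals() and hashCode())
        articleList = articleList.stream().distinct()
                .collect(Collectors.toList());
        articleList.forEach(System.out::println);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Release the underlying index reader.
        if (is != null) {
            try {
                is.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    return articleList;
}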
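
The duplicate filter in the search method depends on `distinct()`, which compares elements through `equals()` and `hashCode()`. If `Article` does not override them, two objects built from the same indexed record are treated as different and nothing is filtered. The following is only a minimal sketch of that idea, assuming an article is identified by its id; `SimpleArticle` and `dedupe` are hypothetical stand-ins, not the project's real classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class DistinctDemo {

    /** Hypothetical stand-in for the project's Article class. */
    public static final class SimpleArticle {
        private final String id;
        private final String title;

        public SimpleArticle(String id, String title) {
            this.id = id;
            this.title = title;
        }

        public String getId() {
            return id;
        }

        // Assumption: two articles are the same when their ids match.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SimpleArticle)) {
                return false;
            }
            return Objects.equals(id, ((SimpleArticle) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }

        @Override
        public String toString() {
            return id + ":" + title;
        }
    }

    /** Removes duplicates while keeping the first occurrence, so hit order is preserved. */
    public static List<SimpleArticle> dedupe(List<SimpleArticle> hits) {
        return hits.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SimpleArticle> hits = Arrays.asList(
                new SimpleArticle("1", "Lucene basics"),
                new SimpleArticle("1", "Lucene basics"),
                new SimpleArticle("2", "Solr setup"));
        System.out.println(dedupe(hits));
    }
}
```

Because `distinct()` keeps the first occurrence on an ordered stream, the relevance order returned by the search is not disturbed.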

At this point a simple Lucene integration is in place.

There are also the pom.xml dependencies, of course. Getting the versions to match took a long time here.

<!--lucene-->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-memory</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>3.0.1</version>
</dependency>

<!-- mmseg4j analyzer -->
<dependency>
    <groupId>com.chenlb.mmseg4j</groupId>
    <artifactId>mmseg4j-core</artifactId>
    <version>1.10.0</version>
</dependency>

As for tokenization, I considered several different tokenizers and eventually settled on Pangu (盤古).

Once the data has been retrieved, pagination comes into play:

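    // Index of the first record on the requested page.
    // Note: the window uses DEFAULT_SIZE, while totalPage further down uses sizeStr; the two should agree.
    int begin = DEFAULT_SIZE * (Integer.parseInt(pageStr) - 1);
    // Index one past the last record on the page
    int end = Math.min(begin + DEFAULT_SIZE, articleList.size());
    List<Article> articles = new ArrayList<Article>();
    // Copy the records that belong to the current page
    for (int i = begin; i < end; i++) {
        articles.add(articleList.get(i));
    }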

    Map<String, Object> pageMap = new HashMap<>();
    pageMap.put("currentPage", pageStr);
    pageMap.put("pageSize", sizeStr);
    pageMap.put("totalCount", articleList.size());
    pageMap.put("totalPage", getTotalPage(articleList.size(), Integer.parseInt(sizeStr)));
    pageMap.put("pagination", getPagination(null) );
    pageMap.put("term", term);
    getJspContext().setAttribute(var, articles);
    getJspContext().setAttribute(varPage, pageMap);
    getJspBody().invoke(null);
}
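
The page arithmetic above can be isolated into a small helper that is easier to test. This is only a sketch under stated assumptions: `PageDemo`, `page` and `totalPages` are hypothetical names, `totalPages` is a guess at what a helper like `getTotalPage` might compute (ceiling division), and clamping out-of-range page numbers is my own choice rather than something the original code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageDemo {

    /** Number of pages needed to show totalCount items (ceiling division). */
    public static int totalPages(int totalCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalCount + pageSize - 1) / pageSize;
    }

    /**
     * Returns the items of the 1-based page pageNo.
     * Page numbers below 1 are treated as page 1; pages past the end yield an empty list.
     */
    public static <T> List<T> page(List<T> all, int pageNo, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int safePage = Math.max(pageNo, 1);
        // Use long arithmetic so a very large page number cannot overflow.
        long first = (long) (safePage - 1) * pageSize;
        int begin = (int) Math.min(first, all.size());
        int end = (int) Math.min(first + pageSize, all.size());
        return new ArrayList<>(all.subList(begin, end));
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(hits, 3, 3));           // [7]
        System.out.println(totalPages(hits.size(), 3)); // 3
    }
}
```

Using the same `pageSize` for both the window and the page count avoids a mismatch between the records shown and the number of pages reported.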

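With that, basic search with pagination is working. There is still plenty left to improve, such as highlighting matched terms and speeding up search.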