Java Search Tools: Lucene Examples Summary (Part 1)

阿新 • Published: 2019-02-07

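After a day and a half I finally have a Lucene demo that does everything I wanted: building an incremental index from a database, deleting documents from the index by id, single-field queries, multi-field queries, multi-condition queries, and keyword highlighting in the query results. Later today I tidied these features up. It does not look like I will be leaving work any time soon, so I will list the results of the demo here one by one.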

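1. Required files (see attachments)

Dependencies:

lucene-core-2.4.0.jar: Lucene core library

lucene-highlighter-2.4.0.jar: highlighting library

IKAnalyzer2.0.2OBF.jar: tokenizer (supports dictionary-based segmentation)

mysql-connector-java-5.0.3-bin: MySQL JDBC driver

Database table:

pd_ugc.sql (in the database lucenetest)

Class files:

In the attachments index.rar and test.rar; unpack them and place them under src in your Java project.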

2. Building an incremental index for the database

Reference page: http://www.blogjava.net/laoding/articles/279230.html

Java code

package index;
//--------------------- Change Logs----------------------
// <p>@author zhiqiang.zhang Initial Created at 2010-12-23<p>
//-------------------------------------------------------
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
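// Incremental indexing
/*
* Approach: on the first run, query every row of the table, index each row, and save the id of the last row to storeId.txt.
* When new rows are inserted later, there is no need to re-index all of the data:
* use the id stored in storeId.txt to select only the new rows, index just those, and append them to the existing index.
* */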
public class IncrementIndex {
public static void main(String[] args) {
try {
IncrementIndex index = new IncrementIndex();
String path = "E:\\workspace2\\Test\\lucene_test\\poiIdext";// directory where the index is stored
String storeIdPath = "E:\\workspace2\\Test\\lucene_test\\storeId.txt";// file that stores the last indexed id
String storeId = "";
Date date1 = new Date();
storeId = index.getStoreId(storeIdPath);
ResultSet rs = index.getResult(storeId);
System.out.println("Building index...");
index.indexBuilding(path, storeIdPath, rs);
Date date2 = new Date();
System.out.println("Elapsed: "+(date2.getTime()-date1.getTime())+"ms");
storeId = index.getStoreId(storeIdPath);
System.out.println(storeId);// print the id saved by this run
} catch (Exception e) {
e.printStackTrace();
}
}
public static void buildIndex(String indexFile, String storeIdFile) {
try {
String path = indexFile;// directory where the index is stored
String storeIdPath = storeIdFile;// file that stores the last indexed id
String storeId = "";
storeId = getStoreId(storeIdPath);
ResultSet rs = getResult(storeId);
indexBuilding(path, storeIdPath, rs);
storeId = getStoreId(storeIdPath);
System.out.println(storeId);// print the id saved by this run
} catch (Exception e) {
e.printStackTrace();
}
}
public static ResultSet getResult(String storeId) throws Exception {
Class.forName("com.mysql.jdbc.Driver").newInstance();
String url = "jdbc:mysql://localhost:3306/lucenetest";
String userName = "root";
String password = "****";
Connection conn = DriverManager.getConnection(url, userName, password);
Statement stmt = conn.createStatement();
String sql = "select * from pd_ugc";
// only rows newer than the stored id; note the space before "order by"
ResultSet rs = stmt.executeQuery(sql + " where id > '" + storeId + "' order by id");
return rs;
}
public static boolean indexBuilding(String path, String storeIdPath, ResultSet rs) {
try {
Analyzer luceneAnalyzer = new StandardAnalyzer();
// Check for a stored id to decide between incremental indexing and a full rebuild
boolean isEmpty = true;
try {
File file = new File(storeIdPath);
if (!file.exists()) {
file.createNewFile();
}
FileReader fr = new FileReader(storeIdPath);
BufferedReader br = new BufferedReader(fr);
if (br.readLine() != null) {
isEmpty = false;
}
br.close();
fr.close();
} catch (IOException e) {
e.printStackTrace();
}
// isEmpty == false appends to the existing index (incremental); true creates a new index
IndexWriter writer = new IndexWriter(path, luceneAnalyzer, isEmpty);
String storeId = "";
boolean indexFlag = false;
String id;
String name;
String address;
String citycode;
while (rs.next()) {
id = rs.getInt("id") + "";
name = rs.getString("name");
address = rs.getString("address");
citycode = rs.getString("citycode");
writer.addDocument(Document(id, name, address, citycode));
storeId = id;// remember the latest id; not a robust way to do it, used here for convenience
indexFlag = true;
}
writer.optimize();
writer.close();
if (indexFlag) {
// Save the id of the last row to the file on disk
writeStoreId(storeIdPath, storeId);
}
return true;
} catch (Exception e) {
e.printStackTrace();
System.out.println("Error: " + e.getClass() + "\n Message: " + e.getMessage());
return false;
}
}
public static Document Document(String id, String name, String address, String citycode) {
Document doc = new Document();
doc.add(new Field("id", id, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));// field used in queries
doc.add(new Field("address", address, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("citycode", citycode, Field.Store.YES, Field.Index.TOKENIZED));// field used in queries
return doc;
}
// Read the id stored on disk
public static String getStoreId(String path) {
String storeId = "";
try {
File file = new File(path);
if (!file.exists()) {
file.createNewFile();
}
FileReader fr = new FileReader(path);
BufferedReader br = new BufferedReader(fr);
storeId = br.readLine();
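// compare string contents, not references: == "" is never true for text read from a file
if (storeId == null || storeId.trim().length() == 0) storeId = "0";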
br.close();
fr.close();
} catch (Exception e) {
e.printStackTrace();
}
return storeId;
}
// Write the id to the file on disk
public static boolean writeStoreId(String path, String storeId) {
boolean b = false;
try {
File file = new File(path);
if (!file.exists()) {
file.createNewFile();
}
FileWriter fw = new FileWriter(path);
PrintWriter out = new PrintWriter(fw);
out.write(storeId);
out.close();
fw.close();
b = true;
} catch (IOException e) {
e.printStackTrace();
}
return b;
}
}
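
The checkpoint idea behind storeId.txt can be looked at on its own, without Lucene or JDBC. Below is a minimal sketch, not the code from the attachment. The class name IdCheckpoint and its methods are my own hypothetical names, it assumes Java 7+ (java.nio.file) instead of the FileReader/FileWriter used above, and it assumes that a missing, empty or non-numeric file should be read as id 0.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not part of the original demo): persists the id of the
// last indexed row so the next run can select only rows with a larger id.
final class IdCheckpoint {
    private final Path file;

    IdCheckpoint(Path file) {
        this.file = file;
    }

    // Returns the stored id, or 0 when the file is missing, empty or not a number.
    long read() {
        if (!Files.exists(file)) {
            return 0L;
        }
        try {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                return 0L;
            }
            return Long.parseLong(lines.get(0).trim());
        } catch (NumberFormatException e) {
            return 0L; // assumption: treat garbage as "no checkpoint"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrites the checkpoint with the id of the last row that was indexed.
    void write(long lastId) {
        try {
            Files.write(file, Long.toString(lastId).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when no usable checkpoint exists, i.e. the index should be built from scratch.
    boolean isFirstRun() {
        return read() == 0L;
    }
}
```

With something like this, the flag passed to IndexWriter would come from isFirstRun(), and write() would be called only after the writer has been closed successfully, so a failed run does not advance the checkpoint.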

3. Index operations

Java code

package index;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.mira.lucene.analysis.IK_CAnalyzer;
public class IndexUtils {
//0. Build the incremental index
public static void buildIndex(String indexFile, String storeIdFile) {
IncrementIndex.buildIndex(indexFile, storeIdFile);
}
//1. Single-field query
@SuppressWarnings("deprecation")
public static List<IndexResult> queryByOneKey(IndexSearcher indexSearcher, String field,
String key) {
try {
Date date1 = new Date();
QueryParser queryParser = new QueryParser(field, new StandardAnalyzer());
Query query = queryParser.parse(key);
Hits hits = indexSearcher.search(query);
Date date2 = new Date();
System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + "ms");
List<IndexResult> list = new ArrayList<IndexResult>();
for (int i = 0; i < hits.length(); i++) {
list.add(getIndexResult(hits.doc(i)));
}
return list;
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
//2. Multi-condition query. This implements an AND of all conditions.
//Note: the fields being queried must be indexed,
//i.e. doc.add(new Field("pid", rs.getString("pid"), Field.Store.YES,Field.Index.TOKENIZED));
@SuppressWarnings("deprecation")
public static List<IndexResult> queryByMultiKeys(IndexSearcher indexSearcher, String[] fields,
String[] keys) {
try {
BooleanQuery m_BooleanQuery = new BooleanQuery();
if (keys != null && keys.length > 0) {
for (int i = 0; i < keys.length; i++) {
QueryParser queryParser = new QueryParser(fields[i], new StandardAnalyzer());
Query query = queryParser.parse(keys[i]);
m_BooleanQuery.add(query, BooleanClause.Occur.MUST);// AND: every clause must match
}
Hits hits = indexSearcher.search(m_BooleanQuery);
List<IndexResult> list = new ArrayList<IndexResult>();
for (int i = 0; i < hits.length(); i++) {
list.add(getIndexResult(hits.doc(i)));
}
return list;
}
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
//3. Highlighting, implemented here for a single-condition query
//It can be adapted to multi-condition queries
public static List<IndexResult> highlight(IndexSearcher indexSearcher, String key) {
try {
QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
Query query = queryParser.parse(key);
TopDocCollector collector = new TopDocCollector(800);
indexSearcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
Highlighter highlighter = null;
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color='red'>",
"</font>");
highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
highlighter.setTextFragmenter(new SimpleFragmenter(200));
List<IndexResult> list = new ArrayList<IndexResult>();
Document doc;
for (int i = 0; i < hits.length; i++) {
//System.out.println(hits[i].score);
doc = indexSearcher.doc(hits[i].doc);
TokenStream tokenStream = new StandardAnalyzer().tokenStream("name",
new StringReader(doc.get("name")));
IndexResult ir = getIndexResult(doc);
ir.setName(highlighter.getBestFragment(tokenStream, doc.get("name")));
list.add(ir);
}
return list;
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
//4. Multi-field query
@SuppressWarnings("deprecation")
public static List<IndexResult> queryByMultiFileds(IndexSearcher indexSearcher,
String[] fields, String key) {
try {
MultiFieldQueryParser mfq = new MultiFieldQueryParser(fields, new StandardAnalyzer());
Query query = mfq.parse(key);
Hits hits = indexSearcher.search(query);
List<IndexResult> list = new ArrayList<IndexResult>();
for (int i = 0; i < hits.length(); i++) {
list.add(getIndexResult(hits.doc(i)));
}
return list;
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
//5. Delete a document from the index
public static void deleteIndex(String indexFile, String id) throws CorruptIndexException,
IOException {
IndexReader indexReader = IndexReader.open(indexFile);
indexReader.deleteDocuments(new Term("id", id));
indexReader.close();
}
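//6. Unigram (single-character) tokenization
@SuppressWarnings("deprecation")
public static String Standard_Analyzer(String str) {
Analyzer analyzer = new StandardAnalyzer();
Reader r = new StringReader(str);
// use the declared return type; casting to StopFilter depends on the analyzer's internals
TokenStream sf = analyzer.tokenStream("", r);
System.out.println("=====StandardAnalyzer====");
System.out.println("Analysis method: by default no words, only single characters (unigram tokenization)");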
Token t;
String results = "";
try {
while ((t = sf.next()) != null) {
System.out.println(t.termText());
results = results + " " + t.termText();
}
} catch (IOException e) {
e.printStackTrace();
}
return results;
}
//7. Dictionary-based tokenization
@SuppressWarnings("deprecation")
public static String ik_CAnalyzer(String str) {
Analyzer analyzer = new IK_CAnalyzer();
Reader r = new StringReader(str);
TokenStream ts = analyzer.tokenStream("", r);
System.out.println("=====IK_CAnalyzer====");
System.out.println("Analysis method: dictionary-based tokenization, forward and backward matching");
Token t;
String results = "";
try {
while ((t = ts.next()) != null) {
System.out.println(t.termText());
results = results + " " + t.termText();
}
} catch (IOException e) {
e.printStackTrace();
}
return results;
}
// Search within results (not implemented yet)
public static void queryFromResults() {
}
// Build the result object from a document
public static IndexResult getIndexResult(Document doc) {
IndexResult ir = new IndexResult();
ir.setId(doc.get("id"));
ir.setName(doc.get("name"));
ir.setAddress(doc.get("address"));
ir.setCitycode(doc.get("citycode"));
return ir;
}
}
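
To make the highlighting step easier to picture, here is a plain-Java stand-in for what SimpleHTMLFormatter contributes: wrapping each match in a pair of tags. This is only a sketch under assumptions. The class SimpleHighlighter is hypothetical, it matches a literal string case-insensitively, and it is not how Lucene's Highlighter works internally (that one operates on analyzed tokens and picks the best-scoring fragment).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Lucene code: wraps every literal occurrence
// of a term in the given opening and closing tags.
final class SimpleHighlighter {
    private SimpleHighlighter() {
    }

    static String highlight(String text, String term, String preTag, String postTag) {
        if (text == null || term == null || term.isEmpty()) {
            return text; // nothing to highlight
        }
        // Pattern.quote treats the term as a literal, so characters like '*' are safe.
        Pattern pattern = Pattern.compile(Pattern.quote(term),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // keep the original casing of the matched text inside the tags
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, SimpleHighlighter.highlight(name, "hello", "<font color='red'>", "</font>") would mark every "hello" in name with the same tags the demo passes to SimpleHTMLFormatter.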

Query result object: IndexResult

Java code

package index;
public class IndexResult {
private String id;
private String name;
private String address;
private String citycode;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getAddress() {
return address;
}
public void setAddress(String address) {
this.address = address;
}
public String getCitycode() {
return citycode;
}
public void setCitycode(String citycode) {
this.citycode = citycode;
}
}

4. Test class

Java code

package test;
/**
* $Id$
* Copyright 2009-2010 Oak Pacific Interactive. All rights reserved.
*/
import index.IndexResult;
import index.IndexUtils;
import java.util.Date;
import java.util.List;
import org.apache.lucene.search.IndexSearcher;
public class Test {
// directory where the index is stored
private static String indexFile = "E:\\workspace2\\Test\\lucene_test\\poiIdext";
// file that stores the last indexed id
private static String storeIdFile = "E:\\workspace2\\Test\\lucene_test\\storeId.txt";
public static void main(String[] args) throws Exception {
//0. Build the incremental index
IndexUtils.buildIndex(indexFile, storeIdFile);
IndexSearcher indexSearcher = new IndexSearcher(indexFile);
String key = IndexUtils.ik_CAnalyzer("靜安中心");
//1. Single-field query
Date date1 = new Date();
List<IndexResult> list = IndexUtils.queryByOneKey(indexSearcher, "name", key);
Date date2 = new Date();
System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + "ms\n" + list.size()
+ " hits=======================================single-field query");
//printResults(list);
//2. Multi-condition query
String[] fields = { "name", "citycode" };
String[] keys = { IndexUtils.ik_CAnalyzer("靜安中心"), "0000" };
date1 = new Date();
list = IndexUtils.queryByMultiKeys(indexSearcher, fields, keys);
date2 = new Date();
System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + "ms\n" + list.size()
+ " hits\n===============================multi-condition query");
printResults(list);
//3. Single-field query with highlighting
System.out.println("\n\n");
date1 = new Date();
list = IndexUtils.highlight(indexSearcher, key);
date2 = new Date();
System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + "ms\n" + list.size()
+ " hits\n======================================highlighting");
// printResults(list);
//4. Multi-field query
date1 = new Date();
list = IndexUtils.queryByMultiFileds(indexSearcher, fields, key);
date2 = new Date();
System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + "ms\n" + list.size()
+ " hits\n=====================================multi-field query");
// printResults(list);
//5. Delete a document from the index by id
IndexUtils.deleteIndex(indexFile, "123");
}
// Print the results
public static void printResults(List<IndexResult> list) {
if (list != null && list.size() > 0) {
for (int i = 0; i < list.size(); i++) {
System.out.println(list.get(i).getId() + "," + list.get(i).getName() + ","
+ list.get(i).getAddress() + "," + list.get(i).getCitycode()+"--->"+i);
}
}
}
}

5. Miscellaneous

Full-text indexing:

As things stand, a search for hello returns "hello world" and "hi hello, how are you", but "worldhello" does not show up.
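
The reason is that matching happens on whole terms produced by the analyzer, not on substrings. The sketch below reproduces that behaviour in plain Java under simplifying assumptions: TermMatchDemo is a hypothetical class, and its tokenizer simply lower-cases and splits on anything that is not a letter or digit, which is only a rough approximation of what StandardAnalyzer does with English text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of term-based matching; not Lucene code.
final class TermMatchDemo {
    private TermMatchDemo() {
    }

    // Lower-cases the text and splits it on every run of non-letter, non-digit characters.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Returns the documents that contain the query as a complete term.
    static List<String> search(List<String> docs, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<String>();
        for (String doc : docs) {
            if (tokenize(doc).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Here "worldhello" becomes the single term worldhello, so a lookup for the term hello cannot find it. Finding it would need a wildcard query such as *hello, which is the subject of the next note.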

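By default, QueryParser does not support queries that begin with a wildcard (for example, *ook). Since Lucene 2.1, however, this can be enabled by calling QueryParser.setAllowLeadingWildcard(true). Note that this is an expensive operation: it has to scan the list of all tokens in the index to find the terms that match the pattern. (Translator's note: the efficient way to support this kind of suffix query is to build a table of reversed tokens; Lucene does not implement this.) http://www.codechina.org/faq/show/42/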
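
The reversed-token idea from the translator's note can be sketched in a few lines of plain Java. This is an illustration under assumptions, not something Lucene 2.4 provides: ReversedTermIndex is a hypothetical class that keeps every term reversed in a sorted set, so a suffix query like *ook turns into a prefix lookup for "koo".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration: answers "terms ending with X" by storing each
// term reversed in sorted order and doing a prefix scan.
final class ReversedTermIndex {
    private final TreeSet<String> reversedTerms = new TreeSet<String>();

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Finds all terms ending with the given suffix, e.g. "ook" for the query *ook.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> result = new ArrayList<String>();
        // tailSet starts at the first reversed term >= prefix; entries sharing
        // the prefix are contiguous, so the scan stops at the first mismatch.
        for (String candidate : reversedTerms.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break;
            }
            result.add(reverse(candidate));
        }
        return result;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

The lookup touches only the matching entries plus one, instead of scanning every term, which is the cost the FAQ warns about for leading wildcards. The price is a second copy of the term list.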

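Supported: searching with space-separated terms. "廁所 26 瀋陽" (toilet 26 Shenyang) is three terms.

Not supported: "廁所瀋陽", which is a single term.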

http://www.codechina.org/faq/show/63/

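Can you search within the results of an earlier query? Yes. There are two main approaches:

Use QueryFilter to treat the first query as a filter. (You can search the Lucene mailing list for QueryFilter; Doug Cutting, Lucene's original author, advises against this approach.)
Use BooleanQuery to combine the earlier and the later query, with the earlier query marked as required.

We recommend the BooleanQuery approach.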
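
In terms of result sets, combining the two queries as required clauses means keeping only the documents matched by both. The following is a plain-Java model of that semantics, not Lucene code. The class SearchWithinResults and the use of integer document ids are assumptions made for the illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of "search within results": a document survives only
// if both the earlier and the later query matched it.
final class SearchWithinResults {
    private SearchWithinResults() {
    }

    // Returns the ids present in both hit sets, in ascending order.
    static Set<Integer> refine(Set<Integer> firstHits, Set<Integer> secondHits) {
        Set<Integer> both = new TreeSet<Integer>(firstHits);
        both.retainAll(secondHits); // set intersection
        return both;
    }
}
```

In the demo itself, queryByMultiKeys already builds this kind of query by adding every clause with BooleanClause.Occur.MUST, so the empty queryFromResults stub could probably be filled in along the same lines.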

============

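// Create the standard text analyzer; the standard one can handle Chinese

Analyzer luceneAnalyzer = new StandardAnalyzer();

indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);

// This effectively creates a new index writer

// The first argument is the directory in which the index is built

// The second argument is a text analyzer; the standard one is used here, but you can also write your own

// If the third argument is true, the c: \\index directory is cleared before the index is built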

For the poi_data_ugc search, should the index be kept in memory or on disk?

Lucene usage and optimization

http://hi.baidu.com/lewutian/blog/item/48a86d03de58b984d43f7c1b.html

Lucene getting-started example (1): indexing text files

http://www.java3z.com/cwbwebhome/article/article5/51021.html
