Lucene 4.8.0 + IKAnalyzer 5.0.1: Index Creation and Query Demo

阿新 · Published: 2019-02-05

Main code:

Creating the index:

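import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void createIndex() {
	// Instantiate the IKAnalyzer tokenizer; false = finest-grained mode, true = smart mode
	Analyzer analyzer = new IKAnalyzer(false);
	// Configure the IndexWriter
	IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_48, analyzer);
	indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

	// The index can be stored on the file system or in memory; here it goes to the file system.
	// try-with-resources (Java 7+) closes the writer, then the directory, even if an exception is thrown.
	try (Directory directory = new SimpleFSDirectory(new File("C:\\myindex"));
	     IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
		// Delete the entire existing index (combined with CREATE_OR_APPEND, this has the same effect as OpenMode.CREATE)
		indexWriter.deleteAll();

		// Build the document to index
		Document doc = new Document();
		// StringField: indexed as one single term, not tokenized
		doc.add(new StringField("id", "1", Store.YES));
		// TextField: tokenized by the analyzer (IKAnalyzer here)
		doc.add(new TextField("title", "IKAnalyzer的介紹", Store.YES));
		doc.add(new TextField("content", "IK Analyzer是一個結合詞典分詞和文法分詞的中文分詞開源工具包。它使用了全新的正向迭代最細粒度切分演算法。", Store.YES));

		// Add the document (one new record) to the IndexWriter
		indexWriter.addDocument(doc);
		// Commit the changes so that readers can see them
		indexWriter.commit();
	} catch (Exception e) {
		e.printStackTrace();
	}
}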
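createIndex() builds its analyzer with new IKAnalyzer(false). The boolean is IK's useSmart flag: false requests finest-grained segmentation, which may emit overlapping words, while true requests smart segmentation with fewer, longer words. The toy below only illustrates that difference under stated assumptions: the class DictSegmenter, its tiny dictionary and both methods are hypothetical, and the "smart-like" method is plain forward maximum matching, a simpler technique than IK's real smart mode (which also resolves ambiguities).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy segmenter contrasting two output styles over a tiny dictionary.
 * It is NOT IK Analyzer's algorithm, only an illustration of the idea.
 */
public class DictSegmenter {

    // The longest dictionary word bounds how far ahead we need to look.
    private static int maxLen(Set<String> dict) {
        int max = 1;
        for (String w : dict) {
            max = Math.max(max, w.length());
        }
        return max;
    }

    /** Finest-grained style: every dictionary word starting at every position, overlaps allowed. */
    public static List<String> finest(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int max = maxLen(dict);
        int coveredUntil = 0; // end offset of the furthest-reaching word emitted so far
        for (int i = 0; i < text.length(); i++) {
            boolean found = false;
            for (int len = 1; len <= max && i + len <= text.length(); len++) {
                String candidate = text.substring(i, i + len);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                    found = true;
                    coveredUntil = Math.max(coveredUntil, i + len);
                }
            }
            // A character that no dictionary word covers is emitted on its own
            if (!found && i >= coveredUntil) {
                out.add(text.substring(i, i + 1));
            }
        }
        return out;
    }

    /** Forward maximum matching: take the longest dictionary word at each step, no overlaps. */
    public static List<String> maxMatch(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int max = maxLen(dict);
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character if no word matches
            for (int len = Math.min(max, text.length() - i); len > 1; len--) {
                if (dict.contains(text.substring(i, i + len))) {
                    end = i + len;
                    break;
                }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "中文分詞", "分詞", "工具", "工具包"));
        System.out.println(finest("中文分詞工具包", dict));   // [中文, 中文分詞, 分詞, 工具, 工具包]
        System.out.println(maxMatch("中文分詞工具包", dict)); // [中文分詞, 工具包]
    }
}
```

Every token the finest-grained style emits becomes a separate term in the content field, so more distinct terms are available for a TermQuery to match than with the longest-match style.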

Query + highlighting:

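import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.Scorer; // the highlighter's Scorer, not org.apache.lucene.search.Scorer
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void search() {
	// Open the same file-system index that createIndex() wrote.
	// try-with-resources closes the reader before the directory (the reader was previously never closed).
	try (Directory directory = new SimpleFSDirectory(new File("C:\\myindex"));
	     IndexReader reader = DirectoryReader.open(directory)) {
		IndexSearcher searcher = new IndexSearcher(reader);

		// TermQuery does not analyze its text: "演算法" must exactly match a term IKAnalyzer produced at index time
		Query query = new TermQuery(new Term("content", "演算法"));

		// Every matched term is wrapped in these HTML tags
		String preTag = "<font color='red'>";
		String postTag = "</font>";
		Formatter formatter = new SimpleHTMLFormatter(preTag, postTag);

		Scorer fragmentScorer = new QueryScorer(query);
		Highlighter highlighter = new Highlighter(formatter, fragmentScorer);
		// Fragment size in characters, normally the length of highlighted text you want returned;
		// Integer.MAX_VALUE keeps the whole field value as a single fragment
		highlighter.setTextFragmenter(new SimpleFragmenter(Integer.MAX_VALUE));

		// Return at most the 10 best-scoring hits
		TopDocs topDocs = searcher.search(query, 10);
		System.out.println("Total hits: " + topDocs.totalHits);

		// The highlighter re-tokenizes the stored text, so use the same analyzer as at index time
		Analyzer analyzer = new IKAnalyzer(false);
		for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
			// Lucene-internal document number
			int docId = scoreDoc.doc;
			System.out.println("Internal doc id: " + docId);
			// Load the stored fields of the document by its id
			Document doc = searcher.doc(docId);

			//String id = highlighter.getBestFragment(analyzer, "id", doc.get("id"));
			//String title = highlighter.getBestFragment(analyzer, "title", doc.get("title"));
			// getBestFragment returns null if none of the query terms occur in the text
			String content = highlighter.getBestFragment(analyzer, "content", doc.get("content"));

			//System.out.println("id:" + id + " title:" + title);
			System.out.println("content:" + content);
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
}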
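Two details of search() are easy to misread: TermQuery looks its term up verbatim without running the analyzer, and searcher.search(query, 10) counts every match in totalHits while returning at most 10 ScoreDocs. The hypothetical ToyIndex below sketches both ideas with a map-based inverted index and a size-n min-heap. Its raw term-frequency score is an assumption for illustration, not Lucene 4.8's TF-IDF DefaultSimilarity, and its tokens are hand-written rather than produced by IKAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical toy inverted index: field -> term -> (docId -> term frequency). */
public class ToyIndex {

    /** Like TopDocs: totalHits counts every match, topDocIds holds at most n of them. */
    public static final class Hits {
        public final int totalHits;
        public final List<Integer> topDocIds;

        Hits(int totalHits, List<Integer> topDocIds) {
            this.totalHits = totalHits;
            this.topDocIds = topDocIds;
        }
    }

    private final Map<String, Map<String, Map<Integer, Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes one document whose field values are already tokenized; returns its doc id. */
    public int addDocument(Map<String, List<String>> tokenizedFields) {
        int docId = nextDocId++;
        for (Map.Entry<String, List<String>> field : tokenizedFields.entrySet()) {
            Map<String, Map<Integer, Integer>> terms =
                    postings.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            for (String token : field.getValue()) {
                // Count how often this token occurs in this document's field
                terms.computeIfAbsent(token, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
            }
        }
        return docId;
    }

    /** Like TermQuery: the term is matched verbatim, with no analysis. Score = term frequency. */
    public Hits termQuery(String field, String term, int n) {
        Map<String, Map<Integer, Integer>> terms = postings.get(field);
        Map<Integer, Integer> docs = (terms == null) ? null : terms.get(term);
        if (docs == null) {
            return new Hits(0, new ArrayList<Integer>());
        }
        // Min-heap of the best n hits; the weakest (lowest score, then highest doc id) sits on top
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
                a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(b[0], a[0]));
        for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
            heap.offer(new int[] {e.getKey(), e.getValue()}); // {docId, score}
            if (heap.size() > n) {
                heap.poll(); // evict the weakest hit
            }
        }
        List<Integer> top = new ArrayList<>();
        while (!heap.isEmpty()) {
            top.add(0, heap.poll()[0]); // the heap yields the weakest first, so prepend
        }
        return new Hits(docs.size(), top);
    }

    // Hypothetical helper: "id" is one untokenized term (like StringField),
    // "content" takes hand-written tokens standing in for IKAnalyzer output (like TextField)
    private static Map<String, List<String>> doc(String id, String... contentTokens) {
        Map<String, List<String>> fields = new HashMap<>();
        fields.put("id", Arrays.asList(id));
        fields.put("content", Arrays.asList(contentTokens));
        return fields;
    }

    /** Builds the small demo index used in main. */
    public static ToyIndex sample() {
        ToyIndex index = new ToyIndex();
        index.addDocument(doc("1", "切分", "演算法"));           // doc 0
        index.addDocument(doc("2", "演算法", "演算法", "分詞")); // doc 1
        index.addDocument(doc("3", "演算法"));                   // doc 2
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = sample();
        Hits hits = index.termQuery("content", "演算法", 2);
        System.out.println("totalHits=" + hits.totalHits + " top=" + hits.topDocIds); // totalHits=3 top=[1, 0]
        // "演算" was never produced as a token, so the verbatim lookup finds nothing
        System.out.println(index.termQuery("content", "演算", 2).totalHits);          // 0
    }
}
```

On equal scores the heap keeps the lower doc id, which, as far as I know, mirrors how Lucene's own hit queue breaks ties.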

Query result:

IK Analyzer是一個結合詞典分詞和文法分詞的中文分詞開源工具包。它使用了全新的正向迭代最細粒度切分<font color='red'>演算法</font>。
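The output shows the matched term wrapped in the preTag/postTag pair passed to SimpleHTMLFormatter. The hypothetical NaiveHighlight below reproduces only that output shape with plain substring matching. Lucene's Highlighter instead re-tokenizes the stored text with the analyzer and scores fragments with QueryScorer, so the two can differ whenever a substring match is not also a token match.

```java
/**
 * Hypothetical naive highlighter: wraps each occurrence of a known term in preTag/postTag.
 * Not Lucene's Highlighter; it ignores tokenization, scoring and fragmenting.
 */
public class NaiveHighlight {

    /** Returns the text with every occurrence of term wrapped, or null if the term never occurs. */
    public static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty() || !text.contains(term)) {
            return null; // mirrors getBestFragment returning null when no query term is found
        }
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(term, from)) >= 0) {
            // Copy the text before the match, then the wrapped match itself
            sb.append(text, from, hit).append(preTag).append(term).append(postTag);
            from = hit + term.length();
        }
        sb.append(text.substring(from)); // remaining text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String content = "IK Analyzer是一個結合詞典分詞和文法分詞的中文分詞開源工具包。它使用了全新的正向迭代最細粒度切分演算法。";
        System.out.println(highlight(content, "演算法", "<font color='red'>", "</font>"));
    }
}
```

With the demo's content string, main prints the same line as the query result above.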

The index can be inspected with Luke:

Open cmd, change to the directory containing Luke, and run the command java -jar lukeall-4.10.2.jar to start it.

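In pom.xml (${lucene} is a Maven property assumed to be set to the Lucene version used here, 4.8.0; the IKAnalyzer 5.0.1 jar is not among these dependencies and has to be added separately):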

<!--Lucene -->
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-core</artifactId>
	<version>${lucene}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-highlighter</artifactId>
	<version>${lucene}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-memory</artifactId>
	<version>${lucene}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-queries</artifactId>
	<version>${lucene}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-queryparser</artifactId>
	<version>${lucene}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-analyzers-common</artifactId>
	<version>${lucene}</version>
</dependency>

IKAnalyzer.cfg.xml (under src/main/resources):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Users can configure their own extension dictionaries here
	<entry key="ext_dict">/mydict.dic;</entry>
	-->
	<entry key="ext_dict">mydict.dic</entry>
	<!-- Users can configure their own extension stop-word dictionaries here
	<entry key="ext_stopwords">ext_stopword.dic</entry>-->

</properties>
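The DOCTYPE shows that IKAnalyzer.cfg.xml uses the standard Java XML properties format, so the JDK can read it with java.util.Properties.loadFromXML. The sketch below (the class IkConfigReader and its methods are hypothetical) loads such a file and splits the ext_dict value on ';', since the commented-out example /mydict.dic; suggests that is how several dictionaries are listed. That IK Analyzer itself reads its configuration exactly this way is an assumption, not something this post confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical reader for an IKAnalyzer.cfg.xml-style properties file. */
public class IkConfigReader {

    /** Loads the XML properties format declared by the java.sun.com properties DTD. */
    public static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(in); // also closes the stream
        return props;
    }

    /** Splits a ';'-separated entry such as ext_dict, dropping blank parts. */
    public static List<String> dictPaths(Properties props, String key) {
        List<String> paths = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null) {
            return paths; // entry absent or commented out
        }
        for (String p : value.split(";")) {
            if (!p.trim().isEmpty()) {
                paths.add(p.trim());
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // In a Maven project, src/main/resources is copied to the classpath root, so the real file could be
        // opened with IkConfigReader.class.getClassLoader().getResourceAsStream("IKAnalyzer.cfg.xml").
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"ext_dict\">mydict.dic</entry>\n"
                + "  <!--<entry key=\"ext_stopwords\">ext_stopword.dic</entry>-->\n"
                + "</properties>\n";
        Properties props = load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(dictPaths(props, "ext_dict"));      // [mydict.dic]
        System.out.println(dictPaths(props, "ext_stopwords")); // []
    }
}
```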
