
Setting Up and Configuring Solr


Setup

1. Download the Solr archive solr-7.2.1.tgz.

2. Extract the solr-7.2.1.tgz archive.

3. Copy the webapp folder under solr-7.2.1/server/solr-webapp, rename it to solr, and place it in the jetty/webapps directory.

4. Copy the five jars whose names start with metrics* from server/lib/ to /usr/local/jetty/webapps/solr/WEB-INF/lib/.

5. Copy all jars from server/lib/ext/ (these are logging-related) to /usr/local/jetty/webapps/solr/WEB-INF/lib/.

6. Copy the solr-dataimporthandler-* jars from solr-7.2.1/dist/ to /usr/local/jetty/webapps/solr/WEB-INF/lib/.

7. Copy the solr directory under server/ to /usr/local/jetty/webapps/solr/WEB-INF/.

8. Edit web.xml in the solr webapp. The following section is commented out by default; uncomment it and point the value at the solrhome directory created in step 7:

<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/usr/local/jetty/webapps/solr/WEB-INF/solr</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

9. Start Jetty.

Configuration

I. schema.xml

<schema name="example" version="1.2">    
  <types>    
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>    
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>    
    <fieldType name="binary" class="solr.BinaryField"/>    
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true"     
                                                                positionIncrementGap="0"/>    
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true"     
                                                                positionIncrementGap="0"/>    
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true"     
                                                                positionIncrementGap="0"/>    
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true"     
                                                                positionIncrementGap="0"/>  
    <fieldType name="name_n_gram" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="com.kingdee.lucene.analysis.ScriptTokenFilterFactory" />
            <filter class="com.kingdee.lucene.analysis.NGramExtensionTokenFilterFactory"
                    minGramSize="1" maxGramSize="40" />
            <filter class="com.kingdee.lucene.analysis.MultipleSpellingTokenFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>
  ...    
  </types>    
  ...    
</schema>

1. Schema name

<schema name="example" version="1.2">

  • name: identifies this schema.
  • version: the schema version, 1.2 here.

2. fieldType

The solr.StrField type

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />

  • name: an identifier for the type.
  • class and the other attributes determine the fieldType's actual behavior. (Classes whose name starts with solr. live under the org.apache.solr.analysis package.)

  Optional attributes:

  • sortMissingLast and sortMissingFirst apply to types that sort internally as strings (including string, boolean, sint, slong, sfloat, sdouble, and pdate).
  • sortMissingLast="true": documents without a value for the field sort after documents that have one, regardless of the requested sort order.
  • sortMissingFirst="true": the reverse; documents without a value sort first.
  • Both default to false.

  StrField values are not analyzed; they are indexed and stored verbatim.

solr.TextField

  solr.TextField lets you customize indexing and querying through an analyzer, which consists of one tokenizer and any number of filters.

  positionIncrementGap: an optional attribute defining the position gap between multiple values of this field type within one document, to prevent spurious phrase matches across values.
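As a sketch, a multiValued field of a TextField type with a large positionIncrementGap keeps a phrase query from matching across two separate values (the type and field names here are illustrative):

```xml
<!-- a 100-position gap is inserted between consecutive values of the field -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
<field name="authors" type="text_general" indexed="true" stored="true" multiValued="true"/>
<!-- with values ["John Smith", "Jane Doe"], the phrase query "smith jane"
     does not match, because 100 positions separate the two values -->
```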

  <analyzer type="" isMaxWordLength="">: declares an analyzer. type="index" selects the analyzer used when documents are added to the index; type="query" selects the analyzer used at query time.

  isMaxWordLength controls tokenization granularity and can be specified separately for the index and query analyzers. It is usually recommended to set it to false on the index side, producing the finest-grained tokens so the index is as precise and complete as possible, and to true on the query side, producing the coarsest-grained tokens so results better match what the user intended.

  <tokenizer class="solr.WhitespaceTokenizerFactory" />: the tokenizer to use; WhitespaceTokenizerFactory, for example, splits on whitespace.

  <filter class="solr.LowerCaseFilterFactory" />: a filter to apply. Common filters include solr.StopFilterFactory, solr.WordDelimiterFilterFactory, solr.LowerCaseFilterFactory, solr.EnglishPorterFilterFactory, and solr.RemoveDuplicatesTokenFilterFactory. Filters run after the tokenizer: when a text-typed value is indexed, Solr first tokenizes it (here on whitespace), then passes the tokens through each configured filter in turn, and only the surviving tokens are added to the index.
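A minimal index/query analyzer pair built only from the stock factories mentioned above might look like this (the type name and stopwords file are illustrative):

```xml
<fieldType name="text_en_sketch" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- drop common stopwords, lowercase, then remove duplicate tokens -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```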

3. Fields

  A field definition includes name, type (one of the fieldTypes defined earlier), indexed (whether it is indexed), stored (whether it is stored), multiValued (whether it can hold multiple values), and so on. Field definitions matter, and a couple of tips are worth noting: for any field that may hold multiple values, set multiValued="true" to avoid errors at indexing time, and if a field's value does not need to be returned, set stored="false" to save space.
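For instance, a sketch of typical field declarations (the field names are illustrative, and text_general assumes a TextField type like the ones defined above):

```xml
<fields>
    <!-- unique key: indexed and stored, single-valued -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- searchable body text: indexed but not stored, keeping the index small -->
    <field name="body" type="text_general" indexed="true" stored="false"/>
    <!-- a document can carry several tags, so multiValued="true" -->
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
```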

II. solrconfig.xml

solrconfig.xml mainly defines Solr's request handlers and extensions, including where index data is stored and the rules governing updates, deletes, and queries.

<?xml version="1.0" encoding="UTF-8" ?>
<config>
	<luceneMatchVersion>LUCENE_42</luceneMatchVersion>
	<dataDir>${solr.data.dir:}</dataDir>
	<directoryFactory name="DirectoryFactory"
		class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
    <codecFactory class="solr.SchemaCodecFactory"/>

	<indexConfig>
		<filter class="solr.LimitTokenCountFilterFactory"
			maxTokenCount="10000" />
		<writeLockTimeout>1000</writeLockTimeout>
		<maxIndexingThreads>8</maxIndexingThreads>
		<useCompoundFile>false</useCompoundFile>
		<ramBufferSizeMB>32</ramBufferSizeMB>
		<maxBufferedDocs>1000</maxBufferedDocs>
		<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
			<int name="maxMergeAtOnce">10</int>
			<int name="segmentsPerTier">10</int>
		</mergePolicy>
		<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
		<lockType>native</lockType>
		<unlockOnStartup>false</unlockOnStartup>
		<termIndexInterval>128</termIndexInterval>
		<reopenReaders>true</reopenReaders>
		<deletionPolicy class="solr.SolrDeletionPolicy">
			<str name="maxCommitsToKeep">1</str>
			<str name="maxOptimizedCommitsToKeep">0</str>
			<str name="maxCommitAge">30MINUTES</str>
			<str name="maxCommitAge">1DAY</str>
		</deletionPolicy>
		<infoStream file="INFOSTREAM.txt">false</infoStream>
	</indexConfig>

	<updateHandler class="solr.DirectUpdateHandler2">
		<autoCommit>
			<maxTime>15000</maxTime>
			<openSearcher>false</openSearcher>
		</autoCommit>
		<autoSoftCommit>
			<maxTime>1000</maxTime>
		</autoSoftCommit>
		<updateLog>
			<str name="dir">${solr.data.dir:}</str>
		</updateLog>
	</updateHandler>

	<query>
		<maxBooleanClauses>1024</maxBooleanClauses>
		<filterCache class="solr.FastLRUCache" size="512"
			initialSize="512" autowarmCount="0" />
		<queryResultCache class="solr.LRUCache" size="512"
			initialSize="512" autowarmCount="0" />
		<documentCache class="solr.LRUCache" size="512"
			initialSize="512" autowarmCount="0" />
		<fieldValueCache class="solr.FastLRUCache" size="512"
			autowarmCount="128" showItems="32" />
		<enableLazyFieldLoading>true</enableLazyFieldLoading>
		<queryResultWindowSize>60</queryResultWindowSize>
		<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
		<listener event="newSearcher" class="solr.QuerySenderListener">
			<arr name="queries">
			</arr>
		</listener>
		<listener event="firstSearcher" class="solr.QuerySenderListener">
			<arr name="queries">
			</arr>
		</listener>
		<useColdSearcher>false</useColdSearcher>
		<maxWarmingSearchers>4</maxWarmingSearchers>
	</query>

	<requestDispatcher handleSelect="false">
		<requestParsers enableRemoteStreaming="false"
			multipartUploadLimitInKB="2048000" />
		<httpCaching never304="true" />
	</requestDispatcher>

	<requestHandler name="/select" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="echoParams">explicit</str>
			<int name="rows">10</int>
		</lst>
	</requestHandler>

	<!-- A request handler that returns indented JSON by default -->
	<requestHandler name="/query" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="echoParams">explicit</str>
			<str name="wt">json</str>
			<str name="indent">true</str>
			<str name="df">text</str>
		</lst>
	</requestHandler>

	<requestHandler name="/get" class="solr.RealTimeGetHandler">
		<lst name="defaults">
			<str name="omitHeader">true</str>
			<str name="wt">json</str>
			<str name="indent">true</str>
		</lst>
	</requestHandler>

	<requestHandler name="/update" class="solr.UpdateRequestHandler">
	</requestHandler>

	<!-- Solr Cell Update Request Handler http://wiki.apache.org/solr/ExtractingRequestHandler -->
	<requestHandler name="/update/extract" startup="lazy"
		class="solr.extraction.ExtractingRequestHandler">
		<lst name="defaults">
			<str name="lowernames">true</str>
			<str name="uprefix">ignored_</str>

			<!-- capture link hrefs but ignore div attributes -->
			<str name="captureAttr">true</str>
			<str name="fmap.a">links</str>
			<str name="fmap.div">ignored_</str>
		</lst>
	</requestHandler>

	<requestHandler name="/analysis/field" startup="lazy"
		class="solr.FieldAnalysisRequestHandler" />
	<requestHandler name="/analysis/document"
		class="solr.DocumentAnalysisRequestHandler" startup="lazy" />

	<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
	<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
		<lst name="invariants">
			<str name="q">solrpingquery</str>
		</lst>
		<lst name="defaults">
			<str name="echoParams">all</str>
		</lst>
	</requestHandler>

	<!-- Echo the request contents back to the client -->
	<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
		<lst name="defaults">
			<str name="echoParams">explicit</str>
			<str name="echoHandler">true</str>
		</lst>
	</requestHandler>
	<requestHandler name="/replication" class="solr.ReplicationHandler" />

	<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
		<str name="queryAnalyzerFieldType">textSpell</str>
		<!-- Multiple "Spell Checkers" can be declared and used by this component -->
		<!-- a spellchecker built from a field of the main index -->
		<lst name="spellchecker">
			<str name="name">default</str>
			<str name="field">name</str>
			<str name="classname">solr.DirectSolrSpellChecker</str>
			<!-- the spellcheck distance measure used, the default is the internal 
				levenshtein -->
			<str name="distanceMeasure">internal</str>
			<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
			<float name="accuracy">0.5</float>
			<!-- the maximum #edits we consider when enumerating terms: can be 1 or 
				2 -->
			<int name="maxEdits">2</int>
			<!-- the minimum shared prefix when enumerating terms -->
			<int name="minPrefix">1</int>
			<!-- maximum number of inspections per result. -->
			<int name="maxInspections">5</int>
			<!-- minimum length of a query term to be considered for correction -->
			<int name="minQueryLength">4</int>
			<!-- maximum threshold of documents a query term can appear to be considered 
				for correction -->
			<float name="maxQueryFrequency">0.01</float>
		</lst>

		<!-- a spellchecker that can break or combine words. See "/spell" handler 
			below for usage -->
		<lst name="spellchecker">
			<str name="name">wordbreak</str>
			<str name="classname">solr.WordBreakSolrSpellChecker</str>
			<str name="field">name</str>
			<str name="combineWords">true</str>
			<str name="breakWords">true</str>
			<int name="maxChanges">10</int>
		</lst>
	</searchComponent>

	<requestHandler name="/spell" class="solr.SearchHandler"
		startup="lazy">
		<lst name="defaults">
			<str name="df">text</str>
			<!-- Solr will use suggestions from both the 'default' spellchecker and
				from the 'wordbreak' spellchecker and combine them. collations (re-written
				queries) can include a combination of corrections from both spellcheckers -->
			<str name="spellcheck.dictionary">default</str>
			<str name="spellcheck.dictionary">wordbreak</str>
			<str name="spellcheck">on</str>
			<str name="spellcheck.extendedResults">true</str>
			<str name="spellcheck.count">10</str>
			<str name="spellcheck.alternativeTermCount">5</str>
			<str name="spellcheck.maxResultsForSuggest">5</str>
			<str name="spellcheck.collate">true</str>
			<str name="spellcheck.collateExtendedResults">true</str>
			<str name="spellcheck.maxCollationTries">10</str>
			<str name="spellcheck.maxCollations">5</str>
		</lst>
		<arr name="last-components">
			<str>spellcheck</str>
		</arr>
	</requestHandler>

	<!-- Term Vector Component http://wiki.apache.org/solr/TermVectorComponent -->
	<searchComponent name="tvComponent" class="solr.TermVectorComponent" />
	<!-- A request handler for demonstrating the term vector component This 
		is purely as an example. In reality you will likely want to add the component 
		to your already specified request handlers. -->
	<requestHandler name="/tvrh" class="solr.SearchHandler"
		startup="lazy">
		<lst name="defaults">
			<str name="df">text</str>
			<bool name="tv">true</bool>
		</lst>
		<arr name="last-components">
			<str>tvComponent</str>
		</arr>
	</requestHandler>

	<requestHandler name="/dataimport"
		class="org.apache.solr.handler.dataimport.DataImportHandler">
		<lst name="defaults">
			<str name="config">data-config.xml</str>
		</lst>
	</requestHandler>

	<searchComponent name="clustering"
		enable="${solr.clustering.enabled:false}" class="solr.clustering.ClusteringComponent">
		<!-- Declare an engine -->
		<lst name="engine">
			<!-- The name, only one can be named "default" -->
			<str name="name">default</str>

			<!-- Class name of Carrot2 clustering algorithm. Currently available algorithms 
				are: * org.carrot2.clustering.lingo.LingoClusteringAlgorithm * org.carrot2.clustering.stc.STCClusteringAlgorithm 
				* org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm See http://project.carrot2.org/algorithms.html 
				for the algorithm's characteristics. -->
			<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm
			</str>

			<!-- Overriding values for Carrot2 default algorithm attributes. For a 
				description of all available attributes, see: http://download.carrot2.org/stable/manual/#chapter.components. 
				Use attribute key as name attribute of str elements below. These can be further 
				overridden for individual requests by specifying attribute key as request 
				parameter name and attribute value as parameter value. -->
			<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>

			<!-- Location of Carrot2 lexical resources. A directory from which to 
				load Carrot2-specific stop words and stop labels. Absolute or relative to 
				Solr config directory. If a specific resource (e.g. stopwords.en) is present 
				in the specified dir, it will completely override the corresponding default 
				one that ships with Carrot2. For an overview of Carrot2 lexical resources, 
				see: http://download.carrot2.org/head/manual/#chapter.lexical-resources -->
			<str name="carrot.lexicalResourcesDir">clustering/carrot2</str>

			<!-- The language to assume for the documents. For a list of allowed values, 
				see: http://download.carrot2.org/stable/manual/#section.attribute.lingo.MultilingualClustering.defaultLanguage -->
			<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
		</lst>
		<lst name="engine">
			<str name="name">stc</str>
			<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
		</lst>
	</searchComponent>

	<searchComponent name="terms" class="solr.TermsComponent" />
	<searchComponent class="solr.HighlightComponent" name="highlight">
		<highlighting>
			<!-- Configure the standard fragmenter -->
			<!-- This could most likely be commented out in the "default" case -->
			<fragmenter name="gap" default="true"
				class="solr.highlight.GapFragmenter">
				<lst name="defaults">
					<int name="hl.fragsize">100</int>
				</lst>
			</fragmenter>

			<!-- A regular-expression-based fragmenter (for sentence extraction) -->
			<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
				<lst name="defaults">
					<!-- slightly smaller fragsizes work better because of slop -->
					<int name="hl.fragsize">70</int>
					<!-- allow 50% slop on fragment sizes -->
					<float name="hl.regex.slop">0.5</float>
					<!-- a basic sentence pattern -->
					<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
				</lst>
			</fragmenter>

			<!-- Configure the standard formatter -->
			<formatter name="html" default="true"
				class="solr.highlight.HtmlFormatter">
				<lst name="defaults">
					<str name="hl.simple.pre"><![CDATA[<em class="highlight">]]></str>
					<str name="hl.simple.post"><![CDATA[</em>]]></str>
				</lst>
			</formatter>

			<!-- Configure the standard encoder -->
			<encoder name="html" class="solr.highlight.HtmlEncoder" />

			<!-- Configure the standard fragListBuilder -->
			<fragListBuilder name="simple"
				class="solr.highlight.SimpleFragListBuilder" />

			<!-- Configure the single fragListBuilder -->
			<fragListBuilder name="single"
				class="solr.highlight.SingleFragListBuilder" />

			<!-- Configure the weighted fragListBuilder -->
			<fragListBuilder name="weighted" default="true"
				class="solr.highlight.WeightedFragListBuilder" />

			<!-- default tag FragmentsBuilder -->
			<fragmentsBuilder name="default" default="true"
				class="com.kingdee.solr.highlight.ScoreOrderFragmentsBuilder">
				<!-- <lst name="defaults"> <str name="hl.multiValuedSeparatorChar">/</str> 
					</lst> -->
			</fragmentsBuilder>

			<boundaryScanner name="default" default="true"
				class="solr.highlight.SimpleBoundaryScanner">
				<lst name="defaults">
					<str name="hl.bs.maxScan">10</str>
					<str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
				</lst>
			</boundaryScanner>

			<boundaryScanner name="breakIterator"
				class="solr.highlight.BreakIteratorBoundaryScanner">
				<lst name="defaults">
					<!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
					<str name="hl.bs.type">WORD</str>
					<!-- language and country are used when constructing Locale object. -->
					<!-- And the Locale object will be used when getting instance of BreakIterator -->
					<str name="hl.bs.language">en</str>
					<str name="hl.bs.country">US</str>
				</lst>
			</boundaryScanner>
		</highlighting>
	</searchComponent>

	<queryResponseWriter name="xml" default="true"
		class="solr.XMLResponseWriter" />
	<queryResponseWriter name="json" class="solr.JSONResponseWriter" />
	<queryResponseWriter name="python"
		class="solr.PythonResponseWriter" />
	<queryResponseWriter name="ruby" class="solr.RubyResponseWriter" />
	<queryResponseWriter name="php" class="solr.PHPResponseWriter" />
	<queryResponseWriter name="phps"
		class="solr.PHPSerializedResponseWriter" />
	<queryResponseWriter name="csv" class="solr.CSVResponseWriter" />
	<queryResponseWriter name="velocity"
		class="solr.VelocityResponseWriter" startup="lazy" />
	<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
		<int name="xsltCacheLifetimeSeconds">5</int>
	</queryResponseWriter>

	<!-- Legacy config for the admin interface -->
	<admin>
		<defaultQuery>*:*</defaultQuery>
	</admin>
</config>

1.1. dataDir

<dataDir>${solr.data.dir:d:/Server/Solr/data}</dataDir> defines where index data and transaction logs are stored.

1.2. luceneMatchVersion

<luceneMatchVersion>4.10.1</luceneMatchVersion> tells Solr which Lucene version's behavior to match; here the underlying Lucene version is 4.10.1.

1.3. lib

<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" /> tells Solr where to load additional jars from; if the directory given by dir does not exist, the directive is ignored.

1.4. directoryFactory

The index storage scheme. The available implementations are:

1. solr.StandardDirectoryFactory: a filesystem-based factory that tries to pick the best implementation for the current operating system and JVM version.

2. solr.SimpleFSDirectoryFactory: suitable for small applications; does not support large data volumes or multithreading.

3. solr.NIOFSDirectoryFactory: suitable for multithreaded environments, but not for Windows (very slow there) because of a lingering JVM bug.

4. solr.MMapDirectoryFactory: the default for Solr 3.1 through 4.0 on 64-bit Linux. It uses mmap, i.e. virtual memory plus a kernel feature, to access index files on disk, letting Lucene and Solr read directly from the I/O cache. A good choice when near-real-time search is not needed.

5. solr.NRTCachingDirectoryFactory: designed to keep part of the index in memory, speeding up near-real-time search.

6. solr.RAMDirectoryFactory: an in-memory scheme with no persistence; data is lost when the system restarts or the server crashes, and index replication is not supported.
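For example, to select one of the factories above explicitly while keeping the usual system-property override (the choice of MMapDirectoryFactory here is illustrative):

```xml
<!-- the solr.directoryFactory system property, if set, overrides
     the default named after the colon -->
<directoryFactory name="DirectoryFactory"
    class="${solr.directoryFactory:solr.MMapDirectoryFactory}" />
```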

1.5. codecFactory

The codec factory allows custom codecs to be plugged in. For example, to enable per-field docValues formats, configure SchemaCodecFactory in solrconfig.xml and choose one of:

  • docValuesFormat="Lucene42": the default; all values are loaded into heap memory.
  • docValuesFormat="Disk": an alternative implementation that keeps part of the data on disk.
  • docValuesFormat="SimpleText": a plain-text format; very slow, useful only for learning.
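As a sketch, once SchemaCodecFactory is enabled the docValues format is selected per fieldType in the schema (the type name here is illustrative):

```xml
<!-- solrconfig.xml -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: keep this type's docValues on disk instead of heap -->
<fieldType name="string_ondisk" class="solr.StrField" docValuesFormat="Disk"/>
```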

1.6. indexConfig

Sets low-level index properties:

1. <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>: caps the number of tokens indexed per field.

2. <writeLockTimeout>1000</writeLockTimeout>: the maximum time (in milliseconds) an IndexWriter waits for the write lock.

3. <maxIndexingThreads>8</maxIndexingThreads>: the maximum number of threads indexing concurrently.

4. <useCompoundFile>false</useCompoundFile>: defaults to false in Solr. When true there are fewer index files but search performance drops; it is a trade-off.

5. <ramBufferSizeMB>100</ramBufferSizeMB>: the RAM buffer size before documents are flushed.

6. <maxBufferedDocs>1000</maxBufferedDocs>: likewise triggers a flush; when both limits are defined, whichever is hit first wins.

7. <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"> <int name="maxMergeAtOnce">10</int> <int name="segmentsPerTier">10</int> </mergePolicy>: the merge policy.

8. <mergeFactor>10</mergeFactor>: the merge factor; how many segments are merged at a time.

9. <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>: the merge scheduler.

10. <lockType>${solr.lock.type:native}</lockType>: the lock factory.

11. <unlockOnStartup>false</unlockOnStartup>: whether to release the index lock on startup.

12. <termIndexInterval>128</termIndexInterval>: the interval at which Lucene loads terms into the in-memory term index.

13. <reopenReaders>true</reopenReaders>: reopen index readers in place instead of closing and reopening them.

14. <deletionPolicy class="solr.SolrDeletionPolicy">: the commit deletion policy; must implement org.apache.lucene.index.IndexDeletionPolicy.

15. <str name="maxCommitsToKeep">1</str>

16. <str name="maxOptimizedCommitsToKeep">0</str>

17. <str name="maxCommitAge">30MINUTES</str> or <str name="maxCommitAge">1DAY</str>

18. <infoStream file="INFOSTREAM.txt">false</infoStream>: when enabled, writes the indexing-time debug log to the named file.

<lockType>${solr.lock.type:native}</lockType>

This sets the index lock type. There are three main options:

1. single: for read-only indexes that are fixed and will never be modified.

2. native: uses the operating system's native file locking; it cannot be used when multiple Solr instances share the same index. This is the default since Solr 3.6.

3. simple: uses a simple lock-file mechanism.

1.7. updateHandler

<updateLog> <str name="dir">${solr.ulog.dir:}</str> </updateLog> configures the update (transaction) log; the default location is data/tlog under solrhome.

With frequent index updates the tlog files keep growing, so it is recommended to commit in batches using hard commits via <autoCommit>.

Automatic hard-commit settings: maxTime sets how long to wait before committing; maxDocs sets how many documents to accumulate before committing; openSearcher controls whether a new searcher is opened after the commit. If false, documents are committed to the index but do not yet appear in search results; if true, they are both committed to the index and immediately searchable.
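Putting those settings together, a hard/soft commit configuration might look like this (the time and document limits are illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flush to stable storage at most every 15 s or every
         10000 docs, without opening a new searcher (not yet visible to queries) -->
    <autoCommit>
        <maxTime>15000</maxTime>
        <maxDocs>10000</maxDocs>
        <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: make newly added documents searchable every second -->
    <autoSoftCommit>
        <maxTime>1000</maxTime>
    </autoSoftCommit>
    <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
</updateHandler>
```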

1.8. Query

<maxBooleanClauses>1024</maxBooleanClauses>
Sets the maximum number of clauses in a boolean query. Range and prefix searches can expand into very many boolean clauses; if the count reaches this limit an exception is thrown. Capping the clause count prevents overly broad queries from waiting too long.

1.9. RequestDispatcher

The RequestDispatcher controls how SolrDispatchFilter handles requests arriving at a SolrCore. handleSelect is a legacy attribute from earlier versions that affects request routing (e.g. /select?qt=XXX): with handleSelect="true", SolrDispatchFilter forwards the request to the handler named by qt (provided /select is registered); with handleSelect="false", /select is accessed directly, and if it is not registered the result is a 404.

<requestDispatcher handleSelect="false">
	<!-- Request parsing: how SolrRequests are parsed and what limits apply to ContentStreams.
		enableRemoteStreaming: whether the stream.file and stream.url parameters may reference remote streams.
		multipartUploadLimitInKB: the maximum size Solr accepts for multipart file uploads.
		formdataUploadLimitInKB: the maximum size of form data sent via POST. -->
	<requestParsers enableRemoteStreaming="true"
		multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048" />
	<!-- HTTP caching parameters. -->
	<httpCaching never304="true" />
	<!--
	<httpCaching never304="true">
		<cacheControl>max-age=30,public</cacheControl>
	</httpCaching>
	-->
	<!--
	<httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
		<cacheControl>max-age=30,public</cacheControl>
	</httpCaching>
	-->
</requestDispatcher>
