Exception when building a Solr index: org.apache.solr.common.SolrException: Exception writing document id xx to the index;
阿新 • Published: 2019-01-27
The full exception that was thrown looks roughly like this:
org.apache.solr.common.SolrException: Exception writing document id 216989 to the index; possible analysis error: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=27,endOffset=30,lastStartOffset=29 for field 'product_goods_name'
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:226)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:910)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1121)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:616)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:257)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:527)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=27,endOffset=30,lastStartOffset=29 for field 'product_goods_name'
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:767)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:240)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:496)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1729)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:965)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:954)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:334)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:271)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
    ... 32 more
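Roughly, this means a conflict arose during tokenization: your custom tokenization conflicts with the way Solr tokenizes internally. For example, Solr may decide not to split an English word any further while your custom tokenization still tries to split it, so the two conflict and the exception is thrown. The message itself shows the symptom: for the field 'product_goods_name', a token with startOffset=27 arrived after a token whose start offset was 29 (lastStartOffset=29), so the offsets went backwards.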
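To make the rule in the message concrete, here is a minimal sketch of the check it describes. It is a simplified illustration written for this post, not Lucene's actual code; the function name `check_offsets` and the sample token list are made up, and only the three offset values in the last comment come from the stack trace.

```python
def check_offsets(tokens):
    """Return (index, start, end, last_start) for the first token that
    breaks the rule quoted in the exception message, or None if all pass.

    tokens is a list of (start_offset, end_offset) pairs in emission order.
    """
    last_start = 0
    for i, (start, end) in enumerate(tokens):
        # Rule: start must be non-negative, end must be >= start,
        # and start must not be smaller than the previous token's start.
        if start < 0 or end < start or start < last_start:
            return (i, start, end, last_start)
        last_start = start
    return None


# Hypothetical token stream: the third token starts at 27 even though
# the previous token already started at 29, matching the values in the
# trace (startOffset=27, endOffset=30, lastStartOffset=29).
tokens = [(25, 27), (29, 31), (27, 30)]
violation = check_offsets(tokens)
```

Overlapping tokens are allowed as long as each start offset is not smaller than the previous one, which is why a well-behaved tokenizer can still emit both a long word and its sub-words.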
Solution:
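1. First, check your tokenizer's extension dictionary. Take mine as an example: I use the IK tokenizer, and I had downloaded a TB-related keyword list found through Baidu, tidied it up, and added it to my extension dictionary. At first I did no filtering at all, and this error was the result. On investigation, the cause turned out to be that my extension dictionary contained a large number of English letters and digits. In hindsight that makes sense: Solr already tokenizes English and digits perfectly well on its own, so there is no need to split them again. On top of that, the extension dictionary only supports extending Chinese words, so adding English and digits to it is bound to conflict. The fix is to remove all English letters and digits from the extension dictionary: first convert ext.dic to ext.txt, open it in Word and delete every digit and English letter (search Baidu for the exact steps), then deduplicate the entries with EmEditor (again, search Baidu for the exact steps), convert the file back to .dic, making sure it is saved as UTF-8 without a BOM, and finally upload it.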
2. If the exception still occurs, check your schema configuration file, specifically the tokenizer configuration. Mine, for example:
<!-- IKAnalyzer -->
<fieldType name="text_ik" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
    <!-- <filter class="org.wltea.analyzer.lucene.IKTokenFilterFactory" useSingle="true" useItself="false" /> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
    <!-- <filter class="org.wltea.analyzer.lucene.IKTokenFilterFactory" useSingle="true" useItself="false" /> -->
  </analyzer>
</fieldType>
The contents of the <filter> tags were copied from someone else's configuration, and that is where I fell into the trap: the filter is most likely a class they wrapped themselves, so it is of no use here. Once I commented the <filter> lines out, the exception was no longer thrown.
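The dictionary clean-up from step 1 can also be scripted instead of done by hand in Word and EmEditor. The following is only a sketch under a few assumptions: the file names ext.dic and ext_clean.dic are placeholders, the dictionary holds one entry per line, and "removing English and digits" is read as dropping every entry that contains an ASCII letter or digit (stripping just the offending characters would be the other reading). The helper names are made up for this example.

```python
import re
from pathlib import Path

# Matches any ASCII letter or digit anywhere in an entry.
ASCII_ALNUM = re.compile(r"[A-Za-z0-9]")


def clean_words(lines):
    """Drop blank entries, entries containing ASCII letters or digits,
    and duplicates (the first occurrence is kept, order is preserved)."""
    seen = set()
    result = []
    for line in lines:
        word = line.strip()
        if not word or ASCII_ALNUM.search(word) or word in seen:
            continue
        seen.add(word)
        result.append(word)
    return result


def clean_dictionary(src, dst):
    """Read a one-entry-per-line dictionary and write the cleaned copy."""
    # "utf-8-sig" skips a BOM on input if the file happens to have one.
    lines = Path(src).read_text(encoding="utf-8-sig").splitlines()
    words = clean_words(lines)
    # Plain "utf-8" never writes a BOM, which is the format step 1 asks for.
    Path(dst).write_text("".join(w + "\n" for w in words), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder paths; adjust to wherever your extension dictionary lives.
    if Path("ext.dic").exists():
        clean_dictionary("ext.dic", "ext_clean.dic")
```

Writing to a separate output file keeps the original dictionary around, so the removed entries can be reviewed before the cleaned file is uploaded.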