The index Plan

阿新 • Published: 2017-07-05

Xapian indexing

In order to index the CSV, we want to take two fields from each row, title and description, and turn them into suitable terms. For straightforward textual search we don’t need document values.
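As a sketch of the first step, here is how each CSV row could be reduced to its title and description using only Python's standard `csv` module. The column names `TITLE` and `DESCRIPTION` and the sample rows are assumptions for illustration. Adjust them to match the header of your actual file.

```python
import csv
import io

def extract_fields(csv_file, title_col="TITLE", desc_col="DESCRIPTION"):
    """Yield (title, description) pairs from each CSV row.

    The default column names are assumptions; change them to match your header.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Missing cells become empty strings so the indexer always receives text.
        yield row.get(title_col) or "", row.get(desc_col) or ""

# Hypothetical sample data; the quoted cell shows that embedded commas survive.
sample = io.StringIO(
    "TITLE,DESCRIPTION\n"
    "Sundial,\"Brass sundial, 18th century\"\n"
    "Sundials,Two garden sundials\n"
)
rows = list(extract_fields(sample))
print(rows)  # [('Sundial', 'Brass sundial, 18th century'), ('Sundials', 'Two garden sundials')]
```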

Because we’re dealing with free text, and because we know the whole dataset is in English, we can use stemming so that for instance searching for “sundial” and “sundials” will both match the same documents. This way people don’t need to worry too much about exactly which words to use in their query.
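The next sketch shows why stemming makes "sundial" and "sundials" match. It uses a deliberately naive stemmer that only strips a plural "s". This is a stand-in for illustration, not Xapian's English stemmer, which is Snowball-based and handles many more suffixes. In Xapian itself you would give the TermGenerator a `xapian.Stem("en")` object through `set_stemmer()`.

```python
def naive_stem(word):
    """Toy English stemmer: lowercase and strip one trailing plural 's'.

    Only a stand-in for a real stemmer such as Xapian's English stemmer.
    """
    word = word.lower()
    # Strip a single plural "s", but leave words ending in "ss" (e.g. "glass") alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

print(naive_stem("sundial"), naive_stem("Sundials"))  # sundial sundial
```

Because the query is stemmed in the same way as the indexed text, both spellings reduce to the same term and therefore find the same documents.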

Finally, we want a way to keep the two fields separate. In Xapian this is done with term prefixes: short strings placed at the start of a term to indicate which field it indexes. Alongside the prefixed terms we also generate unprefixed terms, so that you can search within a specific field and also search for text in any field.

Some prefixes are used by convention, which helps if you ever need to interoperate with omega (a web-based search engine) or other compatible systems. Following these conventions, we’ll use ‘S’ to prefix the title (it stands for ‘subject’), and ‘XD’ for the description. A full list of conventional prefixes is given at the top of the omega documentation on termprefixes.
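The following sketch makes the prefixed/unprefixed split concrete for one row, using the ‘S’ and ‘XD’ prefixes above. The tokenisation is a plain lowercase whitespace split chosen for brevity, and the row contents are hypothetical. A real term generator also strips punctuation and applies stemming.

```python
# Conventional omega prefixes used in this walkthrough.
FIELD_PREFIXES = {"title": "S", "description": "XD"}

def terms_for_row(row):
    """Return prefixed terms for each field plus unprefixed terms for all fields."""
    terms = []
    for field, prefix in FIELD_PREFIXES.items():
        words = row.get(field, "").lower().split()
        terms.extend(prefix + w for w in words)  # searchable within this field only
        terms.extend(words)                      # searchable in any field
    return terms

row = {"title": "Brass sundial", "description": "Garden ornament"}
terms = terms_for_row(row)
print(sorted(set(terms)))
# ['Sbrass', 'Ssundial', 'XDgarden', 'XDornament', 'brass', 'garden', 'ornament', 'sundial']
```

A title-only search for "garden" would look for `Sgarden`, which is absent, while an any-field search for `garden` still matches through the unprefixed term.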

When you’re indexing multiple fields like this, the term positions used for each field when indexed unprefixed need to be kept apart. Say you have a title of “The Saints” and a description of “Don’t like rabbits? Keep reading.” If you index those fields without a gap, the phrase search “Saints don’t like rabbits” will match, even though the phrase never occurs within a single field. Usually a gap of 100 positions between fields is enough.
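To see the effect of the gap, the sketch below records word positions across both fields and runs a simple phrase check. It is a pure-Python model of positional matching, not Xapian's implementation. In Xapian the gap is inserted by calling `TermGenerator.increase_termpos()` between fields.

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def index_positions(fields, gap):
    """Map each unprefixed term to the set of positions where it occurs.

    Positions start at 1 and continue across fields; `gap` extra positions
    are skipped between consecutive fields.
    """
    positions = {}
    pos = 0
    for i, text in enumerate(fields):
        if i > 0:
            pos += gap
        for word in tokenize(text):
            pos += 1
            positions.setdefault(word, set()).add(pos)
    return positions

def phrase_matches(positions, phrase):
    """True if the words of `phrase` occur at consecutive positions."""
    words = tokenize(phrase)
    if not words or words[0] not in positions:
        return False
    return any(
        all(start + k in positions.get(w, set()) for k, w in enumerate(words))
        for start in positions[words[0]]
    )

fields = ["The Saints", "Don't like rabbits? Keep reading."]
print(phrase_matches(index_positions(fields, gap=0), "Saints don't like rabbits"))    # True
print(phrase_matches(index_positions(fields, gap=100), "Saints don't like rabbits"))  # False
```

With no gap, "saints" sits at position 2 and "don't" at position 3, so the phrase spans the field boundary. With a gap of 100, "don't" moves to position 103 and the false match disappears. Phrases inside one field, such as "like rabbits", still match.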

To write to a database, we use the WritableDatabase class, which allows us to create, update or overwrite a database.

To create terms, we use Xapian’s TermGenerator, a built-in class that makes it easier to turn free text into terms. It splits the text into words, applies stemming, and adds term prefixes as needed. It can also keep track of term positions, including the gap between different fields.
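To show how these pieces fit together, here is a toy pure-Python model. A small term generator builds each document's terms, and a plain dictionary stands in for the database. The class `ToyTermGenerator` is hypothetical. It borrows the method names `index_text` and `increase_termpos` from Xapian's TermGenerator for readability, but it is not the Xapian API, and the stemmer is the same naive plural-stripper used for illustration earlier.

With the real Python bindings, the corresponding calls are `xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)`, `termgenerator.set_stemmer(xapian.Stem("en"))`, `termgenerator.set_document(doc)`, `termgenerator.index_text(text, 1, "S")`, `termgenerator.increase_termpos()` and `db.add_document(doc)`.

```python
import re
from collections import defaultdict

def naive_stem(word):
    # Toy stand-in for a real English stemmer: strip one plural "s".
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

class ToyTermGenerator:
    """Hypothetical model of a term generator for a single document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of positions
        self.termpos = 0

    def index_text(self, text, prefix=""):
        # Split into words, stem each one, add the prefix and record its position.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            self.termpos += 1
            self.postings[prefix + naive_stem(word)].add(self.termpos)

    def increase_termpos(self, delta=100):
        # Leave a gap so that phrases cannot span two fields.
        self.termpos += delta

def index_row(title, description):
    gen = ToyTermGenerator()
    gen.index_text(title, "S")         # prefixed terms: search within one field
    gen.index_text(description, "XD")
    gen.index_text(title)              # unprefixed terms: search any field
    gen.increase_termpos()
    gen.index_text(description)
    return dict(gen.postings)

# A dict of doc_id -> {term: positions} stands in for a WritableDatabase.
database = {}
rows = [("Sundial", "Brass sundial"), ("Sundials", "Two garden sundials")]
for doc_id, (title, desc) in enumerate(rows, start=1):
    database[doc_id] = index_row(title, desc)

def search(term):
    """Return the ids of documents containing `term`."""
    return sorted(doc_id for doc_id, terms in database.items() if term in terms)

print(search("Ssundial"))  # [1, 2]  (title field only)
print(search("XDgarden"))  # [2]     (description field only)
print(search("brass"))     # [1]     (any field)
```

Both titles stem to `Ssundial`, so a title search for either spelling finds both documents. The unprefixed `brass` term comes from the description, yet it is still found by an any-field search.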
