Deep Dive into the Elasticsearch Source Code: The Indexing Process

阿新 • Published: 2019-02-12

Indexing a document into an ES cluster with the Java API of ES 2.2.1

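Client-side flow, roughly:

Build the JSON to be indexed with XContentBuilder (a raw JSON string works too)
Connect to the ES cluster with TransportClient
Send the index request to the cluster and receive an IndexResponse

Test code: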

package index;

import java.io.IOException;

import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.json.JsonXContent;

import es.MyTransportClient; // the author's own singleton wrapper that creates the TransportClient (not shown)

public class MyIndex {

    public static void main(String[] args) {
        TransportClient client = MyTransportClient.getInstance().getTransportClient();

        /*
         * Option 1: pass the JSON source as a raw string
         */
        // IndexResponse response = client.prepareIndex("library", "book", "1")
        //         .setSource("{\"title\":\"mastering elasticsearch\"}")
        //         .execute().actionGet();

        /*
         * Option 2: build the JSON source in code with XContentBuilder
         */
        try {
            XContentBuilder builder = JsonXContent.contentBuilder()
                    .startObject()
                    .field("user", "qiaqia")
                    .field("title", "this is title")
                    .field("subtitle", new String[] { "title1", "title2", "title3" })
                    .endObject();
            IndexResponse response = client.prepareIndex("library", "book", "4")
                    .setSource(builder)
                    .get();
            System.out.println(response.toString());
            System.out.println(response.isCreated());  // true if a new document was created, false if an existing one was overwritten
            System.out.println(response.getVersion()); // version of the document after this write

            /*
             * The stored document, as returned by ES:
             * {
             *   "_index": "library",
             *   "_type": "book",
             *   "_id": "4",
             *   "_score": 1,
             *   "_source": {
             *     "user": "qiaqia",
             *     "title": "this is title",
             *     "subtitle": ["title1", "title2", "title3"]
             *   }
             * }
             */
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

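Once the XContent has been built, it is wrapped in an IndexRequest. IndexRequest encapsulates everything about the operation: the op type, the document source, routing, the type, the id, the timestamp, the version, the timeout, the TTL, and so on.
TransportService then sends the IndexRequest to the cluster over TCP. TransportService is built on Netty, an asynchronous, event-driven, high-performance network application framework.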
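To make "a request object sent over TCP" concrete, here is a minimal sketch under heavy simplifying assumptions. `ToyIndexRequest` is a hypothetical class, not the ES `IndexRequest`; it carries only five of the fields listed above and writes them as one length-prefixed frame using plain `java.io` streams. Real ES 2.x requests serialize themselves through their own `writeTo(StreamOutput)` methods and Netty manages the connection; none of that is reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical toy request (NOT org.elasticsearch.action.index.IndexRequest):
// a few IndexRequest-like fields written as one length-prefixed frame.
// This is not the Elasticsearch wire format.
public class ToyIndexRequest {
    final String index, type, id, routing, source; // routing may be null

    ToyIndexRequest(String index, String type, String id, String routing, String source) {
        this.index = index;
        this.type = type;
        this.id = id;
        this.routing = routing;
        this.source = source;
    }

    // Frame layout: 4-byte body length, then five writeUTF strings ("" encodes a null routing).
    byte[] toFrame() throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(body);
        for (String s : new String[] { index, type, id, routing == null ? "" : routing, source }) {
            out.writeUTF(s);
        }
        out.flush();

        ByteArrayOutputStream frame = new ByteArrayOutputStream();
        DataOutputStream framed = new DataOutputStream(frame);
        framed.writeInt(body.size()); // length prefix tells the receiver where the message ends
        body.writeTo(framed);
        framed.flush();
        return frame.toByteArray();
    }

    // Decodes one frame produced by toFrame().
    static ToyIndexRequest fromFrame(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int length = in.readInt();
        if (length != frame.length - 4) {
            throw new IOException("incomplete frame: expected " + length + " body bytes");
        }
        String index = in.readUTF();
        String type = in.readUTF();
        String id = in.readUTF();
        String routing = in.readUTF();
        String source = in.readUTF();
        return new ToyIndexRequest(index, type, id, routing.isEmpty() ? null : routing, source);
    }

    public static void main(String[] args) throws IOException {
        ToyIndexRequest request = new ToyIndexRequest("library", "book", "4", null,
                "{\"user\":\"qiaqia\",\"title\":\"this is title\"}");
        ToyIndexRequest received = fromFrame(request.toFrame());
        System.out.println(received.index + "/" + received.type + "/" + received.id + " " + received.source);
    }
}
```

The length prefix matters because TCP is a byte stream: without it the receiver could not tell where one request ends and the next begins.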

Server-side flow

Once the request reaches its TransportAction, the node reads the cluster state to determine which shard the document is assigned to.
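The shard-selection step can be sketched as follows. This is a simplified model, not the ES implementation: the shard is computed as hash(routing) modulo the number of primary shards, with the routing value defaulting to the document id. ES 2.x uses a Murmur3 hash for this by default; the sketch substitutes 32-bit FNV-1a, so its shard numbers will not match those of a real cluster.

```java
import java.nio.charset.StandardCharsets;

// Simplified sketch: shard = hash(routing) mod numberOfPrimaryShards,
// where routing falls back to the document id when none is given.
// FNV-1a stands in for the Murmur3 hash that ES 2.x uses by default.
public class ShardRouting {

    // 32-bit FNV-1a over the UTF-8 bytes of the routing value.
    static int hash(String routing) {
        int h = 0x811C9DC5;
        for (byte b : routing.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Explicit routing wins over the id; floorMod keeps the result
    // in [0, numberOfPrimaryShards) even when the hash is negative.
    static int shardFor(String id, String routing, int numberOfPrimaryShards) {
        String key = (routing != null) ? routing : id;
        return Math.floorMod(hash(key), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("4", null, 5));          // routed by _id
        System.out.println(shardFor("4", "user-qiaqia", 5)); // routed by an explicit routing value
    }
}
```

Because the result depends on the number of primary shards, that number cannot be changed on an existing ES 2.x index without reindexing; otherwise existing documents would no longer be found on the shard their routing points to.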

The request is then handed to the primary shard for processing; see TransportIndexAction:

    protected Tuple<IndexResponse, IndexRequest> shardOperationOnPrimary(MetaData metaData, IndexRequest request) throws Throwable {

        // validate, if routing is required, that we got routing
        IndexMetaData indexMetaData = metaData.index(request.shardId().getIndex());
        MappingMetaData mappingMd = indexMetaData.mappingOrDefault(request.type());
        if (mappingMd != null && mappingMd.routing().required()) {
            if (request.routing() == null) {
                throw new RoutingMissingException(request.shardId().getIndex(), request.type(), request.id());
            }
        }

        // look up the local IndexService and the target IndexShard
        IndexService indexService = indicesService.indexServiceSafe(request.shardId().getIndex());
        IndexShard indexShard = indexService.shardSafe(request.shardId().id());

        // perform the write on the primary; the result carries the response and the translog location
        final WriteResult<IndexResponse> result = executeIndexRequestOnPrimary(null, request, indexShard, mappingUpdatedAction);
        final IndexResponse response = result.response;
        final Translog.Location location = result.location;
        // post-write handling, e.g. a refresh when request.refresh() is true
        processAfterWrite(request.refresh(), indexShard, location);
        return new Tuple<>(response, request); // return the IndexResponse produced by the operation
    }
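The primary-side steps above (routing check, write, optional refresh) can be illustrated with a toy model. Everything here is hypothetical: `ToyPrimaryShard` is an invented class, the "engine" is two in-memory maps, and the "translog location" is just a list index. It only shows the ordering of the steps and why a document is not searchable until a refresh; it does not model Lucene segments or translog fsync.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not ES code) of the steps in shardOperationOnPrimary:
// 1) reject the request if routing is required but missing,
// 2) write the document and append the operation to a translog,
// 3) if the request asks for it, refresh so the write becomes searchable.
public class ToyPrimaryShard {
    final boolean routingRequired;
    final Map<String, String> buffer = new HashMap<>();     // written but not yet searchable
    final Map<String, String> searchable = new HashMap<>(); // visible to search after a refresh
    final List<String> translog = new ArrayList<>();        // append-only operation log

    ToyPrimaryShard(boolean routingRequired) {
        this.routingRequired = routingRequired;
    }

    // Returns the toy "translog location": the position of the appended entry.
    int index(String id, String routing, String source, boolean refresh) {
        if (routingRequired && routing == null) {
            throw new IllegalArgumentException("routing is required for id [" + id + "]");
        }
        buffer.put(id, source);
        translog.add("index " + id + " " + source);
        int location = translog.size() - 1;
        if (refresh) {
            refresh();
        }
        return location;
    }

    // Moves buffered writes into the searchable view.
    void refresh() {
        searchable.putAll(buffer);
        buffer.clear();
    }

    public static void main(String[] args) {
        ToyPrimaryShard shard = new ToyPrimaryShard(false);
        shard.index("4", null, "{\"user\":\"qiaqia\"}", false);
        System.out.println(shard.searchable.containsKey("4")); // false: not refreshed yet
        shard.refresh();
        System.out.println(shard.searchable.containsKey("4")); // true
    }
}
```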

Before the write is executed, TransportIndexAction prepares the indexing operation:

    public static Engine.IndexingOperation prepareIndexOperationOnPrimary(BulkShardRequest shardRequest, IndexRequest request, IndexShard indexShard) {
        /*
         * Extract the data carried by the IndexRequest into a SourceToParse
         */
        SourceToParse sourceToParse = SourceToParse.source(SourceToParse.Origin.PRIMARY, request.source()).index(request.index()).type(request.type()).id(request.id())
            .routing(request.routing()).parent(request.parent()).timestamp(request.timestamp()).ttl(request.ttl());
        boolean canHaveDuplicates = request.canHaveDuplicates();
        if (shardRequest != null) {
            canHaveDuplicates |= shardRequest.canHaveDuplicates();
        }
        /*
         * Decide between "index" and "create":
         * with opType INDEX, an existing document with the same id is overwritten, otherwise a new one is created;
         * with opType CREATE, an existing id causes a document-already-exists error.
         */
        if (request.opType() == IndexRequest.OpType.INDEX) {
            return indexShard.prepareIndexOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates);
        } else {
            assert request.opType() == IndexRequest.OpType.CREATE : request.opType();

            // both branches delegate to IndexShard, which prepares the Lucene-level operation
            return indexShard.prepareCreateOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates, canHaveDuplicates);
        }
    }
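The INDEX vs CREATE distinction described in the comments above can be modelled with a plain map. This is a hypothetical sketch (`OpTypeDemo` and its fields are invented): INDEX creates or overwrites, CREATE refuses to touch an existing id, and a per-id counter mimics internal versioning starting at 1. ES itself reports the CREATE conflict with its own exception type; the sketch uses `IllegalStateException` as a stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of opType semantics:
//   INDEX  -> create the document, or overwrite it if the id already exists
//   CREATE -> fail if a document with this id already exists
public class OpTypeDemo {
    enum OpType { INDEX, CREATE }

    final Map<String, String> docs = new HashMap<>();
    final Map<String, Long> versions = new HashMap<>();

    // Returns true if a new document was created (in the spirit of IndexResponse.isCreated()).
    boolean write(OpType opType, String id, String source) {
        boolean exists = docs.containsKey(id);
        if (opType == OpType.CREATE && exists) {
            throw new IllegalStateException("document [" + id + "] already exists");
        }
        docs.put(id, source);
        versions.merge(id, 1L, Long::sum); // version 1 on creation, +1 on every overwrite
        return !exists;
    }

    public static void main(String[] args) {
        OpTypeDemo store = new OpTypeDemo();
        System.out.println(store.write(OpType.INDEX, "4", "{\"v\":1}")); // true: created
        System.out.println(store.write(OpType.INDEX, "4", "{\"v\":2}")); // false: overwritten
        System.out.println(store.versions.get("4"));                    // 2
        try {
            store.write(OpType.CREATE, "4", "{\"v\":3}");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // document [4] already exists
        }
    }
}
```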

In IndexShard:

    public Engine.Index prepareIndexOnPrimary(SourceToParse source, long version, VersionType versionType, boolean canHaveDuplicates) {
        try {
            // this path may only run on the primary copy of the shard
            if (shardRouting.primary() == false) {
                throw new IllegalIndexShardStateException(shardId, state, "shard is not a primary");
            }
            // duplicates are treated as possible when the shard is not STARTED or when the caller flags it
            return prepareIndex(docMapper(source.type()), source, version, versionType, Engine.Operation.Origin.PRIMARY,
                    state != IndexShardState.STARTED || canHaveDuplicates);
        } catch (Throwable t) {
            verifyNotClosed(t);
            throw t;
        }
    }
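`prepareIndexOnPrimary` passes `version` and `versionType` along so the engine can reject conflicting writes. The sketch below is a simplified, hypothetical model of the two most common version types, not the ES `VersionType` enum: with internal versioning a supplied version must equal the current one and ES assigns current + 1; with external versioning the supplied version must be strictly greater than the current one and becomes the new version. The sentinel values `MATCH_ANY` and `NOT_FOUND` are the sketch's own choices, not the constants ES uses.

```java
// Simplified, hypothetical model of the version check done before a write.
public class VersionCheck {
    static final long MATCH_ANY = -1L; // sketch's sentinel: "no version supplied"
    static final long NOT_FOUND = 0L;  // sketch's sentinel: "document does not exist yet"

    enum VersionType { INTERNAL, EXTERNAL }

    // Returns the version the document will have after the write, or throws on conflict.
    static long nextVersion(VersionType type, long currentVersion, long requestedVersion) {
        switch (type) {
            case INTERNAL:
                // a supplied version must match exactly (optimistic concurrency control)
                if (requestedVersion != MATCH_ANY && requestedVersion != currentVersion) {
                    throw new IllegalStateException("version conflict: current [" + currentVersion
                            + "], requested [" + requestedVersion + "]");
                }
                return currentVersion == NOT_FOUND ? 1 : currentVersion + 1;
            case EXTERNAL:
                // the caller owns the version number; it must move strictly forward
                if (requestedVersion == MATCH_ANY) {
                    throw new IllegalArgumentException("external versioning requires a version");
                }
                if (requestedVersion <= currentVersion) {
                    throw new IllegalStateException("version conflict: current [" + currentVersion
                            + "], requested [" + requestedVersion + "]");
                }
                return requestedVersion;
            default:
                throw new AssertionError(type);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextVersion(VersionType.INTERNAL, NOT_FOUND, MATCH_ANY)); // 1
        System.out.println(nextVersion(VersionType.INTERNAL, 3, 3));                 // 4
        System.out.println(nextVersion(VersionType.EXTERNAL, 3, 10));                // 10
    }
}
```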

    static Engine.Index prepareIndex(DocumentMapperForType docMapper, SourceToParse source, long version, VersionType versionType,
            Engine.Operation.Origin origin, boolean canHaveDuplicates) {
        long startTime = System.nanoTime();
        /*
         * Parse the JSON source into a ParsedDocument
         */
        ParsedDocument doc = docMapper.getDocumentMapper().parse(source);
        if (docMapper.getMapping() != null) {
            // record any mapping that had to be created for this type as a dynamic mapping update
            doc.addDynamicMappingsUpdate(docMapper.getMapping());
        }
        // build the Engine.Index operation, keyed by the document's _uid term;
        // the actual Lucene write happens later, when the engine executes this operation
        return new Engine.Index(docMapper.getDocumentMapper().uidMapper().term(doc.uid().stringValue()), doc, version, versionType,
                origin, startTime, canHaveDuplicates);
    }
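The `Engine.Index` operation above is keyed by the document's `_uid` term. As far as I know, in ES 2.x the `_uid` value combines type and id as `type#id`, and the engine uses that term both to look up the current version and to replace the old Lucene document. The sketch below is a hypothetical illustration of that key (`UidSketch` is an invented class, and a `HashMap` stands in for the Lucene index); it assumes type names never contain `#`, while ids may.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "type#id" uid key and of replacing a document by that key.
public class UidSketch {
    static final char DELIMITER = '#';

    static String createUid(String type, String id) {
        return type + DELIMITER + id;
    }

    // Splits at the first '#', assuming the type itself never contains '#'.
    static String[] split(String uid) {
        int i = uid.indexOf(DELIMITER);
        if (i < 0) {
            throw new IllegalArgumentException("not a uid: " + uid);
        }
        return new String[] { uid.substring(0, i), uid.substring(i + 1) };
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>(); // stand-in for "update document by term"
        index.put(createUid("book", "4"), "{\"title\":\"v1\"}");
        index.put(createUid("book", "4"), "{\"title\":\"v2\"}");     // same uid: replaces v1
        index.put(createUid("magazine", "4"), "{\"title\":\"m\"}");  // same id, other type: separate doc
        System.out.println(index.size());                 // 2
        System.out.println(index.get("book#4"));          // {"title":"v2"}
    }
}
```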

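As for how ES actually exchanges data with the underlying Lucene layer: I am new to ES and have not read the Lucene source yet, so I will fill that part in when I have time.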
