Reading the HugeGraph Graph Database Source Code (1): Introduction

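Introduction to HugeGraph

The following is quoted from the official documentation:

HugeGraph is an easy-to-use, efficient, general-purpose open-source graph database system (Graph Database, GitHub project page). It implements the Apache TinkerPop3 framework and is fully compatible with the Gremlin query language, and it ships with a complete toolchain that helps users build applications and products on top of a graph database. HugeGraph supports fast import of more than ten billion vertices and edges, provides millisecond-level relationship queries (OLTP), and can be integrated with big-data platforms such as Hadoop and Spark for offline analysis (OLAP).

Typical application scenarios for HugeGraph include deep relationship exploration, association analysis, path search, feature extraction, data clustering, community detection, and knowledge graphs. It is applicable to business domains such as network security, telecom fraud, financial risk control, advertising recommendation, social networks, and intelligent robots.

Key points:
- Built on the TinkerPop3 framework and compatible with the Gremlin query language
- OLTP (open source) and OLAP (commercial edition)
- Support for common graph applications: path search, recommendation, and so on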

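Architecture

Architecture diagram

HugeGraph's functionality is organized into three layers: the storage layer, the computing layer, and the user interface layer. HugeGraph supports two types of graph computing, OLTP and OLAP.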

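Components

HugeGraph's main functionality is provided by the components HugeCore, ApiServer, HugeGraph-Client, HugeGraph-Loader, and HugeGraph-Studio. The communication relationships among these components are shown in the figure below.

Core components:

  • HugeCore: the core module of HugeGraph; the TinkerPop interfaces are mainly implemented in this module.
  • ApiServer: provides RESTful API endpoints, exposing the Graph API, Schema API, Gremlin API, and other services.
  • HugeGraph-Client: a Java-based client driver.

Ecosystem components:

  • HugeGraph-Loader: the data import module. HugeGraph-Loader can scan and analyze existing data, automatically generate the Graph Schema creation statements, and import data quickly in batches.
  • HugeGraph-Studio: a web-based visual IDE. It records Gremlin queries in a notebook style and visualizes the relationships in the graph. HugeGraph-Studio is also the tool recommended by the system.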

HugeGraph-Studio

HugeGraph-Studio appears to have been abandoned; the development team is working on a new project named 'hugegraph-hubble':

hugegraph-hubble is a graph management and analysis platform that provides features: graph data load, schema management, graph relationship analysis and graphical display.

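According to the official description, hubble is positioned as a graph management and analysis platform that provides graph data loading, schema management, graph analysis, and visualization. It is currently under development, and the first release is expected in September 2020.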

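Design Philosophy

There are two common models for representing graph data:

  • The RDF (Resource Description Framework) model: the choice of academia, queried with SPARQL; examples include Jena and gStore.
  • The Property Graph model: the choice of industry; Neo4j and JanusGraph both take this approach.

RDF is a W3C standard, while the Property Graph is the de facto industry standard and is widely supported by graph database vendors. HugeGraph adopts the Property Graph model and follows the industry standard.

HugeGraph's conceptual storage model is shown in the figure below:

It mainly consists of the following parts:

  • Vertex: corresponds to an entity (Entity)
  • Vertex Label (the type of a vertex): corresponds to a concept (Concept)
  • Property (name and age in the figure): PropertyKey
  • Edge (lives in the figure): corresponds to a Relation in RDF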

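Extensibility

HugeGraph provides a rich plugin extension mechanism, with extension points along several dimensions:

  • Backend storage
  • Serializers
  • Custom configuration options
  • Tokenizers (analyzers)

Plugin implementation mechanism

  1. HugeGraph provides the plugin interface HugeGraphPlugin and supports plugins through the Java SPI mechanism.
  2. HugeGraph provides four registration functions for extension points: registerOptions(), registerBackend(), registerSerializer(), and registerAnalyzer().
  3. The plugin implementer implements the corresponding Options, Backend, Serializer, or Analyzer interface.
  4. The plugin implementer implements the register() method of the HugeGraphPlugin interface, registers the concrete implementation classes from step 3 in that method, and packages everything into a jar.
  5. The plugin user places the jar in the plugins directory under the HugeGraph Server installation directory, sets the relevant configuration options to the plugin's custom values, and restarts the server for the change to take effect.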
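The plugin flow described above can be sketched in plain Java. This is only an illustration of the pattern, not HugeGraph's code: Registry, Plugin, DemoStorePlugin, and the com.example class names are all hypothetical stand-ins. With Java SPI, the list of plugins would typically come from java.util.ServiceLoader reading entries under META-INF/services; here a hand-built list plays that role so the sketch runs on its own.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins; these are NOT HugeGraph's real classes.
class PluginSketch {

    /** Collects the extension points a plugin contributes (name -> class name). */
    static final class Registry {
        final Map<String, String> backends = new LinkedHashMap<>();
        final Map<String, String> serializers = new LinkedHashMap<>();

        void registerBackend(String name, String className) {
            backends.put(name, className);
        }

        void registerSerializer(String name, String className) {
            serializers.put(name, className);
        }
    }

    /** Plays the role of the plugin interface from step 1. */
    interface Plugin {
        void register(Registry registry);
    }

    /** What a plugin author would write in step 4 (class names are made up). */
    static final class DemoStorePlugin implements Plugin {
        @Override
        public void register(Registry registry) {
            registry.registerBackend("demo", "com.example.DemoStoreProvider");
            registry.registerSerializer("demo", "com.example.DemoSerializer");
        }
    }

    /** Calls register() on every plugin; SPI discovery would supply the plugins. */
    static Registry loadPlugins(Iterable<Plugin> plugins) {
        Registry registry = new Registry();
        for (Plugin plugin : plugins) {
            plugin.register(registry);
        }
        return registry;
    }

    public static void main(String[] args) {
        Plugin plugin = new DemoStorePlugin();
        Registry registry = loadPlugins(Collections.singletonList(plugin));
        System.out.println(registry.backends);
        // prints {demo=com.example.DemoStoreProvider}
    }
}
```

The point of the design is that the core never references a concrete backend class: it only sees names and class names handed over through register().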

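From an Example into the Source Code

To understand a system's source code in depth, start from a concrete application. First, look at the example code: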

https://github.com/hugegraph/hugegraph/blob/master/hugegraph-example/src/main/java/com/baidu/hugegraph/example/Example1.java

 public static void main(String[] args) throws Exception {
        LOG.info("Example1 start!");

        HugeGraph graph = ExampleUtil.loadGraph();

        Example1.showFeatures(graph);

        Example1.loadSchema(graph);
        Example1.loadData(graph);
        Example1.testQuery(graph);
        Example1.testRemove(graph);
        Example1.testVariables(graph);
        Example1.testLeftIndexProcess(graph);

        Example1.thread(graph);

        graph.close();

        HugeFactory.shutdown(30L);
    }

1. loadGraph

To use HugeGraph, you first need to initialize a HugeGraph object, which is exactly what loadGraph does.

public static HugeGraph loadGraph(boolean needClear, boolean needProfile) {
        if (needProfile) {
            profile();
        }

        registerPlugins();

        String conf = "hugegraph.properties";
        try {
            String path = ExampleUtil.class.getClassLoader()
                                     .getResource(conf).getPath();
            File file = new File(path);
            if (file.exists() && file.isFile()) {
                conf = path;
            }
        } catch (Exception ignored) {
        }

        HugeGraph graph = HugeFactory.open(conf);

        if (needClear) {
            graph.clearBackend();
        }
        graph.initBackend();

        return graph;
    }

1.1 registerPlugins

registerPlugins registers the plugins; recall the extension mechanism described above. All HugeGraph storage backends must be registered as plugins.

 public static void registerPlugins() {
        if (registered) {
            return;
        }
        registered = true;

        RegisterUtil.registerCassandra();
        RegisterUtil.registerScyllaDB();
        RegisterUtil.registerHBase();
        RegisterUtil.registerRocksDB();
        RegisterUtil.registerMysql();
        RegisterUtil.registerPalo();
    }

Registration mainly covers the configuration options, the serializer, and the backend. For example, here is the one for MySQL:

public static void registerMysql() {
        // Register config
        OptionSpace.register("mysql",
                "com.baidu.hugegraph.backend.store.mysql.MysqlOptions");
        // Register serializer
        SerializerFactory.register("mysql",
                "com.baidu.hugegraph.backend.store.mysql.MysqlSerializer");
        // Register backend
        BackendProviderFactory.register("mysql",
                "com.baidu.hugegraph.backend.store.mysql.MysqlStoreProvider");
    }
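The register calls above pass a short name together with a fully qualified class name. One plausible way such a registry can work is to resolve the class by name and instantiate it reflectively when the backend is opened. The sketch below shows that idea under stated assumptions: ProviderRegistrySketch, StoreProvider, and MemoryStoreProvider are hypothetical, and HugeGraph's own factories may behave differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a name -> class registry; not HugeGraph's real code.
class ProviderRegistrySketch {

    /** Hypothetical provider contract, standing in for a backend store provider. */
    interface StoreProvider {
        String type();
    }

    /** Example implementation that is referenced only by its class name. */
    static class MemoryStoreProvider implements StoreProvider {
        @Override
        public String type() {
            return "memory";
        }
    }

    private static final Map<String, Class<? extends StoreProvider>> PROVIDERS =
            new ConcurrentHashMap<>();

    /** Resolves the class name once and remembers it under the backend name. */
    static void register(String name, String className) {
        Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
        if (!StoreProvider.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " is not a StoreProvider");
        }
        if (PROVIDERS.putIfAbsent(name, clazz.asSubclass(StoreProvider.class)) != null) {
            throw new IllegalStateException("Backend already registered: " + name);
        }
    }

    /** Creates a fresh provider instance for a registered backend name. */
    static StoreProvider open(String name) {
        Class<? extends StoreProvider> clazz = PROVIDERS.get(name);
        if (clazz == null) {
            throw new IllegalArgumentException("Backend not registered: " + name);
        }
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        register("memory", MemoryStoreProvider.class.getName());
        System.out.println(open("memory").type()); // prints "memory"
    }
}
```

Registering by class name rather than by class reference lets the registering code avoid a compile-time dependency on the backend module.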

1.2 HugeFactory.open

HugeFactory is HugeGraph's factory class. It accepts a Configuration object and builds a HugeGraph instance. Note that the method is declared synchronized for thread safety.

 public static synchronized HugeGraph open(Configuration config) {
        HugeConfig conf = config instanceof HugeConfig ?
                          (HugeConfig) config : new HugeConfig(config);
        String name = conf.get(CoreOptions.STORE);
        checkGraphName(name, "graph config(like hugegraph.properties)");
        name = name.toLowerCase();
        HugeGraph graph = graphs.get(name);
        if (graph == null || graph.closed()) {
            graph = new StandardHugeGraph(conf);
            graphs.put(name, graph);
        } else {
            String backend = conf.get(CoreOptions.BACKEND);
            E.checkState(backend.equalsIgnoreCase(graph.backend()),
                         "Graph name '%s' has been used by backend '%s'",
                         name, graph.backend());
        }
        return graph;
    }

A side note on the configuration file: as the code shows, hugegraph.properties is read by default.
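The caching logic in open can be isolated into a small standalone sketch: one instance per lower-cased graph name, re-created if the cached one is closed, and rejected if the same name is requested with a different backend. DemoGraph and GraphFactorySketch are hypothetical stand-ins for the real classes, and Locale.ROOT is an addition of this sketch (the original calls toLowerCase() without a locale).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a synchronized, name-keyed factory cache.
class GraphFactorySketch {

    /** Minimal graph handle; not HugeGraph's real class. */
    static final class DemoGraph {
        final String name;
        final String backend;
        private boolean closed;

        DemoGraph(String name, String backend) {
            this.name = name;
            this.backend = backend;
        }

        boolean closed() {
            return closed;
        }

        void close() {
            closed = true;
        }
    }

    private static final Map<String, DemoGraph> GRAPHS = new HashMap<>();

    /** Returns the cached graph for this name, or creates one if absent or closed. */
    static synchronized DemoGraph open(String name, String backend) {
        String key = name.toLowerCase(Locale.ROOT);
        DemoGraph graph = GRAPHS.get(key);
        if (graph == null || graph.closed()) {
            graph = new DemoGraph(key, backend);
            GRAPHS.put(key, graph);
        } else if (!backend.equalsIgnoreCase(graph.backend)) {
            // Same graph name with a different backend is treated as a conflict
            throw new IllegalStateException(String.format(
                    "Graph name '%s' has been used by backend '%s'",
                    key, graph.backend));
        }
        return graph;
    }

    public static void main(String[] args) {
        DemoGraph first = open("HugeGraph", "rocksdb");
        DemoGraph second = open("hugegraph", "RocksDB");
        System.out.println(first == second); // prints "true"
    }
}
```

Because the whole method is synchronized, two threads opening the same name at the same time cannot both create an instance; the price is that all opens are serialized, which is usually acceptable for a call made once per graph.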

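1.3 The HugeGraph Object

HugeGraph is an interface that extends TinkerPop's Graph interface (part of the Gremlin structure API) and defines the graph's API methods for schema definition, data storage, queries, and so on. As section 1.2 shows, the default implementation is StandardHugeGraph.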

public interface HugeGraph extends Graph {

    public HugeGraph hugegraph();

    public SchemaManager schema();

    public Id getNextId(HugeType type);

    public void addPropertyKey(PropertyKey key);
    public void removePropertyKey(Id key);
    public Collection<PropertyKey> propertyKeys();
    public PropertyKey propertyKey(String key);
    public PropertyKey propertyKey(Id key);
    public boolean existsPropertyKey(String key);

...
   

1.4 graph.clearBackend and initBackend

clearBackend clears the data in the backend, and initBackend initializes the basic data structures.

2. loadSchema

This method defines the schema:

public static void loadSchema(final HugeGraph graph) {

        SchemaManager schema = graph.schema();

        // Schema changes will be commit directly into the back-end
        LOG.info("===============  propertyKey  ================");
        schema.propertyKey("id").asInt().create();
        schema.propertyKey("name").asText().create();
        schema.propertyKey("gender").asText().create();
        schema.propertyKey("instructions").asText().create();
        schema.propertyKey("category").asText().create();
        schema.propertyKey("year").asInt().create();
        schema.propertyKey("time").asText().create();
        schema.propertyKey("timestamp").asDate().create();
        schema.propertyKey("ISBN").asText().create();
        schema.propertyKey("calories").asInt().create();
        schema.propertyKey("amount").asText().create();
        schema.propertyKey("stars").asInt().create();
        schema.propertyKey("age").asInt().valueSingle().create();
        schema.propertyKey("comment").asText().valueSet().create();
        schema.propertyKey("contribution").asText().valueSet().create();
        schema.propertyKey("nickname").asText().valueList().create();
        schema.propertyKey("lived").asText().create();
        schema.propertyKey("country").asText().valueSet().create();
        schema.propertyKey("city").asText().create();
        schema.propertyKey("sensor_id").asUUID().create();
        schema.propertyKey("versions").asInt().valueList().create();

        LOG.info("===============  vertexLabel  ================");

        schema.vertexLabel("person")
              .properties("name", "age", "city")
              .primaryKeys("name")
              .create();
        schema.vertexLabel("author")
              .properties("id", "name", "age", "lived")
              .primaryKeys("id").create();
        schema.vertexLabel("language").properties("name", "versions")
              .primaryKeys("name").create();
        schema.vertexLabel("recipe").properties("name", "instructions")
              .primaryKeys("name").create();
        schema.vertexLabel("book").properties("name")
              .primaryKeys("name").create();
        schema.vertexLabel("reviewer").properties("name", "timestamp")
              .primaryKeys("name").create();

        // vertex label must have the properties that specified in primary key
        schema.vertexLabel("FridgeSensor").properties("city")
              .primaryKeys("city").create();

        LOG.info("===============  vertexLabel & index  ================");
        schema.indexLabel("personByCity")
              .onV("person").secondary().by("city").create();
        schema.indexLabel("personByAge")
              .onV("person").range().by("age").create();

        schema.indexLabel("authorByLived")
              .onV("author").search().by("lived").create();

        // schemaManager.getVertexLabel("author").index("byName").secondary().by("name").add();
        // schemaManager.getVertexLabel("recipe").index("byRecipe").materialized().by("name").add();
        // schemaManager.getVertexLabel("meal").index("byMeal").materialized().by("name").add();
        // schemaManager.getVertexLabel("ingredient").index("byIngredient").materialized().by("name").add();
        // schemaManager.getVertexLabel("reviewer").index("byReviewer").materialized().by("name").add();

        LOG.info("===============  edgeLabel  ================");

        schema.edgeLabel("authored").singleTime()
              .sourceLabel("author").targetLabel("book")
              .properties("contribution", "comment")
              .nullableKeys("comment")
              .create();

        schema.edgeLabel("write").multiTimes().properties("time")
              .sourceLabel("author").targetLabel("book")
              .sortKeys("time")
              .create();

        schema.edgeLabel("look").multiTimes().properties("timestamp")
              .sourceLabel("person").targetLabel("book")
              .sortKeys("timestamp")
              .create();

        schema.edgeLabel("created").singleTime()
              .sourceLabel("author").targetLabel("language")
              .create();

        schema.edgeLabel("rated")
              .sourceLabel("reviewer").targetLabel("recipe")
              .create();
    }

Key points:
- SchemaManager schema = graph.schema() obtains the SchemaManager
- schema.propertyKey(NAME).asXXType().create() creates a property key
- schema.vertexLabel("person") // define a concept
.properties("name", "age", "city") // define the concept's properties
.primaryKeys("name") // define the primary keys; combined, the primary keys uniquely identify an entity
.create();
- schema.indexLabel("personByCity").onV("person").secondary().by("city").create(); defines an index
- schema.edgeLabel("authored").singleTime()
.sourceLabel("author").targetLabel("book")
.properties("contribution", "comment")
.nullableKeys("comment")
.create(); // define a relationship
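The comment in loadSchema states that a vertex label must have the properties named in its primary keys. A fluent builder can enforce that rule at create() time. The sketch below is a simplified, hypothetical builder (VertexLabelSketch is not HugeGraph's SchemaManager) that mirrors the call shape of the example and checks only this one rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified vertex-label builder; not HugeGraph's real API.
class VertexLabelSketch {
    final String name;
    final Set<String> properties;
    final List<String> primaryKeys;

    private VertexLabelSketch(String name, Set<String> properties,
                              List<String> primaryKeys) {
        this.name = name;
        this.properties = properties;
        this.primaryKeys = primaryKeys;
    }

    static final class Builder {
        private final String name;
        private final Set<String> properties = new LinkedHashSet<>();
        private final List<String> primaryKeys = new ArrayList<>();

        Builder(String name) {
            this.name = name;
        }

        Builder properties(String... keys) {
            properties.addAll(Arrays.asList(keys));
            return this;
        }

        Builder primaryKeys(String... keys) {
            primaryKeys.addAll(Arrays.asList(keys));
            return this;
        }

        /** Validates that every primary key is a declared property. */
        VertexLabelSketch create() {
            for (String key : primaryKeys) {
                if (!properties.contains(key)) {
                    throw new IllegalArgumentException(String.format(
                            "Primary key '%s' of vertex label '%s' is not " +
                            "among its properties %s", key, name, properties));
                }
            }
            return new VertexLabelSketch(name, properties, primaryKeys);
        }
    }

    static Builder vertexLabel(String name) {
        return new Builder(name);
    }

    public static void main(String[] args) {
        VertexLabelSketch person = vertexLabel("person")
                .properties("name", "age", "city")
                .primaryKeys("name")
                .create();
        System.out.println(person.name + " " + person.primaryKeys);
        // prints "person [name]"
    }
}
```

Deferring validation to create() lets properties(...) and primaryKeys(...) be called in either order, which matches how the chained calls are written in the example.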

3. loadData

Create entities. Note the format: keys and values appear in pairs:

graph.addVertex(T.label, "book", "name", "java-3");
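To make the paired key/value convention concrete, here is a minimal sketch of how a varargs list can be validated and turned into a map. It is an illustration only: KeyValuesSketch and toMap are hypothetical, and the plain string "label" stands in for the T.label token, which in TinkerPop is not a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that parses alternating key/value arguments.
class KeyValuesSketch {

    /** Turns alternating key/value arguments into an insertion-ordered map. */
    static Map<String, Object> toMap(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Keys and values must come in pairs, got " +
                    keyValues.length + " arguments");
        }
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            Object key = keyValues[i];
            if (!(key instanceof String)) {
                throw new IllegalArgumentException(
                        "Key at position " + i + " must be a String");
            }
            // In this sketch a repeated key simply overwrites the earlier value
            map.put((String) key, keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(toMap("label", "book", "name", "java-3"));
        // prints {label=book, name=java-3}
    }
}
```

A dangling key with no value is the typical mistake with this calling style, which is why the length check comes first.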

Create relationships with the Vertex addEdge method:

    Vertex james = tx.addVertex(T.label, "author", "id", 1,
                                "name", "James Gosling",  "age", 62,
                                "lived", "San Francisco Bay Area");

    Vertex java = tx.addVertex(T.label, "language", "name", "java",
                               "versions", Arrays.asList(6, 7, 8));
    Vertex book1 = tx.addVertex(T.label, "book", "name", "java-1");
    Vertex book2 = tx.addVertex(T.label, "book", "name", "java-2");
    Vertex book3 = tx.addVertex(T.label, "book", "name", "java-3");

    james.addEdge("created", java);
    james.addEdge("authored", book1,
                  "contribution", "1990-1-1",
                  "comment", "it's a good book",
                  "comment", "it's a good book",
                  "comment", "it's a good book too");
    james.addEdge("authored", book2, "contribution", "2017-4-28");

    james.addEdge("write", book2, "time", "2017-4-28");
    james.addEdge("write", book3, "time", "2016-1-1");
    james.addEdge("write", book3, "time", "2017-4-28");	

After adding vertices and edges, the transaction must be committed.

4. testQuery: Testing Queries

Queries go mainly through GraphTraversal, which can be obtained via graph.traversal():

public static void testQuery(final HugeGraph graph) {
        // query all
        GraphTraversal<Vertex, Vertex> vertices = graph.traversal().V();
        int size = vertices.toList().size();
        assert size == 12;
        System.out.println(">>>> query all vertices: size=" + size);

        // query by label
        vertices = graph.traversal().V().hasLabel("person");
        size = vertices.toList().size();
        assert size == 5;
        System.out.println(">>>> query all persons: size=" + size);

        // query vertex by primary-values
        vertices = graph.traversal().V().hasLabel("author").has("id", 1);
        List<Vertex> vertexList = vertices.toList();
        assert vertexList.size() == 1;
        System.out.println(">>>> query vertices by primary-values: " +
                           vertexList);

        VertexLabel author = graph.schema().getVertexLabel("author");
        String authorId = String.format("%s:%s", author.id().asString(), "11");

        // query vertex by id and query out edges
        vertices = graph.traversal().V(authorId);
        GraphTraversal<Vertex, Edge> edgesOfVertex = vertices.outE("created");
        List<Edge> edgeList = edgesOfVertex.toList();
        assert edgeList.size() == 1;
        System.out.println(">>>> query edges of vertex: " + edgeList);

        vertices = graph.traversal().V(authorId);
        vertexList = vertices.out("created").toList();
        assert vertexList.size() == 1;
        System.out.println(">>>> query vertices of vertex: " + vertexList);

        // query edge by sort-values
        vertices = graph.traversal().V(authorId);
        edgesOfVertex = vertices.outE("write").has("time", "2017-4-28");
        edgeList = edgesOfVertex.toList();
        assert edgeList.size() == 2;
        System.out.println(">>>> query edges of vertex by sort-values: " +
                           edgeList);

        // query vertex by condition (filter by property name)
        ConditionQuery q = new ConditionQuery(HugeType.VERTEX);
        PropertyKey age = graph.propertyKey("age");
        q.key(HugeKeys.PROPERTIES, age.id());
        if (graph.backendStoreFeatures()
                 .supportsQueryWithContainsKey()) {
            Iterator<Vertex> iter = graph.vertices(q);
            assert iter.hasNext();
            System.out.println(">>>> queryVertices(age): " + iter.hasNext());
            while (iter.hasNext()) {
                System.out.println(">>>> queryVertices(age): " + iter.next());
            }
        }

        // query all edges
        GraphTraversal<Edge, Edge> edges = graph.traversal().E().limit(2);
        size = edges.toList().size();
        assert size == 2;
        System.out.println(">>>> query all edges with limit 2: size=" + size);

        // query edge by id
        EdgeLabel authored = graph.edgeLabel("authored");
        VertexLabel book = graph.schema().getVertexLabel("book");
        String book1Id = String.format("%s:%s", book.id().asString(), "java-1");
        String book2Id = String.format("%s:%s", book.id().asString(), "java-2");

        String edgeId = String.format("S%s>%s>%s>S%s",
                                      authorId, authored.id(), "", book2Id);
        edges = graph.traversal().E(edgeId);
        edgeList = edges.toList();
        assert edgeList.size() == 1;
        System.out.println(">>>> query edge by id: " + edgeList);

        Edge edge = edgeList.get(0);
        edges = graph.traversal().E(edge.id());
        edgeList = edges.toList();
        assert edgeList.size() == 1;
        System.out.println(">>>> query edge by id: " + edgeList);

        // query edge by condition
        q = new ConditionQuery(HugeType.EDGE);
        q.eq(HugeKeys.OWNER_VERTEX, IdGenerator.of(authorId));
        q.eq(HugeKeys.DIRECTION, Directions.OUT);
        q.eq(HugeKeys.LABEL, authored.id());
        q.eq(HugeKeys.SORT_VALUES, "");
        q.eq(HugeKeys.OTHER_VERTEX, IdGenerator.of(book1Id));

        Iterator<Edge> edges2 = graph.edges(q);
        assert edges2.hasNext();
        System.out.println(">>>> queryEdges(id-condition): " +
                           edges2.hasNext());
        while (edges2.hasNext()) {
            System.out.println(">>>> queryEdges(id-condition): " +
                               edges2.next());
        }

        // NOTE: query edge by has-key just supported by Cassandra
        if (graph.backendStoreFeatures().supportsQueryWithContainsKey()) {
            PropertyKey contribution = graph.propertyKey("contribution");
            q.key(HugeKeys.PROPERTIES, contribution.id());
            Iterator<Edge> edges3 = graph.edges(q);
            assert edges3.hasNext();
            System.out.println(">>>> queryEdges(contribution): " +
                               edges3.hasNext());
            while (edges3.hasNext()) {
                System.out.println(">>>> queryEdges(contribution): " +
                                   edges3.next());
            }
        }

        // query by vertex label
        vertices = graph.traversal().V().hasLabel("book");
        size = vertices.toList().size();
        assert size == 5;
        System.out.println(">>>> query all books: size=" + size);

        // query by vertex label and key-name
        vertices = graph.traversal().V().hasLabel("person").has("age");
        size = vertices.toList().size();
        assert size == 5;
        System.out.println(">>>> query all persons with age: size=" + size);

        // query by vertex props
        vertices = graph.traversal().V().hasLabel("person")
                        .has("city", "Taipei");
        vertexList = vertices.toList();
        assert vertexList.size() == 1;
        System.out.println(">>>> query all persons in Taipei: " + vertexList);

        vertices = graph.traversal().V().hasLabel("person").has("age", 19);
        vertexList = vertices.toList();
        assert vertexList.size() == 1;
        System.out.println(">>>> query all persons age==19: " + vertexList);

        vertices = graph.traversal().V().hasLabel("person")
                        .has("age", P.lt(19));
        vertexList = vertices.toList();
        assert vertexList.size() == 1;
        assert vertexList.get(0).property("age").value().equals(3);
        System.out.println(">>>> query all persons age<19: " + vertexList);

        String addr = "Bay Area";
        vertices = graph.traversal().V().hasLabel("author")
                        .has("lived", Text.contains(addr));
        vertexList = vertices.toList();
        assert vertexList.size() == 1;
        System.out.println(String.format(">>>> query all authors lived %s: %s",
                           addr, vertexList));
    }

Key points

Query entities with a given label:

    vertices = graph.traversal().V().hasLabel("person");
    size = vertices.toList().size();

Query entities by primary-key values:

    vertices = graph.traversal().V().hasLabel("author").has("id", 1);
    List<Vertex> vertexList = vertices.toList();

Query edges:

Query all edges (limited here to two results):

    GraphTraversal<Edge, Edge> edges = graph.traversal().E().limit(2);

Query an edge by id:

    EdgeLabel authored = graph.edgeLabel("authored");
    VertexLabel book = graph.schema().getVertexLabel("book");
    String book1Id = String.format("%s:%s", book.id().asString(), "java-1");
    String book2Id = String.format("%s:%s", book.id().asString(), "java-2");

    String edgeId = String.format("S%s>%s>%s>S%s",
                                  authorId, authored.id(), "", book2Id);
    edges = graph.traversal().E(edgeId);

Note that an edge id is built by concatenating several fields: String.format("S%s>%s>%s>S%s", authorId, authored.id(), "", book2Id)
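The two format strings used in the example can be tried in isolation. The sketch below reuses them verbatim; the helper names and the numeric label ids ("2", "5", "1") are made up for illustration, since in a real graph the label ids come from the schema.

```java
// Standalone illustration of the id formats used in the example code.
class IdSketch {

    /** Vertex id as built in the example: "<labelId>:<primaryValue>". */
    static String vertexId(String labelId, String primaryValue) {
        return String.format("%s:%s", labelId, primaryValue);
    }

    /** Edge id as built in the example: source, edge label, sort values, target. */
    static String edgeId(String sourceId, String edgeLabelId,
                         String sortValues, String targetId) {
        return String.format("S%s>%s>%s>S%s",
                             sourceId, edgeLabelId, sortValues, targetId);
    }

    public static void main(String[] args) {
        String authorId = vertexId("2", "11");     // assumed label id "2"
        String book2Id = vertexId("5", "java-2");  // assumed label id "5"
        // "authored" has no sort keys, so the sort-values field is empty
        System.out.println(edgeId(authorId, "1", "", book2Id));
        // prints "S2:11>1>>S5:java-2"
    }
}
```

The empty third field explains the doubled ">" in the result: an edge label without sort keys contributes an empty sort-values segment, whereas an edge such as "write" would carry its "time" value there.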

Query edges by condition:

 q = new ConditionQuery(HugeType.EDGE);
        q.eq(HugeKeys.OWNER_VERTEX, IdGenerator.of(authorId));
        q.eq(HugeKeys.DIRECTION, Directions.OUT);
        q.eq(HugeKeys.LABEL, authored.id());
        q.eq(HugeKeys.SORT_VALUES, "");
        q.eq(HugeKeys.OTHER_VERTEX, IdGenerator.of(book1Id));

        Iterator<Edge> edges2 = graph.edges(q);
        assert edges2.hasNext();
        System.out.println(">>>> queryEdges(id-condition): " +
                           edges2.hasNext());
        while (edges2.hasNext()) {
            System.out.println(">>>> queryEdges(id-condition): " +
                               edges2.next());
        }

The DIRECTION can be specified, for example Directions.OUT for outgoing edges as in the query above.

5. Deletion

To delete a vertex, call the vertex's own remove method:

       // remove vertex (and its edges)
        List<Vertex> vertices = graph.traversal().V().hasLabel("person")
                                     .has("age", 19).toList();
        assert vertices.size() == 1;
        Vertex james = vertices.get(0);
        Vertex book6 = graph.addVertex(T.label, "book", "name", "java-6");
        james.addEdge("look", book6, "timestamp", "2017-5-2 12:00:08.0");
        james.addEdge("look", book6, "timestamp", "2017-5-3 12:00:08.0");
        graph.tx().commit();
        assert graph.traversal().V(book6.id()).bothE().hasNext();
        System.out.println(">>>> removing vertex: " + james);
        james.remove();
        graph.tx().commit();
        assert !graph.traversal().V(james.id()).hasNext();
        assert !graph.traversal().V(book6.id()).bothE().hasNext();

    

Deleting a relationship works similarly:

    // remove edge
        VertexLabel author = graph.schema().getVertexLabel("author");
        String authorId = String.format("%s:%s", author.id().asString(), "11");
        EdgeLabel authored = graph.edgeLabel("authored");
        VertexLabel book = graph.schema().getVertexLabel("book");
        String book2Id = String.format("%s:%s", book.id().asString(), "java-2");

        String edgeId = String.format("S%s>%s>%s>S%s",
                                      authorId, authored.id(), "", book2Id);

        List<Edge> edges = graph.traversal().E(edgeId).toList();
        assert edges.size() == 1;
        Edge edge = edges.get(0);
        System.out.println(">>>> removing edge: " + edge);
        edge.remove();
        graph.tx().commit();
        assert !graph.traversal().E(edgeId).hasNext();

Summary

This article gave a first introduction to HugeGraph's design philosophy and basic usage.


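Author: Jadepeng
Source: jqpeng的技術記事本 (jqpeng's Tech Notebook) -- http://www.cnblogs.com/xiaoqi
Your support is the greatest encouragement for the author; thank you for reading carefully.
Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.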