Giraph Source Code Analysis (6): Edge Analysis

阿新 • Published: 2018-01-13


Original article by HamaWhite; please credit the source when reposting. Everyone is welcome to join the Giraph technical discussion group: 228591158

You are welcome to visit the Northwestern Polytechnical University - BigData and Knowledge Management Lab. Links: http://wowbigdata.cn/, http://wowbigdata.net.cn/, http://wowbigdata.com.cn.

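1. The Vertex class stores the graph in adjacency-list form. Each vertex holds a VertexId, a VertexValue, its OutgoingEdges, and a Halt flag. The boolean halt field records the state of the vertex: false means active, true means inactive. The relevant fields are: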

  /** Vertex id. */
  private I id;
  /** Vertex value. */
  private V value;
  /** Outgoing edges. */
  private OutEdges<I, E> edges;
  /** If true, do not do anymore computation on this vertex. */
  private boolean halt;
  /** Global graph state **/
  private GraphState<I, V, E, M> graphState;
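To make the adjacency-list layout and the meaning of the halt flag concrete, here is a minimal sketch. SketchVertex and OutEdge are hypothetical stand-ins written for this illustration, not the real org.apache.giraph classes; they use primitive types instead of Writable objects and keep only the fields discussed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a Giraph vertex (an assumption, not the real class). */
class SketchVertex {
    /** One adjacency-list entry: target vertex id plus edge value. */
    static final class OutEdge {
        final long targetId;
        final float value;

        OutEdge(long targetId, float value) {
            this.targetId = targetId;
            this.value = value;
        }
    }

    private final long id;
    private double value;
    /** Out-edges of this vertex, stored as an adjacency list. */
    private final List<OutEdge> edges = new ArrayList<>();
    /** false = active, true = inactive (halted). */
    private boolean halt = false;

    SketchVertex(long id, double value) {
        this.id = id;
        this.value = value;
    }

    long getId() { return id; }
    double getValue() { return value; }
    void addEdge(long targetId, float edgeValue) { edges.add(new OutEdge(targetId, edgeValue)); }
    int getNumEdges() { return edges.size(); }
    void voteToHalt() { halt = true; }
    void wakeUp() { halt = false; }
    boolean isHalted() { return halt; }

    public static void main(String[] args) {
        SketchVertex v = new SketchVertex(1L, 0.0);
        v.addEdge(2L, 0.4f);
        v.addEdge(3L, 7.8f);
        v.addEdge(5L, 6.4f);
        System.out.println("edges=" + v.getNumEdges() + " halted=" + v.isHalted());
        v.voteToHalt();   // the vertex becomes inactive
        System.out.println("halted=" + v.isHalted());
        v.wakeUp();       // e.g. when a message arrives in a later superstep
        System.out.println("halted=" + v.isHalted());
    }
}
```

A newly created vertex starts active (halt is false); it becomes inactive only after it votes to halt.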

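2. The org.apache.giraph.edge.Edge interface represents an edge of a vertex. Each edge has two properties, targetVertexId and edgeValue. The class diagram is as follows:

[Figure: class diagram of the Edge interface (image not available)]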

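By default Giraph stores edges with the DefaultEdge class, which has two fields: I targetVertexId and E value, where I is the vertex ID type and E is the edge value type. Note that DefaultEdge also implements the ReusableEdge<I,E> interface, whose definition carries the following Javadoc: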

A complete edge, the target vertex and the edge value. Can only be one edge with a destination vertex id per edge map. This edge can be reused, that is you can set it's target vertex ID and edge value. Note: this class is useful for certain optimizations, but it's not meant to be exposed to the user. Look at MutableEdge instead.

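As this description shows, an edge object can be reused: only the values of targetVertexId and value need to change. In other words, even when a vertex has many out-edges, a single DefaultEdge object is enough to represent each of them in turn.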
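The reuse idea can be sketched in a few lines. ReusableEdgeSketch is a hypothetical, simplified analogue written for this illustration; it is not Giraph's DefaultEdge, and it uses primitive fields where Giraph uses Writable types.

```java
/** Hypothetical, simplified analogue of a reusable edge (not Giraph's DefaultEdge). */
class ReusableEdgeSketch {
    private long targetVertexId;
    private float value;

    void setTargetVertexId(long targetVertexId) { this.targetVertexId = targetVertexId; }
    void setValue(float value) { this.value = value; }
    long getTargetVertexId() { return targetVertexId; }
    float getValue() { return value; }

    public static void main(String[] args) {
        long[] targets = {2L, 3L, 5L};
        float[] values = {0.4f, 7.8f, 6.4f};

        // Allocated once, then overwritten for every logical edge.
        ReusableEdgeSketch edge = new ReusableEdgeSketch();
        for (int i = 0; i < targets.length; i++) {
            edge.setTargetVertexId(targets[i]);
            edge.setValue(values[i]);
            System.out.println(edge.getTargetVertexId() + " -> " + edge.getValue());
        }
    }
}
```

Three logical edges are visited here, but only one edge object is ever allocated.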

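3. org.apache.giraph.edge.OutEdges<I,E> stores the out-edges of a vertex. As the Vertex definition shows, all edges of a vertex are kept in the edges field, whose type is OutEdges<I,E>. The class diagram of the OutEdges<I,E> interface is as follows:

[Figure: class diagram of the OutEdges<I,E> interface (image not available)]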

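By default Giraph uses ByteArrayEdges<I,E>, which stores all edges of a vertex in a byte[]. When a vertex sends messages along its out-edges, it has to iterate over the edges object in the Vertex class.

Example code: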

// Iterate over all edges. getEdges() returns the edges object of the Vertex,
// so this for loop calls that object's iterator() method, i.e. the
// iterator() method of the ByteArrayEdges class.
for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
    // edge represents one out-edge; by default it is a DefaultEdge.
    double distance = minDist + edge.getValue().get();
    sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
}

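Note: as the definition of DefaultEdge implies, iterating over getEdges() returns the same Edge object every time; only the values inside that object change.

The code below confirms this.

The iterator() method of the ByteArrayEdges class looks like this: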

  @Override
  public Iterator<Edge<I, E>> iterator() {
    return new ByteArrayEdgeIterator();
  }

It returns an instance of the inner class ByteArrayEdgeIterator, which is defined as follows:

  /**
   * Iterator that reuses the same Edge object.
   */
  private class ByteArrayEdgeIterator
      extends UnmodifiableIterator<Edge<I, E>> {
    // extendedDataInput wraps the bytes of all serialized edges.
    /** Input for processing the bytes */
    private ExtendedDataInput extendedDataInput =
        getConf().createExtendedDataInput(
            serializedEdges, 0, serializedEdgesBytesUsed);

    // Create a single Edge object; by default this is a DefaultEdge.
    /** Representative edge object. */
    private ReusableEdge<I, E> representativeEdge =
        getConf().createReusableEdge();

    @Override
    public boolean hasNext() {
      return serializedEdges != null && extendedDataInput.available() > 0;
    }

    @Override
    public Edge<I, E> next() {
      try {
        // Key point: for every edge, the data is read from extendedDataInput
        // into the same representativeEdge object. All out-edges of a vertex
        // therefore share one Edge object during iteration; only its
        // contents are overwritten.
        WritableUtils.readEdge(extendedDataInput, representativeEdge);
      } catch (IOException e) {
        throw new IllegalStateException("next: Failed on pos " +
            extendedDataInput.getPos() + " edge " + representativeEdge);
      }
      return representativeEdge;
    }
  }

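Summary: this optimization pays off when a vertex has a very large out-degree, because it saves a considerable amount of memory. In the UK-2005 dataset, for example, the maximum out-degree of a vertex is 5213.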
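The same technique can be reproduced without Giraph. The sketch below is a simplified stand-in, not Giraph's ByteArrayEdges: the class name ByteArrayEdgesSketch, the fixed 12-byte layout (an 8-byte long id followed by a 4-byte float value), and the use of java.io data streams in place of ExtendedDataInput are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Simplified stand-in for byte-array edge storage (not Giraph's ByteArrayEdges). */
class ByteArrayEdgesSketch implements Iterable<ByteArrayEdgesSketch.Edge> {
    /** Mutable edge holder that the iterator overwrites on every step. */
    static final class Edge {
        long targetVertexId;
        float value;
    }

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);

    /** Appends one edge as 12 bytes: 8 for the id, 4 for the value. */
    void add(long targetVertexId, float value) {
        try {
            out.writeLong(targetVertexId);
            out.writeFloat(value);
        } catch (IOException e) {
            throw new IllegalStateException("add: failed to serialize edge", e);
        }
    }

    int serializedSize() {
        return buffer.size();
    }

    @Override
    public Iterator<Edge> iterator() {
        // toByteArray() copies the buffer; acceptable for a sketch.
        final DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        final Edge representativeEdge = new Edge(); // one object per iterator
        return new Iterator<Edge>() {
            @Override
            public boolean hasNext() {
                try {
                    return in.available() > 0;
                } catch (IOException e) {
                    throw new IllegalStateException("hasNext: failed", e);
                }
            }

            @Override
            public Edge next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    // Overwrite the shared object instead of allocating a new one.
                    representativeEdge.targetVertexId = in.readLong();
                    representativeEdge.value = in.readFloat();
                } catch (IOException e) {
                    throw new IllegalStateException("next: failed to read edge", e);
                }
                return representativeEdge;
            }
        };
    }

    public static void main(String[] args) {
        ByteArrayEdgesSketch edges = new ByteArrayEdgesSketch();
        edges.add(2L, 0.4f);
        edges.add(3L, 7.8f);
        edges.add(5L, 6.4f);
        for (Edge e : edges) {
            System.out.println(e.targetVertexId + " " + e.value
                + " (object " + System.identityHashCode(e) + ")");
        }
    }
}
```

Running main prints the same identity hash code on every line, which shows that the loop body always sees one and the same Edge object. The saving is that a vertex with thousands of out-edges needs one flat byte array plus a single edge object per iteration, instead of one object per edge.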

Suppose vertex 1 has the out-edges <2, 0.4>, <3, 7.8>, and <5, 6.4>, each written as <target vertex ID, edge value>.

Consider the following code:

// A list intended to hold the IDs of the target vertices.
List<LongWritable> list = new ArrayList<LongWritable>();
for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
    list.add(edge.getTargetVertexId());
    System.out.println(list);
}

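The output is:

[2]

[3, 3]

[5, 5, 5]

This is not the expected [2, 3, 5]. Every element of the list is a reference to the same LongWritable object, which the iterator overwrites on each step.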
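The pitfall and one way around it can be reproduced in a self-contained sketch. MutableLong is a hypothetical stand-in for Hadoop's LongWritable, and the loop that calls reused.set(...) imitates what the reusing iterator does on each next(); neither is Giraph code.

```java
import java.util.ArrayList;
import java.util.List;

/** Demonstrates aliasing of a reused mutable id and the copy-based fix. */
class EdgeAliasingDemo {
    /** Hypothetical stand-in for Hadoop's mutable LongWritable. */
    static final class MutableLong {
        private long v;

        MutableLong() { }
        MutableLong(long v) { this.v = v; }
        long get() { return v; }
        void set(long v) { this.v = v; }

        @Override
        public String toString() { return Long.toString(v); }
    }

    static final long[] TARGETS = {2L, 3L, 5L};

    /** Stores the reused object itself: every list slot aliases one instance. */
    static List<MutableLong> collectAliased() {
        MutableLong reused = new MutableLong();
        List<MutableLong> list = new ArrayList<>();
        for (long t : TARGETS) {
            reused.set(t);      // imitates the iterator overwriting the shared id
            list.add(reused);   // adds the same reference again
        }
        return list;
    }

    /** Stores a fresh copy per edge, so later overwrites do not affect the list. */
    static List<MutableLong> collectCopied() {
        MutableLong reused = new MutableLong();
        List<MutableLong> list = new ArrayList<>();
        for (long t : TARGETS) {
            reused.set(t);
            list.add(new MutableLong(reused.get()));
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(collectAliased()); // prints [5, 5, 5]
        System.out.println(collectCopied());  // prints [2, 3, 5]
    }
}
```

In the Giraph loop above, the analogous change would be to add a copy, for example new LongWritable(edge.getTargetVertexId().get()), rather than the ID object returned by the edge.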

End.

This is my original work; please credit the source when reposting!

My QQ: 530422429. Corrections and discussion are welcome.
