Giraph Source Code Analysis (8): Counting the Vertices That Participate in Computation in Each Superstep

Author | Bai Song

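Purpose: in research work, we need to analyze how many vertices participate in computation during each iteration in order to further optimize the system. For example, the last line of the SSSP compute() method calls voteToHalt() on the current vertex, which puts it into the inactive state, so after each iteration completes every vertex is inactive. After the global synchronization barrier, vertices that received messages are activated again and their compute() method is called. This post shows how to count the vertices that participate in computation in each iteration. The SSSP compute() method is shown below: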

@Override
  public void compute(Iterable<DoubleWritable> messages) {
    if (getSuperstep() == 0) {
      setValue(new DoubleWritable(Double.MAX_VALUE));
    }
    double minDist = isSource() ? 0d : Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }
    if (minDist < getValue().get()) {
      setValue(new DoubleWritable(minDist));
      for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
        double distance = minDist + edge.getValue().get();
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
      }
    }
    // Put the vertex into the inactive state
    voteToHalt();
  }
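The lifecycle described above (every vertex votes to halt, and only vertices that receive messages wake up in the next superstep) can be sketched as a single-process toy model. This is not Giraph code: the class name `ToySsspSupersteps`, the edge-array encoding, and the choice of vertex 0 as the source are assumptions made for illustration, and the toy stops as soon as no messages are in flight rather than modeling Giraph's master/worker coordination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-process model of the Pregel vertex lifecycle (not Giraph code). */
final class ToySsspSupersteps {
  /**
   * Runs SSSP from vertex 0 and returns, for each superstep, the number of
   * vertices whose compute step ran.
   *
   * @param vertexCount number of vertices, with ids 0..vertexCount-1
   * @param edges rows of {source, target, weight}
   */
  static List<Integer> computedPerSuperstep(int vertexCount, int[][] edges) {
    double[] dist = new double[vertexCount];
    Arrays.fill(dist, Double.MAX_VALUE);
    boolean[] halted = new boolean[vertexCount]; // all vertices active in superstep 0
    Map<Integer, List<Double>> inbox = new HashMap<>();
    List<Integer> counts = new ArrayList<>();

    while (true) {
      Map<Integer, List<Double>> outbox = new HashMap<>();
      int computed = 0;
      for (int v = 0; v < vertexCount; v++) {
        List<Double> messages = inbox.getOrDefault(v, new ArrayList<>());
        if (halted[v] && !messages.isEmpty()) {
          halted[v] = false; // an incoming message wakes the vertex up
        }
        if (halted[v]) {
          continue; // inactive vertices are skipped and not counted
        }
        computed++;
        double minDist = (v == 0) ? 0d : Double.MAX_VALUE;
        for (double message : messages) {
          minDist = Math.min(minDist, message);
        }
        if (minDist < dist[v]) {
          dist[v] = minDist;
          for (int[] e : edges) {
            if (e[0] == v) {
              outbox.computeIfAbsent(e[1], k -> new ArrayList<>()).add(minDist + e[2]);
            }
          }
        }
        halted[v] = true; // corresponds to voteToHalt()
      }
      counts.add(computed);
      // Every vertex has voted to halt, so stop once no messages were sent.
      if (outbox.isEmpty()) {
        return counts;
      }
      inbox = outbox;
    }
  }

  public static void main(String[] args) {
    int[][] edges = {{0, 1, 1}, {0, 2, 4}, {1, 2, 1}, {2, 3, 1}};
    System.out.println(computedPerSuperstep(4, edges)); // prints [4, 2, 2, 1]
  }
}
```

On this made-up 4-vertex graph, all vertices run in superstep 0 and only message recipients run afterwards, which is the same pattern the per-superstep counter in this post is meant to expose.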

Note: in Giraph, an algorithm terminates when there are no active vertices and no messages are being passed between workers.

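In hama-0.6.0, the termination condition only checks whether any vertex is still active. That does not fully implement the Pregel model, so it is best seen as a partial implementation.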
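The difference between the two termination conditions can be made concrete with a small sketch. The class and method names here are hypothetical, and the parameters simply mirror the kind of per-superstep totals discussed in this post; this is not code taken from Giraph or Hama.

```java
/** Hypothetical helper contrasting the two halting checks described above. */
final class HaltCheckSketch {
  /** Halt only when every vertex has voted to halt AND no messages were sent. */
  static boolean haltWithMessageCheck(long vertexCount, long finishedVertexCount,
      long messagesSent) {
    return finishedVertexCount == vertexCount && messagesSent == 0;
  }

  /** Halt as soon as no vertex is active, ignoring messages in flight. */
  static boolean haltOnActiveOnly(long vertexCount, long finishedVertexCount) {
    return finishedVertexCount == vertexCount;
  }

  public static void main(String[] args) {
    // All 9 vertices voted to halt, but 3 messages are still in flight.
    System.out.println(haltWithMessageCheck(9, 9, 3)); // prints false
    System.out.println(haltOnActiveOnly(9, 9));        // prints true
  }
}
```

With SSSP, where every vertex votes to halt at the end of compute(), the second check would stop after the first superstep even though distance updates are still pending in messages.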

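The changes are as follows:

The org.apache.giraph.partition.PartitionStats class

Add a variable and methods to count the vertices that participate in computation in each Partition during each superstep. The new variable and methods are: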

/** computed vertices in this partition */
private long computedVertexCount=0;
 
/**
* Increment the computed vertex count by one.
*/
public void incrComputedVertexCount() {
    ++ computedVertexCount;
}
 
/**
 * @return the computedVertexCount
 */
public long getComputedVertexCount() {
	return computedVertexCount;
}

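Modify the readFields() and write() methods by appending one statement to the end of each. When a Partition finishes computing, it sends its computedVertexCount to the Master, which reads the values and aggregates them.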

@Override
public void readFields(DataInput input) throws IOException {
    partitionId = input.readInt();
    vertexCount = input.readLong();
    finishedVertexCount = input.readLong();
    edgeCount = input.readLong();
    messagesSentCount = input.readLong();
    // Added: read the new field last, in the same order as write()
    computedVertexCount = input.readLong();
}
 
@Override
public void write(DataOutput output) throws IOException {
    output.writeInt(partitionId);
    output.writeLong(vertexCount);
    output.writeLong(finishedVertexCount);
    output.writeLong(edgeCount);
    output.writeLong(messagesSentCount);
    // Added: write the new field last, in the same order as readFields()
    output.writeLong(computedVertexCount);
}
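Because these methods read and write raw fields with no tags, the new field has to appear at the same position in both. The following round-trip sketch shows the idea with the JDK's DataOutputStream and DataInputStream. `MiniPartitionStats` is a hypothetical two-field stand-in, not the Giraph class, and it does not implement Hadoop's Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical two-field stand-in for PartitionStats (not the Giraph class). */
final class MiniPartitionStats {
  int partitionId;
  long computedVertexCount;

  MiniPartitionStats() { }

  MiniPartitionStats(int partitionId, long computedVertexCount) {
    this.partitionId = partitionId;
    this.computedVertexCount = computedVertexCount;
  }

  void write(DataOutput output) throws IOException {
    output.writeInt(partitionId);
    output.writeLong(computedVertexCount);
  }

  void readFields(DataInput input) throws IOException {
    // Fields must be read in exactly the order write() emitted them.
    partitionId = input.readInt();
    computedVertexCount = input.readLong();
  }

  /** Serializes the given stats to bytes and reads them into a fresh object. */
  static MiniPartitionStats roundTrip(MiniPartitionStats original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      MiniPartitionStats copy = new MiniPartitionStats();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    MiniPartitionStats copy = roundTrip(new MiniPartitionStats(3, 4L));
    System.out.println(copy.partitionId + " " + copy.computedVertexCount); // prints 3 4
  }
}
```

If write() emitted the new field but readFields() did not consume it (or the order differed), the receiver would decode the wrong bytes for later fields, which is why both methods are changed together.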

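The org.apache.giraph.graph.GlobalStats class

Add a variable and a method to hold the total number of vertices that participate in computation in each superstep, covering all Partitions on every Worker.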

/**
 * Computed vertices across all partitions in this superstep.
 * Add by BaiSong
 */
private long computedVertexCount = 0;

/**
 * @return the computedVertexCount
 */
public long getComputedVertexCount() {
  return computedVertexCount;
}

Modify the addPartitionStats(PartitionStats partitionStats) method so that it also accumulates computedVertexCount.

/**
  * Add the stats of a partition to the global stats.
  *
  * @param partitionStats Partition stats to be added.
  */
  public void addPartitionStats(PartitionStats partitionStats) {
    this.vertexCount += partitionStats.getVertexCount();
    this.finishedVertexCount += partitionStats.getFinishedVertexCount();
    this.edgeCount += partitionStats.getEdgeCount();
    // Add by BaiSong: accumulate the new per-partition count
    this.computedVertexCount += partitionStats.getComputedVertexCount();
 }
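The aggregation step amounts to a running sum over the per-partition counts. A minimal sketch, using a hypothetical `GlobalCountSketch` class and made-up partition counts rather than the real GlobalStats and PartitionStats types:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the master-side total (not the Giraph GlobalStats class). */
final class GlobalCountSketch {
  private long computedVertexCount = 0;

  /** Adds one partition's computed vertex count to the running total. */
  void addPartitionCount(long partitionComputedVertexCount) {
    computedVertexCount += partitionComputedVertexCount;
  }

  long getComputedVertexCount() {
    return computedVertexCount;
  }

  /** Sums the counts reported by all partitions for one superstep. */
  static long total(List<Long> perPartitionCounts) {
    GlobalCountSketch global = new GlobalCountSketch();
    for (long count : perPartitionCounts) {
      global.addPartitionCount(count);
    }
    return global.getComputedVertexCount();
  }

  public static void main(String[] args) {
    // Three made-up partition counts for a single superstep.
    System.out.println(total(Arrays.asList(2L, 1L, 1L))); // prints 4
  }
}
```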

To make debugging easier, you can also (optionally) modify the class's toString() method. The modified version looks like this:

public String toString() {
		return "(vtx=" + vertexCount + ", computedVertexCount="
				+ computedVertexCount + ",finVtx=" + finishedVertexCount
				+ ",edges=" + edgeCount + ",msgCount=" + messageCount
				+ ",haltComputation=" + haltComputation + ")";
	}

org.apache.giraph.graph.ComputeCallable<I,V,E,M>

Add the counting logic here: in the computePartition() method, insert the statement marked below.

if (!vertex.isHalted()) {
        context.progress();
        TimerContext computeOneTimerContext = computeOneTimer.time();
        try {
            vertex.compute(messages);
            // Added: once the vertex's compute() has run, increment this Partition's computedVertexCount
            partitionStats.incrComputedVertexCount();
        } finally {
           computeOneTimerContext.stop();
        }
……

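Add the Counters statistics. This is similar to my earlier post "Giraph Source Code Analysis (7): Adding Message Statistics", so it is not described in detail here. The new class is org.apache.giraph.counters.GiraphComputedVertex, and its source code is shown below: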

package org.apache.giraph.counters;
 
import java.util.Iterator;
import java.util.Map;
 
import org.apache.hadoop.mapreduce.Mapper.Context;
import com.google.common.collect.Maps;
 
/**
 * Hadoop Counters in group "Giraph Computed Vertex" for counting the
 * computed vertices in every superstep.
 */
public class GiraphComputedVertex extends HadoopCountersBase {
	/** Counter group name for the Giraph computed vertex counters */
	public static final String GROUP_NAME = "Giraph Computed Vertex";
 
	/** Singleton instance for everyone to use */
	private static GiraphComputedVertex INSTANCE;
 
	/** Computed vertex counter for each superstep */
	private final Map<Long, GiraphHadoopCounter> superstepVertexCount;
 
	private GiraphComputedVertex(Context context) {
		super(context, GROUP_NAME);
		superstepVertexCount = Maps.newHashMap();
	}
 
	/**
	 * Instantiate with Hadoop Context.
	 * 
	 * @param context
	 *            Hadoop Context to use.
	 */
	public static void init(Context context) {
		INSTANCE = new GiraphComputedVertex(context);
	}
 
	/**
	 * Get singleton instance.
	 * 
	 * @return singleton GiraphComputedVertex instance.
	 */
	public static GiraphComputedVertex getInstance() {
		return INSTANCE;
	}
 
	/**
	 * Get the computed vertex counter for a superstep, creating it on first use.
	 * 
	 * @param superstep superstep number
	 * @return the counter for that superstep
	 */
	public GiraphHadoopCounter getSuperstepVertexCount(long superstep) {
		GiraphHadoopCounter counter = superstepVertexCount.get(superstep);
		if (counter == null) {
			String counterPrefix = "Superstep: " + superstep+" ";
			counter = getCounter(counterPrefix);
			superstepVertexCount.put(superstep, counter);
		}
		return counter;
	}
 
	@Override
	public Iterator<GiraphHadoopCounter> iterator() {
		return superstepVertexCount.values().iterator();
	}
}
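The core of this class is a map from superstep number to a counter that is created lazily on first access. The pattern can be tried without Hadoop using the sketch below, which swaps in `java.util.concurrent.atomic.AtomicLong` for GiraphHadoopCounter. The class name `PerSuperstepCounters` is hypothetical and this is not the class above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for the counter class, using AtomicLong in place of Hadoop counters. */
final class PerSuperstepCounters {
  private final Map<Long, AtomicLong> superstepVertexCount = new HashMap<>();

  /** Returns the counter for a superstep, creating it on first use. */
  AtomicLong getSuperstepVertexCount(long superstep) {
    return superstepVertexCount.computeIfAbsent(superstep, s -> new AtomicLong());
  }

  public static void main(String[] args) {
    PerSuperstepCounters counters = new PerSuperstepCounters();
    counters.getSuperstepVertexCount(0).addAndGet(9);
    counters.getSuperstepVertexCount(1).addAndGet(4);
    System.out.println(counters.getSuperstepVertexCount(0).get()); // prints 9
    System.out.println(counters.getSuperstepVertexCount(1).get()); // prints 4
  }
}
```

Repeated lookups for the same superstep return the same counter object, so increments from different call sites accumulate in one place.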

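Experimental results: after running the program, the total number of vertices that participate in computation in each iteration is printed to the terminal. The test runs SSSP (the SimpleShortestPathsVertex class) on an input graph with 9 vertices and 12 edges. The output is as follows:

The test in the figure above ran 6 iterations. The red box shows the number of vertices that participated in computation in each iteration, in order: 9, 4, 4, 3, 4, 0.

Explanation: in superstep 0 every vertex is active, so all 9 vertices participate in computation. In superstep 5, 0 vertices participate, so no messages are sent; since every vertex is also inactive, the algorithm terminates.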
