
Understanding the Aggregate Resource Allocation Metric in the YARN Application Monitoring UI

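In YARN's built-in application monitoring UI we often see the Aggregate Resource Allocation metric (the highlighted part of the screenshot). It is not a per-second rate: it is the cumulative amount of memory and CPU allocated to the application's containers over their lifetime, expressed in MB-seconds and vcore-seconds: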

Aggregate Resource Allocation is computed in the org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt class. The main logic is as follows:

  // How long the computed usage figures are cached: 3 seconds
  private static final long MEM_AGGREGATE_ALLOCATION_CACHE_MSECS = 3000;
  // Time of the last update, and the memory-seconds and vcore-seconds
  // computed at that update
  protected long lastMemoryAggregateAllocationUpdateTime = 0;
  private long lastMemorySeconds = 0;
  private long lastVcoreSeconds = 0;

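  /**
   * Returns the aggregate memory-seconds and vcore-seconds accumulated so far
   * by all live containers associated with this application attempt.
   * @return the aggregate resource usage of the running containers
   */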
  synchronized AggregateAppResourceUsage getRunningAggregateAppResourceUsage() {
    long currentTimeMillis = System.currentTimeMillis();
    // Don't walk the whole container list if the resources were computed
    // recently.
    // Recompute only if: current time - last update time > cache interval (3 seconds)
    if ((currentTimeMillis - lastMemoryAggregateAllocationUpdateTime)
        > MEM_AGGREGATE_ALLOCATION_CACHE_MSECS) {
      long memorySeconds = 0;
      long vcoreSeconds = 0;
      // Walk every live container and accumulate its resource-time (memory, CPU)
      for (RMContainer rmContainer : this.liveContainers.values()) {
        // How long the container has existed so far, in milliseconds
        long usedMillis = currentTimeMillis - rmContainer.getCreationTime();
        // Resources allocated to the container (memory in MB, virtual cores)
        Resource resource = rmContainer.getContainer().getResource();
        // Add allocated memory x elapsed seconds and vcores x elapsed seconds
        memorySeconds += resource.getMemory() * usedMillis /
            DateUtils.MILLIS_PER_SECOND;
        vcoreSeconds += resource.getVirtualCores() * usedMillis
            / DateUtils.MILLIS_PER_SECOND;
      }
      }
      
      // Record the update time and the memory-seconds and vcore-seconds just computed
      lastMemoryAggregateAllocationUpdateTime = currentTimeMillis;
      lastMemorySeconds = memorySeconds;
      lastVcoreSeconds = vcoreSeconds;
    }
    return new AggregateAppResourceUsage(lastMemorySeconds, lastVcoreSeconds);
  }
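The arithmetic in the loop above can be tried outside YARN. The following is a minimal sketch, not Hadoop code: `AggregateUsageSketch` and `ContainerSample` are hypothetical names, and the container values are made up for illustration. It applies the same formula (allocated amount multiplied by elapsed milliseconds, divided by 1000) to two containers.

```java
import java.util.Arrays;
import java.util.List;

final class AggregateUsageSketch {
    static final long MILLIS_PER_SECOND = 1000L;

    // Hypothetical stand-in for a live container (not a YARN class):
    // allocated memory in MB, allocated vcores, creation time in epoch millis.
    static final class ContainerSample {
        final long memoryMb;
        final long vcores;
        final long creationTimeMillis;

        ContainerSample(long memoryMb, long vcores, long creationTimeMillis) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.creationTimeMillis = creationTimeMillis;
        }
    }

    // Returns {memorySeconds, vcoreSeconds} summed over all containers,
    // using the same integer arithmetic as the loop shown above.
    static long[] aggregate(List<ContainerSample> containers, long nowMillis) {
        long memorySeconds = 0;
        long vcoreSeconds = 0;
        for (ContainerSample c : containers) {
            long usedMillis = nowMillis - c.creationTimeMillis;
            memorySeconds += c.memoryMb * usedMillis / MILLIS_PER_SECOND;
            vcoreSeconds += c.vcores * usedMillis / MILLIS_PER_SECOND;
        }
        return new long[] {memorySeconds, vcoreSeconds};
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<ContainerSample> containers = Arrays.asList(
            new ContainerSample(2048, 1, now - 60_000L),   // alive for 60 s
            new ContainerSample(4096, 2, now - 30_000L));  // alive for 30 s
        long[] usage = aggregate(containers, now);
        // 2048*60 + 4096*30 = 245760 MB-seconds; 1*60 + 2*30 = 120 vcore-seconds
        System.out.println(usage[0] + " MB-seconds, " + usage[1] + " vcore-seconds");
    }
}
```

The result keeps growing for as long as the containers stay alive, which is why the figure shown in the UI is a running total rather than a rate.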

  /**
   * Returns the resource usage report for this application attempt.
   * @return the resource usage report
   */
  public synchronized ApplicationResourceUsageReport getResourceUsageReport() {
    AggregateAppResourceUsage resUsage = getRunningAggregateAppResourceUsage();
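    // The report contains: the number of live containers, the number of
    // reserved containers, the resources currently consumed, the resources
    // currently reserved, the total resources needed (consumed + reserved),
    // and the aggregate memory-seconds and vcore-seconds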
    return ApplicationResourceUsageReport.newInstance(liveContainers.size(),
               reservedContainers.size(), Resources.clone(currentConsumption),
               Resources.clone(currentReservation),
               Resources.add(currentConsumption, currentReservation),
               resUsage.getMemorySeconds(), resUsage.getVcoreSeconds());
  }

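getResourceUsageReport is declared with the synchronized keyword, and it is called from the getAppResourceUsageReport method of the org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler class. Because it is an instance method, synchronized acts as an object lock on the SchedulerApplicationAttempt instance: when several threads request the usage report of the same attempt at the same moment, only one of them at a time can refresh the cached usage values, so no concurrent-update problem occurs.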
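The combination of a 3-second cache and an object lock can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the Hadoop implementation: `CachedUsageSketch` is a hypothetical class, the clock is injected so the demo needs no sleeping, and the `expensive` supplier merely stands in for walking the container list.

```java
import java.util.function.LongSupplier;

final class CachedUsageSketch {
    static final long CACHE_MSECS = 3000L;

    private final LongSupplier clock;      // injected time source (assumption, for testability)
    private final LongSupplier expensive;  // stands in for iterating the live containers
    private long lastUpdateTime = 0;
    private long lastValue = 0;
    private int recomputations = 0;

    CachedUsageSketch(LongSupplier clock, LongSupplier expensive) {
        this.clock = clock;
        this.expensive = expensive;
    }

    // synchronized on an instance method locks this object, so the check and
    // the update of the cached fields happen as one step per caller.
    synchronized long get() {
        long now = clock.getAsLong();
        if (now - lastUpdateTime > CACHE_MSECS) {
            lastValue = expensive.getAsLong();
            lastUpdateTime = now;
            recomputations++;
        }
        return lastValue;
    }

    synchronized int recomputations() {
        return recomputations;
    }

    public static void main(String[] args) {
        long[] now = {10_000L};
        long[] source = {100L};
        CachedUsageSketch cache =
            new CachedUsageSketch(() -> now[0], () -> source[0]);

        System.out.println(cache.get());  // first call recomputes: 100
        source[0] = 200L;
        now[0] += 1_000L;
        System.out.println(cache.get());  // 1 s later, still cached: 100
        now[0] += 3_000L;
        System.out.println(cache.get());  // 4 s after the last update: 200
    }
}
```

One consequence of this design is that a caller may see figures that are up to about 3 seconds old, in exchange for not iterating over every container on each request.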

  @Override
  public ApplicationResourceUsageReport getAppResourceUsageReport(
      ApplicationAttemptId appAttemptId) {
    SchedulerApplicationAttempt attempt = getApplicationAttempt(appAttemptId);
    if (attempt == null) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Request for appInfo of unknown attempt " + appAttemptId);
      }
      return null;
    }
    return attempt.getResourceUsageRe