Source Code Analysis of Hadoop Job Submission on the Client Side

阿新 • Published: 2019-01-11

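An earlier post analyzed how hadoop runs a jar (http://blog.csdn.net/a822631129/article/details/50310903) and followed the flow up to the point where the user's MapReduce program starts executing. This post picks up from there and analyzes how the Hadoop client submits a job from inside that MapReduce program.
Five Java class files are mainly involved:
Package org.apache.hadoop.mapreduce, under hadoop-mapreduce-client-core:
Job.java, JobSubmitter.java
Package org.apache.hadoop.mapred, under hadoop-mapreduce-client-jobclient:
YARNRunner.java, ResourceMgrDelegate.java
Package org.apache.hadoop.yarn.client.api.impl, under hadoop-yarn-project:
YarnClientImpl.java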

1. The Hadoop WordCount program:

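import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {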
  
    public static class WordCountMap extends  
            Mapper<LongWritable, Text, Text, IntWritable> {  
  
        private final IntWritable one = new IntWritable(1);  
        private Text word = new Text();  
  
        public void map(LongWritable key, Text value, Context context)  
                throws IOException, InterruptedException {  
            String line = value.toString();  
            StringTokenizer token = new StringTokenizer(line);  
            while (token.hasMoreTokens()) {  
                word.set(token.nextToken());  
                context.write(word, one);  
            }  
        }  
    }  
  
    public static class WordCountReduce extends  
            Reducer<Text, IntWritable, Text, IntWritable> {  
  
        public void reduce(Text key, Iterable<IntWritable> values,  
                Context context) throws IOException, InterruptedException {  
            int sum = 0;  
            for (IntWritable val : values) {  
                sum += val.get();  
            }  
            context.write(key, new IntWritable(sum));  
        }  
    }  
  
    public static void main(String[] args) throws Exception {  
        Configuration conf = new Configuration();  
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);  
        job.setJobName("wordcount");  
  
        job.setOutputKeyClass(Text.class);  
        job.setOutputValueClass(IntWritable.class);  
  
        job.setMapperClass(WordCountMap.class);  
        job.setReducerClass(WordCountReduce.class);  
  
        job.setInputFormatClass(TextInputFormat.class);  
        job.setOutputFormatClass(TextOutputFormat.class);  
  
        FileInputFormat.addInputPath(job, new Path(args[0]));  
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  
  
        // Propagate the job outcome as the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }  
}

2. The driver program submits the job by calling waitForCompletion() in Job

/**
   * Submit the job to the cluster and wait for it to finish.
   * @param verbose print the progress to the user
   * @return true if the job succeeded
   * @throws IOException thrown if the communication with the 
   *         <code>JobTracker</code> is lost
   */
  public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
    if (state == JobState.DEFINE) {
      submit();
    }
    if (verbose) {
      monitorAndPrintJob();
    } else {
      // get the completion poll interval from the client.
      int completionPollIntervalMillis = 
        Job.getCompletionPollInterval(cluster.getConf());
      while (!isComplete()) {
        try {
          Thread.sleep(completionPollIntervalMillis);
        } catch (InterruptedException ie) {
        }
      }
    }
    return isSuccessful();
  }

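The state field is initialized to JobState.DEFINE, so the check state == JobState.DEFINE passes and submit() is called to submit the job; submit() is analyzed in the next step.
When verbose is true, monitorAndPrintJob() monitors the job and prints progress information (not analyzed in detail here). When verbose is false, the method loops on its own, polling at a fixed interval to check whether the submitted job has completed. Once the job completes, the loop exits and isSuccessful() returns the final outcome.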
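The non-verbose branch is a plain poll-and-sleep loop. A minimal sketch of that pattern in plain Java, with no Hadoop dependency: CompletionPoller and awaitCompletion are hypothetical names, the BooleanSupplier stands in for isComplete(), and unlike the Hadoop code, which swallows InterruptedException, this sketch restores the interrupt flag and stops waiting.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the non-verbose branch of waitForCompletion(). */
final class CompletionPoller {

    /**
     * Blocks until isComplete reports true, sleeping pollIntervalMillis
     * between checks, and returns how many times it had to sleep.
     */
    static int awaitCompletion(BooleanSupplier isComplete, long pollIntervalMillis) {
        int sleeps = 0;
        while (!isComplete.getAsBoolean()) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException ie) {
                // Assumption of this sketch: give up instead of ignoring the interrupt.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
            sleeps++;
        }
        return sleeps;
    }

    public static void main(String[] args) {
        // Fake job that reports completion on the third status check.
        int[] checks = {0};
        int sleeps = awaitCompletion(() -> ++checks[0] >= 3, 10L);
        System.out.println("slept " + sleeps + " times before completion");
    }
}
```

The poll interval trades responsiveness against the number of status calls made to the cluster, which is why Hadoop reads it from the client configuration.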

3. The submit() function called from waitForCompletion()

  /**
   * Submit the job to the cluster and return immediately.
   * @throws IOException
   */
  public void submit() throws IOException, InterruptedException, ClassNotFoundException {
    ensureState(JobState.DEFINE);
    setUseNewAPI();
    connect();
    final JobSubmitter submitter = getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
    state = JobState.RUNNING;
    LOG.info("The url to track the job: " + getTrackingURL());
   }

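ensureState(JobState.DEFINE) verifies the job state;
setUseNewAPI() decides which API the job uses, based on settings such as mapred.input.format.class, mapred.partitioner.class and mapred.output.format.class; the Hadoop 2 API is the default;
connect() obtains the required client protocol (ClientProtocol) and connection information and stores them in the Cluster object;
submitJobInternal() in the JobSubmitter class is then called, which the next step analyzes in detail;
finally state is set to RUNNING.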
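The DEFINE to RUNNING transition guarded by ensureState() can be sketched as a tiny state machine. This is a simplified model, not the Hadoop class: JobLifecycleSketch is a hypothetical name, and the connect and submitJobInternal steps are reduced to a comment.

```java
/** Hypothetical miniature of the DEFINE -> RUNNING lifecycle described above. */
final class JobLifecycleSketch {

    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;

    /** Rejects any call made while the job is in an unexpected state. */
    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }

    void submit() {
        ensureState(JobState.DEFINE);
        // connect() and submitJobInternal() would run here in the real class.
        state = JobState.RUNNING;
    }

    JobState getState() {
        return state;
    }

    public static void main(String[] args) {
        JobLifecycleSketch job = new JobLifecycleSketch();
        job.submit();
        System.out.println("state after submit: " + job.getState());
        try {
            job.submit(); // a second submit is rejected by the guard
        } catch (IllegalStateException e) {
            System.out.println("second submit rejected: " + e.getMessage());
        }
    }
}
```

The guard is what makes it safe for waitForCompletion() to call submit() only when the job is still in DEFINE.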

4. The submitJobInternal() function in the JobSubmitter class

 /**
   * Internal method for submitting jobs to the system.
   * 
   * <p>The job submission process involves:
   * <ol>
   *   <li>
   *   Checking the input and output specifications of the job.
   *   </li>
   *   <li>
   *   Computing the {@link InputSplit}s for the job.
   *   </li>
   *   <li>
   *   Setup the requisite accounting information for the 
   *   {@link DistributedCache} of the job, if necessary.
   *   </li>
   *   <li>
   *   Copying the job's jar and configuration to the map-reduce system
   *   directory on the distributed file-system. 
   *   </li>
   *   <li>
   *   Submitting the job to the <code>JobTracker</code> and optionally
   *   monitoring it's status.
   *   </li>
   * </ol></p>
   * @param job the configuration to submit
   * @param cluster the handle to the Cluster
   * @throws ClassNotFoundException
   * @throws InterruptedException
   * @throws IOException
   */
  JobStatus submitJobInternal(Job job, Cluster cluster) 
  throws ClassNotFoundException, InterruptedException, IOException {

    //validate the jobs output specs 
    checkSpecs(job);

    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);

    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    //configure the command line options correctly on the submitting dfs
    InetAddress ip = InetAddress.getLocalHost();
    if (ip != null) {
      submitHostAddress = ip.getHostAddress();
      submitHostName = ip.getHostName();
      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
    }
    JobID jobId = submitClient.getNewJobID();
    job.setJobID(jobId);
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers", 
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
          + " as the submit dir");
      // get delegation token for the dir
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);
      
      populateTokenCache(conf, job.getCredentials());

      // generate a secret to authenticate shuffle transfers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }

      copyAndConfigureFiles(job, submitJobDir);
      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
      
      // Create the splits for the job
      LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
      int maps = writeSplits(job, submitJobDir);
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      LOG.info("number of splits:" + maps);

      // write "queue admins of the queue to which job is being submitted"
      // to job file.
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

      // removing jobtoken referrals before copying the jobconf to HDFS
      // as the tasks don't need this setting, actually they may break
      // because of it if present as the referral will point to a
      // different job.
      TokenCache.cleanUpTokenReferral(conf);

      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // Add HDFS tracking ids
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }

      // Write job file to submit dir
      writeConf(conf, submitJobFile);
      
      //
      // Now, actually submit the job (using the submit name)
      //
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());
      if (status != null) {
        return status;
      } else {
        throw new IOException("Could not launch job");
      }
    } finally {
      if (status == null) {
        LOG.info("Cleaning up the staging area " + submitJobDir);
        if (jtFs != null && submitJobDir != null)
          jtFs.delete(submitJobDir, true);

      }
    }
  }

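Checks the output specification, reads the configuration and the address of the submitting host, obtains the jobId, determines the job submit directory, and sets a number of parameters.
Generates a secret key used to authenticate shuffle transfers.
Copies the required files, libjars, archives and the job jar (the jar containing the wordcount program).
Computes the input splits for the job and records their count, which determines the number of map tasks.
Looks up the queue the job is submitted to and writes that queue's administrator ACL into the configuration.
Writes the configuration to the job submit directory.
Calls submitJob() in the YARNRunner class to submit the job, passing the JobID, the job submit directory and the Credentials.
Waits for submitJob() to return the job status. If no status comes back (the submission failed), the finally block deletes the job submit directory; on success the directory is kept.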
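The try/finally at the end of submitJobInternal() removes the submit directory only when no status was returned. A minimal sketch of that cleanup-on-failure pattern against the local file system: StagingCleanupSketch, newStagingDir and the Function standing in for submitClient.submitJob() are hypothetical, and the sketch throws unchecked exceptions where Hadoop throws IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Function;
import java.util.stream.Stream;

final class StagingCleanupSketch {

    /** Creates a fresh parent temp dir and returns a not-yet-created job dir inside it. */
    static Path newStagingDir() {
        try {
            return Files.createTempDirectory("staging").resolve("job_0001");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stages a dummy job file, submits, and cleans up only if no status came back. */
    static String submitWithCleanup(Path submitJobDir, Function<Path, String> submitClient) {
        String status = null;
        try {
            Files.createDirectories(submitJobDir);
            Files.write(submitJobDir.resolve("job.xml"),
                "<configuration/>".getBytes(StandardCharsets.UTF_8));
            status = submitClient.apply(submitJobDir);
            if (status == null) {
                throw new IllegalStateException("Could not launch job");
            }
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (status == null) {
                deleteRecursively(submitJobDir);
            }
        }
    }

    /** Deletes dir and everything below it, children before parents. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path ok = newStagingDir();
        String status = submitWithCleanup(ok, dir -> "RUNNING");
        System.out.println(status + ", dir kept: " + Files.exists(ok));
        deleteRecursively(ok.getParent());

        Path bad = newStagingDir();
        try {
            submitWithCleanup(bad, dir -> null); // simulated failed submission
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", dir kept: " + Files.exists(bad));
        }
        deleteRecursively(bad.getParent());
    }
}
```

Keeping the directory on success matters because the job jar, splits and configuration staged there are still needed after the client returns.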

5. The submitJob() function in the YARNRunner class

public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
  throws IOException, InterruptedException {
    
    addHistoryToken(ts);
    
    // Construct necessary information to start the MR AM
    ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

    // Submit to ResourceManager
    try {
      ApplicationId applicationId =
          resMgrDelegate.submitApplication(appContext);

      ApplicationReport appMaster = resMgrDelegate
          .getApplicationReport(applicationId);
      String diagnostics =
          (appMaster == null ?
              "application report is null" : appMaster.getDiagnostics());
      if (appMaster == null
          || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
          || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
        throw new IOException("Failed to run job : " +
            diagnostics);
      }
      return clientCache.getClient(jobId).getJobStatus(jobId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }

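It first builds the application submission context, which includes the memory and CPU required by MRAppMaster, the job jar, the job configuration, the input splits, and the command that launches MRAppMaster.
It then calls submitApplication() on ResourceMgrDelegate, passing the context, to submit the application (the job) to the ResourceManager. The call returns the ApplicationId, which already exists by then (it was in fact generated when the JobID was created).
If the application report is null, or its state is FAILED or KILLED, an IOException carrying the diagnostics is thrown.
Otherwise it returns the job's current status.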
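The check that submitJob() runs on the application report can be isolated as a small predicate. In this sketch AppReportCheckSketch, Report and checkLaunched are hypothetical names, the enum mirrors the values of YarnApplicationState, and an unchecked exception replaces the IOException thrown by YARNRunner.

```java
/** Hypothetical model of the report check performed after submitApplication(). */
final class AppReportCheckSketch {

    /** Same value names as YarnApplicationState. */
    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    static final class Report {
        final AppState state;
        final String diagnostics;

        Report(AppState state, String diagnostics) {
            this.state = state;
            this.diagnostics = diagnostics;
        }
    }

    /** Throws if the report is missing or shows that the application already died. */
    static void checkLaunched(Report report) {
        String diagnostics =
            (report == null) ? "application report is null" : report.diagnostics;
        if (report == null
                || report.state == AppState.FAILED
                || report.state == AppState.KILLED) {
            throw new IllegalStateException("Failed to run job : " + diagnostics);
        }
    }

    public static void main(String[] args) {
        checkLaunched(new Report(AppState.ACCEPTED, ""));
        System.out.println("ACCEPTED passes the check");
        try {
            checkLaunched(new Report(AppState.FAILED, "AM container exited"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Only the terminal failure states are rejected here; an application that is still ACCEPTED or RUNNING is treated as launched.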

6. The submitApplication() function in the ResourceMgrDelegate class:

  public ApplicationId
      submitApplication(ApplicationSubmissionContext appContext)
          throws YarnException, IOException {
    return client.submitApplication(appContext);
  }

In client.submitApplication(appContext), the client object is a YarnClient, so the call lands in the submitApplication() method of its implementation, YarnClientImpl.

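The submitApplication() function in YarnClientImpl:
Reads the ApplicationId from the context and throws if it is missing.
Builds the submit-application request and puts the context into it.
Adds security credentials where needed (the timeline delegation token, when both security and the timeline service are enabled).
rmClient.submitApplication uses Hadoop RPC to invoke submitApplication() in the ClientRMService class on the ResourceManager side.
Polls the application state at a fixed interval. Submission counts as successful once the state is no longer NEW or NEW_SAVING; if that does not happen within the time limit, a timeout error is raised. If the application cannot be found (ApplicationNotFoundException), the application is submitted again via RPC.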

  public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
      throws YarnException, IOException {
    ApplicationId applicationId = appContext.getApplicationId();
    if (applicationId == null) {
      throw new ApplicationIdNotProvidedException(
          "ApplicationId is not provided in ApplicationSubmissionContext");
    }
    SubmitApplicationRequest request =
        Records.newRecord(SubmitApplicationRequest.class);
    request.setApplicationSubmissionContext(appContext);

    // Automatically add the timeline DT into the CLC
    // Only when the security and the timeline service are both enabled
    if (isSecurityEnabled() && timelineServiceEnabled) {
      addTimelineDelegationToken(appContext.getAMContainerSpec());
    }

    //TODO: YARN-1763:Handle RM failovers during the submitApplication call.
    rmClient.submitApplication(request);

    int pollCount = 0;
    long startTime = System.currentTimeMillis();

    while (true) {
      try {
        YarnApplicationState state =
            getApplicationReport(applicationId).getYarnApplicationState();
        if (!state.equals(YarnApplicationState.NEW) &&
            !state.equals(YarnApplicationState.NEW_SAVING)) {
          LOG.info("Submitted application " + applicationId);
          break;
        }

        long elapsedMillis = System.currentTimeMillis() - startTime;
        if (enforceAsyncAPITimeout() &&
            elapsedMillis >= asyncApiPollTimeoutMillis) {
          throw new YarnException("Timed out while waiting for application " +
              applicationId + " to be submitted successfully");
        }

        // Notify the client through the log every 10 poll, in case the client
        // is blocked here too long.
        if (++pollCount % 10 == 0) {
          LOG.info("Application submission is not finished, " +
              "submitted application " + applicationId +
              " is still in " + state);
        }
        try {
          Thread.sleep(submitPollIntervalMillis);
        } catch (InterruptedException ie) {
          LOG.error("Interrupted while waiting for application "
              + applicationId
              + " to be successfully submitted.");
        }
      } catch (ApplicationNotFoundException ex) {
        // FailOver or RM restart happens before RMStateStore saves
        // ApplicationState
        LOG.info("Re-submit application " + applicationId + "with the " +
            "same ApplicationSubmissionContext");
        rmClient.submitApplication(request);
      }
    }
    return applicationId;
  }
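The control flow of that loop (accept once the state leaves NEW and NEW_SAVING, resubmit when the application is unknown, give up after a bound) can be exercised against a scripted stub. This is a simplified model under stated assumptions: SubmitPollSketch, FakeRm and AppNotFoundException are hypothetical, the bound is a poll count rather than elapsed time, and the sleep between polls is left out.

```java
/** Hypothetical model of the submit-then-poll loop, run against a scripted stub. */
final class SubmitPollSketch {

    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED }

    /** Stand-in for ApplicationNotFoundException (unchecked in this sketch). */
    static final class AppNotFoundException extends RuntimeException { }

    /** Stub ResourceManager that replays scripted reports; null means "app unknown". */
    static final class FakeRm {
        private final AppState[] script;
        private int next = 0;
        int submitCalls = 0;

        FakeRm(AppState... script) {
            this.script = script;
        }

        void submit() {
            submitCalls++;
        }

        AppState report() {
            // Once the script is exhausted, keep returning its last entry.
            AppState s = script[Math.min(next++, script.length - 1)];
            if (s == null) {
                throw new AppNotFoundException();
            }
            return s;
        }
    }

    /** Returns the first state past NEW/NEW_SAVING, resubmitting if the app is unknown. */
    static AppState submitAndWait(FakeRm rm, int maxPolls) {
        rm.submit();
        for (int poll = 0; poll < maxPolls; poll++) {
            try {
                AppState state = rm.report();
                if (state != AppState.NEW && state != AppState.NEW_SAVING) {
                    return state;
                }
            } catch (AppNotFoundException e) {
                rm.submit(); // RM lost the application: submit the same request again
            }
        }
        throw new IllegalStateException("Timed out waiting for submission");
    }

    public static void main(String[] args) {
        FakeRm rm = new FakeRm(null, AppState.NEW, AppState.NEW_SAVING, AppState.ACCEPTED);
        AppState state = submitAndWait(rm, 10);
        System.out.println("state=" + state + ", submit calls=" + rm.submitCalls);
    }
}
```

In the scripted run the first report is missing, so the request is sent twice before the application reaches ACCEPTED.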

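7. At this point the job has been submitted to the ResourceManager and the client side of job submission is done. The server side is more complicated and will be analyzed in later posts.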
