
What happens when you run hadoop jar xxxx.jar

jar -cvf xxx.jar .
hadoop jar xxx.jar class-name [input] [output]
----------------------------------------------------------------------
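Running hadoop jar hadoop-0.20.2-examples.jar [class name] boils down to the following:

1. The hadoop script starts a JVM process.
2. That JVM process runs the Java class org.apache.hadoop.util.RunJar.
3. org.apache.hadoop.util.RunJar unpacks hadoop-0.20.2-examples.jar into hadoop.tmp.dir/hadoop-unjar*/.
4. org.apache.hadoop.util.RunJar dynamically loads and runs the Main-Class or the class named on the command line.
5. The Main-Class or the named class sets the properties of the Job.
6. The job is submitted to the JobTracker and its execution is monitored.

Note: all of the above runs on the JobClient side.

When the jar file is run, it is unpacked into hadoop.tmp.dir/hadoop-unjar*/ (for example /home/hadoop/hadoop-fs/dfs/temp/hadoop-unjar693919842639653083). This directory lives on the JobClient machine, not on the JobTracker. The unpacked contents are:

drwxr-xr-x 2 hadoop hadoop 4096 Jul 30 15:40 META-INF
drwxr-xr-x 3 hadoop hadoop 4096 Jul 30 15:40 org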
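Steps 3 and 4 (unpack the jar, then load and run a class from it) can be sketched with nothing but the JDK. This is a minimal illustration and not Hadoop's actual RunJar source: the class name MiniRunJar and its methods are hypothetical, and it unpacks into the default temp directory rather than hadoop.tmp.dir. It follows the same rule for choosing the class, on the assumption that the manifest's Main-Class is preferred and a class name from the command line is used only when the manifest has none.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical stand-in for org.apache.hadoop.util.RunJar; not Hadoop's code.
public class MiniRunJar {

    // Read Main-Class from META-INF/MANIFEST.MF; null if there is none.
    public static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue("Main-Class");
        }
    }

    // Step 3: unpack every entry of the jar below destDir.
    public static void unJar(File jar, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jf.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Step 4: load the class from the unpacked directory and call its main().
    public static void launch(File jar, String mainClass, String[] args) throws Exception {
        Path workDir = Files.createTempDirectory("hadoop-unjar");
        unJar(jar, workDir);
        URL[] classPath = { workDir.toUri().toURL(), jar.toURI().toURL() };
        URLClassLoader loader = new URLClassLoader(classPath, MiniRunJar.class.getClassLoader());
        Thread.currentThread().setContextClassLoader(loader);
        Method main = Class.forName(mainClass, true, loader).getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] argv) throws Exception {
        if (argv.length < 1) {
            System.err.println("usage: MiniRunJar jarFile [mainClass] args...");
            return;
        }
        File jar = new File(argv[0]);
        int firstArg = 1;
        String mainClass = mainClassOf(jar);
        if (mainClass == null) {
            // No Main-Class in the manifest: the next argument names the class.
            if (argv.length < 2) {
                System.err.println("usage: MiniRunJar jarFile [mainClass] args...");
                return;
            }
            mainClass = argv[firstArg++];
        }
        launch(jar, mainClass, Arrays.copyOfRange(argv, firstArg, argv.length));
    }
}
```

Because the loaded class runs inside the same JVM, everything it does, including building and submitting the job, happens on the client machine, which matches the note above.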

Submitting a job comes down to two uploads:

1. Generate ${job-id}/job.xml under hdfs://${mapred.system.dir}/ (for example hdfs://bcn152:9990/home/hadoop/hadoop-fs/dfs/temp/mapred/system/job_201007301137_0012/job.xml). This job description includes the path of the jar file, the map and reduce class names, and so on.
2. Upload ${job-id}/job.jar to hdfs://${mapred.system.dir}/ (for example hdfs://bcn152:9990/home/hadoop/hadoop-fs/dfs/temp/mapred/system/job_201007301137_0012/job.jar).
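The layout of those two files can be spelled out as a small path-building sketch. The class JobStagingPaths and its methods are hypothetical helpers written only for illustration; they build strings and do not talk to HDFS or use any Hadoop API. The system directory and job id in main() are the example values quoted above.

```java
// Hypothetical helper that only illustrates the directory layout; not a Hadoop class.
public class JobStagingPaths {

    // ${mapred.system.dir}/${job-id}
    public static String jobDir(String systemDir, String jobId) {
        String base = systemDir.endsWith("/")
                ? systemDir.substring(0, systemDir.length() - 1)
                : systemDir;
        return base + "/" + jobId;
    }

    // The job description: jar path, map and reduce classes, and so on.
    public static String jobXml(String systemDir, String jobId) {
        return jobDir(systemDir, jobId) + "/job.xml";
    }

    // The copy of the user's jar that is uploaded for the job.
    public static String jobJar(String systemDir, String jobId) {
        return jobDir(systemDir, jobId) + "/job.jar";
    }

    public static void main(String[] args) {
        String systemDir = "hdfs://bcn152:9990/home/hadoop/hadoop-fs/dfs/temp/mapred/system";
        String jobId = "job_201007301137_0012";
        System.out.println(jobXml(systemDir, jobId));
        System.out.println(jobJar(systemDir, jobId));
    }
}
```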

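Once the job has been created, the static method JobClient.runJob() submits it to the JobTracker:

    JobClient jc = new JobClient(job);
    RunningJob rj = jc.submitJob(job);

The JobTracker then schedules the job. After submission, the following code tracks the job's progress:

    try {
      // Poll the job and print its progress until it finishes.
      if (!jc.monitorAndPrintJob(job, rj)) {
        throw new IOException("Job failed!");
      }
    } catch (InterruptedException ie) {
      // Restore the interrupt flag so callers can see the interruption.
      Thread.currentThread().interrupt();
    }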