
Remotely Submitting a MapReduce Job from a Java Application to a Hadoop Cluster

Because a standalone Java application carries none of the cluster's configuration, the program has no way of knowing where the cluster is or how to reach it. The first step is therefore to specify the cluster's location in the Configuration object, as shown below:


Configuration conf = new Configuration(true);              // load the default resources
conf.set("fs.default.name", "hdfs://master:9000");         // NameNode (HDFS) address
conf.set("hadoop.job.user", "hadoop");                     // user the job is submitted as
conf.set("mapreduce.framework.name", "yarn");              // submit to YARN instead of running locally
conf.set("mapreduce.jobtracker.address", "master:9001");   // JobTracker address (MR1-era property)
conf.set("yarn.resourcemanager.hostname", "master");       // ResourceManager host
conf.set("mapreduce.jobhistory.address", "master:10020");  // JobHistory server address
ToolRunner.run(conf, new MatrixMP(), null);                 // launch the Tool; command-line args could be passed instead of null
The settings above are a subset of the Hadoop configuration: they tell the client where the cluster, HDFS, the JobTracker/ResourceManager and so on are located. Finally, ToolRunner.run() launches the job, where MatrixMP is the MapReduce driver class we want to run (a minimal driver skeleton is sketched below, after the extra settings). Other settings that may also be needed:

conf.set("yarn.resourcemanager.admin.address", "master:8033");
conf.set("yarn.resourcemanager.address", "master:8032");
conf.set("yarn.resourcemanager.resource-tracker.address", "master:8036");
conf.set("yarn.resourcemanager.scheduler.address", "master:8030");
conf.set("mapreduce.jobhistory.webapp.address", "master:19888");
conf.set("yarn.application.classpath", "/home/hadoop/hadoop/etc/hadoop,"
+"/home/hadoop/hadoop/share/hadoop/common/*,"
+"/home/hadoop/hadoop/share/hadoop/common/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/hdfs/*,"
+"/home/hadoop/hadoop/share/hadoop/hdfs/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/mapreduce/*,"
+"/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/yarn/*,"
+"/home/hadoop/hadoop/share/hadoop/yarn/lib/*");
conf.set("mapreduce.application.classpath", "/home/hadoop/hadoop/etc/hadoop,"
+"/home/hadoop/hadoop/share/hadoop/common/*,"
+"/home/hadoop/hadoop/share/hadoop/common/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/hdfs/*,"
+"/home/hadoop/hadoop/share/hadoop/hdfs/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/mapreduce/*,"
+"/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*,"
+"/home/hadoop/hadoop/share/hadoop/yarn/*,"
+"/home/hadoop/hadoop/share/hadoop/yarn/lib/*");
In MatrixMP's run method we also need to call the createTempJar(String root) method shown below. It packages the compiled class files into a jar file (this is needed when submitting from Eclipse; when the program is already running from a packaged jar it simply returns that jar via return new File(System.getProperty("java.class.path"))). After the Job has been created, call ((JobConf) job.getConfiguration()).setJar(jarFile.toString());. Note: do not include this code when the job is packaged and run on the cluster itself (i.e. via hadoop jar XXX.jar).

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/**
 * Packages the class files under the given root directory (e.g. "bin")
 * into a temporary jar. If the directory does not exist (e.g. the program
 * is already running from a packaged jar), the classpath jar is returned.
 */
public static File createTempJar(String root) throws IOException {
    if (!new File(root).exists()) {
        // Not running from an IDE output directory: reuse the existing jar
        return new File(System.getProperty("java.class.path"));
    }
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
    final File jarFile = File.createTempFile("EJob-", ".jar",
            new File(System.getProperty("java.io.tmpdir")));
    // Delete the temporary jar when the JVM exits
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            jarFile.delete();
        }
    });
    JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile), manifest);
    createTempJarInner(out, new File(root), "");
    out.flush();
    out.close();
    return jarFile;
}

/** Recursively copies the files under f into the jar, preserving relative paths. */
private static void createTempJarInner(JarOutputStream out, File f,
        String base) throws IOException {
    if (f.isDirectory()) {
        File[] fl = f.listFiles();
        if (base.length() > 0) {
            base = base + "/";
        }
        for (int i = 0; i < fl.length; i++) {
            createTempJarInner(out, fl[i], base + fl[i].getName());
        }
    } else {
        // Write a single file into the jar under its relative path
        out.putNextEntry(new JarEntry(base));
        FileInputStream in = new FileInputStream(f);
        byte[] buffer = new byte[1024];
        int n = in.read(buffer);
        while (n != -1) {
            out.write(buffer, 0, n);
            n = in.read(buffer);
        }
        in.close();
    }
}
The complete code of MatrixMP's run method:

public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    // Package the classes under "bin" (the Eclipse output directory) into a jar
    File jarFile = createTempJar("bin");

    Job job = new Job(getConf(), "MatrixMP");
    job.setJarByClass(MatrixMP.class);

    // Ship the jar we just created to the cluster
    ((JobConf) job.getConfiguration()).setJar(jarFile.toString());

    // Input matrices and a timestamped output directory on HDFS
    FileInputFormat.addInputPath(job, new Path("hdfs://master:9000/left"));
    FileInputFormat.addInputPath(job, new Path("hdfs://master:9000/right"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/" + new Date().getTime()));

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setInputFormatClass(MyFileInputFormat.class);
    job.setOutputFormatClass(MyMultiFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.waitForCompletion(true);
    return job.isSuccessful() ? 0 : 1;
}
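
The Map, Reduce, MyFileInputFormat and MyMultiFileOutputFormat classes referenced above belong to the original matrix-multiplication project and are not reproduced in this article. Purely as a placeholder, a Mapper/Reducer pair compatible with the Text/Text types configured above might look like this (a sketch that assumes the custom input format delivers Text keys and values; the real matrix-multiplication logic is not shown here):

// Placeholder nested classes inside MatrixMP; require imports of
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Mapper and
// org.apache.hadoop.mapreduce.Reducer.
public static class Map extends Mapper<Text, Text, Text, Text> {
    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Placeholder: identity map; the actual matrix-multiplication logic goes here
        context.write(key, value);
    }
}

public static class Reduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Placeholder: identity reduce; the actual aggregation logic goes here
        for (Text v : values) {
            context.write(key, v);
        }
    }
}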
During testing, remote submission with java -jar XXX.jar took 3 to 5 times longer than remote submission from Eclipse via Run on Hadoop or as a plain Java application; the cause and a fix are still unknown.
Reference for the above: http://blog.csdn.net/fhx007/article/details/42050467

Besides the approach above, there is a simpler one: copy the cluster's configuration files core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and log4j.properties (for log output) directly into the project's src directory. With these files in place, the conf.set(...) calls above are no longer needed, but the class files still have to be packaged into a jar, i.e. the code in the run method stays the same.
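
With the configuration files on the classpath, the driver needs no explicit settings. A minimal sketch of the simplified main method (assuming the same MatrixMP driver as above) could be:

public static void main(String[] args) throws Exception {
    // core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml placed in src/
    // end up on the classpath and are picked up during job submission, so no
    // explicit conf.set(...) calls are required here.
    Configuration conf = new Configuration(true);
    System.exit(ToolRunner.run(conf, new MatrixMP(), args));
}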


Copyright notice: this is an original article by CSDN blogger "e小王同學V", licensed under CC 4.0 BY-SA; please include the original link and this notice when reposting.
Original link: https://blog.csdn.net/ping802363/article/details/78213292