"Hadoop: The Definitive Guide" Study Notes (3)
These are my study notes for Chapter 5 of "Hadoop: The Definitive Guide", mainly my implementations of the book's example programs, some of them with modifications.
1 Mapper Testing
This requires the mrunit jar. When adding the dependency to pom.xml, you must include the classifier attribute, otherwise the jar cannot be downloaded; choose the classifier according to your hadoop-core version.
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>
Write the test class. I kept things simple here; you can also follow the book strictly. Note that there are two MapDriver classes you can import, one under mapreduce and one under mapred; pick the one matching the API version of your Mapper class (mapred is the old API).
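For reference, the two MRUnit driver variants differ only by package, so the import line is what decides which API you are testing against:

// Old API (for mappers extending org.apache.hadoop.mapred.Mapper):
//   import org.apache.hadoop.mrunit.MapDriver;
// New API (for org.apache.hadoop.mapreduce.Mapper), used in these notes:
import org.apache.hadoop.mrunit.mapreduce.MapDriver;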
package com.tuan.hadoopLearn.io.com.tuan.hadoopLearn.mapreduce;

import com.tuan.hadoopLearn.mapreduce.MaxTemperatureMapper;
import com.tuan.hadoopLearn.mapreduce.MaxTemperatureReducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
// ReduceDriver, Pair and Arrays are needed by the reducer test in the next section
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.util.Arrays;

public class MaxTemperatureTest {

    @Test
    public void mapperTest() {
        Text input = new Text("1993 38");
        try {
            new MapDriver<LongWritable, Text, Text, IntWritable>()
                    .withMapper(new MaxTemperatureMapper())
                    .withInput(new LongWritable(), input)
                    .withOutput(new Text("1993"), new IntWritable(38))
                    .runTest();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
2 Reducer Testing
Add a Reducer test to the same class (the extra imports it needs are already included above):
@Test
public void reducerTest() {
    try {
        new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                .withReducer(new MaxTemperatureReducer())
                .withInput(new Pair<>(new Text("1993"),
                        Arrays.asList(new IntWritable(10), new IntWritable(5))))
                .withOutput(new Text("1993"), new IntWritable(10))
                .runTest();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
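The MaxTemperatureReducer under test is not reproduced in these notes; a minimal sketch of it, assuming it simply takes the maximum of the values for each year as in the book's example:

package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Track the running maximum over all temperatures recorded for this year
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());
        }
        context.write(key, new IntWritable(max));
    }
}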
3 Job Debugging
For example, in the max-temperature program, insert a counter to detect abnormally large inputs by adding a few lines of code to the Mapper class. Note that one line of code in the book has a misplaced parenthesis; I was wondering how an enum constant could call increment directly.
package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // missing-reading sentinel from the book's example (unused in this simplified mapper)
    private static final int MISSING = 9999;

    enum Temperature {
        OVER_100
    }

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] line = value.toString().split(" ");
        int temperature = Integer.parseInt(line[1]);
        if (temperature > 100) {
            context.setStatus("Detected possible corrupt input");
            context.getCounter(Temperature.OVER_100).increment(1); // the book has an error on this line
        }
        context.write(new Text(line[0]), new IntWritable(temperature));
    }
}
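As an aside, MRUnit can also assert counter values, so the new counter can be unit-tested before submitting a job. A hedged sketch, assuming the Temperature enum is reachable from the test class (it is package-private as written, so it would need to be made public or the test moved into the same package):

@Test
public void overTemperatureCounterTest() {
    try {
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInput(new LongWritable(), new Text("1992 520"))
                // the mapper still emits the record; it only flags it as suspect
                .withOutput(new Text("1992"), new IntWritable(520))
                // MRUnit checks the counter reached the expected value
                .withCounter(MaxTemperatureMapper.Temperature.OVER_100, 1)
                .runTest();
    } catch (IOException e) {
        e.printStackTrace();
    }
}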
Append an anomalous record "1992 520" to input.txt and run the MapReduce job with the familiar command:
hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperature /mapreduce/input.txt /mapreduce/output
After the job finishes, you can see that the OVER_100 counter we defined has a value of 2, showing that there were two anomalous inputs above 100.
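Besides the web UI, the counter can also be read programmatically from the Job object once the job has finished; a minimal sketch, assuming a reference to the completed job (org.apache.hadoop.mapreduce.Counters must be imported):

// after job.waitForCompletion(true) has returned
Counters counters = job.getCounters();
long over100 = counters.findCounter(MaxTemperatureMapper.Temperature.OVER_100).getValue();
System.out.println("Records over 100: " + over100);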
On the web UI, open the history server and click through from the spot marked with the red box in the figure below; on the task page, find the mapper and click through again.
You finally arrive at a page where the Status has changed to the corrupt-input message.
You can also view the Counters there.
4 Performance Tuning
Use HPROF, a profiling tool that ships with the JVM, to collect performance data while the job runs.
Write a new MaxTemperatureDriver, which differs from the earlier MaxTemperature only by a few HPROF configuration statements. At first my profile.out files contained nothing but the header text; it turned out I had typed "mapreduce.task.profile.params" as "mapreduce.task,profile.params". Go figure.
package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] strings) throws Exception {
        if (strings.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        Configuration conf = getConf();
        conf.setBoolean("mapreduce.task.profile", true); // enable profiling
        conf.set("mapreduce.task.profile.params", "-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
                "force=n,thread=y,verbose=n,file=%s"); // JVM profiling parameters
        conf.set("mapreduce.task.profile.maps", "0-2"); // range of map task ids to profile
        conf.set("mapreduce.task.profile.reduces", "0-2"); // range of reduce task ids to profile

        Job job = new Job(conf, "Max Temperature");
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(strings[0]));
        FileOutputFormat.setOutputPath(job, new Path(strings[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MaxTemperatureDriver(), args));
    }
}
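Incidentally, because the driver runs through ToolRunner, the same four properties could be supplied as -D generic options on the command line instead of being hard-coded (assuming the conf.set lines above were removed, since they would override anything passed in):

hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperatureDriver -D mapreduce.task.profile=true -D mapreduce.task.profile.maps=0-2 -D mapreduce.task.profile.reduces=0-2 /mapreduce/input.txt /mapreduce/output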
Run it with the familiar command:
hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperatureDriver /mapreduce/input.txt /mapreduce/output
In the web UI, click through at the place shown below to view the profile.out file.
Then choose userlogs at the bottom, click your application, and dig down through the directories to find the profile.out file. The file is long; the last section tallies the proportion of calls for each method.
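For orientation, with cpu=samples that last section of an HPROF file generally has the following shape (layout only; the actual numbers and methods depend on the run):

CPU SAMPLES BEGIN (total = ...) ...
rank   self  accum   count trace method
   1  ...%   ...%     ...  ...   ...
   2  ...%   ...%     ...  ...   ...
CPU SAMPLES END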