Hadoop: Writing Unit Tests with MRUnit
By 阿新 • Published 2019-01-21
Introduction
A unit test verifies the correctness of a single module, function, or class. In MapReduce development, thorough unit tests for the Mapper and Reducer catch problems early and speed up development. This article uses concrete examples to summarize how to unit-test Hadoop Mappers and Reducers with MRUnit. The accompanying code is available on GitHub: https://github.com/liujinguang/hadoop-study.git
About MRUnit
In MapReduce, the map and reduce functions are easy to test in isolation, a consequence of their functional style. MRUnit (http://incubator.apache.org/mrunit/) is a testing library that makes it easy to pass known inputs to a mapper and to check that a reducer's output matches expectations. MRUnit is used together with a standard execution framework such as JUnit, so tests for MapReduce jobs can run as part of a normal development environment.
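To pull MRUnit into a project, a test-scoped Maven dependency is typically enough. A sketch of the dependency block, assuming MRUnit 1.1.0 and the new (hadoop2) MapReduce API; adjust the version and classifier to match your Hadoop distribution:

```xml
<!-- Assumed coordinates for MRUnit 1.x; use classifier "hadoop1"
     for the old org.apache.hadoop.mapred API -->
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop2</classifier>
  <scope>test</scope>
</dependency>
```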
The Mapper
The MaxTemperatureMapper class parses the year, temperature, and air quality code out of a fixed-format record string. The MRUnit tests later in this article include an example of such a string for reference.
package com.jliu.mr.intro;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
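The fixed-width offsets used above can be checked in isolation with plain Java, with no Hadoop dependency at all. A minimal sketch; the class name RecordParseDemo is illustrative, and the sample record is the one used in the MRUnit test in this article:

```java
// Minimal sketch: parse a fixed-width NCDC-style weather record with
// String.substring, mirroring the offsets in MaxTemperatureMapper.
public class RecordParseDemo {

    // Characters 15-18 hold the four-digit year
    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // The sign is at index 87; Integer.parseInt rejects a leading '+',
    // so skip it explicitly, exactly as the mapper does
    static int parseTemperature(String line) {
        return (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        String line = "0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999";
        System.out.println(parseYear(line) + " " + parseTemperature(line)); // 1950 -11
    }
}
```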
To test with MRUnit, first create a MapDriver object, set the Mapper class under test, and set the input and expected output. In the example below, a single weather record is passed to the mapper as input, and the output is checked to be the year and temperature read from it. If the output does not match expectations, the MRUnit test fails.
package com.jliu.mr.mrunit;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

import com.jliu.mr.intro.MaxTemperatureMapper;

public class MaxTemperatureMapperTest {

    @Test
    public void testParsesValidRecord() throws IOException {
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                // year ^^^^ (offsets 15-18)
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
                // temperature ^^^^^ (offsets 87-91)
        // We are testing a mapper, so we use MRUnit's MapDriver
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                // configure the mapper under test
                .withMapper(new MaxTemperatureMapper())
                // set the input key and value
                .withInput(new LongWritable(0), value)
                // set the expected output key and value
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }

    @Test
    public void testParseMissingTemperature() throws IOException {
        // MapDriver can check for 0, 1, or more output records, depending
        // on how many times withOutput() is called. In this test the missing
        // temperature is filtered out, so we assert that this particular
        // input produces no output at all.
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9+99991+99999999999");
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInput(new LongWritable(0), value)
                .runTest();
    }
}
The Reducer
Paired with the mapper above, the reducer must find the maximum value for a given key.
package com.jliu.mr.intro;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
Testing the reducer is similar to testing the mapper; see the following test case:
package com.jliu.mr.mrunit;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.*;
import org.junit.Test;
import com.jliu.mr.intro.MaxTemperatureReducer;
public class MaxTemperatureReducerTest {
@Test
public void testReturnsMaximumIntegerValues() throws IOException {
    new ReduceDriver<Text, IntWritable, Text, IntWritable>()
            // set the reducer under test
            .withReducer(new MaxTemperatureReducer())
            // set the input key and list of values
            .withInput(new Text("1950"), Arrays.asList(new IntWritable(10), new IntWritable(5)))
            // set the expected output
            .withOutput(new Text("1950"), new IntWritable(10))
            // run the test
            .runTest();
}
}
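Beyond testing the mapper and reducer separately, MRUnit's MapReduceDriver can exercise them together as a miniature pipeline, feeding the mapper's output through shuffle-style grouping into the reducer. A sketch using the same sample record as the mapper tests; the class and test names here are illustrative, not from the original project:

public class MaxTemperaturePipelineTest {

    @Test
    public void testMapperAndReducerTogether() throws IOException {
        Text record = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        // The six type parameters are the mapper's input and output
        // key/value types followed by the reducer's output key/value types
        new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withReducer(new MaxTemperatureReducer())
                .withInput(new LongWritable(0), record)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}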
Summary
Testing MapReduce code with the MRUnit framework is straightforward: working with JUnit, create a MapDriver or ReduceDriver object, set the class under test, set the input and the expected output, and run the test case with runTest().
References
1. Hadoop: The Definitive Guide, 3rd Edition