Hadoop: Writing Unit Tests with MRUnit
By 阿新 • Published 2019-01-21
Introduction
A unit test verifies the correctness of a single module, function, or class. In MapReduce development, thorough unit tests for the Mapper and Reducer catch problems early and speed up development. This article uses concrete examples to summarize how to unit-test Hadoop Mappers and Reducers with MRUnit. The accompanying code is available on GitHub: https://github.com/liujinguang/hadoop-study.git
About MRUnit
In MapReduce, the map and reduce functions are easy to test in isolation, a consequence of their functional style. MRUnit (http://incubator.apache.org/mrunit/) is a testing library that makes it easy to pass known inputs to a mapper and to check that a reducer's output matches expectations. MRUnit is used together with a standard execution framework such as JUnit, so tests for MapReduce jobs can run as part of a normal development environment.
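To pull MRUnit into a project, a test-scoped Maven dependency is typically enough. A sketch of the dependency block, assuming MRUnit 1.1.0 and the new (hadoop2) MapReduce API; adjust the version and classifier to match your Hadoop distribution:

```xml
<!-- Assumed coordinates for MRUnit 1.x; use classifier "hadoop1"
     for the old org.apache.hadoop.mapred API -->
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop2</classifier>
  <scope>test</scope>
</dependency>
```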
The Mapper
The MaxTemperatureMapper class parses the year, temperature, and air quality code out of a fixed-format record string. The MRUnit tests later in this article include an example of such a string for reference.
package com.jliu.mr.intro;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
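The fixed-width offsets used above can be checked in isolation with plain Java, with no Hadoop dependency at all. A minimal sketch; the class name RecordParseDemo is illustrative, and the sample record is the one used in the MRUnit test in this article:

```java
// Minimal sketch: parse a fixed-width NCDC-style weather record with
// String.substring, mirroring the offsets in MaxTemperatureMapper.
public class RecordParseDemo {

    // Characters 15-18 hold the four-digit year
    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // The sign is at index 87; Integer.parseInt rejects a leading '+',
    // so skip it explicitly, exactly as the mapper does
    static int parseTemperature(String line) {
        return (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        String line = "0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999";
        System.out.println(parseYear(line) + " " + parseTemperature(line)); // 1950 -11
    }
}
```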
To test with MRUnit, first create a MapDriver object, set the Mapper class under test, and set the input and expected output. In the example below, a single weather record is passed to the mapper as input, and the output is checked to be the year and temperature read from it. If the output does not match expectations, the MRUnit test fails.
package com.jliu.mr.mrunit;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

import com.jliu.mr.intro.MaxTemperatureMapper;

public class MaxTemperatureMapperTest {

    @Test
    public void testParsesValidRecord() throws IOException {
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                // year ^^^^ (offsets 15-18)
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
                // temperature ^^^^^ (offsets 87-91)
        // We are testing a mapper, so we use MRUnit's MapDriver
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                // configure the mapper under test
                .withMapper(new MaxTemperatureMapper())
                // set the input key and value
                .withInput(new LongWritable(0), value)
                // set the expected output key and value
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }

    @Test
    public void testParseMissingTemperature() throws IOException {
        // MapDriver can check for 0, 1, or more output records, depending
        // on how many times withOutput() is called. In this test the missing
        // temperature is filtered out, so we assert that this particular
        // input produces no output at all.
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9+99991+99999999999");
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInput(new LongWritable(0), value)
                .runTest();
    }
}
The Reducer
Paired with the mapper above, the reducer must find the maximum value for a given key.
package com.jliu.mr.intro;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
Testing the reducer is similar to testing the mapper; see the following test case:
package com.jliu.mr.mrunit;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.*;
import org.junit.Test;
import com.jliu.mr.intro.MaxTemperatureReducer;
public class MaxTemperatureReducerTest {
@Test
public void testReturnsMaximumIntegerValues() throws IOException {
    new ReduceDriver<Text, IntWritable, Text, IntWritable>()
            // set the reducer under test
            .withReducer(new MaxTemperatureReducer())
            // set the input key and list of values
            .withInput(new Text("1950"), Arrays.asList(new IntWritable(10), new IntWritable(5)))
            // set the expected output
            .withOutput(new Text("1950"), new IntWritable(10))
            // run the test
            .runTest();
}
}
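Beyond testing the mapper and reducer separately, MRUnit's MapReduceDriver can exercise them together as a miniature pipeline, feeding the mapper's output through shuffle-style grouping into the reducer. A sketch using the same sample record as the mapper tests; the class and test names here are illustrative, not from the original project:

public class MaxTemperaturePipelineTest {

    @Test
    public void testMapperAndReducerTogether() throws IOException {
        Text record = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        // The six type parameters are the mapper's input and output
        // key/value types followed by the reducer's output key/value types
        new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withReducer(new MaxTemperatureReducer())
                .withInput(new LongWritable(0), record)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}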
Summary
Testing MapReduce code with the MRUnit framework is straightforward: working with JUnit, create a MapDriver or ReduceDriver object, set the class under test, set the input and the expected output, and run the test case with runTest().
References
1. Hadoop: The Definitive Guide, 3rd Edition