
Topic: How to Write Tests - MapReduce

Whether or not to write tests is a personal choice. For me, writing tests is not about looking impressive; it is about having more confidence in my code.

  Testing MapReduce really is not that convenient, but it can be done. The material below is mostly adapted from the MRUnit Tutorial, which additionally covers testing Counters (i.e., how to read them back) and passing parameters via Configuration (i.e., how to get at the conf object inside a mocked run); a short sketch of both is given at the end of section 2.3.

1. The Basics - JUnit

  If you don't know JUnit there isn't much I can say, but fortunately almost everyone does.

Maven coordinates: junit:junit
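  For completeness, the corresponding pom.xml entry might look like the following (the 4.12 version is just an example; any 4.x release works for the code in this post, and JUnit may also come in transitively via mrunit):

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>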

import org.junit.*;
import static org.junit.Assert.*;

public class TestCases {
    @Test
    public void testXXX() {
        assertTrue(1 == 1);
    }
}

  This part is the foundation for functional testing of your code: anything that does not depend on the Hadoop environment can be covered with function-level JUnit tests. Only once that is in place is it worth moving on to the Mapper and Reducer tests below.
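  For example, if the word splitting used by the mapper lives in a plain helper method, it can be covered by an ordinary JUnit test with no Hadoop involved at all. A minimal sketch, assuming a hypothetical static helper WordCount.tokenize(String) that simply splits on whitespace:

import static org.junit.Assert.assertArrayEquals;

import org.junit.Test;

public class TokenizeTest {
    @Test
    public void testTokenize() {
        // tokenize is assumed to do nothing more than: return line.split("\\s+");
        assertArrayEquals(new String[]{"a", "b", "a"}, WordCount.tokenize("a b a"));
    }
}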

2. Mocking MapReduce - MRUnit

Maven coordinates

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

  Note: you must explicitly set the classifier to choose between hadoop1 and hadoop2; the two differ in their APIs.
  
  The rest of this section uses WordCount as the running example to show how to test each part; a sketch of the class under test follows for reference.
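  Here is a minimal sketch of what WordCount.Map and WordCount.Reduce could look like with the new org.apache.hadoop.mapreduce API (one plausible implementation, not necessarily the exact one); the type parameters match the drivers used in the tests below.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts emitted for this word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}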

2.1 Testing the Mapper

  • Initialize a MapDriver
WordCount.Map mapper = new WordCount.Map();
mapDriver = MapDriver.newMapDriver(mapper);
  • Feed it input and check the output
@Test
public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
            ))
            .runTest();
}

  In most cases a test cannot be written quite this elegantly, for example when the output contains float/double values. Then you pull the results out and assert on them yourself (which is clearly the more flexible approach anyway); see the tolerance sketch after the next example.

@Test
public void testMapper2() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text(
            "a b a"));
    List<Pair<Text, IntWritable>> actual = mapDriver.run();

    List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
    );

    // apache commons-collections: checks that the elements are equal, taking each element's frequency into account
    assertTrue(CollectionUtils.isEqualCollection(actual, expected));

    assertEquals(1, actual.get(0).getSecond().get());
}
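  The same pull-out-and-assert pattern is what makes floating-point output manageable: compare with a tolerance rather than exact equality. A sketch, assuming a hypothetical job whose map output value type is DoubleWritable (doubleMapDriver and the expected value 0.5 are made up for illustration):

// Hypothetical driver: MapDriver<LongWritable, Text, Text, DoubleWritable> doubleMapDriver
List<Pair<Text, DoubleWritable>> out = doubleMapDriver
        .withInput(new LongWritable(), new Text("a b a"))
        .run();
// The third argument is the allowed delta, so rounding error does not fail the test.
assertEquals(0.5, out.get(0).getSecond().get(), 1e-9);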

2.2 Testing the Reducer

  • As with the Mapper, first initialize a ReduceDriver
WordCount.Reduce reducer = new WordCount.Reduce();
reduceDriver = ReduceDriver.newReduceDriver(reducer);
  • Feed it input and check the output
@Test
public void testReducer() throws IOException {
    List<IntWritable> values = Lists.newArrayList();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("a"), values);
    reduceDriver.withOutput(new Text("a"), new IntWritable(2));
    reduceDriver.runTest();
}

2.3 Testing the Whole Flow

  • Three pieces need to be initialized: MapDriver, ReduceDriver, and MapReduceDriver
WordCount.Map mapper = new WordCount.Map();
WordCount.Reduce reducer = new WordCount.Reduce();
mapDriver = MapDriver.newMapDriver(mapper);
reduceDriver = ReduceDriver.newReduceDriver(reducer);
mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  • Set the Map's input and check the Reduce's output; the driver takes care of sorting and grouping the map output by key in between
@Test
public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
            .withInput(new LongWritable(), new Text("a b b"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
            .runTest();
}
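  As for the two topics mentioned at the beginning (reading Counters and passing parameters via Configuration), both are reachable directly from the drivers. A minimal sketch, assuming a mapper that reads a made-up "wc.ignore.case" key from the configuration and increments a made-up "WC"/"LINES" counter once per input record:

@Test
public void testCounterAndConf() throws IOException {
    // Pass a parameter in: the mapper can read it via context.getConfiguration().get("wc.ignore.case").
    mapDriver.getConfiguration().set("wc.ignore.case", "true");

    mapDriver.withInput(new LongWritable(), new Text("a b a"));
    mapDriver.run();

    // Read a counter back after the run; this assumes the mapper called
    // context.getCounter("WC", "LINES").increment(1) once per input record.
    assertEquals(1, mapDriver.getCounters().findCounter("WC", "LINES").getValue());
}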

3. Appendix

  • pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.1.0</version>
        <classifier>hadoop2</classifier>
    </dependency>
</dependencies>
  • Code
package du00.tests;

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.*;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.List;

public class WordCountTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCount.Map mapper = new WordCount.Map();
        WordCount.Reduce reducer = new WordCount.Reduce();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
                ))
                .runTest();
    }

    /**
     * Sometimes the result is more complex, and pulling it out to compare just part of it is the
     * better choice, for example when one field of the output object is a double.
     *
     * @throws IOException
     */
    @Test
    public void testMapper2() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text(
                "a b a"));
        List<Pair<Text, IntWritable>> actual = mapDriver.run();

        List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
        );

        // apache commons-collections: checks that the elements are equal, taking each element's frequency into account
        assertTrue(CollectionUtils.isEqualCollection(actual, expected));

        assertEquals(1, actual.get(0).getSecond().get());
    }

    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = Lists.newArrayList();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
                .withInput(new LongWritable(), new Text("a b b"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
                .runTest();
    }
}