
Topic: How to Write Tests - MapReduce

Whether or not to write tests is a personal choice. For me, writing tests is not about looking impressive; it is about having more confidence in my code.

  Testing MapReduce really is not that convenient, but it can be done. The material below is mostly adapted from the MRUnit Tutorial, which additionally covers testing Counters (i.e., how to read them back) and passing parameters via Configuration (i.e., how to get at the conf object inside a mocked run); a short sketch of both is given at the end of section 2.3.

1. The Basics - JUnit

  If you don't know JUnit there isn't much I can say, but fortunately almost everyone does.

Maven coordinates: junit:junit
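  For completeness, the corresponding pom.xml entry might look like the following (the 4.12 version is just an example; any 4.x release works for the code in this post, and JUnit may also come in transitively via mrunit):

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>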

import org.junit.*;
import static org.junit.Assert.*;

public class TestCases {
    @Test
    public void testXXX() {
        assertTrue(1 == 1);
    }
}

  This part is the foundation for functional testing of your code: anything that does not depend on the Hadoop environment can be covered with function-level JUnit tests. Only once that is in place is it worth moving on to the Mapper and Reducer tests below.
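  For example, if the word splitting used by the mapper lives in a plain helper method, it can be covered by an ordinary JUnit test with no Hadoop involved at all. A minimal sketch, assuming a hypothetical static helper WordCount.tokenize(String) that simply splits on whitespace:

import static org.junit.Assert.assertArrayEquals;

import org.junit.Test;

public class TokenizeTest {
    @Test
    public void testTokenize() {
        // tokenize is assumed to do nothing more than: return line.split("\\s+");
        assertArrayEquals(new String[]{"a", "b", "a"}, WordCount.tokenize("a b a"));
    }
}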

2. Mocking MapReduce - MRUnit

Maven coordinates

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

  Note: you must explicitly set the classifier to choose between hadoop1 and hadoop2; the two differ in their APIs.
  
  The rest of this section uses WordCount as the running example to show how to test each part; a sketch of the class under test follows for reference.
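  Here is a minimal sketch of what WordCount.Map and WordCount.Reduce could look like with the new org.apache.hadoop.mapreduce API (one plausible implementation, not necessarily the exact one); the type parameters match the drivers used in the tests below.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts emitted for this word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}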

2.1 Testing the Mapper

  • Initialize a MapDriver
WordCount.Map mapper = new WordCount.Map();
mapDriver = MapDriver.newMapDriver(mapper);
  • Feed it input and check the output
@Test
public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
            ))
            .runTest();
}

  In most cases a test cannot be written quite this elegantly, for example when the output contains float/double values. Then you pull the results out and assert on them yourself (which is clearly the more flexible approach anyway); see the tolerance sketch after the next example.

@Test
public void testMapper2() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text(
            "a b a"));
    List<Pair<Text, IntWritable>> actual = mapDriver.run();

    List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
    );

    // apache commons-collections: checks that the elements are equal, taking each element's frequency into account
    assertTrue(CollectionUtils.isEqualCollection(actual, expected));

    assertEquals(1, actual.get(0).getSecond().get());
}
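  The same pull-out-and-assert pattern is what makes floating-point output manageable: compare with a tolerance rather than exact equality. A sketch, assuming a hypothetical job whose map output value type is DoubleWritable (doubleMapDriver and the expected value 0.5 are made up for illustration):

// Hypothetical driver: MapDriver<LongWritable, Text, Text, DoubleWritable> doubleMapDriver
List<Pair<Text, DoubleWritable>> out = doubleMapDriver
        .withInput(new LongWritable(), new Text("a b a"))
        .run();
// The third argument is the allowed delta, so rounding error does not fail the test.
assertEquals(0.5, out.get(0).getSecond().get(), 1e-9);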

2.2 Testing the Reducer

  • As with the Mapper, first initialize a ReduceDriver
WordCount.Reduce reducer = new WordCount.Reduce();
reduceDriver = ReduceDriver.newReduceDriver(reducer);
  • Feed it input and check the output
@Test
public void testReducer() throws IOException {
    List<IntWritable> values = Lists.newArrayList();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("a"), values);
    reduceDriver.withOutput(new Text("a"), new IntWritable(2));
    reduceDriver.runTest();
}

2.3 Testing the Whole Flow

  • Three pieces need to be initialized: MapDriver, ReduceDriver, and MapReduceDriver
WordCount.Map mapper = new WordCount.Map();
WordCount.Reduce reducer = new WordCount.Reduce();
mapDriver = MapDriver.newMapDriver(mapper);
reduceDriver = ReduceDriver.newReduceDriver(reducer);
mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  • Set the Map's input and check the Reduce's output; the driver takes care of sorting and grouping the map output by key in between
@Test
public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
            .withInput(new LongWritable(), new Text("a b b"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
            .runTest();
}
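  As for the two topics mentioned at the beginning (reading Counters and passing parameters via Configuration), both are reachable directly from the drivers. A minimal sketch, assuming a mapper that reads a made-up "wc.ignore.case" key from the configuration and increments a made-up "WC"/"LINES" counter once per input record:

@Test
public void testCounterAndConf() throws IOException {
    // Pass a parameter in: the mapper can read it via context.getConfiguration().get("wc.ignore.case").
    mapDriver.getConfiguration().set("wc.ignore.case", "true");

    mapDriver.withInput(new LongWritable(), new Text("a b a"));
    mapDriver.run();

    // Read a counter back after the run; this assumes the mapper called
    // context.getCounter("WC", "LINES").increment(1) once per input record.
    assertEquals(1, mapDriver.getCounters().findCounter("WC", "LINES").getValue());
}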

3. Appendix

  • pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.1.0</version>
        <classifier>hadoop2</classifier>
    </dependency>
</dependencies>
  • Code
package du00.tests;

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.*;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.List;

public class WordCountTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCount.Map mapper = new WordCount.Map();
        WordCount.Reduce reducer = new WordCount.Reduce();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
                ))
                .runTest();
    }

    /**
     * Sometimes the result is more complex, and pulling it out to compare just part of it is the
     * better choice, for example when one field of the output object is a double.
     *
     * @throws IOException
     */
    @Test
    public void testMapper2() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text(
                "a b a"));
        List<Pair<Text, IntWritable>> actual = mapDriver.run();

        List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
        );

        // apache commons-collections: checks that the elements are equal, taking each element's frequency into account
        assertTrue(CollectionUtils.isEqualCollection(actual, expected));

        assertEquals(1, actual.get(0).getSecond().get());
    }

    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = Lists.newArrayList();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
                .withInput(new LongWritable(), new Text("a b b"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
                .runTest();
    }
}