Hadoop MR Comprehensive Case 2: Partitioning, Sorting, and Multi-Directory Output
阿新 • Published: 2021-01-10
MR Comprehensive Case: Good and Bad Reviews
Requirements
We have a set of order review records. The requirement is to separate the orders by review rating and write the data to different output directories. The ninth field of each record indicates the review status: 0 = good review, 1 = medium review, 2 = bad review. The records must be classified by review status, written to separate directories, and within each output sorted by time in descending order.
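For reference, a hypothetical record (field values invented for illustration; real records are tab-separated) carries nine fields in this order: order id, comment, comment extension, goods count, phone number, user name, address, comment status, comment time:

1001	great phone, fast delivery	none	1	13800000000	Tom	Beijing	0	2021-01-05 12:00:00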
Analysis
A custom InputFormat merges the small files (done in step 1 of this case, which produced the SequenceFile read here).
A bean implementing WritableComparable defines the descending-by-time sort order.
A custom Partitioner partitions the data by review rating.
A custom OutputFormat writes each partition to its own directory.
Code Implementation
Mapper
package com.lagou.mr.comment.step2;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
// First KV pair: read with SequenceFileInputFormat, so key: Text and value: BytesWritable (these are the types the step-1 SequenceFile was written with)
public class CommentMapper extends Mapper<Text, BytesWritable, CommentBean, NullWritable> {
//key: the file name
//value: the complete content of one file
@Override
protected void map(Text key, BytesWritable value, Context context)
throws IOException, InterruptedException {
//split the file content into lines; use getLength(), since the backing buffer of BytesWritable may be padded
String str = new String(value.getBytes(), 0, value.getLength());
String[] lines = str.split("\n");
for (String line : lines) {
CommentBean commentBean = parseStrToCommentBean(line);
if (null != commentBean) {
context.write(commentBean, NullWritable.get());
}
}
}
//split the line into fields and wrap them in a CommentBean
public CommentBean parseStrToCommentBean(String line) {
if (StringUtils.isNotBlank(line)) {
//每一行進行切分
String[] fields = line.split("\t");
if (fields.length >= 9) {
return new CommentBean(fields[0], fields[1], fields[2],
Integer.parseInt(fields[3]), fields[4], fields[5],
fields[6], Integer.parseInt(fields[7]), fields[8]);
}
return null;
}
return null;
}
}
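As a quick sanity check of the parsing logic, a minimal local sketch (hypothetical input line; assumes the same package as CommentMapper):

// Local smoke test for parseStrToCommentBean; the record below is made up.
public class ParseSmokeTest {
    public static void main(String[] args) {
        CommentMapper mapper = new CommentMapper();
        String line = "1001\tgreat phone\tnone\t1\t13800000000\tTom\tBeijing\t0\t2021-01-05 12:00:00";
        System.out.println(mapper.parseStrToCommentBean(line)); // prints the tab-joined fields
        System.out.println(mapper.parseStrToCommentBean(""));   // null for blank input
    }
}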
Bean
package com.lagou.mr.comment.step2;
import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
public class CommentBean implements WritableComparable<CommentBean> {
private String orderId;
private String comment;
private String commentExt;
private int goodsNum;
private String phoneNum;
private String userName;
private String address;
private int commentStatus;
private String commentTime;
//no-argument constructor (required by the Hadoop deserialization framework)
public CommentBean() {
}
public CommentBean(String orderId, String comment, String commentExt,
int goodsNum, String phoneNum, String userName, String address,
int commentStatus, String commentTime) {
this.orderId = orderId;
this.comment = comment;
this.commentExt = commentExt;
this.goodsNum = goodsNum;
this.phoneNum = phoneNum;
this.userName = userName;
this.address = address;
this.commentStatus = commentStatus;
this.commentTime = commentTime;
}
@Override
public String toString() {
return orderId + "\t" + comment + "\t" + commentExt + "\t" + goodsNum +
"\t" + phoneNum + "\t" + userName + "\t" + address + "\t" +
commentStatus + "\t" + commentTime;
}
public String getOrderId() {
return orderId;
}
public void setOrderId(String orderId) {
this.orderId = orderId;
}
public String getComment() {
return comment;
}
public void setComment(String comment) {
this.comment = comment;
}
public String getCommentExt() {
return commentExt;
}
public void setCommentExt(String commentExt) {
this.commentExt = commentExt;
}
public int getGoodsNum() {
return goodsNum;
}
public void setGoodsNum(int goodsNum) {
this.goodsNum = goodsNum;
}
public String getPhoneNum() {
return phoneNum;
}
public void setPhoneNum(String phoneNum) {
this.phoneNum = phoneNum;
}
public String getUserName() {
return userName;
}
public void setUserName(String userName) {
this.userName = userName;
}
public String getAddress() {
return address;
}
public void setAddress(String address) {
this.address = address;
}
public int getCommentStatus() {
return commentStatus;
}
public void setCommentStatus(int commentStatus) {
this.commentStatus = commentStatus;
}
public String getCommentTime() {
return commentTime;
}
public void setCommentTime(String commentTime) {
this.commentTime = commentTime;
}
//sort rule: descending by comment time (compareTo returns a negative, zero, or positive value)
@Override
public int compareTo(CommentBean o) {
return o.getCommentTime().compareTo(this.commentTime);
}
//serialization
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(orderId);
out.writeUTF(comment);
out.writeUTF(commentExt);
out.writeInt(goodsNum);
out.writeUTF(phoneNum);
out.writeUTF(userName);
out.writeUTF(address);
out.writeInt(commentStatus);
out.writeUTF(commentTime);
}
//deserialization; the field order must match write()
@Override
public void readFields(DataInput in) throws IOException {
this.orderId = in.readUTF();
this.comment = in.readUTF();
this.commentExt = in.readUTF();
this.goodsNum = in.readInt();
this.phoneNum = in.readUTF();
this.userName = in.readUTF();
this.address = in.readUTF();
this.commentStatus = in.readInt();
this.commentTime = in.readUTF();
}
}
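Since compareTo compares the time strings lexicographically, this yields a correct descending order as long as the timestamps use a fixed zero-padded format such as yyyy-MM-dd HH:mm:ss. A minimal sketch with hypothetical timestamps:

// Lexicographic comparison equals chronological comparison for this format.
CommentBean earlier = new CommentBean("1", "ok", "-", 1, "-", "-", "-", 0, "2021-01-01 08:00:00");
CommentBean later   = new CommentBean("2", "ok", "-", 1, "-", "-", "-", 0, "2021-01-02 08:00:00");
System.out.println(later.compareTo(earlier) < 0); // true: later records sort first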
Partitioner
package com.lagou.mr.comment.step2;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;
public class CommentPartitioner extends Partitioner<CommentBean, NullWritable> {
@Override
public int getPartition(CommentBean commentBean, NullWritable nullWritable,
int numPartitions) {
// return (commentBean.getCommentStatus() & Integer.MAX_VALUE) % numPartitions;
return commentBean.getCommentStatus(); // 0, 1, 2 map directly to the partition numbers
}
}
Summary and Notes
The partitioner decides which partition each record emitted by the Mapper's map method is sent to (by default this is based on the key). Here the review status itself is used as the partition number, so the job must run with exactly three reduce tasks.
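For comparison, Hadoop's default HashPartitioner derives the partition from the key's hash code:

// org.apache.hadoop.mapreduce.lib.partition.HashPartitioner (the framework default)
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // mask the sign bit so negative hash codes still yield a valid partition
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}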
Custom OutputFormat
CommentOutputFormat
package com.lagou.mr.comment.step2;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
//KV types of the final output
public class CommentOutputFormat extends FileOutputFormat<CommentBean, NullWritable> {
//returns the object responsible for writing out the data
@Override
public RecordWriter<CommentBean, NullWritable> getRecordWriter(
TaskAttemptContext job) throws IOException, InterruptedException {
Configuration conf = job.getConfiguration();
FileSystem fs = FileSystem.get(conf);
//create the output file and stream according to the partition this reduce task handles
//read the output path configured in the Driver; partition 0 = good, 1 = medium, 2 = bad
String outputDir = conf.get(
"mapreduce.output.fileoutputformat.outputdir");
FSDataOutputStream goodOut = null;
FSDataOutputStream commonOut = null;
FSDataOutputStream badOut = null;
int id = job.getTaskAttemptID().getTaskID().getId(); //partition number handled by the current reduce task
if (id == 0) {
//good reviews
goodOut = fs.create(new Path(outputDir + "/good/good.log"));
} else if (id == 1) {
//medium reviews
commonOut = fs.create(new Path(outputDir + "/common/common.log"));
} else {
//bad reviews
badOut = fs.create(new Path(outputDir + "/bad/bad.log"));
}
return new CommentRecordWriter(goodOut, commonOut, badOut);
}
}
Problem and Summary
If the three output streams were all created here unconditionally, every reduce task would create all three files. Because Hadoop does not support append writes, each newly created stream produces a blank file that overwrites the previous one, so only the bad-review file would end up with data. That is why the code above creates only the stream for the partition the current task handles.
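To make the failure mode concrete, this is roughly what the problematic variant would look like (do not use):

// Problematic: every reduce task recreates all three files unconditionally.
// fs.create() truncates existing files, so whichever task runs last leaves
// the files created by the other tasks empty.
goodOut   = fs.create(new Path(outputDir + "/good/good.log"));
commonOut = fs.create(new Path(outputDir + "/common/common.log"));
badOut    = fs.create(new Path(outputDir + "/bad/bad.log"));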
RecordWriter
package com.lagou.mr.comment.step2;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import java.io.IOException;
public class CommentRecordWriter extends RecordWriter<CommentBean, NullWritable> {
//output streams, one per review type
private FSDataOutputStream goodOut;
private FSDataOutputStream commonOut;
private FSDataOutputStream badOut;
public CommentRecordWriter(FSDataOutputStream goodOut,
FSDataOutputStream commonOut, FSDataOutputStream badOut) {
this.goodOut = goodOut;
this.commonOut = commonOut;
this.badOut = badOut;
}
//write logic: route each record to the output stream that matches its review status
@Override
public void write(CommentBean key, NullWritable value)
throws IOException, InterruptedException {
int commentStatus = key.getCommentStatus();
String beanStr = key.toString();
if (commentStatus == 0) {
goodOut.write(beanStr.getBytes());
goodOut.write("\n".getBytes());
goodOut.flush();
} else if (commentStatus == 1) {
commonOut.write(beanStr.getBytes());
commonOut.write("\n".getBytes());
commonOut.flush();
} else {
badOut.write(beanStr.getBytes());
badOut.write("\n".getBytes());
badOut.flush();
}
}
//release resources
@Override
public void close(TaskAttemptContext context)
throws IOException, InterruptedException {
IOUtils.closeStream(goodOut);
IOUtils.closeStream(commonOut);
IOUtils.closeStream(badOut);
}
}
Reducer
package com.lagou.mr.comment.step2;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class CommentReducer extends Reducer<CommentBean, NullWritable, CommentBean, NullWritable> {
@Override
protected void reduce(CommentBean key, Iterable<NullWritable> values,
Context context) throws IOException, InterruptedException {
//iterate over the values and write the key; key is a single reused object: as the framework fetches each value, the key's contents change too
for (NullWritable value : values) {
context.write(key, value);
}
}
}
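The reuse note above matters whenever keys are buffered. A hypothetical sketch (not needed in this job) of copying each key before collecting it, using org.apache.hadoop.io.WritableUtils and java.util.ArrayList:

// Buffering keys requires a copy, because the framework reuses one key instance.
List<CommentBean> buffered = new ArrayList<>();
for (NullWritable value : values) {
    buffered.add(WritableUtils.clone(key, context.getConfiguration()));
}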
Driver
package com.lagou.mr.comment.step2;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class CommentDriver {
public static void main(String[] args)
throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "CommentDriver");
job.setJarByClass(CommentDriver.class);
job.setMapperClass(CommentMapper.class);
job.setReducerClass(CommentReducer.class);
job.setMapOutputKeyClass(CommentBean.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(CommentBean.class);
job.setOutputValueClass(NullWritable.class);
job.setPartitionerClass(CommentPartitioner.class);
//specify the InputFormat (reads the SequenceFile produced in step 1)
job.setInputFormatClass(SequenceFileInputFormat.class);
//specify the custom OutputFormat
job.setOutputFormatClass(CommentOutputFormat.class);
//input and output paths
FileInputFormat.setInputPaths(job,
new Path("E:\\teach\\hadoop框架\\資料\\data\\mr綜合案例\\out"));
FileOutputFormat.setOutputPath(job,
new Path("E:\\teach\\hadoop框架\\資料\\data\\mr綜合案例\\multi-out"));
//number of reduce tasks; must be 3 to match the partitioner's 0/1/2 output
job.setNumReduceTasks(3);
boolean b = job.waitForCompletion(true);
System.exit(b ? 0 : 1); //exit non-zero if the job failed
}
}
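If everything is wired correctly, the output directory should end up with one subdirectory per review status (layout sketch; contents depend on your data):

multi-out/good/good.log      status 0, descending by time
multi-out/common/common.log  status 1
multi-out/bad/bad.log        status 2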