Custom GroupingComparator -- finding the largest transaction in each order

阿新 • Published: 2018-11-20

Code:
https://gitee.com/tanghongping/hadoopMapReduce/tree/master/src/com/thp/bigdata/secondarySort

Order id	Product id	Amount
Order_0000001	Pdt_01	222.8
Order_0000001	Pdt_05	25.8
Order_0000002	Pdt_03	522.8
Order_0000002	Pdt_04	122.4
Order_0000002	Pdt_05	722.4
Order_0000003	Pdt_01	222.8

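The task: for each order, find the transaction with the largest amount.

Analysis:
All records with the same order id must reach the same reduce task; otherwise the largest transaction of an order cannot be determined.
So we write a Partitioner that sends records with the same order id to the same reduce task.
That alone is not enough. The records that arrive for one order share an order id, but they are distinct beans (three of them for Order_0000002), and by default distinct beans are not treated as one key. The fix is to sort the beans by order id and then by amount in descending order, and to register a GroupingComparator that compares order ids only. The reducer then sees each order as a single group whose first key is the largest transaction.
Note that the mapper below splits each line on a comma, so the input file itself is expected to be comma-separated, as in the sample inside the SecondarySort Javadoc.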
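
The idea can be tried without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: the class `MaxPerOrderSketch`, the nested `Trade` type and the helper names are hypothetical stand-ins introduced only for illustration. It sorts the sample rows by order id and then by descending amount, and keeps the first record of each run of equal order ids, which is roughly what the sort plus grouping comparator arrange for the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation (no Hadoop) of "sort, then keep the first record of each order".
class MaxPerOrderSketch {

    // Hypothetical stand-in for OrderBean: order id plus transaction amount.
    static final class Trade {
        final String orderId;
        final double amount;

        Trade(String orderId, double amount) {
            this.orderId = orderId;
            this.amount = amount;
        }
    }

    // Same ordering as OrderBean.compareTo: order id ascending, amount descending.
    static final Comparator<Trade> SORT_ORDER = (a, b) -> {
        int cmp = a.orderId.compareTo(b.orderId);
        return cmp != 0 ? cmp : Double.compare(b.amount, a.amount);
    };

    // Returns the largest trade of each order, in order-id order.
    static List<Trade> maxPerOrder(List<Trade> trades) {
        List<Trade> sorted = new ArrayList<>(trades);
        sorted.sort(SORT_ORDER);
        List<Trade> result = new ArrayList<>();
        String currentOrder = null;
        for (Trade t : sorted) {
            // A new order id starts a new group; its first record has the largest amount.
            if (!t.orderId.equals(currentOrder)) {
                result.add(t);
                currentOrder = t.orderId;
            }
        }
        return result;
    }

    // The six sample rows from the table at the top of the post.
    static List<Trade> sampleTrades() {
        return Arrays.asList(
                new Trade("Order_0000001", 222.8),
                new Trade("Order_0000001", 25.8),
                new Trade("Order_0000002", 522.8),
                new Trade("Order_0000002", 122.4),
                new Trade("Order_0000002", 722.4),
                new Trade("Order_0000003", 222.8));
    }

    public static void main(String[] args) {
        for (Trade t : maxPerOrder(sampleTrades())) {
            System.out.println(t.orderId + "\t" + t.amount);
        }
    }
}
```

In the real job the sorting is done by the framework during the shuffle and the "first record of each group" step is the reducer writing its key once per `reduce()` call.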

OrderBean:

package com.thp.bigdata.secondarySort;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

/**
 * An order record, used as the map output key.
 * @author 湯小萌
 *
 */
public class OrderBean implements WritableComparable<OrderBean>{
	private Text itemId;		// order id
	private DoubleWritable mount;	// transaction amount
	
	public OrderBean() {}
	
	public OrderBean(Text itemId, DoubleWritable mount) {
		set(itemId, mount);
	}
	public void set(Text itemId, DoubleWritable mount) {
		this.itemId = itemId;
		this.mount = mount;
	}
	
	public Text getItemId() {
		return itemId;
	}
	public void setItemId(Text itemId) {
		this.itemId = itemId;
	}
	public DoubleWritable getMount() {
		return mount;
	}
	public void setMount(DoubleWritable mount) {
		this.mount = mount;
	}
	
	
	
	
	@Override
	public String toString() {
		return itemId + "\t" + mount.get();
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(itemId.toString());
		out.writeDouble(mount.get());
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.itemId = new Text(in.readUTF());
		this.mount = new DoubleWritable(in.readDouble());
	}
	
	// Note:
	//  this method defines the sort order of the keys
	/**
	 * It is called to sort records when the in-memory buffer spills to disk,
	 * and again when spill files are merged.
	 */
	@Override
	public int compareTo(OrderBean o) {
		int cmp = this.itemId.compareTo(o.getItemId());
		if(cmp == 0) {
			// The minus sign reverses the order, so amounts sort from largest to smallest
			cmp = -this.mount.compareTo(o.mount);
		}
		return cmp;
	}
	
}
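
The `write` and `readFields` methods above must agree on field order and encoding. As a rough illustration of that contract, here is a plain-Java sketch that uses only `java.io`, with no Hadoop classes; `WireFormatSketch`, `encode` and `decode` are hypothetical names. It writes an order id with `writeUTF` and an amount with `writeDouble`, then reads them back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Plain-Java round trip of the two fields that OrderBean serializes.
class WireFormatSketch {

    // Mirrors OrderBean.write: order id first, then the amount.
    static byte[] encode(String orderId, double amount) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(orderId);
            out.writeDouble(amount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Mirrors OrderBean.readFields: fields are read in the order they were written.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String orderId = in.readUTF();
            double amount = in.readDouble();
            return orderId + "\t" + amount;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("Order_0000001", 222.8)));
    }
}
```

Swapping the two reads in `decode` would make the round trip fail, which is the kind of mistake to watch for when a key class gains more fields.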

ItemIdPartitioner :

package com.thp.bigdata.secondarySort;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

import com.thp.bigdata.secondarySort.OrderBean;

/**
 * Custom Partitioner:
 * 		sends keys with the same order id to the same partition
 * @author 湯小萌
 *
 */
public class ItemIdPartitioner extends Partitioner<OrderBean, NullWritable> {
	
	/**
	 * OrderBeans with the same id are sent to the same partition.
	 * The number of partitions matches the number of reduce tasks set by the user;
	 * numPartitions is that number of reduce tasks.
	 */
	@Override
	public int getPartition(OrderBean bean, NullWritable value, int numPartitions) {
		return (bean.getItemId().hashCode() & Integer.MAX_VALUE) % numPartitions;
	}

}
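
The `& Integer.MAX_VALUE` mask in `getPartition` matters because `hashCode()` can be negative, and in Java a negative value modulo a positive one is negative or zero, which is not a valid partition index. A small plain-Java sketch of the same arithmetic follows; it assumes `String.hashCode()` as a stand-in for `Text.hashCode()` (the two do not produce the same values), and `PartitionSketch` and `partitionFor` are hypothetical names.

```java
// Plain-Java sketch of the partition arithmetic used by ItemIdPartitioner.
class PartitionSketch {

    // Clears the sign bit, then maps the hash into [0, numPartitions).
    static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumption: String.hashCode() stands in for Text.hashCode() here.
    static int partitionFor(String orderId, int numPartitions) {
        return partitionFor(orderId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (String id : new String[] {"Order_0000001", "Order_0000002", "Order_0000003"}) {
            System.out.println(id + " -> partition " + partitionFor(id, numPartitions));
        }
        // Without the mask, a negative hash gives a negative remainder.
        System.out.println("unmasked: " + (-7 % numPartitions));
        System.out.println("masked:   " + partitionFor(-7, numPartitions));
    }
}
```

Because the function depends only on the order id, every record of one order lands in the same partition, while different orders may or may not share one.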

ItemIdGroupingComparator :

package com.thp.bigdata.secondarySort;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

import com.thp.bigdata.secondarySort.OrderBean;

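/**
 * A reduce-side grouping comparator that makes OrderBeans with the same order id count as the same key
 * @author 湯小萌
 *
 */
public class ItemIdGroupingComparator extends WritableComparator {
	
	// This constructor is required.
	// It passes the key class to the parent and asks the framework to create key instances by reflection.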
	protected ItemIdGroupingComparator() {
		super(OrderBean.class, true);
	}
	
	
	@Override
	public int compare(WritableComparable a, WritableComparable b) {
		OrderBean aBean = (OrderBean) a;
		OrderBean bBean = (OrderBean) b;
		// Keys with the same order id are treated as the same key
		return aBean.getItemId().compareTo(bBean.getItemId());
	}
}
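
To see what the grouping comparator changes, the sketch below counts how many groups a sorted key sequence falls into under two comparators. It is plain Java with hypothetical names (`GroupingSketch`, `countGroups`, `WHOLE_KEY`, `ORDER_ID_ONLY`), and it represents each key as an "order id, tab, amount" string purely for brevity. Comparing the whole key yields one group per record, while comparing the order id alone yields one group per order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch: how the choice of grouping comparator changes the number of groups.
class GroupingSketch {

    // Keys already in sort order: order id ascending, amount descending.
    static final List<String> SORTED_KEYS = Arrays.asList(
            "Order_0000001\t222.8",
            "Order_0000001\t25.8",
            "Order_0000002\t722.4",
            "Order_0000002\t522.8",
            "Order_0000002\t122.4",
            "Order_0000003\t222.8");

    // Extracts the order id part of a key string.
    static String orderId(String key) {
        return key.substring(0, key.indexOf('\t'));
    }

    // Default-like behaviour: two keys are equal only if every field matches.
    static final Comparator<String> WHOLE_KEY = (a, b) -> a.compareTo(b);

    // Grouping behaviour: only the order id is compared.
    static final Comparator<String> ORDER_ID_ONLY = (a, b) -> orderId(a).compareTo(orderId(b));

    // Counts runs of adjacent keys that the comparator treats as equal.
    static <T> int countGroups(List<T> sortedKeys, Comparator<T> grouping) {
        int groups = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (i == 0 || grouping.compare(sortedKeys.get(i - 1), sortedKeys.get(i)) != 0) {
                groups++;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println("whole key:     " + countGroups(SORTED_KEYS, WHOLE_KEY));
        System.out.println("order id only: " + countGroups(SORTED_KEYS, ORDER_ID_ONLY));
    }
}
```

With one group per order, a reducer that writes its key once per call emits exactly one record per order.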

The MapReduce job

package com.thp.bigdata.secondarySort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondarySort {
	
	
	/**
	 * 	Order_0000001,Pdt_01,222.8
		Order_0000001,Pdt_05,25.8
		Order_0000002,Pdt_05,325.8
		Order_0000002,Pdt_03,522.8
		Order_0000002,Pdt_04,122.4
		Order_0000003,Pdt_01,222.8
	 * 
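	 * Because OrderBean defines compareTo, the keys are sorted during the shuffle phase.
	 * The custom partitioner decides which reduce task each key goes to.
	 * Its purpose is to send OrderBeans with the same order id to the same partition,
	 * so all records of one order are handled by the same reduce task.
	 * Those keys are still not equal to each other, so we trick the reducer, through the grouping comparator, into treating OrderBeans with the same order id as the same key.
	 * reduce() is then called once per order, and the key it receives is the first one in sort order, which is the OrderBean with the largest amount for that order.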
	 * 
	 */
	static class SecondarySortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {
		OrderBean bean = new OrderBean();
		@Override
		protected void map(LongWritable key, Text value,Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			System.out.println(line);
			String[] fields = line.split(",");
			// System.out.println(fields[0] + " -- " + fields[2]);
			bean.set(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
			// System.out.println(bean.getItemId());
			context.write(bean, NullWritable.get());
		}
	}
	
	static class SecondarySortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
		@Override
		protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context)
				throws IOException, InterruptedException {
			// The key is the first record of the group, i.e. the largest amount for this order
			context.write(key, NullWritable.get());
		}
	}
	
	public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);
		
		job.setJarByClass(SecondarySort.class);
		
		job.setMapperClass(SecondarySortMapper.class);
		job.setReducerClass(SecondarySortReducer.class);
		
		
		job.setOutputKeyClass(OrderBean.class);
		job.setOutputValueClass(NullWritable.class);
		
		FileInputFormat.setInputPaths(job, new Path("f:/order/input"));
		FileOutputFormat.setOutputPath(job, new Path("f:/order/output"));
		
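		// Group keys by order id only, so one reduce() call covers a whole order
		job.setGroupingComparatorClass(ItemIdGroupingComparator.class);
		// Route all records of an order to the same reduce task
		job.setPartitionerClass(ItemIdPartitioner.class);
		job.setNumReduceTasks(3);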
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
	
	
}
