
MapReduce Advanced: Partitioning, Sorting, and Combine

1. Partitioning

1.1 Analyze the business logic first to determine roughly how many partitions are needed.
1.2 Write a class that extends org.apache.hadoop.mapreduce.Partitioner.
1.3 Override the public int getPartition method; based on the business logic (looked up from a database or from configuration), return the same partition number for records that belong in the same partition, as in the example below.
1.4 In the main method, register the Partitioner class: job.setPartitionerClass(DataPartitioner.class);
1.5 Set the number of reducers: job.setNumReduceTasks(6);

 

public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		
		Job job = Job.getInstance(conf);
		
		job.setJarByClass(DataCount.class);
		
		// Mapper and its intermediate output types
		job.setMapperClass(DCMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataInfo.class);
		
		// Reducer and the final output types
		job.setReducerClass(DCReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataInfo.class);
		
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		// custom partitioner; the reducer count comes from the command line and
		// must be at least one greater than the largest partition number returned
		job.setPartitionerClass(DCPartitioner.class);
		
		job.setNumReduceTasks(Integer.parseInt(args[2]));
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);

	}
	// Mapper: parse each input line and emit (phone number, DataInfo)
	public static class DCMapper extends Mapper<LongWritable, Text, Text, DataInfo>{
		
		private Text k = new Text();
		
		@Override
		protected void map(LongWritable key, Text value,
				Mapper<LongWritable, Text, Text, DataInfo>.Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			String[] fields = line.split("\t");
			// field layout of the input log: phone number in column 1,
			// upstream traffic in column 8, downstream traffic in column 9
			String tel = fields[1];
			long up = Long.parseLong(fields[8]);
			long down = Long.parseLong(fields[9]);
			DataInfo dataInfo = new DataInfo(tel, up, down);
			k.set(tel);
			context.write(k, dataInfo);

		}
		
	}
	// Reducer: sum upstream and downstream traffic for each phone number
	public static class DCReducer extends Reducer<Text, DataInfo, Text, DataInfo>{
		
		@Override
		protected void reduce(Text key, Iterable<DataInfo> values,
				Reducer<Text, DataInfo, Text, DataInfo>.Context context)
				throws IOException, InterruptedException {
			long up_sum = 0;
			long down_sum = 0;
			for(DataInfo d : values){
				up_sum += d.getUpPayLoad();
				down_sum += d.getDownPayLoad();
			}
			DataInfo dataInfo = new DataInfo("",up_sum,down_sum);
			
			context.write(key, dataInfo);
		}
		
	}
	// Partitioner: route records to reducers by phone-number prefix
	public static class DCPartitioner extends Partitioner<Text, DataInfo>{
		
		private static Map<String,Integer> provider = new HashMap<String,Integer>();
		
		static{
			provider.put("138", 1);
			provider.put("139", 1);
			provider.put("152", 2);
			provider.put("153", 2);
			provider.put("182", 3);
			provider.put("183", 3);
		}
		@Override
		public int getPartition(Text key, DataInfo value, int numPartitions) {
			// in a real job this mapping could be read from a database or configuration
			String tel_sub = key.toString().substring(0, 3);
			Integer count = provider.get(tel_sub);
			if(count == null){
				// unknown prefixes fall into partition 0
				count = 0;
			}
			// the value returned must lie in [0, numPartitions),
			// so this job needs at least 4 reduce tasks
			return count;
		}
		
	}
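
The code above assumes a DataInfo bean implementing Hadoop's Writable, which the original does not show. Here is a minimal sketch inferred from how it is used above (the DataInfo(tel, up, down) constructor and the getUpPayLoad()/getDownPayLoad() accessors); the field names are assumptions:

	// Sketch of the DataInfo value class assumed by the code above.
	public static class DataInfo implements Writable {

		private String tel;
		private long upPayLoad;
		private long downPayLoad;

		public DataInfo() {} // no-arg constructor required by Hadoop for deserialization

		public DataInfo(String tel, long upPayLoad, long downPayLoad) {
			this.tel = tel;
			this.upPayLoad = upPayLoad;
			this.downPayLoad = downPayLoad;
		}

		@Override
		public void write(DataOutput out) throws IOException {
			out.writeUTF(tel);
			out.writeLong(upPayLoad);
			out.writeLong(downPayLoad);
		}

		@Override
		public void readFields(DataInput in) throws IOException {
			this.tel = in.readUTF();
			this.upPayLoad = in.readLong();
			this.downPayLoad = in.readLong();
		}

		public long getUpPayLoad() { return upPayLoad; }
		public long getDownPayLoad() { return downPayLoad; }

		@Override
		public String toString() {
			// rendered by TextOutputFormat: up, down, and total traffic
			return upPayLoad + "\t" + downPayLoad + "\t" + (upPayLoad + downPayLoad);
		}
	}

With partitions 0 through 3, the job would be launched with something like hadoop jar datacount.jar DataCount /data/input /data/output 4 (the jar name and paths are illustrative).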

 

2. Sorting

By default, MapReduce sorts by k2, the map output key. To customize the sort order, the object being sorted must implement the WritableComparable interface and define the ordering in its compareTo method; using that object as k2 is then enough for the framework to sort by it.

 

public class InfoBean implements WritableComparable<InfoBean>{

	private String account;
	private double income;
	private double expenses;
	private double surplus;
	
	public void set(String account,double income,double expenses){
		this.account = account;
		this.income = income;
		this.expenses = expenses;
		this.surplus = income - expenses;
	}
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(account);
		out.writeDouble(income);
		out.writeDouble(expenses);
		out.writeDouble(surplus);
		
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.account = in.readUTF();
		this.income = in.readDouble();
		this.expenses = in.readDouble();
		this.surplus = in.readDouble();
	}

	@Override
	public int compareTo(InfoBean o) {
		if(this.income == o.getIncome()){
			// incomes are equal: break ties by expenses; Double.compare returns 0
			// for equal values, keeping compareTo consistent with its contract
			return Double.compare(this.expenses, o.getExpenses());
		}
		return this.income > o.getIncome() ? 1 : -1;
	}

	@Override
	public String toString() {
		return  income + "\t" +	expenses + "\t" + surplus;
	}
	public String getAccount() {
		return account;
	}

	public void setAccount(String account) {
		this.account = account;
	}

	public double getIncome() {
		return income;
	}

	public void setIncome(double income) {
		this.income = income;
	}

	public double getExpenses() {
		return expenses;
	}

	public void setExpenses(double expenses) {
		this.expenses = expenses;
	}

	public double getSurplus() {
		return surplus;
	}

	public void setSurplus(double surplus) {
		this.surplus = surplus;
	}

}

	// Mapper: parse each line into an InfoBean and use the bean itself as k2
	public static class SortMapper extends Mapper<LongWritable, Text, InfoBean, NullWritable>{

		private InfoBean k = new InfoBean();
		@Override
		protected void map(
				LongWritable key,
				Text value,
				Mapper<LongWritable, Text, InfoBean, NullWritable>.Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			String[] fields = line.split("\t");
			k.set(fields[0], Double.parseDouble(fields[1]), Double.parseDouble(fields[2]));
			
			context.write(k, NullWritable.get());
			
		}
		
	}
	// Reducer: keys arrive already sorted; write the account back out as the key
	public static class SortReducer extends Reducer<InfoBean, NullWritable, Text, InfoBean>{

		private Text k = new Text();
		@Override
		protected void reduce(InfoBean key, Iterable<NullWritable> values,
				Reducer<InfoBean, NullWritable, Text, InfoBean>.Context context)
				throws IOException, InterruptedException {
			k.set(key.getAccount());
			
			context.write(k, key);
		}
		
	}
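
The original does not show the driver for the sort job. A minimal sketch, assuming an enclosing class named SortStep and input/output paths taken from the command line:

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);

		job.setJarByClass(SortStep.class); // SortStep is an assumed class name

		job.setMapperClass(SortMapper.class);
		// the bean is k2, so the framework sorts by InfoBean.compareTo
		job.setMapOutputKeyClass(InfoBean.class);
		job.setMapOutputValueClass(NullWritable.class);

		job.setReducerClass(SortReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(InfoBean.class);

		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}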

 

3. Combine

The combiner performs a local merge of the map output on the map side, reducing the amount of data shuffled to the reducers. A reducer class can be reused as the combiner, as in the snippet below, but only when the operation is commutative and associative (e.g. summing) and the reducer's input and output types match, because the framework may run the combiner zero or more times.

  job.setCombinerClass(WCReducer.class);
      
  // submit the job
  job.waitForCompletion(true);
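
WCReducer itself is not shown in the original; for a word-count job it would be a summing reducer along these lines, a minimal sketch:

	// Sketch of a word-count reducer that is safe to reuse as a combiner:
	// its input and output types match, and summing is commutative and associative.
	public static class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

		@Override
		protected void reduce(Text key, Iterable<LongWritable> values,
				Reducer<Text, LongWritable, Text, LongWritable>.Context context)
				throws IOException, InterruptedException {
			long sum = 0;
			for (LongWritable v : values) {
				sum += v.get();
			}
			context.write(key, new LongWritable(sum));
		}
	}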