Explaining Key MapReduce Concepts Through a Worked Example -------- In-Memory Sorting
Posted by 阿新 · 2018-12-20
TOP N
Input data:
hello qianfeng
hello qianfeng
qianfeng is best
qianfeng better
hadoop is good
spark is nice
Expected result, the top three words by count: qianfeng 4, is 3, hello 2

The plan: the Mapper emits (word, "1") for every word; the Reducer sums the counts per word and buffers each "word_count" pair in an in-memory list, then sorts that list in cleanup() and writes out the first three entries.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MemSort implements Tool {

    private Configuration conf;

    /**
     * Custom Mapper: emits (word, "1") for every word in the line.
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        Text k = new Text();
        Text v = new Text("1");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] lines = line.split(" ");
            for (String s : lines) {
                k.set(s);
                context.write(k, v);
            }
        }
    }

    /**
     * Custom Reducer: sums the counts per word, buffers each result in a list,
     * then sorts the list in cleanup() and emits the top three.
     */
    static class MyReducer extends Reducer<Text, Text, Text, Text> {
        List<String> li = new ArrayList<String>();

        @Override
        protected void reduce(Text key, Iterable<Text> value, Context context)
                throws IOException, InterruptedException {
            int counter = 0;
            for (Text t : value) {
                counter += Integer.parseInt(t.toString());
            }
            // Buffer each word with its total count,
            // e.g. li = (qianfeng_4, is_3, hello_2, ...)
            li.add(key.toString() + "_" + counter);
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Exchange sort on the second field (the count), descending
            for (int i = 0; i < li.size() - 1; i++) {
                for (int j = i + 1; j < li.size(); j++) {
                    if (Integer.parseInt(li.get(i).split("_")[1])
                            < Integer.parseInt(li.get(j).split("_")[1])) {
                        String tmp = li.get(i);
                        li.set(i, li.get(j));
                        li.set(j, tmp);
                    }
                }
            }
            // Emit the top three (guard against fewer than three distinct words)
            for (int i = 0; i < Math.min(3, li.size()); i++) {
                String[] l = li.get(i).split("_");
                context.write(new Text(l[0]), new Text(l[1]));
            }
        }
    }

    @Override
    public void setConf(Configuration conf) {
        conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
        // Store the configuration so getConf() returns the configured instance
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf == null ? new Configuration() : conf;
    }

    /**
     * Driver method.
     */
    @Override
    public int run(String[] args) throws Exception {
        // 1. Get the Configuration object
        Configuration conf = getConf();
        // 2. Create the job
        Job job = Job.getInstance(conf, "model01");
        // 3. Set the class that runs the job
        job.setJarByClass(MemSort.class);
        // 4. Set the map-side properties
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // 5. Set the reduce-side properties
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Delete the output directory if it already exists
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(new Path(args[1]))) {
            fs.delete(new Path(args[1]), true);
        }
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 6. Submit the job and wait for completion
        return job.waitForCompletion(true) ? 0 : 1;
    }

    /**
     * Main entry point for the job.
     */
    public static void main(String[] args) {
        try {
            // Parse generic Hadoop options out of the arguments
            String[] argss = new GenericOptionsParser(new Configuration(), args)
                    .getRemainingArgs();
            System.exit(ToolRunner.run(new MemSort(), argss));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
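One caveat with this in-memory approach: the Reducer's list holds every distinct word, so it only works when the vocabulary fits in the reducer's heap. A common refinement is to keep just the current top N entries in a size-bounded sorted structure. Below is a minimal sketch of that idea, not part of the original code: TopNReducer, TOP_N, and the topN field are names introduced here for illustration, and it assumes the same Hadoop imports as above plus java.util.Map and java.util.TreeMap. Note that keying the TreeMap by count means two words with equal counts overwrite each other, so a production version would need a tie-breaking key or comparator.

// Hypothetical alternative reducer: bounded Top-N instead of buffering all words.
static class TopNReducer extends Reducer<Text, Text, Text, Text> {
    private static final int TOP_N = 3;
    // Sorted by count; firstKey() is always the smallest count still retained.
    // Caveat: equal counts collide, since the count is the map key.
    private final TreeMap<Integer, String> topN = new TreeMap<Integer, String>();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int counter = 0;
        for (Text t : values) {
            counter += Integer.parseInt(t.toString());
        }
        topN.put(counter, key.toString());
        if (topN.size() > TOP_N) {
            topN.remove(topN.firstKey()); // evict the smallest count
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // descendingMap() walks from the largest count down to the smallest
        for (Map.Entry<Integer, String> e : topN.descendingMap().entrySet()) {
            context.write(new Text(e.getValue()), new Text(e.getKey().toString()));
        }
    }
}

On the sample data, swapping MyReducer for a reducer like this should produce the same qianfeng 4 / is 3 / hello 2 output while holding at most N entries in memory. To try either version, package the class into a jar and run it with something like hadoop jar memsort.jar MemSort <input path> <output path> (the jar name here is just a placeholder).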