Writing a MapReduce Program from a Template: Analyzing the Basic Website Metric UV

Axin • Published: 2018-04-30


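1. Basic website metrics

PV: page view

The number of page loads. Every time a user opens a page, one view is recorded.

UV: unique visitor

The number of distinct people who visit a site within one day, typically identified by cookie. If a user deletes the browser cookie and visits again, they are counted as a new visitor, which distorts the figure.

VV: visit view (number of visits)

The total number of visits made by all visitors within one day. One visit lasts from the first page opened until the browser is closed.

IP: unique IP count

The number of distinct IP addresses from which the site was accessed within one day.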
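To make the difference between these metrics concrete, here is a minimal in-memory sketch in plain Java (no Hadoop involved). The `Hit` class, its `cookieId` and `ip` fields, and the sample data are assumptions made up for illustration; a real log has many more columns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WebMetrics {
    /** One page request (hypothetical log shape): visitor cookie id and client IP. */
    static final class Hit {
        final String cookieId;
        final String ip;

        Hit(String cookieId, String ip) {
            this.cookieId = cookieId;
            this.ip = ip;
        }
    }

    /** PV: every page load counts once. */
    static long pv(List<Hit> hits) {
        return hits.size();
    }

    /** UV: number of distinct cookie ids. */
    static long uv(List<Hit> hits) {
        Set<String> cookies = new HashSet<>();
        for (Hit hit : hits) {
            cookies.add(hit.cookieId);
        }
        return cookies.size();
    }

    /** IP: number of distinct client addresses. */
    static long ipCount(List<Hit> hits) {
        Set<String> ips = new HashSet<>();
        for (Hit hit : hits) {
            ips.add(hit.ip);
        }
        return ips.size();
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("c1", "10.0.0.1"),
                new Hit("c1", "10.0.0.1"),   // same visitor reloads the page
                new Hit("c2", "10.0.0.1"),   // second visitor behind the same IP
                new Hit("c3", "10.0.0.2"));
        System.out.println("PV=" + pv(hits) + " UV=" + uv(hits) + " IP=" + ipCount(hits));
        // prints: PV=4 UV=3 IP=2
    }
}
```

The sample shows why the three numbers diverge: a reload raises PV only, and two visitors sharing one IP raise UV but not the IP count.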

2. Writing the MapReduce template

Driver

package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Create the job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads from HDFS by default and turns each line into a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: map() is called once per line and splits that line into fields
        // (every null below is a placeholder; fill in your own classes before running)
        job.setMapperClass(null);
        job.setMapOutputKeyClass(null);
        job.setMapOutputValueClass(null);

        // shuffle (optional; remove these lines to keep the defaults)
        job.setPartitionerClass(null);        // partitioning
        job.setGroupingComparatorClass(null); // grouping
        job.setSortComparatorClass(null);     // sorting

        // reduce: reduce() is called once per key, with all values for that key
        job.setReducerClass(null);
        job.setOutputKeyClass(null);
        job.setOutputValueClass(null);

        // output
        Path outPath = new Path(args[1]);
        // getConf() is inherited from Configured and returns the Configuration
        // handed to ToolRunner; set any extra properties on it yourself
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // Delete the output directory if it already exists
        if (fileSystem.exists(outPath)) {
            // true = delete recursively, needed when the path is a directory
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Mapper

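import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRModelMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /**
         * Implement your own business logic here
         */
    }
}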

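Reducer

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        /**
         * Implement this according to your own business requirements
         */
    }
}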

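3. Counting UV per city

Requirement analysis:

UV: unique visitor. Each user is counted once, however many times they visit.

map:

key: CityId (city id), type: Text

value: guid (user id), type: Text

shuffle:

key: CityId

value: {guid, guid, guid, ...}

reduce:

key: CityId

value: the UV count, i.e. the number of distinct guids in the shuffle output for that key

output:

key: CityId

value: UV count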
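The plan above can be tried out in plain Java before touching a cluster. This is only a sketch under stated assumptions: the class and method names are hypothetical, the map output is given directly as (CityId, guid) pairs, and a `TreeMap` stands in for the framework's shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class CityUvSimulation {
    /** Shuffle: collect every guid under its CityId, with keys kept in sorted order. */
    static SortedMap<String, List<String>> shuffle(List<String[]> mapOutput) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : mapOutput) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    /** Reduce: called once per CityId; the UV is the number of distinct guids. */
    static int reduce(List<String> guids) {
        return new HashSet<>(guids).size();
    }

    /** Runs shuffle and reduce over the given map output and returns CityId -> UV. */
    static SortedMap<String, Integer> cityUv(List<String[]> mapOutput) {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : shuffle(mapOutput).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical map output: {CityId, guid}
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"1", "u1"},
                new String[] {"1", "u2"},
                new String[] {"1", "u1"},   // u1 visits city 1 twice, counted once
                new String[] {"2", "u3"});
        System.out.println(cityUv(mapOutput)); // prints: {1=2, 2=1}
    }
}
```

Deduplicating in the reducer with a set is what turns a visit count into a visitor count; summing the values instead would give PV per city.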

MRDriver.java: the MapReduce job flow

package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Create the job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads from HDFS by default and turns each line into a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: map() is called once per line and splits that line into fields
        job.setMapperClass(MRMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        /* // shuffle: the defaults are sufficient for this job
        job.setPartitionerClass(null);        // partitioning
        job.setGroupingComparatorClass(null); // grouping
        job.setSortComparatorClass(null);     // sorting
        */

        // reduce
        job.setReducerClass(MRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // output
        Path outPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // Delete the output directory if it already exists
        if (fileSystem.exists(outPath)) {
            // true = delete recursively, needed when the path is a directory
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

MRMapper.java

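package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final int GUID_INDEX = 5;     // column holding the user id (guid)
    private static final int CITY_ID_INDEX = 24; // column holding the city id

    private final Text mapOutKey = new Text();
    private final Text mapOutValue = new Text();

    // map() is called once per input line
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // Split the line into tab-separated fields; -1 keeps trailing empty fields
        String[] items = value.toString().split("\t", -1);

        // Skip malformed records: too few columns, or an empty city id or guid
        if (items.length <= CITY_ID_INDEX
                || items[CITY_ID_INDEX].isEmpty()
                || items[GUID_INDEX].isEmpty()) {
            return;
        }

        mapOutKey.set(items[CITY_ID_INDEX]);
        mapOutValue.set(items[GUID_INDEX]);
        context.write(mapOutKey, mapOutValue);
    }
}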

MRReducer.java

package mapreduce;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRReducer extends Reducer<Text, Text, Text, IntWritable> {

    // reduce() is called once per key (city id), with every guid emitted for that key
    @Override
    protected void reduce(Text key, Iterable<Text> texts, Context context)
            throws IOException, InterruptedException {

        // A set keeps each guid once, so its size is the UV for this city
        Set<String> guids = new HashSet<String>();

        for (Text text : texts) {
            // Hadoop reuses the Text instance between iterations, so copy it to a String
            guids.add(text.toString());
        }

        context.write(key, new IntWritable(guids.size()));
    }
}

4. Understanding how MapReduce runs wordcount

input: reads data from HDFS by default

 Path inPath = new Path(args[0]);
 FileInputFormat.setInputPaths(job,inPath);

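Each line of input is converted into a key-value pair (the split step). The MapReduce framework does this automatically.

The output is the line's byte offset (key) and the line's content (value).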


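mapper: tokenize and emit

This is where data filtering, filling in missing data, and field formatting happen.

Input: the output of the input step.

The split <key,value> pairs are handed to the user-defined map method, which produces new <key,value> pairs.

map is called once per line.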

In word count, the map step splits each line into words and emits <word,1> for every word.

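shuffle: partition, group, sort

Output:

<Bye,{1}>

<Hello,{1}>

<World,{1,1}>

The <key,value> pairs emitted by map are sorted by key on the map side to form the mapper's final output; values that share a key are then grouped together for the reducer.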
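The partition step decides which reducer receives a key. Hadoop's default `HashPartitioner` masks the sign bit of the key's hash code and takes the remainder modulo the number of reduce tasks. The sketch below reproduces that arithmetic in plain Java; it uses `String` keys to stay dependency-free, which is an assumption for illustration, since a real job hashes `Text` bytes and so lands keys on different partition numbers.

```java
class HashPartitionSketch {
    /**
     * Same arithmetic as the default hash partitioning: clear the sign bit
     * so the result is never negative, then take the remainder.
     */
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // hypothetical number of reducers
        for (String word : new String[] {"Bye", "Hello", "World"}) {
            System.out.println(word + " -> reducer " + partition(word, numReduceTasks));
        }
    }
}
```

Because the result depends only on the key, every <World,1> pair goes to the same reducer no matter which mapper produced it, which is what makes the later grouping possible.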

Reduce: reduce is called once per key

The values in each key's List<value> are summed.

output: the reduce output is written to HDFS
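The five phases can be walked through end to end in plain Java. This is a rough in-memory sketch, not Hadoop code: the class and method names are hypothetical, and the two input lines are assumed because they reproduce the shuffle output listed above.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class WordCountPhases {
    /** input: one (byte offset, line) record per line of text. */
    static Map<Long, String> input(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.put(offset, line);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
        }
        return records;
    }

    /** map: called once per line; emits (word, 1) for every token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** shuffle: sort by key and group the values that share a key. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** reduce: called once per key; sums that key's values. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Chains input, map, shuffle and reduce; the returned map plays the role of the output. */
    static SortedMap<String, Integer> run(String text) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input(text).values()) {
            mapped.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run("Hello World\nBye World")); // prints: {Bye=1, Hello=1, World=2}
    }
}
```

In this sketch the second line starts at offset 12, since "Hello World" is 11 bytes plus one newline; that offset is the key the mapper receives and, as in most word count mappers, ignores.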
