Big Data: Hadoop Streaming Programming in Practice with C++, PHP, and Python

阿新 • Published: 2018-04-02

Tags: big data programming, PHP, Python programming, applications of C

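The Streaming framework lets programs written in any language run inside Hadoop MapReduce, which makes it easy to port existing programs to the Hadoop platform and matters a great deal for Hadoop's extensibility. Below we implement the Hadoop WordCount job three times, in C++, PHP, and Python.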
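What makes this language independence possible is a plain-text contract: a mapper or reducer reads lines from standard input and writes lines to standard output, and by default Streaming treats the text before the first tab as the key and the rest as the value. The following is a minimal sketch of that splitting rule, not Hadoop's own code; the function name `split_key_value` and the sample lines are made up for illustration, and a missing value is represented here as an empty string.

```python
def split_key_value(line, separator="\t"):
    """Split one line of mapper output into (key, value).

    Text before the first separator is the key; the rest is the value.
    A line without a separator becomes a key with an empty value
    (an assumption of this sketch).
    """
    line = line.rstrip("\n")
    key, _, value = line.partition(separator)
    return key, value


# A mapper written in any language only has to print lines like these.
mapper_output = ["hadoop\t1", "streaming\t1", "hadoop\t1", "no_separator_here"]

for raw in mapper_output:
    print(split_key_value(raw))
```

All three implementations below emit `word<TAB>1` from the mapper for exactly this reason.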

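Exercise 1: WordCount in C++

Code:

1) The WordCount Mapper in C++. Save the file as mapper.cpp. The full code follows.

    #include <iostream>
    #include <string>

    using namespace std;

    int main() {
        string key;
        string value = "1";
        // Emit "<word>\t1" for every whitespace-separated token on stdin.
        while (cin >> key) {
            cout << key << "\t" << value << endl;
        }
        return 0;
    }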

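2) The WordCount Reducer in C++. Save the file as reducer.cpp. The full code follows.

    #include <iostream>
    #include <map>
    #include <string>

    using namespace std;

    int main() {
        string key;
        int value;
        map<string, int> word2count;
        map<string, int>::iterator it;
        // Read "<word>\t<count>" pairs and add each count to the word's total.
        // operator[] creates a missing entry with the value 0.
        while (cin >> key >> value) {
            word2count[key] += value;
        }
        // Print the totals; std::map iterates in key order.
        for (it = word2count.begin(); it != word2count.end(); ++it) {
            cout << it->first << "\t" << it->second << endl;
        }
        return 0;
    }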
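The reducer above keeps every word in a map until the input ends. Because the reducer's input arrives sorted by key, both from `sort` in a local test and from the shuffle on the cluster, the totals can also be computed one word at a time. The sketch below shows that alternative in Python using `itertools.groupby`; it is a swapped-in technique rather than the article's method, and the helper names and sample data are hypothetical.

```python
from itertools import groupby


def parse(lines):
    """Yield (word, count) pairs from tab-separated 'word\\tcount' lines."""
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        # Skip lines that do not look like '<word>\t<non-negative integer>'.
        if word and count.isdigit():
            yield word, int(count)


def reduce_sorted(lines):
    """Sum counts per word, assuming equal words arrive on adjacent lines."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Hypothetical sorted mapper output, as `sort` or the shuffle would deliver it.
sample = ["hadoop\t1", "hadoop\t1", "python\t1", "streaming\t1", "streaming\t1"]
for word, total in reduce_sorted(sample):
    print("%s\t%d" % (word, total))
```

This version only holds one word's running total in memory, which helps when the vocabulary is large; it gives wrong totals if the input is not sorted.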

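Steps to test and run the C++ WordCount

1) Install the C++ compiler

If no C++ compiler is installed on the Linux machine, install one from the package repository:

    yum -y install gcc-c++

2) Compile the C++ files into executables

The C++ programs must be compiled into executables before they can run:

    g++ -o mapper mapper.cpp

    g++ -o reducer reducer.cpp

3) Test locally

Before running the C++ WordCount on the cluster, test and debug it locally on Linux so that you know it will behave correctly on the cluster. The test command is:

    cat djt.txt | ./mapper | sort | ./reducer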
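The local test pipeline mirrors what the cluster does: map each line, sort the intermediate pairs by key, then reduce. As a rough illustration, the same three stages can be chained inside one Python process. The function names and the two sample lines standing in for djt.txt are assumptions made for this sketch.

```python
def map_words(lines):
    """Mapper stage: emit 'word\\t1' for each whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_counts(sorted_lines):
    """Reducer stage: add up the counts for each word."""
    totals = {}
    for line in sorted_lines:
        word, count = line.split("\t")
        totals[word] = totals.get(word, 0) + int(count)
    return totals


def run_pipeline(lines):
    """Chain map -> sort -> reduce, like `cat | mapper | sort | reducer`."""
    return reduce_counts(sorted(map_words(lines)))


# Hypothetical stand-in for the contents of djt.txt.
text = ["hello hadoop", "hello streaming"]
print(run_pipeline(text))
```

Here `sorted` plays the role that `sort` plays on the command line and that the shuffle phase plays on the cluster.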

4) Run on the cluster

Change to the Hadoop installation directory and submit the C++ WordCount job to count the words:

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -D mapred.reduce.tasks=2 \
        -mapper "./mapper" \
        -reducer "./reducer" \
        -file mapper \
        -file reducer \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out
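Setting `mapred.reduce.tasks=2` runs two reduce tasks, and every key is routed to exactly one of them, so each reducer sees all the counts for the words it owns. The sketch below illustrates that routing idea only: `zlib.crc32` is a stand-in chosen for this example and is not the hash Hadoop uses, and the function names and sample pairs are hypothetical.

```python
import zlib


def pick_reducer(key, num_reducers=2):
    """Map a key to a reducer index; equal keys always get the same index."""
    # zlib.crc32 is only a stand-in for Hadoop's own key hash (assumption).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition(pairs, num_reducers=2):
    """Group (word, count) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[pick_reducer(word, num_reducers)].append((word, count))
    return buckets


pairs = [("hadoop", 1), ("python", 1), ("hadoop", 1), ("php", 1)]
for index, bucket in enumerate(partition(pairs)):
    print(index, bucket)
```

Because both `hadoop` pairs land in the same bucket, the per-reducer totals are already final and need no further merging across reducers.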

If the job produces the expected word counts, the C++ implementation of WordCount works.

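Exercise 2: WordCount in PHP

Code:

1) The WordCount Mapper in PHP. Save the file as wc_mapper.php. The full code follows.

    #!/usr/bin/php
    <?php
    error_reporting(E_ALL ^ E_NOTICE);

    // Read stdin line by line and emit "<word>\t1" for every word.
    while (($line = fgets(STDIN)) !== false) {
        $line = trim($line);
        // Split on non-word characters and drop empty pieces.
        $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            echo $word, chr(9), "1", PHP_EOL;
        }
    }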
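Note that this mapper tokenizes differently from the other two: splitting on `\W` breaks words at punctuation, while `cin >> key` in C++ and `split()` in Python break only at whitespace, so the three versions can report different counts for the same text. The following Python 3 sketch contrasts the two rules. It approximates the PHP behavior with `re.ASCII`, and the function names and sample line are made up for illustration.

```python
import re


def split_whitespace(line):
    """Tokenize like `cin >> key` or Python's str.split(): on whitespace."""
    return line.split()


def split_non_word(line):
    """Tokenize roughly like preg_split('/\\W/', ..., PREG_SPLIT_NO_EMPTY)."""
    # re.ASCII limits \W to the complement of [a-zA-Z0-9_].
    return [token for token in re.split(r"\W", line, flags=re.ASCII) if token]


line = "Hello, Hadoop! hello-world"
print(split_whitespace(line))
print(split_non_word(line))
```

With whitespace splitting, `Hello,` keeps its comma and is counted separately from `Hello`; with non-word splitting, `hello-world` becomes two words.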

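2) The WordCount Reducer in PHP. Save the file as wc_reducer.php. The full code follows.

    #!/usr/bin/php
    <?php
    error_reporting(E_ALL ^ E_NOTICE);

    $word2count = array();

    // Read "<word>\t<count>" lines and accumulate a total per word.
    while (($line = fgets(STDIN)) !== false) {
        $line = trim($line);
        list($word, $count) = explode(chr(9), $line);
        $count = intval($count);
        // Initialize the entry first so no undefined-key message is raised.
        if (!isset($word2count[$word])) {
            $word2count[$word] = 0;
        }
        $word2count[$word] += $count;
    }

    // Print the totals as "<word>\t<count>".
    foreach ($word2count as $word => $count) {
        echo $word, chr(9), $count, PHP_EOL;
    }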

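Steps to test and run the PHP WordCount

1) Install PHP

If PHP is not installed on the Linux machine, install it from the package repository:

    yum -y install php

2) Test locally

Before running the PHP WordCount on the cluster, test and debug it locally on Linux so that you know it will behave correctly on the cluster. The test command is:

    cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php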

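3) Run on the cluster

Change to the Hadoop installation directory and submit the PHP WordCount job to count the words. Hadoop refuses to write to an output directory that already exists, so delete /dajiangtai/out from the previous exercise (or choose another path) before submitting.

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -D mapred.reduce.tasks=2 \
        -mapper "php wc_mapper.php" \
        -reducer "php wc_reducer.php" \
        -file wc_mapper.php \
        -file wc_reducer.php \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out

If the job produces the expected word counts, the PHP implementation of WordCount works.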

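Exercise 3: WordCount in Python

Code:

1) The WordCount Mapper in Python. Save the file as Mapper.py. The full code follows.

    #!/usr/bin/env python
    import sys

    # Read stdin line by line and emit "<word>\t1" for every word.
    for line in sys.stdin:
        line = line.strip()
        # split() without arguments splits on whitespace and drops empty strings.
        for word in line.split():
            print('%s\t%s' % (word, 1))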

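2) The WordCount Reducer in Python. Save the file as Reducer.py. The full code follows.

    #!/usr/bin/env python
    from operator import itemgetter
    import sys

    word2count = {}

    # Read "<word>\t<count>" lines and accumulate a total per word.
    for line in sys.stdin:
        line = line.strip()
        try:
            word, count = line.split()
            count = int(count)
            word2count[word] = word2count.get(word, 0) + count
        except ValueError:
            # Skip lines that are not exactly a word followed by an integer.
            pass

    # Print the totals sorted by word.
    sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
    for word, count in sorted_word2count:
        print('%s\t%s' % (word, count))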

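Steps to test and run the Python WordCount

1) Install Python

If Python is not installed on the Linux machine, install it from the package repository:

    yum -y install python27

2) Test locally

Before running the Python WordCount on the cluster, test and debug it locally on Linux so that you know it will behave correctly on the cluster. The test command is:

    cat djt.txt | python Mapper.py | sort | python Reducer.py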

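3) Run on the cluster

Change to the Hadoop installation directory and submit the Python WordCount job to count the words. As before, the output directory /dajiangtai/out must not exist when the job is submitted.

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -D mapred.reduce.tasks=2 \
        -mapper "python Mapper.py" \
        -reducer "python Reducer.py" \
        -file Mapper.py \
        -file Reducer.py \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out

If the job produces the expected word counts, the Python implementation of WordCount works.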
