【Big Data Primer 2: YARN and MapReduce】

阿新 • Published: 2018-11-19

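Several nights in a row of late-night study remind me of the night training in boot camp. Before you become a qualified soldier, you have to go through boot camp. In fact, every industry has a boot camp of its own, and without being tempered there it is hard to find your footing in that industry. I admit that some people start with more, but I believe more in what effort can achieve. Some people struggle their whole lives without reaching the point where others begin. I find it shameful when those others waste their lives, and I feel happy for the ones who keep striving all their lives. Even if we are small, I will still try to bloom: "The moss flower is as small as a grain of rice, yet it too learns to bloom like the peony!"
Preface: dedicated to everyone working hard in every position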
1. Introduction
HDFS: storage
MapReduce: computation
Spark, Flink: other computation engines
YARN: resource and job scheduling

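Pseudo-distributed deployment
Requirements: the environment set up, the configuration files filled in with the required parameters, and passwordless SSH so the start scripts can launch the daemons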

The jps command
$ jps
28288 NameNode            (NN)
27120 Jps
28410 DataNode            (DN)
28575 SecondaryNameNode   (SNN)
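A quick scripted version of this check can be handy after starting HDFS. This is only an illustrative sketch: the function name check_hdfs_daemons is hypothetical, and it assumes the usual jps output format of one "<pid> <main class>" pair per line.

```bash
#!/usr/bin/env bash
# check_hdfs_daemons (hypothetical helper): read jps output on stdin and
# print "missing: <daemon>" for each HDFS daemon that is not listed.
# Returns 0 only if NameNode, DataNode and SecondaryNameNode all appear.
check_hdfs_daemons() {
  local out d missing=0
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode; do
    # Compare the second column exactly, so "NameNode" does not
    # accidentally match "SecondaryNameNode".
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a pseudo-distributed node:
#   jps | check_hdfs_daemons && echo "HDFS daemons are up"
```

Exact column matching matters here because a substring search for NameNode would also succeed on a line that only contains SecondaryNameNode.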

2. MapReduce job on YARN
In etc/hadoop, create mapred-site.xml from its template:
$ cp mapred-site.xml.template mapred-site.xml

Configure parameters as follows:
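etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start the ResourceManager and NodeManager daemons:

$ sbin/start-yarn.sh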
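If you prefer to generate these two files from a script, here is a minimal sketch. The function name write_yarn_mr_conf is hypothetical; it writes into whatever directory you pass (for example a scratch directory you then compare against etc/hadoop) and overwrites any mapred-site.xml or yarn-site.xml already there. The property names and values are exactly the ones above.

```bash
#!/usr/bin/env bash
# write_yarn_mr_conf DIR (hypothetical helper)
# Writes mapred-site.xml and yarn-site.xml containing only the two
# properties from the steps above. Existing files in DIR are overwritten.
write_yarn_mr_conf() {
  local dir="$1"
  mkdir -p "$dir"
  # Run MapReduce jobs on YARN instead of the default local job runner.
  cat > "$dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF
  # Auxiliary service the NodeManager runs to serve map output to reducers.
  cat > "$dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
EOF
}

# Usage (from $HADOOP_HOME, after reviewing the generated files):
#   write_yarn_mr_conf etc/hadoop && sbin/start-yarn.sh
```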

Then open the ResourceManager web UI in a browser.

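3. Run an MR job
Linux: the local file system, with commands such as mkdir and ls
HDFS: the distributed file system
-format: bin/hdfs namenode -format formats it once, before first use
hdfs dfs -<command>: the same command names as in Linux, e.g. hdfs dfs -mkdir, hdfs dfs -ls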

Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hdfs dfs -get output output
$ cat output/*
or

View the output files on the distributed filesystem:

$ bin/hdfs dfs -cat output/*
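To see what the grep example computes without a cluster, the pipeline below approximates it on local files: a "map" step emits every match of the regular expression, a "reduce" step counts identical matches, and a final sort orders them by count, highest first. This is a single-machine sketch for understanding the output, not how Hadoop executes the job. The function name local_grep_count is hypothetical, and the tab-separated "count<TAB>match" layout is assumed to resemble the lines in the job's output files.

```bash
#!/usr/bin/env bash
# local_grep_count REGEX FILE... (hypothetical helper)
# Local approximation of the hadoop-mapreduce-examples "grep" job.
local_grep_count() {
  local regex="$1"
  shift
  # "map": print each match on its own line (-o), without file names (-h)
  grep -ohE -- "$regex" "$@" |
    # "reduce": group identical matches and count them
    sort | uniq -c |
    # order by count (descending), then alphabetically by match
    sort -k1,1nr -k2,2 |
    # emit "count<TAB>match"
    awk '{ print $1 "\t" $2 }'
}

# Usage, mirroring the example above on the local copy of the config files:
#   local_grep_count 'dfs[a-z.]+' etc/hadoop/*
```

grep -E uses POSIX extended regular expressions while the Hadoop job uses Java regular expressions; for a simple pattern like dfs[a-z.]+ the two behave the same.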

The same example with absolute HDFS paths, using only core-site.xml as input:

$ bin/hdfs dfs -mkdir /user/hadoop/input
$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
    grep /user/hadoop/input /user/hadoop/output 'fs[a-z.]+'

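4. Make all three HDFS daemons start with the hostname hadoop002
NN: the fs.defaultFS parameter in core-site.xml
DN: the slaves file
SNN: in hdfs-site.xml

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop001:50090</value>
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop001:50091</value>
</property>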
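The three settings can also be written out by a small script so that NN, DN and SNN all use the same hostname. This is a hedged sketch: write_hdfs_host_conf is a hypothetical name, port 9000 for fs.defaultFS is an assumption borrowed from the Apache single-node setup guide (use your cluster's port), and it writes fresh files into the directory you give it, so any other properties you keep in core-site.xml or hdfs-site.xml must be merged back by hand.

```bash
#!/usr/bin/env bash
# write_hdfs_host_conf DIR HOST (hypothetical helper)
# Writes the per-daemon settings so NN, DN and SNN all use HOST.
write_hdfs_host_conf() {
  local dir="$1" host="$2"
  mkdir -p "$dir"
  # NN: fs.defaultFS in core-site.xml (port 9000 is an assumption)
  cat > "$dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${host}:9000</value>
    </property>
</configuration>
EOF
  # DN: the slaves file lists one DataNode host per line
  printf '%s\n' "$host" > "$dir/slaves"
  # SNN: HTTP and HTTPS addresses of the SecondaryNameNode in hdfs-site.xml
  cat > "$dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>${host}:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>${host}:50091</value>
    </property>
</configuration>
EOF
}

# Usage: write into a scratch directory first, then compare with etc/hadoop
#   write_hdfs_host_conf /tmp/hdfs-conf hadoop002
#   diff /tmp/hdfs-conf/core-site.xml etc/hadoop/core-site.xml
```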

5. jps
$ jps
16188 DataNode
16379 SecondaryNameNode
16566 Jps
16094 NameNode

5.1 Location
jps ships with the JDK, in its bin directory:
$ which jps
/usr/java/jdk1.7.0_80/bin/jps

5.2 Other users
As root:
# jps
16188 -- process information unavailable
16607 Jps
16379 -- process information unavailable
16094 -- process information unavailable

As a newly created user:
# useradd jepson
# su - jepson
$ jps
16664 Jps

In this case "process information unavailable" is harmless: the processes belong to another user and are actually still running.

# kill -9 16094
# jps
16188 -- process information unavailable
16379 -- process information unavailable
16702 Jps
16094 -- process information unavailable
# ps -ef|grep 16094
root 16722 16590 0 22:19 pts/4 00:00:00 grep 16094

Here "process information unavailable" for 16094 means the process is really gone: ps finds only the grep command itself. kill -9 gives the JVM no chance to clean up, so its file under /tmp/hsperfdata_${user}/ is left behind and jps keeps listing the PID.

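The correct way to handle "process information unavailable":
1. Find the process ID (pid).
2. Check whether it exists with ps -ef|grep pid.
3. If it exists, the output of step 2 also tells you which user runs the process; su - to that user and check with jps there.
   Do not delete the /tmp/hsperfdata_${user}/pid file in this case: the process keeps running, but jps no longer shows it, and every script that relies on jps will misbehave.
4. If it does not exist, clear the leftover information:
   rm -f /tmp/hsperfdata_${user}/pid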
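Steps 2 to 4 can be bundled into one helper. The sketch below is Linux-only and rests on a few assumptions: the function name check_jps_pid and the HSPERF_ROOT override (there only so it can be tried on a scratch directory) are hypothetical, and it checks /proc/<pid> instead of ps -ef|grep, which also avoids the grep line matching itself as seen above. By default it only lists stale files; pass --remove to delete them.

```bash
#!/usr/bin/env bash
# check_jps_pid PID [--remove] (hypothetical helper, Linux only)
# HSPERF_ROOT is a hypothetical override; the JVM's hsperfdata_<user>
# directories normally live under /tmp.
# Returns 0 if the process is alive, 1 if it is gone, 2 on bad usage.
check_jps_pid() {
  local pid="$1" mode="${2:-}" root="${HSPERF_ROOT:-/tmp}" owner f
  [ -n "$pid" ] || return 2
  # Step 2: does the process still exist?
  if [ -d "/proc/$pid" ]; then
    # Step 3: alive. Report its owner; do NOT delete its hsperfdata file.
    owner=$(stat -c %U "/proc/$pid")
    echo "alive: pid $pid is owned by $owner (su - $owner, then run jps)"
    return 0
  fi
  # Step 4: the process is gone, so any hsperfdata file for it is residue.
  for f in "$root"/hsperfdata_*/"$pid"; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ "$mode" = "--remove" ]; then
      rm -f -- "$f" && echo "removed: $f"
    else
      echo "stale: $f"
    fi
  done
  return 1
}

# Usage (as root), for a PID that jps shows as "process information unavailable":
#   check_jps_pid 16094
#   check_jps_pid 16094 --remove
```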

6. Supplementary commands
ssh root@<IP address> -p 22
ssh root@<IP address> date   (runs a single command on the remote machine)

rz / sz (upload and download files through the terminal; provided by the lrzsz package)

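How do you transfer files between two Linux systems?
hadoop000 -> hadoop002
$ scp test.log root@<IP address>:/tmp/
This copies a file from the current Linux system to the remote machine.

hadoop000 <- hadoop002
$ scp test.log root@<IP address>:/tmp/
(run on hadoop002, pushing the file to hadoop000)

But hadoop002 is a production machine and you are not allowed to log into it, so pull the file from the other side instead:
$ scp root@<IP address>:/tmp/test.log /tmp/rz.log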

In production, however, you will never be given the password.

So set up SSH mutual trust between the machines.
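Mutual trust usually means each machine's public key (for example ~/.ssh/id_rsa.pub, created with ssh-keygen) ends up in ~/.ssh/authorized_keys on every other machine. Once the .pub files have been copied over with scp, appending them can be scripted as below. This is a sketch under assumptions: merge_pub_keys is a hypothetical name, the /tmp/hadoop00X.pub file names in the usage line are made up, and the function only edits files; generating and copying the keys is still up to you.

```bash
#!/usr/bin/env bash
# merge_pub_keys AUTHORIZED_KEYS PUBFILE... (hypothetical helper)
# Appends every key line from the given .pub files to AUTHORIZED_KEYS,
# skipping keys that are already present, then tightens permissions.
merge_pub_keys() {
  local auth="$1" pub line
  shift
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  for pub in "$@"; do
    while IFS= read -r line || [ -n "$line" ]; do
      [ -n "$line" ] || continue
      # -x: whole-line match, -F: fixed string (keys contain "+" and "/")
      grep -qxF -- "$line" "$auth" || printf '%s\n' "$line" >> "$auth"
    done < "$pub"
  done
  # With sshd's default StrictModes, an authorized_keys file (or ~/.ssh)
  # writable by group or others is ignored; 700/600 are the usual modes.
  chmod 700 "$(dirname "$auth")"
  chmod 600 "$auth"
}

# Usage on each machine, after scp-ing the other machines' keys to /tmp:
#   merge_pub_keys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub /tmp/hadoop001.pub /tmp/hadoop002.pub
```

Skipping keys that are already present makes the merge safe to re-run every time another machine's key arrives.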

Pitfalls:
transferring the .pub key files with scp
configuring the IP and hostname of every machine in /etc/hosts
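For the /etc/hosts pitfall, a quick check that every cluster hostname has an entry can save a confusing ssh or scp failure later. The function name missing_hosts is hypothetical; it prints each given name that does not appear on any uncommented line of the file.

```bash
#!/usr/bin/env bash
# missing_hosts HOSTS_FILE NAME... (hypothetical helper)
# Prints every NAME that has no entry in HOSTS_FILE.
missing_hosts() {
  local file="$1" name
  shift
  for name in "$@"; do
    # hosts lines look like "IP canonical_name [aliases...]": skip comment
    # lines, stop at a trailing "#", and compare each name field exactly.
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF && $i !~ /^#/; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
      echo "$name"
    fi
  done
}

# Usage on each machine of the cluster:
#   missing_hosts /etc/hosts hadoop000 hadoop001 hadoop002
```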

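This is boot camp, this is the training corps, this is the intensive training unit: this is where your transformation begins. Never refuse and never fear any round of tempering, because it is this process that shows you how far an ordinary soldier is from an elite one. The process is not comfortable; if it were, everyone would have done it, and it would have lost its value!
Closing words: dedicated to everyone in every industry who strives to become the elite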
