0003. Setting Up the Hadoop Environment
阿新 · Published 2020-10-16
#### 03-01-Hadoop Directory Structure and Local Mode
##### Unpack the installation packages
~~~
tar -zxvf hadoop-2.7.3.tar.gz -C /root/training
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /root/training
tar -zxvf apache-hive-2.3.0-bin.tar.gz -C /root/training
tar -zxvf hbase-1.3.1-bin.tar.gz -C /root/training
~~~
##### Environment variables: /etc/profile
~~~
JAVA_HOME=/root/training/jdk1.8.0_144
export JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH

HADOOP_HOME=/root/training/hadoop-2.7.3
export HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

HBASE_HOME=/root/training/hbase-1.3.1
export HBASE_HOME
PATH=$HBASE_HOME/bin:$PATH
export PATH

HIVE_HOME=/root/training/apache-hive-2.3.0-bin
export HIVE_HOME
PATH=$HIVE_HOME/bin:$PATH
export PATH
~~~
Make the environment variables take effect:
~~~
source /etc/profile
~~~
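To sanity-check the ordering these prepends produce, here is a minimal sketch using the tutorial's assumed install paths (adjust them to your own locations):

```shell
#!/bin/sh
# Sketch: how the PATH prepends in /etc/profile compose.
# Paths are the tutorial's assumed install locations.
JAVA_HOME=/root/training/jdk1.8.0_144
HADOOP_HOME=/root/training/hadoop-2.7.3
HBASE_HOME=/root/training/hbase-1.3.1
HIVE_HOME=/root/training/apache-hive-2.3.0-bin
export JAVA_HOME HADOOP_HOME HBASE_HOME HIVE_HOME

# Later prepends win: Hive's bin ends up first, then HBase, Hadoop, JDK.
PATH=$JAVA_HOME/bin:$PATH
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
PATH=$HBASE_HOME/bin:$PATH
PATH=$HIVE_HOME/bin:$PATH
export PATH

# Print the first few entries to confirm the ordering
echo "$PATH" | tr ':' '\n' | head -5
```

If `java -version` or `hadoop version` cannot be found after `source /etc/profile`, this ordering is the first thing to check.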
Inspect the directory layout:
~~~
[root@bigdata111 training]# tree -d -L 2
~~~
* -d: list directories only
* -L 2: descend at most two levels
##### Hadoop的目錄結構.png
##### Local mode
Features: there is no HDFS; you can only test MapReduce programs (they do not run on Yarn, but as a standalone Java process).
Setup: edit /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, changing
`export JAVA_HOME=${JAVA_HOME}` to `export JAVA_HOME=/root/training/jdk1.8.0_144`
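The same edit can be made non-interactively with `sed`. The sketch below operates on a scratch copy under a hypothetical `/tmp/hadoop-env-demo` path, so nothing real is modified:

```shell
# Demonstrate the hadoop-env.sh edit on a scratch copy
mkdir -p /tmp/hadoop-env-demo
printf 'export JAVA_HOME=${JAVA_HOME}\n' > /tmp/hadoop-env-demo/hadoop-env.sh

# Replace the placeholder with the concrete JDK path
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/root/training/jdk1.8.0_144|' \
    /tmp/hadoop-env-demo/hadoop-env.sh
cat /tmp/hadoop-env-demo/hadoop-env.sh
```

To edit the real file, point `sed` at /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh instead.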
Test a local-mode MapReduce program.
Note: `rm -rf *` deletes every file under the current directory; be careful where you run it.
~~~
root@ubuntu:~/temp# pwd
/root/temp
root@ubuntu:~/temp# nano data.txt
root@ubuntu:~/temp# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# ls hadoop-mapreduce-examples-2.7.3.jar
hadoop-mapreduce-examples-2.7.3.jar
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
~~~
View the results:
~~~
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# cd /root/temp/output/wc
root@ubuntu:~/temp/output/wc# ls -al
total 20
drwxr-xr-x 2 root root 4096 Oct 16 11:17 .
drwxr-xr-x 3 root root 4096 Oct 16 11:17 ..
-rw-r--r-- 1 root root 55 Oct 16 11:17 part-r-00000
-rw-r--r-- 1 root root 12 Oct 16 11:17 .part-r-00000.crc
-rw-r--r-- 1 root root 0 Oct 16 11:17 _SUCCESS
-rw-r--r-- 1 root root 8 Oct 16 11:17 ._SUCCESS.crc
root@ubuntu:~/temp/output/wc# ls
part-r-00000 _SUCCESS
root@ubuntu:~/temp/output/wc# cat part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
~~~
##### 檢視結果.png
~~~
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
~~~
Here /root/temp/input/data.txt can also be a directory; both paths are local Linux paths.
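What the wordcount example computes can be reproduced with plain shell tools, which is handy for sanity-checking the part-r-00000 output. The data.txt content below is inferred from the counts shown above:

```shell
# Recreate the (inferred) input file in a scratch directory
mkdir -p /tmp/wc-demo
printf 'I love Beijing\nI love China\nBeijing is the capital of China\n' > /tmp/wc-demo/data.txt

# Split on spaces, then count each word: the same map + reduce idea as wordcount
tr ' ' '\n' < /tmp/wc-demo/data.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```

In the C locale the output should match the part-r-00000 listing above line for line.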
#### 03-02-Configuring Hadoop in Pseudo-Distributed Mode
Features: simulates a distributed environment on a single machine and provides all of Hadoop's functionality.
* HDFS: NameNode + DataNode + SecondaryNameNode
* Yarn: ResourceManager + NodeManager
##### Unpack the installation packages
Same as above.
##### Environment variables: /etc/profile
Same as above.
##### 配置檔案.png
(1) Edit /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, changing
`export JAVA_HOME=${JAVA_HOME}`
to
`export JAVA_HOME=/root/training/jdk1.8.0_144`
(2) hdfs-site.xml
~~~
<!--Replication factor for data blocks; the default is 3-->
<!--As a rule, match the replication factor to the number of DataNodes, and never exceed 3-->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<!--Whether HDFS permission checking is enabled; the default is true-->
<!--Keep the default for now; it will be changed to false later-->
<!--
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
-->
~~~
(3) core-site.xml
~~~
<!--Address of the HDFS master node, i.e. the NameNode-->
<!--9000 is the RPC communication port-->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.16.143:9000</value>
</property>

<!--Directory in the host OS where HDFS stores data blocks and metadata-->
<!--Defaults to Linux's tmp directory; it must be changed-->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/training/hadoop-2.7.3/tmp</value>
</property>
~~~
Create the /root/training/hadoop-2.7.3/tmp directory yourself (e.g. `mkdir -p /root/training/hadoop-2.7.3/tmp`).
(4) mapred-site.xml (this file does not exist by default; create it by copying mapred-site.xml.template)
~~~
<!--Framework (container) that MR programs run in-->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
~~~
(5) yarn-site.xml
~~~
<!--Address of the Yarn master node, i.e. the ResourceManager-->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.16.143</value>
</property>

<!--Auxiliary service run by the NodeManager: the MapReduce shuffle-->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
~~~
(6) Format the NameNode
~~~
Command: hdfs namenode -format
Log:     Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
~~~
(7) Start the daemons:
~~~
HDFS: start-dfs.sh
Yarn: start-yarn.sh
Both: start-all.sh
~~~
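After start-all.sh, five daemons should show up in `jps` (plus Jps itself). A small sketch that checks for them, shown here against canned output from the tutorial; on a live node you would set `jps_out=$(jps)` instead:

```shell
# Canned jps output from the tutorial; on a real node use: jps_out=$(jps)
jps_out="2690 NameNode
3219 ResourceManager
3544 NodeManager
2863 DataNode
3071 SecondaryNameNode"

# Verify every expected daemon is present
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$jps_out" | grep -q " $d\$"; then
        echo "$d up"
    else
        echo "$d MISSING"
    fi
done
```

A daemon reported MISSING usually means its log under $HADOOP_HOME/logs is the next place to look.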
~~~
root@bigdata00:~/training/hadoop-2.7.3# jps
2690 NameNode
3219 ResourceManager
3544 NodeManager
3582 Jps
2863 DataNode
3071 SecondaryNameNode
root@bigdata00:~/training/hadoop-2.7.3#
~~~
(8) Web console access:
* HDFS: 192.168.16.143:50070
* Yarn: 192.168.16.143:8088
##### HDFS web console, port 50070
![](0003.搭建Hadoop的環境.assets/50070.png)
##### Yarn web console, port 8088
![](0003.搭建Hadoop的環境.assets/8088.png)
-----------------------------------------------------------------
#### 03-03-Passwordless Login: How It Works and How to Configure It
~~~
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
~~~
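For reference, ssh-copy-id essentially appends the public key to the remote user's ~/.ssh/authorized_keys and fixes the permissions. A sketch of that mechanism on a hypothetical scratch directory (the key string is fake):

```shell
# Stand-in for the remote ~/.ssh (scratch path, not a real home directory)
mkdir -p /tmp/ssh-demo/.ssh
chmod 700 /tmp/ssh-demo/.ssh

# A fake public key standing in for ~/.ssh/id_rsa.pub
echo 'ssh-rsa AAAAB3Nza...FAKE root@bigdata00' > /tmp/ssh-demo/id_rsa.pub

# What ssh-copy-id does on the remote side: append, then lock down permissions
cat /tmp/ssh-demo/id_rsa.pub >> /tmp/ssh-demo/.ssh/authorized_keys
chmod 600 /tmp/ssh-demo/.ssh/authorized_keys
```

If sshd still asks for a password, wrong permissions on .ssh (700) or authorized_keys (600) are the usual cause.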
~~~
root@bigdata00:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9b:77:c7:5c:ef:b3:85:ac:61:24:4d:30:dc:1c:f3:ed root@bigdata00
The key's randomart image is:
+--[ RSA 2048]----+
| .ooo. |
| .ooo . |
| . . .|
| o . |
| S . o E|
| o o + o.|
| o . + * o|
| . o + o.|
| . .+|
+-----------------+
root@bigdata00:~# ls
tools training
root@bigdata00:~# ls -al
total 40
drwx------ 7 root root 4096 Oct 16 12:19 .
drwxr-xr-x 23 root root 4096 Oct 15 20:44 ..
-rw------- 1 root root 55 Oct 16 08:11 .Xauthority
-rw-r--r-- 1 root root 3106 Apr 19 2012 .bashrc
drwx------ 2 root root 4096 Oct 16 07:46 .cache
drwxr-xr-x 2 root root 4096 Oct 16 12:15 .oracle_jre_usage
-rw-r--r-- 1 root root 140 Apr 19 2012 .profile
drwx------ 2 root root 4096 Oct 16 12:42 .ssh
drwxr-xr-x 2 root root 4096 Oct 16 07:57 tools
drwxr-xr-x 6 root root 4096 Oct 16 11:39 training
root@bigdata00:~# cd .ssh
root@bigdata00:~/.ssh# ls -al
total 20
drwx------ 2 root root 4096 Oct 16 12:42 .
drwx------ 7 root root 4096 Oct 16 12:19 ..
-rw------- 1 root root 1675 Oct 16 12:42 id_rsa
-rw-r--r-- 1 root root 396 Oct 16 12:42 id_rsa.pub
-rw-r--r-- 1 root root 666 Oct 16 12:20 known_hosts
root@bigdata00:~/.ssh# cd ..
root@bigdata00:~# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
root@192.168.16.143's password:
Now try logging into the machine, with "ssh 'root@192.168.16.143'", and check in:
~/.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
root@bigdata00:~# ls .ssh/
authorized_keys id_rsa id_rsa.pub known_hosts
root@bigdata00:~# more .ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/XCppmAEL6AnXoYXlmTr639AupthLny6JQ4zF9Jpg
S4mhycZCrHpVCxhERV9p+HzNFPRZBaWluseCOkzbAXbmMsXSucXcrbV+wyg0el+CHuDopJZ4JiAPjK8t
AnSPK1bdggCAVGaI138pU81YMgOntX3gV49CcIEGx9KFF4wLaPMq/PJrr9+omYhkTF50i+oHwl+bG2DL
GZFmJuk3nxF+rsGEHwdDCfBtcoa1f7Si4BA7gf0dEXBlydPMeYM48rgK0XAgNReBZJWBTooGkSXuxHy1
jccIiwH9G+mlZI38WI7YRIx6HZIwzfpG8yVTXahdPamC2MJ+w54dj0jKyVUL root@bigdata00
root@bigdata00:~# ssh 192.168.16.143
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Fri Oct 16 12:47:00 CST 2020
System load: 0.0 Processes: 395
Usage of /: 41.5% of 6.50GB Users logged in: 2
Memory usage: 74% IP address for eth0: 192.168.16.143
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
0 packages can be updated.
0 updates are security updates.
Last login: Fri Oct 16 08:11:46 2020 from 192.168.16.1
root@bigdata00:~# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [192.168.16.143]
192.168.16.143: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
root@bigdata00:~# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
~~~
##### How passwordless login works
![](0003.搭建Hadoop的環境.assets/免密碼登入的原理.png)
##### Wordcount in pseudo-distributed mode
Main commands:
~~~
hdfs dfs -put data.txt /input
cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
# /input/data.txt and /output/wc2 are HDFS paths; /output/wc2 must not already exist
~~~
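Because the job fails if /output/wc2 already exists, reruns usually begin by deleting it; on HDFS that would be `hdfs dfs -rm -r -f /output/wc2`. The guard pattern, demonstrated on a hypothetical local directory as a stand-in:

```shell
# Local stand-in for the HDFS output path (scratch directory)
OUT=/tmp/wc-demo-out/wc2
mkdir -p "$OUT"          # simulate leftovers from a previous run

# Guard: clear the output dir before resubmitting the job
rm -rf "$OUT"
[ ! -d "$OUT" ] && echo "output dir cleared"
```

The same idea keeps rerun scripts idempotent: delete first, then submit.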
~~~
root@bigdata00:~# jps
1710 Jps
root@bigdata00:~# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
root@bigdata00:~# jps
2291 SecondaryNameNode
1894 NameNode
2887 Jps
2602 NodeManager
2447 ResourceManager
2047 DataNode
root@bigdata00:~# hdfs dfs -ls /
root@bigdata00:~# hdfs dfs -mkdir /input
root@bigdata00:~# hdfs dfs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2020-10-16 13:35 /input
root@bigdata00:~# cd /root/temp/input
root@bigdata00:~/temp/input# ls
data.txt
root@bigdata00:~/temp/input# hdfs dfs -put data.txt /input
root@bigdata00:~/temp/input# hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 1 root supergroup 60 2020-10-16 13:36 /input/data.txt
root@bigdata00:~/temp/input# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
20/10/16 13:37:52 INFO client.RMProxy: Connecting to ResourceManager at /192.168.16.143:8032
20/10/16 13:37:54 INFO input.FileInputFormat: Total input paths to process : 1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: number of splits:1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602826392719_0001
20/10/16 13:37:55 INFO impl.YarnClientImpl: Submitted application application_1602826392719_0001
20/10/16 13:37:55 INFO mapreduce.Job: The url to track the job: http://192.168.16.143:8088/proxy/application_1602826392719_0001/
20/10/16 13:37:55 INFO mapreduce.Job: Running job: job_1602826392719_0001
20/10/16 13:38:13 INFO mapreduce.Job: Job job_1602826392719_0001 running in uber mode : false
20/10/16 13:38:13 INFO mapreduce.Job: map 0% reduce 0%
20/10/16 13:38:25 INFO mapreduce.Job: map 100% reduce 0%
20/10/16 13:38:34 INFO mapreduce.Job: map 100% reduce 100%
20/10/16 13:38:36 INFO mapreduce.Job: Job job_1602826392719_0001 completed successfully
20/10/16 13:38:36 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=93
FILE: Number of bytes written=237535
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=166
HDFS: Number of bytes written=55
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9394
Total time spent by all reduces in occupied slots (ms)=6930
Total time spent by all map tasks (ms)=9394
Total time spent by all reduce tasks (ms)=6930
Total vcore-milliseconds taken by all map tasks=9394
Total vcore-milliseconds taken by all reduce tasks=6930
Total megabyte-milliseconds taken by all map tasks=9619456
Total megabyte-milliseconds taken by all reduce tasks=7096320
Map-Reduce Framework
Map input records=3
Map output records=12
Map output bytes=108
Map output materialized bytes=93
Input split bytes=106
Combine input records=12
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=93
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=270
CPU time spent (ms)=3420
Physical memory (bytes) snapshot=286212096
Virtual memory (bytes) snapshot=4438401024
Total committed heap usage (bytes)=138043392
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=60
File Output Format Counters
Bytes Written=55
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output
Found 1 items
drwxr-xr-x - root supergroup 0 2020-10-16 13:38 /output/wc2
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output/wc2
Found 2 items
-rw-r--r-- 1 root supergroup 0 2020-10-16 13:38 /output/wc2/_SUCCESS
-rw-r--r-- 1 root supergroup 55 2020-10-16 13:38 /output/wc2/part-r-00000
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -cat /output/wc2/part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
~~~
-----------------------------------------------------------------
#### 03-04-Setting Up Hadoop in Fully-Distributed Mode
##### SecureCRT sending commands to multiple sessions at once
![](0003.搭建Hadoop的環境.assets/SecureCRT同時給多個Session.png)
##### Set hostnames and IPs: nano /etc/hosts
##### Cluster plan: at least three machines
* 192.168.16.141 bigdata01: NameNode + SecondaryNameNode + ResourceManager
* 192.168.16.138 bigdata02: DataNode + NodeManager
* 192.168.16.139 bigdata03: DataNode + NodeManager
##### Configure passwordless login between every pair of machines
~~~
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.141
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.138
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.139
~~~
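The three commands above can be expressed as one loop. Shown in dry-run form (each command is echoed, not executed); remove the `echo` to distribute the key for real:

```shell
# Dry run: print the key-distribution command for every node in the cluster
for host in 192.168.16.141 192.168.16.138 192.168.16.139; do
    echo ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
done
```

Running the same loop on each of the three machines gives passwordless login between every pair.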
##### Configuration
###### Master-node configuration in fully-distributed mode
![](0003.搭建Hadoop的環境.assets/全分佈模式的配置.png)
0. Unpack Java and Hadoop
1. Set the Java and Hadoop environment variables on all 3 machines and make them take effect
2. hadoop-env.sh
3. hdfs-site.xml
4. core-site.xml
5. mapred-site.xml
6. yarn-site.xml
7. slaves: list the worker-node addresses
    * 192.168.16.138
    * 192.168.16.139
8. Format the NameNode
    * hdfs namenode -format
9. Copy the installed directories from 192.168.16.141 to the worker nodes
    * scp -r /root/training/jdk1.8.0_144 root@192.168.16.138:/root/training
    * scp -r /root/training/jdk1.8.0_144 root@192.168.16.139:/root/training
    * scp -r /root/training/hadoop-2.7.3/ root@192.168.16.138:/root/training
    * scp -r /root/training/hadoop-2.7.3/ root@192.168.16.139:/root/training
10. start-all.sh
~~~
[root@bigdata112 training]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [bigdata112]
bigdata112: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata112.out
bigdata113: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata113.out
bigdata114: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata114.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata112.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata112.out
bigdata114: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata114.out
bigdata113: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata113.out
~~~
~~~
[root@bigdata112 training]# jps
13254 NameNode
13433 SecondaryNameNode
13578 ResourceManager
13835 Jps
~~~
~~~
[root@bigdata113 training]# jps
11847 DataNode
11943 NodeManager
12043 Jps
~~~
~~~
[root@bigdata114 training]# jps
11744 Jps
11548 DataNode
11644 NodeManager
~~~
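The copy in step 9 can likewise be written as a loop over the worker nodes. Dry-run form again (commands are echoed, not executed); it assumes the passwordless login configured earlier is in place:

```shell
# Dry run: print the scp commands that push the installs to the workers
for ip in 192.168.16.138 192.168.16.139; do
    echo scp -r /root/training/jdk1.8.0_144 root@$ip:/root/training
    echo scp -r /root/training/hadoop-2.7.3/ root@$ip:/root/training
done
```

Because the master's configuration files are copied along with hadoop-2.7.3/, the workers need no separate configuration step.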
-----------------------------------------------------------------
#### 03-05-Single Point of Failure in the Master-Slave Architecture