Ubuntu Hadoop Fully Distributed Cluster Setup
OS: Ubuntu 16.04
JDK: jdk1.8.0_101
Hadoop: hadoop2.7.3
You need at least two machines: one serves as the Master node and the rest as Slave nodes. The JDK must be installed and configured on every server.
Here I prepared two servers as nodes:
Master 192.168.92.129
Slave1 192.168.92.130
First, modify the configuration on the Master node:
sudo vim /etc/hosts
and add:
192.168.92.129 Master
192.168.92.130 Slave1
(These entries must of course be added on the Slave1 node as well.)
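As a quick sanity check (assuming the standard glibc getent tool, present on Ubuntu by default), you can confirm on each node that both names resolve to the addresses configured above:

```shell
# Each line should print the IP configured in /etc/hosts for that name
getent hosts Master
getent hosts Slave1
```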
Next, we set up passwordless SSH login from the Master node to the Slave node.
On the Master node the file id_rsa.pub exists under ~/.ssh; send it to the Slave1 node via scp:
scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop
Then, on the Slave1 node, run:
mkdir -p ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
Back on the Master node, test the passwordless login:
ssh Slave1
If you are logged in directly without being prompted for a password, the configuration succeeded.
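The steps above assume ~/.ssh/id_rsa.pub already exists on the Master node. If it does not, a minimal sketch for generating a key pair non-interactively (assuming OpenSSH's ssh-keygen) is:

```shell
#!/bin/sh
# Create ~/.ssh with the permissions sshd expects, then generate an
# RSA key pair with an empty passphrase if none exists yet.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
```

OpenSSH also ships ssh-copy-id, which combines the scp and cat steps above into one command: ssh-copy-id hadoop@Slave1.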
Next, we modify Hadoop's configuration files.
Start with core-site.xml:
vim /usr/lib/hadoop/etc/hadoop/core-site.xml
Open it, then add the following between <configuration> and </configuration>:
<property>
<name>fs.defaultFS</name>
<!-- use the Master hostname, not a loopback address, so slave nodes can reach the NameNode -->
<value>hdfs://Master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/lib/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
Next, hdfs-site.xml:
vim /usr/lib/hadoop/etc/hadoop/hdfs-site.xml
Insert the following in the same way:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/lib/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/lib/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Next, mapred-site.xml (in Hadoop 2.7.3 this file may need to be created from mapred-site.xml.template first):
vim /usr/lib/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
Finally, yarn-site.xml:
vim /usr/lib/hadoop/etc/hadoop/yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Next, we need to register the slave nodes on the master, which is done in the slaves file:
vim /usr/lib/hadoop/etc/hadoop/slaves
This file lists the hosts that run a DataNode. It initially contains localhost, which you may delete or keep (if kept, the Master node runs both a NameNode and a DataNode).
After it, add:
Slave1
Otherwise the DataNode on Slave1 will not be started.
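For example, keeping localhost so that the Master also hosts a DataNode, the finished slaves file would read:

```
localhost
Slave1
```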
Finally, we set the JAVA_HOME configuration in hadoop-env.sh:
vim /usr/lib/hadoop/etc/hadoop/hadoop-env.sh
Find the JAVA_HOME line and change it to:
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_101
Then send the configured Hadoop tree to the Slave node via scp:
scp -r /usr/lib/hadoop hadoop@Slave1:/home/hadoop
Then, on Slave1, move hadoop into the same directory it occupies on the Master.
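A minimal sketch of that step on Slave1 (assuming the same /usr/lib/hadoop path as on the Master and a hadoop user and group, as used elsewhere in this guide):

```shell
# Move the copied tree into place and give the hadoop user ownership
sudo mv /home/hadoop/hadoop /usr/lib/hadoop
sudo chown -R hadoop:hadoop /usr/lib/hadoop
```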
Format HDFS (run once, on the Master node):
/usr/lib/hadoop/bin/hdfs namenode -format
Start Hadoop:
/usr/lib/hadoop/sbin/start-dfs.sh
/usr/lib/hadoop/sbin/start-yarn.sh
/usr/lib/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
Running jps on the Master node should now show:
JobHistoryServer
SecondaryNameNode
Jps
ResourceManager
NameNode
On the Slave node it shows:
Jps
DataNode
NodeManager
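Beyond jps, you can ask the NameNode which DataNodes have actually registered; with one slave (plus the Master, if localhost was kept in the slaves file) the report should list the corresponding live nodes:

```shell
# Print cluster capacity and the list of live/dead DataNodes
/usr/lib/hadoop/bin/hdfs dfsadmin -report
```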
Next, we need to create directories on HDFS:
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -mkdir input
Create a local file named words containing some words:
word
edmond
monkey
broewning
king
...
Upload the words file to HDFS:
hdfs dfs -put words input
We run the example bundled with Hadoop to test that jobs execute correctly:
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
Output similar to the following will then appear:
16/10/13 12:55:19 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
16/10/13 12:55:19 INFO input.FileInputFormat: Total input paths to process : 1
16/10/13 12:55:19 INFO mapreduce.JobSubmitter: number of splits:1
16/10/13 12:55:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476329370564_0003
16/10/13 12:55:20 INFO impl.YarnClientImpl: Submitted application application_1476329370564_0003
16/10/13 12:55:20 INFO mapreduce.Job: The url to track the job: http://15ISK:8088/proxy/application_1476329370564_0003/
16/10/13 12:55:20 INFO mapreduce.Job: Running job: job_1476329370564_0003
16/10/13 12:55:25 INFO mapreduce.Job: Job job_1476329370564_0003 running in uber mode : false
16/10/13 12:55:25 INFO mapreduce.Job: map 0% reduce 0%
16/10/13 12:55:29 INFO mapreduce.Job: map 100% reduce 0%
16/10/13 12:55:33 INFO mapreduce.Job: map 100% reduce 100%
16/10/13 12:55:33 INFO mapreduce.Job: Job job_1476329370564_0003 completed successfully
16/10/13 12:55:33 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=221
FILE: Number of bytes written=238271
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=283
HDFS: Number of bytes written=171
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1771
Total time spent by all reduces in occupied slots (ms)=2005
Total time spent by all map tasks (ms)=1771
Total time spent by all reduce tasks (ms)=2005
Total vcore-milliseconds taken by all map tasks=1771
Total vcore-milliseconds taken by all reduce tasks=2005
Total megabyte-milliseconds taken by all map tasks=1813504
Total megabyte-milliseconds taken by all reduce tasks=2053120
Map-Reduce Framework
Map input records=13
Map output records=12
Map output bytes=204
Map output materialized bytes=221
Input split bytes=120
Combine input records=12
Combine output records=11
Reduce input groups=11
Reduce shuffle bytes=221
Reduce input records=11
Reduce output records=11
Spilled Records=22
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=101
CPU time spent (ms)=1260
Physical memory (bytes) snapshot=459825152
Virtual memory (bytes) snapshot=3895697408
Total committed heap usage (bytes)=353370112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=163
File Output Format Counters
Bytes Written=171
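Once the job completes, the word counts can be read back from HDFS; part-r-00000 is the conventional name of the single reducer's output file:

```shell
# List the job output directory and print the counts
hdfs dfs -ls output
hdfs dfs -cat output/part-r-00000
```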