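Hadoop: Distributed Environment Configuration with CentOS + Hadoop 2.5.2
I. Basic environment preparation
OS: (VMware) CentOS-6.5-x86_64-bin-DVD1.iso
Hadoop version: hadoop-2.5.2
JDK version: jdk-7u72-linux-x64.tar.gz
1. Cluster machines
A three-node test cluster: one master (liuyazhuang-01) and two slaves (liuyazhuang-02, liuyazhuang-03).
Add the following entries to /etc/hosts: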
192.168.1.112 liuyazhuang-01
192.168.1.113 liuyazhuang-02
192.168.1.114 liuyazhuang-03
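Note: do not keep the 127.0.0.1 localhost entry; at the very least, none of the hostnames above may resolve to 127.0.0.1, or the daemons bind to the loopback interface and the other nodes cannot reach them.
Copy the file to the other two machines (run as root, since /etc/hosts is writable only by root):
scp /etc/hosts liuyazhuang-02:/etc/hosts
scp /etc/hosts liuyazhuang-03:/etc/hosts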
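As a quick sanity check on the hosts file, a small shell sketch along these lines can flag any cluster hostname that is still mapped to a loopback address. The function name check_hosts_file is hypothetical (it is not part of Hadoop or CentOS), and the sketch only looks at the file it is given, not at DNS.

```shell
# check_hosts_file (hypothetical helper): print every given hostname that a
# hosts-format file maps to a 127.x.x.x address.
# $1 = hosts file, remaining arguments = hostnames to check.
check_hosts_file() {
    file=$1
    shift
    for host in "$@"; do
        # Skip comment lines, then search columns 2..NF of loopback entries.
        awk -v h="$host" '
            /^[[:space:]]*#/ { next }
            $1 ~ /^127\./ { for (i = 2; i <= NF; i++) if ($i == h) print h }
        ' "$file"
    done
}
```

Running check_hosts_file /etc/hosts liuyazhuang-01 liuyazhuang-02 liuyazhuang-03 should print nothing when the file is set up as described above.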
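2. Set up passwordless SSH login for the Linux user
Generate a key pair with an empty passphrase and authorize it. The same public key must also be appended to ~/.ssh/authorized_keys on each slave so that the master can log in to them without a password.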
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
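The two commands above authorize the key only on the machine where they run. One way to install the public key on the slaves is ssh-copy-id, which ships with the OpenSSH client; the sketch below wraps it in a loop. The helper name push_key and the DRY_RUN switch are illustrative assumptions, and each host asks for the account password once.

```shell
# push_key (hypothetical helper): install ~/.ssh/id_dsa.pub on each host given
# as an argument. With DRY_RUN=1 (the default here) it only prints the commands.
push_key() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_dsa.pub $host"
        else
            ssh-copy-id -i "$HOME/.ssh/id_dsa.pub" "$host"
        fi
    done
}
```

For example, DRY_RUN=0 push_key liuyazhuang-02 liuyazhuang-03 performs the copy, after which ssh liuyazhuang-02 should log in without a prompt.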
3. Java environment configuration
1) JAVA_HOME is /usr/local/java/jdk1.7.0_72
2) Append the following to /etc/profile:
JAVA_HOME=/usr/local/java/jdk1.7.0_72
CLASS_PATH=$JAVA_HOME/lib
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME CLASS_PATH
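Because the JDK path is typed by hand in more than one file, a typo is easy to make. A minimal sketch for confirming that a directory really contains a JDK follows; check_java_home is a hypothetical helper name.

```shell
# check_java_home (hypothetical helper): succeed only if DIR/bin/java exists
# and is executable. $1 = candidate JAVA_HOME directory.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1/bin/java"
    else
        echo "missing: $1/bin/java" >&2
        return 1
    fi
}
```

After source /etc/profile, check_java_home "$JAVA_HOME" should report ok on every node.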
II. Download and extract hadoop-2.5.2.tar.gz
~/data$ pwd
/home/hadoop/data
~/data$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
~/data$ tar zxvf hadoop-2.5.2.tar.gz
III. Configure environment variables
~/data$ gedit /etc/profile
Append the following (editing /etc/profile requires root privileges):
#HADOOP VARIABLES START
export HADOOP_INSTALL=/home/hadoop/data/hadoop-2.5.2
export HADOOP_HOME=$HADOOP_INSTALL
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#HADOOP VARIABLES END
Apply the configuration:
~/data$ source /etc/profile
Also set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/local/java/jdk1.7.0_72
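The hadoop-env.sh edit can also be scripted, which helps when the same change has to be repeated on several nodes. The sketch below assumes GNU sed (as on CentOS) and assumes the file already contains a line beginning with export JAVA_HOME=; set_java_home is a hypothetical helper name.

```shell
# set_java_home (hypothetical helper): rewrite the "export JAVA_HOME=..." line
# of a hadoop-env.sh file in place.
# $1 = path to hadoop-env.sh, $2 = JDK directory.
set_java_home() {
    # "|" is the sed delimiter, so the slashes in the path need no escaping.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1"
}
```

For example: set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /usr/local/java/jdk1.7.0_72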
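IV. Edit $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following inside the <configuration> element (fs.default.name is the deprecated Hadoop 1 name of fs.defaultFS; Hadoop 2.5.2 still accepts it):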
<property>
<name>fs.default.name</name>
<value>hdfs://liuyazhuang-01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/hadoop-${user.name}</value>
</property>
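The properties above must sit inside the file's <configuration> root element. As a sketch of what the complete file could look like, the function below writes one into a directory of your choice; write_core_site is a hypothetical helper, and writing to a scratch directory first avoids overwriting a working configuration.

```shell
# write_core_site (hypothetical helper): write a complete core-site.xml.
# $1 = directory to write into (for example a scratch directory).
write_core_site() {
    # The quoted 'EOF' stops the shell from trying to expand ${user.name},
    # which Hadoop itself substitutes when it reads the file.
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://liuyazhuang-01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/hadoop-2.5.2/hadoop-${user.name}</value>
  </property>
</configuration>
EOF
}
```

The other *-site.xml files in the following sections use the same <configuration> wrapper.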
V. Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>liuyazhuang-01</value>
</property>
For more yarn-site.xml parameters, see: http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
VI. Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
mapred-site.xml does not exist by default; copy mapred-site.xml.template to mapred-site.xml:
~/data/hadoop-2.5.2$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Add the following:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
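After editing several XML files by hand, it can be useful to read a value back and confirm it is what you intended. The sketch below is a deliberately simple parser, not a Hadoop tool: get_property is a hypothetical helper, and it assumes each <name> and <value> tag sits on its own line, as in the snippets in this article.

```shell
# get_property (hypothetical helper): print the value of one property from a
# Hadoop-style XML file. $1 = XML file, $2 = property name.
# Assumes one <name>...</name> or <value>...</value> element per line.
get_property() {
    awk -v want="$2" '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); name = $0; next }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); if (name == want) { print; exit } }
    ' "$1"
}
```

For example, get_property etc/hadoop/mapred-site.xml mapreduce.framework.name should print yarn once the property above is in place.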
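VII. Configure hdfs-site.xml (optional; the default parameters also work)
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
This file must be present on every host in the cluster. It specifies the local directories each host uses for NameNode and DataNode storage. In Hadoop 2.x, dfs.name.dir and dfs.data.dir are deprecated names for dfs.namenode.name.dir and dfs.datanode.data.dir; both forms are accepted.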
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/name1,/home/hadoop/data/hadoop-2.5.2/name2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/data1,/home/hadoop/data/hadoop-2.5.2/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
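Both directory properties take a comma-separated list. Hadoop normally creates these directories itself, but creating them up front is a cheap way to confirm early that the hadoop user can write to those locations. A minimal sketch, with make_storage_dirs as a hypothetical helper name:

```shell
# make_storage_dirs (hypothetical helper): create every directory named in a
# comma-separated list, the format used by dfs.name.dir and dfs.data.dir.
# $1 = comma-separated list of directories.
make_storage_dirs() {
    # Turn commas into newlines and create each non-empty entry.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ]; then
            mkdir -p "$dir"
        fi
    done
}
```

For example: make_storage_dirs /home/hadoop/data/hadoop-2.5.2/data1,/home/hadoop/data/hadoop-2.5.2/data2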
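VIII. Configure slaves
This file tells Hadoop which hosts are slave nodes. When the cluster is started from the master, the start scripts then launch the DataNode and NodeManager daemons on those machines automatically.
Edit $HADOOP_HOME/etc/hadoop/slaves
with the following content: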
liuyazhuang-02
liuyazhuang-03
IX. Sync the Hadoop directory to each slave host
Because passwordless SSH is set up, no password is needed.
~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 liuyazhuang-02:/home/hadoop/data/hadoop-2.5.2
~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 liuyazhuang-03:/home/hadoop/data/hadoop-2.5.2
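With more than a couple of slaves, the copy is easier to drive from the slaves file itself. The sketch below is one way to do that; sync_to_slaves and the DRY_RUN switch are hypothetical. It copies into the parent directory on the target so that a rerun merges into the existing hadoop-2.5.2 directory instead of nesting a second copy inside it.

```shell
# sync_to_slaves (hypothetical helper): copy a directory to the same parent
# path on every host listed in a slaves file.
# $1 = slaves file, $2 = directory to copy. DRY_RUN=1 (default) only prints.
sync_to_slaves() {
    parent=$(dirname "$2")
    # Skip blank lines and comment lines in the slaves file.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | while IFS= read -r host; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "scp -r $2 $host:$parent/"
        else
            # Redirect stdin so scp cannot swallow the remaining host list.
            scp -r "$2" "$host:$parent/" < /dev/null
        fi
    done
}
```

For example: DRY_RUN=0 sync_to_slaves etc/hadoop/slaves /home/hadoop/data/hadoop-2.5.2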
X. Format HDFS
~/data/hadoop-2.5.2$ ./bin/hdfs namenode -format
or, using the older form that Hadoop 2.x still accepts but reports as deprecated:
~$ hadoop namenode -format
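Formatting initializes a fresh, empty namespace, so running it again on a NameNode that already holds data discards the existing HDFS metadata. A formatted NameNode directory contains a current/VERSION file, which the sketch below uses as a guard; is_formatted is a hypothetical helper name.

```shell
# is_formatted (hypothetical helper): succeed if the given NameNode storage
# directory already holds metadata, i.e. contains current/VERSION.
# $1 = NameNode storage directory.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}
```

For example: is_formatted /home/hadoop/data/hadoop-2.5.2/name1 || ./bin/hdfs namenode -format. If hdfs-site.xml is left at its defaults, the NameNode directory is dfs/name under hadoop.tmp.dir instead.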
XI. Start the Hadoop cluster
~/data/hadoop-2.5.2$ ./sbin/start-dfs.sh
~/data/hadoop-2.5.2$ ./sbin/start-yarn.sh
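Once both scripts return, the JDK's jps tool lists the Java processes on a node, which gives a quick way to see whether the expected daemons came up. The sketch below reads jps output and prints any expected daemon that is missing; missing_daemons is a hypothetical helper. With the configuration in this article, the master would typically run NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager.

```shell
# missing_daemons (hypothetical helper): read `jps` output on stdin and print
# each expected daemon name (given as arguments) that is not listed.
missing_daemons() {
    listing=$(cat)
    for name in "$@"; do
        # jps prints "<pid> <ClassName>" per line; compare column 2 exactly,
        # so NameNode is not confused with SecondaryNameNode.
        printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            echo "$name"
    done
}
```

On the master: jps | missing_daemons NameNode SecondaryNameNode ResourceManager. On a slave: jps | missing_daemons DataNode NodeManager. No output means nothing is missing.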
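XII. Check in a browser
Open http://liuyazhuang-01:50070/ to see the HDFS (NameNode) management page.
Open http://liuyazhuang-01:8088/ to see the YARN ResourceManager page, which lists running and finished applications.
Open http://liuyazhuang-01:8088/cluster to view the cluster status.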
XIII. Verification (WordCount)
1. Create an input directory on HDFS
~/data/hadoop-2.5.2$ bin/hadoop fs -mkdir -p input
2. Copy README.txt from the Hadoop directory into the new input directory on HDFS
~/data/hadoop-2.5.2$ bin/hadoop fs -copyFromLocal README.txt input
3. Run WordCount from the compiled examples jar (the -sources jar under share/hadoop/mapreduce/sources holds only .java files and cannot be run)
~/data/hadoop-2.5.2$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount input output
4. When the job finishes, view the word counts
~/data/hadoop-2.5.2$ bin/hadoop fs -cat output/*
The job fails if its output path (output here) already exists, so delete that directory before running the job again:
~/data/hadoop-2.5.2$ bin/hadoop fs -rm -r output
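To judge whether the job's result is plausible, the same count can be reproduced locally with standard tools. The example WordCount splits each line on whitespace and writes one word, a tab, and its count per line; the sketch below does the same for a local file. local_wordcount is an illustrative helper, and the ordering is only expected to match the job's output for plain ASCII input with a single reducer.

```shell
# local_wordcount (illustrative helper): count whitespace-separated tokens on
# stdin and print "word<TAB>count" lines sorted in byte order.
local_wordcount() {
    # Put one token per line, drop empty lines, then count identical tokens.
    tr -s ' \t\r\f\n' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

For example, local_wordcount < README.txt > local.txt produces a file that can be compared with the output of bin/hadoop fs -cat output/*.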