linux-hadoop cluster setup
A. System:
centos7.2
hadoop-2.6.0-cdh5.15.1
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.15.1.tar.gz
B. Role assignment (edit /etc/hostname and /etc/hosts on every node):
192.168.2.199 bigdata000.tfpay.com bigdata000
192.168.2.201 bigdata01.tfpay.com bigdata01
192.168.2.202 bigdata02.tfpay.com bigdata02
bigdata000 NameNode DataNode ResourceManager Master
bigdata01 DataNode NodeManager
bigdata02 DataNode NodeManager
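The host entries above can be staged once and then appended to /etc/hosts on every node; a small sketch (the hostname pair for 192.168.2.201 is an assumption, following the pattern of the other two nodes):

```shell
# Stage the cluster host entries in a local file first.
# NOTE: the bigdata01 names for 192.168.2.201 are assumed by analogy.
cat > hosts.append <<'EOF'
192.168.2.199 bigdata000.tfpay.com bigdata000
192.168.2.201 bigdata01.tfpay.com bigdata01
192.168.2.202 bigdata02.tfpay.com bigdata02
EOF
# Then on each node (as root):
#   cat hosts.append >> /etc/hosts
#   hostnamectl set-hostname bigdata000   # use that node's own name
```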
Required files:
CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel
CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel.sha1
cloudera-manager-centos7-cm5.15.1_x86_64.tar.gz
creat_sh.sh
hadoop-2.6.0-cdh5.15.1.tar.gz
hadoop-native-64-2.6.0.tar
jdk-8u191-linux-x64.rpm
manifest.json
MySQL-5.6.26-1.linux_glibc2.5.x86_64.rpm-bundle.tar
mysql-connector-java-5.1.47-bin.jar
mysql-connector-java-5.1.47.zip
mysql-connector-java-6.0.2.jar
setup.sh
C. Environment setup
1. Passwordless SSH
ssh-keygen -t rsa (on all nodes)
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata01
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata02
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata000
Verify:
ssh bigdata000
ssh bigdata01
ssh bigdata02
2. JDK installation:
rpm -ivh --prefix=/app/ ./jdk-8u191-linux-x64.rpm
Configure environment variables, appending to ~/.bash_profile:
export JAVA_HOME=/app/jdk1.8.0_191-amd64/
export PATH=$PATH:$JAVA_HOME/bin
Reload the environment:
source ~/.bash_profile
Verify:
java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
javac -version
javac 1.8.0_191
3. Cluster setup
Extract hadoop:
mkdir /app
chmod 777 /app
tar -zxvf ./hadoop-2.6.0-cdh5.15.1.tar.gz -C /app/
tar -xvf /mnt/bi/hadoop-native-64-2.6.0.tar -C /app/hadoop-2.6.0-cdh5.15.1/lib/native/
Configure environment variables, appending to ~/.bash_profile:
export HADOOP_HOME=/app/hadoop-2.6.0-cdh5.15.1
export PATH=$PATH:$HADOOP_HOME/bin
Reload the environment:
source ~/.bash_profile
Verify:
hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
.....................
Configure hadoop-env.sh and core-site.xml
etc/hadoop/hadoop-env.sh
add:
export JAVA_HOME=/app/jdk1.8.0_191-amd64/
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<!--<name>fs.default.name</name>-->
<!--Adding a port explicitly (hdfs://bigdata000:8020) did not work here; reason unknown-->
<value>hdfs://bigdata000</value>
</property>
<!--Use a dedicated temp directory so the data is not deleted on restart-->
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop-2.6.0-cdh5.15.1/tmp</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<!--
The replication factor defaults to 3; in a pseudo-distributed environment it generally does not need changing
-->
<!--
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
-->
<!--
NameNode metadata directory
-->
<property>
<name>dfs.namenode.name.dir</name>
<value>/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/name</value>
</property>
<!--
DataNode block storage directory
-->
<property>
<name>dfs.datanode.data.dir</name>
<value>/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data</value>
</property>
</configuration>
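The tmp, name, and data directories configured above should exist and be writable before the NameNode is formatted; a minimal sketch, with HADOOP_HOME defaulting to a relative path here so the snippet is safe to try (on the cluster it is /app/hadoop-2.6.0-cdh5.15.1):

```shell
# Pre-create the directories referenced by core-site.xml and hdfs-site.xml
# (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir).
# HADOOP_HOME defaults to a relative path for safe illustration only.
HADOOP_HOME="${HADOOP_HOME:-hadoop-2.6.0-cdh5.15.1}"
mkdir -p "$HADOOP_HOME/tmp/dfs/name" "$HADOOP_HOME/tmp/dfs/data"
```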
etc/hadoop/yarn-site.xml:
<!--
use MapReduce (the shuffle aux-service below is required to run MapReduce on YARN)
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata000</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml (copy from etc/hadoop/mapred-site.xml.template):
<configuration>
<!--
framework that MapReduce runs on
-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Write the slave nodes' hostnames into etc/hadoop/slaves:
bigdata000
bigdata01
bigdata02
4. Distribute from the master to the slaves
scp -r /app root@192.168.2.201:/
scp -r /root/.bash_profile root@192.168.2.201:/root
scp -r /app root@192.168.2.202:/
scp -r /root/.bash_profile root@192.168.2.202:/root
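The per-node copy commands can be folded into one loop; echoed here as a dry run (remove the `echo` prefix to actually copy):

```shell
# Dry-run sketch of distributing /app and the shell profile to the slaves.
# Prints the scp commands instead of running them.
distribute() {
  for node in 192.168.2.201 192.168.2.202; do
    echo scp -r /app "root@$node:/"
    echo scp -r /root/.bash_profile "root@$node:/root"
  done
}
distribute
```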
5. Format the NameNode
hdfs namenode -format
6. Start and stop
/app/hadoop-2.6.0-cdh5.15.1/sbin/stop-all.sh
/app/hadoop-2.6.0-cdh5.15.1/sbin/start-all.sh
Verify with jps:
bigdata000:
3620 ResourceManager
3717 NodeManager
5461 Jps
3450 SecondaryNameNode
3197 NameNode
3294 DataNode
bigdata01:
1923 NodeManager
1819 DataNode
2253 Jps
bigdata02:
1639 DataNode
2071 Jps
1743 NodeManager
Verify via the web interfaces (Hadoop 2.x defaults: NameNode UI at http://bigdata000:50070, ResourceManager UI at http://bigdata000:8088):
Verify from the command line:
[root@bigdata000 ~]# hadoop fs -put /tmp/yarn-root-nodemanager.pid /
[root@bigdata000 ~]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 5 2018-11-17 01:56 /yarn-root-nodemanager.pid
7. Using the Hadoop cluster
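A simple end-to-end check is to run the bundled MapReduce "pi" example; the jar path below is an assumption based on the usual CDH tarball layout, and the snippet is guarded so it is a no-op where the hadoop CLI is absent:

```shell
# Run the MapReduce pi example as a cluster smoke test.
# EXAMPLES_JAR path is assumed from the CDH tarball layout.
EXAMPLES_JAR=/app/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$EXAMPLES_JAR" pi 2 10
else
  echo "hadoop CLI not found; run this on the cluster"
fi
```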
8. Troubleshooting
Running the format more than once (hdfs namenode -format) causes errors:
a. DataNodes fail to start
Reformatting gives the NameNode a new clusterID, which no longer matches the one stored on the DataNodes.
Fix: take clusterID=CID-c043fc46-adf6-4ad9-ab73-a66e75e32567 from /app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/name/current/VERSION, write it into /app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data/current/VERSION on every node, then restart the cluster
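The clusterID fix can be scripted; a sketch that copies the NameNode's clusterID line into a DataNode VERSION file (on the cluster, call it with /app/hadoop-2.6.0-cdh5.15.1/tmp/dfs):

```shell
# sync_cluster_id DFS_DIR
# Reads clusterID from DFS_DIR/name/current/VERSION and writes it into
# DFS_DIR/data/current/VERSION, overwriting the stale DataNode value.
sync_cluster_id() {
  local dfs="$1"
  local cid
  cid=$(grep '^clusterID=' "$dfs/name/current/VERSION")
  # Replace the DataNode's clusterID line with the NameNode's.
  sed -i "s/^clusterID=.*/$cid/" "$dfs/data/current/VERSION"
}
```

Run it on every DataNode (or against each node's VERSION file), then restart the cluster.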
b. The web UI shows only one DataNode
Because /app was scp'd from the master, /app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data/current/VERSION was copied along with it, so every DataNode has the same storageID and only one of them is displayed. Change each node's storageID to a unique value and restart the cluster.
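An alternative to editing storageIDs by hand: before the first start, delete the copied data directory on each slave so the DataNode generates a fresh, unique storageID on startup. A hedged sketch (this is a different technique from the manual edit above, and destroys whatever is in that directory, so run it only on slaves that hold no real blocks):

```shell
# On bigdata01 and bigdata02 ONLY, before the first DataNode start:
# remove the data dir copied from the master so a new storageID is made.
# DATA_DIR is parameterised for illustration.
DATA_DIR="${DATA_DIR:-/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data}"
rm -rf "$DATA_DIR"
```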