Deploying Hadoop on Tencent Cloud
1. Request 2-3 Tencent Cloud servers
Edit the security group rules to allow ALL traffic, so that the Hadoop cluster servers can reach each other (in production you would open only the ports you actually need).
2. Install the JDK
**The JDK must be installed on both the master and slave machines.**
(1) Upload the JDK installation package to the /opt directory on each server.
(2) Enter /opt and install the JDK with "rpm -ivh jdk-8u151-linux-x64.rpm" (type the -ivh options by hand).
(3) Add the following to /etc/profile via vi /etc/profile:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin
Then run source /etc/profile to make the configuration take effect.
(4) Verify the JDK is configured correctly by running "java -version".
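The install-and-verify step above can be sketched as a small shell check, assuming the rpm installs to /usr/java/jdk1.8.0_151. The check_jdk helper and the fake directory are illustrative only, not part of the real setup:

```shell
# Hypothetical helper: succeed if the given JAVA_HOME contains an executable java.
check_jdk() {
  [ -x "$1/bin/java" ]
}

# Demonstrate with a fake JDK layout; on a real node you would pass
# /usr/java/jdk1.8.0_151 instead.
fake=$(mktemp -d)
mkdir -p "$fake/bin"
touch "$fake/bin/java" && chmod +x "$fake/bin/java"

check_jdk "$fake" && echo "JDK OK"
check_jdk /nonexistent/jdk || echo "JDK missing"
```

Running the check before editing /etc/profile catches a wrong JAVA_HOME path early, instead of failing later when Hadoop starts.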
3. Set up passwordless SSH login
Passwordless SSH is essential; without it the machines cannot reach one another.
(1) Host IP setup in the cloud environment: map IPs to hostnames. On a public cloud, each node should map its own hostname to its internal (private) IP and every other node to its public IP.
- On the master node, edit vi /etc/hosts and add the entries below (changes to /etc/hosts take effect immediately; no reload is needed):
<master internal cloud IP> master
<slave1 public IP> slave1
- On each slave node, edit vi /etc/hosts and add:
<master public IP> master
<slave's own internal IP> slave1
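The mapping above can be scripted so the same fragment is appended on every node. The sketch below writes to a temporary file instead of /etc/hosts, and the IP addresses are placeholders, not real values:

```shell
# Placeholder IPs -- substitute your real Tencent Cloud addresses.
# Remember: on each node, the entry for the node itself uses its internal IP.
frag=$(mktemp)
cat > "$frag" <<'EOF'
10.0.0.14 master
1.2.3.4 slave1
1.2.3.5 slave2
EOF

# On a real node you would append the fragment: cat "$frag" >> /etc/hosts
# Listing the hostnames confirms the fragment is well-formed.
awk '{print $2}' "$frag"
```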
(2) Generate the private key id_rsa and public key id_rsa.pub: ssh-keygen -t rsa
(3) Copy the public key to each machine:
ssh-copy-id -i /root/.ssh/id_rsa.pub master   # answer yes, then enter 123456 (the root user's password)
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
ssh-copy-id -i /root/.ssh/id_rsa.pub slave3
(4) Verify passwordless login by running in turn:
ssh slave1
ssh slave2
ssh slave3
Note: an SSH misconfiguration can cause the slave machines to refuse access via authorized_keys. If you regenerate the public key and try to write it into authorized_keys again, the write is refused, and changing the file permissions with chmod on the slave may also be refused. The following commands resolve this:
chattr -i authorized_keys   # remove the immutable attribute so the file can be modified
chmod 600 authorized_keys   # restrict the file to owner read/write
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1   # copy the new public key to the slave again
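Steps (2)-(4) can be looped over all nodes instead of typing each command. The DRY_RUN guard below is illustrative: it prints the commands rather than running them; unset it on the real master node:

```shell
DRY_RUN=1   # illustrative guard: print commands instead of executing them
for host in master slave1 slave2 slave3; do
  cmd="ssh-copy-id -i /root/.ssh/id_rsa.pub $host"
  if [ -n "$DRY_RUN" ]; then
    echo "$cmd"   # dry run: show what would be executed
  else
    $cmd          # real run: distribute the public key to $host
  fi
done
```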
4. Configure the Hadoop cluster
1. Upload the hadoop-2.6.5.tar.gz file to the /opt directory via Xftp (part of Xmanager).
2. From /opt, extract the archive: tar -zxf hadoop-2.6.5.tar.gz -C /usr/local. After extraction you will see the /usr/local/hadoop-2.6.5 folder.
3. Enter the configuration directory: cd /usr/local/hadoop-2.6.5/etc/hadoop/
4. Edit the following files in turn:
(1) vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
</configuration>
(2) vi hadoop-env.sh, set export JAVA_HOME=/usr/java/jdk1.8.0_151
(3) vi hdfs-site.xml
Note: /data/hadoop/hdfs/name/ stores the NameNode metadata; /data/hadoop/hdfs/data/ stores the DataNode block data.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Note: dfs.replication is the number of block replicas and should not exceed the number of slave (DataNode) machines; with a single slave, set it to 1.
(4) vi mapred-site.xml. If the file does not exist, create it from the template: cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
(5) vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/data/tmp/logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs/</value>
<description>URL for job history server</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
(6) vi yarn-env.sh, set export JAVA_HOME=/usr/java/jdk1.8.0_151
5. vi slaves, delete localhost and add:
slave1
slave2
slave3
6. Copy the Hadoop installation to the cluster's slave nodes:
scp -r /usr/local/hadoop-2.6.5 slave1:/usr/local
scp -r /usr/local/hadoop-2.6.5 slave2:/usr/local
scp -r /usr/local/hadoop-2.6.5 slave3:/usr/local
7. Add the Hadoop paths to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop-2.6.5
export PATH=$HADOOP_HOME/bin:$PATH
Run source /etc/profile to make the change take effect.
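A quick sanity check that the PATH change took effect; this sketch only inspects the PATH string, so it works even before Hadoop is installed:

```shell
HADOOP_HOME=/usr/local/hadoop-2.6.5
PATH=$HADOOP_HOME/bin:$PATH

# Check that the Hadoop bin directory is now on PATH.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) result="PATH updated" ;;
  *)                      result="PATH missing hadoop bin" ;;
esac
echo "$result"
```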
8. Format the NameNode: enter the directory with cd /usr/local/hadoop-2.6.5/bin and run ./hdfs namenode -format
If you are not in that directory, you can run the hdfs command directly (thanks to the PATH set in step 7): hdfs namenode -format
9. Start the cluster: enter cd /usr/local/hadoop-2.6.5/sbin and run:
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
Alternatively, ./start-all.sh starts HDFS and YARN together (it does not start the history server).
To stop the cluster: ./stop-all.sh
10. Use jps to check the running processes.
Output on the master node:
[root@master sbin]# jps
1765 NameNode
1929 SecondaryNameNode
2378 JobHistoryServer
2412 Jps
2077 ResourceManager
jps output on a slave node:
[root@slave1 ~]# jps
1844 Jps
1612 DataNode
1711 NodeManager
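The jps listings above can be checked mechanically. The snippet below runs against the sample master output copied from the listing; on a real master you would pipe the output of `jps` itself instead of the sample string:

```shell
# Sample master jps output, copied from the listing above.
sample="1765 NameNode
1929 SecondaryNameNode
2378 JobHistoryServer
2077 ResourceManager"

# Verify every expected master daemon appears in the output.
missing=0
for d in NameNode SecondaryNameNode ResourceManager JobHistoryServer; do
  if echo "$sample" | grep -q "$d"; then
    echo "$d up"
  else
    echo "$d DOWN"
    missing=$((missing + 1))
  fi
done
echo "missing: $missing"
```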
11. On Windows, add IP mappings to C:\Windows\System32\drivers\etc\hosts:
<master public Tencent Cloud IP> master master.centos.com
<slave1 public Tencent Cloud IP> slave1 slave1.centos.com
<slave2 public Tencent Cloud IP> slave2 slave2.centos.com
<slave3 public Tencent Cloud IP> slave3 slave3.centos.com
- View in a browser:
http://master:50070 (HDFS NameNode web UI)
http://master:8088 (YARN ResourceManager web UI)
5. Common problems
5.1 The DataNode fails to start
- This is usually caused by formatting the NameNode more than once, which leaves the cluster IDs mismatched. Fix the cluster ID as follows:
cd /usr/local/hadoop-2.6.5/logs
ls   # locate the DataNode log file
cat hadoop-root-datanode-VM-0-14-centos.log   # find the clusterID (CID) value and copy it
On the master node, edit vi /data/hadoop/hdfs/name/current/VERSION and change clusterID to the new value.
On each slave node, edit vi /data/hadoop/hdfs/data/current/VERSION and change clusterID to the new value.
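The VERSION edit can be done with sed instead of vi. The sketch below operates on a mock file; on a real slave the path is /data/hadoop/hdfs/data/current/VERSION, and new_cid would be the clusterID copied from the NameNode's VERSION file:

```shell
# Mock VERSION file standing in for /data/hadoop/hdfs/data/current/VERSION.
ver=$(mktemp)
printf 'namespaceID=123456\nclusterID=CID-old-value\n' > "$ver"

new_cid="CID-new-value"   # in reality: the clusterID from the NameNode's VERSION
sed -i "s/^clusterID=.*/clusterID=$new_cid/" "$ver"
grep '^clusterID=' "$ver"
```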
- Alternatively, delete the current folders under the NameNode and DataNode directories:
1. On the master node:
cd /data/hadoop/hdfs/name/
rm -rf current
2. On each slave node:
cd /data/hadoop/hdfs/data/
rm -rf current
Then re-run hdfs namenode -format on the master (this erases all HDFS data) and restart the cluster.