
Deploying Hadoop on Tencent Cloud

1. Request 2-3 Tencent Cloud servers

Edit the security group rules to allow ALL traffic; otherwise the Hadoop cluster servers cannot reach one another.
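If you prefer not to open everything, the ports this guide actually relies on are roughly the following (8020, 50090, 8088 and so on come from the configuration files later in this article; the DataNode ports 50010/50020/50075 are Hadoop 2.x defaults, stated here as an assumption):

22                     SSH
8020                   NameNode RPC (fs.defaultFS)
50070                  NameNode web UI
50090                  SecondaryNameNode HTTP
50010, 50020, 50075    DataNode data / IPC / HTTP
8030-8033, 8088, 8090  YARN ResourceManager
10020, 19888           JobHistory Server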

2. Install the JDK

JDK setup. **The JDK must be installed on both the master and the slave machines.**

- (1) Upload the JDK installation package to the /opt directory on the virtual machine.
- (2) Enter /opt and run "rpm -ivh jdk-8u151-linux-x64.rpm" to install the JDK (note: type the -ivh flags by hand rather than pasting them).
- (3) Open vi /etc/profile and add:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin
Run source /etc/profile to make the configuration take effect.

- (4) Verify that the JDK is configured correctly by running "java -version".
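If the installation succeeded, the output should look roughly like this (exact build numbers may differ on your machine):

java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)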

3. Set up passwordless SSH login

Passwordless SSH login is essential; without it the machines cannot reach one another.
(1) Set up the host IP mappings for the cloud environment, mapping each IP to its hostname (a concrete sketch follows below).
  • On the master node, edit vi /etc/hosts:
<master internal cloud IP> master
<slave1 public IP> slave1
  • On each slave node, edit vi /etc/hosts the same way:
<master internal cloud IP> master
<slave1 public IP> slave1
Changes to /etc/hosts take effect immediately; no reload command is needed. Note in particular that the master's own hostname must map to its internal (private) IP: a cloud server cannot bind a listening service (here the NameNode on port 8020) to its own public address.
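As a concrete sketch with made-up addresses (203.0.113.x stands in for public IPs and 172.16.0.x for the internal network; substitute your own), the master's /etc/hosts would gain:

172.16.0.14   master
203.0.113.25  slave1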

(2) Generate the private key id_rsa and the public key id_rsa.pub: ssh-keygen -t rsa

# Then press Enter three times. ssh-keygen generates and manages keys; the "-t" option sets the key type to RSA.
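If you would rather skip the three Enter presses, ssh-keygen can also run non-interactively; a sketch assuming the root user and an empty passphrase:

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa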
(3) Copy the public key to each machine

ssh-copy-id -i /root/.ssh/id_rsa.pub master   # answer yes, then enter the root password (123456 in this example)
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2

(4) Verify passwordless login by running each of the following in turn (a one-line batch check follows after the list):

ssh slave1
ssh slave2
ssh slave3
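A quick batch check, as a sketch (adjust the host list to your cluster); each line should print the remote hostname with no password prompt:

for h in slave1 slave2 slave3; do ssh $h hostname; done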

Note: the SSH setup can fail with an authorized_keys "permission denied" error on the slave machines. If you regenerate the public key and try to write it into authorized_keys again, the write is refused; even modifying the file permissions with chmod on the slave machine is refused, because the file has been locked. The commands below resolve this (see also the lsattr check after them):

  chattr -i authorized_keys                    # remove the immutable attribute (unlock the file)
  chmod 600 authorized_keys                    # make the file readable/writable by its owner
  ssh-copy-id -i /root/.ssh/id_rsa.pub slave1  # copy the new public key to slave1 again
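To confirm the file really was locked (before or after running chattr), lsattr marks immutable files with an i flag:

lsattr /root/.ssh/authorized_keys   # an 'i' in the attribute column means the file is immutable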

4. Configure the Hadoop cluster

1. Upload the hadoop-2.6.5.tar.gz file to the /opt directory using Xftp (part of Xmanager).

2. Extract hadoop-2.6.5.tar.gz with tar -zxf hadoop-2.6.5.tar.gz -C /usr/local. After extraction you should see the /usr/local/hadoop-2.6.5 directory.

3. Configure Hadoop. Enter the configuration directory: cd /usr/local/hadoop-2.6.5/etc/hadoop/

4. Modify the following files in turn:

(1) vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
</configuration>

(2) vi hadoop-env.sh, and set export JAVA_HOME=/usr/java/jdk1.8.0_151
(3) vi hdfs-site.xml
Note: /data/hadoop/hdfs/name/ stores the NameNode metadata; /data/hadoop/hdfs/data/ stores the DataNode block data (see the directory-creation sketch after the file).

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
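Since dfs.namenode.name.dir and dfs.datanode.data.dir point under /data, it is safest to create those directories up front; a minimal sketch:

mkdir -p /data/hadoop/hdfs/name   # on the master node
mkdir -p /data/hadoop/hdfs/data   # on each slave node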

Note: dfs.replication is the number of copies kept of each block; it should not exceed the number of slave (DataNode) machines, so with a single slave set it to 1.
(4) vi mapred-site.xml. This file does not exist by default; create it from the template: cp mapred-site.xml.template mapred-site.xml

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>master:19888</value>
</property>
</configuration>

(5) vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>    
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/data/tmp/logs</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs/</value>
    <description>URL for job history server</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
</configuration>
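Two side notes on this file: mapreduce.map.memory.mb and mapreduce.reduce.memory.mb conventionally belong in mapred-site.xml rather than yarn-site.xml, and yarn.scheduler.maximum-allocation-mb (4096) here exceeds yarn.nodemanager.resource.memory-mb (2048), so on this cluster a container request above 2048 MB could never actually be placed on any node.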

(6) vi yarn-env.sh, and set export JAVA_HOME=/usr/java/jdk1.8.0_151

5. vi slaves: delete localhost and add:

slave1
slave2
slave3

6. Copy the Hadoop installation to the slave nodes (a loop version follows the commands):
scp -r /usr/local/hadoop-2.6.5 slave1:/usr/local
scp -r /usr/local/hadoop-2.6.5 slave2:/usr/local
scp -r /usr/local/hadoop-2.6.5 slave3:/usr/local
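The same copy can be written as a loop, as a sketch assuming the slave list from the slaves file above:

for h in slave1 slave2 slave3; do scp -r /usr/local/hadoop-2.6.5 $h:/usr/local; done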

7. Add the Hadoop paths to /etc/profile:

export HADOOP_HOME=/usr/local/hadoop-2.6.5
export PATH=$HADOOP_HOME/bin:$PATH

Run source /etc/profile to make the change take effect.

8. Format the NameNode. Enter the directory with cd /usr/local/hadoop-2.6.5/bin, then run ./hdfs namenode -format.

If you are not in that directory, the hdfs command can be used directly (it is on the PATH configured above): hdfs namenode -format. A successful format prints a "successfully formatted" message for the name directory.

9. Start the cluster. Enter the directory cd /usr/local/hadoop-2.6.5/sbin, then run:

./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver

Or simply run ./start-all.sh (deprecated in Hadoop 2.x; note it does not start the JobHistory Server, so the mr-jobhistory-daemon.sh line above is still needed).

To stop the cluster: ./stop-all.sh (stop the JobHistory Server separately with ./mr-jobhistory-daemon.sh stop historyserver).

10. Use jps to check the running processes.
The master node should show something like:

[root@master sbin]# jps
1765 NameNode
1929 SecondaryNameNode
2378 JobHistoryServer
2412 Jps
2077 ResourceManager

The jps output on a slave node:

[root@slave1 ~]# jps
1844 Jps
1612 DataNode
1711 NodeManager

11. On Windows, add IP mappings to C:\Windows\System32\drivers\etc\hosts:

<Tencent Cloud public IP> master master.centos.com
<Tencent Cloud public IP> slave1 slave1.centos.com
<Tencent Cloud public IP> slave2 slave2.centos.com
<Tencent Cloud public IP> slave3 slave3.centos.com

12. Check in a browser (a command-line check follows the URLs):
http://master:50070
http://master:8088
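The same health checks can be done from the command line on the master; both are standard Hadoop 2.x commands:

hdfs dfsadmin -report   # should report one live DataNode per slave
yarn node -list         # should list one RUNNING NodeManager per slave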

5. Common problems

5.1 The DataNode fails to start

cd /usr/local/hadoop-2.6.5/logs
ls   # locate the DataNode log file
cat hadoop-root-datanode-VM-0-14-centos.log   # find the clusterID reported in the "Incompatible clusterIDs" error and copy it
On the master node, open vi /data/hadoop/hdfs/name/current/VERSION and change clusterID to the new value.
On the slave nodes, open vi /data/hadoop/hdfs/data/current/VERSION and change clusterID to the new value.
  • Alternatively, delete the current folders under the NameNode and DataNode directories (warning: this wipes all HDFS data).
1. On the master node:
cd /data/hadoop/hdfs/name/
rm -rf current
2. On the slave nodes:
cd /data/hadoop/hdfs/data/
rm -rf current
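Deleting the NameNode's current folder destroys its metadata, so you must format again before restarting; a sketch reusing the commands from steps 8 and 9:

hdfs namenode -format
cd /usr/local/hadoop-2.6.5/sbin
./start-dfs.sh
./start-yarn.sh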

6. References

- Deploying Hadoop on Amazon Cloud (亞馬遜雲部署hadoop)