Hands-On Hadoop Cluster Deployment (CDH Distribution)
Published: 2018-12-27
Contributed by Zhao Haijun (趙海軍), currently handling ops and DBA work at a startup. He previously led the database team at Cheetah Mobile and is one of the authors of Yunwei Qianxian (運維前線).
I. Overview
For work reasons I have recently been learning my way around Hadoop. Our company currently runs both real-time and offline jobs on a single Hadoop cluster. The offline jobs run on a fixed daily schedule, so once they finish their resources sit idle. To use those resources more sensibly, we plan to build a second cluster dedicated to offline jobs, with compute nodes separated from storage nodes: the compute nodes combine AWS Auto Scaling (the automatic scale-out/scale-in service) with Spot Instances to adjust capacity dynamically, launching a batch of instances when jobs run and releasing the servers automatically once they finish. This article records the cluster build process, both for my own future reference and, I hope, as a help to beginners. All software here is installed via yum; you can also download the corresponding binaries and install those instead, whichever matches your personal habits.
II. Environment
1. Role overview
10.10.103.246 namenode zkfc journalnode QuorumPeerMain datanode resourcemanager nodemanager WebAppProxyServer JobHistoryServer
10.10.103.144 namenode zkfc journalnode QuorumPeerMain datanode resourcemanager nodemanager WebAppProxyServer
10.10.103.62 journalnode QuorumPeerMain datanode nodemanager
10.10.20.130 client
2. Notes on the base environment
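The source gives no details for this section, so the following is a minimal assumed baseline for a yum-based CDH 5.11.0 install on CentOS 6: a JDK on every node, consistent name resolution between all nodes, and synchronized clocks. The package choice and hostnames below are illustrative assumptions, not steps from the original article.

yum -y install java-1.7.0-openjdk java-1.7.0-openjdk-devel    # CDH 5.11 runs on JDK 7 or 8
yum -y install ntp && service ntpd start && chkconfig ntpd on # keep cluster clocks in sync
# Every node must resolve every other node the same way, via DNS or /etc/hosts entries such as
# (hostnames here are hypothetical; the article itself refers to nodes by IP only):
# 10.10.103.246  nn1.cluster.internal
# 10.10.103.144  nn2.cluster.internal
# 10.10.103.62   dn1.cluster.internal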
III. Configuration and Deployment

1. Set up the yum repository

vim /etc/yum.repos.d/cloudera.repo

[cloudera-cdh5-11-0]
# Packages for Cloudera's Distribution for Hadoop, Version 5.11.0, on RedHat or CentOS 6 x86_64
name=Cloudera's Distribution for Hadoop, Version 5.11.0
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.11.0/
gpgkey=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1

[cloudera-gplextras5b2]
# Packages for Cloudera's GPLExtras, Version 5.11.0, on RedHat or CentOS 6 x86_64
name=Cloudera's GPLExtras, Version 5.11.0
baseurl=http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/5.11.0/
gpgkey=http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/RPM-GPG-KEY-cloudera
gpgcheck=1

PS: I am installing 5.11.0 here; to install a lower or higher version, just change the version number to match your needs.

2. Install and configure the ZooKeeper cluster

yum -y install zookeeper zookeeper-server

vi /etc/zookeeper/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=10.10.103.144:2888:3888
server.2=10.10.103.246:2888:3888
server.3=10.10.103.62:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

mkdir /data/zookeeper                      # create the dataDir directory
/etc/init.d/zookeeper-server init          # initialize on all nodes first
echo 1 > /data/zookeeper/myid              # run on 10.10.103.144
echo 2 > /data/zookeeper/myid              # run on 10.10.103.246
echo 3 > /data/zookeeper/myid              # run on 10.10.103.62
/etc/init.d/zookeeper-server start         # start the service
/usr/lib/zookeeper/bin/zkServer.sh status  # check every node; the cluster is healthy when exactly one node reports Mode: leader

3. Installation

a. On 10.10.103.246 and 10.10.103.144:

yum -y install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-hdfs-journalnode hadoop-hdfs-datanode hadoop-mapreduce-historyserver hadoop-yarn-nodemanager hadoop-yarn-proxyserver hadoop-yarn hadoop-mapreduce hadoop-yarn-resourcemanager hadoop-lzo* impala-lzo

b. On 10.10.103.62:

yum -y install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-journalnode hadoop-hdfs-datanode hadoop-lzo* impala-lzo hadoop-yarn hadoop-mapreduce hadoop-yarn-nodemanager

4. Configuration

a. Create the directories and set their permissions

mkdir -p /data/hadoop/dfs/nn             # run on the namenodes
chown hdfs:hdfs /data/hadoop/dfs/nn/ -R  # run on the namenodes
mkdir -p /data/hadoop/dfs/dn             # run on the datanodes
chown hdfs:hdfs /data/hadoop/dfs/dn/ -R  # run on the datanodes
mkdir -p /data/hadoop/dfs/jn             # run on the journalnodes
chown hdfs:hdfs /data/hadoop/dfs/jn/ -R  # run on the journalnodes
mkdir /data/hadoop/yarn -p               # run on the nodemanagers
chown yarn:yarn /data/hadoop/yarn -R     # run on the nodemanagers

b. Write the configuration files (sketched below)
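The article does not show the configuration files themselves, so here is a minimal sketch of the HDFS side under the directory layout created in step 4a. The nameservice name mycluster and the IDs nn1/nn2 are my assumptions; the property names are the standard Hadoop 2.x ones shipped with CDH 5.

vi /etc/hadoop/conf/core-site.xml

<configuration>
  <!-- Clients address the HA pair through the logical nameservice, not a single host -->
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <!-- ZooKeeper ensemble used by the ZKFCs for automatic failover -->
  <property><name>ha.zookeeper.quorum</name><value>10.10.103.144:2181,10.10.103.246:2181,10.10.103.62:2181</value></property>
</configuration>

vi /etc/hadoop/conf/hdfs-site.xml

<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>10.10.103.246:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>10.10.103.144:8020</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>10.10.103.246:50070</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>10.10.103.144:50070</value></property>
  <!-- Directories created in step 4a -->
  <property><name>dfs.namenode.name.dir</name><value>file:///data/hadoop/dfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///data/hadoop/dfs/dn</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/data/hadoop/dfs/jn</value></property>
  <!-- Shared edit log on the three journalnodes -->
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://10.10.103.246:8485;10.10.103.144:8485;10.10.103.62:8485/mycluster</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <!-- A fencing method must be configured even though QJM itself prevents split-brain writes -->
  <property><name>dfs.ha.fencing.methods</name><value>shell(/bin/true)</value></property>
</configuration>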
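The YARN and MapReduce side needs the same treatment. Below is a sketch with ResourceManager HA matching the role table; the cluster-id, the rm1/rm2 IDs, and the proxy port are again my assumptions rather than the author's original values.

vi /etc/hadoop/conf/yarn-site.xml

<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>yarncluster</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>10.10.103.246</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>10.10.103.144</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>10.10.103.144:2181,10.10.103.246:2181,10.10.103.62:2181</value></property>
  <!-- Local scratch space created in step 4a -->
  <property><name>yarn.nodemanager.local-dirs</name><value>/data/hadoop/yarn</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <!-- Standalone proxy, matching the WebAppProxyServer role on 10.10.103.246 -->
  <property><name>yarn.web-proxy.address</name><value>10.10.103.246:8089</value></property>
</configuration>

vi /etc/hadoop/conf/mapred-site.xml

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>10.10.103.246:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>10.10.103.246:19888</value></property>
</configuration>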
5. Start the services

a. Start the journalnodes (on all three servers):

/etc/init.d/hadoop-hdfs-journalnode start

b. Format the namenode (on one namenode, 10.10.103.246):

sudo -u hdfs hadoop namenode -format

c. Initialize the HA state in ZooKeeper (on one namenode, 10.10.103.246):

sudo -u hdfs hdfs zkfc -formatZK

d. Initialize the shared edits directory (on one namenode, 10.10.103.246):

sudo -u hdfs hdfs namenode -initializeSharedEdits

e. Start the namenode on 10.10.103.246:

/etc/init.d/hadoop-hdfs-namenode start

f. Sync the metadata over and start the namenode on 10.10.103.144:

sudo -u hdfs hdfs namenode -bootstrapStandby
/etc/init.d/hadoop-hdfs-namenode start

g. Start zkfc on both namenodes:

/etc/init.d/hadoop-hdfs-zkfc start

h. Start the datanodes (on all machines):

/etc/init.d/hadoop-hdfs-datanode start

i. Start WebAppProxyServer, JobHistoryServer, and httpfs on 10.10.103.246:

/etc/init.d/hadoop-yarn-proxyserver start
/etc/init.d/hadoop-mapreduce-historyserver start
/etc/init.d/hadoop-httpfs start

j. Start the nodemanager on all machines (a verification pass for the finished cluster is sketched at the end of this article):

/etc/init.d/hadoop-yarn-nodemanager restart

IV. Deployment recommendations

1. In a typical small company, the master roles of the compute layer (ResourceManager) and the storage layer (NameNode) are deployed as an HA pair on two servers, while the worker roles (NodeManager and DataNode) are spread across many servers, each of which runs both the NodeManager and the DataNode service.

2. For a large cluster, you may need to separate compute resources from storage resources and give each cluster role its own dedicated servers. My suggested split:

a. Storage nodes

NameNode: install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-lzo* impala-lzo
DataNode: install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-datanode hadoop-lzo* impala-lzo
QJM cluster: install hadoop hadoop-hdfs hadoop-hdfs-journalnode zookeeper zookeeper-server

b. Compute nodes

ResourceManager: install hadoop hadoop-client hadoop-yarn hadoop-mapreduce hadoop-yarn-resourcemanager
WebAppProxyServer: install hadoop hadoop-yarn hadoop-mapreduce hadoop-yarn-proxyserver
JobHistoryServer: install hadoop hadoop-yarn hadoop-mapreduce hadoop-mapreduce-historyserver
NodeManager: install hadoop hadoop-client hadoop-yarn hadoop-mapreduce hadoop-yarn-nodemanager
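One gap worth flagging in the startup sequence: it never starts the ResourceManager, so presumably it also needs to be started on both 10.10.103.246 and 10.10.103.144 before running jobs. With everything up, a verification pass might look like the sketch below; nn1/nn2 and rm1/rm2 are the IDs assumed in the configuration sketches above, and the examples jar path is the standard CDH package location.

/etc/init.d/hadoop-yarn-resourcemanager start  # assumed missing step; run on both resourcemanager nodes

sudo -u hdfs hdfs haadmin -getServiceState nn1  # expect one active and one standby namenode
sudo -u hdfs hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1               # same check for the ResourceManager pair
sudo -u hdfs hdfs dfsadmin -report              # all three datanodes should report in
yarn node -list                                 # all three nodemanagers should show RUNNING
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10  # end-to-end MapReduce smoke test

If the pi job completes and the haadmin checks return one active and one standby, the HA pair, the journalnodes, and YARN are wired together correctly.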