
Hadoop Cluster Deployment in Practice (CDH Distribution)

Contributed by: Zhao Haijun (趙海軍), currently an operations engineer and DBA at a startup; previously at Cheetah Mobile, where he led the database team; one of the authors of Ops Frontline (運維前線).

I. Overview

Due to work requirements I have recently started learning Hadoop. At the moment our real-time and offline jobs all run on a single Hadoop cluster. Offline jobs run on a fixed daily schedule, so once they finish the resources sit idle. To make better use of those resources, we plan to build a separate cluster just for offline jobs, with compute nodes and storage nodes separated. The compute nodes will be combined with AWS Auto Scaling (automatic scale-out and scale-in) and Spot Instances and adjusted dynamically: a batch of instances is launched when jobs start and released automatically once the jobs are done. This article records the process of building the Hadoop cluster, both so I can refer back to it later and in the hope that it helps beginners. All software in this article is installed via yum; you can also download the corresponding binaries and install them that way, whichever matches your own habits.

II. Environment

1. Roles

10.10.103.246   namenode, zkfc, journalNode, QuorumPeerMain, datanode, resourcemanager, nodemanager, WebAppProxyServer, JobHistoryServer
10.10.103.144   namenode, zkfc, journalNode, QuorumPeerMain, datanode, resourcemanager, nodemanager, WebAppProxyServer
10.10.103.62    zkfc, journalNode, QuorumPeerMain, datanode, nodemanager
10.10.20.130    client

2. Base environment
a. OS version: we run on AWS EC2 with AWS's own customized system, which is essentially the same as Red Hat; kernel version 4.9.20-10.30.amzn1.x86_64.
b. Java version: java version "1.8.0_121"
c. Hadoop version: hadoop-2.6.0
d. CDH version: cdh5.11.0
e. Hostnames: since I am on AWS EC2, the instances already have hostnames that resolve on the internal network, so no extra hostname configuration is done here. If your hostnames do not resolve internally, be sure to configure them, because many components use hostnames for communication inside the cluster (see the /etc/hosts sketch below).
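If internal DNS cannot resolve your hostnames, the simplest fix is a shared /etc/hosts on every node. A minimal sketch, assuming EC2-style names built from the IPs used in this article (the names are illustrative; use whatever hostname -f actually reports on each server):

# Append the cluster hosts to /etc/hosts on every node (illustrative names)
cat >> /etc/hosts <<'EOF'
10.10.103.246 ip-10-10-103-246.ec2.internal
10.10.103.144 ip-10-10-103-144.ec2.internal
10.10.103.62  ip-10-10-103-62.ec2.internal
EOF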

III. Configuration and Deployment

1. Set up the yum repository

vim /etc/yum.repos.d/cloudera.repo

[cloudera-cdh5-11-0]
# Packages for Cloudera's Distribution for Hadoop, Version 5.11.0, on RedHat or CentOS 6 x86_64
name=Cloudera's Distribution for Hadoop, Version 5.11.0
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.11.0/
gpgkey=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1

[cloudera-gplextras5b2]
# Packages for Cloudera's GPLExtras, Version 5.11.0, on RedHat or CentOS 6 x86_64
name=Cloudera's GPLExtras, Version 5.11.0
baseurl=http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/5.11.0/
gpgkey=http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/RPM-GPG-KEY-cloudera
gpgcheck=1

PS: I am installing 5.11.0 here; if you want a lower or higher version, just change the version number to match your needs.

2. Install and configure the ZooKeeper ensemble

yum -y install zookeeper zookeeper-server

vi /etc/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=10.10.103.144:2888:3888
server.2=10.10.103.226:2888:3888
server.3=10.10.103.62:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

mkdir /data/zookeeper                       # create the dataDir
/etc/init.d/zookeeper-server init           # initialize on all nodes first
echo 1 > /data/zookeeper/myid               # on 10.10.103.144
echo 2 > /data/zookeeper/myid               # on 10.10.103.226
echo 3 > /data/zookeeper/myid               # on 10.10.103.62
/etc/init.d/zookeeper-server start          # start the service
/usr/lib/zookeeper/bin/zkServer.sh status   # check on every node; as long as exactly one node shows "Mode: leader" the ensemble is healthy

An extra sanity check on the ensemble is sketched right after the install step below.

3. Install the Hadoop packages

a. On 10.10.103.246 and 10.10.103.144:

yum -y install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-hdfs-journalnode hadoop-hdfs-datanode hadoop-mapreduce-historyserver hadoop-yarn-nodemanager hadoop-yarn-proxyserver hadoop-yarn hadoop-mapreduce hadoop-yarn-resourcemanager hadoop-lzo* impala-lzo

b. On 10.10.103.62:

yum -y install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-journalnode hadoop-hdfs-datanode hadoop-lzo* impala-lzo hadoop-yarn hadoop-mapreduce hadoop-yarn-nodemanager
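Beyond zkServer.sh status, here is a small sketch of an additional ensemble check using ZooKeeper's built-in four-letter-word commands; it assumes nc is available on the box you run it from and uses the hosts from zoo.cfg above:

# Each node should answer "imok"; exactly one should report "Mode: leader"
for host in 10.10.103.144 10.10.103.226 10.10.103.62; do
    echo "== $host =="
    echo ruok | nc "$host" 2181
    echo stat | nc "$host" 2181 | grep Mode
done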
PS:
1. In a typical small company, the master roles on the compute side (ResourceManager) and the storage side (NameNode) are deployed on two servers for HA, while the worker roles (NodeManager and DataNode) are deployed across many servers, with both the NodeManager and DataNode services started on each of them.

2. For a large cluster you may need to separate compute and storage resources, with every role in the cluster deployed on its own servers. My suggested split:

a. Storage nodes
NameNode: install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-lzo* impala-lzo
DataNode: install hadoop hadoop-client hadoop-hdfs hadoop-hdfs-datanode hadoop-lzo* impala-lzo
QJM quorum: install hadoop hadoop-hdfs hadoop-hdfs-journalnode zookeeper zookeeper-server

b. Compute nodes
ResourceManager: install hadoop hadoop-client hadoop-yarn hadoop-mapreduce hadoop-yarn-resourcemanager
WebAppProxyServer: install hadoop hadoop-yarn hadoop-mapreduce hadoop-yarn-proxyserver
JobHistoryServer: install hadoop hadoop-yarn hadoop-mapreduce hadoop-mapreduce-historyserver
NodeManager: install hadoop hadoop-client hadoop-yarn hadoop-mapreduce hadoop-yarn-nodemanager
4. Configuration

a. Create the directories and set permissions

mkdir -p /data/hadoop/dfs/nn             # on the namenodes
chown hdfs:hdfs /data/hadoop/dfs/nn/ -R  # on the namenodes
mkdir -p /data/hadoop/dfs/dn             # on the datanodes
chown hdfs:hdfs /data/hadoop/dfs/dn/ -R  # on the datanodes
mkdir -p /data/hadoop/dfs/jn             # on the journalnodes
chown hdfs:hdfs /data/hadoop/dfs/jn/ -R  # on the journalnodes
mkdir /data/hadoop/yarn -p               # on the nodemanagers
chown yarn:yarn /data/hadoop/yarn -R     # on the nodemanagers

b. Write the configuration files (a minimal sketch of the HA-related settings is shown after the startup steps below)

5. Start the services

a. Start the journalnodes (on all three servers)
/etc/init.d/hadoop-hdfs-journalnode start

b. Format the namenode (on one of the namenodes, 10.10.103.246)
sudo -u hdfs hadoop namenode -format

c. Initialize the HA state in ZooKeeper (on 10.10.103.246)
sudo -u hdfs hdfs zkfc -formatZK

d. Initialize the shared edits directory (on 10.10.103.246)
sudo -u hdfs hdfs namenode -initializeSharedEdits

e. Start the namenode on 10.10.103.246
/etc/init.d/hadoop-hdfs-namenode start

f. Sync the metadata and start the namenode on 10.10.103.144
sudo -u hdfs hdfs namenode -bootstrapStandby
/etc/init.d/hadoop-hdfs-namenode start

g. Start zkfc on both namenodes
/etc/init.d/hadoop-hdfs-zkfc start

h. Start the datanodes (on all machines)
/etc/init.d/hadoop-hdfs-datanode start

i. On 10.10.103.246, start the WebAppProxyServer, JobHistoryServer and httpfs
/etc/init.d/hadoop-yarn-proxyserver start
/etc/init.d/hadoop-mapreduce-historyserver start
/etc/init.d/hadoop-httpfs start

j. Start the nodemanager on all machines
/etc/init.d/hadoop-yarn-nodemanager restart
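The configuration files themselves are not reproduced here, so the block below is only a minimal sketch of the HA-related properties that core-site.xml and hdfs-site.xml would carry for this topology. The nameservice name mycluster, the IDs nn1/nn2, the default ports 8020/8485 and the /etc/hadoop/conf path are placeholders and assumptions of mine, not values taken from the real cluster; the directory paths match the ones created above, and the ZooKeeper quorum matches zoo.cfg.

# Sketch only -- placeholder nameservice/IDs, default ports assumed
cat > /etc/hadoop/conf/core-site.xml <<'EOF'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>ha.zookeeper.quorum</name>
    <value>10.10.103.144:2181,10.10.103.226:2181,10.10.103.62:2181</value></property>
</configuration>
EOF

cat > /etc/hadoop/conf/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>10.10.103.246:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>10.10.103.144:8020</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/data/hadoop/dfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/data/hadoop/dfs/dn</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://10.10.103.246:8485;10.10.103.144:8485;10.10.103.62:8485/mycluster</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/data/hadoop/dfs/jn</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>shell(/bin/true)</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
</configuration>
EOF

yarn-site.xml needs the analogous ResourceManager HA settings (yarn.resourcemanager.ha.enabled, the rm1/rm2 addresses and the same ZooKeeper quorum), which are omitted here for brevity.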

IV. Functional Verification

1. Basic Hadoop functionality

a. List the HDFS root directory

# hadoop fs -ls /
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Found 3 items
drwxr-xr-x   - hdfs   hdfs          0 2017-05-11 11:40 /tmp
drwxrwx---   - mapred hdfs          0 2017-05-11 11:28 /user
drwxr-xr-x   - yarn   hdfs          0 2017-05-11 11:28 /var

b. Upload a file to the root directory

# hadoop fs -put /tmp/test.txt /
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
# hadoop fs -ls /
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Found 4 items
-rw-r--r--   2 root   hdfs         22 2017-05-11 15:47 /test.txt
drwxr-xr-x   - hdfs   hdfs          0 2017-05-11 11:40 /tmp
drwxrwx---   - mapred hdfs          0 2017-05-11 11:28 /user
drwxr-xr-x   - yarn   hdfs          0 2017-05-11 11:28 /var

c. Delete a file directly, bypassing the trash

# hadoop fs -rm -skipTrash /test.txt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Deleted /test.txt

d. Run a wordcount example

# hadoop fs -put /tmp/test.txt /user/hdfs/rand/
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
# sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.11.0.jar wordcount /user/hdfs/rand/ /tmp
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
17/05/11 11:40:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 10.10.103.246
17/05/11 11:40:09 INFO input.FileInputFormat: Total input paths to process : 1
17/05/11 11:40:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
17/05/11 11:40:09 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 674c65bbf0f779edc3e00a00c953b121f1988fe1]
17/05/11 11:40:09 INFO mapreduce.JobSubmitter: number of splits:1
17/05/11 11:40:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494472050574_0003
17/05/11 11:40:09 INFO impl.YarnClientImpl: Submitted application application_1494472050574_0003
17/05/11 11:40:09 INFO mapreduce.Job: The url to track the job: http://10.10.103.246:8100/proxy/application_1494472050574_0003/
17/05/11 11:40:09 INFO mapreduce.Job: Running job: job_1494472050574_0003
17/05/11 11:40:15 INFO mapreduce.Job: Job job_1494472050574_0003 running in uber mode : false
17/05/11 11:40:15 INFO mapreduce.Job:  map 0% reduce 0%
17/05/11 11:40:20 INFO mapreduce.Job:  map 100% reduce 0%
17/05/11 11:40:25 INFO mapreduce.Job:  map 100% reduce 100%
17/05/11 11:40:25 INFO mapreduce.Job: Job job_1494472050574_0003 completed successfully
17/05/11 11:40:25 INFO mapreduce.Job: Counters: 53
        File System Counters
                FILE: Number of bytes read=1897
                FILE: Number of bytes written=262703
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=6431
                HDFS: Number of bytes written=6219
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=2592
                Total time spent by all reduces in occupied slots (ms)=5360
                Total time spent by all map tasks (ms)=2592
                Total time spent by all reduce tasks (ms)=2680
                Total vcore-milliseconds taken by all map tasks=2592
                Total vcore-milliseconds taken by all reduce tasks=2680
                Total megabyte-milliseconds taken by all map tasks=3981312
                Total megabyte-milliseconds taken by all reduce tasks=8232960
        Map-Reduce Framework
                Map input records=102
                Map output records=96
                Map output bytes=6586
                Map output materialized bytes=1893
                Input split bytes=110
                Combine input records=96
                Combine output records=82
                Reduce input groups=82
                Reduce shuffle bytes=1893
                Reduce input records=82
                Reduce output records=82
                Spilled Records=164
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=120
                CPU time spent (ms)=1570
                Physical memory (bytes) snapshot=501379072
                Virtual memory (bytes) snapshot=7842639872
                Total committed heap usage (bytes)=525860864
                Peak Map Physical memory (bytes)=300183552
                Peak Map Virtual memory (bytes)=3244224512
                Peak Reduce Physical memory (bytes)=201195520
                Peak Reduce Virtual memory (bytes)=4598415360
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=6321
        File Output Format Counters
                Bytes Written=6219

2. NameNode high-availability check

Open http://10.10.103.246:50070 and http://10.10.103.144:50070. Then stop the namenode process on 10.10.103.246 and check whether the 10.10.103.144 node is promoted to active.

3. ResourceManager high-availability check

Open http://10.10.103.246:8088 and http://10.10.103.144:8088. Entering http://10.10.103.144:8088 in a browser redirects to http://ip-10-10-103-246.ec2.internal:8088/ (ip-10-10-103-246.ec2.internal is the hostname of 10.10.103.246), which shows the resourcemanager HA configuration is working. Stop the resourcemanager process on 10.10.103.144, enter http://10.10.103.144:8088 again, and the redirect no longer happens. (A command-line way to check the HA state is sketched below.)
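The failover state can also be queried from the command line instead of the web UIs. A small sketch, assuming the NameNode IDs were named nn1/nn2 and the ResourceManager IDs rm1/rm2 in the configuration files (adjust to whatever IDs you actually used):

# One namenode should report "active", the other "standby"
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2
# Same idea for the resourcemanagers
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2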

V. Summary

1. Getting the Hadoop cluster deployed successfully is only the beginning. The experience needed for day-to-day maintenance and for solving the business side's problems has to be accumulated bit by bit; the more issues you run into and the more you tinker, the better.

2. If the cluster deployed above needs to be scaled out later, simply take an image of the 10.10.103.62 machine and launch new servers from that image; the services start automatically and join the cluster.

3. Cost optimization for a Hadoop cluster in the cloud (AWS only here):

a. Store cold data on S3. HDFS can address S3 directly: just add the S3 key parameters (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey) to hdfs-site.xml (see the sketch after this list). Note that the upload and download logic in your programs needs a few extra retries, because S3 is occasionally unstable and uploads or downloads can fail.

b. Use the Auto Scaling service together with Spot Instances and configure scaling policies, for example scale out by 5 servers when CPU exceeds 50% and scale in by 5 servers when CPU drops below 10%; you can of course configure more tiers of scale-out and scale-in policies. Auto Scaling also has a scheduled-action feature that works much like a crontab, letting Auto Scaling add or remove servers for you on a schedule.
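For point 3a, a minimal sketch of the two properties named above and how an s3n:// path is then addressed; the key values and the bucket name are placeholders:

# Add to hdfs-site.xml (inside <configuration>), with your real credentials:
#   <property><name>fs.s3n.awsAccessKeyId</name><value>YOUR_ACCESS_KEY_ID</value></property>
#   <property><name>fs.s3n.awsSecretAccessKey</name><value>YOUR_SECRET_ACCESS_KEY</value></property>
# Then S3 paths can be used directly from the Hadoop CLI (placeholder bucket):
hadoop fs -ls s3n://your-bucket/cold-data/
hadoop distcp /user/hdfs/logs s3n://your-bucket/cold-data/logs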