Docker in Practice: Installing and Configuring a Fully Distributed Hadoop 2.5.2 Cluster
A-Xin · Published 2018-11-05
Environment
VM:VMware Workstation
OS:Ubuntu 14.04 LTS
Hadoop:hadoop-2.5.2
Hadoop Cluster Plan
172.17.0.2 hadoop-master
172.17.0.3 hadoop-slave1
172.17.0.4 hadoop-slave2
Building the Hadoop Base Image with a Dockerfile
Create a Dockerfile with the following content:
FROM ubuntu:14.04
MAINTAINER Rain <>
ENV REFRESHED_AT 2016-09-15
RUN apt-get update
RUN apt-get install -y openssh-server openssh-client
ADD jdk-7u80-linux-x64.tar.gz /usr/local/
ENV JAVA_HOME /usr/local/jdk1.7.0_80
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH $PATH:$JAVA_HOME/bin
RUN addgroup hadoop
RUN useradd -m hadoop -g hadoop -p qazwsx
# RUN sudo usermod -aG sudo hadoop
ADD hadoop-2.5.2.tar.gz /usr/local/
RUN chown -R hadoop:hadoop /usr/local/hadoop-2.5.2
RUN cd /usr/local && ln -s ./hadoop-2.5.2 hadoop
ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
RUN cd /etc/sudoers.d && sudo touch nopasswdsudo && echo "hadoop ALL=(ALL) NOPASSWD : ALL" >> nopasswdsudo
RUN mkdir /var/run/sshd
USER hadoop
RUN ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
EXPOSE 22
Build the base image:
$ sudo docker build -t="rain:hadoop-base" .
Start a container from the image:
$ sudo docker run -t -i rain:hadoop-base /bin/bash
Building the Hadoop Master Image with a Dockerfile
Create a Dockerfile with the following content:
FROM rain:hadoop-base
MAINTAINER Rain <>
ENV REFRESHED_AT 2016-09-14
ADD hadoop-env.sh $HADOOP_HOME/etc/hadoop/
ADD mapred-env.sh $HADOOP_HOME/etc/hadoop/
ADD yarn-env.sh $HADOOP_HOME/etc/hadoop/
ADD core-site.xml $HADOOP_HOME/etc/hadoop/
ADD hdfs-site.xml $HADOOP_HOME/etc/hadoop/
ADD mapred-site.xml $HADOOP_HOME/etc/hadoop/
ADD yarn-site.xml $HADOOP_HOME/etc/hadoop/
ADD slaves $HADOOP_HOME/etc/hadoop/
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME/etc/hadoop
RUN sudo mkdir -p /opt/hadoop/data
# RUN cd /opt && sudo mkdir hadoop && cd hadoop && sudo mkdir data
RUN sudo chown -R hadoop:hadoop /opt/hadoop
WORKDIR /home/hadoop
COPY bootstrap.sh /home/hadoop/
RUN sudo chown -R hadoop:hadoop /home/hadoop
RUN sudo chmod 766 /home/hadoop/bootstrap.sh
ENTRYPOINT ["/home/hadoop/bootstrap.sh"]
Build the Hadoop master image:
$ sudo docker build -t="rain:hadoop-master" .
Start the container:
$ sudo docker run --name hadoop-master -h hadoop-master -d -P -p 50070:50070 -p 8088:8088 rain:hadoop-master
Building the Hadoop Slave Image with a Dockerfile
Create a Dockerfile with the same content as for the Hadoop master image.
Create bootstrap.sh, which starts sshd:
#!/bin/bash
sudo /usr/sbin/sshd -D
Build the Hadoop slave image:
$ sudo docker build -t="rain:hadoop-slave" .
Start the containers:
$ sudo docker run -t -i --name hadoop-slave1 -h hadoop-slave1 -d rain:hadoop-slave
$ sudo docker run -t -i --name hadoop-slave2 -h hadoop-slave2 -d rain:hadoop-slave
Interacting with the Hadoop Master and Slaves
Use the following commands:
docker exec -it hadoop-master /bin/bash
docker exec -it hadoop-slave1 /bin/bash
docker exec -it hadoop-slave2 /bin/bash
Configure the hosts entries with a script:
hadoop@hadoop-master:~$ vi run_hosts.sh
The content is as follows:
#!/bin/bash
echo 172.17.0.2 hadoop-master >> /etc/hosts
echo 172.17.0.3 hadoop-slave1 >> /etc/hosts
echo 172.17.0.4 hadoop-slave2 >> /etc/hosts
Run the script:
hadoop@hadoop-master:~$ chmod +x run_hosts.sh
hadoop@hadoop-master:~$ sudo ./run_hosts.sh
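The script above simply appends, so running it more than once leaves duplicate entries in /etc/hosts. A hedged sketch of an idempotent variant follows; the HOSTS_FILE indirection is only for safe illustration here, and a real node would target /etc/hosts instead:

```shell
#!/bin/bash
# Idempotent variant of run_hosts.sh: append an entry only when the
# hostname is not already present. HOSTS_FILE points at a temp file
# here for safe illustration; on a real node it would be /etc/hosts.
HOSTS_FILE="$(mktemp)"

add_host() {
  local ip="$1" name="$2"
  grep -q "[[:space:]]$name\$" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
}

add_host 172.17.0.2 hadoop-master
add_host 172.17.0.3 hadoop-slave1
add_host 172.17.0.4 hadoop-slave2
add_host 172.17.0.2 hadoop-master   # second run: no duplicate is added

cat "$HOSTS_FILE"
```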
Copy the script to the other two slave nodes and run it there as well:
hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave1:/home/hadoop
hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave2:/home/hadoop
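With more slaves, the per-node scp commands get tedious; the distribution step can be sketched as a loop. This dry run only assembles and prints the commands, and the remote sudo invocation is an assumption about how run_hosts.sh would be executed on each slave:

```shell
#!/bin/bash
# Assemble the scp/ssh commands that would push run_hosts.sh to every
# slave and run it there. Printing instead of executing keeps the
# sketch safe to try anywhere; execute each entry to really run it.
slaves=(hadoop-slave1 hadoop-slave2)
commands=()

for slave in "${slaves[@]}"; do
  commands+=("scp run_hosts.sh hadoop@$slave:/home/hadoop")
  commands+=("ssh hadoop@$slave sudo /home/hadoop/run_hosts.sh")
done

printf '%s\n' "${commands[@]}"
```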
Hadoop cluster operations:
Format the HDFS filesystem:
$ bin/hdfs namenode -format
Start the cluster:
$ sbin/start-all.sh
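After start-all.sh, one way to confirm the daemons are up is to scan jps output for the expected daemon names. A minimal sketch, run here against a captured sample rather than a live jps call:

```shell
#!/bin/bash
# Check that every expected Hadoop daemon appears in jps output.
# jps_output is a captured sample here; on a live master node you
# would use jps_output="$(jps)" instead.
jps_output="534 ResourceManager
400 SecondaryNameNode
181 NameNode
888 Jps"

missing=""
for daemon in NameNode SecondaryNameNode ResourceManager; do
  echo "$jps_output" | grep -qw "$daemon" || missing="$missing $daemon"
done

if [ -z "$missing" ]; then
  echo "master daemons OK"
else
  echo "missing:$missing"
fi
```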
Once the nodes have started, you should see the following processes on each node:
hadoop@hadoop-master:/usr/local/hadoop$ jps
534 ResourceManager
888 Jps
400 SecondaryNameNode
181 NameNode
hadoop@hadoop-slave1:~$ jps
196 NodeManager
63 DataNode
318 Jps
hadoop@hadoop-slave2:~$ jps
156 NodeManager
63 DataNode
268 Jps
Access the web consoles from the host machine:
http://<host-IP>:50070
http://<host-IP>:8088
(Screenshots of the web UI omitted.)
Hadoop configuration files:
hadoop-env.sh:
Edit it as follows:
export JAVA_HOME=/usr/local/jdk1.7.0_80
core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master/</value>
</property>
</configuration>
hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>12000000</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>5000000000</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>128m</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>60</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
</property>
</configuration>
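The size-related values in hdfs-site.xml above are easy to misread, so here is a quick arithmetic sanity check of what they amount to (the conversions, not the property semantics, are what the sketch verifies):

```shell
#!/bin/bash
# Arithmetic check of the byte values used in hdfs-site.xml above.
blocksize=$((128 * 1024 * 1024))          # dfs.blocksize "128m" in bytes
reserved_gb=$((5000000000 / 1000000000))  # dfs.datanode.du.reserved, ~5 GB
bandwidth_mb=$((12000000 / 1000000))      # balance bandwidth, ~12 MB/s

echo "blocksize=$blocksize reserved_gb=$reserved_gb bandwidth_mb=$bandwidth_mb"
```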
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<!--<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>-->
</configuration>
yarn-site.xml:
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
<description>The amount of physical memory (in MB) that may be allocated to containers being run by the node manager.</description>
</property>
-->
<!--<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>100000</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>60</value>
</property>
-->
</configuration>
slaves:
hadoop-slave1
hadoop-slave2
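The slaves file must stay consistent with the /etc/hosts entries and the container hostnames. One option is to derive it from the single cluster list used earlier; a sketch, writing to a temp file for illustration rather than to $HADOOP_HOME/etc/hadoop/slaves:

```shell
#!/bin/bash
# Derive the slaves file from one authoritative "ip hostname" list so
# the hostnames cannot drift from the hosts entries. Written to a temp
# file here; in the image it would be $HADOOP_HOME/etc/hadoop/slaves.
nodes="172.17.0.2 hadoop-master
172.17.0.3 hadoop-slave1
172.17.0.4 hadoop-slave2"

slaves_file="$(mktemp)"
echo "$nodes" | awk '$2 != "hadoop-master" { print $2 }' > "$slaves_file"
cat "$slaves_file"
```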