
Docker in Practice: Installing and Configuring a Fully Distributed Hadoop 2.5.2 Cluster

Environment

VM: VMware Workstation

OS: Ubuntu 14.04 LTS

Hadoop: hadoop-2.5.2


Hadoop cluster plan


172.17.0.2    hadoop-master

172.17.0.3    hadoop-slave1

172.17.0.4    hadoop-slave2
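
Docker's default bridge network hands out these addresses in container start order, so they are not guaranteed. Once the containers are started later in this article, each address can be confirmed with docker inspect:

$ sudo docker inspect -f '{{ .NetworkSettings.IPAddress }}' hadoop-master
$ sudo docker inspect -f '{{ .NetworkSettings.IPAddress }}' hadoop-slave1
$ sudo docker inspect -f '{{ .NetworkSettings.IPAddress }}' hadoop-slave2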


Building the Hadoop base image from a Dockerfile


Create a Dockerfile with the following content:

FROM ubuntu:14.04
MAINTAINER Rain <>

ENV REFRESHED_AT 2016-09-15
RUN apt-get update
# sudo is not guaranteed in the base image and is used by later build steps
RUN apt-get install -y openssh-server openssh-client sudo

ADD jdk-7u80-linux-x64.tar.gz /usr/local/

ENV JAVA_HOME /usr/local/jdk1.7.0_80
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH $PATH:$JAVA_HOME/bin

RUN addgroup hadoop
# useradd -p expects an already-hashed password, so set the plaintext one via chpasswd instead
RUN useradd -m -g hadoop -s /bin/bash hadoop
RUN echo "hadoop:qazwsx" | chpasswd
# RUN sudo usermod -aG sudo hadoop

ADD hadoop-2.5.2.tar.gz /usr/local/

RUN chown -R hadoop:hadoop /usr/local/hadoop-2.5.2 
RUN cd /usr/local && ln -s ./hadoop-2.5.2 hadoop

ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop

# This build step runs as root, so sudo is unneeded; sudoers.d fragments should be mode 0440
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/nopasswdsudo && chmod 0440 /etc/sudoers.d/nopasswdsudo

RUN mkdir /var/run/sshd 

USER hadoop

RUN ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Avoid interactive host-key prompts when the master ssh-es into the slaves
RUN echo "StrictHostKeyChecking no" >> ~/.ssh/config && chmod 600 ~/.ssh/config

EXPOSE 22
Build the base image:

$ sudo docker build -t="rain:hadoop-base" .

Start a container from the image:

$ sudo docker run -t -i rain:hadoop-base /bin/bash
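
Inside this container, a quick sanity check confirms that the JDK and the Hadoop tarball were unpacked correctly (hadoop version needs no cluster configuration):

$ java -version
$ /usr/local/hadoop/bin/hadoop version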


Building the Hadoop master image from a Dockerfile

Create a Dockerfile with the following content:

FROM rain:hadoop-base
MAINTAINER Rain <>

ENV REFRESHED_AT 2016-09-14

ADD hadoop-env.sh $HADOOP_HOME/etc/hadoop/
ADD mapred-env.sh $HADOOP_HOME/etc/hadoop/
ADD yarn-env.sh $HADOOP_HOME/etc/hadoop/
ADD core-site.xml $HADOOP_HOME/etc/hadoop/
ADD hdfs-site.xml $HADOOP_HOME/etc/hadoop/
ADD mapred-site.xml $HADOOP_HOME/etc/hadoop/
ADD yarn-site.xml $HADOOP_HOME/etc/hadoop/
ADD slaves $HADOOP_HOME/etc/hadoop/

RUN sudo chown -R hadoop:hadoop $HADOOP_HOME/etc/hadoop
RUN sudo mkdir -p /opt/hadoop/data

#RUN cd /opt && sudo mkdir hadoop && cd hadoop && sudo mkdir data
RUN sudo chown -R hadoop:hadoop /opt/hadoop

WORKDIR /home/hadoop

COPY bootstrap.sh /home/hadoop/
RUN sudo chown -R hadoop:hadoop /home/hadoop
# 755 keeps the script executable without leaving it group/world-writable
RUN sudo chmod 755 /home/hadoop/bootstrap.sh

ENTRYPOINT ["/home/hadoop/bootstrap.sh"]

Build the Hadoop master image:

$ sudo docker build -t="rain:hadoop-master" .

Start the container:

$ sudo docker run --name hadoop-master -h hadoop-master -d -P -p 50070:50070 -p 8088:8088 rain:hadoop-master
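
-P publishes all exposed ports on random host ports, while the explicit -p flags pin the NameNode (50070) and ResourceManager (8088) web ports. Confirm the container is up and inspect the full port mappings with:

$ sudo docker ps
$ sudo docker port hadoop-master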


Building the Hadoop slave image from a Dockerfile


Create a Dockerfile with the same content as the Hadoop master image.

Edit bootstrap.sh, which starts sshd:

#!/bin/bash
# Run sshd in the foreground (-D) so the container keeps running
sudo /usr/sbin/sshd -D

Build the Hadoop slave image:

$ sudo docker build -t="rain:hadoop-slave" .

Start the containers (-d alone is sufficient; -t -i are redundant for detached containers):

$ sudo docker run -d --name hadoop-slave1 -h hadoop-slave1 rain:hadoop-slave
$ sudo docker run -d --name hadoop-slave2 -h hadoop-slave2 rain:hadoop-slave
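
If a slave container exits immediately, its log shows why bootstrap.sh failed:

$ sudo docker logs hadoop-slave1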


Interacting with the Hadoop master and slaves

Use the following commands:

docker exec -it hadoop-master /bin/bash
docker exec -it hadoop-slave1 /bin/bash
docker exec -it hadoop-slave2 /bin/bash

Write a script to configure /etc/hosts:

hadoop@hadoop-master:~$ vi run_hosts.sh

Content:

#!/bin/bash
# Append the cluster hostnames to /etc/hosts (run with sudo; the >> redirection needs root)
echo 172.17.0.2 hadoop-master >> /etc/hosts
echo 172.17.0.3 hadoop-slave1 >> /etc/hosts
echo 172.17.0.4 hadoop-slave2 >> /etc/hosts

Run the script:

hadoop@hadoop-master:~$ chmod +x run_hosts.sh
hadoop@hadoop-master:~$ sudo ./run_hosts.sh

Copy the script to the other two slave nodes and run it there as well:

hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave1:/home/hadoop
hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave2:/home/hadoop
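
The slave-side run isn't shown above; assuming the same hadoop user on each slave (created by the base image), the script can be executed remotely, which also verifies the passwordless SSH that start-all.sh depends on:

$ ssh hadoop-slave1 'chmod +x ~/run_hosts.sh && sudo ~/run_hosts.sh'
$ ssh hadoop-slave2 'chmod +x ~/run_hosts.sh && sudo ~/run_hosts.sh'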

Hadoop cluster operations:

Format the NameNode (run from /usr/local/hadoop on hadoop-master):

$ bin/hdfs namenode -format
Start the cluster:

$ sbin/start-all.sh
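
start-all.sh is deprecated in Hadoop 2.x; the equivalent form, if you prefer it, starts HDFS and YARN separately:

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh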
After the nodes start successfully, you should see the following processes on each node:

hadoop@hadoop-master:/usr/local/hadoop$ jps
534 ResourceManager
888 Jps
400 SecondaryNameNode
181 NameNode

hadoop@hadoop-slave1:~$ jps
196 NodeManager
63 DataNode
318 Jps

hadoop@hadoop-slave2:~$ jps
156 NodeManager
63 DataNode
268 Jps

Access the web consoles from the host machine:

http://<host-IP>:50070

http://<host-IP>:8088
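
A quick check from the host shell (both endpoints should answer once the daemons are up):

$ curl -I http://localhost:50070
$ curl -I http://localhost:8088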

(Web UI screenshots omitted.)


Hadoop configuration files:

hadoop-env.sh:

Edit as follows:

export JAVA_HOME=/usr/local/jdk1.7.0_80

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://hadoop-master/</value>
	</property>
</configuration>
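
Since fs.defaultFS omits a port, the NameNode RPC address defaults to port 8020. The resolved value can be checked from any node:

$ bin/hdfs getconf -confKey fs.defaultFS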

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/opt/hadoop/data/namenode</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/opt/hadoop/data/datanode</value>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<property>
		<name>dfs.datanode.balance.bandwidthPerSec</name>
		<value>12000000</value>
	</property>
	<property>
		<name>dfs.datanode.du.reserved</name>
		<value>5000000000</value>
	</property>
	<property>
		<name>dfs.blocksize</name>
		<value>128m</value>
	</property>
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>60</value>
	</property>
	<property>
		<name>dfs.datanode.handler.count</name>
		<value>10</value>
	</property>
	<property>
		<name>dfs.datanode.max.xcievers</name>
		<value>8192</value>
	</property>
</configuration>
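
After the cluster is up, a dfsadmin report confirms that both DataNodes registered and shows the configured capacity and reserved space:

$ bin/hdfs dfsadmin -report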

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>hadoop-master:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>hadoop-master:19888</value>
	</property>
	<property>
		<name>mapred.compress.map.output</name>
		<value>true</value>
	</property>
	<!--<property>
		<name>mapred.map.output.compression.codec</name>
		<value>org.apache.hadoop.io.compress.SnappyCodec</value>
	</property>-->
</configuration>
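
The bundled example jar is a simple end-to-end test of MapReduce on YARN (the path is relative to /usr/local/hadoop in the 2.5.2 binary distribution):

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 2 10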

yarn-site.xml:

<?xml version="1.0"?>

<configuration>

<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>hadoop-master</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>${yarn.resourcemanager.hostname}:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>${yarn.resourcemanager.hostname}:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>${yarn.resourcemanager.hostname}:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>${yarn.resourcemanager.hostname}:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>${yarn.resourcemanager.hostname}:8088</value>
	</property>

	<!--
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>1024</value>
		<description>The amount of physical memory (in MB) that may be allocated to containers being run by the node manager.</description>
	</property>
	-->

	<!--<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>100000</value>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-check-interval-seconds</name>
		<value>60</value>
	</property>
	-->

</configuration>
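
Once YARN is running, the registered NodeManagers can be listed from the master:

$ bin/yarn node -list
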
slaves:

hadoop-slave1
hadoop-slave2

--------------------------- End ---------------------------