Hadoop NameNode HA Setup
Requirements:
192.168.44.128 centos128
192.168.44.129 centos129
192.168.44.130 centos130
192.168.44.131 centos131
CentOS 7.3
Hadoop 2.7
ZooKeeper 3.4
Distribution of roles across the cluster nodes
          | NN | DN | ZK | ZKFC | JN | RM | DM
----------+----+----+----+------+----+----+----
centos128 | 1  |    | 1  | 1    |    | 1  |
centos129 | 1  | 1  | 1  | 1    | 1  |    | 1
centos130 |    | 1  | 1  |      | 1  |    | 1
centos131 |    | 1  |    |      | 1  |    | 1
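NN: NameNode cluster
DN: DataNode cluster
ZK: ZooKeeper cluster
ZKFC: ZKFC (ZooKeeper failover controller) cluster
JN: JournalNode cluster
RM: ResourceManager process
DM: "data manage" cluster (the YARN NodeManager daemons)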
I. JDK 1.8 installation
The JDK is installed in /usr/src/jdk.
Add the following to /etc/profile:
JAVA_HOME=/usr/src/jdk
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export JAVA_HOME PATH CLASSPATH
Then run source /etc/profile.
II. ZooKeeper cluster installation
1. Add the following to /etc/profile:
export ZOOKEEPER_HOME=/usr/src/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
Add the following to /etc/hosts:
192.168.44.128 centos128
192.168.44.129 centos129
192.168.44.130 centos130
192.168.44.131 centos131
ZooKeeper is unpacked into /usr/src/zookeeper.
2. Edit the cluster configuration file /usr/src/zookeeper/conf/zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/src/zookeeper/data
clientPort=2181
#maxClientCnxns=60
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1
server.1=centos128:2888:3888
server.2=centos129:2888:3888
server.3=centos130:2888:3888
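Notes:
tickTime=2000: each tick is 2000 milliseconds.
initLimit=10: followers get 10 ticks to connect to the leader and finish the initial synchronization.
syncLimit=5: at most 5 ticks may pass between a follower sending a request and receiving the acknowledgement; a follower that falls further behind is dropped.
dataDir=/usr/src/zookeeper/data: snapshots are stored under /usr/src/zookeeper/data.
clientPort=2181: clients connect on port 2181.
maxClientCnxns=60: at most 60 concurrent connections per client IP address; adjust as needed.
autopurge.snapRetainCount=3: keep the 3 most recent snapshots in dataDir.
autopurge.purgeInterval=1: run the purge task every hour; setting it to 0 disables automatic purging.
server.1=centos128:2888:3888: one server.id=host:port:port line per ZooKeeper node. The id must match the content of the myid file in that node's dataDir, so the myid file on centos128 contains 1. host is the server's hostname; the first port (2888) is used by followers to connect to the leader, and the second port (3888) is used for leader election.
server.2=centos129:2888:3888 and server.3=centos130:2888:3888 follow the same pattern, with myid 2 and 3 respectively.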
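The pairing between server.N and the myid file described above can be scripted. The following is a minimal sketch, assuming the zoo.cfg layout shown earlier and that each node's hostname appears verbatim in its server.N line; `write_myid` is a hypothetical helper name, not something shipped with ZooKeeper.

```shell
# Hypothetical helper (not part of ZooKeeper): look up this host's
# server.N entry in zoo.cfg and write N into <dataDir>/myid.
write_myid() {
  cfg="$1"    # path to zoo.cfg
  host="$2"   # hostname exactly as it appears in the server.N lines
  data_dir=$(sed -n 's/^dataDir=//p' "$cfg")
  id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg")
  # Fail instead of writing an empty myid when either value is missing.
  [ -n "$data_dir" ] && [ -n "$id" ] || return 1
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}

# Example (run on each ZooKeeper node):
#   write_myid /usr/src/zookeeper/conf/zoo.cfg "$(hostname)"
```

Deriving the id from zoo.cfg keeps the two in step, which avoids the common mistake of copying the same myid to every node.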
3. Start ZooKeeper on every node and check the process
[root@centos128 bin]# /usr/src/zookeeper/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@centos128 bin]# jps
14962 QuorumPeerMain
Run the same commands on the other ZooKeeper nodes (centos129 and centos130); jps should list a QuorumPeerMain process on each.
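A running QuorumPeerMain process does not by itself show that the ensemble has formed. As a rough sketch, the role of each node can be read from the output of `zkServer.sh status`. The function names `zk_mode` and `zk_cluster_modes` are hypothetical, and the wrapper assumes passwordless SSH to each node and that java is on the PATH of non-interactive SSH sessions.

```shell
# Extract the role (leader, follower, standalone) from
# `zkServer.sh status` output read on stdin.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Hypothetical wrapper: ask each ZooKeeper node for its role over SSH.
# Assumes passwordless SSH and the install path used in this article.
zk_cluster_modes() {
  for host in centos128 centos129 centos130; do
    mode=$(ssh "$host" /usr/src/zookeeper/bin/zkServer.sh status 2>/dev/null | zk_mode)
    echo "$host: ${mode:-not running}"
  done
}
```

In a healthy three-node ensemble, one node reports leader and the other two report follower.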
III. Hadoop installation
1. SSH trust
Set up passwordless SSH trust:
ssh-keygen -t rsa
ssh-keygen -t dsa
cd ~/.ssh
ssh-copy-id -i id_rsa.pub centos128
ssh-copy-id -i id_dsa.pub centos128
ssh-copy-id -i id_rsa.pub centos129
ssh-copy-id -i id_dsa.pub centos129
ssh-copy-id -i id_rsa.pub centos130
ssh-copy-id -i id_dsa.pub centos130
ssh-copy-id -i id_rsa.pub centos131
ssh-copy-id -i id_dsa.pub centos131
Repeat the same steps on the other servers.
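The repeated ssh-copy-id calls above can be collapsed into a loop. This is only a sketch: `distribute_key` and the `DRY_RUN` switch are hypothetical names introduced here, and the host list is the one from /etc/hosts.

```shell
# Hypothetical helper: push one public key to every listed node.
# With DRY_RUN=1 the commands are printed instead of executed.
distribute_key() {
  key="$1"; shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id -i $key $host"
    else
      ssh-copy-id -i "$key" "$host"
    fi
  done
}

# Example:
#   distribute_key ~/.ssh/id_rsa.pub centos128 centos129 centos130 centos131
```

Running it once with DRY_RUN=1 shows which hosts will be touched before any password prompts appear.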
**2. Create the storage directories**
mkdir /data/hadoop/name -p
mkdir /data/hadoop/tmp -p
mkdir /Data1 -p
mkdir /Data2 -p
**3. Edit the configuration files**
The main configuration files are described below.
**a. In hadoop-env.sh, change**
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/src/jdk
b. The remaining files:
etc/hadoop/core-site.xml: the NameNode URI
etc/hadoop/hdfs-site.xml: the NameNode and the DataNodes
etc/hadoop/yarn-site.xml: the ResourceManager, the NodeManagers and the History Server
etc/hadoop/mapred-site.xml: MapReduce applications and the MapReduce JobHistory Server
etc/hadoop/slaves: the slave hosts
b.1. etc/hadoop/core-site.xml is configured as follows:
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://centos128:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
Here hdfs://centos128:9000 is the NameNode URI.
b.2. etc/hadoop/hdfs-site.xml is configured as follows:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/name</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<!--
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
-->
<property>
<name>dfs.datanode.data.dir</name>
<value>/Data1,/Data2</value>
</property>
</configuration>
dfs.namenode.name.dir: local path where the NameNode stores its metadata
dfs.replication: defaults to 3 replicas
dfs.datanode.data.dir: local paths where the DataNodes store their blocks
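Several values in core-site.xml and hdfs-site.xml are given as raw byte or minute counts, which are hard to read at a glance. The small sketch below converts them to friendlier units; the function names are hypothetical.

```shell
# Convert a byte count to whole mebibytes (integer division).
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }
# Convert a byte count to whole kibibytes.
bytes_to_kib() { echo $(( $1 / 1024 )); }
# Convert a number of minutes to whole days.
minutes_to_days() { echo $(( $1 / 60 / 24 )); }

# bytes_to_mib 268435456    # dfs.blocksize        -> 256 (MiB)
# bytes_to_kib 131072       # io.file.buffer.size  -> 128 (KiB)
# minutes_to_days 10080     # fs.trash.interval    -> 7 (days)
```

So the configuration above uses 256 MiB blocks, a 128 KiB I/O buffer, and keeps deleted files in the trash for 7 days.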
b.3. etc/hadoop/yarn-site.xml is configured as follows:
For the meaning of each property, see: http://blog.csdn.net/u010719917/article/details/73917217
<configuration>
<!-- Site specific YARN configuration properties -->
<!--
ResourceManager
-->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!--Configurations for ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>centos128</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>50</value>
</property>
-->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>0</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>512</value>
</property>
<!--
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
<value>1000</value>
</property>
-->
<!--
nodemanager
-->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--
History Server
-->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
-->
</configuration>
b.4. etc/hadoop/mapred-site.xml is configured as follows:
For the meaning of each property, see: http://blog.csdn.net/u010719917/article/details/73917217
<configuration>
<!--
MapReduce Applications
-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<!--
MapReduce JobHistory Server
-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>centos128:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>centos128:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
</configuration>
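The memory settings in mapred-site.xml need to fit within the limits set in yarn-site.xml. With the values shown, the map request (1536 MB) and the reduce request (3072 MB) are both larger than yarn.scheduler.maximum-allocation-mb (512 MB), and YARN normally rejects container requests above that maximum, so the two files probably need to be reconciled before jobs will run. The sketch below shows one way to check this; `fits_allocation` and `check_requests` are hypothetical helper names.

```shell
# Succeed when a container request (MB) fits within the scheduler's
# per-container maximum (yarn.scheduler.maximum-allocation-mb).
fits_allocation() {
  request_mb="$1"; max_mb="$2"
  [ "$request_mb" -le "$max_mb" ]
}

# Hypothetical report: check a list of requests against one maximum.
check_requests() {
  max_mb="$1"; shift
  for request_mb in "$@"; do
    if fits_allocation "$request_mb" "$max_mb"; then
      echo "$request_mb MB: ok"
    else
      echo "$request_mb MB: exceeds maximum of $max_mb MB"
    fi
  done
}

# Example with the settings in this article
# (maximum 512; map 1536, reduce 3072):
#   check_requests 512 1536 3072
```

Either raising the scheduler maximum (and the NodeManager memory) or lowering the MapReduce requests would bring the settings back in line; which one is appropriate depends on the memory actually available on the nodes.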
b.5. etc/hadoop/slaves is as follows:
[root@centos128 hadoop]# cat slaves
centos129
centos130
centos131
If a SecondaryNameNode is used, create a master file in the same directory and add the hostname of the SecondaryNameNode host to it.
In an HA setup a SecondaryNameNode is not needed.
b.6. Log directory:
[root@centos128 logs]# pwd
/usr/src/hadoop/logs
[root@centos128 logs]# ll
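7. Install Hadoop on the other servers
Copy the configured Hadoop package, the JDK, /etc/profile and /etc/hosts to centos129 and centos130. First pack them into an archive:
cd /
tar czvf hd.tar.gz /usr/src/hadoop/ /usr/src/jdk/ /etc/profile /etc/hosts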