Setting up Hadoop HA
Hadoop 2.4.0
ZooKeeper 3.4.6
Note: there are three nodes in total: namenode1, namenode2, and datanode.
1. Configure passwordless SSH across the cluster
Run the following on all three machines:
# generate an SSH key pair on namenode1
[hadoop@namenode1 ~]$ cd /usr/bin/
[hadoop@namenode1 bin]$ ./ssh-keygen -t rsa
# copy the public key to every node in the cluster
[hadoop@namenode1 bin]$ ssh-copy-id namenode1
[hadoop@namenode1 bin]$ ssh-copy-id namenode2
[hadoop@namenode1 bin]$ ssh-copy-id datanode
# test passwordless login to namenode2; it should not prompt for a password
[hadoop@namenode1 bin]$ ssh namenode2
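A quick way to confirm passwordless login to every node from each machine (assuming the three hostnames above); BatchMode makes ssh fail instead of prompting:
$ for h in namenode1 namenode2 datanode; do ssh -o BatchMode=yes "$h" hostname; done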
2. Install ZooKeeper
- Extract ZooKeeper under /opt
- Create a symlink: $ ln -s zookeeper-3.4.6 zookeeper
- Change ownership: $ chown -R hadoop:hadoop zookeeper
- Switch to the hadoop user and enter the config directory: $ cd /opt/zookeeper/conf/
- Copy the sample configuration (cp zoo_sample.cfg zoo.cfg) and edit it:
# ZooKeeper ensemble nodes
server.1=namenode1:2888:3888
server.2=namenode2:2888:3888
server.3=datanode:2888:3888
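Only the server list is shown here; since the myid file is created under /opt/hadoopdata/zookeeperdata in a later step, zoo.cfg presumably also sets dataDir to match:
dataDir=/opt/hadoopdata/zookeeperdata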
- Create the data directory and change its ownership (chown ...)
- Under /opt/hadoopdata/zookeeperdata, create a file named myid and write the number 1 into it
- As root, map IPs to hostnames: $ vim /etc/hosts (this must be configured on every node):
192.168.0.202 namenode1
192.168.0.203 namenode2
192.168.0.204 datanode
- Copy the hadoopdata folder and zookeeper-3.4.6 to the other two hosts, for example as below:
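A sketch of the copy, assuming the hadoop user can write to /opt on the target hosts:
[hadoop@namenode1 opt]$ scp -r /opt/zookeeper-3.4.6 hadoop@namenode2:/opt/
[hadoop@namenode1 opt]$ scp -r /opt/hadoopdata hadoop@namenode2:/opt/
[hadoop@namenode1 opt]$ scp -r /opt/zookeeper-3.4.6 hadoop@datanode:/opt/
[hadoop@namenode1 opt]$ scp -r /opt/hadoopdata hadoop@datanode:/opt/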
- Create the same symlink and fix ownership on the other two hosts as well.
- On namenode2, change the myid file content to 2; on datanode, to 3.
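For example (hypothetical one-liners, assuming the dataDir location above):
[hadoop@namenode2 ~]$ echo 2 > /opt/hadoopdata/zookeeperdata/myid
[hadoop@datanode ~]$ echo 3 > /opt/hadoopdata/zookeeperdata/myid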
- Start the cluster:
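With ZooKeeper 3.4.6 the start and status commands would be run on each of the three nodes:
$ /opt/zookeeper/bin/zkServer.sh start
$ /opt/zookeeper/bin/zkServer.sh status   # one node should report "leader", the others "follower"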
3. Install Hadoop
- Extract the archive: $ tar zxvf hadoop-2.4.0.tar.gz
- Create a symlink: $ ln -s hadoop-2.4.0 hadoop
- Fix ownership:
$ chown -R hadoop:hadoop hadoop
$ chown -R hadoop:hadoop hadoop-2.4.0
- Switch to the hadoop user: $ su - hadoop
- Configure the Hadoop environment script: $ cd /opt/hadoop/etc/hadoop/ , then edit hadoop-env.sh and add:
export HADOOP_NAMENODE_OPTS="-Xmx1024m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_PID_DIR=/opt/hadoop/pids
export HADOOP_LOG_DIR=/opt/hadoopdata/hadooplogs
- Configure core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://educluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>namenode1:2181,namenode2:2181,datanode:2181</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value> <!-- 128K -->
  </property>
  <property>
    <!-- hadoop.tmp.dir takes a single local directory, not a list -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/mydisk/disk01/hadoopdata/tmp</value>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>educluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.educluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.educluster.nn1</name>
    <value>namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.educluster.nn2</name>
    <value>namenode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.educluster.nn1</name>
    <value>namenode1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.educluster.nn2</name>
    <value>namenode2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://namenode1:8485;namenode2:8485;datanode:8485/educluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.educluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/mydisk/journaldata</value>
  </property>
  <!-- enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/hadoopdata/sname</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoopdata/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/mydisk/disk01,file:/opt/mydisk/disk02</value>
  </property>
  <!-- replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>200</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>150</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/hadoop/etc/hadoop/excludes</value>
  </property>
</configuration>
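With fs.defaultFS pointing at the nameservice, clients reach whichever NameNode is currently active through the logical name rather than a host; the failover proxy provider configured above resolves it. Once the cluster is up (see the start-up steps below), a listing like this should work:
$ hdfs dfs -ls hdfs://educluster/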
- Edit the slaves file and write in the DataNode's hostname, i.e. datanode, for example:
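One way to write it (the prompt is illustrative):
[hadoop@namenode1 hadoop]$ echo datanode > /opt/hadoop/etc/hadoop/slaves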
- As root, create the mydisk directory under /opt on all three hosts and hand it over to the hadoop user:
[hadoop@namenode1 ~]$ su
[root@namenode1 hadoop]# cd /opt/
[root@namenode1 opt]# mkdir mydisk
[root@namenode1 opt]# chown -R hadoop:hadoop mydisk
- Create the following directories on all three hosts:
[root@namenode1 opt]# exit
[hadoop@namenode1 ~]$ cd /opt/mydisk/
[hadoop@namenode1 mydisk]$ mkdir -p disk01/hadoopdata/tmp
[hadoop@namenode1 mydisk]$ mkdir -p disk02/hadoopdata/tmp
[hadoop@namenode1 mydisk]$ mkdir journaldata
# on the NameNodes only -----------------------------
[hadoop@namenode1 mydisk]$ cd ../hadoopdata
[hadoop@namenode1 hadoopdata]$ mkdir sname
[hadoop@namenode1 hadoopdata]$ mkdir name
# ---------------------------------------
[hadoop@namenode1 hadoopdata]$ cd /opt/hadoop/etc/hadoop/
[hadoop@namenode1 hadoop]$ touch excludes
- On namenode1, run the following to ship Hadoop and its configuration to namenode2 and datanode:
[hadoop@namenode1 opt]$ scp -r hadoop-2.4.0 hadoop@namenode2:/opt/
[hadoop@namenode1 opt]$ scp -r hadoop-2.4.0 hadoop@datanode:/opt/
- On namenode2 and datanode, create the symlink and assign ownership:
[root@namenode2 opt]# ln -s hadoop-2.4.0 hadoop
[root@namenode2 opt]# chown -R hadoop:hadoop hadoop-2.4.0
[root@namenode2 opt]# chown -R hadoop:hadoop hadoop   # repeat the same on datanode
- Configure environment variables: add HADOOP_HOME, and remember to source ~/.bash_profile afterwards so the change takes effect.
[hadoop@namenode1 opt]$ vim ~/.bash_profile
export HADOOP_HOME=/opt/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[hadoop@namenode1 opt]$ source ~/.bash_profile
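A quick sanity check that the variables took effect:
[hadoop@namenode1 opt]$ hadoop version   # should report Hadoop 2.4.0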
- Start everything up; ZooKeeper must already be running at this point.
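1) Start the JournalNodes. (This step is implied: the qjournal URI in hdfs-site.xml needs a running JournalNode on each of the three nodes before the NameNode can be formatted; presumably:)
$ cd /opt/hadoop/sbin/
$ ./hadoop-daemon.sh start journalnode   # run on namenode1, namenode2 and datanode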
2) On namenode1, format the ZKFC state in ZooKeeper:
$ cd /opt/hadoop/bin/
$ ./hdfs zkfc -formatZK
After formatting, run $ /opt/zookeeper/bin/zkCli.sh and then ls / ; if a hadoop-ha znode appears alongside zookeeper, the format succeeded.
3) Format and start the NameNode on namenode1:
[hadoop@namenode1 ~]$ cd /opt/hadoop/bin/
[hadoop@namenode1 bin]$ ./hadoop namenode -format
[hadoop@namenode1 bin]$ cd ../sbin
[hadoop@namenode1 sbin]$ ./hadoop-daemon.sh start namenode
4) Sync the metadata to namenode2 and start its NameNode:
$ cd /opt/hadoop/bin/
$ ./hdfs namenode -bootstrapStandby
$ cd ../sbin
$ ./hadoop-daemon.sh start namenode
5) On namenode1, start DFS from /opt/hadoop/sbin: $ ./start-dfs.sh
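With automatic failover enabled, start-dfs.sh should also bring up the ZKFCs; the HA state can then be checked against the NameNode IDs nn1 and nn2 from hdfs-site.xml, where one should report active and the other standby:
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2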