Big Data: HBase + ZooKeeper + Hadoop Cluster Setup (Hadoop YARN Cluster)
阿新 · Published 2019-01-03
Project investigation work has recently quieted down, which gives me time to organize the investigations I did earlier. I am recording them here for easy archiving, in the hope that they will help those who come after.
Background
To gather performance data on querying and security for data in an HBase cluster, an HBase cluster needs to be set up for some simple tests.
Role Assignment
The goal of this test is to measure how region movement affects query performance when a RegionServer fails, so to keep things simple the cluster has only one HMaster. The drawback is that if the HMaster itself fails, the whole environment becomes unusable and everything has to be restarted. To avoid this, it is generally recommended to run at least two HMasters, one active and one standby.
+---------+--------+-----------+---------------+
| machine | Hadoop | ZooKeeper | HBase         |
+---------+--------+-----------+---------------+
| sv004   | Master | leader    | HMaster       |
+---------+--------+-----------+---------------+
| sv001   | Slave1 | follower  | HRegionServer |
+---------+--------+-----------+---------------+
| sv002   | Slave2 | follower  | HRegionServer |
+---------+--------+-----------+---------------+
| sv003   | Slave3 | follower  | HRegionServer |
+---------+--------+-----------+---------------+
The virtual machines and their IP addresses are:
172.28.157.1 sv001
172.28.157.2 sv002
172.28.157.3 sv003
172.28.157.4 sv004
Hadoop YARN Cluster Setup
This setup uses hadoop-2.5.2.tar.gz.
Procedure
- Download hadoop-2.5.2.tar.gz; see the official Hadoop website for details.
- Extract the archive:
tar -zxvf hadoop-2.5.2.tar.gz
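For reference, a possible full sequence, assuming the tarball was downloaded to /home/project (the base directory used throughout this article):
cd /home/project
tar -zxvf hadoop-2.5.2.tar.gz                    # unpacks to /home/project/hadoop-2.5.2
export HADOOP_HOME=/home/project/hadoop-2.5.2    # convenience variable used by the commands below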
- Edit the configuration files (located under etc/hadoop in Hadoop 2.x):
1. hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
Adjust this to match the JDK path on your own machine.
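If you are not sure where the JDK lives on a given machine, resolving the real path of the java binary is a quick way to find out:
readlink -f "$(which java)"
# e.g. /usr/java/jdk1.7.0_67/jre/bin/java, so JAVA_HOME would be /usr/java/jdk1.7.0_67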
2. core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sv004:9000</value>
  </property>
</configuration>
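To double-check that Hadoop picks the value up, hdfs getconf prints the effective value of a configuration key:
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://sv004:9000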
3. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/project/hadoop-2.5.2/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/project/hadoop-2.5.2/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
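dfs.namenode.name.dir and dfs.datanode.data.dir are local paths; creating them up front avoids permission surprises at startup (a sketch; adjust ownership to the account that runs Hadoop):
mkdir -p /home/project/hadoop-2.5.2/hdfs/name   # on sv004 (NameNode)
mkdir -p /home/project/hadoop-2.5.2/hdfs/data   # on sv001-sv003 (DataNodes)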
4. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>sv004:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>sv004:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/home/project/hadoop-2.5.2/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/home/project/hadoop-2.5.2/done</value>
  </property>
</configuration>
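The two jobhistory addresses above only respond if the JobHistory server is actually running; it is not started by start-all.sh and has its own daemon script:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# web UI is then available at http://sv004:19888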
5. yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>sv004</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>sv004:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>sv004:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>sv004:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>sv004:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>sv004:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/project/hadoop-2.5.2/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/project/hadoop-2.5.2/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>
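Once the daemons are up (startup steps below), you can confirm that all three NodeManagers have registered with the ResourceManager:
$HADOOP_HOME/bin/yarn node -list
# expected: sv001, sv002 and sv003 listed with state RUNNING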
6. slaves
sv001
sv002
sv003
- Passwordless SSH setup
Passwordless login is not described in detail here, as there are plenty of write-ups online; a minimal sketch follows.
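This is roughly what those write-ups boil down to (run on sv004; user is a placeholder for the account used on the slaves, and ssh-copy-id is assumed to be available):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair with no passphrase
for h in sv001 sv002 sv003; do
  ssh-copy-id user@$h                      # append the public key to each slave's authorized_keys
done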
- Modify /etc/hosts
Append the following entries:
172.28.157.1 sv001
172.28.157.2 sv002
172.28.157.3 sv003
172.28.157.4 sv004
- Use scp to copy the Hadoop directory to each slave machine; the path must be identical on every node (user stands for the login account on the slaves):
scp -r /home/project/hadoop-2.5.2 user@sv001:/home/project
scp -r /home/project/hadoop-2.5.2 user@sv002:/home/project
scp -r /home/project/hadoop-2.5.2 user@sv003:/home/project
Startup
- $HADOOP_HOME/bin/hdfs namenode -format (format the NameNode; run once, on sv004)
- $HADOOP_HOME/sbin/start-all.sh
- $HADOOP_HOME/bin/hdfs dfsadmin -report (check HDFS status)
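If everything came up, jps (shipped with the JDK) should show roughly the following daemons on each machine:
jps
# on sv004 (master): NameNode, SecondaryNameNode, ResourceManager
# on sv001-sv003 (slaves): DataNode, NodeManager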
Shutdown
- $HADOOP_HOME/sbin/stop-all.sh