
Hadoop Fully Distributed Deployment

1. Overview

Concepts:

Hadoop is reliable, scalable, open-source software for distributed computing.
It is a framework that allows distributed processing of large data sets across clusters of computers, using a simple programming model (MapReduce).
It scales from a single server to thousands of machines, each node offering local computation and storage.
Rather than relying on hardware for high availability (HA), it detects and handles failures at the application layer.

The 4 Vs of big data:

Volume: large amounts of data
Velocity: data arrives and must be processed quickly
Variety: many data types and formats
Value: low value density

Modules:

Hadoop Common: shared libraries that support the other modules
HDFS: the Hadoop Distributed File System
Hadoop YARN: a framework for job scheduling and cluster resource management
Hadoop MapReduce: a YARN-based system for parallel processing of large data sets

2. Installation and Deployment

2.1 Host plan

Hostname    IP address      Roles
hadoop-1    172.20.2.203    namenode/datanode/nodemanager
hadoop-2    172.20.2.204    secondarynamenode/datanode/nodemanager
hadoop-3    172.20.2.205    resourcemanager/datanode/nodemanager

2.2 Deployment

2.2.1 Base environment configuration

a. Configure the Java environment

yum install java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel -y
cat >/etc/profile.d/java.sh<<EOF
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-3.b14.el6_9.x86_64
export CLASSPATH=.:\$JAVA_HOME/jre/lib/rt.jar:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar 
export PATH=\$PATH:\$JAVA_HOME/bin
EOF
source /etc/profile.d/java.sh
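The JDK path baked into java.sh above is build-specific, so it is worth verifying before moving on; a quick check (the exact version string and install path will vary with the openjdk build yum installed):

```shell
# Sanity-check the Java installation after sourcing java.sh.
if command -v java >/dev/null 2>&1; then
    java -version                     # should report 1.8.0
    readlink -f "$(command -v java)"  # real install path; derive JAVA_HOME from it
else
    echo "java not found; re-check the yum install step"
fi
```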

b. Set the hostname and add hosts entries (run the matching hostname command on each node)

hostname hadoop-1
cat >>/etc/hosts<<EOF
172.20.2.203    hadoop-1
172.20.2.204    hadoop-2
172.20.2.205    hadoop-3
EOF

c. Create the user and directories

useradd hadoop
echo "hadoopwd" |passwd hadoop --stdin
mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop:hadoop /data/hadoop/hdfs/
mkdir -p /var/log/hadoop/yarn
mkdir -p /dbapps/hadoop/logs
chmod g+w /dbapps/hadoop/logs/
chown -R hadoop.hadoop /dbapps/hadoop/

d. Configure Hadoop environment variables

cat>/etc/profile.d/hadoop.sh<<EOF
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=\$PATH:\$HADOOP_PREFIX/bin:\$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=\${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=\${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=\${HADOOP_PREFIX}
export HADOOP_YARN_HOME=\${HADOOP_PREFIX}
EOF
source /etc/profile.d/hadoop.sh

e. Download and unpack the tarball

mkdir /software 
cd /software 
wget -c https://archive.apache.org/dist/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
tar -zxf hadoop-2.6.5.tar.gz -C /usr/local
ln -sv /usr/local/hadoop-2.6.5/ /usr/local/hadoop
chown hadoop.hadoop /usr/local/hadoop-2.6.5/ -R

f. Passwordless SSH for the hadoop user

su - hadoop
ssh-keygen -t rsa
for num in `seq 1 3`;do ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub [email protected]$num;done
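Before distributing files, it helps to confirm that key-based login actually works from the hadoop user to every node; a minimal check:

```shell
# Confirm passwordless SSH to each node; BatchMode makes ssh fail fast
# instead of prompting for a password if the key was not accepted.
for host in hadoop-1 hadoop-2 hadoop-3; do
    ssh -o BatchMode=yes "$host" hostname || echo "key login to $host failed"
done
```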

2.3 Configure Hadoop

2.3.1 Configure each node

Configure the master node

hadoop-1 runs the namenode, a datanode, and a nodemanager; edit the Hadoop configuration files on hadoop-1.

core-site.xml (defines the namenode)

cat>/usr/local/hadoop/etc/hadoop/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-1:8020</value>
<final>true</final>
</property>
</configuration>
EOF

hdfs-site.xml (defines the secondary namenode; dfs.replication sets the number of block replicas and must not exceed the number of datanodes)

cat >/usr/local/hadoop/etc/hadoop/hdfs-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-2:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
</configuration>
EOF

Add mapred-site.xml (the distribution ships only mapred-site.xml.template)

cat >/usr/local/hadoop/etc/hadoop/mapred-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOF

yarn-site.xml: point the resourcemanager addresses at the RM host, hadoop-3 (defines the resourcemanager)

cat >/usr/local/hadoop/etc/hadoop/yarn-site.xml<<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-3:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-3:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-3:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-3:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-3:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
</configuration>
EOF

slaves (lists the datanodes/nodemanagers)

cat >/usr/local/hadoop/etc/hadoop/slaves<<EOF
hadoop-1
hadoop-2
hadoop-3
EOF

Repeat the same configuration on hadoop-2 and hadoop-3; the simplest approach is to distribute the files from hadoop-1 directly.
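One way to do that distribution, assuming the key-based SSH set up earlier and the same install path on every node:

```shell
# Push the finished configuration from hadoop-1 to the other nodes.
# Run as the hadoop user, which owns /usr/local/hadoop-2.6.5 everywhere.
for host in hadoop-2 hadoop-3; do
    rsync -av /usr/local/hadoop/etc/hadoop/ "${host}:/usr/local/hadoop/etc/hadoop/" \
        || echo "copy to $host failed"
done
```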

2.3.2 Format the namenode

Run the format on the NameNode host (hadoop-1):

hdfs namenode -format
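A successful format prints a "successfully formatted" message and populates the dfs.namenode.name.dir configured above; a quick sanity check:

```shell
# After formatting, the namenode storage directory should contain
# current/VERSION with the new cluster ID.
test -f /data/hadoop/hdfs/nn/current/VERSION \
    && echo "namenode formatted" \
    || echo "VERSION file missing; re-check the format output"
```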



2.3.3 Start the services

Run start-all.sh on the namenode (hadoop-1) to start the HDFS and YARN daemons.
Because the resourcemanager is configured on hadoop-3, start it there separately (yarn-daemon.sh start resourcemanager).
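Once the daemons are up, jps on each node should show the roles from the host plan; a loop to check all three (run as the hadoop user):

```shell
# Expected JVM processes per node (besides Jps itself):
#   hadoop-1: NameNode, DataNode, NodeManager
#   hadoop-2: SecondaryNameNode, DataNode, NodeManager
#   hadoop-3: ResourceManager, DataNode, NodeManager
for host in hadoop-1 hadoop-2 hadoop-3; do
    echo "== $host =="
    ssh -o BatchMode=yes "$host" jps || echo "cannot reach $host"
done
```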


Check the running services on hadoop-2

Check the running services on hadoop-3

2.3.4 Run a test job

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 2 10

2.3.5 Check the web interfaces

HDFS NameNode: http://hadoop-1:50070 (50070 is the default dfs.namenode.http-address port in Hadoop 2.x)

YARN ResourceManager: http://hadoop-3:8088 (set by yarn.resourcemanager.webapp.address above)