Setting up Hadoop on Ubuntu (cluster mode)
Environment: Ubuntu 16.04
hadoop-3.0.3
Reference: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
1. Follow the previous post, "Setting up Hadoop on Ubuntu (standalone mode)"
1.1. Install the JDK and configure the Java environment;
1.2. Create the hadoop group and hadoop user;
1.3. Configure passwordless SSH login;
1.4. Clone the virtual machine
With the virtual machine powered off, choose Clone;
Configuration:
All machines use bridged networking;
Host machine:
192.168.1.102
Virtual machines:
192.168.1.106 master
192.168.1.104 node1
192.168.1.105 node2
2. Change the hostname and hosts file on each virtual machine
sudo vi /etc/hostname
Set them to master, node1, and node2 respectively
sudo vi /etc/hosts
Add the following to the hosts file on all three machines:
192.168.1.106 master
192.168.1.104 node1
192.168.1.105 node2
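To confirm that name resolution works after editing hosts, a quick check from each machine (a minimal sketch; any of the three hostnames can be substituted):
ping -c 1 master
ping -c 1 node1
ping -c 1 node2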
3. Configure passwordless SSH login between machines
Copy the public key file from master to node1 and node2;
scp ~/.ssh/authorized_keys hadoop@node1:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@node2:~/.ssh/
(TODO: alternatively, the ssh-copy-id command is more convenient; see the sketch below)
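A minimal sketch of the ssh-copy-id route, run on master, assuming the hadoop user from step 1 and the default key in ~/.ssh/id_rsa.pub:
ssh-copy-id hadoop@node1
ssh-copy-id hadoop@node2
ssh-copy-id appends the key to the remote ~/.ssh/authorized_keys and normally sets sensible permissions, so in that case the chmod step below should not be needed.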
On node1 and node2, fix the file permissions;
chmod 600 ~/.ssh/authorized_keys
Then, from master, you can SSH into node1 and node2 without a password
ssh node1
ssh node2
(Because the other two machines are clones, passwordless local SSH login is already configured on them)
(Also, all three machines share the same public/private key pair, so strictly speaking the copy above is not even necessary)
4. Configure Hadoop
On the master machine;
Create the directories
mkdir /usr/local/hadoop/dfs
mkdir /usr/local/hadoop/dfs/name
mkdir /usr/local/hadoop/dfs/data
mkdir /usr/local/hadoop/tmp
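If /usr/local/hadoop is still owned by root, the mkdir commands above may need sudo, and the tree should then be handed over to the hadoop user created in step 1 so HDFS can write to it; a minimal sketch with the same paths as above:
sudo mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data /usr/local/hadoop/tmp
sudo chown -R hadoop:hadoop /usr/local/hadoop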
Go into the hadoop directory
cd /usr/local/hadoop
Edit the configuration files under ./etc/hadoop
sudo vi ./etc/hadoop/hadoop-env.sh
Add the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
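To double-check that this path matches the JDK actually installed, the active java binary can be located first (readlink and update-alternatives are standard on Ubuntu); JAVA_HOME is the directory above its bin/ (or jre/bin/) directory:
readlink -f $(which java)
update-alternatives --list java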
sudo vi ./etc/hadoop/core-site.xml
Add the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
sudo vi ./etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>
dfs.replication: number of replicas; the default is 3
dfs.namenode.secondary.http-address: for the reliability of the cluster as a whole, it is better to run the SecondaryNameNode on a different machine
dfs.http.address: the NameNode web UI address (a deprecated name for dfs.namenode.http-address; Hadoop 3 defaults to port 9870, so this keeps the old 50070 port)
sudo vi ./etc/hadoop/mapred-site.xml
Add the following:
<configuration>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /usr/local/hadoop/etc/hadoop,
      /usr/local/hadoop/share/hadoop/common/*,
      /usr/local/hadoop/share/hadoop/common/lib/*,
      /usr/local/hadoop/share/hadoop/hdfs/*,
      /usr/local/hadoop/share/hadoop/hdfs/lib/*,
      /usr/local/hadoop/share/hadoop/mapreduce/*,
      /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
      /usr/local/hadoop/share/hadoop/yarn/*,
      /usr/local/hadoop/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
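Two notes on this file: mapreduce.jobtracker.http.address and mapreduce.tasktracker.http.address are MRv1-era settings that have no effect once mapreduce.framework.name is yarn, so they can be left out; and the long mapreduce.application.classpath value can be cross-checked against what the bundled classpath command prints:
/usr/local/hadoop/bin/hadoop classpath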
sudo vi ./etc/hadoop/yarn-site.xml
Add the following:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8099</value>
  </property>
</configuration>
sudo vi ./etc/hadoop/workers
Clear the file and add the following:
node1
node2
Copy the whole Hadoop setup configured on master above to node1 and node2;
scp -r /usr/local/hadoop hadoop@node1:/usr/local/hadoop
scp -r /usr/local/hadoop hadoop@node2:/usr/local/hadoop
Or, copy only the six configuration files (shown here for node1; repeat for node2, or use the loop sketch below);
scp /usr/local/hadoop/etc/hadoop/yarn-site.xml hadoop@node1:/usr/local/hadoop/etc/hadoop
scp /usr/local/hadoop/etc/hadoop/mapred-site.xml hadoop@node1:/usr/local/hadoop/etc/hadoop
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@node1:/usr/local/hadoop/etc/hadoop
scp /usr/local/hadoop/etc/hadoop/core-site.xml hadoop@node1:/usr/local/hadoop/etc/hadoop
scp /usr/local/hadoop/etc/hadoop/hadoop-env.sh hadoop@node1:/usr/local/hadoop/etc/hadoop
scp /usr/local/hadoop/etc/hadoop/workers hadoop@node1:/usr/local/hadoop/etc/hadoop
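The same can be done in one go with a small shell loop over both workers (a sketch, assuming the hadoop user and the node1/node2 hostnames set up earlier):
for host in node1 node2; do
  scp /usr/local/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,hadoop-env.sh,workers} hadoop@$host:/usr/local/hadoop/etc/hadoop/
done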
5. Start Hadoop on master
cd /usr/local/hadoop
./bin/hdfs namenode -format
./sbin/start-all.sh (start-all.sh is deprecated; per the official docs, use the two scripts below instead)
Start HDFS: start-dfs.sh
Start YARN: start-yarn.sh
(Note: if the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode; start it on the machine where the ResourceManager runs)
Use jps to check the processes on master and the nodes
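Given the configuration above (SecondaryNameNode on node1, workers node1 and node2, ResourceManager started on master), roughly the following should show up; a sketch, and each machine also lists the Jps process itself:
master: NameNode, ResourceManager
node1: DataNode, NodeManager, SecondaryNameNode
node2: DataNode, NodeManager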
Check the web UIs
http://192.168.1.106:50070
http://192.168.1.106:8099/cluster
Stop Hadoop (stop-all.sh is likewise deprecated; stop-yarn.sh followed by stop-dfs.sh is the counterpart of the start scripts);
./sbin/stop-all.sh
6. Testing
Run one of the bundled examples
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar pi 1 1
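Another quick end-to-end check is the bundled wordcount example against a small file in HDFS; a minimal sketch, run from /usr/local/hadoop once the cluster is up (the /test paths are arbitrary):
./bin/hdfs dfs -mkdir -p /test/input
./bin/hdfs dfs -put etc/hadoop/core-site.xml /test/input
./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount /test/input /test/output
./bin/hdfs dfs -cat /test/output/part-r-00000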