Hadoop Fully Distributed Installation 2
Hadoop overview:
1. standalone (local) mode
Everything runs on a single machine against the local filesystem; the configuration XML files that follow contain nothing.
No separate daemons need to be started.
2. pseudo-distributed mode
Equivalent to fully distributed, but with only one node.
SSH: //(secure shell over a socket)
//public + private key pair
//server : sshd    ps -Af | grep sshd
//client : ssh
//ssh-keygen : generates the public/private key pair.
//authorized_keys must use permissions 644
//ssh 192.168.231.201    (answer yes at the host-key prompt)
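The key-generation and authorized_keys steps above can be tried locally; a minimal sketch, assuming `ssh-keygen` is installed, with a temp directory standing in for `~/.ssh`:

```shell
# Generate a passwordless RSA key pair and install the public key.
# The temp dir is an illustrative stand-in for ~/.ssh.
sshdir=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$sshdir/id_rsa" -q
# Appending the public key to authorized_keys is what grants login:
cat "$sshdir/id_rsa.pub" >> "$sshdir/authorized_keys"
chmod 644 "$sshdir/authorized_keys"   # 644, per the note above
```

With the real `~/.ssh`, `sshd` on the target host checks these exact files and permissions before allowing passwordless login.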
[Configuration files]
1. core-site.xml   //fs.defaultFS=hdfs://<namenode host>/ (which machine HDFS is deployed on)
2. hdfs-site.xml   //dfs.replication=1 (HDFS replication factor)
3. mapred-site.xml //MapReduce configuration and deployment
4. yarn-site.xml   //ResourceManager and NodeManager configuration
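For pseudo-distributed mode, the first two files might look like this (a sketch; `localhost` as the single node is an assumption, and replication is 1 because there is only one machine to hold replicas):

```xml
<!-- core-site.xml : point the default filesystem at the single node -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

<!-- hdfs-site.xml : one node, so one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```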
3. full distributed (fully distributed mode)

Make the shell prompt show the full working directory
---------------------------
1. Edit the profile file and add the PS1 environment variable
[/etc/profile]
export PS1='[\u@\h `pwd`]\$'
2. Re-source the file
$>source /etc/profile
Configure Hadoop with symbolic links so the three configuration modes can coexist.
----------------------------------------------------
1. Create three configuration directories, each with the same contents as the hadoop directory
${hadoop_home}/etc/local
${hadoop_home}/etc/pesudo
${hadoop_home}/etc/full
2. Create the symbolic link
$>ln -s pesudo hadoop
3. Format HDFS
$>hadoop namenode -format    (newer releases: hdfs namenode -format)
4. Edit the Hadoop env file and set the JAVA_HOME environment variable by hand
[${hadoop_home}/etc/hadoop/hadoop-env.sh]
...
export JAVA_HOME=/root/soft/jdk
...
5. Start all Hadoop daemons
$>start-all.sh    (start-dfs.sh + start-yarn.sh = start-all.sh)
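The symlink trick can be tried anywhere; a sketch using a temp dir in place of ${hadoop_home}/etc:

```shell
# Three config dirs plus one 'hadoop' symlink that selects the active mode.
etc=$(mktemp -d)            # stand-in for ${hadoop_home}/etc
mkdir "$etc/local" "$etc/pesudo" "$etc/full"
cd "$etc"
ln -s pesudo hadoop         # activate pseudo-distributed
ln -sfn full hadoop         # later: switch modes in place (-f replace, -n treat link as file)
readlink hadoop             # shows which mode is active
```

Hadoop reads whatever `etc/hadoop` resolves to, so flipping the link switches modes without copying any files.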
6. After startup, the following processes appear
$>jps    (lists Java processes)
Managed by HDFS:
1. 33702 NameNode
2. 33792 DataNode
3. 33954 SecondaryNameNode
Managed by YARN:
1. 29041 ResourceManager
2. 34191 NodeManager
7. Inspect the HDFS filesystem
$>hdfs dfs -lsr /
8. Create a directory
$>hdfs dfs -mkdir -p /user/centos/hadoop    (choose your own path)
9. Browse the Hadoop filesystem through the web UI
http://localhost:50070/
10. Stop all Hadoop daemons
$>stop-all.sh
11. CentOS firewall operations
[CentOS 6.5 and earlier -- the service is iptables; firewalld does not exist there]
$>sudo service iptables stop      //stop the service
$>sudo service iptables start     //start the service
$>sudo service iptables status    //check status
[CentOS 7]
$>sudo systemctl enable firewalld.service   //enable start-at-boot
$>sudo systemctl disable firewalld.service  //disable start-at-boot
$>sudo systemctl start firewalld.service    //start the firewall
$>sudo systemctl stop firewalld.service     //stop the firewall
$>sudo systemctl status firewalld.service   //check firewall status
[Start at boot, CentOS 6 style]
$>sudo chkconfig iptables on      //enable start-at-boot
$>sudo chkconfig iptables off     //disable start-at-boot
Hadoop ports
-----------------
50070   //namenode http port
50075   //datanode http port
50090   //secondarynamenode http port
8020    //namenode rpc port
50010   //datanode rpc port
Hadoop's four modules
-------------------
common
hdfs    //namenode + datanode + secondarynamenode
mapred
yarn    //resourcemanager + nodemanager
Startup scripts
-------------------
1. start-all.sh   //start all daemons
2. stop-all.sh    //stop all daemons
3. start-dfs.sh   //start the HDFS daemons only
4. start-yarn.sh  //start the YARN daemons only
[hdfs]  start-dfs.sh / stop-dfs.sh
NN
DN
2NN
[yarn]  start-yarn.sh / stop-yarn.sh
RM
NM
Change the hostname
-------------------
1. /etc/hostname
s201
2. /etc/hosts
127.0.0.1 localhost
192.168.231.201 s201
192.168.231.202 s202
192.168.231.203 s203
192.168.231.204 s204
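A quick way to confirm the mapping took effect is to parse the hosts file; a sketch run against a throwaway copy of the table above:

```shell
# Look up a hostname in a hosts-format file (temp copy of the table above).
hosts=$(mktemp)
cat > "$hosts" <<'EOF'
127.0.0.1 localhost
192.168.231.201 s201
192.168.231.202 s202
192.168.231.203 s203
192.168.231.204 s204
EOF
awk '$2 == "s203" { print $1 }' "$hosts"   # -> 192.168.231.203
```

Against the live system, `getent hosts s203` performs the same lookup through the resolver.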
Fully distributed setup
--------------------
1. Clone three clients (CentOS 7)
Right-click centos-7 --> Manage -> Clone -> ... -> Full clone
2. Start the clients
3. Enable the guests' shared folders.
4. Edit the hostname and IP address files
[/etc/hostname]
s202
[/etc/sysconfig/network-scripts/ifcfg-ethxxxx]
...
IPADDR=..
5. Restart the network service
$>sudo service network restart
6. Edit the /etc/resolv.conf file
nameserver 192.168.231.2
7. Repeat steps 3 ~ 6 for each clone.
Prepare SSH for the fully distributed hosts
-------------------------
1. Delete /home/centos/.ssh/* on every host
2. Generate a key pair on host s201
$>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
3. Remote-copy s201's public key file id_rsa.pub to hosts s201 ~ s204
and place it as /home/centos/.ssh/authorized_keys
$>scp id_rsa.pub centos@s201:/home/centos/.ssh/authorized_keys
$>scp id_rsa.pub centos@s202:/home/centos/.ssh/authorized_keys
$>scp id_rsa.pub centos@s203:/home/centos/.ssh/authorized_keys
$>scp id_rsa.pub centos@s204:/home/centos/.ssh/authorized_keys
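The four scp commands can be collapsed into a loop; a dry-run sketch (hosts and user taken from this document) that only prints the commands instead of running them:

```shell
# Build the scp command for each host; echo instead of executing.
cmds=$(for h in s201 s202 s203 s204; do
  echo "scp ~/.ssh/id_rsa.pub centos@$h:/home/centos/.ssh/authorized_keys"
done)
printf '%s\n' "$cmds"
```

Drop the `echo` to actually run them; `ssh-copy-id centos@$h` per host is an equivalent one-liner that also sets permissions correctly.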
4. Configure fully distributed mode (${hadoop_home}/etc/hadoop/)
1) [core-site.xml]
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://s201/</value>
</property>
</configuration>
2) [hdfs-site.xml]
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
3) [mapred-site.xml]
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4) [yarn-site.xml]
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s201</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
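One file the four configs above don't cover is the workers list: start-dfs.sh and start-yarn.sh read ${hadoop_home}/etc/hadoop/slaves (renamed workers in Hadoop 3.x) to decide where to launch DataNodes and NodeManagers. Assuming s202 ~ s204 are the worker nodes of this cluster, it would contain:

```
[slaves]
s202
s203
s204
```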
5) [hadoop-env.sh]
...
export JAVA_HOME=/soft/jdk
...
5. Distribute the configuration
$>cd /soft/hadoop/etc/
$>scp -r full centos@s202:/soft/hadoop/etc/
$>scp -r full centos@s203:/soft/hadoop/etc/
$>scp -r full centos@s204:/soft/hadoop/etc/
6. Delete the old symbolic links
$>cd /soft/hadoop/etc
$>rm hadoop
$>ssh s202 rm /soft/hadoop/etc/hadoop
$>ssh s203 rm /soft/hadoop/etc/hadoop
$>ssh s204 rm /soft/hadoop/etc/hadoop
7. Create the symbolic links
$>cd /soft/hadoop/etc/
$>ln -s full hadoop
$>ssh s202 ln -s /soft/hadoop/etc/full /soft/hadoop/etc/hadoop
$>ssh s203 ln -s /soft/hadoop/etc/full /soft/hadoop/etc/hadoop
$>ssh s204 ln -s /soft/hadoop/etc/full /soft/hadoop/etc/hadoop
8. Delete the temporary directory files
$>cd /tmp
$>rm -rf hadoop-centos
$>ssh s202 rm -rf /tmp/hadoop-centos
$>ssh s203 rm -rf /tmp/hadoop-centos
$>ssh s204 rm -rf /tmp/hadoop-centos
9. Delete the Hadoop logs
$>cd /soft/hadoop/logs
$>rm -rf *
$>ssh s202 rm -rf /soft/hadoop/logs/*
$>ssh s203 rm -rf /soft/hadoop/logs/*
$>ssh s204 rm -rf /soft/hadoop/logs/*
10. Format the filesystem
$>hadoop namenode -format
11. Start the Hadoop daemons
$>start-all.sh
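After start-all.sh, an xcall-style loop (the xcall.sh script mentioned in the rsync section below) can check the Java processes on every node. A sketch with a DRY_RUN switch so it can be tried without the cluster; hosts s201 ~ s204 are assumed:

```shell
# Run a command on every node over ssh; DRY_RUN=1 just prints the ssh commands.
xcall() {
  local host
  for host in s201 s202 s203 s204; do
    echo "======== $host ========"
    ${DRY_RUN:+echo} ssh "$host" "$@"
  done
}
out=$(DRY_RUN=1 xcall jps)
printf '%s\n' "$out"
```

With DRY_RUN unset, each node should report its expected daemons: NN/RM (and possibly DN/NM) on s201, DN/NM on the workers.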
rsync
------------------
Install the rsync command on all four machines.
Remote synchronization.
$>sudo yum install rsync
Also set up passwordless login for the root user.
Helper scripts to write:
1. xcall.sh
2. xsync.sh
xsync.sh /home/etc/a.txt
rsync -lr /home/etc/a.txt centos@s202:/home/etc
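A minimal xsync.sh along the lines of the example above: resolve the absolute parent directory, then rsync the file to the same path on each worker. Hosts s202 ~ s204 and user centos are assumed from this document; DRY_RUN=1 prints the commands instead of running them:

```shell
# xsync.sh (sketch): copy a file or dir to the same path on every worker.
xsync() {
  local fname pdir host
  fname=$(basename "$1")
  pdir=$(cd -P "$(dirname "$1")" && pwd)   # absolute parent dir, symlinks resolved
  for host in s202 s203 s204; do
    ${DRY_RUN:+echo} rsync -lr "$pdir/$fname" "centos@$host:$pdir"
  done
}
out=$(DRY_RUN=1 xsync /etc/hosts)
printf '%s\n' "$out"
```

Resolving the absolute directory first is what lets the script be called with a relative path yet still place the file at the identical absolute path on every node.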