Big Data Operations (46): Building a Hadoop 2.10 Cluster
Hadoop official documentation:
https://hadoop.apache.org/docs/
Installing the Hadoop cluster
Configure DNS resolution or the hosts file:
cat > /etc/hosts <<EOF
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.3.149.20 hadoop-master
10.3.149.21 hadoop-node1
10.3.149.22 hadoop-node2
EOF
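An optional sanity check, runnable from any of the three hosts, that every name resolves and answers (a minimal sketch; the host list matches the hosts file above):

# Confirm each hostname resolves via /etc/hosts and responds to ping.
for h in hadoop-master hadoop-node1 hadoop-node2; do
    getent hosts "$h" && ping -c 1 -W 1 "$h" > /dev/null && echo "$h OK"
done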
Configure passwordless SSH for the root user:
ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node2
ssh root@hadoop-master 'date'
ssh root@hadoop-node1 'date'
ssh root@hadoop-node2 'date'
Configure passwordless SSH for the hadoop user:
useradd hadoop
echo '123456' | passwd --stdin hadoop
su hadoop
ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node2
ssh hadoop@hadoop-master 'date'
ssh hadoop@hadoop-node1 'date'
ssh hadoop@hadoop-node2 'date'
exit
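Note that ssh-copy-id can only succeed once the hadoop user exists on every node, so create it on the workers as well. A minimal sketch, run as root on the master, relying on the root passwordless SSH configured above:

# Create the hadoop user (same password) on each worker node.
for h in hadoop-node1 hadoop-node2; do
    ssh "root@$h" "useradd hadoop && echo '123456' | passwd --stdin hadoop"
done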
Install Java:
tar -xf jdk-8u231-linux-x64.tar.gz -C /usr/local/
Create a symlink:
cd /usr/local/
ln -sv jdk1.8.0_231/ jdk
Add environment variables:
cat > /etc/profile.d/java.sh <<EOF
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=\$JAVA_HOME/jre
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar:\$JRE_HOME/lib
export PATH=\$PATH:\$JAVA_HOME/bin:\$JRE_HOME/bin
EOF
. /etc/profile.d/java.sh
Verify the installation:
java -version
javac -version
Install Hadoop:
Hadoop download mirrors:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
http://archive.apache.org/dist/hadoop/common/
For the Hadoop 2.7 release:
http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Download the package:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Extract:
tar -xf hadoop-2.10.0.tar.gz -C /usr/local/
cd /usr/local/
ln -sv hadoop-2.10.0/ hadoop
Configure environment variables:
cat > /etc/profile.d/hadoop.sh <<EOF
export HADOOP_HOME=/usr/local/hadoop
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
Apply the environment variables:
. /etc/profile.d/hadoop.sh
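If the variables took effect, the hadoop binary is now on the PATH and reports its version:

hadoop version
# should report Hadoop 2.10.0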
Create the data directories:
# master
mkdir -pv /data/hadoop/hdfs/{nn,snn}
# node
mkdir -pv /data/hadoop/hdfs/dn
Configuration on the master node:
Enter the configuration directory:
cd /usr/local/hadoop/etc/hadoop
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:8020</value>
    <final>true</final>
  </property>
</configuration>
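To double-check that a value is being picked up from the file, hdfs getconf can echo it back (this reads the local configuration, so the cluster need not be running yet):

hdfs getconf -confKey fs.defaultFS
# expected: hdfs://hadoop-master:8020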
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
</configuration>
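One caveat before the next file: the stock 2.x tarball typically ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first:

# mapred-site.xml is usually absent in a fresh 2.x install; copy the template.
cp mapred-site.xml.template mapred-site.xml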
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Create the master file:
cat > master <<EOF
hadoop-master
EOF
Create the slaves file:
cat > slaves <<EOF
hadoop-node1
hadoop-node2
EOF
Annotated notes on common configuration options:
http://blog.51yip.com/hadoop/2020.html
On the worker nodes:
Simply copy the configuration from the master node to the worker nodes:
scp ./* root@hadoop-node1:/usr/local/hadoop/etc/hadoop/
scp ./* root@hadoop-node2:/usr/local/hadoop/etc/hadoop/
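Equivalently, a loop sketch using rsync that scales to more nodes and skips the slaves file (which, as the next step notes, is not needed on the workers); this assumes rsync is installed on all nodes:

for n in hadoop-node1 hadoop-node2; do
    rsync -av --exclude 'slaves' ./ "root@$n:/usr/local/hadoop/etc/hadoop/"
done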
Delete the slaves file on the workers; everything else is configured the same as on the master.
rm -rf /usr/local/hadoop/etc/hadoop/slaves
Create the log directory:
mkdir /usr/local/hadoop/logs
chmod g+w /usr/local/hadoop/logs/
Change the owner and group:
chown -R hadoop.hadoop /data/hadoop/
cd /usr/local/
chown -R hadoop.hadoop hadoop hadoop/
Starting and stopping the cluster
Format HDFS; once formatted, the cluster can be started:
su hadoop
[hadoop@hadoop-master ~]$ hadoop namenode -format
Start HDFS first. The output below shows which process starts on each node.
[hadoop@hadoop-master ~]$ start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-node2: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node2.out
hadoop-node1: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out
View the processes running on the local node; the following command can be used on any node.
~]$ jps
1174 Jps
32632 ResourceManager
32012 NameNode
32220 SecondaryNameNode
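On a worker node the picture differs: once HDFS is up you would expect a DataNode, plus a NodeManager after YARN is started below. A quick remote spot check from the master:

# List Java processes on a worker over SSH.
ssh hadoop@hadoop-node1 'jps'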
Then start YARN; you can see the processes started on each node.
[hadoop@hadoop-master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-resourcemanager-hadoop-master.out
hadoop-node2: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node2.out
hadoop-node1: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node1.out
Or start everything in one go:
[hadoop@hadoop-master ~]$ start-all.sh
Check the running state of the Hadoop cluster:
hadoop dfsadmin -report
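hadoop dfsadmin still works in 2.10 but prints a deprecation warning; the current form of the same report is:

hdfs dfsadmin -report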
NameNode overview web UI:
http://10.3.149.20:50070/
Cluster information web UI:
http://10.3.149.20:8088/cluster
Stop the cluster:
stop-dfs.sh
stop-yarn.sh
Or:
stop-all.sh
Using the HDFS file system
List a directory:
~]$ hdfs dfs -ls /
Create a directory:
~]$ hdfs dfs -mkdir /test
Upload a file:
~]$ hdfs dfs -put /etc/fstab /test/fstab
View where the file is stored: in the data directory on one of the datanodes you can find the file's block. The default block size is 128 MB; a file larger than that is split into multiple blocks, while a file smaller than 128 MB does not actually occupy a full 128 MB on disk.
]$ cat /data/hadoop/hdfs/dn/current/BP-1469813358-10.3.149.20-1595493741225/current/finalized/subdir0/subdir0/blk_1073741825
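Instead of digging through the datanode directory tree by hand, fsck reports each block's ID and which datanodes hold its replicas:

hdfs fsck /test/fstab -files -blocks -locations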
List recursively:
~]$ hdfs dfs -ls -R /
View a file:
~]$ hdfs dfs -cat /test/fstab
More command usage help:
https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
Word-count job example:
The /usr/local/hadoop/share/hadoop/mapreduce directory contains many example computation jobs that can be used for testing.
First upload a file to test with:
hdfs dfs -mkdir /test
hdfs dfs -put /etc/fstab /test/fstab
View the help: running the jar with no arguments prints usage information.
yarn jar hadoop-mapreduce-examples-2.10.0.jar
Test: here we pick the word-count example.
cd /usr/local/hadoop/share/hadoop/mapreduce
]$ yarn jar hadoop-mapreduce-examples-2.10.0.jar wordcount /test/fstab /test/count
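Once the job completes, the output directory holds a _SUCCESS marker plus one part-r-* file per reducer:

hdfs dfs -ls /test/count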
Running jobs can be watched on the following page:
http://10.3.149.20:8088/cluster/apps
View the result of the computation:
]$ hdfs dfs -cat /test/count/part-r-00000
Common YARN commands:
List running applications:
~]$ yarn application -list
Applications that have already run:
~]$ yarn application -list -appStates ALL
Check an application's status:
~]$ yarn application -status application_1595496103452_0001
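To pull the aggregated logs of a finished application (this only works if log aggregation is enabled via yarn.log-aggregation-enable, which is off by default):

yarn logs -applicationId application_1595496103452_0001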