
Big Data Operations (46): Hadoop 2.10 Cluster Setup

Hadoop official documentation:

https://hadoop.apache.org/docs/

Installing the Hadoop cluster

Configure DNS resolution or the hosts file:

cat > /etc/hosts <<EOF
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.3.149.20 hadoop-master
10.3.149.21 hadoop-node1
10.3.149.22 hadoop-node2
EOF
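
The same mappings must be present on every node. One simple way to push the file out (using the hostnames defined above; before SSH keys are set up this will prompt for the root password) is:

scp /etc/hosts root@hadoop-node1:/etc/hosts
scp /etc/hosts root@hadoop-node2:/etc/hosts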

Configure passwordless SSH for the root user:

ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node2
ssh root@hadoop-master 'date'
ssh root@hadoop-node1 'date'
ssh root@hadoop-node2 'date'

Configure passwordless SSH for the hadoop user:

useradd hadoop
echo '123456' | passwd --stdin hadoop
su hadoop
ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node2
ssh hadoop@hadoop-master 'date'
ssh hadoop@hadoop-node1 'date'
ssh hadoop@hadoop-node2 'date'
exit

Install Java:

tar -xf jdk-8u231-linux-x64.tar.gz -C /usr/local/

Create a symbolic link:

cd /usr/local/
ln -sv jdk1.8.0_231/ jdk

Add environment variables:

cat > /etc/profile.d/java.sh <<EOF
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=\$JAVA_HOME/jre
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar:\$JRE_HOME/lib
export PATH=\$PATH:\$JAVA_HOME/bin:\$JRE_HOME/bin
EOF
. /etc/profile.d/java.sh

Test whether the installation succeeded:

java -version
javac -version

Install Hadoop:

Hadoop download locations:

https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
http://archive.apache.org/dist/hadoop/common/

For the Hadoop 2.7 release:

http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Download the installation package:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Extract it:

tar -xf hadoop-2.10.0.tar.gz -C /usr/local/
cd /usr/local/
ln -sv hadoop-2.10.0/ hadoop

Configure environment variables:

cat > /etc/profile.d/hadoop.sh <<EOF
export HADOOP_HOME=/usr/local/hadoop
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF

Apply the environment variables:

. /etc/profile.d/hadoop.sh
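
As a quick sanity check that the variables are in effect in the current shell, the bundled version command should now resolve:

hadoop version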

Create the data directories:

# on the master
mkdir -pv /data/hadoop/hdfs/{nn,snn}
# on the nodes
mkdir -pv /data/hadoop/hdfs/dn

Configuration on the master node:

Enter the configuration directory:

cd /usr/local/hadoop/etc/hadoop
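
One thing to check here: the startup scripts read JAVA_HOME from hadoop-env.sh in this directory rather than from the login shell, so if the daemons later complain that JAVA_HOME is not set, point it at the JDK installed earlier (the path below is the symlink created in the Java step):

# hadoop-env.sh
export JAVA_HOME=/usr/local/jdk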

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:8020</value>
    <final>true</final>
  </property>
</configuration>

yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
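
Note: the Hadoop 2.x tarball ships this file only as mapred-site.xml.template, so if mapred-site.xml is missing, create it from the template before adding the property above:

cp mapred-site.xml.template mapred-site.xml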

Create the master file:

cat > master <<EOF
hadoop-master
EOF

Create the slaves file:

cat > slaves <<EOF
hadoop-node1
hadoop-node2
EOF

Annotated notes on common configuration options:

http://blog.51yip.com/hadoop/2020.html

On the worker (node) machines:

Simply copy the configuration from the master node to the worker nodes:

scp ./* root@hadoop-node1:/usr/local/hadoop/etc/hadoop/
scp ./* root@hadoop-node2:/usr/local/hadoop/etc/hadoop/

Delete the slaves file on the worker nodes; the rest of the configuration is the same as on the master.

rm /usr/local/hadoop/etc/hadoop/slaves -rf

Create the log directory:

mkdir /usr/local/hadoop/logs
chmod g+w /usr/local/hadoop/logs/

Change the owner and group:

chown -R hadoop.hadoop /data/hadoop/
cd /usr/local/
chown -R hadoop.hadoop hadoop/

Starting and stopping the cluster

Format HDFS; once formatting is done, the cluster can be started.

su hadoop
[hadoop@hadoop-master ~]$ hadoop namenode -format
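
In 2.x the hadoop front end prints a deprecation notice for this subcommand; the equivalent, preferred form is:

hdfs namenode -format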

Start HDFS first. The output below shows each node and the process started on it.

[hadoop@hadoop-master ~]$ start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-node2: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node2.out
hadoop-node1: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out

Check the processes running on the local node; the following command can be run on any node.

~]$ jps
1174 Jps
32632 ResourceManager
32012 NameNode
32220 SecondaryNameNode

Then start YARN; the output shows the processes started on each node.

[hadoop@hadoop-master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-resourcemanager-hadoop-master.out
hadoop-node2: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node2.out
hadoop-node1: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node1.out

Or start everything at once:

[hadoop@hadoop-master ~]$ start-all.sh

Check the running status of the Hadoop cluster:

hadoop dfsadmin -report
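
The same report is available through the hdfs front end, which avoids the deprecation warning printed by the hadoop command in 2.x:

hdfs dfsadmin -report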

Access the HDFS overview web UI:

http://10.3.149.20:50070/

Cluster information web UI:

http://10.3.149.20:8088/cluster

Stop the cluster:

stop-dfs.sh
stop-yarn.sh

Or:

stop-all.sh

Using the HDFS file system

List a directory:

~]$ hdfs dfs -ls /

Create a directory:

~]$ hdfs dfs -mkdir /test

Upload a file:

~]$ hdfs dfs -put /etc/fstab /test/fstab

Check where the file is stored: in the data directory on one of the datanodes you can find the file's block. The default block size is 128 MB; a file larger than that is split into multiple blocks, while a file smaller than 128 MB does not actually occupy 128 MB.

]$ cat /data/hadoop/hdfs/dn/current/BP-1469813358-10.3.149.20-1595493741225/current/finalized/subdir0/subdir0/blk_1073741825
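
Rather than walking the datanode directory tree by hand, fsck can report a file's blocks and the datanodes holding them (standard HDFS tooling; the path is the fstab file uploaded above):

hdfs fsck /test/fstab -files -blocks -locations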

List recursively:

~]$ hdfs dfs -ls -R /

View a file:

~]$ hdfs dfs -cat /test/fstab

For more command usage help:

https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

Word-count computation example:

The /usr/local/hadoop/share/hadoop/mapreduce directory contains many example computation jobs that can be used for testing.

First upload a file for testing:

hdfs dfs -mkdir /test
hdfs dfs -put /etc/fstab /test/fstab

View the help: running the jar with no arguments prints the usage information.

yarn jar hadoop-mapreduce-examples-2.10.0.jar

Test: here the word-count example is used.

cd /usr/local/hadoop/share/hadoop/mapreduce
]$ yarn jar hadoop-mapreduce-examples-2.10.0.jar wordcount /test/fstab /test/count
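
Note that MapReduce fails with an error if the output directory already exists, so remove /test/count before re-running the job:

hdfs dfs -rm -r /test/count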

Running jobs can be viewed on the following page:

http://10.3.149.20:8088/cluster/apps

View the result of the computation:

]$ hdfs dfs -cat /test/count/part-r-00000
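
The result can also be copied back to the local filesystem for inspection (the local filename here is only an example):

hdfs dfs -get /test/count/part-r-00000 ./wordcount.txt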

Common yarn commands:

List running applications:

~]$ yarn application -list

List applications that have already run:

~]$ yarn application -list -appStates ALL

Check the status of an application:

~]$ yarn application -status application_1595496103452_0001
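
If log aggregation is enabled in yarn-site.xml, the aggregated container logs of a finished application can also be fetched from the command line (using the application id from the example above):

yarn logs -applicationId application_1595496103452_0001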