Installing Spark + Hadoop: building a Spark/Hadoop distributed cluster (built and tested by hand!)
First, the versions I used:
spark-2.1.1-bin-hadoop2.7.tgz
hadoop-2.7.3.tar.gz
jdk-8u131-linux-x64.rpm
Our lab has four servers; each node has a 300 GB disk and 64 GB of RAM. The hostnames of the four nodes are master, slave01, slave02, and slave03.
I use Spark for parallel computing and HDFS for distributed data storage, so Hadoop has to be installed to get HDFS. If you do not need Hadoop, you can jump straight to step 7 and install Spark on its own!
1. Install Java 1.8 first.
Environment: upload jdk-8u131-linux-x64.rpm to /home on every node and install it with rpm.
[root@localhost home]# rpm -ivh jdk-8u131-linux-x64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:jdk1.8.0_131-2000:1.8.0_131-fcs ################################# [100%]
Unpacking JAR files...
tools.jar...
plugin.jar...
javaws.jar...
deploy.jar...
rt.jar...
jsse.jar...
charsets.jar...
localedata.jar...
[root@localhost home]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
As shown above, Java 1.8 is installed successfully!
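The RPM installs the JDK under /usr/java/jdk1.8.0_131, which is the path referenced again in the Hadoop and Spark configuration below. If you also want JAVA_HOME available in every shell, a minimal sketch (append to /etc/profile, or to the hadoop user's ~/.bashrc, then re-login or source the file):
export JAVA_HOME=/usr/java/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH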
2. Synchronize the cluster clocks (skip this step if the clocks already agree!)
The clocks must be in sync because the nodes exchange heartbeats; mismatched clocks cause errors.
You can also just use date -s. (Below, an NTP server is used to synchronize the time.)
## Install the ntp service on every node
[hadoop@master ~]$ sudo yum install -y ntp
## Run `sudo ntpdate us.pool.ntp.org` on every node at the same time
[hadoop@master ~]$ sudo ntpdate us.pool.ntp.org
5 Oct 18:19:41 ntpdate[2997]: step time server 138.68.46.177 offset -6.006070 sec
Alternatively, you can run an NTP server on one of the nodes:
## Install the ntp service on every node
[hadoop@master ~]$ sudo yum install -y ntp
## Run `sudo ntpdate us.pool.ntp.org` on the 192.168.2.219 node and use that node as the NTP server
[hadoop@master ~]$ sudo ntpdate us.pool.ntp.org
5 Oct 18:19:41 ntpdate[2997]: step time server 138.68.46.177 offset -6.006070 sec
## Start the ntp service on every node
[hadoop@master ~]$ sudo service ntpd start
Redirecting to /bin/systemctl start ntpd.service
## On the other nodes, synchronize against the NTP server on 192.168.2.219.
[hadoop@slave01 ~]$ sudo ntpdate 192.168.2.219
5 Oct 18:27:45 ntpdate[3014]: adjust time server 192.168.147.6 offset -0.001338 sec
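If one node (192.168.2.219 here) is to act as the NTP server for the others, its /etc/ntp.conf must allow queries from the cluster subnet. A sketch, assuming the 192.168.2.0/24 subnet used by these machines:
# /etc/ntp.conf on 192.168.2.219 -- allow clients from the cluster subnet
restrict 192.168.2.0 mask 255.255.255.0 nomodify notrap
# upstream time sources
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
# apply the change: sudo service ntpd restart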
3. Add a hadoop user:
[root@localhost etc]# useradd -m hadoop -s /bin/bash
useradd: user 'hadoop' already exists
[root@localhost etc]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
[root@localhost etc]# su - hadoop
[hadoop@localhost ~]$
4. Give the hadoop user administrator privileges to make deployment easier.
[root@localhost ~]# visudo
Find the line root ALL=(ALL) ALL (it should be around line 98; press ESC, then type :98 and press Enter to jump straight to it), and add a new line below it: hadoop ALL=(ALL) ALL (separated by a tab).
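After the edit, that part of /etc/sudoers should look roughly like this (the comment line is what CentOS ships by default; the exact line number may differ on your system):
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL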
5. Passwordless SSH:
[root@master .ssh]#su - hadoop
[hadoop@master ~]$
[hadoop@master ~]$ ssh localhost # if ~/.ssh does not exist yet, run ssh localhost once to create it
[hadoop@master ~]$ cd ~/.ssh
[hadoop@master ~]$ rm ./id_rsa* # delete previously generated keys (if any)
[hadoop@master ~]$ ssh-keygen -t rsa # just keep pressing Enter
To let the Master node SSH into itself without a password, run on the Master node:
[hadoop@master .ssh]$ cat ./id_rsa.pub >> ./authorized_keys
When that is done, you can verify it with ssh master (you may need to type yes; once it succeeds, run exit to return to the original terminal). Next, copy the public key from the master node to the slave01, slave02, and slave03 nodes:
[hadoop@master .ssh]$ scp ~/.ssh/id_rsa.pub [email protected]:/home/hadoop/
[hadoop@master .ssh]$ scp ~/.ssh/id_rsa.pub [email protected]:/home/hadoop/
[hadoop@master .ssh]$ scp ~/.ssh/id_rsa.pub [email protected]:/home/hadoop/
Then, on slave01, slave02, and slave03, add the SSH public key to the authorized keys. [Run the following commands on each of the other three nodes:]
[hadoop@slave03 ~]$ mkdir ~/.ssh
[hadoop@slave03 ~]$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@slave03 ~]$ rm ~/id_rsa.pub
Then run ssh slave01 on master. If passwordless login still fails, troubleshoot as follows.
Common reasons passwordless login fails:
Configuration problems
Check whether the AuthorizedKeysFile option is enabled in /etc/ssh/sshd_config
Check that the file specified by AuthorizedKeysFile exists and its contents look correct
Directory permission problems
~ must be set to 700
~/.ssh must be set to 700
~/.ssh/authorized_keys must be set to 600
sudo chmod 700 ~
sudo chmod 700 ~/.ssh
sudo chmod 600 ~/.ssh/authorized_keys
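Once the keys and permissions are in place, a quick check from master that passwordless login works to every slave (a small sketch; each command should print the remote hostname without asking for a password):
[hadoop@master ~]$ for h in slave01 slave02 slave03; do ssh "$h" hostname; done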
6. Install Hadoop:
Here is the content of the /etc/hosts file:
192.168.2.189 slave01
192.168.2.240 slave02
192.168.2.176 slave03
192.168.2.219 master
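These entries have to exist in /etc/hosts on every node so the hostnames resolve to the right addresses. A sketch of appending them on one node (repeat on each node, skipping lines that are already there):
sudo tee -a /etc/hosts <<'EOF'
192.168.2.189 slave01
192.168.2.240 slave02
192.168.2.176 slave03
192.168.2.219 master
EOF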
hadoop-2.7.3.tar.gz is placed under ~. [Installation archives generally go under ~.]
We choose to install Hadoop into /usr/local/:
[hadoop@master ~]$ sudo tar -zxf ~/hadoop-2.7.3.tar.gz -C /usr/local # extract into /usr/local
[hadoop@master ~]$ cd /usr/local/
[hadoop@master ~]$ sudo mv ./hadoop-2.7.3/ ./hadoop # rename the folder to hadoop
[hadoop@master ~]$ sudo chown -R hadoop:hadoop ./hadoop # fix file ownership
Hadoop is ready to use once extracted. Run the following commands to check that Hadoop works; on success it prints the Hadoop version information:
[hadoop@master local]$ cd /usr/local/hadoop
[hadoop@master hadoop]$ ./bin/hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
6.1. Hadoop standalone configuration (non-distributed). Note: finish setting up Hadoop on one node first, then copy it to the other nodes.
Hadoop's default mode is non-distributed and runs without any further configuration. Non-distributed means a single Java process, which is convenient for debugging.
Now we can run an example to get a feel for how Hadoop runs. Hadoop ships with plenty of examples (run ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar to list them all), including wordcount, terasort, join, grep, and so on.
Here we run the grep example: it takes all the files in the input folder as input, selects the words that match the regular expression dfs[a-z.]+, counts how often they occur, and writes the result to the output folder.
[hadoop@master hadoop]$ cd /usr/local/hadoop
[hadoop@master hadoop]$ mkdir ./input
[hadoop@master hadoop]$ cp ./etc/hadoop/*.xml ./input # use the configuration files as input
[hadoop@master hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
[hadoop@master hadoop]$ cat ./output/* # view the result
1 dfsadmin
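Note that Hadoop will not overwrite existing results: if you want to rerun the example, delete the ./output folder first, otherwise the job aborts with an "output directory already exists" error:
[hadoop@master hadoop]$ rm -r ./output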
6.2. Edit /usr/local/hadoop/etc/hadoop/slaves. Three worker nodes are configured here; the master node acts only as the master, not as a worker.
[hadoop@master hadoop]$ vi slaves # its content is:
slave01
slave02
slave03
6.3. Change the core-site.xml file to the following configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
6.4. In the hdfs-site.xml file, dfs.replication is usually set to 3:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
6.5. The mapred-site.xml file (you may need to rename it first; the default file name is mapred-site.xml.template). Change its configuration as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
6.6. The yarn-site.xml file:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
6.7. After configuring, copy the /usr/local/hadoop folder on master to each node. On the master node run:
If you have run pseudo-distributed mode before, it is recommended to delete the old temporary files before switching to cluster mode.
[hadoop@master ~]$ cd /usr/local
[hadoop@master ~]$ sudo rm -r ./hadoop/tmp # delete the Hadoop temporary files
[hadoop@master ~]$ sudo rm -r ./hadoop/logs/* # delete the log files
[hadoop@master ~]$ tar -zcf ~/hadoop.master.tar.gz ./hadoop # compress first, then copy
[hadoop@master ~]$ cd ~
[hadoop@master ~]$ scp ./hadoop.master.tar.gz slave01:/home/hadoop
[hadoop@master ~]$ scp ./hadoop.master.tar.gz slave02:/home/hadoop
[hadoop@master ~]$ scp ./hadoop.master.tar.gz slave03:/home/hadoop
6.8. On each node, run:
On the slave01 node:
[hadoop@slave01 ~]$ sudo rm -r /usr/local/hadoop # delete the old one (if it exists)
[hadoop@slave01 ~]$ sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
[hadoop@slave01 ~]$ sudo chown -R hadoop /usr/local/hadoop
On the slave02 node:
[hadoop@slave02 ~]$ sudo rm -r /usr/local/hadoop # delete the old one (if it exists)
[hadoop@slave02 ~]$ sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
[hadoop@slave02 ~]$ sudo chown -R hadoop /usr/local/hadoop
On the slave03 node:
[hadoop@slave03 ~]$ sudo rm -r /usr/local/hadoop # delete the old one (if it exists)
[hadoop@slave03 ~]$ sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
[hadoop@slave03 ~]$ sudo chown -R hadoop /usr/local/hadoop
6.9. For the first start, format the NameNode on the Master node:
[hadoop@master ~]$ hdfs namenode -format # needed only for the first run; do not run it again afterwards
6.10. CentOS: turn off the firewall
CentOS enables the firewall by default. Before starting the Hadoop cluster, the firewall must be turned off on every node in the cluster; with the firewall on, ping works but telnet to the ports does not, so the DataNodes start yet Live datanodes stays at 0.
On CentOS 6.x, the firewall can be turned off with the following commands:
sudo service iptables stop # stop the firewall service
sudo chkconfig iptables off # disable the firewall at boot, so it never has to be stopped by hand
On CentOS 7, it must be turned off with the following commands instead (the firewall service was renamed to firewalld):
systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # disable firewalld at boot
6.11. Now Hadoop can be started; start it on the master node.
Note: edit vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
and change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/usr/java/jdk1.8.0_131/
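The commands below call hdfs, hadoop, and stop-all.sh without full paths. A sketch of making that work by adding Hadoop to the hadoop user's PATH (append to ~/.bashrc on master and source it; the path matches the install location chosen above):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin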
Start Hadoop from /usr/local/hadoop/sbin with ./start-all.sh:
[hadoop@master sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
slave03: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave03.out
slave01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
slave02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave02.out
slave03: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave03.out
Then run jps:
[hadoop@master hadoop]$ jps
6194 ResourceManager
5717 NameNode
5960 SecondaryNameNode
6573 Jps
[hadoop@slave01 hadoop]$ jps
4888 Jps
4508 DataNode
4637 NodeManager
[hadoop@slave02 hadoop]$ jps
3841 DataNode
3970 NodeManager
4220 Jps
[hadoop@slave03 hadoop]$ jps
4032 NodeManager
4282 Jps
3903 DataNode
6.12. Open the Hadoop web UI
Enter http://192.168.2.219:50070 in a browser. [Note: the browser must be on the same LAN as 192.168.2.219.]
6.13. Run a distributed example
First create the user directory on HDFS:
[hadoop@master hadoop]$ hdfs dfs -mkdir -p /user/hadoop
Copy the configuration files in /usr/local/hadoop/etc/hadoop into the distributed file system as the input files:
[hadoop@master hadoop]$ hdfs dfs -mkdir input
[hadoop@master hadoop]$ hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
Checking the DataNodes' status (their used capacity changes) confirms that the input files really were copied onto the DataNodes.
Now the MapReduce job can be run. [Make sure the nodes' clocks agree before running it.]
#### Command:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
----------
#### The log produced during the run is as follows:
[hadoop@master hadoop]$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
17/11/13 22:26:21 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.2.219:8032
17/11/13 22:26:21 INFO input.FileInputFormat: Total input paths to process : 9
17/11/13 22:26:21 INFO mapreduce.JobSubmitter: number of splits:9
17/11/13 22:26:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510581226826_0004
17/11/13 22:26:22 INFO impl.YarnClientImpl: Submitted application application_1510581226826_0004
17/11/13 22:26:22 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1510581226826_0004/
17/11/13 22:26:22 INFO mapreduce.Job: Running job: job_1510581226826_0004
17/11/13 22:26:28 INFO mapreduce.Job: Job job_1510581226826_0004 running in uber mode : false
17/11/13 22:26:28 INFO mapreduce.Job: map 0% reduce 0%
17/11/13 22:26:32 INFO mapreduce.Job: map 33% reduce 0%
17/11/13 22:26:33 INFO mapreduce.Job: map 100% reduce 0%
17/11/13 22:26:37 INFO mapreduce.Job: map 100% reduce 100%
17/11/13 22:26:37 INFO mapreduce.Job: Job job_1510581226826_0004 completed successfully
17/11/13 22:26:37 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=51
FILE: Number of bytes written=1190205
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=28817
HDFS: Number of bytes written=143
HDFS: Number of read operations=30
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=9
Launched reduce tasks=1
Data-local map tasks=9
Total time spent by all maps in occupied slots (ms)=26894
Total time spent by all reduces in occupied slots (ms)=2536
Total time spent by all map tasks (ms)=26894
Total time spent by all reduce tasks (ms)=2536
Total vcore-milliseconds taken by all map tasks=26894
Total vcore-milliseconds taken by all reduce tasks=2536
Total megabyte-milliseconds taken by all map tasks=27539456
Total megabyte-milliseconds taken by all reduce tasks=2596864
Map-Reduce Framework
Map input records=796
Map output records=2
Map output bytes=41
Map output materialized bytes=99
Input split bytes=1050
Combine input records=2
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=99
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =9
Failed Shuffles=0
Merged Map outputs=9
GC time elapsed (ms)=762
CPU time spent (ms)=7040
Physical memory (bytes) snapshot=2680807424
Virtual memory (bytes) snapshot=19690971136
Total committed heap usage (bytes)=1957691392
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=27767
File Output Format Counters
Bytes Written=143
17/11/13 22:26:37 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.2.219:8032
17/11/13 22:26:37 INFO input.FileInputFormat: Total input paths to process : 1
17/11/13 22:26:37 INFO mapreduce.JobSubmitter: number of splits:1
17/11/13 22:26:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510581226826_0005
17/11/13 22:26:37 INFO impl.YarnClientImpl: Submitted application application_1510581226826_0005
17/11/13 22:26:37 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1510581226826_0005/
17/11/13 22:26:37 INFO mapreduce.Job: Running job: job_1510581226826_0005
17/11/13 22:26:48 INFO mapreduce.Job: Job job_1510581226826_0005 running in uber mode : false
17/11/13 22:26:48 INFO mapreduce.Job: map 0% reduce 0%
17/11/13 22:26:52 INFO mapreduce.Job: map 100% reduce 0%
17/11/13 22:26:57 INFO mapreduce.Job: map 100% reduce 100%
17/11/13 22:26:58 INFO mapreduce.Job: Job job_1510581226826_0005 completed successfully
17/11/13 22:26:58 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=51
FILE: Number of bytes written=237047
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=271
HDFS: Number of bytes written=29
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2331
Total time spent by all reduces in occupied slots (ms)=2600
Total time spent by all map tasks (ms)=2331
Total time spent by all reduce tasks (ms)=2600
Total vcore-milliseconds taken by all map tasks=2331
Total vcore-milliseconds taken by all reduce tasks=2600
Total megabyte-milliseconds taken by all map tasks=2386944
Total megabyte-milliseconds taken by all reduce tasks=2662400
Map-Reduce Framework
Map input records=2
Map output records=2
Map output bytes=41
Map output materialized bytes=51
Input split bytes=128
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=51
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=110
CPU time spent (ms)=1740
Physical memory (bytes) snapshot=454008832
Virtual memory (bytes) snapshot=3945603072
Total committed heap usage (bytes)=347078656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=143
File Output Format Counters
Bytes Written=29
----------
View the result:
[hadoop@master hadoop]$ hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
The output during execution is similar to pseudo-distributed mode and shows the Job's progress.
It may be a little slow, but if there is no progress at all, say nothing for five minutes, try restarting Hadoop.
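As in the standalone example, the job will not run again while the output directory still exists on HDFS; remove it before resubmitting:
[hadoop@master hadoop]$ hdfs dfs -rm -r output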
6.14. Stopping the Hadoop cluster is also done on the Master node; just run ./sbin/stop-all.sh.
[hadoop@master sbin]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave01: stopping datanode
slave03: stopping datanode
slave02: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave01: stopping nodemanager
slave02: stopping nodemanager
slave03: stopping nodemanager
slave01: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave02: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave03: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
Appendix: add the master node to slaves as well, so there are four worker nodes. Go into /usr/local/hadoop/etc/hadoop/
and change slaves to:
master
slave01
slave02
slave03
$PWD is the current directory; copy the slaves file in this directory to the other three nodes, overwriting theirs.
[hadoop@master hadoop]$ scp slaves hadoop@slave01:$PWD
[hadoop@master hadoop]$ scp slaves hadoop@slave02:$PWD
[hadoop@master hadoop]$ scp slaves hadoop@slave03:$PWD
Start again:
[hadoop@master hadoop]$ ./sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
master: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-master.out
slave02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave02.out
slave01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave03: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave03.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave02.out
slave01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
master: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-master.out
slave03: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave03.out
With that, the Hadoop installation is complete!
7. Install Spark
First upload spark-2.1.1-bin-hadoop2.7.tgz to ~.
Run the following commands:
[hadoop@master ~]$ sudo tar -zxf ~/spark-2.1.1-bin-hadoop2.7.tgz -C /usr/local/
[hadoop@master ~]$ cd /usr/local
[hadoop@master ~]$ sudo mv ./spark-2.1.1-bin-hadoop2.7/ ./spark # rename the extracted folder to spark
[hadoop@master ~]$ sudo chown -R hadoop:hadoop ./spark
7.1. In /usr/local/spark/conf, edit spark-env.sh and add:
export JAVA_HOME=/usr/java/jdk1.8.0_131
export SPARK_MASTER_IP=192.168.2.219
export SPARK_MASTER_PORT=7077
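Not part of the original steps, but since this cluster reads its data from HDFS, it can also help to point Spark at the Hadoop configuration (an optional addition on my part; the path matches the Hadoop install above):
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop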
7.2. In /usr/local/spark/conf, add the following content to slaves. Here all four nodes are workers, master included; master both manages and computes.
[hadoop@master conf]$ cat slaves
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
master
slave01
slave02
slave03
7.3. After configuring, copy the /usr/local/spark folder on the Master host to each node. On the Master host run the following commands:
[hadoop@master local]$
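A sketch of this copy step, following the same tar-and-scp pattern used for Hadoop in 6.7 (the archive name spark.master.tar.gz is only an illustration):
[hadoop@master local]$ tar -zcf ~/spark.master.tar.gz ./spark # package /usr/local/spark
[hadoop@master local]$ scp ~/spark.master.tar.gz slave01:/home/hadoop
[hadoop@master local]$ scp ~/spark.master.tar.gz slave02:/home/hadoop
[hadoop@master local]$ scp ~/spark.master.tar.gz slave03:/home/hadoop
Then on each slave (shown here for slave01):
[hadoop@slave01 ~]$ sudo rm -r /usr/local/spark # delete the old one (if it exists)
[hadoop@slave01 ~]$ sudo tar -zxf ~/spark.master.tar.gz -C /usr/local
[hadoop@slave01 ~]$ sudo chown -R hadoop /usr/local/spark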