Installing Hadoop on CentOS
Environment
Two machines.
# uname -a
Linux xxx 2.6.32_1-16-0-0_virtio #1 SMP Thu May 14 15:30:56 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
Software versions:
zookeeper: 3.4.8
hadoop: 2.7.2
Replace the directory paths in this guide with your own.
Tip:
Enable passwordless SSH login (install SSH first if it is missing):
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Substitute your own public key as needed.
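If login still prompts for a password, the usual culprit is permissions: sshd requires ~/.ssh to be 700 and authorized_keys to be 600. A minimal check, followed by a test login:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost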
Download the resources
The Hadoop official site has detailed documentation. On Linux, pick the nearest mirror and download directly:
$ pwd
/home/www/install
$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Hadoop depends on the JDK and ZooKeeper; download and install them yourself.
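As a sketch of the JDK setup (the path /usr/lib/jvm/jdk1.8.0_91 is this guide's example; match it to wherever you unpack the JDK), exporting JAVA_HOME in the shell profile saves trouble later:
$ echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91" >> ~/.bashrc
$ echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> ~/.bashrc
$ source ~/.bashrc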
Installation
A dedicated hadoop account is recommended for the installation and for day-to-day operations.
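On CentOS the account can be created as root, for example:
# useradd hadoop
# passwd hadoop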
Hadoop relies on SSH, so passwordless SSH login must be configured.
The commands below are run as the hadoop user.
This is a single-node setup for now.
Extract Hadoop and get ready to edit the configuration files:
$ pwd
/home/www/install
$ tar -xzvf hadoop-2.7.2.tar.gz
$ cd hadoop-2.7.2/etc/hadoop/
Files to edit:
- core-site.xml
- mapred-site.xml (copied from mapred-site.xml.template)
- hdfs-site.xml
core-site.xml
The hadoop.tmp.dir value can be replaced with a path of your own:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp</value>
<description>Hadoop Temp Dir</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8900</value>
</property>
</configuration>
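Hadoop normally creates hadoop.tmp.dir on demand, but creating it up front (using this guide's example path) avoids permission surprises:
$ mkdir -p /home/www/install/hadoop-2.7.2/tmp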
mapred-site.xml
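Hadoop reads mapred-site.xml, not the .template file, so copy the template first and edit the copy:
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml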
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8901</value>
</property>
</configuration>
hdfs-site.xml
dfs.namenode.name.dir and dfs.datanode.data.dir can live under the hadoop.tmp.dir directory:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/data</value>
</property>
</configuration>
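A quick way to confirm the configuration is being picked up is the standard hdfs getconf utility; given the files above, these should print hdfs://localhost:8900 and 1 respectively:
$ pwd
/home/www/install/hadoop-2.7.2
$ ./bin/hdfs getconf -confKey fs.defaultFS
$ ./bin/hdfs getconf -confKey dfs.replication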
Initialize the HDFS filesystem
$ pwd
/home/www/install/hadoop-2.7.2
$ ./bin/hdfs namenode -format
Start HDFS
$ pwd
/home/www/install/hadoop-2.7.2
$ sh sbin/start-dfs.sh
Problems that may show up here:
① Repeated password prompts
Configure passwordless SSH login (see the Tip above).
② localhost: Error: JAVA_HOME is not set and could not be found
Hard-code JAVA_HOME in hadoop-env.sh:
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91" >> hadoop-env.sh
Check the running processes:
$ jps
20724 SecondaryNameNode
20041 NameNode
22444 Jps
20429 DataNode
Starting and stopping Hadoop is straightforward; see the scripts under the sbin directory, e.g. start-dfs.sh, stop-dfs.sh, start-all.sh, stop-all.sh.
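Once HDFS is up, the NameNode web UI is another quick sanity check; in Hadoop 2.x it listens on port 50070 by default:
$ curl -s http://localhost:50070/ | head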
Multi-node installation
A cluster installation is largely the same as the single-node one.
- The cluster machines must be able to reach each other and log in over SSH without a password
- The slaves file must be configured
Copy the Hadoop directory from the single-node installation, unchanged, to every machine that will join the cluster.
Pick one machine as the Master; it must be able to SSH into the Slave machines without a password. Then adjust the configuration:
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ echo "<slave hostname>" >> slaves
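For example, with two hypothetical slave hostnames (slave1 and slave2 are placeholders for your own machines):
$ echo "slave1" >> slaves
$ echo "slave2" >> slaves
$ cat slaves
slave1
slave2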
Start the Hadoop cluster with sbin/start-all.sh.
On the Master you should see:
$ jps
9792 SecondaryNameNode
10420 NodeManager
9462 DataNode
9031 NameNode
10185 ResourceManager
27806 Jps
On every Slave machine you should see the following (the NameNode process here is most likely left over from an earlier single-node run; Slaves normally run only DataNode and NodeManager):
$ jps
7283 Jps
16564 DataNode
17192 NodeManager
12687 NameNode
Hadoop Hello World
Hadoop ships with plenty of Hello World examples; wordcount is the usual favorite.
Pick any machine in the Hadoop cluster:
$ pwd
/home/www/install/hadoop-2.7.2/share/hadoop/mapreduce
$ echo "hello world NO.1" > h1
$ echo "hello world NO.2" > h2
$ hadoop fs -mkdir /input
$ hadoop fs -put h1 /input
$ hadoop fs -put h2 /input
$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
$ hadoop fs -ls /output/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2016-07-06 13:51 /output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 30 2016-07-06 13:51 /output/part-r-00000
$ hadoop fs -cat /output/part-r-00000
NO.1 1
NO.2 1
hello 2
world 2
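One caveat: MapReduce refuses to start if the output directory already exists, so remove it before re-running the job:
$ hadoop fs -rm -r /output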
Hadoop supports many more Hello World examples; substitute any name from the list below into the command above.
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
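For instance, the pi example takes the number of maps and the number of samples per map as arguments (10 and 100 here are arbitrary values):
$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 10 100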
Done.