Installing Hadoop on CentOS

Environment

Two machines.

# uname -a
Linux xxx 2.6.32_1-16-0-0_virtio #1 SMP Thu May 14 15:30:56 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

Software versions:

zookeeper: 3.4.8
hadoop: 2.7.2

Replace the directory paths throughout with your own.

Tip:
Enable passwordless SSH login (install ssh yourself if it is missing).

$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change the public key handling as appropriate; see the sketch below for trusting the key from another machine.
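
To trust the same key from a second machine, one option is ssh-copy-id; the hostname node2 below is a placeholder for your other machine:

$ ssh-copy-id hadoop@node2    # node2 is a placeholder hostname
$ ssh hadoop@node2            # should now log in without a password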

Download the resources

The official Hadoop site has detailed documentation. On Linux you can simply pick the nearest mirror and download.

$ pwd
/home/www/install

$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

Hadoop depends on the JDK and Zookeeper; download and install them yourself.
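
The same mirror also carries Zookeeper releases; the exact path below is an assumption based on the mirror's usual layout, so verify it before use:

$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz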

Installation

A dedicated hadoop account is recommended for the installation and for day-to-day operations.
Hadoop relies on SSH, so passwordless SSH login must be configured.

The steps below run as the hadoop user.
This is a single machine for now.

Unpack Hadoop and prepare to edit the various configuration files

$ pwd
/home/www/install

$ tar -xzvf hadoop-2.7.2.tar.gz
$ cd hadoop-2.7.2/etc/hadoop/

Files to be edited:

  • core-site.xml
  • mapred-site.xml (copied from mapred-site.xml.template)
  • hdfs-site.xml

core-site.xml
The hadoop.tmp.dir value can be replaced with a path of your own.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp</value>
                <description>Hadoop Temp Dir</description>
        </property>

        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:8900</value>
        </property>
</configuration>

mapred-site.xml
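
Hadoop only reads mapred-site.xml, so copy the shipped template first. (Note that mapred.job.tracker below is the old MRv1 JobTracker address; on Hadoop 2.x with YARN you would normally set mapreduce.framework.name to yarn instead.)

$ cp mapred-site.xml.template mapred-site.xml

Then edit the file: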

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>

        <property>
                <name>mapred.job.tracker</name>
                <value>localhost:8901</value>
        </property>

</configuration>

hdfs-site.xml
dfs.namenode.name.dir and dfs.datanode.data.dir can sit under the hadoop.tmp.dir directory.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/name</value>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/data</value>
        </property>

</configuration>

Initialize the HDFS filesystem

$ pwd
/home/www/install/hadoop-2.7.2

$ ./bin/hdfs namenode -format

Start it up

$ pwd
/home/www/install/hadoop-2.7.2

$ sh sbin/start-dfs.sh

Problems that may show up here:
① Repeated password prompts

Configure passwordless SSH login.

② localhost: Error: JAVA_HOME is not set and could not be found

Hard-code JAVA_HOME in hadoop-env.sh:

$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop

$ echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91" >> hadoop-env.sh 

Check the running processes

$ jps
20724 SecondaryNameNode
20041 NameNode
22444 Jps
20429 DataNode

Starting and stopping Hadoop is straightforward; see the scripts under the sbin directory,
e.g. start-dfs.sh, stop-dfs.sh, start-all.sh, stop-all.sh.
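
Once the daemons are up, a quick sanity check is the NameNode web UI, which listens on port 50070 by default in Hadoop 2.7:

$ curl -s http://localhost:50070/ | head    # or open http://localhost:50070 in a browser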

Multi-node installation

A cluster installation is largely the same as the single-node one.

  • Make sure the machines in the cluster can reach each other and ssh in without a password
  • The slaves file needs to be configured

Copy the single-node Hadoop directory unchanged to the machines that need it.
Pick one machine as the Master, able to ssh into the Slave machines without a password, and adjust the configuration as sketched below.
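
At minimum, fs.defaultFS in core-site.xml must point at the Master instead of localhost, otherwise the Slaves cannot find the NameNode. A minimal sketch, assuming the Master is reachable under the placeholder hostname master:

        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:8900</value>
        </property>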

$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop

echo "設定的slaves" >> slaves

Start the Hadoop cluster with the command sbin/start-all.sh.

On the Master you can see:

$ jps
9792 SecondaryNameNode
10420 NodeManager
9462 DataNode
9031 NameNode
10185 ResourceManager
27806 Jps

On all Slave machines you can see:

$ jps
7283 Jps
16564 DataNode
17192 NodeManager
12687 NameNode

Hadoop Hello World

Hadoop ships with plenty of Hello World examples; wordcount is the usual favourite.

Pick any machine in the Hadoop cluster.

$ pwd
/home/www/install/hadoop-2.7.2/share/hadoop/mapreduce

$ echo "hello world NO.1" > h1
$ echo "hello world NO.2" > h2
$ hadoop fs -put h1 /input
$ hadoop fs -put h2 /input

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
$ hadoop fs -ls /output/
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2016-07-06 13:51 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         30 2016-07-06 13:51 /output/part-r-00000

$ hadoop fs -cat /output/part-r-00000
NO.1    1
NO.2    1
hello   2
world   2
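
One thing to keep in mind when re-running the job: MapReduce refuses to start if the output directory already exists, so remove it first:

$ hadoop fs -rm -r /output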

Hadoop supports many more Hello World programs; swap any of the following into the run above (see the pi run after the list for one example).

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
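
For instance, the pi estimator takes the number of map tasks and the samples per map as arguments:

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 10 100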

Done.