
Installing a Hadoop Cluster on CentOS 7

Hadoop study notes

1. Preparation

1.1 Install the JDK on the virtual machines

Prepare three CentOS 7 virtual machines.

Configure the JDK

The VMs installed by following the linked guide are minimal installs, so no JDK is included.

Install one:

yum install java-1.8.0-openjdk* -y

Check the java installation:

[root@hadoop-server1 hadoop]# which java

/usr/bin/java

[root@hadoop-server1 hadoop]# ls -lrt /usr/bin/java

lrwxrwxrwx. 1 root root 22 Oct 12 11:16 /usr/bin/java -> /etc/alternatives/java

[root@hadoop-server1 hadoop]# ls -lrt /etc/alternatives/java

lrwxrwxrwx. 1 root root 73 Oct 12 11:16 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/bin/java
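The symlink chain above can be resolved in one step with readlink -f, which is handy later for finding the real JDK directory to use as JAVA_HOME (on the VM: readlink -f /usr/bin/java). A minimal sketch, using a throwaway symlink chain in a scratch directory so it can be run anywhere:

```shell
# readlink -f follows every symlink in the chain and prints the real path.
# The chain below imitates /usr/bin/java -> /etc/alternatives/java -> .../bin/java.
d=$(mktemp -d)
touch "$d/real-java"                        # stands in for the actual JDK binary
ln -s "$d/real-java" "$d/alternatives-java"
ln -s "$d/alternatives-java" "$d/bin-java"
resolved=$(readlink -f "$d/bin-java")       # resolves both hops at once
echo "$resolved"
```

On the real VM the resolved path ends in jre/bin/java; strip the trailing /jre/bin/java to get the JAVA_HOME value used in section 2.2.7.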

1.2 Set up trust (passwordless SSH) between the VMs

1.2.1 Change the hostname

Taking vm1 as an example, show the current name:

hostname

Change it (note: on CentOS 7, plain "hostname NAME" lasts only until the next reboot; "hostnamectl set-hostname NAME" makes the change permanent):

hostname NAME

The IP-to-hostname mapping used here is:

vm1   192.168.191.133   hadoop-server1

vm2   192.168.191.135   hadoop-server2

vm3   192.168.191.134   hadoop-server3

1.2.2 Edit the hosts file

vi /etc/hosts

Add the following (the IPs and hostnames of your own three VMs):

192.168.191.133 hadoop-server1

192.168.191.135 hadoop-server2

192.168.191.134 hadoop-server3
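The entries above can also be added non-interactively, which is convenient when repeating this on all three VMs. A sketch using the same three IP/hostname pairs; HOSTS defaults to a scratch file here so the sketch is safe to run, while on the real machines you would set HOSTS=/etc/hosts:

```shell
# Append each mapping only if an identical line is not already present,
# so re-running the script never duplicates entries.
HOSTS=${HOSTS:-$(mktemp)}   # use HOSTS=/etc/hosts on the real machines
for entry in '192.168.191.133 hadoop-server1' \
             '192.168.191.135 hadoop-server2' \
             '192.168.191.134 hadoop-server3'; do
    grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```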

1.2.3 Check connectivity with ping

On vm1, run:

ping -c 3 hadoop-server2

ping -c 3 hadoop-server3

On vm2, run:

ping -c 3 hadoop-server3

If these commands succeed, the three VMs can reach one another.

1.2.4 Generate key pairs on the VMs

Using vm1 as an example; do the same on all three VMs:

ssh-keygen -t rsa -P ''

Check that the key files were created (the current account is root).

1.2.5 Create and share the authorized_keys file

Go to the /root/.ssh directory and create an authorized_keys file; at this point it is empty:

touch /root/.ssh/authorized_keys

Copy the local public key into authorized_keys:

 cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
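One detail worth checking here: sshd silently ignores authorized_keys when the permissions are too open, so key login can fail even though the file content is right. The .ssh directory should be 700 and authorized_keys 600. A sketch using a scratch directory so it runs anywhere; on the real VMs the directory is /root/.ssh:

```shell
# Tighten permissions: 700 on the directory, 600 on authorized_keys.
SSH_DIR=${SSH_DIR:-$(mktemp -d)/.ssh}   # use /root/.ssh on the real machines
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
stat -c '%a %n' "$SSH_DIR" "$SSH_DIR/authorized_keys"
```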

Open the public key file on each of the other two VMs:

vi /root/.ssh/id_rsa.pub

Copy their contents into the authorized_keys file on vm1, so that it ends up holding all three public keys.

Then copy vm1's authorized_keys file into the /root/.ssh directory of the other two VMs, for example with an FTP/SFTP tool.

Test the SSH connections, again with vm1 as the example:

ssh hadoop-server2

ssh hadoop-server3

When the test succeeds, run exit to return; otherwise any further commands would run on the remote VM.

At this point, the trust relationship between the three VMs is established.

2 Install Hadoop

2.1 Download Hadoop; the link is:

Upload it via FTP into a newly created /opt/hadoop directory on each of the three VMs; once the upload finishes, proceed as follows.

2.2 Install and configure Hadoop

cd /opt/hadoop

Extract the archive:

tar -xvf hadoop-2.8.0.tar.gz

Note: do this on all three machines; extracting yields a directory named hadoop-2.8.0.

Create the data directories:

mkdir  /root/hadoop

mkdir  /root/hadoop/tmp

mkdir  /root/hadoop/var

mkdir  /root/hadoop/dfs

mkdir  /root/hadoop/dfs/name

mkdir  /root/hadoop/dfs/data
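The six mkdir commands above can be collapsed into one; the -p flag creates missing parents and is a no-op for directories that already exist, so it is safe to re-run. A sketch where BASE defaults to a scratch location so it can be run anywhere; on the VMs it is /root/hadoop:

```shell
# One command creates the whole tree; -p makes it idempotent.
BASE=${BASE:-$(mktemp -d)/hadoop}   # use BASE=/root/hadoop on the real machines
mkdir -p "$BASE/tmp" "$BASE/var" "$BASE/dfs/name" "$BASE/dfs/data"
find "$BASE" -type d
```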

2.2.1 Edit core-site.xml

   vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/core-site.xml

Add the following inside the <configuration> element:

        <property>

                <name>hadoop.tmp.dir</name>

                <value>/root/hadoop/tmp</value>

                <description>Abase for other temporary directories.</description>

        </property>

        <property>

                <name>fs.default.name</name>

                <value>hdfs://hadoop-server1:9000</value>

        </property>
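A side note: fs.default.name is the deprecated Hadoop 1.x name of this property; in Hadoop 2.x it still works but logs a warning, and the current name is fs.defaultFS. An equivalent, non-deprecated form of the second property would be:

```xml
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop-server1:9000</value>
        </property>
```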

2.2.2 Edit hdfs-site.xml

vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/hdfs-site.xml

Add the following inside the <configuration> element:

        <property>

                <name>dfs.name.dir</name>

                <value>/root/hadoop/dfs/name</value>

                <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>

        </property>

        <property>

                <name>dfs.data.dir</name>

                <value>/root/hadoop/dfs/data</value>

                <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>

        </property>

        <property>

                <name>dfs.replication</name>

                <value>2</value>

        </property>

        <property>

                <name>dfs.permissions</name>

                <value>false</value>

                <description>need not permissions</description>

        </property>

2.2.4 Create and edit mapred-site.xml

This version of Hadoop ships a template named mapred-site.xml.template; copy it to mapred-site.xml:

cp /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml

Then edit the new file:

vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml

Add the following inside the <configuration> element:

        <property>

                <name>mapred.job.tracker</name>

                <value>hadoop-server1:49001</value>

        </property>

        <property>

                <name>mapred.local.dir</name>

                <value>/root/hadoop/var</value>

        </property>

        <property>

                <name>mapreduce.framework.name</name>

                <value>yarn</value>

        </property>

2.2.5 Edit the slaves file

vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/slaves

Delete the localhost line and add the following:

hadoop-server2

hadoop-server3
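The slaves file can also be written in one non-interactive step. A sketch; SLAVES_FILE defaults to a scratch file here so it is safe to run, while on the VMs it is /opt/hadoop/hadoop-2.8.0/etc/hadoop/slaves:

```shell
# Overwrite the file with exactly the two datanode hostnames,
# which also removes the default localhost line.
SLAVES_FILE=${SLAVES_FILE:-$(mktemp)}
printf '%s\n' hadoop-server2 hadoop-server3 > "$SLAVES_FILE"
cat "$SLAVES_FILE"
```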

2.2.6 Edit yarn-site.xml

vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/yarn-site.xml

Add the following inside the <configuration> element:

        <property>

                <name>yarn.resourcemanager.hostname</name>

                <value>hadoop-server1</value>

        </property>

       <property>

                <description>The address of the applications manager interface in the RM.</description>

                <name>yarn.resourcemanager.address</name>

                <value>${yarn.resourcemanager.hostname}:8032</value>

        </property>

        <property>

                <description>The address of the scheduler interface.</description>

                <name>yarn.resourcemanager.scheduler.address</name>

                <value>${yarn.resourcemanager.hostname}:8030</value>

        </property>

        <property>

                <description>The http address of the RM web application.</description>

                <name>yarn.resourcemanager.webapp.address</name>

                <value>${yarn.resourcemanager.hostname}:8088</value>

        </property>

        <property>

                <description>The https address of the RM web application.</description>

                <name>yarn.resourcemanager.webapp.https.address</name>

                <value>${yarn.resourcemanager.hostname}:8090</value>

        </property>

        <property>

                <name>yarn.resourcemanager.resource-tracker.address</name>

                <value>${yarn.resourcemanager.hostname}:8031</value>

        </property>

        <property>

                <description>The address of the RM admin interface.</description>

                <name>yarn.resourcemanager.admin.address</name>

                <value>${yarn.resourcemanager.hostname}:8033</value>

        </property>

        <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

        </property>

        <property>

                <name>yarn.scheduler.maximum-allocation-mb</name>

                <value>10240</value>

                <description>Maximum memory a single container may request, in MB; the default is 8192 MB.</description>

        </property>

        <property>

                <name>yarn.nodemanager.vmem-pmem-ratio</name>

                <value>2.1</value>

        </property>

        <property>

                <name>yarn.nodemanager.resource.memory-mb</name>

                <value>10240</value>

        </property>

        <property>

                <name>yarn.nodemanager.vmem-check-enabled</name>

                <value>false</value>

        </property>

Note: the modified Hadoop configuration files are identical on all three VMs; nothing in them needs to change from machine to machine.

2.2.7 Edit hadoop-env.sh

vi /opt/hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.sh

Change

export JAVA_HOME=${JAVA_HOME}

to:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64

Note: substitute your own JDK path.
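The JAVA_HOME edit can also be scripted with sed, which is convenient since the same change must be made on all three machines. A sketch; ENV_FILE defaults to a scratch copy seeded with the stock line so it can be run safely, while on the VMs it is /opt/hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.sh:

```shell
# Replace the stock JAVA_HOME line with the concrete JDK path, in place.
JDK_PATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
ENV_FILE=${ENV_FILE:-$(mktemp)}
# Seed the scratch copy with the stock line if it is not a real hadoop-env.sh.
grep -q '^export JAVA_HOME=' "$ENV_FILE" || echo 'export JAVA_HOME=${JAVA_HOME}' >> "$ENV_FILE"
sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$JDK_PATH|" "$ENV_FILE"
grep '^export JAVA_HOME=' "$ENV_FILE"
```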

3 Start Hadoop

3.1 Initialize on the namenode

Because hadoop-server1 is the namenode and hadoop-server2 and hadoop-server3 are datanodes, only hadoop-server1 needs initializing, which means formatting HDFS.

cd   /opt/hadoop/hadoop-2.8.0/bin

Run the initialization, i.e.:

 ./hadoop  namenode  -format

After a successful format, a new current directory containing a set of files appears under /root/hadoop/dfs/name/.

3.2 Start the cluster from the namenode

Because hadoop-server1 is the namenode and hadoop-server2 and hadoop-server3 are datanodes, the start command only needs to be run on hadoop-server1.

On hadoop-server1, change into the /opt/hadoop/hadoop-2.8.0/sbin directory:

cd /opt/hadoop/hadoop-2.8.0/sbin

Run the startup script:

./start-all.sh

The first run of the start command asks for interactive confirmation as it connects to each host; type yes and press Enter at the prompts.

4 Verify the installation

Hadoop is now started, and we need to check that it is working properly. (Running jps on each machine is a quick check: the namenode should show NameNode, SecondaryNameNode and ResourceManager, and the datanodes should show DataNode and NodeManager.)

Turn off the firewall; on CentOS 7 this must be done on all three servers:

systemctl   stop   firewalld.service

And keep it disabled after a reboot:

systemctl   disable   firewalld.service 

If the firewall is left on, importing data will fail later once Hive is installed.

In a browser, visit the NameNode web UI: the IP is the namenode VM's address and the port is 50070, i.e. http://192.168.191.133:50070 for the addresses used here.

[Screenshot: NameNode web UI overview page]

If this page loads, the installation is complete.