
Integrating HUE with HDFS and MR

HUE (Hadoop User Experience) is an open-source Hadoop UI system built on a Python web framework. With HUE you can interact with the Hadoop cluster from a browser-based web console to analyze and process data.
Official download page: http://gethue.com/category/release/

Environment and software
OS: CentOS 6.5, three machines with a Hadoop cluster already set up
Software: hue-3.7.0-cdh5.3.6.tar.gz

Role assignment across the cluster:

mini01: NameNode, DataNode, ResourceManager, NodeManager, HUE
mini02: DataNode, JobHistoryServer, NodeManager
mini03: SecondaryNameNode, DataNode, NodeManager

1. Prepare the environment dependencies

The required packages are collected below and can be installed directly with yum:

[hadoop@mini01 ~]$ sudo yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

2. Extract HUE

[hadoop@mini01 tools]$ tar -zxvf hue-3.7.0-cdh5.3.6.tar.gz -C ../install

3. Build HUE

[hadoop@mini01 tools]$ cd ../install/hue-3.7.0-cdh5.3.6/

[hadoop@mini01 hue-3.7.0-cdh5.3.6]$ make apps
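If the build completes without errors, the Hue launcher and the supervisor used later in this article should exist under build/env/bin; a quick sanity check, assuming the installation path used here:

ls ~/install/hue-3.7.0-cdh5.3.6/build/env/bin/hue ~/install/hue-3.7.0-cdh5.3.6/build/env/bin/supervisor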

4. Configure HUE

Edit the hue.ini file.
Path: /home/hadoop/install/hue-3.7.0-cdh5.3.6/desktop/conf/hue.ini
Change the following entries in the [desktop] section:

    secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
    http_host=mini01
    http_port=8888
    time_zone=Asia/Shanghai
# Webserver runs as this user
    server_user=hue
    server_group=hue
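The secret_key should be a long random string of your own; the value above is only an example. One simple way to generate one (just a sketch; it only relies on the python interpreter that is already installed as a build dependency):

python -c "import random, string; print(''.join(random.choice(string.ascii_letters + string.digits) for _ in range(60)))"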

Configure the HDFS integration as follows:

[[hdfs_clusters]]
    # HA support by using HttpFs
    # If HDFS is configured for high availability (HA), HttpFS must be used

    [[[default]]]     # HA is not configured here, so the NameNode port is 9000; with HA it would be 8020
      # Enter the filesystem uri
      fs_defaultfs=hdfs://mini01:9000

      # NameNode logical name.
      ## logical_name=

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      ## webhdfs_url=http://localhost:50070/webhdfs/v1
      # If HA is configured, point this at HttpFS instead (default HttpFS port is 14000)
      webhdfs_url=http://mini01:50070/webhdfs/v1
      # Change this if your HDFS cluster is Kerberos-secured
      ##security_enabled=false

      # Default umask for file and directory creation, specified in an octal value.
      ## umask=022

      # Directory of the Hadoop configuration
      ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
      # Paths to the Hadoop installation and its configuration files
      hadoop_conf_dir=/home/hadoop/install/hadoop-2.5.0-cdh5.3.6/etc/hadoop
      hadoop_hdfs_home=/home/hadoop/install/hadoop-2.5.0-cdh5.3.6
      hadoop_bin=/home/hadoop/install/hadoop-2.5.0-cdh5.3.6/bin
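Once HDFS is running (section 6), the webhdfs_url above can be sanity-checked from the command line before starting HUE. This uses the standard WebHDFS REST API; the user name hadoop is this cluster's account:

curl -s "http://mini01:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hadoop"
# A JSON FileStatuses listing of / indicates WebHDFS is enabled and reachable.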

Configure the YARN integration as follows:

 [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      ## resourcemanager_host=localhost
      resourcemanager_host=mini01
      # The port where the ResourceManager IPC listens on
      ## resourcemanager_port=8032
      resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088
      resourcemanager_api_url=http://mini01:8088
      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088

      # URL of the HistoryServer API
      ## history_server_api_url=http://localhost:19888
      # History server (the JobHistoryServer runs on mini02)
      history_server_api_url=http://mini02:19888
      # In secure mode (HTTPS), if SSL certificates from Resource Manager's
      # Rest Server have to be verified against certificate authority
      ## ssl_cert_ca_verify=False

    # HA support by specifying multiple clusters
    # e.g.
    # Configure HA (high availability)
    # [[[ha]]]
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name
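Likewise, resourcemanager_api_url and history_server_api_url can be verified with curl once YARN and the JobHistoryServer are up; these are the standard REST endpoints of Hadoop 2.x, using this cluster's host names:

curl -s http://mini01:8088/ws/v1/cluster/info      # ResourceManager REST API
curl -s http://mini02:19888/ws/v1/history/info     # JobHistoryServer REST API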

5. Update the Hadoop configuration files

5.1 core-site.xml

<property>
    <!-- Allow the hue user to act as a proxy user from any host -->
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>

5.2 hdfs-site.xml

<!-- Enable WebHDFS (REST API) in NameNodes and DataNodes -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<!-- Disable HDFS permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

5.3 httpfs-site.xml

<property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<!-- The two properties above are required when the HUE service and the Hadoop services do not run on the same node.
Note:
* Without NameNode HA, HUE can manage HDFS through WebHDFS.
* With NameNode HA, HUE can only manage HDFS through HttpFS. -->
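The proxyuser properties (here and in core-site.xml) are what allow the hue service account to impersonate the user logged in to HUE. With simple authentication the effect can be observed directly through the doas parameter of WebHDFS/HttpFS; a rough check, assuming the hadoop user exists on the cluster:

curl -s "http://mini01:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue&doas=hadoop"
# Succeeds only when the hadoop.proxyuser.hue.* settings permit hue to impersonate hadoop.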

5.4 Distribute the Hadoop configuration files

xsync install/hadoop-2.5.0-cdh5.3.6/etc/hadoop

The xsync distribution script is shown below:

#!/bin/bash
#1 Get the number of arguments; exit immediately if there are none
pcount=$#
if ((pcount==0)); then
echo no args;
exit;
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Resolve the parent directory to an absolute path
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`
#5 Loop over the target hosts
for ((host=1; host<4; host++)); do
echo $pdir/$fname $user@mini0$host:$pdir
echo --------------- mini0$host ----------------
rsync -rvl $pdir/$fname $user@mini0$host:$pdir
done
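The script relies on passwordless SSH from the current user to mini01, mini02 and mini03, and on rsync being installed on every node. If that is not in place yet, something along these lines sets it up (using the hadoop account assumed throughout this article):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa                             # generate a key pair if one does not exist
for host in mini01 mini02 mini03; do ssh-copy-id hadoop@$host; done  # distribute the public key
sudo yum install -y rsync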

5.5 Start the HttpFS service

[hadoop@mini01 install]$ ~/install/hadoop-2.5.0-cdh5.3.6/sbin/httpfs.sh start
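HttpFS listens on port 14000 by default, so a quick curl against the same WebHDFS-style REST API confirms that the service came up:

curl -s "http://mini01:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hadoop"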

6. Testing

6.1 Start HDFS

[hadoop@mini01 install]$ start-dfs.sh

6.2 Start YARN

[hadoop@mini01 install]$ start-yarn.sh

6.3 Start the HUE service

[hadoop@mini01 install]$ ~/install/hue-3.7.0-cdh5.3.6/build/env/bin/supervisor
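Note that supervisor runs in the foreground. To keep HUE running after the terminal is closed, it can be started in the background instead, for example (one common approach; the log path is arbitrary):

nohup ~/install/hue-3.7.0-cdh5.3.6/build/env/bin/supervisor > ~/hue.log 2>&1 &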

7. Results

[hadoop@mini01 sbin]$ xcall.sh jps
============= mini01 jps =============
1344 DataNode
1602 ResourceManager
1250 NameNode
1701 NodeManager
2045 Jps
============= mini02 jps =============
1635 NodeManager
1848 Jps
1530 DataNode
============= mini03 jps =============
1302 NodeManager
1448 Jps
1197 SecondaryNameNode
1134 DataNode
[hadoop@mini01 sbin]$

If the following output appears when the HUE service starts, the startup succeeded:

[INFO] Not running as root, skipping privilege drop
starting server with options {'ssl_certificate': None, 'workdir': None, 'server_name': 'localhost', 'host': '192.168.13.128', 'daemonize': False, 'threads': 10, 'pidfile': None, 'ssl_private_key': None, 'server_group': 'hue', 'ssl_cipher_list': 'DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2', 'port': 8888, 'server_user': 'hue'}

To test it, open the web UI on port 8888 of mini01.
The first login requires creating an account; here we create the account admin with password admin.

After logging in successfully, use the File Browser in the top-right corner to manage files on HDFS: you can upload, download, and view file contents, among other operations.
YARN job management is found under the Job Browser. A few commands for generating something to look at in both browsers are sketched below.
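To have content in both browsers, push a small file to HDFS and run one of the bundled example MapReduce jobs. The example jar path below follows this article's installation directory; the exact jar name may differ in your build:

# a file to view in File Browser
echo "hello hue" > /tmp/hue-test.txt
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put -f /tmp/hue-test.txt /user/hadoop/

# a MapReduce job to watch in Job Browser
hadoop jar ~/install/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10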