
Integrating Hue with Hadoop: A Detailed Guide

Installing Hue

Environment

Operating system: Ubuntu 14.04

Cluster nodes:

  • Master
  • slave1
  • slave2

The Hadoop user is root.

In this guide, Hue is installed on the slave2 node.

Install the dependencies needed to build Hue

sudo apt-get install ant gcc g++ libkrb5-dev libffi-dev libmysqlclient-dev libssl-dev libsasl2-dev libsasl2-modules-gssapi-mit libsqlite3-dev libtidy-0.99-0 libxml2-dev libxslt-dev \
    make libldap2-dev maven python-dev python-setuptools libgmp3-dev
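Before starting the build, it can help to confirm that the main build tools actually ended up on the PATH. The following is a minimal sketch, not part of the Hue build itself; the list of executables is an assumption derived from the packages above (for example, the maven package is assumed to provide mvn), and missing_tools is a hypothetical helper name.

```python
import shutil

# Executables the build is assumed to need (derived from the package list
# above; adjust to your environment).
REQUIRED_TOOLS = ["ant", "gcc", "g++", "make", "mvn", "python"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found on PATH")
```

This only checks executables; development libraries such as libssl-dev still have to be verified through the package manager.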

Download, extract, and move

Download the matching tarball from the official site.

root@slave2:~$ sudo tar zxvf hue-3.10.0.tgz
root@slave2:~$ sudo cp -R hue-3.10.0 /usr/local/hue

Build

root@slave2:~$ cd /usr/local/hue
root@slave2:/usr/local/hue# sudo make apps

Add the hue user and grant permissions

root@slave2:/usr/local/hue# sudo adduser hue
root@slave2:/usr/local/hue# sudo chmod -R 775 /usr/local/hue
root@slave2:/usr/local/hue# sudo chown -R hue:hue /usr/local/hue

Start Hue

root@slave2:/usr/local/hue# ./build/env/bin/supervisor

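Open slave2:8888 in a browser. If the Hue interface appears, the installation succeeded.

The next step is to configure Hue so that it can manage HDFS, Hive, and HBase, and use Oozie, Pig, and so on. This is covered in what follows.

Configuring Hue

Configure cluster access permissions

Because Hue runs as the user hue, that user must be allowed to access the cluster. These proxy-user settings let hue act on behalf of the users who log in to Hue. On every node, add the following properties to /usr/local/hadoop/etc/hadoop/core-site.xml: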

 <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
 </property>
 <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
 </property>

After making this change, remember to restart the Hadoop cluster.
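Since the same two properties have to be present on every node, a small check can catch a node that was missed. The sketch below parses a Hadoop-style configuration file with the standard library and reports which hue proxy-user keys are absent. The function names are hypothetical, and the file path is the one used in this article.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_data):
    """Parse Hadoop-style configuration XML (str or bytes) into a dict."""
    root = ET.fromstring(xml_data)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_proxyuser_settings(props, user="hue"):
    """Return the proxy-user keys for `user` that are absent from `props`."""
    wanted = [
        "hadoop.proxyuser.%s.hosts" % user,
        "hadoop.proxyuser.%s.groups" % user,
    ]
    return [key for key in wanted if key not in props]


if __name__ == "__main__":
    # Path taken from this article; change it if Hadoop lives elsewhere.
    path = "/usr/local/hadoop/etc/hadoop/core-site.xml"
    try:
        with open(path, "rb") as handle:
            props = read_properties(handle.read())
    except (OSError, ET.ParseError) as exc:
        print("Could not read", path, "-", exc)
    else:
        print("Missing:", missing_proxyuser_settings(props) or "nothing")
```

Run it on each node after editing core-site.xml. It only checks that the keys exist, not that the running daemons have picked them up.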

Configure HDFS

Edit /usr/local/hue/desktop/conf/hue.ini.

1) Set the HDFS superuser

  # This should be the hadoop cluster admin
  default_hdfs_superuser=root

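2) HDFS settings

Three settings matter here: fs_defaultfs, webhdfs_url, and hadoop_conf_dir.

WebHDFS, which webhdfs_url points to, is enabled by default, so it does not need to be switched on separately in Hadoop.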

  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]

      # Enter the filesystem uri
      fs_defaultfs=hdfs://Master:8020

      # NameNode logical name.
      ## logical_name=

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://Master:50070/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

      # Directory of the Hadoop configuration
      hadoop_conf_dir=/usr/local/hadoop/etc/hadoop
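To see whether the webhdfs_url above is reachable independently of Hue, the WebHDFS REST API can be called directly. The sketch below builds a LISTSTATUS request against that URL as the hue user; the optional doas parameter is the WebHDFS way of acting on behalf of another user, which is what the proxy-user settings permit. The helper name webhdfs_request_url is hypothetical, and the host and port are the ones assumed in this article.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_request_url(base_url, path, op, user, doas=None):
    """Build a WebHDFS REST URL, e.g. <base>/tmp?op=LISTSTATUS&user.name=hue."""
    params = {"op": op, "user.name": user}
    if doas:
        # Ask the NameNode to run the request on behalf of another user.
        params["doas"] = doas
    return base_url.rstrip("/") + quote(path) + "?" + urlencode(params)


if __name__ == "__main__":
    url = webhdfs_request_url(
        "http://Master:50070/webhdfs/v1", "/", "LISTSTATUS", "hue"
    )
    try:
        with urlopen(url, timeout=5) as response:
            listing = json.load(response)
        for entry in listing["FileStatuses"]["FileStatus"]:
            print(entry["type"], entry["pathSuffix"])
    except (OSError, ValueError, KeyError) as exc:
        print("WebHDFS check failed:", exc)
```

If the listing prints, the NameNode web port and WebHDFS are working; a failure here usually points to the host, the port, or the proxy-user settings rather than to Hue.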

Configure YARN

Edit /usr/local/hue/desktop/conf/hue.ini.

Four settings matter here: resourcemanager_host, resourcemanager_api_url, proxy_api_url, and history_server_api_url.

[[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=Master

      # The port where the ResourceManager IPC listens on
      ## resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://Master:8088

      # URL of the ProxyServer API
      proxy_api_url=http://Master:8088

      # URL of the HistoryServer API
      history_server_api_url=http://Master:19888

      # URL of the Spark History Server
      ## spark_history_server_url=http://localhost:18088

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True
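The two API URLs configured above can be probed directly. As a rough sketch, the code below appends the ResourceManager and JobHistory server REST info paths (/ws/v1/cluster/info and /ws/v1/history/info) to the configured base URLs and prints whatever JSON comes back. The helper names are hypothetical, and the hosts and ports are the ones assumed in this article.

```python
import json
from urllib.request import urlopen


def yarn_info_urls(resourcemanager_api_url, history_server_api_url):
    """Map each service to its REST 'info' endpoint."""
    return {
        "resourcemanager": resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/info",
        "historyserver": history_server_api_url.rstrip("/") + "/ws/v1/history/info",
    }


def fetch_json(url, timeout=5):
    """GET `url` and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    urls = yarn_info_urls("http://Master:8088", "http://Master:19888")
    for service, url in urls.items():
        try:
            print(service, fetch_json(url))
        except (OSError, ValueError) as exc:
            print(service, "not reachable:", exc)
```

A connection error for the history server often just means the JobHistory server has not been started yet.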

Configure Hive

1) First, edit hue.ini

Two settings matter here: hive_server_host and hive_conf_dir.

[beeswax]

  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=Master

  # Port where HiveServer2 Thrift server runs on.
  ## hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/usr/local/hive/conf

  # Timeout in seconds for thrift calls to Hive service
  ## server_conn_timeout=120

2) Start HiveServer2

root@Master:/usr/local/hive/bin# hive --service hiveserver2 &
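Because HiveServer2 is started in the background, it is not obvious when it is ready to accept connections. A simple TCP probe, sketched below, can confirm that something is listening on the Thrift port. Port 10000 is the default shown in the hue.ini comment above; port_is_open is a hypothetical helper, and a successful connect only shows that the port is open, not that Hive is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port as assumed in this article's hue.ini.
    host, port = "Master", 10000
    state = "open" if port_is_open(host, port) else "closed or unreachable"
    print("%s:%d is %s" % (host, port, state))
```

The same helper can be pointed at other ports used in this guide, such as 9090 for the HBase Thrift server or 8888 for Hue itself.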

Configure HBase

1) First, edit hue.ini

Two settings matter here: hbase_clusters and hbase_conf_dir.

[hbase]
  # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
  # Use full hostname with security.
  # If using Kerberos we assume GSSAPI SASL, not PLAIN.
  hbase_clusters=(Cluster|Master:9090)

  # HBase configuration directory, where hbase-site.xml is located.
  hbase_conf_dir=/usr/local/hbase/conf

  # Hard limit of rows or columns per row fetched before truncating.
  ## truncate_limit = 500

  # 'buffered' is the default of the HBase Thrift Server and supports security.
  # 'framed' can be used to chunk up responses,
  # which is useful when used in conjunction with the nonblocking server in Thrift.
  ## thrift_transport=buffered

2) Start the Thrift server

root@Master:/usr/local/hbase/bin# hbase-daemon.sh start thrift

Important: this must be the Thrift 1 server (thrift), not thrift2.
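The hbase_clusters value follows the '(name|host:port)' format described in the hue.ini comment, and a typo there is easy to miss. The sketch below is an illustrative parser for that format, not Hue's own parsing code; parse_hbase_clusters is a hypothetical name.

```python
import re

# One '(name|host:port)' entry; entries are separated by commas.
_CLUSTER_RE = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^:()]+?)\s*:\s*(\d+)\s*\)")


def parse_hbase_clusters(value):
    """Split an hbase_clusters string into (name, host, port) tuples."""
    return [
        (name, host, int(port))
        for name, host, port in _CLUSTER_RE.findall(value)
    ]


if __name__ == "__main__":
    for name, host, port in parse_hbase_clusters("(Cluster|Master:9090)"):
        print("cluster %s -> Thrift server at %s:%d" % (name, host, port))
```

An empty result for a non-empty setting suggests the value does not match the expected format.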

Start Hue

root@slave2:/usr/local/hue# ./build/env/bin/supervisor

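Open slave2:8888/about/ to view the Hue interface. If the page shows no warnings about HDFS, YARN, HBase, or Hive, the configuration succeeded and those features can be used from Hue.

However, warnings like the following may still appear: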

SQLITE_NOT_FOR_PRODUCTION_USE   SQLite is only recommended for small development environments with a few users.
Impala                          No available Impalad to send queries to.
Oozie Editor/Dashboard          The app won't work without a running Oozie server
Pig Editor                      The app won't work without a running Oozie server
Spark                           The app won't work without a running Livy Spark Server

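These appear because the corresponding components have not been installed and configured yet; that will be covered in follow-up articles.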