
Apache Hive Quick Start

This article is excerpted from 《Netkiller Database 手札》.

CentOS 7.3 + Server JRE 1.8 + Hadoop-2.8.0 + Hive-2.1.1

Chapter 63. Apache Hive

Table of Contents

  • 63.1. Installing Apache Hive
    • 63.1.1. MySQL
    • 63.1.2. Hadoop
    • 63.1.3. Hive
    • 63.1.4. Starting Hive
    • 63.1.5. Accessing Hive
  • 63.2. Managing Hive
    • 63.2.1. Table management
      • 63.2.1.1. Creating tables
      • 63.2.1.2. Showing tables
      • 63.2.1.3. Dropping tables
      • 63.2.1.4. Describing table structure
      • 63.2.1.5. Adding columns to a table
      • 63.2.1.6. Renaming a table
      • 63.2.1.7. Creating a new table from an existing table's structure
    • 63.2.2. Partitioned tables
      • 63.2.2.1. Creating a partitioned table
      • 63.2.2.2. Showing partitions
      • 63.2.2.3. Adding partitions
      • 63.2.2.4. Loading data into a partitioned table
    • 63.2.3. View management
      • 63.2.3.1. Creating views
      • 63.2.3.2. Dropping views
    • 63.2.4. Data management
      • 63.2.4.1. Loading data from text files
      • 63.2.4.2. Creating a new table from a query on other tables
      • 63.2.4.3. Inserting query results into an existing table
    • 63.2.5. HDFS and local filesystem management
      • 63.2.5.1. HDFS directory migration
      • 63.2.5.2. Exporting table data to local files
  • 63.3. HiveQL - the Hive query language
    • 63.3.1. JOIN queries
    • 63.3.2. Subqueries

Hive is a data warehouse and analytics system built on top of Hadoop. It provides a rich SQL-like query interface for analyzing data stored in the Hadoop Distributed File System. Within the Hadoop stack, Hive plays the role of a SQL layer: it accepts a user's statement, parses and analyzes it, compiles it into an executable plan composed of MapReduce jobs, submits those jobs to the Hadoop cluster according to the plan, and returns the final result.
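As a small illustration of this compilation process, a HiveQL aggregation like the one below (the `products` table is hypothetical) is parsed by Hive and turned into a MapReduce job: the map phase emits key/value pairs keyed by the grouping column, and the reduce phase computes the aggregates.

```sql
-- Hypothetical table used only for illustration.
-- Hive compiles this GROUP BY into a MapReduce job: map emits
-- (category, price) pairs, reduce counts rows and averages price.
SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category;
```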

63.1. Installing Apache Hive

Installing Apache Hive requires Hadoop and MySQL. This chapter assumes you already know how to install both, so the Netkiller OSCM one-click installation scripts are used below to set up Hadoop and MySQL; for detailed installation steps, see the author's related articles.

63.1.1. MySQL

By default, Hive stores its metadata in an embedded Derby database; in production environments, MySQL is typically used to hold the Hive metastore instead.
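Once the metastore has been initialized (the schematool step later in this section), the metadata lives in ordinary MySQL tables and can be inspected directly. As a hedged sketch, the queries below use the `DBS` and `TBLS` tables, which are part of the standard Hive 2.x metastore schema; treat the metastore as read-only and never modify it by hand.

```sql
-- Inspect the Hive metastore directly in MySQL (read-only!).
-- DBS and TBLS are standard tables of the Hive 2.x metastore schema.
USE hive;
SELECT NAME FROM DBS;                  -- Hive databases known to the metastore
SELECT TBL_NAME, TBL_TYPE FROM TBLS;   -- Hive tables and their types
```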

Use the script below to install MySQL 5.7 in one step. After installation it prints the initial root password; log in with that password and change it to one of your own:

			curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql.server.sh | bash

2016-02-16T08:22:58.253030Z 1 [Note] A temporary password is generated for root@localhost: sd%%my.Ak7Ma			

Install the MySQL JDBC connector library:

			curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql-connector-java.sh | bash			

Create a hive database to store the Hive metadata; both the user name and the password for accessing it will be hive.

			mysql> CREATE DATABASE hive; 
Query OK, 1 row affected (0.03 sec)			

Create the hive user and grant it access to the hive database:

			mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.04 sec)

mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye			

63.1.2. Hadoop

Install Hadoop in single-host mode:

			curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/hadoop-2.8.0.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/single.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/startup.sh | bash 

63.1.3. Hive

Download the latest stable release, apache-hive-2.1.1-bin.tar.gz, from an Apache mirror site:

			cd /usr/local/src
wget http://mirrors.hust.edu.cn/apache/hive/stable-2/apache-hive-2.1.1-bin.tar.gz

tar zxf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin /srv/apache-hive-2.1.1
ln -s /srv/apache-hive-2.1.1/ /srv/apache-hive
chown hadoop:hadoop -R /srv/apache-hive-2.1.1			
Configure hive-env.sh and the shell environment:

			cat > /srv/apache-hive/conf/hive-env.sh <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF

cat >> ~/.bash_profile <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF

source ~/.bash_profile			

Install the JDBC driver by symlinking the MySQL connector into Hive's lib directory:

			[root@localhost apache-hive]# ln -s  /usr/share/java/mysql-connector-java.jar /srv/apache-hive/lib/
[root@localhost apache-hive]# ll /srv/apache-hive/lib/mysql-connector-java.jar 
lrwxrwxrwx 1 root root 40 Jun 29 01:59 /srv/apache-hive/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar			

Edit the hive-site.xml configuration file to set Hive's working directories:

  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive/hadoop</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>


To switch the metastore from the default Derby to MySQL, the MySQL connection settings must be configured in the same file. The defaults look like this:

			<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
    <description>password to use against metastore database</description>
  </property>			

Change the value of each of these properties as follows:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>			
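The connection URL packs several parameters into one string. As a minimal sketch, it can be assembled and inspected in the shell; the host, port, database name, and parameters below are the ones used throughout this guide.

```shell
# Build the metastore JDBC URL from its parts; values match this guide's setup.
DB_HOST=localhost
DB_PORT=3306
DB_NAME=hive
# createDatabaseIfNotExist lets the driver create the schema on first connect;
# useSSL=false silences the MySQL 5.7 SSL warning for local connections.
PARAMS="createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false"
JDBC_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}?${PARAMS}"
echo "$JDBC_URL"
```

Note that inside hive-site.xml the `&` separators must be escaped as `&amp;`, as in the configuration above; the raw `&` form is what appears in logs and on the command line.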

63.1.4. Starting Hive

Before starting Hive you must do two things: create the HDFS directories, and initialize the MySQL database.

Create the HDFS working directories for Hive and grant them the appropriate permissions:

			[root@localhost ~]$ su - hadoop
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /tmp/hive
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod g+w /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod 777 /tmp/hive			

Initialize the MySQL database:

			[hadoop@localhost ~]$ /srv/apache-hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed			
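To double-check that schematool populated MySQL, the metastore can be queried directly. The `VERSION` table used below is part of the standard metastore schema; the exact table list and version string may differ on your installation.

```sql
-- Confirm the metastore schema was created in MySQL.
USE hive;
SHOW TABLES;                          -- should list DBS, TBLS, VERSION, ...
SELECT SCHEMA_VERSION FROM VERSION;   -- the initialized schema version, e.g. 2.1.0
```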

63.1.5. Accessing Hive

Start Hadoop:

			[hadoop@localhost ~]$ /srv/apache-hadoop/sbin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting yarn daemons
starting resourcemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out			

Launch Hive and run show databases; to check that the installation works:

			[hadoop@localhost conf]$ /srv/apache-hive/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/srv/apache-hive-2.1.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 0.264 seconds, Fetched: 1 row(s)
hive>			

Apache Hive is now installed and configured.
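Beyond show databases;, a quick end-to-end smoke test is to create and drop a throwaway table from the hive> prompt (the table name below is arbitrary); this exercises both the MySQL metastore (CREATE/DROP) and the HDFS warehouse directory created earlier.

```sql
-- End-to-end check: touches the metastore and the HDFS warehouse directory.
CREATE TABLE smoke_test (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
SHOW TABLES;            -- smoke_test should appear in the list
DROP TABLE smoke_test;  -- clean up
```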