Hive Series (3): Basic Usage of the Hive CLI and Beeline Command Lines

阿新 • Published: 2019-12-31

1. Hive CLI

1.1 Help

Use hive -H or hive --help to display help for all options. The output is as follows, with a short annotation at the end of each line:

usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B   -- define a custom variable
    --database <databasename>     Specify the database to use             -- database to use
 -e <quoted-query-string>         SQL from command line                   -- run the given SQL
 -f <filename>                    SQL from files                          -- run an SQL script
 -H,--help                        Print help information                  -- print help
    --hiveconf <property=value>   Use value for given property            -- set a configuration property
    --hivevar <key=value>         Variable subsitution to apply to hive   -- define a custom variable
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file                 -- run an init script before entering interactive mode
 -S,--silent                      Silent mode in interactive shell        -- silent mode
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)                                -- verbose mode

1.2 Interactive Command Line

Running the hive command without any arguments starts the interactive command line.

1.3 Running SQL Commands

To run SQL without entering the interactive command line, use hive -e:

hive -e 'select * from emp'
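If hive -e needs to be called from a program, a minimal Python sketch could look like the following. It assumes the hive launcher is on PATH and that a table named emp exists; hive_e_argv and run_hive_query are hypothetical helper names chosen for illustration. Passing the query as its own list element means the shell is bypassed, so no extra quoting is required.

```python
import shlex
import subprocess
from typing import List


def hive_e_argv(query: str, silent: bool = True) -> List[str]:
    """Build the argument list for `hive -e` (hypothetical helper)."""
    argv = ["hive"]
    if silent:
        # -S (silent mode) reduces the informational messages around the results
        argv.append("-S")
    argv += ["-e", query]
    return argv


def run_hive_query(query: str) -> List[str]:
    """Run one query non-interactively and return stdout split into lines.

    Assumes the `hive` launcher is on PATH; raises CalledProcessError on failure.
    """
    # A list argv is not interpreted by a shell, so the query needs no quoting
    completed = subprocess.run(
        hive_e_argv(query), capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    # Print the equivalent shell command instead of executing it
    print(shlex.join(hive_e_argv("select * from emp")))
```

The printed command is hive -S -e 'select * from emp', which matches the shell example above apart from the added -S flag.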

1.4 Running SQL Scripts

The SQL script to run can be on the local file system or on HDFS.

# Local file system
hive -f /usr/file/simple.sql

# HDFS
hive -f hdfs://hadoop001:8020/tmp/simple.sql

The content of simple.sql is as follows:

select * from emp;
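A script such as simple.sql can also be generated from code before it is run. The Python sketch below writes the statements to a temporary file and builds the matching hive -f command; write_script and hive_f_argv are hypothetical helpers, and actually executing the command (for example with subprocess.run) assumes Hive is installed. For a script stored on HDFS, the hdfs:// URI is simply passed to hive -f unchanged.

```python
import tempfile
from pathlib import Path
from typing import List


def write_script(statements: List[str], directory: str, name: str = "simple.sql") -> Path:
    """Write HiveQL statements to a file, one per line, each terminated by ';'."""
    path = Path(directory) / name
    lines = [stmt.strip().rstrip(";") + ";" for stmt in statements]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def hive_f_argv(script: str) -> List[str]:
    """Argument list for `hive -f`; `script` may be a local path or an hdfs:// URI."""
    return ["hive", "-f", script]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = write_script(["select * from emp"], tmp)
        print(script.read_text(encoding="utf-8"), end="")  # select * from emp;
        print(hive_f_argv(str(script)))
```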

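1.5 Setting Hive Configuration Variables

Use --hiveconf to set variables for the Hive runtime:

hive -e 'select * from emp' \
--hiveconf hive.exec.scratchdir=/tmp/hive_scratch \
--hiveconf mapred.reduce.tasks=4

hive.exec.scratchdir: a directory on HDFS that stores the execution plans of the different map/reduce stages and the intermediate output of those stages.

mapred.reduce.tasks: the number of reduce tasks (in Hadoop 2 and later, the newer name for this property is mapreduce.job.reduces).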
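When these options are assembled programmatically, each property becomes its own --hiveconf property=value pair. The Python sketch below shows that mapping; hive_argv is a hypothetical helper, and the properties are the ones from the example above.

```python
import shlex
from typing import Dict, List, Optional


def hive_argv(query: str, hiveconf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build `hive -e <query>` followed by one `--hiveconf k=v` per property."""
    argv = ["hive", "-e", query]
    for key, value in (hiveconf or {}).items():
        # Each property needs its own --hiveconf flag, in property=value form
        argv += ["--hiveconf", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    argv = hive_argv(
        "select * from emp",
        {"hive.exec.scratchdir": "/tmp/hive_scratch", "mapred.reduce.tasks": "4"},
    )
    print(shlex.join(argv))
```

The printed line reproduces the shell command above on a single line.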

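1.6 Starting with an Initialization File

Use -i to run an initialization script before entering interactive mode; this is equivalent to starting Hive with a specified configuration file.

hive -i /usr/file/hive-init.conf

The content of hive-init.conf is as follows:

set hive.exec.mode.local.auto = true;

hive.exec.mode.local.auto defaults to false. Setting it to true turns on automatic local mode: Hive decides for itself whether a query is small enough to run locally instead of on the cluster.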
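Since an init file is just a list of HiveQL statements, it can be generated from a dictionary of settings. A minimal Python sketch, with render_init_file and write_init_file as hypothetical helpers; it writes each setting as set key=value; without spaces around the equals sign.

```python
from pathlib import Path
from typing import Dict


def render_init_file(settings: Dict[str, str]) -> str:
    """Render one `set key=value;` statement per setting, in insertion order."""
    return "".join(f"set {key}={value};\n" for key, value in settings.items())


def write_init_file(path: str, settings: Dict[str, str]) -> None:
    """Write the rendered statements to `path`, e.g. /usr/file/hive-init.conf."""
    Path(path).write_text(render_init_file(settings), encoding="utf-8")


if __name__ == "__main__":
    print(render_init_file({"hive.exec.mode.local.auto": "true"}), end="")
```

The resulting file can then be passed to hive -i (Beeline accepts -i as well, as its option list shows).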

1.7 User-Defined Variables

--define <key=value> and --hivevar <key=value> are functionally equivalent; both define custom variables. Here is an example.

Define the variables:

hive --define n=ename --hivevar j=job

Reference the custom variables in a query:

# The following two statements are equivalent
hive> select ${n} from emp;
hive> select ${hivevar:n} from emp;

# The following two statements are equivalent
hive> select ${j} from emp;
hive> select ${hivevar:j} from emp;

The results are as follows:

[Figure: output of the queries above]
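To make the equivalence concrete, the Python sketch below models the substitution step with a regular expression: ${name} and ${hivevar:name} both resolve against the same variable map. It is a simplified illustration, not Hive's implementation; it ignores the other namespaces Hive supports (such as ${hiveconf:...}) and, as a design choice of the sketch, leaves unknown references untouched.

```python
import re
from typing import Dict

# ${name} or ${hivevar:name}; group 1 captures the bare variable name
_VAR_REF = re.compile(r"\$\{(?:hivevar:)?([A-Za-z_][\w.]*)\}")


def substitute(query: str, hivevars: Dict[str, str]) -> str:
    """Replace ${name} / ${hivevar:name} with values from `hivevars`."""

    def repl(match: "re.Match[str]") -> str:
        # Unknown variables are left exactly as written
        return hivevars.get(match.group(1), match.group(0))

    return _VAR_REF.sub(repl, query)


if __name__ == "__main__":
    variables = {"n": "ename", "j": "job"}  # as set by --define n=ename --hivevar j=job
    print(substitute("select ${n} from emp", variables))
    print(substitute("select ${hivevar:j} from emp", variables))
```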

2. Beeline

2.1 HiveServer2

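Hive ships with two services, HiveServer and HiveServer2. Both allow clients to connect using a variety of programming languages, but HiveServer cannot handle concurrent requests from multiple clients, which is why HiveServer2 was created.

HiveServer2 (HS2) lets remote clients submit requests to Hive and retrieve results using various programming languages, and it supports concurrent access by multiple clients as well as authentication. HS2 is a single process made up of several services, including the Thrift-based Hive service (over TCP or HTTP) and a Jetty web server for the web UI.

HiveServer2 has its own CLI, Beeline, which is a JDBC client based on SQLLine. Because HiveServer2 is the focus of Hive's development and maintenance (HiveServer was removed starting with Hive 1.0.0), the Hive CLI is no longer recommended; the project officially recommends Beeline instead.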

2.2 Beeline

Beeline offers more options, which can be viewed with beeline --help. The full list is as follows:

Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url,driver,user,password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --convertBinaryArrayToString=[true/false]    display binary column data as string or as byte array
   --help                          display this message


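2.3 Common Parameters

Beeline supports the main Hive CLI parameters, such as -e, -f, -i, --hiveconf and --hivevar; note, however, that in Beeline -d specifies the driver class rather than acting as --define. The commonly used parameters are listed below; for the rest, see the official documentation, Beeline Command Options.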

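Parameter                        Description
-u <database URL>                Database (JDBC) URL
-n <username>                    Username
-p <password>                    Password
-d <driver class>                Driver class (optional)
-e <query>                       Run an SQL command
-f <file>                        Run an SQL script
-i (or) --init <file or files>   Run an initialization script before entering interactive mode
--property-file <file>           Read connection properties from the given file
--hiveconf property=value        Set a configuration property
--hivevar name=value             Define a custom variable, valid at session level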

Example: connect to Hive with a username and password:

$ beeline -u jdbc:hive2://localhost:10000 -n username -p password
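For scripted use, the same connection options can be combined with -e and a machine-readable output format such as csv2. The Python sketch below assembles such a beeline command and parses csv2 output with the standard csv module. The default port 10000 and database name "default" in hive2_url, and the helper names hive2_url, beeline_argv and parse_csv2, are assumptions for illustration; the sketch asks for a header row explicitly with --showHeader=true.

```python
import csv
import io
from typing import List, Optional


def hive2_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Basic HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<database>."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_argv(url: str, user: str, query: str,
                 password: Optional[str] = None) -> List[str]:
    """Build a non-interactive beeline command (hypothetical helper)."""
    argv = ["beeline", "-u", url, "-n", user]
    if password is not None:
        # A password on the command line may be visible in the process list
        argv += ["-p", password]
    # csv2 output with a header row and fewer extra messages
    argv += ["--outputformat=csv2", "--showHeader=true", "--silent=true", "-e", query]
    return argv


def parse_csv2(stdout: str) -> List[List[str]]:
    """Split csv2 output into rows (skipping blank lines); the first row is the header."""
    return [row for row in csv.reader(io.StringIO(stdout)) if row]


if __name__ == "__main__":
    url = hive2_url("localhost")
    print(beeline_argv(url, "username", "select * from emp", password="password"))
```

The -w option in the help listing above reads the password from a file, which avoids putting it on the command line.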

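3. Hive Configuration

Hive properties can be configured in three ways, described below.

3.1 Configuration Files

The first way is to use configuration files. Settings in a configuration file are permanent: they apply to every session that reads the file. Hive has three optional configuration files:

hive-site.xml: the main Hive configuration file;
hivemetastore-site.xml: configuration for the metastore;
hiveserver2-site.xml: configuration for HiveServer2.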

For example, configure hive.exec.scratchdir in hive-site.xml:

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Scratch space for Hive jobs</description>
</property>
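Because these files use the Hadoop-style layout of <property> elements inside a <configuration> root, they can be inspected or edited with a standard XML library. A minimal Python sketch using xml.etree.ElementTree; it assumes the properties sit inside a <configuration> root element, and read_properties and add_property are hypothetical helper names. Note that ElementTree does not pretty-print its output.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value>, trimming whitespace."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def add_property(xml_text: str, name: str, value: str, description: str = "") -> str:
    """Append a <property> element to the root and return the new XML text."""
    root = ET.fromstring(xml_text)
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if description:
        ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml = add_property("<configuration></configuration>", "hive.exec.scratchdir",
                       "/tmp/mydir", "Scratch space for Hive jobs")
    print(read_properties(xml))  # {'hive.exec.scratchdir': '/tmp/mydir'}
```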

3.2 hiveconf

The second way is to pass --hiveconf when starting the command line (Hive CLI or Beeline). Settings specified this way apply to the whole session.

hive --hiveconf hive.exec.scratchdir=/tmp/mydir

3.3 set

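The third way is to use the set command in the interactive environment (Hive CLI or Beeline). This setting is also session-scoped and takes effect for all commands executed after it. set can both set a parameter and display its current value, as shown below: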

0: jdbc:hive2://hadoop001:10000> set hive.exec.scratchdir=/tmp/mydir;
No rows affected (0.025 seconds)
0: jdbc:hive2://hadoop001:10000> set hive.exec.scratchdir;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.exec.scratchdir=/tmp/mydir  |
+----------------------------------+--+

3.4 Configuration Precedence

From lowest to highest, the precedence of the configuration sources is:
hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> --hiveconf -> set
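One way to read this ordering is as a stack of layers in which a later, higher-priority source overrides an earlier one for the same property. The Python sketch below models only that override rule; it does not model which Hive process actually loads each file, effective_config is a hypothetical helper, and the example values are made up.

```python
from typing import Dict, Tuple

# Sources from lowest to highest priority, following the order above
PRECEDENCE = [
    "hive-site.xml",
    "hivemetastore-site.xml",
    "hiveserver2-site.xml",
    "--hiveconf",
    "set",
]


def effective_config(layers: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Return property -> (value, source); higher-priority sources win."""
    result: Dict[str, Tuple[str, str]] = {}
    for source in PRECEDENCE:  # iterate low -> high so later writes override
        for key, value in layers.get(source, {}).items():
            result[key] = (value, source)
    return result


if __name__ == "__main__":
    layers = {
        "hive-site.xml": {"hive.exec.scratchdir": "/tmp/hive"},
        "--hiveconf": {"hive.exec.scratchdir": "/tmp/hive_scratch"},
        "set": {"hive.exec.scratchdir": "/tmp/mydir"},
    }
    print(effective_config(layers)["hive.exec.scratchdir"])  # ('/tmp/mydir', 'set')
```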

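3.5 Configuration Parameters

Hive has a very large number of configuration parameters; look them up in the official documentation, AdminManual Configuration, when you need them.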

For more articles in this big data series, see the GitHub open-source project 大資料入門指南 (Big Data Getting Started Guide).
