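1. Hive Basic Concepts

2. Hive Architecture

3. Hive Installation and Deployment

4. Ways to Use Hive

5. Creating Databases and Tables in Hive, and Importing Data

6. Hive Query Syntax

7. Using Hive Functions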

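1. Hive Basic Concepts

1.1 Introduction to Hive

1.1.1 What Is Hive

Hive is a data warehouse tool built on Hadoop. It maps structured data files onto database tables and provides SQL-like query capabilities.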

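1.1.2 Why Use Hive

Problems with using Hadoop directly:

The learning cost for staff is too high

Project timelines are too short

Implementing complex query logic in MapReduce is too difficult

Why use Hive:

The interface uses SQL-like syntax, which enables rapid development.

It avoids writing MapReduce code, which lowers the learning cost for developers.

It is easy to extend with new functionality.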

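1.1.3 Features of Hive

Scalability

Hive can scale freely with the size of the cluster, and in general no services need to be restarted.

Extensibility

Hive supports user-defined functions; users can implement their own functions to meet their needs.

Fault tolerance

Hive has good fault tolerance: a SQL query can still complete when a node fails.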

2. Hive Architecture

2.1 Architecture Diagram

JobTracker is a Hadoop 1.x component; its role corresponds to ResourceManager + ApplicationMaster in YARN.

TaskTracker corresponds to NodeManager + YarnChild.

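2.2 Basic Components

User interfaces: CLI, JDBC/ODBC, and WebGUI.

Metadata storage: usually kept in a relational database such as MySQL or Derby.
Interpreter, compiler, optimizer, and executor.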

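2.3 Functions of Each Component

User interfaces: there are three main ones, CLI, JDBC/ODBC, and WebGUI. The CLI is the shell command line; JDBC/ODBC lets programs access Hive in the same way JDBC accesses a traditional database; WebGUI accesses Hive through a browser.
Interpreter, compiler, and optimizer: together they take an HQL query through lexical analysis, parsing, compilation, optimization, and query plan generation. The generated query plan is stored in HDFS and then executed by MapReduce.
Metadata storage: Hive stores its metadata in a database. Hive's metadata includes the table name, the table's columns and partitions and their properties, table properties (such as whether it is an external table), the directory where the table's data resides, and so on.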

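2.4 Relationship Between Hive and Hadoop

Hive uses HDFS to store data and MapReduce to query it.

2.5 Hive Compared with Traditional Databases

Summary: Hive looks like a SQL database, but its use cases are completely different: Hive is suited only to batch statistical analysis of data.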

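2.6 Hive Data Storage

1. All Hive data is stored in HDFS, and Hive has no storage format of its own (it supports Text, SequenceFile, ParquetFile, RCFile, and others).

2. When creating a table, you only need to tell Hive the column delimiter and row delimiter of the data, and Hive can then parse it.

3. Hive has the following data models: DB, Table, External Table, Partition, and Bucket.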

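db: in HDFS, a folder under the ${hive.metastore.warehouse.dir} directory
table: in HDFS, a folder under the directory of the db it belongs to
external table: similar to a table, but its data can be stored at any specified path
partition: in HDFS, a subdirectory under the table directory
bucket: in HDFS, multiple files in the same table directory, with rows distributed among them by hash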
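A minimal Python sketch of how these models map onto HDFS paths may make the layout easier to picture. It assumes hive.metastore.warehouse.dir has the default value /user/hive/warehouse, follows the `<db>.db` folder naming shown in section 5.1, and uses Hive's `column=value` naming for partition subdirectories; `table_dir` is a hypothetical helper, not part of Hive. Buckets are files inside these directories, so they are not modeled.

```python
from typing import Dict, Optional

# Assumed value of hive.metastore.warehouse.dir (the usual default).
WAREHOUSE = "/user/hive/warehouse"


def table_dir(db: str, table: str, partition: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: HDFS directory of a managed table or one of its partitions."""
    # Tables in the 'default' database live directly under the warehouse directory;
    # any other database gets its own '<db>.db' folder.
    base = WAREHOUSE if db == "default" else f"{WAREHOUSE}/{db}.db"
    path = f"{base}/{table}"
    # Each partition column adds one 'column=value' subdirectory, in declaration order.
    for column, value in (partition or {}).items():
        path += f"/{column}={value}"
    return path


if __name__ == "__main__":
    print(table_dir("db_order", "t_order"))
    print(table_dir("default", "t_access", {"dt": "20170804"}))
```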

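3. Hive Installation and Deployment

Note: first configure time synchronization on the machines of the Hadoop cluster

yum install ntpdate -y ## install the time synchronization client

ntpdate 0.asia.pool.ntp.org ## synchronize with an Internet time server

If the time server above is unavailable, you can also synchronize with one of the following servers: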

time.nist.gov

time.nuri.net

0.asia.pool.ntp.org

1.asia.pool.ntp.org

2.asia.pool.ntp.org

3.asia.pool.ntp.org

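3.1 Installation

Standalone version:

Version with a MySQL metastore:

3.1.1 Install MySQL first

3.1.2 Install Hive

Upload the tar package
Extract it: tar -zxvf hive-0.9.0.tar.gz -C /cloud/
Configure Hive

Configure the Hive metastore: vi hive-site.xml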

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
</configuration>

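Upload a MySQL JDBC driver jar to the lib directory of the Hive installation.

Add HADOOP_HOME and HIVE_HOME to the system environment variables in /etc/profile, then reload it:

source /etc/profile

Start Hive to test the installation.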

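4. Ways to Use Hive

4.1 The Most Basic Usage

Start an interactive Hive shell:

bin/hive

hive>

Set a few basic parameters to make Hive more convenient to use, for example:

1. Show the current database in the prompt:

hive>set hive.cli.print.current.db=true;

2. Show column names in query results:

hive>set hive.cli.print.header=true;

However, these settings apply only to the current session and are lost when the Hive session is restarted. The solution:

In the current Linux user's home directory, create a .hiverc file and write the parameters into it: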

vi .hiverc

set hive.cli.print.header=true;

set hive.cli.print.current.db=true;

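4.2 Running Hive as a Service

Start the Hive service (from the hive-1.2.1 installation directory):

bin/hiveserver2

The command above runs the service in the foreground. To run it in the background, use:

Without writing logs to the server's disk: nohup bin/hiveserver2 1>/dev/null 2>&1 &

Writing logs to the server's disk: nohup bin/hiveserver2 1>/var/log/hiveserver.log 2>/var/log/hiveserver.err &

Once the service is up, you can connect to it with beeline from another node.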

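Method (1)

From the hive-1.2.1 directory, run bin/beeline and press Enter to enter the beeline command interface.

Enter the command to connect to hiveserver2:

beeline> !connect jdbc:hive2://mini1:10000

(mini1 is the hostname of the machine where hiveserver2 was started; the default port is 10000.)

Method (2)

Connect directly at startup:

bin/beeline -u jdbc:hive2://mini1:10000 -n root

After that, you can run normal SQL queries.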

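4.3 Scripted Execution

Typing a large number of Hive queries into the interactive shell is clearly very inefficient, so in production, scripted execution is used far more often.

The key point of this mechanism is that Hive can execute given HQL statements as a one-off command:

hive -e "insert into table t_dest select * from t_src;"

Going further, you can put such commands into a shell script to run Hive jobs as scripts and to control and schedule many Hive jobs. For example: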

vi t_order_etl.sh

#!/bin/bash

hive -e "select * from db_order.t_order"

hive -e "select * from default.t_user"

hql="create table default.t_bash as select * from db_order.t_order"

hive -e "$hql"

If the HQL statements to run are particularly complex, you can write them into a file:

vi x.hql

select * from db_order.t_order;

select count(1) from db_order.t_user;

Then run it with hive -f /root/x.hql

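5. Creating Databases and Tables in Hive, and Importing Data

5.1 Creating a Database

Hive has a default database:

Database name: default; database directory: hdfs://hdp20-01:9000/user/hive/warehouse

Create a new database: create database db_order;

Once the database is created, a database directory is generated in HDFS: hdfs://hdp20-01:9000/user/hive/warehouse/db_order.db

5.2 Creating Tables

Table creation syntax: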

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

   [(col_name data_type [COMMENT col_comment], ...)]

   [COMMENT table_comment]

   [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

   [CLUSTERED BY (col_name, col_name, ...)

   [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

   [ROW FORMAT row_format]

   [STORED AS file_format]

   [LOCATION hdfs_path]

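Explanation:

1. CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses this exception.
2. The EXTERNAL keyword lets users create an external table and, at the same time, specify a path to the actual data (LOCATION). When Hive creates a managed (internal) table, it moves the data into the path the data warehouse points to; when it creates an external table, it only records the path where the data lives and does not change the data's location at all. When a table is dropped, a managed table's metadata and data are deleted together, while an external table has only its metadata deleted, not its data.
3. LIKE lets users copy the structure of an existing table, but not its data.
4. ROW FORMAT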

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]

[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]

| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

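Users can use a custom SerDe or a built-in SerDe when creating a table. If no ROW FORMAT is specified, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table, users also specify its columns and may specify a custom SerDe along with them; Hive uses the SerDe to determine the data for each of the table's columns.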

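5. STORED AS

SEQUENCEFILE|TEXTFILE|RCFILE

If the file data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

6. CLUSTERED BY

For each table or partition, Hive can further organize the data into buckets; that is, buckets divide the data at a finer granularity. Hive buckets on a particular column: it hashes the column value and takes the remainder modulo the number of buckets to decide which bucket a record is stored in.

There are two reasons to organize a table (or partition) into buckets:

(1) Higher query efficiency. Buckets add extra structure to a table, which Hive can exploit for some queries. Specifically, joining two tables that are bucketed on the same columns (including the join column) can be implemented efficiently as a map-side join. Take a JOIN where the two tables share a column: if both tables are bucketed on that column, only the buckets holding the same column values need to be joined, which greatly reduces the amount of data in the JOIN.

(2) More efficient sampling. When working with large datasets, it is very convenient during query development and revision to be able to trial-run a query on a small portion of the data.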
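The bucket rule described above (hash the column value, then take the remainder modulo the number of buckets) and why it helps joins can be sketched in Python. This is a simplified model, not Hive's code: it assumes a Java-style `String.hashCode()` for ASCII string keys and clears the sign bit before taking the remainder; the exact hash Hive uses depends on the column type and the Hive version. All helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # a row is a tuple whose first element is the bucketing/join key


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for BMP characters, returned as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_of(key: str, num_buckets: int) -> int:
    # Clear the sign bit so the remainder is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets


def bucketize(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left: List[Row], right: List[Row], num_buckets: int) -> List[Row]:
    """Join two identically bucketed tables by comparing matching bucket numbers only."""
    lb, rb = bucketize(left, num_buckets), bucketize(right, num_buckets)
    out = []
    for b in range(num_buckets):
        # Equal keys always hash to the same bucket number, so bucket b of the left
        # table only ever needs to be compared with bucket b of the right table.
        for l in lb.get(b, []):
            for r in rb.get(b, []):
                if l[0] == r[0]:
                    out.append(l + r[1:])
    return out


A = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
B = [("a", "xx"), ("b", "yy"), ("d", "zz"), ("e", "pp")]
joined = bucketed_join(A, B, 4)
```

Rows come out in bucket order rather than key order, which is fine for a join; only the set of matched pairs matters.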

5.2.1 Basic CREATE TABLE Statements

use db_order;

create table t_order(id string,create_time string,amount float,uid string);

Once the table is created, a table directory is generated in its database directory:

/user/hive/warehouse/db_order.db/t_order

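However, with a table created this way, Hive assumes the field delimiter in the table's data files is ^A (\001).

The correct statement for comma-separated data files is: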

create table t_order(id string,create_time string,amount float,uid string)

row format delimited

fields terminated by ',';

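This specifies that the field delimiter in the table's data files is ",".

Note: Hive does not check the data users load into a table! If the data's format does not match the table definition, Hive does nothing about it (whatever can be parsed is parsed; whatever cannot becomes NULL).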
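The parsing behavior in the note above (parse what fits, NULL for the rest) can be modeled in a few lines of Python. This is a simplified sketch of what the default text SerDe does, not Hive's implementation: it assumes that missing trailing fields become NULL (None here) and that extra fields are ignored, and it handles only the two column types used in t_order.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def to_float(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None  # unparseable value -> NULL


# Converters for the column types used by t_order.
CONVERTERS: Dict[str, Callable[[str], Any]] = {"string": str, "float": to_float}

T_ORDER: List[Tuple[str, str]] = [
    ("id", "string"), ("create_time", "string"), ("amount", "float"), ("uid", "string"),
]


def parse_row(line: str, schema: List[Tuple[str, str]], sep: str = "\x01") -> List[Any]:
    """Split one line on the field delimiter (default ^A) and convert each field."""
    fields = line.split(sep)
    row = []
    for i, (_name, col_type) in enumerate(schema):
        # Fields missing at the end of the line become NULL; extra fields are ignored.
        row.append(CONVERTERS[col_type](fields[i]) if i < len(fields) else None)
    return row
```

Parsing a comma-separated line with the default ^A delimiter puts the whole line into id and leaves the other columns NULL, which is why the first t_order definition misreads such files.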

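5.2.2 Dropping a Table

drop table t_order;

Dropping a table has the following effects:

Hive removes the table's information from the metastore;

Hive also deletes the table's directory from HDFS (for a managed table; external tables are covered in the next section).

5.2.3 Managed Tables and External Tables

Managed table (MANAGED_TABLE): the table directory follows Hive's conventions and is located in Hive's warehouse directory /user/hive/warehouse

External table (EXTERNAL_TABLE): the table directory is specified by the user who creates the table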

create external table t_access(ip string,url string,access_time string)

row format delimited

fields terminated by ','

location '/access/log';

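Differences between external and managed tables:

A managed table's directory is in Hive's warehouse directory, while an external table's directory is specified by the user
When a managed table is dropped, Hive removes its metadata and deletes the table's data directory
When an external table is dropped, Hive only removes its metadata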
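The drop behavior in the list above can be captured by a toy model in Python: a dict stands in for the metastore and a set for the HDFS directories. Everything here (the `ToyHive` class, its methods, the paths) is hypothetical and only illustrates the rule that dropping always removes metadata but deletes data only for managed tables.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

WAREHOUSE = "/user/hive/warehouse"  # assumed warehouse directory


@dataclass
class ToyHive:
    metastore: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # table -> (type, location)
    hdfs_dirs: Set[str] = field(default_factory=set)                      # directories that exist

    def create_table(self, name: str, external: bool = False,
                     location: Optional[str] = None) -> None:
        table_type = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
        loc = location or f"{WAREHOUSE}/{name}"
        self.metastore[name] = (table_type, loc)
        self.hdfs_dirs.add(loc)

    def drop_table(self, name: str) -> None:
        table_type, loc = self.metastore.pop(name)  # metadata is always removed
        if table_type == "MANAGED_TABLE":
            self.hdfs_dirs.discard(loc)             # data is deleted only for managed tables


hive = ToyHive()
hive.create_table("t_order")
hive.create_table("t_access", external=True, location="/access/log")
hive.drop_table("t_order")
hive.drop_table("t_access")
```

After both drops the metastore is empty, but /access/log still exists, which is why external tables are a safe way to map data owned by other systems.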

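In a Hive data warehouse, the lowest-level tables always come from external systems. To avoid interfering with how those systems work, create external tables in Hive that map the data directories those systems produce.

Then, for the various intermediate tables produced by subsequent ETL operations, managed tables are recommended.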

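5.2.4 Partitioned Tables

In essence, a partitioned table creates partition subdirectories for the data files inside the table directory, so that at query time the MR program can process only the data in the specified partition subdirectories, which narrows the range of data read and improves efficiency.

For example, a website produces browsing records every day, and these records should be stored in a table; but sometimes we only need to analyze one particular day's records.

In that case, the table can be created as a partitioned table, with each day's data loaded into its own partition.

Naturally, each daily partition directory needs a name (the partition column).

5.2.4.1 Example with One Partition Column

Example:

1. Create the partitioned table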

create table t_access(ip string,url string,access_time string)

partitioned by(dt string)

row format delimited

fields terminated by ',';

Note: the partition column must not be one of the columns already defined in the table.

2. Load data into the partitions

load data local inpath '/root/access.log.2017-08-04.log' into table t_access partition(dt='20170804');

load data local inpath '/root/access.log.2017-08-05.log' into table t_access partition(dt='20170805');

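3. Query the partitioned data

a. Total PV for August 4:

select count(*) from t_access where dt='20170804';

In essence: the partition column is used like a regular table column, so a WHERE clause can select the partition.

b. Total PV over all data in the table:

select count(*) from t_access;

In essence: simply omit the partition condition.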
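Why a partition filter narrows the data read can be sketched in Python by treating each partition as a directory of rows. The rows and the `scan` helper are hypothetical; the point is only that a `dt=...` condition opens one `dt=<value>` directory, while omitting it opens them all.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str]  # (ip, url, access_time)

# Hypothetical contents of t_access, keyed by the partition value of dt.
PARTITIONS: Dict[str, List[Row]] = {
    "20170804": [("192.168.33.1", "/a", "2017-08-04 10:00:00"),
                 ("192.168.33.2", "/b", "2017-08-04 11:00:00")],
    "20170805": [("192.168.33.3", "/a", "2017-08-05 09:00:00")],
}


def scan(partitions: Dict[str, List[Row]], dt: Optional[str] = None) -> Tuple[List[Row], List[str]]:
    """Return the rows read and the partition directories opened to read them."""
    # With a dt condition only that partition directory is opened; without one, all are.
    chosen = [dt] if dt is not None else sorted(partitions)
    rows = [row for p in chosen for row in partitions.get(p, [])]
    return rows, [f"dt={p}" for p in chosen]


pv_0804, dirs_0804 = scan(PARTITIONS, "20170804")  # like: where dt='20170804'
pv_all, dirs_all = scan(PARTITIONS)                 # no partition condition
```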

5.2.4.2 Example with Multiple Partition Columns

Create the table:

create table t_partition(id int,name string,age int)

partitioned by(department string,sex string,howold int)

row format delimited fields terminated by ',';

Load the data:

load data local inpath '/root/p1.dat' into table t_partition partition(department='xiangsheng',sex='male',howold=20);

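5.2.5 CTAS Table Creation Syntax

You can create a table from an existing table:

1. create table t_user_2 like t_user;

The new table t_user_2 has the same structure as the source table t_user, but no data.

2. Create a table and insert data at the same time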

create table t_access_user

as

select ip,url from t_access;

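t_access_user is created with the columns of the select query, and the query results are inserted into the new table.

Addendum: saving query results into a table:

Method 1: create table t_x as select .......

Method 2:

If a table t_x already exists,

you can insert the results of a select query into that existing table: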

insert into t_x select .......

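5.3 Importing and Exporting Data

5.3.1 Importing Data Files into Hive Tables

Method 1:

Manually put the file into the table directory with an HDFS command, for example: hadoop fs -put /root/order.data.2 /user/hive/warehouse/db_order.db/t_order/

Method 2: in Hive's interactive shell, use a Hive command to load a local file into the table directory

hive>load data local inpath '/root/order.data.2' into table t_order;

Method 3: use a Hive command to load a data file that is already in HDFS into the table directory

hive>load data inpath '/access.log.2017-08-06.log' into table t_access;

Note the difference between loading a local file and loading an HDFS file:

Loading a local file into a table: the file is copied

Loading an HDFS file into a table: the file is moved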
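The copy-versus-move difference can be illustrated with a local-filesystem analogy in Python, using `shutil.copy2` for LOAD DATA LOCAL and `shutil.move` for loading from HDFS. This only mimics the observable effect on the source file; Hive itself works against HDFS, and `load_into` and `demo` are hypothetical helpers.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def load_into(src: Path, table_dir: Path, local: bool) -> Path:
    """Mimic LOAD DATA [LOCAL] INPATH: copy for local files, move for HDFS files."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy2(src, dest)           # LOCAL: the source file stays where it was
    else:
        shutil.move(str(src), str(dest))  # HDFS: the source file is moved into the table dir
    return dest


def demo() -> Tuple[bool, bool, bool, bool]:
    """Load one 'local' and one 'HDFS' file; report which sources and targets exist."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        local_file = root / "order.data.2"
        local_file.write_text("1,2017-08-04 10:00:00,10.0,u1\n")
        hdfs_file = root / "access.log.2017-08-06.log"
        hdfs_file.write_text("192.168.33.1,/a,2017-08-06 10:00:00\n")
        copied = load_into(local_file, root / "warehouse" / "t_order", local=True)
        moved = load_into(hdfs_file, root / "warehouse" / "t_access", local=False)
        return local_file.exists(), copied.exists(), hdfs_file.exists(), moved.exists()
```

`demo()` should report the local source still present, and the "HDFS" source gone because it now lives only in the table directory.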

Method 4: if the target table is partitioned

hive> load data [local] inpath '......' into table t_dest partition(p='value');

5.3.2 Exporting Data from a Hive Table to Files at a Specified Path

1. Export data from a Hive table to files in HDFS

insert overwrite directory '/root/access-data'

row format delimited fields terminated by ','

select * from t_access;

2. Export data from a Hive table to files on the local disk

insert overwrite local directory '/root/access-data'

row format delimited fields terminated by ','

select * from t_access limit 100000;

5.3.3 Hive File Formats

Hive supports many file formats: SEQUENCE FILE | TEXT FILE | PARQUET FILE | RC FILE

create table t_text(movie string,rate int) stored as textfile;

create table t_seq(movie string,rate int) stored as sequencefile;

create table t_pq(movie string,rate int) stored as parquetfile;

Demo:

1. First create a table that stores text files
create table t_access_text(ip string,url string,access_time string)

row format delimited fields terminated by ','

stored as textfile;

Load the text data into the table:

load data local inpath '/root/access-data/000000_0' into table t_access_text;

2. Create a table that stores SequenceFile files:

create table t_access_seq(ip string,url string,access_time string)

stored as sequencefile;

Query the data from the text table and insert it into the SequenceFile table; the generated data files are then in SequenceFile format:

insert into t_access_seq

select * from t_access_text;

3. Create a table that stores Parquet files:

create table t_access_parq(ip string,url string,access_time string)

stored as parquetfile;

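5.4 Data Types

5.4.1 Numeric Types

TINYINT (1-byte integer)

SMALLINT (2-byte integer)

INT/INTEGER (4-byte integer)

BIGINT (8-byte integer)

FLOAT (4-byte single-precision floating point)

DOUBLE (8-byte double-precision floating point)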
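Hive's integer types are signed, so the byte sizes above determine their value ranges. A short Python sketch that computes them, assuming two's-complement representation (these types correspond to Java's byte, short, int, and long):

```python
from typing import Dict, Tuple

# Byte sizes of Hive's integer types, as listed above.
INT_TYPES: Dict[str, int] = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Smallest and largest value of a two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


if __name__ == "__main__":
    for name, size in INT_TYPES.items():
        low, high = signed_range(size)
        print(f"{name}: {low} .. {high}")
```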

Example:

create table t_test(a string ,b int,c bigint,d float,e double,f tinyint,g smallint);

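5.4.2 Date and Time Types

TIMESTAMP (timestamp: year, month, day, hour, minute, second, and fractional seconds)

DATE (date: year, month, and day only)

For example, given the following data file: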

1,zhangsan,1985-06-31

2,lisi,1986-07-10

3,wangwu,1985-08-09

A table can then be created to map this data:

create table t_customer(id int,name string,birthday date)

row format delimited fields terminated by ',';

Then load the data:

load data local inpath '/root/customer.dat' into table t_customer;

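After that, the data can be queried with birthday read as a DATE. Note, however, that 1985-06-31 in the first row is not a valid calendar date (June has only 30 days), so that value may not come back as written; how Hive handles it depends on the version.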
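A quick way to spot values like 1985-06-31 before loading is to parse them strictly. The Python sketch below uses `date.fromisoformat`, which rejects impossible dates; mapping failures to None mirrors a NULL, but it does not claim to reproduce any particular Hive version's behavior. The helper names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def parse_birthday(text: str) -> Optional[date]:
    try:
        return date.fromisoformat(text)  # strict 'YYYY-MM-DD' parsing
    except ValueError:
        return None  # impossible date such as 1985-06-31


def check_customers(lines: List[str]) -> List[Tuple[int, str, Optional[date]]]:
    rows = []
    for line in lines:
        cid, name, birthday = line.split(",")
        rows.append((int(cid), name, parse_birthday(birthday)))
    return rows


customers = check_customers(["1,zhangsan,1985-06-31", "2,lisi,1986-07-10", "3,wangwu,1985-08-09"])
```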

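5.4.3 String Types

STRING

VARCHAR(20) (string with a declared length between 1 and 65535; longer values are truncated)

CHAR (fixed-length string, maximum length 255)

5.4.4 Other Types

BOOLEAN (boolean type): true, false

~~BINARY (binary):~~

Example: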

1,zs,28,true

2,ls,30,false

3,ww,32,false

4,lulu,18,true

create table t_p(id int,name string,age int,is_married boolean)

row format delimited fields terminated by ',';

select * from t_p where is_married;

5.4.5 Complex (Collection) Types

5.4.5.1 array Type

arrays: ARRAY<data_type>

Example: using the array type

Suppose the following data needs to be mapped with a Hive table:

戰狼2,吳京:吳剛:余男,2017-08-16

三生三世十里桃花,劉亦菲:癢癢,2017-08-20

羞羞的鐵拳,沈騰:瑪麗:艾倫,2017-12-20

Idea: the cast members are most conveniently mapped as an array.

Create the table:

create table t_movie(moive_name string,actors array<string>,first_show date)

row format delimited fields terminated by ','

collection items terminated by ':';

Load the data:

load data local inpath '/root/movie.dat' into table t_movie;

Query:

select * from t_movie;

select moive_name,actors[0] from t_movie;

select moive_name,actors from t_movie where array_contains(actors,'吳剛');

select moive_name,size(actors) from t_movie;
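The array mapping and the queries above can be mirrored in Python to show what each expression returns: fields are split on ',', array items on ':', `actors[0]` is 0-based indexing, `array_contains` is a membership test, and `size` is the length. This illustrates the semantics only; the dictionary layout is an assumption.

```python
from typing import Dict, List

LINES = [
    "戰狼2,吳京:吳剛:余男,2017-08-16",
    "三生三世十里桃花,劉亦菲:癢癢,2017-08-20",
    "羞羞的鐵拳,沈騰:瑪麗:艾倫,2017-12-20",
]


def parse_movie(line: str) -> Dict[str, object]:
    name, actors, first_show = line.split(",")           # fields terminated by ','
    return {"name": name, "actors": actors.split(":"),   # collection items terminated by ':'
            "first_show": first_show}


movies: List[Dict[str, object]] = [parse_movie(line) for line in LINES]
first_actor = [(m["name"], m["actors"][0]) for m in movies]          # actors[0]
with_wu_gang = [m["name"] for m in movies if "吳剛" in m["actors"]]  # array_contains(actors,'吳剛')
cast_sizes = [(m["name"], len(m["actors"])) for m in movies]         # size(actors)
```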

5.4.5.2 map Type

maps: MAP<primitive_type, data_type>

Suppose we have the following data:

1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28

2,lisi,father:mayun#mother:huangyi#brother:guanyu,22

3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29

4,mayun,father:mayongzhen#mother:angelababy,26

A map type can be used to describe the family members in this data.

Table creation statement:

create table t_person(id int,name string,family_members map<string,string>,age int)

row format delimited fields terminated by ','

collection items terminated by '#'

map keys terminated by ':';

Query:

select * from t_person;

## Get the value for a specified key of a map column

select id,name,family_members['father'] as father from t_person;

## Get all keys of a map column

select id,name,map_keys(family_members) as relation from t_person;

## Get all values of a map column

select id,name,map_values(family_members) from t_person;

select id,name,map_values(family_members)[0] from t_person;

## Combined: query the users who have a brother

Method 1:

select id,name,brother

from

(select id,name,family_members['brother'] as brother from t_person) tmp

where brother is not null;

Method 2:

select * from t_person where array_contains(map_keys(family_members),'brother');
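The map column and the brother query can be mirrored in Python as a sketch of the semantics: entries are split on '#', keys from values on ':', a missing key yields None (NULL in Hive), and testing the key set corresponds to `array_contains(map_keys(...), 'brother')`. The parsing code is an illustration, not Hive's SerDe; also note that Hive does not guarantee the order of map_keys.

```python
from typing import Dict, List, Tuple

LINES = [
    "1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28",
    "2,lisi,father:mayun#mother:huangyi#brother:guanyu,22",
    "3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29",
    "4,mayun,father:mayongzhen#mother:angelababy,26",
]


def parse_person(line: str) -> Tuple[int, str, Dict[str, str], int]:
    pid, name, family, age = line.split(",")
    # '#' separates map entries; ':' separates each key from its value.
    members = dict(entry.split(":", 1) for entry in family.split("#"))
    return int(pid), name, members, int(age)


people: List[Tuple[int, str, Dict[str, str], int]] = [parse_person(line) for line in LINES]

# family_members['father']; .get() returns None for a missing key, like NULL.
fathers = [(pid, members.get("father")) for pid, _, members, _ in people]

# Users who have a brother (the key-set test of method 2).
with_brother = [(pid, name, members["brother"])
                for pid, name, members, _ in people if "brother" in members]
```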

5.4.5.3 struct Type

struct: STRUCT<col_name : data_type, ...>

1. Suppose we have the following data:

1,zhangsan,18:male:beijing

2,lisi,28:female:shanghai

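The user info consists of: age (integer), sex (string), and address (string).

To describe all of this user info with a single column, a struct can be used.

2. Create the table: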

create table t_person_struct(id int,name string,info struct<age:int,sex:string,addr:string>)

row format delimited fields terminated by ','

collection items terminated by ':';

3. Query

select * from t_person_struct;

select id,name,info.age from t_person_struct;
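A struct is a fixed set of named fields, so in Python it maps naturally onto a NamedTuple. The sketch below (the `Info` class and `parse_struct_row` are hypothetical names) splits the info field on ':' and reads `info.age` the same way the query does.

```python
from typing import List, NamedTuple, Tuple


class Info(NamedTuple):
    age: int
    sex: str
    addr: str


def parse_struct_row(line: str) -> Tuple[int, str, Info]:
    pid, name, info = line.split(",")
    age, sex, addr = info.split(":")  # struct fields are separated by ':'
    return int(pid), name, Info(int(age), sex, addr)


rows: List[Tuple[int, str, Info]] = [
    parse_struct_row(line) for line in ["1,zhangsan,18:male:beijing", "2,lisi,28:female:shanghai"]
]
ages = [(pid, name, info.age) for pid, name, info in rows]  # select id,name,info.age
```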

5.5 Modifying Table Definitions

Changing columns or file formats modifies only Hive's metadata and does not touch the data already in the table, so users must make sure the actual data layout matches the metadata definition. (Renaming a managed table or dropping a partition of a managed table does, however, move or delete its data directory.)

Rename a table:

ALTER TABLE table_name RENAME TO new_table_name

Example: alter table t_1 rename to t_x;

Rename a partition:

alter table t_partition partition(department='xiangsheng',sex='male',howold=20) rename to partition(department='1',sex='1',howold=20);

Add a partition:

alter table t_partition add partition (department='2',sex='0',howold=40);

Drop a partition:

alter table t_partition drop partition (department='2',sex='2',howold=24);

Change the file format definition of a table:

ALTER TABLE table_name [PARTITION partitionSpec] SET FILEFORMAT file_format

Change the file format definition of a single partition:

alter table t_partition partition(department='2',sex='0',howold=40 ) set fileformat sequencefile;

Change a column's definition:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|(AFTER column_name)]

alter table t_user change price jiage float first;

Add/replace columns:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

alter table t_user add columns (sex string,addr string);

alter table t_user replace columns (id string,age int,price float);

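6. Hive Query Syntax

SQL is a set-oriented programming language.

select 1;

Tip: when testing queries on small amounts of data, you can let Hive automatically run MR jobs in local mode instead of submitting them to the cluster by setting the following parameter in the Hive session:

hive> set hive.exec.mode.local.auto=true;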

6.1 Basic Query Examples

select * from t_access;

select count(*) from t_access;

select max(ip) from t_access;

6.2 Conditional Queries

select * from t_access where access_time<'2017-08-06 15:30:20';

select * from t_access where access_time<'2017-08-06 16:30:20' and ip>'192.168.33.3';

6.3 join Query Examples

Suppose there is a file a.txt:

a,1

b,2

c,3

d,4

Suppose there is a file b.txt:

a,xx

b,yy

d,zz

e,pp

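Assuming a.txt is loaded into table t_a (columns name, numb) and b.txt into table t_b (columns name, nick), run the different kinds of join:

1. inner join (join)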

select

a.name as aname,

a.numb as anumb,

b.name as bname,

b.nick as bnick

from t_a a

join t_b b

on a.name=b.name;

Result:

+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| d      | 4      | d      | zz     |
+--------+--------+--------+--------+--+

2. left outer join (left join)

select

a.name as aname,

a.numb as anumb,

b.name as bname,

b.nick as bnick

from t_a a

left outer join t_b b

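on a.name=b.name;

Result:

+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| c      | 3      | NULL   | NULL   |
| d      | 4      | d      | zz     |
+--------+--------+--------+--------+--+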

3. right outer join (right join)

select

a.name as aname,

a.numb as anumb,

b.name as bname,

b.nick as bnick

from t_a a

right outer join t_b b

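on a.name=b.name;

Result:

+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| d      | 4      | d      | zz     |
| NULL   | NULL   | e      | pp     |
+--------+--------+--------+--------+--+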

4. full outer join (full join)

select

a.name as aname,

a.numb as anumb,

b.name as bname,

b.nick as bnick

from t_a a

full join t_b b

on a.name=b.name;

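Result:

+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| c      | 3      | NULL   | NULL   |
| d      | 4      | d      | zz     |
| NULL   | NULL   | e      | pp     |
+--------+--------+--------+--------+--+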
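The four join types differ only in what happens to rows without a match. A nested-loop sketch in Python over the a.txt and b.txt data makes that explicit; it models the semantics (unmatched rows padded with None, i.e. NULL) and is not how Hive executes joins, which it does with MapReduce.

```python
from typing import List, Optional, Tuple

A = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]              # t_a(name, numb)
B = [("a", "xx"), ("b", "yy"), ("d", "zz"), ("e", "pp")]  # t_b(name, nick)

Out = Tuple[Optional[str], Optional[int], Optional[str], Optional[str]]


def join(left, right, keep_left: bool = False, keep_right: bool = False) -> List[Out]:
    out: List[Out] = []
    matched = set()  # indexes of right rows that found a partner
    for lname, numb in left:
        hit = False
        for i, (rname, nick) in enumerate(right):
            if lname == rname:  # on a.name=b.name
                out.append((lname, numb, rname, nick))
                matched.add(i)
                hit = True
        if keep_left and not hit:
            out.append((lname, numb, None, None))  # left/full join keep unmatched left rows
    if keep_right:
        # right/full join keep unmatched right rows, padded with NULLs on the left.
        out += [(None, None, rname, nick)
                for i, (rname, nick) in enumerate(right) if i not in matched]
    return out


inner = join(A, B)
left_outer = join(A, B, keep_left=True)
right_outer = join(A, B, keep_right=True)
full_outer = join(A, B, keep_left=True, keep_right=True)
```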

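6.4 left semi join

A left semi join returns the rows of the left table that have at least one match in the right table; it is like keeping only the left-table half of an inner join's output, except that each left row appears at most once.

Older versions of Hive (before 0.13) did not support IN/EXISTS subqueries; a left semi join achieves the same effect: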

select

a.name as aname,

a.numb as anumb

from t_a a

left semi join t_b b

on a.name=b.name;

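Result:

+--------+--------+--+
| aname  | anumb  |
+--------+--------+--+
| a      | 1      |
| b      | 2      |
| d      | 4      |
+--------+--------+--+

Note: in a left semi join, the select clause cannot contain columns from the right table.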
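What distinguishes a left semi join from an inner join is that each left row is returned at most once and only left-table columns come back. A small Python sketch of those semantics, reusing the a.txt and b.txt data (the duplicate-key case is a hypothetical extra example):

```python
from typing import List, Tuple

A = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]              # t_a(name, numb)
B = [("a", "xx"), ("b", "yy"), ("d", "zz"), ("e", "pp")]  # t_b(name, nick)


def left_semi_join(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    right_keys = {row[0] for row in right}
    # Keep left rows whose key exists on the right; right columns are never returned.
    return [row for row in left if row[0] in right_keys]


semi = left_semi_join(A, B)
# Even if the right table had two rows for 'a', the left row 'a' would appear only once.
semi_dup = left_semi_join([("a", 1)], [("a", "xx"), ("a", "yy")])
```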

6.5 group by: Grouping and Aggregation

Sample data in t_access (columns dt, ip, url):

20170804,192.168.33.66,http://www.ed.cn/job

20180804,192.168.33.40,http://www.ed.cn/study

20180805,192.168.20.18,http://www.ed2.cn/job

20180805,192.168.20.28,http://www.ed2.cn/login

20180806,192.168.20.38,http://www.ed2.cn/job

20180806,192.168.20.38,http://www.ed2.cn/study

20180807,192.168.33.40,http://www.ed2.cn/login

20180807,192.168.20.88,http://www.ed2.cn/job

select dt,count(*) as cnt,max(ip) as max_ip from t_access group by dt;

select dt,count(*) as cnt,max(ip) as max_ip from t_access group by dt having dt>'20170804';

select

dt,count(*) as cnt,max(ip) as max_ip

from t_access

where url='http://www.ed.cn/job'

group by dt having dt>'20170804';

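Note: once a query has a group by clause, its select clause may contain only the grouping columns and aggregate functions.

## Why must WHERE come before GROUP BY, and why can conditions after GROUP BY only use HAVING?

Because WHERE filters the data before the query logic actually runs,

while HAVING filters the results again after the group by aggregation.

The execution logic of the statements above:

1. WHERE filters out the rows that do not satisfy its condition
2. The aggregate functions and GROUP BY compute the aggregated results
3. HAVING filters out the aggregated results that do not satisfy its condition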
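The three-step order above can be traced in Python on the sample t_access rows. The sketch filters rows (WHERE), groups and aggregates them (GROUP BY with count(*) and max(ip)), then filters the groups (HAVING); the `query` function is hypothetical, and, like Hive on a string column, it compares ip as a string, so max(ip) is the lexicographically largest value.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (dt, ip, url): the sample rows above.
ROWS = [
    ("20170804", "192.168.33.66", "http://www.ed.cn/job"),
    ("20180804", "192.168.33.40", "http://www.ed.cn/study"),
    ("20180805", "192.168.20.18", "http://www.ed2.cn/job"),
    ("20180805", "192.168.20.28", "http://www.ed2.cn/login"),
    ("20180806", "192.168.20.38", "http://www.ed2.cn/job"),
    ("20180806", "192.168.20.38", "http://www.ed2.cn/study"),
    ("20180807", "192.168.33.40", "http://www.ed2.cn/login"),
    ("20180807", "192.168.20.88", "http://www.ed2.cn/job"),
]


def query(rows, url: Optional[str], min_dt: str) -> List[Tuple[str, int, str]]:
    # 1. WHERE: drop individual rows before any grouping happens.
    kept = [r for r in rows if url is None or r[2] == url]
    # 2. GROUP BY dt, computing count(*) and max(ip) for each group.
    groups: Dict[str, List[str]] = defaultdict(list)
    for dt, ip, _url in kept:
        groups[dt].append(ip)
    aggregated = [(dt, len(ips), max(ips)) for dt, ips in sorted(groups.items())]
    # 3. HAVING: filter the aggregated groups.
    return [g for g in aggregated if g[0] > min_dt]
```

With url='http://www.ed.cn/job' the result is empty: the only matching row has dt 20170804, which survives WHERE but is then removed by HAVING.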

6.6 Subqueries

1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28

2,lisi,father:mayun#mother:huangyi#brother:guanyu,22

3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29

4,mayun,father:mayongzhen#mother:angelababy,26

-- Query the people who have a brother

select id,name,brother

from

(select id,name,family_members['brother'] as brother from t_person) tmp

where brother is not null;

Another way to write it:

select id,name,family_members['brother']

from t_person where array_contains(map_keys(family_members),'brother');

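7. Using Hive Functions

A tip for testing functions:

Simply test functions with constants:

select substr("abcdefg",1,3);

You can also turn on Hive's automatic local execution mode:

hive>set hive.exec.mode.local.auto=true;

The complete Hive function manual: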

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inTable-GeneratingFunctions(UDTF)

7.1 Common Built-in Functions

7.1.1 Type Conversion Functions

select cast("5" as int);

select cast("2017-08-03" as date);

select cast(current_timestamp as date);

Example: given the following data file:

1,1995-05-05 13:30:59,1200.3
2,1994-04-05 13:30:59,2200
3,1996-06-01 12:20:30,80000.5

create table t_fun(id string,birthday string,salary string)

row format delimited fields terminated by ',';

select id,cast(birthday as date) as bir,cast(salary as float) from t_fun;

7.1.2 Mathematical Functions

select round(5.4); ## 5, rounds to the nearest integer

select round(5.1345,3); ## 5.135

select ceil(5.4); ## 6, rounds up (ceiling(5.4) is equivalent)

select floor(5.4); ## 5, rounds down

select abs(-5.4); ## 5.4, absolute value

select greatest(3,5,6); ## 6, the largest of the arguments (a single-row function)

select least(3,5,6); ## 3, the smallest of the arguments
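One detail worth knowing when checking these results in another language: the round(5.1345,3) = 5.135 example implies that ties are rounded up (half away from zero), whereas Python's built-in round() uses banker's rounding. A Python sketch that reproduces the half-up behavior with `decimal`, assuming that is the rule Hive applies:

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def round_half_up(x: float, digits: int = 0) -> float:
    # Convert via str() so 5.1345 is rounded as written, not as its binary approximation.
    quantum = Decimal(1).scaleb(-digits)  # e.g. Decimal('1E-3') for digits=3
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


results = {
    "round(5.4)": round_half_up(5.4),
    "round(5.1345,3)": round_half_up(5.1345, 3),
    "ceil(5.4)": math.ceil(5.4),
    "floor(5.4)": math.floor(5.4),
    "abs(-5.4)": abs(-5.4),
    "greatest(3,5,6)": max(3, 5, 6),
    "least(3,5,6)": min(3, 5, 6),
}
```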

Example:

Given a table t_fun2 with string columns s1, s2, and s3:

select greatest(cast(s1 as double),cast(s2 as double),cast(s3 as double)) from t_fun2;

Result:

+---------+--+
| _c0     |
+---------+--+
| 2000.0  |
| 9800.0  |
+---------+--+

select max(age) from t_person group by ...; ## aggregate function, used with grouping

select min(age) from t_person group by ...; ## aggregate function, used with grouping

7.1.3 String Functions

substr(string str, int start) ## extract a substring

substring(string str, int start)

Example: select substr("abcdefg",2); ## bcdefg

substr(string, int start, int len)

substring(string, int start, int len)

Example: select substr("abcdefg",2,3); ## bcd

concat(string A, string B...) ## concatenate strings

concat_ws(string SEP, string A, string B...)

Example: select concat("ab","xy"); ## abxy

select concat_ws(".","192","168","33","44"); ## 192.168.33.44

length(string A)

Example: select length("192.168.33.44"); ## 13

split(string str, string pat) ## split a string, returning an array

Example: ~~select split("192.168.33.44",".");~~ is wrong, because "." is a special character in regular expressions

select split("192.168.33.44","\\.");

upper(string str) ## convert to upper case

lower(string str) ## convert to lower case
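Two details above are easy to get wrong: Hive string positions start at 1, and split() takes a regular expression. A Python sketch of the same behavior (the helper names mirror the Hive functions but are hypothetical, and `substr` here handles only a positive start):

```python
import re
from typing import List, Optional


def substr(s: str, start: int, length: Optional[int] = None) -> str:
    begin = start - 1  # Hive positions are 1-based; Python slices are 0-based
    return s[begin:] if length is None else s[begin:begin + length]


def concat_ws(sep: str, *parts: str) -> str:
    return sep.join(parts)


def split(s: str, pattern: str) -> List[str]:
    # The pattern is a regular expression, as in Hive's split().
    return re.split(pattern, s)


good = split("192.168.33.44", r"\.")  # escaped dot: split on a literal '.'
bad = split("192.168.33.44", ".")     # unescaped '.' matches every character
```

In HQL the backslash must itself be escaped inside the string literal, which is why the query writes "\\.", while the Python raw string r"\." expresses the same regular expression.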

7.1.4 Date and Time Functions

select current_timestamp(); ## return type: timestamp; gets the current timestamp (detailed time information)

select current_date; ## return type: date; gets the current date

## Converting a Unix timestamp to a formatted string: from_unixtime

from_unixtime(bigint unixtime[, string format])

Example: select from_unixtime(unix_timestamp());

select from_unixtime(unix_timestamp(),"yyyy/MM/dd HH:mm:ss");

## Converting a formatted string to a Unix timestamp: unix_timestamp, which returns a long integer

## Without arguments, it returns the current time as a long: the number of seconds since 1970-01-01 00:00:00 UTC

select unix_timestamp();

unix_timestamp(string date, string pattern)

Example: select unix_timestamp("2017-08-10 17:50:30");

select unix_timestamp("2017-08-10 17:50:30","yyyy-MM-dd HH:mm:ss");

## Converting a string to a date

select to_date("2017-09-17 16:58:32");
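unix_timestamp and from_unixtime convert between 'yyyy-MM-dd HH:mm:ss' strings and seconds since 1970-01-01 00:00:00 UTC. Hive interprets such strings in the server's time zone; the Python sketch below makes the time zone an explicit argument instead, so the conversions can be checked. The UTC+8 zone used in the example is only an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

# Python equivalent of the pattern 'yyyy-MM-dd HH:mm:ss'.
FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp(text: str, tz: timezone, fmt: str = FMT) -> int:
    """Seconds since the epoch for a date-time string interpreted in time zone tz."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=tz).timestamp())


def from_unixtime(seconds: int, tz: timezone, fmt: str = FMT) -> str:
    return datetime.fromtimestamp(seconds, tz).strftime(fmt)


def to_date(text: str) -> str:
    return datetime.strptime(text, FMT).date().isoformat()


UTC_PLUS_8 = timezone(timedelta(hours=8))
ts = unix_timestamp("2017-08-10 17:50:30", UTC_PLUS_8)
```

The same wall-clock string yields different timestamps in different time zones, which is why results can differ between servers configured with different zones.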

7.1.5 Conditional Functions

7.1.5.1 IF

select id,if(age>25,'working','worked') from t_user;

select moive_name,if(array_contains(actors,'吳剛'),'good movie','bad movie') from t_movie;
