The IN pitfall in HIVE

阿新 • Published: 2018-11-07

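Question: why does the result set disappear when I use NOT IN in HIVE?

Note: this is original content. Please credit the source if you repost it. Thanks!
Straight to the lab >>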

hive> select * from a;
OK
1 a1
2 a2
3 a3
Time taken: 0.063 seconds, Fetched: 3 row(s)

hive> select * from b;
OK
1 b1
2 b2
NULL b3
Time taken: 0.063 seconds, Fetched: 3 row(s)

# Match the two tables on id and compute A-B with a LEFT JOIN
hive> select t1.id,t1.name,t2.name from a t1
> left join b t2 on t1.id = t2.id
> where t2.id is null;
OK
3 a3 NULL
Time taken: 34.123 seconds, Fetched: 1 row(s)

# Match the two tables on id and compute A-B with NOT IN
hive> select * from a where id not in ( select id from b );
OK
Time taken: 34.123 seconds, Fetched: 0 row(s)
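If you don't have a Hive cluster at hand, the lab can be approximated with Python's built-in sqlite3 module. This is a sketch under an assumption: SQLite stands in for Hive here, on the basis that both follow SQL three-valued logic for NOT IN. The table contents are copied from the transcripts above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2'), (NULL, 'b3');
""")

# A - B via LEFT JOIN: keep rows of a that found no matching id in b.
left_join_rows = conn.execute("""
    SELECT t1.id, t1.name, t2.name
    FROM a t1 LEFT JOIN b t2 ON t1.id = t2.id
    WHERE t2.id IS NULL
""").fetchall()

# A - B via NOT IN: the NULL id in b makes the predicate UNKNOWN for id = 3,
# and it is FALSE for ids 1 and 2, so no row survives the WHERE clause.
not_in_rows = conn.execute(
    "SELECT * FROM a WHERE id NOT IN (SELECT id FROM b)"
).fetchall()

print(left_join_rows)  # [(3, 'a3', None)]
print(not_in_rows)     # []
```

The LEFT JOIN version filters on t2.id, the join key, so a matched row in b whose name happens to be NULL could never be mistaken for a missing match.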

This is weird: why did the result set disappear? That shouldn't happen!

Cause:

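In an RDBMS, t1.id IN (select t2.id from b t2) is equivalent to: t1 join b t2 on t1.id = t2.id and t1.id is not null. A NULL id never equals anything, so for IN a NULL in the subquery is harmless: it simply never matches.
NOT IN is where the trouble starts. Even in our Hive version (already 2.0.0), NOT IN does not filter out NULLs returned by the subquery. Under SQL's three-valued logic, id NOT IN (1, 2, NULL) expands to id <> 1 AND id <> 2 AND id <> NULL, and id <> NULL evaluates to NULL (unknown) rather than TRUE. The whole condition can therefore only be FALSE or NULL, never TRUE, and WHERE keeps only rows whose condition is TRUE. So as soon as the subquery returns a single NULL, NOT IN returns nothing. Strictly speaking this is standard SQL behavior rather than a Hive-specific bug: MySQL, PostgreSQL, and Oracle return the same empty result.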
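The three-valued evaluation above can be modelled in a few lines of Python. This is an illustrative sketch, not Hive code: None plays the role of SQL NULL/UNKNOWN, and the helper names (sql_eq, sql_not_in, where_keeps) are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical helpers modelling SQL three-valued logic:
# True, False, or None (UNKNOWN).

def sql_eq(x: Optional[int], y: Optional[int]) -> Optional[bool]:
    """x = y in SQL: UNKNOWN (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y

def sql_not_in(x: Optional[int], values: Iterable[Optional[int]]) -> Optional[bool]:
    """x NOT IN (v1, v2, ...) means (x <> v1) AND (x <> v2) AND ..."""
    result: Optional[bool] = True
    for v in values:
        eq = sql_eq(x, v)
        if eq is True:
            return False      # a definite match makes the whole AND false
        if eq is None:
            result = None     # one UNKNOWN comparison turns TRUE into UNKNOWN
    return result

def where_keeps(predicate: Optional[bool]) -> bool:
    """WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN)."""
    return predicate is True

subquery = [1, 2, None]       # ids returned by: select id from b
for row_id in [1, 2, 3]:      # ids in table a
    p = sql_not_in(row_id, subquery)
    print(row_id, p, where_keeps(p))
# 1 False False
# 2 False False
# 3 None False
```

Dropping the NULL from the list, as in sql_not_in(3, [1, 2]), yields True, which is exactly what the IS NOT NULL filter in the fix achieves.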

Correct usage:

# Match the two tables on id and compute A-B with NOT IN
hive> select * from a where id not in ( select id from b where id is not null );
OK
3 a3
Time taken: 34.123 seconds, Fetched: 1 row(s)
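The fix can be checked with the same kind of stand-in. Again this assumes SQLite behaves like Hive for NOT IN; it only shows that excluding NULLs inside the subquery restores the expected A-B row.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive;
# table contents are copied from the lab above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2'), (NULL, 'b3');
""")

# Filtering NULLs out of the subquery leaves only definite comparisons,
# so id = 3 evaluates to TRUE and is kept.
rows = conn.execute("""
    SELECT * FROM a
    WHERE id NOT IN (SELECT id FROM b WHERE id IS NOT NULL)
""").fetchall()

print(rows)  # [(3, 'a3')]
```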

Try this experiment yourself:
-- no result
hive> select * from a where id not in (null);
OK
Time taken: 3.603 seconds
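The same experiment, run against SQLite as an assumed stand-in for Hive, also shows that the NOT IN predicate evaluates to NULL (Python None) rather than FALSE when the list contains a NULL and no value matches.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
""")

# id NOT IN (NULL) reduces to id <> NULL, which is UNKNOWN for every row.
not_in_null = conn.execute("SELECT * FROM a WHERE id NOT IN (NULL)").fetchall()

# Evaluate the predicate directly: 3 is not found and the list has a NULL,
# so the result is NULL, not TRUE.
predicate = conn.execute("SELECT 3 NOT IN (1, 2, NULL)").fetchone()[0]

print(not_in_null)  # []
print(predicate)    # None
```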
