Impala: impala-shell, and connecting to Impala from Java over JDBC
Tags: Big Data (other)
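Query processing interfaces
Impala provides three interfaces for processing queries:
1. impala-shell: after setting up Impala with the Cloudera VM, you can start the Impala shell by typing the impala-shell command in a terminal. The Impala shell is covered in more detail later in this article.
2. Hue: you can run Impala queries from the Hue web interface, which provides an Impala query editor where you can type and execute queries. To use the editor, you first need to log in to Hue.
3. ODBC/JDBC drivers: like other databases, Impala provides ODBC and JDBC drivers. With them, you can connect to Impala from any programming language that supports these drivers and build applications that run queries in Impala.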
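Query execution process
1. Whenever a user submits a query through any of these interfaces, one of the impalad daemons in the cluster accepts it. That impalad acts as the coordinator for that particular query.
The impalad's planner parses the query into a concrete execution plan and hands it to the coordinator on the same machine, which serves as the central coordinating node.
Following the plan, the coordinator runs its part on the local executor and forwards the remaining parts to the executors of the other impalads that hold the relevant data.
2. After receiving the query, the coordinator validates it against the table schemas in the Hive metastore. It then obtains the locations of the data needed for the query from the HDFS NameNode and sends that information to the other impalads so that they can execute the query.
3. The other Impala daemons read their assigned data blocks and process the query. Once all daemons have finished, the coordinator collects the results and returns them to the user.
When each impalad's executor finishes, it returns its result to the coordinator, which returns the aggregated result to the client.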
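The scatter-and-gather flow described above can be pictured with a toy model. The following is only a conceptual sketch in plain Java, not Impala code: the class CoordinatorSketch, its methods, and the per-node block row counts are all made up for illustration. Each "executor" counts the rows in its local blocks, and the "coordinator" merges the partial counts, much like a distributed count(*).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoordinatorSketch {
    /** Hypothetical stand-in for one impalad's executor: counts the rows in its local blocks. */
    static long executeFragment(List<Integer> localBlockRowCounts) {
        long rows = 0;
        for (int blockRows : localBlockRowCounts) {
            rows += blockRows;
        }
        return rows;
    }

    /** Hypothetical coordinator: sends one fragment per node, then merges the partial counts. */
    static long coordinateCount(List<List<Integer>> blocksByNode) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, blocksByNode.size()));
        try {
            // Scatter: one task per node that holds data for the query.
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> nodeBlocks : blocksByNode) {
                Callable<Long> fragment = () -> executeFragment(nodeBlocks);
                partials.add(pool.submit(fragment));
            }
            // Gather: wait for every node and aggregate its partial result.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up row counts per data block, grouped by the node that stores them.
        List<List<Integer>> blocksByNode = Arrays.asList(
                Arrays.asList(1500, 500),
                Arrays.asList(1000),
                Arrays.asList(1018));
        System.out.println(coordinateCount(blocksByNode)); // prints 4018
    }
}
```

The real daemons exchange plan fragments and row batches over the network; the thread pool here only imitates the fact that the nodes work in parallel.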
impala-shell usage, with an example on Hive data
# impala-shell -h
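The main command-line options of impala-shell are:
-h : (--help) show help
-v : (--version) show version information
-p : (--show_profiles) show the query profile, including the execution plan, after each query
-k : (--kerberos) connect to impalad using Kerberos authentication
-u : (--user) user name to authenticate as when LDAP is enabled
-i : hostname (--impalad=hostname) impalad to connect to, in the form hostname:port; the default port is 21000,
and by default impala-shell connects to the impalad on the local host
-q : query (--query=query) run the given SQL statement from the command line without entering the interactive shell
-d : default_db (--database=default_db) database to use
-B : (--delimited) print plain delimited output instead of the formatted table; formatting large result sets hurts performance
--output_delimiter=character : field delimiter for -B output, useful when piping into other commands; the default is \t
--print_header : print column names in -B output (not printed by default)
-f : query_file (--query_file=query_file) run the statements in a file, separated by semicolons
Writing each SQL statement on a single line is recommended, because the shell reads the file line by line
-o : filename (--output_file filename) write results to the given file
-c : (--ignore_query_failure) continue with the remaining statements when a query fails
-r : (--refresh_after_connect) reload all metadata after connecting (a table created in Hive only becomes visible in Impala after its metadata has been reloaded)
This is a full reload of every table; use it only as a last resort.
Scheduling periodic full reloads of the Hive metadata is not recommended: when there is a lot of it, a single reload may well hang or crash.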
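Several of these options are typically combined when impala-shell is called from another program. Below is a minimal sketch in Java that only assembles such a command line. The class name ImpalaShellCommand, the comma delimiter, and the output file count.csv are assumptions for illustration; the host and table come from the example later in this article.

```java
import java.util.ArrayList;
import java.util.List;

public class ImpalaShellCommand {
    /** Assembles an impala-shell invocation that writes delimited output with a header to a file. */
    static List<String> build(String hostPort, String database, String query, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("impala-shell");
        cmd.add("-i");
        cmd.add(hostPort);                 // impalad to connect to, hostname:port
        cmd.add("-d");
        cmd.add(database);                 // database to use
        cmd.add("-B");                     // plain delimited output, no table borders
        cmd.add("--output_delimiter=,");   // assumption: comma instead of the default tab
        cmd.add("--print_header");         // include column names
        cmd.add("-q");
        cmd.add(query);                    // run one statement and exit
        cmd.add("-o");
        cmd.add(outputFile);               // write the result to this file
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("quickstart.cloudera:21000", "weather",
                "select count(*) from weather_everydate_detail", "count.csv");
        System.out.println(String.join(" ", cmd));
        // To actually run it on a machine where impala-shell is installed:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list (rather than one concatenated string) avoids shell quoting problems when the query contains spaces or quotes.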
hive> select * from weather.weather_everydate_detail limit 10;
OK
WOCE_P10 1993 279.479 -16.442 172.219 24.9544 34.8887 1.0035 363.551 2
WOCE_P10 1993 279.48 -16.44 172.214 24.9554 34.8873 1.0035 363.736 2
WOCE_P10 1993 279.48 -16.439 172.213 24.9564 34.8868 1.0033 363.585 2
WOCE_P10 1993 279.481 -16.438 172.209 24.9583 34.8859 1.0035 363.459 2
WOCE_P10 1993 279.481 -16.437 172.207 24.9594 34.8859 1.0033 363.543 2
WOCE_P10 1993 279.481 -16.436 172.205 24.9604 34.8858 1.0035 363.432 2
WOCE_P10 1993 279.489 -16.417 172.164 24.9743 34.8867 1.0036 362.967 2
WOCE_P10 1993 279.49 -16.414 172.158 24.9742 34.8859 1.0035 362.96 2
WOCE_P10 1993 279.491 -16.412 172.153 24.9747 34.8864 1.0033 362.998 2
WOCE_P10 1993 279.492 -16.411 172.148 24.9734 34.8868 1.0031 363.022 2
Time taken: 0.815 seconds, Fetched: 10 row(s)
hive> select count(*) from weather.weather_everydate_detail;
Query ID = root_20171214185454_c783708d-ad4b-46cc-9341-885c16a286fe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1512525269046_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1512525269046_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1512525269046_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-12-14 18:55:27,386 Stage-1 map = 0%, reduce = 0%
2017-12-14 18:56:11,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 39.36 sec
2017-12-14 18:56:18,711 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 41.88 sec
MapReduce Total cumulative CPU time: 41 seconds 880 msec
Ended Job = job_1512525269046_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 41.88 sec HDFS Read: 288541 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 41 seconds 880 msec
OK
4018
Time taken: 101.82 seconds, Fetched: 1 row(s)
1. Start the Impala CLI
# impala-shell
Starting Impala Shell……
2. Synchronize the metadata in Impala
[quickstart.cloudera:21000] > INVALIDATE METADATA;
Query: invalidate METADATA
Query submitted at: 2017-12-14 19:01:12 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=43460ace5d3a9971:9a50f46600000000
Fetched 0 row(s) in 3.25s
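A bare INVALIDATE METADATA reloads the metadata of every table. Impala also accepts a table name after INVALIDATE METADATA, which limits the reload to that table. The following is a minimal sketch of a helper that builds either form; the class MetadataSync and its identifier check are hypothetical and not part of any Impala API.

```java
public class MetadataSync {
    /**
     * Returns the statement for one table, or the full (expensive) reload
     * when no table is given.
     */
    static String invalidateStatement(String database, String table) {
        if (database == null || table == null) {
            return "INVALIDATE METADATA";
        }
        requireIdentifier(database);
        requireIdentifier(table);
        return "INVALIDATE METADATA " + database + "." + table;
    }

    /** Rejects anything but letters, digits and underscores, since identifiers cannot be bound as parameters. */
    private static void requireIdentifier(String name) {
        if (!name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Not a plain identifier: " + name);
        }
    }

    public static void main(String[] args) {
        // Table-level reload for the table used in this article.
        System.out.println(invalidateStatement("weather", "weather_everydate_detail"));
        // Full reload, equivalent to what was typed in the shell above.
        System.out.println(invalidateStatement(null, null));
    }
}
```

The resulting string can be typed into impala-shell or sent through a JDBC Statement.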
3. View the structure of the Hive table in Impala
[quickstart.cloudera:21000] > use weather;
Query: use weather
[quickstart.cloudera:21000] > desc weather.weather_everydate_detail;
Query: describe weather.weather_everydate_detail
+---------+--------+---------+
| name | type | comment |
+---------+--------+---------+
| section | string | |
| year | bigint | |
| date | double | |
| latim | double | |
| longit | double | |
| sur_tmp | double | |
| sur_sal | double | |
| atm_per | double | |
| xco2a | double | |
| qf | bigint | |
+---------+--------+---------+
Fetched 10 row(s) in 3.70s
4. Count the records
[quickstart.cloudera:21000] > select count(*) from weather.weather_everydate_detail;
Query: select count(*) from weather.weather_everydate_detail
Query submitted at: 2017-12-14 19:03:11 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=5542894eeb80e509:1f9ce37f00000000
+----------+
| count(*) |
+----------+
| 4018 |
+----------+
Fetched 1 row(s) in 2.51s
Note: comparing the count query in Impala and in Hive, 2.51 s versus 101.82 s (roughly 40 times faster), Impala's advantage is clear.
5. Run an ordinary query
[quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_sal=34.8105;
Query: select * from weather_everydate_detail where sur_sal=34.8105
Query submitted at: 2017-12-14 19:20:27 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=c14660ed0bda471f:d92fcf0e00000000
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
| section | year | date | latim | longit | sur_tmp | sur_sal | atm_per | xco2a | qf |
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
| WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081 | 361.29 | 2 |
| WOCE_P10 | 1993 | 312.155 | 34.602 | 141.954 | 24.0638 | 34.8105 | 1.0079 | 360.386 | 2 |
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
Fetched 2 row(s) in 0.25s
[quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_tmp=24.0804;
Query: select * from weather_everydate_detail where sur_tmp=24.0804
Query submitted at: 2017-12-14 23:15:32 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=774e2b3b81f4eed7:8952b5b400000000
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
| section | year | date | latim | longit | sur_tmp | sur_sal | atm_per | xco2a | qf |
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
| WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081 | 361.29 | 2 |
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
Fetched 1 row(s) in 3.86s
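6. Conclusion
For SQL that Hive has to compile into MapReduce jobs, Impala is clearly faster. However, Hive does not compile every query into MapReduce (the select ... limit 10 above returned in 0.815 seconds without launching a job), and for that kind of query Impala has little advantage over Hive.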
Connecting to Impala from Java over JDBC
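import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class test_jdbc {
    // Cloudera Impala JDBC 4.1 driver class; the driver JAR must be on the classpath.
    private static final String JDBC_DRIVER = "com.cloudera.impala.jdbc41.Driver";
    // impalad host and JDBC port.
    private static final String CONNECTION_URL = "jdbc:impala://192.168.2.20:21050";

    public static void test() {
        try {
            // Load the driver class explicitly.
            Class.forName(JDBC_DRIVER);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        // try-with-resources closes rs, ps and con in reverse order, even when
        // the connection or the query fails (the original finally block threw a
        // NullPointerException in that case).
        // The SQL has no trailing semicolon: it is a shell terminator and is
        // not needed in a statement sent over JDBC.
        try (Connection con = DriverManager.getConnection(CONNECTION_URL);
             PreparedStatement ps = con.prepareStatement("select count(*) from billdetail");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        test();
    }
}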
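As a variation, the filter query from the impala-shell session can be run with a bound parameter instead of a literal. This is only a sketch under several assumptions: that the weather database is served by the same impalad as in the example above, that the driver accepts a URL of the form jdbc:impala://host:port/database, and that the driver JAR is on the classpath. The class name WeatherQuery and its methods are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class WeatherQuery {
    /** Builds a URL of the assumed form jdbc:impala://host:port/database. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:impala://" + host + ":" + port + "/" + database;
    }

    /** Prints the rows whose sur_sal column equals the given value. */
    static void printRowsBySalinity(String url, double surSal) throws SQLException {
        String sql = "select section, year, sur_tmp, sur_sal "
                + "from weather_everydate_detail where sur_sal = ?";
        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setDouble(1, surSal); // bind the value instead of concatenating it into the SQL
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("section") + "\t"
                            + rs.getLong("year") + "\t"
                            + rs.getDouble("sur_tmp") + "\t"
                            + rs.getDouble("sur_sal"));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: the weather database lives on the impalad used in the example above.
        String url = buildUrl("192.168.2.20", 21050, "weather");
        try {
            printRowsBySalinity(url, 34.8105);
        } catch (SQLException e) {
            // Reached when the driver is missing or the server is unreachable.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

Binding the value keeps the SQL text constant and avoids building statements by string concatenation.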