Drill: Querying Hive Using Apache Drill

阿新 • Published: 2020-08-04

https://acadgild.com/blog/querying-hive-using-apache-drill

Apache Drill is an open-source software framework inspired by Google's Dremel system, which is available as an infrastructure service called Google BigQuery.

Drill is designed to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds.

Drill is widely used as a SQL query engine for Big Data exploration.

It is designed to support high-performance analysis of the semi-structured and rapidly evolving data coming from modern Big Data applications, while at the same time providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language.

Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

In this post, we will discuss how to install Apache Drill in embedded mode and how to configure the Hive storage plug-in so that Drill can query Hive tables.

Installing Apache Drill

The first step is to download Apache Drill. You can do this using the link below:

www.apache.org/dyn/closer.lua?filename=drill/drill-1.6.0/apache-drill-1.6.0.tar.gz&action=download

After downloading, move to the folder that contains the downloaded file and type the command

tar -xvzf apache-drill-1.6.0.tar.gz

After the tar file is extracted, you will be able to see the apache-drill-1.6.0 folder.

That's it! Drill has been installed on your single-node system.

Next, let’s configure Drill to query on Hive tables.

Note: Hive should be pre-installed in your cluster

Now, start the Hive metastore service in one terminal:

hive --service metastore

Then open another terminal and start Drill. To do this, move into the Drill installation folder, go into the bin folder, and type:

./drill-embedded

Setting Hive storage plugin for Drill:

Next, open your browser and go to localhost:8047 to open Drill's Web UI.
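If the Web UI does not load, it can help to confirm that both services are actually listening. The following is a minimal sketch, not part of the original post: it assumes Drill's Web UI is on port 8047 and the Hive metastore is on its default port 9083, both on localhost. The helper name is_port_open is made up for this illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed ports: 9083 (Hive metastore default), 8047 (Drill Web UI).
    for name, port in [("Hive metastore", 9083), ("Drill Web UI", 8047)]:
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

An open port only shows that something is listening there; it does not prove the service is healthy.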

Now, click on ‘Storage’ and enable Hive.

Now, click on 'Update' and make the changes described below. By default, the configuration properties look like this:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://<metastore_host>:<port>",
    "fs.default.name": "hdfs://<host>:<port>/",
    "hive.metastore.sasl.enabled": "false",
    "hive.server2.enable.doAs": "true",
    "hive.metastore.execute.setugi": "true"
  }
}

Next, in hive.metastore.uris, enter the Thrift host name and port number. In our case, the host name is localhost and the port number is 9083, the default metastore port.

In fs.default.name, give the complete HDFS URI; in our case, it is hdfs://localhost:9000

Finally, the configuration properties will look as shown below:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://localhost:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
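The same configuration can also be submitted without the Web UI, since Drill exposes a REST endpoint for storage plugins (POST /storage/{name}.json). The following is a rough sketch under some assumptions: Drill runs at http://localhost:8047 with no authentication, and only the three properties discussed above are set. The function names build_hive_plugin and make_request are hypothetical helpers, not part of Drill.

```python
import json
import urllib.request


def build_hive_plugin(metastore_uri: str, default_fs: str) -> dict:
    """Build the JSON body for creating or updating the 'hive' storage plugin."""
    return {
        "name": "hive",
        "config": {
            "type": "hive",
            "enabled": True,
            "configProps": {
                "hive.metastore.uris": metastore_uri,
                "fs.default.name": default_fs,
                "hive.metastore.sasl.enabled": "false",
            },
        },
    }


def make_request(plugin: dict,
                 drill_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the plugin."""
    return urllib.request.Request(
        url=f"{drill_url}/storage/{plugin['name']}.json",
        data=json.dumps(plugin).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Values taken from this post; adjust them for your own cluster.
    plugin = build_hive_plugin("thrift://localhost:9083", "hdfs://localhost:9000")
    print(json.dumps(plugin, indent=2))
    # To apply it against a running Drill instance, uncomment:
    # urllib.request.urlopen(make_request(plugin))
```

Building the request separately from sending it makes it easy to inspect the payload before touching a live Drill instance.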

To perform simple operations on Hive tables from Drill, we first need a table in Hive. Let's create one now.

create table olympic(athlete STRING, age INT, country STRING, year STRING, closing STRING, sport STRING, gold INT, silver INT, bronze INT, total INT) row format delimited fields terminated by '\t' stored as textfile;

Here, we are creating a table named "olympic" with the schema specified above. The data in the input file is tab-delimited, and the file format is specified as TEXTFILE at the end of the statement. The schema of the table created above can be checked using:

describe olympic;

We can load data into the created table as:

load data local inpath 'path of your file' into table olympic;

The same is shown in the figure below:

We have successfully loaded our input file data into our table in TEXTFILE format.
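For reference, the input file that the load statement expects has one record per line, ten tab-separated fields in the order of the table schema, and no header line (Hive would otherwise read the header as a data row). The following is an illustrative sketch only: the rows are invented placeholders, not real Olympic data, and the helper name to_tsv is an assumption of this example.

```python
import csv
import io

# Column order follows the olympic table definition in this post.
COLUMNS = ["athlete", "age", "country", "year", "closing",
           "sport", "gold", "silver", "bronze", "total"]

# Placeholder rows invented for illustration; they are not real Olympic data.
SAMPLE_ROWS = [
    ["Athlete A", 23, "Country X", "2008", "08-24-08", "Swimming", 1, 0, 1, 2],
    ["Athlete B", 30, "Country Y", "2012", "08-12-12", "Rowing", 0, 1, 0, 1],
]


def to_tsv(rows) -> str:
    """Serialize rows as tab-delimited text with no header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Redirect this output to a file and use that file's path in LOAD DATA.
    print(to_tsv(SAMPLE_ROWS), end="")
```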

Now, let’s perform one basic SELECT operation on the data as shown below:

Select athlete from olympic;

The retrieved data is shown in the image below:

Now, go to the terminal where Apache Drill is running and type the following command to make Drill use the Hive schema:

use hive;

Now, you will be able to perform all the Hive queries using Drill. First, let’s check the table which we have created in Hive through Drill.

You can see the schema of the table in Hive using Drill.

Now, let us perform some queries on the data. Let's find the average age of the athletes who have participated in the Olympics.

Command: select AVG(age) from olympic;

You can see that the result has been displayed on the screen. The average age of the athletes is 26.405433646812956
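The same query can also be sent to Drill programmatically through its REST endpoint (POST /query.json). The following is a minimal sketch, assuming Drill runs at http://localhost:8047 without authentication. Because each REST call may run in its own session, the sketch uses the qualified name hive.olympic instead of relying on an earlier use hive; the helper names build_query_request and run_query are hypothetical.

```python
import json
import urllib.request


def build_query_request(sql: str,
                        drill_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Prepare (but do not send) a SQL query request for Drill's REST API."""
    body = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url=f"{drill_url}/query.json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, drill_url: str = "http://localhost:8047") -> list:
    """Send the query to a running Drill instance and return the result rows."""
    with urllib.request.urlopen(build_query_request(sql, drill_url)) as resp:
        return json.load(resp)["rows"]


if __name__ == "__main__":
    req = build_query_request("SELECT AVG(age) FROM hive.olympic")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # With Drill and the Hive metastore running, uncomment:
    # print(run_query("SELECT AVG(age) FROM hive.olympic"))
```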

We hope this post has been helpful in understanding how to query Hive tables using Apache Drill. If you have any questions, feel free to comment below and we will get back to you as soon as possible.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
