How HiveServer2 Brings Security and Concurrency to Apache Hive

阿新 • Published: 2020-10-10

A fairly old article.

Reposted from: https://blog.cloudera.com/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/

Apache Hive was one of the first projects to bring higher-level languages to Apache Hadoop. Specifically, Hive enables the legions of trained SQL users to use industry-standard SQL to process their Hadoop data.

However, as you probably have gathered from all the recent community activity in the SQL-over-Hadoop area, Hive has a few limitations for users in the enterprise space. Until recently, two in particular – concurrency and security – were largely unaddressed.

To address these gaps, for Hive release 0.11, Cloudera engineers built and contributed new infrastructure for meeting these needs. In this post, you’ll learn why it’s needed, and how it works.

Customer Requirements

As you probably know, relational databases almost universally have a server process to support clients connecting over IPC or network connections. The clients may be native command-line editors or applications/tools using a driver such as ODBC or JDBC.

In Hive, a component called HiveServer serves this purpose. But over the past few years, as adoption of Hive increased, more and more customers reported two major requirements unaddressed by HiveServer:

To run more users concurrently against Hive in traditional client/server architecture
To authenticate users to prevent untrusted user access and to enforce authorization around permissions to their data assets

Because Hive is so important for our customers, these requirements motivated us to implement a new server process for Hive 0.11. The goal was to create a framework that handles multiple concurrent clients, supports popular authentication mechanisms, and is easy to adopt for open client implementations like JDBC and ODBC.

The result of that effort, HiveServer2 (HIVE-2935), finally brings concurrency, authentication, and a foundation for authorization to Hive. Next, we’ll provide some details about these new features.

HiveServer2 Architecture

HiveServer2 is now available in Hive 0.11 and in the Hive releases shipped with CDH 4.1 and later. It implements a new Thrift-based RPC interface that can handle concurrent clients. The current release supports Kerberos, LDAP, and custom pluggable authentication. The new RPC interface also has better options for JDBC and ODBC clients, especially for metadata access.

Like the original HiveServer, HiveServer2 is a container for the Hive execution engine. For each client connection, it creates a new execution context that serves Hive SQL requests from the client. The new RPC interface enables the server to associate this Hive execution context with the thread serving the client’s request.
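
To make the idea of a per-connection execution context more concrete, here is a minimal sketch in plain Java. It is a toy model, not HiveServer2's actual code: the class names SessionRegistrySketch and ExecutionContext, the session-handle scheme, and the use of a ThreadLocal are all assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Toy model of per-connection execution contexts; not HiveServer2's real classes. */
final class SessionRegistrySketch {

    /** Hypothetical stand-in for the state kept per client connection. */
    static final class ExecutionContext {
        final String user;
        String currentDatabase = "default";

        ExecutionContext(String user) {
            this.user = user;
        }
    }

    // One context per open connection, keyed by an opaque session handle.
    private final Map<String, ExecutionContext> sessions = new ConcurrentHashMap<>();

    // The context bound to whichever thread is currently serving a request.
    private static final ThreadLocal<ExecutionContext> CURRENT = new ThreadLocal<>();

    /** Called when a client connects: creates a fresh context and returns its handle. */
    String openSession(String user) {
        String handle = UUID.randomUUID().toString();
        sessions.put(handle, new ExecutionContext(user));
        return handle;
    }

    /** Called when a client disconnects. */
    void closeSession(String handle) {
        sessions.remove(handle);
    }

    /** Binds the session's context to the calling thread for the duration of one request. */
    <T> T withSession(String handle, Function<ExecutionContext, T> request) {
        ExecutionContext ctx = sessions.get(handle);
        if (ctx == null) {
            throw new IllegalArgumentException("Unknown session handle: " + handle);
        }
        CURRENT.set(ctx);
        try {
            return request.apply(ctx);
        } finally {
            CURRENT.remove(); // do not leak one client's context into the next request
        }
    }

    /** Code deeper in the call stack can look up the context of the current request. */
    static ExecutionContext current() {
        return CURRENT.get();
    }
}
```

The point of the sketch is isolation: a setting changed in one client's session (for example, the current database) is invisible to other clients, even when the same pool of server threads handles requests from all of them.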

Clients for HiveServer2

JDBC: Hive 0.11 includes a new JDBC driver that works with HiveServer2, enabling users to write JDBC applications against Hive. The application needs to use the JDBC driver class and specify the network address and port in the connection URL in order to connect to Hive. The following code snippet shows how to connect to HiveServer2 from JDBC:

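import java.sql.Connection;
import java.sql.DriverManager;

// Load the HiveServer2 JDBC driver, then open a connection to the "default" database.
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection con = DriverManager.getConnection(
    "jdbc:hive2://localhost:10000/default", "hive", "passwd");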

You can review a detailed example on the Hive wiki.
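
As a slightly fuller sketch of the same connection, the class below wraps it in try-with-resources and runs one statement. The class name HiveJdbcSketch and its two helper methods are made up for this illustration; only the driver class name and the URL format come from the snippet above. Running listTables assumes a HiveServer2 instance is listening at the given address and that the Hive JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper; the class and method names are not part of Hive. */
final class HiveJdbcSketch {

    /** Builds a HiveServer2 JDBC URL such as jdbc:hive2://localhost:10000/default. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Connects, runs SHOW TABLES, and returns the table names. */
    static List<String> listTables(String url, String user, String password)
            throws SQLException, ClassNotFoundException {
        // Fails with ClassNotFoundException if the Hive JDBC driver is not on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        List<String> tables = new ArrayList<>();
        // try-with-resources closes the result set, statement, and connection in reverse order.
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                tables.add(rs.getString(1));
            }
        }
        return tables;
    }
}
```

A call would look like HiveJdbcSketch.listTables(HiveJdbcSketch.buildUrl("localhost", 10000, "default"), "hive", "passwd"), using the same placeholder credentials as the snippet above.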

Beeline CLI: Hive 0.11 also includes a new command-line interface (CLI) called Beeline that works with HiveServer2. Beeline is a JDBC application based on the SQLLine CLI that supports embedded and remote-client modes. In embedded mode, the Hive runtime is part of the client process itself; there’s no server involved. (The detailed documentation for SQLLine is also applicable to Beeline.) Note that HiveServer2 doesn’t support the original Hive CLI client, as the Beeline CLI is a functional replacement designed for the HiveServer2 interface.

ODBC: Although Hive 0.11 currently doesn’t include an ODBC driver that works with HiveServer2, Cloudera makes one available.

Metastore Considerations

The Hive metastore service runs in its own JVM process. Clients other than Hive, like Apache Pig, connect to this service via HCatalog for metadata access. HiveServer2 supports local as well as remote metastore modes. The remote mode is useful when you have more than one service (Pig, Cloudera Impala, and so on) that needs access to metadata, and it is the recommended deployment mode with HiveServer2.

Authentication

Authentication support is another major feature of HiveServer2. In the original HiveServer, if you can access the host/port over the network, you can access the data; the server itself offers no authentication options to restrict access.

In contrast, HiveServer2 supports Kerberos, pass-through LDAP, and pass-through pluggable custom authentication. All client types (JDBC, ODBC, and the Beeline CLI) support these authentication modes. This enables the Hive deployment to easily integrate with existing authentication services.
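
To give a feel for what “pluggable custom authentication” means, here is a small self-contained sketch. The PasswordAuthenticator interface below is a hypothetical stand-in written for this example, not Hive's own plug-in interface; the exact interface name, method signature, and configuration property that HiveServer2 expects should be taken from the HiveServer2 setup documentation. The in-memory credential map is likewise only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a pluggable username/password check; not Hive's real plug-in API. */
final class CustomAuthSketch {

    /** Hypothetical plug-in contract: return normally on success, throw on failure. */
    interface PasswordAuthenticator {
        void authenticate(String user, String password);
    }

    /** Toy provider backed by an in-memory map; a real one would call an external service. */
    static final class InMemoryAuthenticator implements PasswordAuthenticator {
        private final Map<String, String> credentials;

        InMemoryAuthenticator(Map<String, String> credentials) {
            this.credentials = new HashMap<>(credentials); // defensive copy
        }

        @Override
        public void authenticate(String user, String password) {
            String expected = credentials.get(user);
            if (expected == null || password == null || !sameBytes(expected, password)) {
                throw new SecurityException("Authentication failed for user: " + user);
            }
        }

        // Compare via MessageDigest.isEqual rather than String.equals to limit timing leaks.
        private static boolean sameBytes(String a, String b) {
            return MessageDigest.isEqual(
                    a.getBytes(StandardCharsets.UTF_8),
                    b.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The design choice worth noting is that the server only needs the narrow contract (accept or reject a user/password pair), so the backing store can be swapped for whatever authentication service a deployment already runs.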

Gateway to Secure Hadoop

Today, the Hadoop ecosystem only supports Kerberos for authentication. That means for accessing secure Hadoop, one needs to get a Kerberos ticket. However, enabling Kerberos on every client box can be a very challenging task and thus can restrict access to Hive and Hadoop.

To address that issue, HiveServer2 can authenticate clients over non-Kerberos connections (e.g., LDAP) and run queries against Kerberos-secured Hadoop data. This approach allows users to securely access Hive without complex security infrastructure or limitations.

Foundation for Fine-grained Authorization

As a stopgap until fine-grained authorization is available, HiveServer2 also supports access to Hadoop as itself or by impersonating the connected user. (This behavior is configurable.) In this so-called impersonation mode, MapReduce jobs are submitted as the user connecting to HiveServer2. If the underlying Hadoop cluster is secure, the service principal used by Hive needs Hadoop proxy privileges to impersonate the connecting users. This interim solution provides coarse-grained authorization based on ownership and permissions on files and directories in HDFS (as opposed to Hive tables and views), which unblocks some usage.
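
The effect of impersonation on authorization can be sketched with a toy model. Everything in the class below is an assumption made for illustration: the names ImpersonationSketch, effectiveUser, and canRead are invented, and the permission check is a simplified owner/other test that ignores groups and the HDFS superuser. It does not call any Hadoop API.

```java
/** Toy model of coarse-grained, file-based authorization; does not use Hadoop APIs. */
final class ImpersonationSketch {

    /** Returns the user that jobs and file accesses run as. */
    static String effectiveUser(boolean impersonationEnabled,
                                String serviceUser,
                                String connectedUser) {
        return impersonationEnabled ? connectedUser : serviceUser;
    }

    /**
     * Simplified POSIX-style read check: use the owner bits if the user owns the
     * file, otherwise the "other" bits. Group bits are deliberately left out.
     */
    static boolean canRead(String user, String fileOwner, int mode) {
        int bits = user.equals(fileOwner) ? (mode >> 6) & 7 : mode & 7;
        return (bits & 4) != 0; // 4 is the read bit
    }
}
```

For a table directory owned by alice with mode 0600, a query from bob runs as bob when impersonation is on, so canRead("bob", "alice", 0600) is false and the read is denied by file permissions. With impersonation off, every query runs as the service user, so access depends only on what that one account may read, regardless of who connected.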

HiveServer2’s strong authentication and revamped server-side architecture also provide the foundation for fine-grained authorization in Hive in the very near future. Stay tuned! (Update: read “With Sentry, Cloudera Closes Hadoop’s Enterprise Security Gap”)

Conclusion

In this post, you have received an overview of how Cloudera’s contribution of HiveServer2 brings concurrency, authentication, and a foundation for fine-grained authorization (more on this in a future post) to Hive. For further reading, you may want to explore the docs on Setting up HiveServer2 and HiveServer2 Clients.

Prasad Mujumdar is a Software Engineer on the Platform team.
