Hive user permissions and table permissions: an implementation approach

阿新 • Published: 2018-12-12

The Hive permission system

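Hive's built-in permission system is built on Linux users. The problem is that a user can impersonate another account to access data, which makes the permission system effectively useless. For that reason companies usually build their data warehouse on a Kerberos + Sentry architecture. That requires a data team with fairly strong technical skills [Kerberos is quite painful to operate], and while many companies have adopted big data, their technical reserves are often not up to it. So I have been thinking about how to get the same features without using these plugins.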

User access control

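In most cases Hive is accessed through HiveServer2, so here we try modifying the HiveServer2 request entry point to control user requests. We agree up front that every user must supply a username and password to access Hive; anyone who does not is treated as an illegitimate user and denied access. To implement this we need to enable HiveServer2 authentication by editing hive-site.xml: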

<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.CustomPasswdAuthenticator</value>
</property>

<property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
</property>

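Create a Java Maven project with the dependency below. Note that the PasswdAuthenticationProvider interface belongs to the hive-service module, so if the build cannot resolve it, add org.apache.hive:hive-service at the same version as well.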

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.0</version>
        </dependency>

The login control implementation class, org.apache.hive.service.auth.CustomPasswdAuthenticator:

package org.apache.hive.service.auth;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.security.sasl.AuthenticationException;
import java.util.HashMap;
import java.util.Map;

public class CustomPasswdAuthenticator implements org.apache.hive.service.auth.PasswdAuthenticationProvider {

    private static final Logger LOG = LoggerFactory.getLogger(CustomPasswdAuthenticator.class);

    // Demo only: hard-coded plain-text credentials
    private static final Map<String,String> users = new HashMap<String,String>();
    static{
        users.put("xb","123456");
        users.put("hadoop","123456");
        users.put("hive","123456");
    }

    // Called by HiveServer2 for every login; throwing AuthenticationException rejects it
    @Override
    public void Authenticate(String userName, String passwd)
            throws AuthenticationException {
        if(userName==null||"".equals(userName)){
            throw new AuthenticationException("user can not be null");
        }
        String expected = users.get(userName);
        if(expected==null){
            throw new AuthenticationException("user "+userName +" does not exist");
        }
        // Compare from the stored value so a null passwd cannot cause a NullPointerException
        if(!expected.equals(passwd)){
            throw new AuthenticationException("wrong password for user "+userName);
        }
        // Demo only: this logs the password in plain text; remove it outside of testing
        LOG.info("====================================user: "+userName+" try login. passwd is "+passwd);
    }
}

Every user login goes through the Authenticate method, where we can validate the user. For convenience I keep all users in a Map here; for production use this part can be moved to MySQL.
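As a rough illustration of separating the credential lookup from the authenticator, here is a minimal sketch that stores password hashes instead of plain text. The class name CredentialStore and its methods are hypothetical and not part of Hive; the in-memory map only stands in for a MySQL table, and unsalted SHA-256 is used purely to keep the example short (a production system would more likely use a salted, slow password hash).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hive: checks a login against stored password hashes. */
final class CredentialStore {

    // Assumption: stands in for a MySQL table of (username, sha256(password)).
    private final Map<String, byte[]> hashes = new HashMap<>();

    void addUser(String userName, String passwd) {
        hashes.put(userName, sha256(passwd));
    }

    /** Returns true only when the user exists and the password hash matches. */
    boolean isValid(String userName, String passwd) {
        if (userName == null || userName.isEmpty() || passwd == null) {
            return false;
        }
        byte[] expected = hashes.get(userName);
        // MessageDigest.isEqual compares the two digests byte by byte.
        return expected != null && MessageDigest.isEqual(expected, sha256(passwd));
    }

    private static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CredentialStore store = new CredentialStore();
        store.addUser("hadoop", "123456");
        System.out.println(store.isValid("hadoop", "123456"));
        System.out.println(store.isValid("hadoop", "wrong"));
    }
}
```

Inside Authenticate, a check like this could replace the Map lookup, throwing AuthenticationException whenever isValid returns false.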

Compile and package, put the jar under $HIVE_HOME/lib/, and restart HiveServer2:

mvn clean package
hiveserver2 --hiveconf hive.root.logger=INFO,console

A JDBC test with a main method:

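    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcTest {
        public static void main(String[] args) throws Exception {
            // The credentials must match a user known to CustomPasswdAuthenticator
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://master:10000/default", "hadoop", "123456");
                 Statement stmt = con.createStatement();
                 ResultSet res = stmt.executeQuery("select * from wh.test")) {
                // Print the first column of every row
                while (res.next()) {
                    System.out.println(res.getString(1));
                }
            }
        }
    }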

The HiveServer2 log output looks like this:

2018-09-22T04:03:14,815  INFO [HiveServer2-Handler-Pool: Thread-56] auth.CustomPasswdAuthenticator: ====================================user: xiaobin try login. passwd is 123456

At this point HiveServer2 has user access control; what remains is provisioning the user accounts.

Table control

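The steps above only give user-level control, which is far from enough in production: every authenticated user can access all of the data, which is clearly unreasonable. In most cases we need permission control at the table level. So I studied the Hive code and traced the SQL call chain, and this can be done as well. See org.apache.hive.service.cli.session.HiveSessionImpl.java: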

 private OperationHandle executeStatementInternal(String statement,
      Map<String, String> confOverlay, boolean runAsync, long queryTimeout) throws HiveSQLException {
    acquire(true, true);
    // Added line: log the username, the SQL statement, and the user's password
    LOG.info(username+"============================="+statement+"============================="+password);
    ExecuteStatementOperation operation = null;
    OperationHandle opHandle = null;
    try {
      operation = getOperationManager().newExecuteStatementOperation(getSession(), statement,
          confOverlay, runAsync, queryTimeout);
      opHandle = operation.getHandle();
      addOpHandle(opHandle);
      operation.run();
      return opHandle;
    } catch (HiveSQLException e) {
      // Refering to SQLOperation.java, there is no chance that a HiveSQLException throws and the
      // async background operation submits to thread pool successfully at the same time. So, Cleanup
      // opHandle directly when got HiveSQLException
      if (opHandle != null) {
        removeOpHandle(opHandle);
        getOperationManager().closeOperation(opHandle);
      }
      throw e;
    } finally {
      if (operation == null || operation.getBackgroundHandle() == null) {
        release(true, true); // Not async, or wasn't submitted for some reason (failure, etc.)
      } else {
        releaseBeforeOpLock(true); // Release, but keep the lock (if present).
      }
    }
  }

Compile and package:

mvn clean package -DskipTests -pl service

Copy hive-service-3.1.0.jar to $HIVE_HOME/lib, restart HiveServer2, and access Hive through beeline:

beeline -u 'jdbc:hive2://localhost:10000' -n xiaobin -p 123456
0: jdbc:hive2://localhost:10000> select * from wh.test;

+----------------------------------------------------+
|                     test.line                      |
+----------------------------------------------------+
| The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. |
|                                                    |
| The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. |
+----------------------------------------------------+
3 rows selected (3.952 seconds)

The HiveServer2 log output looks like this:

2018-09-22T02:28:11,681  INFO [4bdac79a-9f84-4975-bcde-f7a7d84d7586 HiveServer2-Handler-Pool: Thread-61] 
session.HiveSessionImpl: hadoop=============================select * from wh.test =============================null

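We can see that the username and the SQL have been printed. Oddly, the password is null; I traced the code and could not find the cause, but it does not stop us from checking the user's permissions. Below I describe the approach; implement it yourself if you need it.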

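Inside executeStatementInternal, first parse the SQL statement and extract every table the submitted SQL touches. There are plenty of ways to parse SQL and extract table names, and a quick search turns up many [I have used presto-parser for this before; it is a package from Presto that can also be used on its own]. Next, check those tables against a table permission scheme of your own design. Database-level permissions are possible here too, but you must agree in advance that tables in submitted SQL are written as db.tablename, like wh.test in the earlier statement. The rest is straightforward: if the user has permission, do nothing; if not, throw an exception.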
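The approach can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: TablePermissionChecker and its methods are hypothetical names, the grants map stands in for whatever permission store you design, and the regular expression is a deliberately naive substitute for a real SQL parser (it ignores comments, string literals, CTEs and quoted identifiers, and it only sees tables written as db.tablename). SecurityException is used so the sketch has no Hive dependency; inside HiveSessionImpl a HiveSQLException would be the natural choice.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Hive: naive table-level permission check. */
final class TablePermissionChecker {

    // Assumption: tables are always written as db.table after one of these keywords.
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(?:from|join|into|table|update)\\s+([a-z_][a-z0-9_]*\\.[a-z_][a-z0-9_]*)",
            Pattern.CASE_INSENSITIVE);

    // user -> allowed "db.table" names in lower case; would normally be loaded from a database.
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String qualifiedTable) {
        grants.computeIfAbsent(user, u -> new LinkedHashSet<>())
              .add(qualifiedTable.toLowerCase(Locale.ROOT));
    }

    /** Collects every db.table reference the pattern can find, in lower case. */
    static Set<String> extractTables(String sql) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = TABLE_REF.matcher(sql);
        while (m.find()) {
            tables.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return tables;
    }

    /** Does nothing when every table is granted; throws SecurityException otherwise. */
    void check(String user, String sql) {
        Set<String> allowed = grants.getOrDefault(user, Collections.emptySet());
        for (String table : extractTables(sql)) {
            if (!allowed.contains(table)) {
                throw new SecurityException("user " + user + " has no permission on " + table);
            }
        }
    }

    public static void main(String[] args) {
        TablePermissionChecker checker = new TablePermissionChecker();
        checker.grant("hadoop", "wh.test");
        checker.check("hadoop", "select * from wh.test"); // granted, returns silently
        try {
            checker.check("hadoop", "select * from wh.test t join wh.orders o on t.id = o.id");
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A call to check(username, statement) would sit near the top of executeStatementInternal, before the operation is created. Because unqualified table names are invisible to this pattern, the db.tablename convention has to be enforced separately.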
