Hive- UDF&GenericUDF

Axin • Published: 2018-11-22

Original link: https://www.jianshu.com/p/ca9dce6b5c37

Introduction to Hive UDFs

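In Hive, users can define their own functions to extend HiveQL. Such functions are called UDFs (user-defined functions). Besides ordinary UDFs, which map one row to one value, there are two other kinds: UDAFs (user-defined aggregate functions) and UDTFs (user-defined table-generating functions). Before covering UDAF and UDTF implementations, this chapter introduces the simpler case, UDF and GenericUDF. The next chapter builds on it to cover UDAF and UDTF.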

Hive offers two different interfaces for writing a UDF: the basic UDF interface and the more complex GenericUDF interface.

org.apache.hadoop.hive.ql.exec.UDF: a basic UDF reads and returns primitive types, that is, Hadoop and Hive primitive types such as Text, IntWritable, LongWritable, and DoubleWritable.

org.apache.hadoop.hive.ql.udf.generic.GenericUDF: the more complex GenericUDF can also handle complex types such as Map and List (array).

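Using the annotation:

The @Description annotation is optional and documents the function. The string _FUNC_ inside it stands for the function name and is replaced by the actual name when the DESCRIBE FUNCTION command is run. @Description has three attributes:

name: the function name in Hive.
value: a description of the function and its arguments.
extended: additional notes, such as an example. Printed by DESCRIBE FUNCTION EXTENDED name.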
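The _FUNC_ substitution can be illustrated without Hive on the classpath. The following is a minimal sketch, not Hive's implementation: FunctionDoc is a hypothetical stand-in for @Description, and HelloDoc, DescribeDemo, and describe() are names made up for this illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class DescribeDemo {
    // Builds the text a DESCRIBE FUNCTION-style command could print:
    // value (plus extended, if requested) with _FUNC_ replaced by the name.
    static String describe(Class<?> udfClass, boolean extended) {
        FunctionDoc doc = udfClass.getAnnotation(FunctionDoc.class);
        if (doc == null) {
            return "No documentation available";
        }
        String text = doc.value();
        if (extended && !doc.extended().isEmpty()) {
            text = text + "\n" + doc.extended();
        }
        return text.replace("_FUNC_", doc.name());
    }

    public static void main(String[] args) {
        System.out.println(describe(HelloDoc.class, false));
        System.out.println(describe(HelloDoc.class, true));
    }
}

// Hypothetical stand-in for Hive's @Description, declared locally
// so that the sketch compiles with the JDK alone.
@Retention(RetentionPolicy.RUNTIME)
@interface FunctionDoc {
    String name();
    String value();
    String extended() default "";
}

// Hypothetical documented class; it carries only the annotation.
@FunctionDoc(
    name = "hello",
    value = "_FUNC_(str) - returns \"Hello \" followed by str",
    extended = "Example:\n  > SELECT _FUNC_(name) FROM src;")
class HelloDoc {
}
```

The annotation needs runtime retention here because the sketch reads it by reflection; writing _FUNC_ instead of a hard-coded name keeps the help text correct whatever name the function is registered under.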

To use a UDF in Hive, compile the Java source, package it as a jar file, add the jar to the CLASSPATH, and finally define a function for the Java class with a CREATE FUNCTION statement:

hive> ADD JAR /root/experiment/hive/hive-0.0.1-SNAPSHOT.jar;
hive> CREATE TEMPORARY FUNCTION hello AS "edu.wzm.hive.HelloUDF";

hive> DROP TEMPORARY FUNCTION IF EXISTS hello;

UDF

A simple UDF is easy to implement: extend the UDF class and implement an evaluate() method. evaluate() may be overloaded.

An example:

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(
    name = "hello",
    value = "_FUNC_(str) - returns \"Hello \" followed by the input string",
    extended = "Example:\n"
        + "  > SELECT _FUNC_(str) FROM src;"
)
public class HelloUDF extends UDF {

    // Hive looks up evaluate() by reflection; further overloads are allowed.
    public String evaluate(String str) {
        // Return NULL for NULL input rather than the string "Hello null".
        if (str == null) {
            return null;
        }
        return "Hello " + str;
    }
}
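Because evaluate() is not declared in an interface, the caller has to find the right overload at run time. The following is a rough, Hive-free sketch of that idea using plain reflection. GreetingUdf, ReflectiveCallDemo, and call() are hypothetical names, and the lookup matches exact argument classes only; Hive's own method resolution is more involved.

```java
import java.lang.reflect.Method;

class ReflectiveCallDemo {
    // Picks the evaluate() overload whose parameter types equal the runtime
    // classes of the (non-null) arguments, then invokes it reflectively.
    static Object call(Object udf, Object... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        try {
            Method method = udf.getClass().getDeclaredMethod("evaluate", argTypes);
            return method.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No matching evaluate() overload", e);
        }
    }

    public static void main(String[] args) {
        GreetingUdf udf = new GreetingUdf();
        System.out.println(call(udf, "Hive"));        // prints "Hello Hive"
        System.out.println(call(udf, "Hi", "Hive"));  // prints "Hi Hive"
    }
}

// Hypothetical UDF-like class with two evaluate() overloads.
class GreetingUdf {
    public String evaluate(String str) {
        return "Hello " + str;
    }

    public String evaluate(String greeting, String str) {
        return greeting + " " + str;
    }
}
```

The reflective lookup and invocation on every call is the overhead that the GenericUDF interface avoids by fixing the method signatures up front.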

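GenericUDF

A GenericUDF is more complex to implement. The class must extend GenericUDF, work with ObjectInspectors, and check both the types and the number of the arguments it receives. A GenericUDF must implement the following three methods: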

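// Called once, before any call to evaluate(). It receives an array of
// ObjectInspectors, checks that the argument types and the argument count
// are correct, and returns the ObjectInspector for the return type.
public abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;

// Similar to evaluate() in a basic UDF. It processes the actual arguments
// and returns the final result.
public abstract Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException;

// Returns the string that represents this function call in displayed
// output, for example in EXPLAIN plans.
public abstract String getDisplayString(String[] children);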

An example: Hive's GenericUDFArrayContains, which checks whether an array contains a given value.

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.BooleanWritable;

@Description(name = "array_contains", value = "_FUNC_(array, value) - Returns TRUE if the array contains value.", extended = "Example:\n  > SELECT _FUNC_(array(1, 2, 3), 2) FROM src LIMIT 1;\n  true")
public class GenericUDFArrayContains extends GenericUDF {
    private static final int ARRAY_IDX = 0;
    private static final int VALUE_IDX = 1;
    private static final int ARG_COUNT = 2;
    private static final String FUNC_NAME = "ARRAY_CONTAINS";
    private transient ObjectInspector valueOI;
    private transient ListObjectInspector arrayOI;
    private transient ObjectInspector arrayElementOI;
    private BooleanWritable result;

    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 2) {
            throw new UDFArgumentException("The function ARRAY_CONTAINS accepts 2 arguments.");
        }

        if (!(arguments[0].getCategory().equals(ObjectInspector.Category.LIST))) {
            throw new UDFArgumentTypeException(0, "\"array\" expected at function ARRAY_CONTAINS, but \""
                    + arguments[0].getTypeName() + "\" " + "is found");
        }

        this.arrayOI = ((ListObjectInspector) arguments[0]);
        this.arrayElementOI = this.arrayOI.getListElementObjectInspector();

        this.valueOI = arguments[1];

        if (!(ObjectInspectorUtils.compareTypes(this.arrayElementOI, this.valueOI))) {
            throw new UDFArgumentTypeException(1,
                    "\"" + this.arrayElementOI.getTypeName() + "\"" + " expected at function ARRAY_CONTAINS, but "
                            + "\"" + this.valueOI.getTypeName() + "\"" + " is found");
        }

        if (!(ObjectInspectorUtils.compareSupported(this.valueOI))) {
            throw new UDFArgumentException("The function ARRAY_CONTAINS does not support comparison for \""
                    + this.valueOI.getTypeName() + "\"" + " types");
        }

        this.result = new BooleanWritable(false);

        return PrimitiveObjectInspectorFactory.writableBooleanObjectInspector;
    }

    public Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException {
        this.result.set(false);

        Object array = arguments[0].get();
        Object value = arguments[1].get();

        int arrayLength = this.arrayOI.getListLength(array);

        if ((value == null) || (arrayLength <= 0)) {
            return this.result;
        }

        for (int i = 0; i < arrayLength; ++i) {
            Object listElement = this.arrayOI.getListElement(array, i);
            if ((listElement == null)
                    || (ObjectInspectorUtils.compare(value, this.valueOI, listElement, this.arrayElementOI) != 0))
                continue;
            this.result.set(true);
            break;
        }

        return this.result;
    }

    public String getDisplayString(String[] children) {
        assert (children.length == 2);
        return "array_contains(" + children[0] + ", " + children[1] + ")";
    }
}
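The call order in the example above (initialize() once for validation, then evaluate() once per row) can be modelled without Hive. The following is a simplified sketch under loud assumptions: MiniArrayContains and LifecycleDemo are hypothetical, plain Java classes stand in for ObjectInspectors, and equals() stands in for ObjectInspectorUtils.compare.

```java
import java.util.Arrays;
import java.util.List;

class LifecycleDemo {
    // Calls initialize() once, then evaluate() once for every row.
    static boolean[] run(MiniArrayContains udf, List<Object[]> rows) {
        udf.initialize(new Class<?>[] {List.class, Integer.class});
        boolean[] results = new boolean[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            results[i] = udf.evaluate(rows.get(i));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {Arrays.asList(1, 2, 3), 2},
            new Object[] {Arrays.asList(4, 5), 2},
            new Object[] {Arrays.asList(1, 2), null});
        // prints "[true, false, false]"
        System.out.println(Arrays.toString(run(new MiniArrayContains(), rows)));
    }
}

// Hypothetical, Hive-free model of a GenericUDF-style function.
class MiniArrayContains {
    private boolean initialized;

    // Validates the argument count and the type of the first argument.
    void initialize(Class<?>[] argTypes) {
        if (argTypes.length != 2) {
            throw new IllegalArgumentException("array_contains accepts 2 arguments");
        }
        if (!List.class.isAssignableFrom(argTypes[0])) {
            throw new IllegalArgumentException("\"array\" expected as first argument");
        }
        initialized = true;
    }

    // Processes one row: args[0] is the array, args[1] the value to find.
    boolean evaluate(Object[] args) {
        if (!initialized) {
            throw new IllegalStateException("initialize() must be called first");
        }
        List<?> array = (List<?>) args[0];
        Object value = args[1];
        if (array == null || value == null) {
            return false;
        }
        for (Object element : array) {
            if (element != null && element.equals(value)) {
                return true;
            }
        }
        return false;
    }
}
```

Doing the argument checks once in initialize() means the per-row evaluate() path only has to do the comparison work.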

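Summary

When writing a Hive UDF there are two options: extend the UDF class, or extend the abstract class GenericUDF. The two differ in that a GenericUDF can handle complex-type arguments and is more efficient, because for a class that extends UDF Hive has to call evaluate() through reflection.
A UDF operates on one row at a time.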
