
Hive UDF Programming

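Definition of a UDF

  • A UDF (User-Defined Function) is a Hive function defined by the user. Hive's built-in functions cannot cover every business requirement, and in those cases we need to write our own.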

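Types of UDF

  1. UDF: one to one. One row goes in and one value comes out (row mapping). It is a row-level operation, e.g. the upper and substr functions.
  2. UDAF: many to one. Multiple rows go in and one value comes out. It is an aggregation over a group of rows, e.g. sum/min.
  3. UDTF: one to many. One row goes in and multiple rows come out, e.g. explode used together with lateral view.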
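The three shapes above can be illustrated with a plain-Java analogy. This is only a sketch of the input/output cardinality of each kind; the class `UdfKindsSketch` and its helper methods are hypothetical names made up for illustration and are not Hive APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy only: these helpers are hypothetical and are not Hive APIs.
class UdfKindsSketch {

    // UDF-like: each input row produces exactly one output value (compare upper()).
    static List<String> upperEach(List<String> rows) {
        return rows.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // UDAF-like: many input rows collapse into a single value (compare sum()).
    static long sumAll(List<Integer> rows) {
        return rows.stream().mapToLong(Integer::longValue).sum();
    }

    // UDTF-like: each input row may expand into several output rows (compare explode()).
    static List<String> explodeEach(List<String> rows) {
        return rows.stream()
                .flatMap(row -> Arrays.stream(row.split(",")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperEach(Arrays.asList("a", "b")));
        System.out.println(sumAll(Arrays.asList(1, 2, 3)));
        System.out.println(explodeEach(Arrays.asList("x,y", "z")));
    }
}
```

In Hive itself the engine drives these calls; the sketch only shows how many values go in and come out in each case.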

Writing a custom UDF

Add the Maven dependency

<dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.0</version>
</dependency>

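Implement the abstract class GenericUDF

The fully qualified name of the class is: org.apache.hadoop.hive.ql.udf.generic.GenericUDF

1) The GenericUDF abstract class explained
The GenericUDF class looks like this: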

public abstract class GenericUDF implements Closeable {
     ...
     /*
      * After instantiation, initialize is called only once.
      *   - arguments holds the ObjectInspectors for the UDF's argument list.
      *   - The returned ObjectInspector describes the UDF's return value.
      * initialize usually checks that the number and types of the arguments
      * match what your UDF expects.
      */
     public abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;
     ...
     // The actual UDF logic is implemented here.
     //   - arguments holds the UDF's input data; this array has the same
     //     length as the array passed to initialize.
     public abstract Object evaluate(DeferredObject[] arguments) throws HiveException;
}

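About ObjectInspector: when Hive passes data around, it passes the data itself together with the corresponding ObjectInspector. The ObjectInspector (OI) carries the data type information, and the actual value is obtained by parsing the data through the OI.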
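To make the calling order concrete, here is a toy model of that contract in plain Java. Everything in it is an assumption made for illustration: `ToyUdf`, `ToyLengthUdf` and `ToyUdfLifecycle` are hypothetical names, and a `Class<?>` stands in for an ObjectInspector. None of this is Hive's API; it only mimics "initialize once with type information, then evaluate once per row".

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the GenericUDF contract. NOT Hive's API.
interface ToyUdf {
    // Called once: receives the argument types and returns the result type.
    Class<?> initialize(Class<?>[] argumentTypes);

    // Called once per row with the actual values.
    Object evaluate(Object[] arguments);
}

// Hypothetical UDF that returns the length of a string argument.
class ToyLengthUdf implements ToyUdf {
    int initializeCalls = 0;

    @Override
    public Class<?> initialize(Class<?>[] argumentTypes) {
        initializeCalls++;
        // Validate argument count and type up front, before any row is processed.
        if (argumentTypes.length != 1 || argumentTypes[0] != String.class) {
            throw new IllegalArgumentException(
                "toy_length(str) takes exactly 1 string argument.");
        }
        return Integer.class;
    }

    @Override
    public Object evaluate(Object[] arguments) {
        if (arguments[0] == null) {
            return null; // null in, null out
        }
        return ((String) arguments[0]).length();
    }
}

class ToyUdfLifecycle {
    // Plays the role of the engine: one initialize call, then one evaluate call per row.
    static List<Object> run(ToyUdf udf, Class<?>[] types, List<Object[]> rows) {
        udf.initialize(types);
        List<Object> results = new ArrayList<>();
        for (Object[] row : rows) {
            results.add(udf.evaluate(row));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"hive"});
        rows.add(new Object[] {null});
        System.out.println(run(new ToyLengthUdf(), new Class<?>[] {String.class}, rows));
    }
}
```

The point of the split is that type checking happens once in initialize, so evaluate can stay cheap on the per-row path.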

2) Example

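import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Returns the list of dates between a start date and an end date, inclusive.
public class DateFeaker extends GenericUDF {
    // Date format expected for both arguments and used for the output.
    private static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

    // Converters that turn each input argument into a writable string (Text).
    private transient ObjectInspectorConverters.Converter[] converters;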

      @Override
      public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 2) {
          throw new UDFArgumentLengthException(
              "The function date_util(startdate,enddate) takes exactly 2 arguments.");
        }

        converters = new ObjectInspectorConverters.Converter[arguments.length];
        for (int i = 0; i < arguments.length; i++) {
          converters[i] = ObjectInspectorConverters.getConverter(arguments[i],
              PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        }

        return ObjectInspectorFactory
            .getStandardListObjectInspector(PrimitiveObjectInspectorFactory
                .writableStringObjectInspector);
      }
      


    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException(
                "The function date_util(startdate,enddate) takes exactly 2 arguments.");
        }

        // A null on either side yields a null result.
        if (arguments[0].get() == null || arguments[1].get() == null) {
            return null;
        }

        // Convert both inputs to Text with the converters built in initialize.
        Text startDate = (Text) converters[0].convert(arguments[0].get());
        Text endDate = (Text) converters[1].convert(arguments[1].get());

        Date start;
        try {
            start = sdf.parse(startDate.toString());
        } catch (ParseException e) {
            throw new UDFArgumentException(
                "The first argument does not match the pattern yyyy-MM-dd: " + arguments[0].get());
        }
        Date end;
        try {
            end = sdf.parse(endDate.toString());
        } catch (ParseException e) {
            throw new UDFArgumentException(
                "The second argument does not match the pattern yyyy-MM-dd: " + arguments[1].get());
        }

        // Collect every date from start to end (inclusive), one day at a time.
        ArrayList<Text> temp = new ArrayList<Text>();
        Calendar c = Calendar.getInstance();
        while (start.getTime() <= end.getTime()) {
            temp.add(new Text(sdf.format(start)));
            c.setTime(start);
            c.add(Calendar.DATE, 1);
            start = c.getTime();
        }
        return temp;
    }

    @Override
    public String getDisplayString(String[] children) {
        assert (children.length == 2);
        return getStandardDisplayString("date_util", children);
    }
}
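The core of the example, expanding a start date and an end date into every day in between, can be tried without a Hive cluster. The sketch below is an assumption-laden rewrite of just that loop: it swaps SimpleDateFormat and Calendar for java.time.LocalDate (which parses yyyy-MM-dd strictly, unlike the lenient SimpleDateFormat above), and the names `DateRangeSketch` and `dateRange` are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the date-expansion logic; not a Hive UDF by itself.
class DateRangeSketch {

    // Returns every date from start to end, inclusive, formatted as yyyy-MM-dd.
    // LocalDate.parse throws DateTimeParseException if the input is not a valid ISO date.
    static List<String> dateRange(String start, String end) {
        LocalDate from = LocalDate.parse(start);
        LocalDate to = LocalDate.parse(end);
        List<String> days = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            days.add(d.toString());
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println(dateRange("2020-02-27", "2020-03-01"));
    }
}
```

As in the UDF above, a start date later than the end date simply produces an empty list.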

3) A more complete set of recommended examples

Git repository: https://github.com/tchqiq/HiveUDF/tree/master/src/main/java/cn/com/diditaxi/hive/cf