Hive UDF Programming
阿新 • Published: 2020-09-02
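What is a UDF
- A UDF (User-Defined Function) is a Hive function written by the user. Hive's built-in functions cannot cover every business requirement, and in those cases we need to define our own.
Types of UDF
- UDF: one to one. One row in, one row out (row mapping). This is a row-level operation, e.g. upper and substr.
- UDAF: many to one. Multiple rows in, one value out. This is an aggregation over a group of rows rather than a row-level operation, e.g. sum and min.
- UDTF: one to many. One row in, multiple rows out, e.g. explode, which is typically used together with lateral view.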
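The three shapes above can be illustrated with plain Java collections. This is only an analogy and does not use any Hive API; the class name UdfShapes and its methods are hypothetical names chosen for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UdfShapes {
    // UDF shape: each input value produces exactly one output value (like upper()).
    public static List<String> upperEach(List<String> rows) {
        return rows.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // UDAF shape: many input values collapse into a single value (like sum()).
    public static int sumAll(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).sum();
    }

    // UDTF shape: one input value expands into several output values
    // (similar in spirit to explode() over a split string).
    public static List<String> explode(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upperEach(Arrays.asList("a", "b"))); // prints [A, B]
        System.out.println(sumAll(Arrays.asList(1, 2, 3)));     // prints 6
        System.out.println(explode("x,y,z"));                   // prints [x, y, z]
    }
}
```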
Writing a custom UDF
Add the Maven dependency
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.0</version>
</dependency>
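Implement the abstract class GenericUDF
The fully qualified name of the class is org.apache.hadoop.hive.ql.udf.generic.GenericUDF
1) The abstract class GenericUDF explained
The GenericUDF class looks like this: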
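public abstract class GenericUDF implements Closeable {
    ...
    /*
     * initialize is called only once after the UDF is instantiated.
     * - The arguments parameter holds the ObjectInspectors that correspond to
     *   the UDF's argument list.
     * - The returned ObjectInspector describes the UDF's return value.
     * initialize is usually where you check that the number and types of the
     * arguments match what your UDF expects.
     */
    public abstract ObjectInspector initialize(ObjectInspector[] arguments)
        throws UDFArgumentException;
    ...
    // The actual UDF logic is implemented here.
    // - The arguments parameter holds the UDF's input data. This array has the
    //   same length as the array passed to initialize.
    public abstract Object evaluate(DeferredObject[] arguments) throws HiveException;
}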
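About ObjectInspector: when Hive passes data around, it passes the data itself together with a corresponding ObjectInspector. The ObjectInspector carries the data type information, and the data is read by going through the inspector (oi).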
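To make the idea of "data plus an inspector that knows how to read it" more concrete, here is a minimal plain-Java sketch. It deliberately does not use Hive's classes: StringInspector, JAVA_STRING, UTF8_BYTES and length are hypothetical names invented for illustration. The point is only that the function logic is written once against the inspector, while the raw value may be stored in different forms.

```java
import java.nio.charset.StandardCharsets;

public class InspectorSketch {
    // Hypothetical stand-in for an inspector: it names a type and knows how
    // to read a raw value stored in one particular representation.
    public interface StringInspector {
        String typeName();
        String read(Object raw);
    }

    // Raw values are plain java.lang.String objects.
    public static final StringInspector JAVA_STRING = new StringInspector() {
        public String typeName() { return "string (java.lang.String)"; }
        public String read(Object raw) { return (String) raw; }
    };

    // Raw values are UTF-8 encoded byte arrays.
    public static final StringInspector UTF8_BYTES = new StringInspector() {
        public String typeName() { return "string (UTF-8 bytes)"; }
        public String read(Object raw) {
            return new String((byte[]) raw, StandardCharsets.UTF_8);
        }
    };

    // The function never casts the raw value itself; it asks the inspector.
    public static int length(Object raw, StringInspector oi) {
        return oi.read(raw).length();
    }

    public static void main(String[] args) {
        System.out.println(length("hive", JAVA_STRING));
        System.out.println(length("hive".getBytes(StandardCharsets.UTF_8), UTF8_BYTES));
    }
}
```

Both calls go through the same length method even though the raw values have different Java types, which is roughly the role the inspector plays for a UDF's arguments.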
2) Example
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class DateFeaker extends GenericUDF {
    // SimpleDateFormat is not thread-safe, so each UDF instance creates its
    // own copy in initialize() instead of sharing a static one.
    private transient SimpleDateFormat sdf;
    private transient ObjectInspectorConverters.Converter[] converters;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException(
                "The function date_util(startdate,enddate) takes exactly 2 arguments.");
        }
        sdf = new SimpleDateFormat("yyyy-MM-dd");
        // Convert each incoming argument to a writable string (Text).
        converters = new ObjectInspectorConverters.Converter[arguments.length];
        for (int i = 0; i < arguments.length; i++) {
            converters[i] = ObjectInspectorConverters.getConverter(arguments[i],
                PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        }
        // The UDF returns a list of strings.
        return ObjectInspectorFactory.getStandardListObjectInspector(
            PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException(
                "The function date_util(startdate,enddate) takes exactly 2 arguments.");
        }
        Object rawStart = arguments[0].get();
        Object rawEnd = arguments[1].get();
        if (rawStart == null || rawEnd == null) {
            return null;
        }
        Text startDate = (Text) converters[0].convert(rawStart);
        Text endDate = (Text) converters[1].convert(rawEnd);
        Date start;
        try {
            start = sdf.parse(startDate.toString());
        } catch (ParseException e) {
            throw new HiveException(
                "The first argument does not match the pattern yyyy-MM-dd: " + rawStart, e);
        }
        Date end;
        try {
            end = sdf.parse(endDate.toString());
        } catch (ParseException e) {
            throw new HiveException(
                "The second argument does not match the pattern yyyy-MM-dd: " + rawEnd, e);
        }
        // Emit every date from start to end, inclusive, one day at a time.
        ArrayList<Text> result = new ArrayList<Text>();
        Calendar c = Calendar.getInstance();
        while (start.getTime() <= end.getTime()) {
            result.add(new Text(sdf.format(start)));
            c.setTime(start);
            c.add(Calendar.DATE, 1);
            start = c.getTime();
        }
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        assert (children.length == 2);
        return getStandardDisplayString("date_util", children);
    }
}
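The date-expansion loop at the heart of this UDF can also be tried out without a Hive environment. The following is a minimal sketch, not the author's code: it swaps in java.time (LocalDate and DateTimeFormatter, which are immutable and thread-safe) for SimpleDateFormat and Calendar, and the class name DateRange and method name expand are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DateRange {
    // Same pattern as the UDF above; DateTimeFormatter can be shared safely.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Returns every date from startDate to endDate, inclusive, as strings.
    // Returns an empty list when startDate is after endDate.
    public static List<String> expand(String startDate, String endDate) {
        LocalDate start = LocalDate.parse(startDate, FMT);
        LocalDate end = LocalDate.parse(endDate, FMT);
        List<String> days = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            days.add(d.format(FMT));
        }
        return days;
    }

    public static void main(String[] args) {
        // prints [2020-08-30, 2020-08-31, 2020-09-01, 2020-09-02]
        System.out.println(expand("2020-08-30", "2020-09-02"));
    }
}
```

Keeping the pure date logic in a helper like this makes it easy to unit test, while the GenericUDF subclass only has to deal with argument checking and ObjectInspector conversion.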
3) A more complete set of examples (recommended)
Git repository: https://github.com/tchqiq/HiveUDF/tree/master/src/main/java/cn/com/diditaxi/hive/cf