Hive Notes: Writing a Custom UDF
Posted by 阿新 on 2019-02-20
1. Define your own UDF
```java
package com.hihi.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class HelloWord extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text("HelloWord:" + s.toString().toLowerCase());
    }
}
```
And the pom.xml (note the original had a typo, `projcet.build.sourceEncoding`, corrected below):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>study-hadoop</groupId>
    <artifactId>hive</artifactId>
    <version>1.0</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
        <hive.version>1.1.0-cdh5.7.0</hive.version>
    </properties>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
        </dependency>
    </dependencies>
</project>
```
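Since the `evaluate()` logic is plain string manipulation, it can be sanity-checked locally before packaging. The sketch below (a hypothetical helper, not part of the original post) mirrors the UDF's logic on plain `String` values so it runs without `hadoop-common`'s `Text` type on the classpath:

```java
// Local sanity check of the HelloWord.evaluate() logic — a sketch, not the
// original post's code; it mirrors the same transformation on plain Strings.
public class HelloWordCheck {

    static String evaluate(String s) {
        if (s == null) {
            return null;          // the UDF returns NULL for NULL input
        }
        return "HelloWord:" + s.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("SMITH")); // prints "HelloWord:smith"
        System.out.println(evaluate(null));    // prints "null"
    }
}
```

Note that `evaluate()` lowercases its input, so it yields `HelloWord:smith` for `SMITH`; the session output later in this post shows upper-case values like `HelloWord:SMITH`, which suggests the jar used in that session was built from a version without `toLowerCase()`.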
2. Package the code and upload the jar to the server
```
[root@hadoop001 jar]# rz
rz waiting to receive.
Starting zmodem transfer.  Press Ctrl+C to cancel.
Transferring hive-1.0.jar...
  100%       2 KB    2 KB/sec    00:00:01       0 Errors
[root@hadoop001 jar]# pwd
/home/hadoop/jar
```
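The post does not show the packaging step itself; a minimal sketch of what it would look like with Maven (run in the project root containing the pom.xml above, paths assumed from the pom's artifactId and version):

```shell
# Build the UDF jar; the artifact lands in target/ as <artifactId>-<version>.jar
mvn clean package
ls target/hive-1.0.jar
```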
3. In the hive shell, run the following commands. This creates a temporary function, which only exists for the current session:
```
hive> add jar /home/hadoop/jar/hive-1.0.jar;
Added [/home/hadoop/jar/hive-1.0.jar] to class path
Added resources: [/home/hadoop/jar/hive-1.0.jar]
hive> create temporary function my_hello as 'com.hihi.hive.HelloWord';
OK
Time taken: 0.016 seconds
hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
OK
SMITH	HelloWord:SMITH
JONES	HelloWord:JONES
SCOTT	HelloWord:SCOTT
Time taken: 0.124 seconds, Fetched: 3 row(s)
hive> list jars;
/home/hadoop/jar/hive-1.0.jar
```
4. Querying the metastore shows no record of the function:
```
mysql> select * from funcs;
Empty set (0.00 sec)
```
5. Start a new session. Since what we created was a temporary function, the call now fails:
```
hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
FAILED: SemanticException [Error 10011]: Line 1:14 Invalid function 'my_hello'
```
6. Try creating a permanent function:
```
hive> add jar /home/hadoop/jar/hive-1.0.jar;
Added [/home/hadoop/jar/hive-1.0.jar] to class path
Added resources: [/home/hadoop/jar/hive-1.0.jar]
hive> create function my_hello as 'com.hihi.hive.HelloWord';
OK
Time taken: 0.016 seconds
```
7. The metastore now holds a record of the function in the funcs table, but the func_ru table is still empty:
```
mysql> select * from funcs;
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME              | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
|       6 | com.hihi.hive.HelloWord |  1515675864 |     1 | my_hello  |         1 | NULL       | USER       |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.00 sec)

mysql> select * from func_ru;
Empty set (0.00 sec)
```
8. Start a new session again; calling the function still fails:
```
hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
FAILED: SemanticException [Error 10011]: Line 1:14 Invalid function 'my_hello'
```
9. Try registering the jar from HDFS instead:
```
CREATE FUNCTION my_hello AS 'com.hihi.hive.HelloWord' USING JAR 'hdfs://hadoop001:9000/jar/hive-1.0.jar';
```
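The `USING JAR` clause assumes the jar has already been copied to that HDFS path; the post does not show the upload, but a minimal sketch (path matching the statement above, exact directories assumed) would be:

```shell
# Upload the local jar to the HDFS location referenced by CREATE FUNCTION
hdfs dfs -mkdir -p /jar
hdfs dfs -put -f /home/hadoop/jar/hive-1.0.jar /jar/
hdfs dfs -ls /jar/hive-1.0.jar
```

Also note that if the permanent my_hello from step 6 still exists in the metastore, it would likely need to be dropped first (`DROP FUNCTION my_hello;`) before re-creating it with `USING JAR`.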
10. Check the metastore: func_ru now contains a record for my_hello. So does each new session read this metadata, reload the jar, and re-create the function when it is called?
```
mysql> select * from func_ru;
+---------+---------------+----------------------------------------+-------------+
| FUNC_ID | RESOURCE_TYPE | RESOURCE_URI                           | INTEGER_IDX |
+---------+---------------+----------------------------------------+-------------+
|      11 |             1 | hdfs://hadoop001:9000/jar/hive-1.0.jar |           0 |
+---------+---------------+----------------------------------------+-------------+
1 row in set (0.00 sec)

mysql> select * from funcs;
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME              | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
|      11 | com.hihi.hive.HelloWord |  1515676179 |     1 | my_hello  |         1 | NULL       | USER       |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.00 sec)
```
11. Log in to a new session, first confirm that no jar is loaded, then call the function. The jar is reloaded automatically when the function is called; the resource to load is recorded in the metastore's func_ru table:
```
hive> list jar;
hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
converting to local hdfs://hadoop001:9000/jar/hive-1.0.jar
Added [/tmp/9da42cea-1284-46f1-9969-74dc80ed05fe_resources/hive-1.0.jar] to class path
Added resources: [hdfs://hadoop001:9000/jar/hive-1.0.jar]
OK
SMITH	HelloWord:SMITH
JONES	HelloWord:JONES
SCOTT	HelloWord:SCOTT
Time taken: 1.252 seconds, Fetched: 3 row(s)
hive> list jar;
/tmp/9da42cea-1284-46f1-9969-74dc80ed05fe_resources/hive-1.0.jar
```
Remaining drawbacks: show functions does not list this function, and having the jar reloaded for every session is still inconvenient. I will keep looking for a better solution.
[From @若澤大資料]