Hive UDF Environment Setup (Eclipse + Maven)

  1. Install Maven (https://blog.csdn.net/rav009/article/details/79469303)
  2. Install Eclipse
  3. Install m2e, the Maven plugin for Eclipse

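Create a Maven project in Eclipse

The Group ID usually has the form org.yourname.projectname. It becomes the package prefix of the classes in your code.

The Artifact ID is the project name.

After creating the project, open pom.xml and add the following inside the dependencies node: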
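To make the link between the Group ID and the code layout concrete, here is a minimal sketch of how a package name maps to the location Maven expects for a source file under src/main/java. SourcePathDemo and sourcePath are hypothetical names used only for illustration.

```java
public class SourcePathDemo {

    /**
     * Maps a package name and a class name to the conventional Maven source path.
     * This is an illustrative helper, not part of Maven or Eclipse.
     */
    public static String sourcePath(String packageName, String className) {
        // Each dot-separated package segment becomes one directory level.
        return "src/main/java/" + packageName.replace('.', '/') + "/" + className + ".java";
    }

    public static void main(String[] args) {
        // Uses the placeholder Group ID from the text as the package name.
        System.out.println(sourcePath("org.yourname.projectname", "HelloWorld"));
    }
}
```

With the package used later in this article, cn.pywei.HiveUDF, the same rule gives src/main/java/cn/pywei/HiveUDF/HelloWorld.java.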

    <dependency>
    	<groupId>org.apache.hive</groupId>
    	<artifactId>hive-exec</artifactId>
    	<version>2.3.2</version>
    </dependency>

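Change the version number to match your Hive installation. When I wrote this article, Hive 2.3.3 had already been released.

Go to the project directory (the one that contains pom.xml) and run: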

mvn install

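If the command fails with an error saying that some JAR has an invalid LOC header (bad signature), delete that JAR's folder from the local repository and run the command again; Maven will download it again automatically. On Ubuntu the repository is under ~/.m2 (by default ~/.m2/repository).

Add a new file HelloWorld.java under src/main/java, in the directory that matches its package, with the following code: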
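When more than one download is damaged, hunting for the bad JARs one error at a time gets tedious. The following is a minimal sketch, not a Maven feature, that walks the local repository and reports every JAR whose entries cannot be read in full. CorruptJarFinder and its method names are hypothetical, and the repository location is assumed to be the default ~/.m2/repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class CorruptJarFinder {

    /** Returns true only if the archive opens and every entry can be read to the end. */
    public static boolean isReadableJar(Path jar) {
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // Reading the entry data is what forces the local (LOC) headers to be checked.
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    while (in.read(buffer) != -1) {
                        // discard the bytes; only readability matters here
                    }
                }
            }
            return true;
        } catch (IOException e) {
            // Missing files and damaged archives both end up here.
            return false;
        }
    }

    /** Lists all *.jar files under root that fail the readability check. */
    public static List<Path> findCorruptJars(Path root) {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.toString().endsWith(".jar"))
                    .filter(p -> !isReadableJar(p))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the default local repository location.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        for (Path jar : findCorruptJars(repo)) {
            // Print the containing folder, which is what the text suggests deleting.
            System.out.println("corrupt: " + jar.getParent());
        }
    }
}
```

The sketch only reports folders; deleting them and rerunning mvn install is still a manual step, so nothing is removed by accident.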

package cn.pywei.HiveUDF;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

// The annotation supplies the text shown by "describe function" in Hive.
@Description(
	name = "HelloWorld",
	value = "_FUNC_(input), return the string \"HelloWorld\".",
	extended = "E.g. \n select hello(1);")
public class HelloWorld extends UDF {

	// One evaluate overload per accepted argument type; each ignores its input.
	public String evaluate(String s) {
		return "HelloWorld";
	}

	public String evaluate(int s) {
		return "HelloWorld";
	}

	public String evaluate(boolean s) {
		return "HelloWorld";
	}
}
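As a rough illustration of why three evaluate overloads let the function accept a string, an int, and a boolean, the sketch below looks up an overload by parameter type with plain Java reflection. It only imitates the idea and is not Hive's actual resolver. EvaluateOverloadDemo, HelloWorldLike, and call are hypothetical names, the stand-in class deliberately does not extend UDF so that it compiles without hive-exec, and the return strings are changed to show which overload ran.

```java
import java.lang.reflect.Method;

public class EvaluateOverloadDemo {

    /** Stand-in for the UDF above; it does NOT extend Hive's UDF class. */
    public static class HelloWorldLike {
        public String evaluate(String s) {
            return "HelloWorld(String)";
        }

        public String evaluate(int s) {
            return "HelloWorld(int)";
        }

        public String evaluate(boolean s) {
            return "HelloWorld(boolean)";
        }
    }

    /** Finds the evaluate overload with exactly this parameter type and invokes it. */
    public static String call(Object target, Class<?> argType, Object arg) {
        try {
            Method method = target.getClass().getMethod("evaluate", argType);
            // invoke() unboxes Integer/Boolean when the parameter is a primitive.
            return (String) method.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "no usable evaluate(" + argType.getSimpleName() + ")", e);
        }
    }

    public static void main(String[] args) {
        HelloWorldLike udf = new HelloWorldLike();
        System.out.println(call(udf, String.class, "abc"));
        System.out.println(call(udf, int.class, 1));
        System.out.println(call(udf, boolean.class, true));
    }
}
```

Asking for a type with no overload, such as double.class, fails in this sketch with an IllegalStateException.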

Export the project as a JAR file.

Load the JAR file in Hive:

add jar /path/name.jar;

Create a temporary function in Hive:

create temporary function hello as 'cn.pywei.HiveUDF.HelloWorld';

Run:

select hello(1);
select hello('abc');
select hello(True);
describe function hello;
describe function extended hello;

You can also manage JAR files with the following commands:

list jar;
delete jar /path/name.jar;
delete jar; -- delete all JARs

Some minor Maven issues:

In Eclipse, open the "Project" menu, check "Build Automatically", then run "Clean".

If this doesn't solve the problem:

Right click the "Maven Dependencies" => "Build Path" => "Remove from the build path";
Right click the project, go to "Maven" => "Update project";

What is the difference between the compile and provided scopes in pom.xml?

Dependency scope is used to limit the transitivity of a dependency, and also to affect the classpath used for various build tasks.

There are 6 scopes available:

  • compile
    This is the default scope, used if none is specified. Compile dependencies are available in all classpaths of a project. Furthermore, those dependencies are propagated to dependent projects.
  • provided
    This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.
  • runtime
    This scope indicates that the dependency is not required for compilation, but is for execution. It is in the runtime and test classpaths, but not the compile classpath.
  • test
    This scope indicates that the dependency is not required for normal use of the application, and is only available for the test compilation and execution phases. This scope is not transitive.
  • system
    This scope is similar to provided except that you have to provide the JAR which contains it explicitly. The artifact is always available and is not looked up in a repository.
  • import (only available in Maven 2.0.9 or later)
    This scope is only supported on a dependency of type pom in the <dependencyManagement> section. It indicates the dependency to be replaced with the effective list of dependencies in the specified POM's <dependencyManagement> section. Since they are replaced, dependencies with a scope of import do not actually participate in limiting the transitivity of a dependency.
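The scope descriptions above can be condensed into a small lookup table. The sketch below is an illustration only and not part of Maven's API; MavenScope and its method names are hypothetical. The import scope is left out because it affects dependencyManagement rather than a classpath, and system is given the same flags as provided on the strength of the "similar to provided" wording.

```java
public enum MavenScope {
    //        compile  test   runtime
    COMPILE  (true,    true,  true),   // available in all classpaths
    PROVIDED (true,    true,  false),  // supplied by the JDK or a container at runtime
    RUNTIME  (false,   true,  true),   // not needed to compile, needed to run
    TEST     (false,   true,  false),  // test compilation and execution only
    SYSTEM   (true,    true,  false);  // assumed to behave like provided

    private final boolean compileClasspath;
    private final boolean testClasspath;
    private final boolean runtimeClasspath;

    MavenScope(boolean compileClasspath, boolean testClasspath, boolean runtimeClasspath) {
        this.compileClasspath = compileClasspath;
        this.testClasspath = testClasspath;
        this.runtimeClasspath = runtimeClasspath;
    }

    public boolean onCompileClasspath() {
        return compileClasspath;
    }

    public boolean onTestClasspath() {
        return testClasspath;
    }

    public boolean onRuntimeClasspath() {
        return runtimeClasspath;
    }

    public static void main(String[] args) {
        // Print one row per scope so compile and provided can be compared side by side.
        for (MavenScope scope : values()) {
            System.out.println(scope + " compile=" + scope.onCompileClasspath()
                    + " test=" + scope.onTestClasspath()
                    + " runtime=" + scope.onRuntimeClasspath());
        }
    }
}
```

Read this way, provided differs from compile only in being absent from the runtime classpath. For a UDF project that probably makes provided a reasonable choice for hive-exec, on the assumption that Hive itself supplies those classes when the function runs.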

References:

https://blog.csdn.net/u010376788/article/details/50532166

https://www.jianshu.com/p/7ebc8f9c9b78

http://www.crazyant.net/2160.html