
# Installing Hive, Basic Usage, and Spark SQL Integration

## Setting Up the Hive Environment
1. Download Hive: http://archive-primary.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz

   wget http://archive-primary.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz

2. Extract the archive:

   tar -zxvf hive-1.1.0-cdh5.7.0.tar.gz -C ../apps/


3. System environment variables (vim ~/.bash_profile):
```
export HIVE_HOME=/root/apps/hive-1.1.0-cdh5.7.0
export PATH=$HIVE_HOME/bin:$PATH
```
After saving, apply the changes with `source ~/.bash_profile`.
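
As a quick sanity check, confirm that the shell now resolves `hive` from the new PATH entry (the expected path below simply mirrors the install location used in this guide):

```
# confirm the new PATH entry is picked up
which hive   # should print /root/apps/hive-1.1.0-cdh5.7.0/bin/hive
```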

4. Configuration

4.1 Export HADOOP_HOME in `$HIVE_HOME/conf/hive-env.sh` (see the sketch below).
4.2 Copy the MySQL JDBC driver jar into `$HIVE_HOME/lib`.
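
A minimal sketch of the hive-env.sh edit; the Hadoop path is an assumption (a CDH 5.7.0 install under /root/apps) and should be adjusted to your environment:

```
# $HIVE_HOME/conf/hive-env.sh
# the Hadoop path below is an assumption -- point it at your own install
export HADOOP_HOME=/root/apps/hadoop-2.6.0-cdh5.7.0
```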

4.3 vim hive-site.xml

```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://spark003:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
```
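
With `createDatabaseIfNotExist=true`, Hive creates the `hive` database in MySQL on first use. Optionally, you can also initialize the metastore tables up front with the `schematool` utility that ships with Hive; a sketch, assuming the configuration above:

```
# one-time, optional: create the metastore schema in MySQL explicitly
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
```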

5. Start Hive: `$HIVE_HOME/bin/hive`


## Basic Hive Usage
Create a table:

> create table test_table(name string);

Load local data into the Hive table (the LOCAL variant):

> load data local inpath '/home/hadoop/data/hello.txt' into table test_table;

Query the table, then count the frequency of each word:

> select * from test_table;

> select word, count(1) from test_table lateral view explode(split(name, '\t')) wc as word group by word;
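
For reference, here is a hypothetical `hello.txt` (tab-separated words per line) and the counts the query above would produce; the file contents are purely illustrative:

```
# create an illustrative input file (contents are an assumption)
printf 'hello\tworld\nhello\thive\n' > /home/hadoop/data/hello.txt
# after loading it, the word-count query would return:
#   hello   2
#   hive    1
#   world   1
```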


### A Small Case Study
create table emp(
empno int,
ename string,
job string,
mgr int,
sal double,
comm double,
deptno int
)row format delimited fields terminated by '\t';

create table dept(
deptno int,
dname string,
location string
)row format delimited fields terminated by '\t';


load data local inpath '/home/hadoop/data/emp.txt' into table emp;
load data local inpath '/home/hadoop/data/dept.txt' into table dept;
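
The loads assume tab-delimited files matching the schemas above. A hypothetical single row of each, for illustration only:

```
# illustrative sample rows (\N is Hive's default NULL marker in text files)
printf '7369\tSMITH\tCLERK\t7902\t800.0\t\\N\t20\n' > /home/hadoop/data/emp.txt
printf '20\tRESEARCH\tDALLAS\n' > /home/hadoop/data/dept.txt
```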

Statistical analysis:

Count the number of employees in each department (a join example follows below):

select deptno, count(1) from emp group by deptno;
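
Since `dept` is loaded as well, a natural follow-up is to attach department names with a join; a sketch, run from the shell via `hive -e`:

```
# count employees per department name by joining emp with dept
hive -e "select d.dname, count(1) from emp e join dept d on e.deptno = d.deptno group by d.dname;"
```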

## Integrating Spark SQL with Hive (spark-shell)

1. Copy Hive's configuration file hive-site.xml into Spark's conf directory (see the copy command below), and add the metastore URL property:

```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://spark001:9083</value>
  </property>
</configuration>
```
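
The copy step itself is a one-liner, assuming HIVE_HOME and SPARK_HOME are set as in the earlier steps:

```
# share Hive's configuration (including the metastore URI) with Spark
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
```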
2. Copy the MySQL JDBC jar into Spark's lib directory:

```
[root@spark001 lib]# pwd
/root/apps/spark-2.2.0-bin-2.6.0-cdh5.7.0/lib
[root@spark001 lib]# ll
total 972
-rw-r--r--. 1 root root 992805 Oct 23 23:59 mysql-connector-java-5.1.41.jar

```
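
If you prefer not to modify Spark's lib directory, the connector can instead be supplied when launching the shell; a sketch using standard spark-shell options (the jar path assumes the lib directory shown above):

```
# make the MySQL driver visible to the driver and executors at launch time
spark-shell --jars /root/apps/spark-2.2.0-bin-2.6.0-cdh5.7.0/lib/mysql-connector-java-5.1.41.jar \
  --driver-class-path /root/apps/spark-2.2.0-bin-2.6.0-cdh5.7.0/lib/mysql-connector-java-5.1.41.jar
```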

3. Update the configuration in spark-env.sh

Edit the file (vim spark-env.sh) and add the following:

```
export JAVA_HOME=/root/apps/jdk1.8.0_144
export SPARK_HOME=/root/apps/spark-2.2.0-bin-2.6.0-cdh5.7.0
export SCALA_HOME=/root/apps/scala-2.11.8
# newly added: point HADOOP_CONF_DIR at your Hadoop configuration directory
# (the CDH 5.7.0 path below is an assumption -- adjust it to your install)
export HADOOP_CONF_DIR=/root/apps/hadoop-2.6.0-cdh5.7.0/etc/hadoop
```
4. Start the services:
- Start Hadoop: `start-all.sh`
- Start Spark: its own `start-all.sh`, run from `$SPARK_HOME/sbin` to avoid clashing with Hadoop's script of the same name
- Start the MySQL database that backs the metastore: `service mysqld restart`
- Start the Hive metastore service: `hive --service metastore`
- Start the Hive CLI: `hive`
- Start the Spark shell: `spark-shell`

5. A quick test

Create a local file test.csv with the following content:
0001,spark
0002,hive
0003,hbase
0004,hadoop
> Run the following Hive commands:

hive> show databases;
hive> create database databases1;
hive> use databases1;
hive> create table if not exists test(userid string, username string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
hive> load data local inpath "/root/test.csv" into table test;
hive> select * from test;

Note that the field terminator is a comma to match test.csv, and the table is created inside databases1 so that the qualified name queried from Spark below resolves correctly.

> Run the following in spark-shell:

spark.sql("select * from databases1.test").show
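
If the metastore integration is working, this returns the same four rows loaded through the Hive CLI above. Since `spark.sql` returns a DataFrame, the usual DataFrame operations (filters, joins, writes) apply to Hive tables as well.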