Building My Own Hadoop Platform (Part 3): MySQL + Hive Remote Mode + Spark on Yarn
Spark and Hive are fairly simple to configure. To make it convenient for Spark to use and test data, I set up MySQL + Hive while building the Spark on Yarn deployment, and configured Hive support in Spark so that Spark can operate on data just as Hive does.
Prerequisites
scala-2.11.11.tgz
spark-2.1.1-bin-hadoop2.7.tar.gz
hive-1.2.1.tar.gz
mysql-connector-java-5.1.43-bin.jar
Installing MySQL
Install MySQL via yum
MySQL is only used to store Hive's metadata, so it only needs to be installed on one node.
1. Download the MySQL repo package
wget http://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
2. Install the MySQL repo
yum localinstall mysql57-community-release-el7-11.noarch.rpm
3. Check that the repo was installed successfully
yum repolist enabled | grep "mysql.*-community.*"
4. Install MySQL
yum install mysql-community-server
5. Start MySQL
systemctl start mysqld
6. Check the MySQL status
systemctl status mysqld
active (running) in the output means the service is up.
7. Enable MySQL at boot
systemctl enable mysqld
systemctl daemon-reload
8. Change the local root password
# A temporary password is generated at install time; look it up with the command below, then log in and change it
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p
-- Relax the global password policy so a simple password can be set
-- Check whether the validate_password plugin is installed
SHOW VARIABLES LIKE 'validate_password%';
-- Lower the validate_password_policy value
set global validate_password_policy=0;
-- Set the root account password
set password for 'root'@'localhost'=password('rootroot');
9. Add a remote login user
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'rootroot' WITH GRANT OPTION;
10. Set the default encoding to utf-8
# Edit /etc/my.cnf and add the encoding settings under the [mysqld] section
character_set_server=utf8
init_connect='SET NAMES utf8'
Installing Hive
On the master1 node
1. Create the HDFS directories and grant permissions
These steps are required; otherwise specifying the Hive metastore database later will fail.
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /user/hive/tmp
hdfs dfs -chmod 777 /user/hive/log
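The three directories above differ only in their last path component, so the six commands can be generated in a loop. A dry-run sketch that only prints each command so it can be inspected first (drop the `echo` to actually run them against the cluster):

```shell
# Dry run: print the hdfs commands for the three Hive directories.
# Remove 'echo' to execute them for real.
for d in warehouse tmp log; do
  echo "hdfs dfs -mkdir -p /user/hive/$d"
  echo "hdfs dfs -chmod 777 /user/hive/$d"
done
```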
Add the environment variables
export HIVE_HOME=/usr/local/hive-1.2.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
2. Create the MySQL database and designate the metastore
-- Log in to MySQL and create a database named hive
create database hive;
-- Create the hive user and grant it all privileges
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'rootroot';
GRANT ALL PRIVILEGES ON *.* TO hive IDENTIFIED BY 'rootroot' WITH GRANT OPTION;
Then copy the MySQL JDBC driver jar into the lib directory under the Hive installation.
3. Remote-mode server configuration (master node)
Edit the hive-site.xml configuration:
vim /usr/local/hive-1.2.1/conf/hive-site.xml
The full configuration is as follows:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>rootroot</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/hive-1.2.1/iotmp/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/hive-1.2.1/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/hive-1.2.1/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive-1.2.1/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property>
</configuration>
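One detail that is easy to miss in the configuration above: hive-site.xml is XML, so the `&` that joins the JDBC URL parameters must be written as `&amp;` inside the `<value>` element, or Hive will fail to parse the file. A quick sketch of the escaping:

```shell
# The raw JDBC URL uses '&' between parameters; inside an XML value
# every '&' must become '&amp;'.
url='jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
escaped=$(printf '%s' "$url" | sed 's/&/\&amp;/g')
echo "$escaped"
```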
4. Configure the other nodes as clients (master1/slave1/slave2/slave3)
Edit the hive-site.xml configuration:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
</property>
</configuration>
This completes the Hive remote-mode configuration.
Test whether Hive starts correctly
# Start the Hive metastore service on the master node
hive --service metastore &
# Start the Hive CLI on the master1 node
hive
Hive can display the data
MySQL stores the Hive metadata
HDFS stores the actual data
The corresponding data on HDFS
Hive is working correctly
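Beyond eyeballing the CLI prompt, a small scripted smoke test can confirm the metastore round-trip. The table name `smoke_test` is hypothetical; the script is only written to a file here so it can be reviewed first, then run on master1 with `hive -f /tmp/hive_smoke.sql`:

```shell
# Write a minimal Hive smoke test to a script file.
# Creating and dropping a table exercises both the metastore (MySQL)
# and the warehouse directory on HDFS.
cat > /tmp/hive_smoke.sql <<'EOF'
CREATE TABLE IF NOT EXISTS smoke_test (id INT);
SHOW TABLES;
DROP TABLE smoke_test;
EOF
```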
Spark on Yarn Configuration
1. Unpack the Spark archive
# Unpack to /usr/local/spark
tar -zxvf spark-2.1.1-bin-hadoop2.7.tar.gz
mv spark-2.1.1-bin-hadoop2.7 /usr/local/spark
2. Add the environment variables
vim ~/.bashrc
# Add
export SPARK_HOME=/usr/local/spark
# Append to PATH
$SPARK_HOME/bin:$SPARK_HOME/sbin
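Spelled out, the ~/.bashrc additions would look like this in Bourne-shell syntax (note the `$` prefix for variable expansion, and that the new entries are appended to the existing PATH rather than replacing it):

```shell
# ~/.bashrc additions for Spark
export SPARK_HOME=/usr/local/spark
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"
# Quick check that both directories made it onto PATH
echo "$PATH"
```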
3. Edit the spark-env.sh configuration file
# Add the following settings
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export JAVA_HOME=/usr/local/jdk1.8.0_144
export SPARK_HOME=/usr/local/spark
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_EXECUTOR_CORES=1
export SPARK_WORKER_CORES=1
export SCALA_HOME=/usr/local/scala
Test Spark on Yarn
Use the SparkPi example that ships with Spark, with master set to yarn:
/usr/local/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 2 /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 5
The application that Yarn allocated for Spark is also visible in the Yarn web UI.
Spark SQL Access to Hive Data
1. Copy the Hive configuration file hive-site.xml from the master node into spark/conf
Contents of hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>rootroot</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
2. Edit the spark-defaults.conf file
# Add the following to the configuration file
spark.sql.warehouse.dir /user/spark/warehouse
3. Send the two configuration files, hive-site.xml and spark-defaults.conf, to the other nodes
scp hive-site.xml hadoop@master1:/usr/local/spark/conf
scp hive-site.xml hadoop@slave1:/usr/local/spark/conf
scp hive-site.xml hadoop@slave2:/usr/local/spark/conf
scp hive-site.xml hadoop@slave3:/usr/local/spark/conf
scp spark-defaults.conf hadoop@master1:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave1:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave2:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave3:/usr/local/spark/conf
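The eight scp commands follow one pattern (same hostnames and target directory throughout), so a loop can generate them. Shown here as a dry run that only prints each command; remove the `echo` to actually copy. Note that the stock Spark filename is spark-defaults.conf:

```shell
# Dry run: one scp per node and per file instead of eight
# hand-written commands. Remove 'echo' to actually copy.
for host in master1 slave1 slave2 slave3; do
  for f in hive-site.xml spark-defaults.conf; do
    echo "scp $f hadoop@$host:/usr/local/spark/conf"
  done
done
```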
4. Put the MySQL driver jar into spark/jars
With this configuration in place, Spark SQL can operate on the Hive databases.
Test Spark SQL against Hive:
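One way to exercise this (assuming the metastore service is still running on master and the driver jar is in place) is the spark-sql CLI that ships with Spark; this is a sketch against a live cluster, not something runnable standalone:

```shell
# Run a simple query through Spark SQL against the Hive metastore;
# this should list the same databases the Hive CLI shows.
/usr/local/spark/bin/spark-sql --master yarn -e "SHOW DATABASES;"
```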
Spark can access the data through SQL statements; everything works!
If you have any comments or suggestions, please contact me. Thank you.