
SparkSQL Configuration (Hive as the Data Source)

Hive configuration (MySQL stores the metadata, HDFS stores the data):

1. Edit hive-env.sh (copy it from hive-env.sh.template and modify it)

# Hadoop home directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
2. Edit hive-site.xml (use hive-default.xml.template as a reference)
# The main settings here are the MySQL connection details
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>youpassword</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
That completes the basic Hive configuration.
Then run $HIVE_HOME/bin/hive to check that it starts successfully.
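Before starting Hive, it can help to confirm which values hive-site.xml actually contains. The following is a minimal sketch, not part of the original setup: `get_hive_prop` is a hypothetical helper name, and it assumes each `<name>` and `<value>` element sits on a single line, as in the properties above. It is line-oriented text matching, not a real XML parser.

```shell
# Print the <value> that follows a given <name> in a hive-site.xml file.
# $1 = path to hive-site.xml, $2 = property name
get_hive_prop() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
      found = (name == want)      # remember whether this is the property we want
    }
    found && /<value>/ {
      val = $0
      sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}
```

For example, `get_hive_prop /usr/local/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL` should print the JDBC URL configured above, assuming the file lives under the HIVE_CONF_DIR set in hive-env.sh.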
-------------------------------------------------------
Configure Spark
1. Edit spark-env.sh
# Size the memory settings to your machine. Note: if they are set too low, jobs fail to run with a "no resource ..." error.
export SCALA_HOME=/usr/local/spark
export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=800m
export SPARK_EXECUTOR_MEMORY=800m
export SPARK_DRIVER_MEMORY=800m
export SPARK_WORKER_CORES=4
export MASTER=spark://master:7077
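One common cause of the "no resource" symptom in standalone mode is an executor asking for more memory than any single worker offers. The sketch below is a rough pre-flight check under that assumption. `to_mb` and `fits_on_worker` are hypothetical helper names, and only the `m` and `g` suffixes are handled.

```shell
# Convert a Spark-style memory string (e.g. 800m, 2g) to megabytes.
to_mb() {
  case "$1" in
    *[gG]) echo $(( ${1%[gG]} * 1024 )) ;;
    *[mM]) echo "${1%[mM]}" ;;
    *) echo "unsupported memory value: $1" >&2; return 1 ;;
  esac
}

# Succeed when the executor memory fits within the worker memory.
# $1 = SPARK_EXECUTOR_MEMORY, $2 = SPARK_WORKER_MEMORY
fits_on_worker() {
  exec_mb=$(to_mb "$1") || return 2
  worker_mb=$(to_mb "$2") || return 2
  [ "$exec_mb" -le "$worker_mb" ]
}
```

With the values above, `fits_on_worker 800m 800m` succeeds, so the executor request can be satisfied by one worker. Raising SPARK_EXECUTOR_MEMORY without also raising SPARK_WORKER_MEMORY would make the check fail.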
2. Configure spark-defaults.conf
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:9000/historyserverforSpark
# Used by the web UI for viewing Spark's job history
spark.yarn.historyServer.address        master:18080
spark.history.fs.logDirectory   hdfs://master:9000/historyserverforSpark 
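With event logging enabled, Spark expects the directory named in spark.eventLog.dir to exist already, so it is usually created once in HDFS before the first job runs. The sketch below reads the path from spark-defaults.conf and creates it. `event_log_dir` is a hypothetical helper name, and the fallback /usr/local/spark for SPARK_HOME is an assumption, not something the configuration above states.

```shell
# Print the value of spark.eventLog.dir from a spark-defaults.conf file.
event_log_dir() {
  awk '$1 == "spark.eventLog.dir" { print $2; exit }' "$1"
}

# Assumed location of the Spark configuration; adjust to your install.
conf="${SPARK_HOME:-/usr/local/spark}/conf/spark-defaults.conf"

# Create the directory only when the hdfs client and the conf file are present.
if command -v hdfs >/dev/null 2>&1 && [ -f "$conf" ]; then
  hdfs dfs -mkdir -p "$(event_log_dir "$conf")"
fi
```

Reading the path from the file rather than typing it twice keeps spark.eventLog.dir and the created directory from drifting apart.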
3. Configure slaves (two worker nodes)
slave1
slave2
-------------------------------------------------------
Add a hive-site.xml file under spark/conf with the following content
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
4. Start the Hive metastore service
 hive --service metastore
5. Start SparkSQL
./bin/spark-sql
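Because SparkSQL connects to the metastore through the thrift URI configured in spark/conf/hive-site.xml, starting it before the metastore is listening tends to fail. The sketch below waits for the port first. `thrift_host_port` and `wait_for_metastore` are hypothetical helper names, and it assumes an `nc` that supports the `-z` flag is installed.

```shell
# Split a thrift://host:port URI into "host port".
thrift_host_port() {
  hp=${1#thrift://}
  echo "${hp%%:*} ${hp##*:}"
}

# Poll once per second until the metastore port accepts connections.
# $1 = thrift URI, $2 = maximum number of attempts (default 30)
wait_for_metastore() {
  uri=$1
  tries=${2:-30}
  hp=$(thrift_host_port "$uri")
  host=${hp% *}
  port=${hp#* }
  while [ "$tries" -gt 0 ]; do
    nc -z "$host" "$port" 2>/dev/null && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

One possible sequence is to run `hive --service metastore &`, then `wait_for_metastore thrift://master:9083 && ./bin/spark-sql -e "show databases;"`. If the databases listed match those seen from the Hive CLI, SparkSQL is reading the Hive metastore.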