SparkSQL Configuration (Hive as the Data Source)
Posted by 阿新 on 2019-02-12
Hive configuration (MySQL as the metastore storage, HDFS as the data storage):
1. Edit hive-env.sh (you can copy hive-env.sh.template and modify it)
# Hadoop home directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
2. Edit hive-site.xml (you can use hive-default.xml.template as a reference)
# The main settings here are the MySQL connection details
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>yourpassword</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>Username to use against metastore database</description>
</property>
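Hive also needs the MySQL JDBC driver on its classpath before it can reach the metastore database. A minimal sketch, assuming the connector jar has already been downloaded (the jar name and version below are illustrative):
# Copy the MySQL JDBC driver into Hive's lib directory (jar name/version are assumptions)
cp mysql-connector-java-5.1.40.jar /usr/local/hive/lib/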
At this point the basic Hive configuration is complete.
Then start $HIVE_HOME/bin/hive and check that it launches successfully!
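If it starts, a quick smoke test inside the Hive CLI confirms the metastore works (the table name t1 is purely illustrative):
hive> show databases;
hive> create table t1(id int);
hive> show tables;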
-------------------------------------------------------
Spark configuration
1. Edit spark-env.sh
# Size the memory settings to your machine. Note: if they are set too small, execution will fail with "no resource" errors.
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=800m
export SPARK_EXECUTOR_MEMORY=800m
export SPARK_DRIVER_MEMORY=800m
export SPARK_WORKER_CORES=4
export MASTER=spark://master:7077
2. Configure spark-defaults.conf
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/historyserverforSpark
# Web UI for browsing Spark's completed (historical) jobs
spark.yarn.historyServer.address master:18080
spark.history.fs.logDirectory hdfs://master:9000/historyserverforSpark
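The event-log directory configured above must already exist on HDFS, and the history server has to be started explicitly. A sketch of both steps, assuming the HDFS namenode at master:9000 as configured:
# Create the event-log directory on HDFS
hdfs dfs -mkdir -p /historyserverforSpark
# Start the history server (run from $SPARK_HOME)
./sbin/start-history-server.sh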
3. Configure slaves (two worker nodes are configured here; the commands to bring the cluster up follow the list)
slave1
slave2
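With the slaves file in place, the standalone cluster can be started and checked. A minimal sketch (run on master from $SPARK_HOME; jps is the JDK's process lister):
# Start the master and all workers listed in conf/slaves
./sbin/start-all.sh
# On each node, confirm the daemons: Master on master, Worker on slave1/slave2
jps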
-------------------------------------------------------
Add a hive-site.xml under spark/conf with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
</configuration>
4. Start the Hive metastore service
hive --service metastore
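The command above occupies the terminal in the foreground; a common variant is to run it in the background (the log path here is an assumption):
# Run the metastore in the background, logging to a file
nohup hive --service metastore > /tmp/metastore.log 2>&1 &
# Optionally confirm that the thrift port from hive-site.xml is listening
netstat -tlnp | grep 9083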
5. Start SparkSQL
./bin/spark-sql
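Once the spark-sql shell is up, verify that it actually talks to the Hive metastore (t1 is the throwaway table from the Hive smoke test above):
spark-sql> show databases;
spark-sql> show tables;
spark-sql> select count(*) from t1;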