Spark Job Server 0.7.0 Deployment and Usage
## Install Scala
Download a suitable version from the Scala website and extract it to /usr/local/scala (any directory works). Then add it to the PATH environment variable on Linux:
export PATH="$PATH:/usr/local/scala/bin"
Type scala to check that the installation succeeded.
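For example, a quick check from the shell (the version shown is just what a Scala 2.11 package would print; yours depends on the package you downloaded):

scala -version
# Prints something like: Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL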
## Install sbt Manually
Download sbt from the official website (the zip or tgz package both work) and extract it to /usr/local/sbt. Then create a script named sbt in /usr/local/sbt:
cd /usr/local/sbt
vi sbt
Enter the following (the -XX:MaxPermSize=256M option can be removed on Java 1.8, which no longer has a PermGen space):
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"
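After saving the script, it also needs execute permission (a small step the original omits; the path is assumed to match the setup above):

chmod u+x /usr/local/sbt/sbt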
Configure the repositories:
vi ~/.sbt/repositories
Enter the following:
[repositories]
local
aliyun-nexus: http://maven.aliyun.com/nexus/content/groups/public/
# or oschina: http://maven.oschina.net/content/groups/public/
jcenter: http://jcenter.bintray.com/
typesafe-ivy-releases: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
maven-central: http://repo1.maven.org/maven2/
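If you want sbt to use only the repositories listed above and ignore resolvers declared by individual builds, sbt supports an override property; an optional sketch, applied to the launch script from earlier:

# Either pass it on the command line or add it to SBT_OPTS in the sbt script above
java -Dsbt.override.build.repos=true $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"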
Configure the environment variables:
export SBT_HOME=/usr/local/sbt
export PATH=$PATH:$SBT_HOME
Run the sbt command to check that the installation succeeded. The first time sbt runs it automatically downloads its dependencies; once the sbt console prompt appears, the setup is complete.
sbt:sbt>
## Deploy spark-jobserver
### Configuration
cd /home/hadoop/application/spark-jobserver/conf
cp local.conf.template local.conf
cp local.sh.template local.sh
Edit local.sh as follows:
#!/usr/bin/env bash
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
# Hosts for remote deployment over SSH; IP addresses may also be used
DEPLOY_HOSTS="dashuju213 dashuju214"
# User and group used when installing over SSH
APP_USER=hadoop
APP_GROUP=hadoop
JMX_PORT=9999
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
# Deploy installation directory
INSTALL_DIR=/home/hadoop/application/jobserver
# Log directory
LOG_DIR=/home/hadoop/application/jobserver/logs
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=1G
# Spark version
SPARK_VERSION=2.3.2
MAX_DIRECT_MEMORY=512M
# Spark installation directory
SPARK_HOME=/home/hadoop/application/spark
SPARK_CONF_DIR=$SPARK_HOME/conf
# Scala version
SCALA_VERSION=2.11.12
Configure the database. vi local.conf; only the settings that need to be modified are listed:
# also add the following line at the root level.
flyway.locations="db/mysql/migration"

spark {
  # local[...], yarn, mesos://... or spark://...
  master = "spark://dashuju213:6066,dashuju214:6066"
  # client or cluster deployment
  submit.deployMode = "cluster"
  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 2

  jobserver {
    ...
    sqldao {
      # Slick database driver, full classpath
      slick-driver = slick.driver.MySQLDriver
      # JDBC driver, full classpath
      jdbc-driver = com.mysql.jdbc.Driver
      jdbc {
        url = "jdbc:mysql://db_host/spark_jobserver"
        user = "jobserver"
        password = "secret"
      }
      dbcp {
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }
  }
}
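The sqldao settings above assume the MySQL database and account already exist. A minimal sketch of creating them (database name, user, and password taken from the config above; run against your MySQL server, i.e. the db_host in the JDBC URL):

mysql -u root -p -e "CREATE DATABASE spark_jobserver;
  CREATE USER 'jobserver'@'%' IDENTIFIED BY 'secret';
  GRANT ALL PRIVILEGES ON spark_jobserver.* TO 'jobserver'@'%';
  FLUSH PRIVILEGES;"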
Configure passwordless SSH login for the deploy user (a sketch is given after the snippet below). Also configure the SSH port: port 22 is used by default and can be changed as needed. vi server_deploy.sh:
for host in $DEPLOY_HOSTS; do
  # We assume that the deploy user is APP_USER and has permissions
  ssh -p 2222 -o StrictHostKeyChecking=no $ssh_key_to_use ${APP_USER}@$host mkdir -p $INSTALL_DIR
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use $FILES ${APP_USER}@$host:$INSTALL_DIR/
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use "$CONFIG_DIR/$ENV.conf" ${APP_USER}@$host:$INSTALL_DIR/
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use "$configFile" ${APP_USER}@$host:$INSTALL_DIR/settings.sh
done
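A sketch of setting up the passwordless login mentioned above, run on the machine you deploy from and assuming the hadoop deploy user and the custom port 2222 from the snippet:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
ssh-copy-id -p 2222 hadoop@dashuju213
ssh-copy-id -p 2222 hadoop@dashuju214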
### Deployment
Go to the bin directory and run the deployment command:
./server_deploy.sh local
After it finishes, go to the directory specified by INSTALL_DIR and use server_start.sh and server_stop.sh to start and stop the job server.
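For example (paths and port are assumptions: INSTALL_DIR comes from local.sh above, and 8090 is the job server's default HTTP port):

cd /home/hadoop/application/jobserver
./server_start.sh
# Check that the REST API responds (lists uploaded binaries), then stop when done
curl http://dashuju213:8090/binaries
./server_stop.sh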
### Problems Encountered
#### Startup problem
Because I had configured master and deployMode in spark-defaults.conf, server_start.sh needs to be modified to use the desired mode:
cmd='$SPARK_HOME/bin/spark-submit --master local[1] --deploy-mode
#### Database initialization failure
Modify spark-jobserver\spark-jobserver-master\job-server\src\main\resources\db\mysql\migration\V0_7_2\V0_7_2__convert_binaries_table_to_use_milliseconds.sql. You can then re-run the deployment command, or edit the file inside the packaged jar directly:
ALTER TABLE `BINARIES` MODIFY COLUMN `UPLOAD_TIME` TIMESTAMP;
If you then see "Validate failed. Migration Checksum mismatch for migration 0.7.2", it is caused by the earlier failed initialization: drop all tables in the database and let it re-initialize.
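One way to clear the failed initialization, assuming nothing else lives in this database (otherwise drop only the job server's tables):

mysql -u root -p -e "DROP DATABASE spark_jobserver; CREATE DATABASE spark_jobserver;"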
#### java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLogger
Modify project/Dependencies.scala:
"com.typesafe.akka" %% "akka-slf4j" % akka % "provided",
...
"io.spray" %% "spray-routing" % spray,
change it to:
"com.typesafe.akka" %% "akka-slf4j" % akka,
...
"io.spray" %% "spray-routing-shapeless23" % "1.3.4",
Add the following to project/Versions.scala:
lazy val mysql = "5.1.42"
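After changing the build files, rebuild and redeploy with the same command as in the Deployment section (source directory assumed from the conf path used earlier):

cd /home/hadoop/application/spark-jobserver/bin
./server_deploy.sh local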
### Usage
#### Running spark-sql
Modify local.conf:
spark {
  jobserver {
    # Automatically load a set of jars at startup time. Key is the appName, value is the path/URL.
    job-binary-paths { # NOTE: you may need an absolute path below
      sql = job-server-extras/target/scala-2.10/job-server-extras_2.10-0.6.2-SNAPSHOT-tests.jar
    }
  }
  contexts {
    sql-context {
      num-cpu-cores = 1        # Number of cores to allocate. Required.
      memory-per-node = 512m   # Executor memory per node, -Xmx style eg 512m, 1G, etc.
      context-factory = spark.jobserver.context.HiveContextFactory
    }
  }
}
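With this configuration in place, SQL jobs are submitted through the job server's REST API. A hedged sketch of the typical flow: the host and default port 8090 are assumptions, the job class your.package.YourSqlJob is a placeholder for a job class implementing the SQL/Hive job API, and the jar path is the one configured above:

# Upload the extras test jar under the app name "sql"
curl --data-binary @job-server-extras/target/scala-2.10/job-server-extras_2.10-0.6.2-SNAPSHOT-tests.jar \
  http://dashuju213:8090/jars/sql
# Run a job in the pre-defined sql-context and wait synchronously for the result
curl -d 'sql = "show tables"' \
  "http://dashuju213:8090/jobs?appName=sql&classPath=your.package.YourSqlJob&context=sql-context&sync=true"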