
Spark + Scala + Maven: creating a simple project and submitting a job to YARN

Step 1: Develop and package the Scala/Spark program with IDEA and Maven
Reference: https://blog.csdn.net/xingyx1990/article/details/80752041
(Note: I package the project from the command line: mvn clean compile package)

Step 2: Configure the Maven project to package the Java and Scala code together

Below is the pom.xml of my Spark Maven project.
(This project is only used for a Spark historical batch-processing test; adapt the details to your own situation.)

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>cetcocean</groupId>
  <artifactId>SparkTest_Tank</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.1.2</spark.version>
    <hadoop.version>2.7.2</hadoop.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <!-- Build tooling: package the jar -->
  <build>
    <!-- <sourceDirectory>src/main/java</sourceDirectory>
         <testSourceDirectory>src/test/scala</testSourceDirectory> -->
    <plugins>
      <!-- (start) for package jar with dependencies -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <archive>
            <manifest>
              <mainClass>cetcocean.TestLoadDbData</mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <!-- (end) for package jar with dependencies -->

      <!-- Maven plugin that compiles the Scala sources -->
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.8</arg>
          </args>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>
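The assembly manifest above points at cetcocean.TestLoadDbData, which this post never shows. As a rough stand-in (the class name comes from the pom, but the logic below is my guess, not the author's code), the shape of a typical Spark batch job can be sketched with plain Scala collections, whose flatMap/map calls mirror the RDD API one-to-one, so the snippet runs without a cluster:

```scala
// Hypothetical sketch: mirrors the flatMap/map/reduceByKey shape of a
// Spark batch job using plain Scala collections (no cluster needed).
// In the real job you would build a SparkSession and call the
// same-named methods on an RDD/Dataset instead.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // rdd.flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(w => (w, 1))           // rdd.map(w => (w, 1))
      .groupBy(_._1)              // rdd.reduceByKey(_ + _)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("spark on yarn", "spark scala"))
    println(counts("spark")) // prints 2
  }
}
```

The point of the one-to-one mapping is that logic like this can be unit-tested on plain collections before it is packaged into the fat jar and submitted to YARN.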

Reference: https://blog.csdn.net/lukabruce/article/details/81335205

Step 3: Submit the Spark application to the YARN cluster
Reference: https://www.cnblogs.com/LHWorldBlog/p/8414342.html
My command:

./spark-submit \
  --master yarn \
  --deploy-mode client \
  --queue "spark" \
  --class cetcocean.TestLoadDbData \
  SparkTest_Tank-1.0-SNAPSHOT-jar-with-dependencies.jar

What is still missing: tuning parameters for spark-submit should be added.
For example:

./spark-submit \
  --master yarn \
  --deploy-mode client \
  --queue "spark" \
  --num-executors 48 \
  --driver-memory 2g \
  --executor-memory 7g \
  --executor-cores 3 \
  --class cetcocean.TestLoadDbData \
  SparkTest_Tank-1.0-SNAPSHOT-jar-with-dependencies.jar

Where:
  --deploy-mode client    # client deploy mode on YARN
  --queue "spark"         # queue name; used under YARN
  --num-executors 48      # number of executors to launch; default is 2; used under YARN
  --driver-memory 2g      # 1-2 GB is enough; the driver generally only coordinates and dispatches tasks
  --executor-memory 7g    # memory per executor
  --executor-cores 3      # cores per executor; used under YARN or standalone
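A quick way to sanity-check what the flags above ask of the cluster is to total them up. The sketch below uses the numbers from the example; the per-executor overhead formula is the Spark 2.x-on-YARN default (max(384 MB, 10% of executor memory)):

```scala
// Totals requested from YARN by the submission above (sketch).
object SubmitResources {
  // Total cores = executors * cores per executor
  def totalCores(numExecutors: Int, coresPerExecutor: Int): Int =
    numExecutors * coresPerExecutor

  // Total memory in MB, including the default per-executor YARN overhead
  // in Spark 2.x: max(384 MB, 10% of executor memory)
  def totalMemoryMb(numExecutors: Int, executorMemGb: Int): Int = {
    val heapMb     = executorMemGb * 1024
    val overheadMb = math.max(384, heapMb / 10)
    numExecutors * (heapMb + overheadMb)
  }

  def main(args: Array[String]): Unit = {
    println(totalCores(48, 3))    // prints 144
    println(totalMemoryMb(48, 7)) // prints 378432
  }
}
```

So this submission asks YARN for roughly 144 cores and about 370 GB of memory in total; make sure the "spark" queue actually has that capacity, or YARN will only grant part of it.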