
Compiling the Spark 2.x Source, with Parameter Notes

Compiling the Spark 2.x source

Here we build with the make-distribution.sh script that ships with the source package. You can, of course, modify some of the source before compiling.
Run the following in the Spark source directory:

./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz -Pyarn,hadoop-provided,hadoop-2.7,parquet-provided -Dscala-2.11 -rf :spark-repl_2.11
./dev/make-distribution.sh --name "hadoop2-hive" --tgz -Pyarn,hive,hadoop-provided,hadoop-2.7,parquet-provided -Dscala-2.11 -rf :spark-repl_2.11

Note that Maven takes a single comma-separated list of profile names after -P, so a second -P or a -D property must not be folded into the quoted profile list. The trailing -rf :spark-repl_2.11 resumes a previously failed build from the spark-repl module and can be dropped for a clean build.
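Before kicking off a full build it helps to give Maven extra heap. The values below follow the common suggestion from Spark's build documentation (make-distribution.sh may also set them itself when they are unset); the numbers are tunable:

```shell
# Give Maven enough heap and JIT code cache for the Spark build; these are
# commonly suggested values and can be raised on larger machines.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

Without enough heap, large modules such as Spark SQL tend to fail mid-build with OutOfMemoryError.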

Parameter notes:
-DskipTests: do not run the test suites, but still compile the test classes into target/test-classes.
-Dhadoop.version and -Phadoop-<x.y>: select the Hadoop version to build against; without them an old default applies (1.0.4 in Spark 1.x; Spark 2.x defaults to a newer Hadoop).
-Pyarn: build Hadoop YARN support; without it YARN is not supported.
-Phive and -Phive-thriftserver: build Hive support into Spark SQL; without them Hive is not supported.
--with-tachyon: build support for the Tachyon in-memory file system (a make-distribution.sh option in older releases); without it Tachyon is not supported.
--tgz: generate spark-$VERSION-bin.tgz in the source root; without it no tgz is produced, only the dist/ directory.

--name: combined with --tgz, produces a spark-$VERSION-bin-$NAME.tgz deployment package; without it NAME defaults to the Hadoop version number.
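The flag list above can be assembled mechanically. A minimal sketch, where WITH_HIVE and HADOOP_PROFILE are hypothetical variable names used only for illustration, not part of Spark's tooling:

```shell
# Sketch: compose the -P profile list from two knobs, then echo (not run)
# the resulting make-distribution.sh invocation.
WITH_HIVE=yes
HADOOP_PROFILE=hadoop-2.7

profiles="yarn,${HADOOP_PROFILE},hadoop-provided,parquet-provided"
if [ "$WITH_HIVE" = yes ]; then
  profiles="${profiles},hive,hive-thriftserver"
fi

echo ./dev/make-distribution.sh --name custom --tgz "-P${profiles}" -DskipTests
```

Echoing the command first is a cheap way to eyeball the profile list before committing to a long build.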

Expect the build to take anywhere from about twenty minutes to over an hour, depending mostly on your network, since a number of dependencies have to be downloaded. You then have a compiled Spark package; unpack it and deploy it to your machines.
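Unpacking and sanity-checking the result can be sketched as follows. The archive name assumes the --name "hadoop2-with-hive" build used later in this post; adjust it to match your own --name choice:

```shell
# Hedged sketch: unpack the built package and print its version.
# Guarded so it is a no-op when the tarball is not present.
tgz=spark-2.0.2-bin-hadoop2-with-hive.tgz
dir=${tgz%.tgz}    # make-distribution.sh names the top-level directory this way
if [ -f "$tgz" ]; then
  tar -xzf "$tgz"
  "$dir"/bin/spark-submit --version   # sanity-check the compiled build
fi
echo "expecting directory: $dir"
```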

Running the following commands generates spark-2.0.2-bin-hadoop2-with-hive.tgz under the spark-2.0.2 directory:

[[email protected] spark-2.0.2]# ./dev/change-scala-version.sh 2.11
[[email protected] spark-2.0.2]# ./dev/make-distribution.sh --name "hadoop2-with-hive" --tgz -Pyarn,hive,hadoop-provided,hadoop-2.7,parquet-provided
   
main:
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [ 19.119 s]
[INFO] Spark Project Tags ................................. SUCCESS [  7.630 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  6.463 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 19.845 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 13.890 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 13.337 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 23.115 s]
[INFO] Spark Project Core ................................. SUCCESS [03:42 min]
[INFO] Spark Project GraphX ............................... SUCCESS [ 26.100 s]
[INFO] Spark Project Streaming ............................ SUCCESS [01:07 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:36 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:26 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 14.402 s]
[INFO] Spark Project ML Library ........................... SUCCESS [02:54 min]
[INFO] Spark Project Tools ................................ SUCCESS [  3.691 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:32 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 11.372 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 16.772 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 27.160 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  5.484 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 22.666 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 22.288 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [  5.101 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 21.637 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 42.329 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [  8.713 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 22.547 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  7.028 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 18.807 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21:43 min
[INFO] Finished at: 2018-01-25T11:13:06+08:00
[INFO] Final Memory: 76M/327M
[INFO] ------------------------------------------------------------------------
+ rm -rf /opt/spark-2.0.2/dist
+ mkdir -p /opt/spark-2.0.2/dist/jars
+ echo 'Spark 2.0.2 built for Hadoop 2.7.3'
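The trailing lines show make-distribution.sh rebuilding dist/ after the Maven reactor finishes. A quick hedged check of the artifacts, guarded so it is harmless outside a freshly built source tree:

```shell
# Count the assembled jars in dist/jars if the build produced them;
# a no-op reporting 0 otherwise.
if [ -d dist/jars ]; then
  njars=$(ls dist/jars | wc -l)
else
  njars=0
fi
echo "jars in dist/: $njars"
```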