
Using spark.yarn.jars and spark.yarn.archive

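When a Spark job is started on YARN with neither spark.yarn.archive nor spark.yarn.jars configured, you will see the Spark jars being uploaded again and again, which takes a long time. Setting spark.yarn.archive greatly reduces job startup time. The whole procedure is as follows: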

1. Create a zip file locally

~/env/spark$ cd jars/
~/env/spark/jars$ zip spark2.1.1-hadoop2.7.3.zip ./*
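Spark's documentation states that the archive given to spark.yarn.archive should contain the jar files in its root directory, which is why the zip is built from inside jars/ rather than from its parent. A minimal sketch of a check for that layout follows; the function name entries_at_root is hypothetical, and the usage line assumes the Info-ZIP unzip tool is installed.

```shell
# Hypothetical helper: reads archive entry names on stdin, one per line,
# and fails if any entry lives inside a subdirectory (its name contains '/').
entries_at_root() {
    if grep -q '/'; then
        return 1
    fi
    return 0
}

# Usage (assumes Info-ZIP unzip; -Z1 prints one entry name per line):
#   unzip -Z1 spark2.1.1-hadoop2.7.3.zip | entries_at_root && echo "jars are at the archive root"
```

Reading the names from stdin keeps the check independent of the tool that lists the archive, so any lister that prints one entry name per line can feed it.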

2. Upload it to HDFS and change its permissions

~/env/spark$ hdfs dfs -mkdir /tmp/spark-archive
~/env/spark$ hdfs dfs -put ./jars/spark2.1.1-hadoop2.7.3.zip /tmp/spark-archive
~/env/spark$ hdfs dfs -chmod 775 /tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip

3. Configure spark-defaults.conf

spark.yarn.archive hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
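The same property can also be passed for a single job with spark-submit's --conf flag, which is handy for trying the archive before changing the cluster-wide defaults. The wrapper below is only a sketch: the function name submit_with_archive and the SPARK_SUBMIT override variable are hypothetical, and the class and jar names in the example are placeholders.

```shell
# Hypothetical wrapper: submit one job to YARN with an explicit spark.yarn.archive.
# SPARK_SUBMIT may point at a specific binary; it defaults to spark-submit on PATH.
submit_with_archive() {
    archive_uri=$1
    shift
    # Every remaining argument is passed to spark-submit unchanged.
    "${SPARK_SUBMIT:-spark-submit}" \
        --master yarn \
        --conf "spark.yarn.archive=${archive_uri}" \
        "$@"
}

# Example (placeholder class and jar):
#   submit_with_archive hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip \
#       --class com.example.App app.jar
```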

For reference, the submission log then shows the archive being reused instead of uploaded:

17/08/10 14:59:27 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/08/10 14:59:27 INFO Client: Uploading resource file:/etc/security/keytabs/hive.service.keytab -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/hive.service.keytab
17/08/10 14:59:27 INFO Client: Source and destination file systems are the same. Not copying hdfs:/tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
17/08/10 14:59:27 INFO Client: Uploading resource file:/home/hzlishuming/env/spark-2.1.1/local/spark-6606333c-1e5b-462c-ad39-aaf75251c246/__spark_conf__2962372142699552959.zip -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/__spark_conf__.zip
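The line to look for in this log is "Source and destination file systems are the same. Not copying", which indicates that a resource already on the cluster file system was left in place. A minimal sketch of a check for it in a saved client log follows; the function name archive_was_reused and the file name submit.log are hypothetical, and the usage line assumes the client log is written to stderr.

```shell
# Hypothetical helper: succeeds if the given client log contains the
# "Not copying" message, meaning some resource was reused rather than uploaded.
archive_was_reused() {
    # -F treats the pattern as a fixed string, so the '.' is matched literally.
    grep -qF 'Source and destination file systems are the same. Not copying' "$1"
}

# Usage (assumes the client logs to stderr; submit.log is a placeholder name):
#   spark-submit ... 2> submit.log
#   archive_was_reused submit.log && echo "archive on HDFS was reused"
```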