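Tutorial: Running Spark + Scala in Jupyter Notebook
While debugging Spark in IntelliJ today, I found that every new piece of code meant rerunning everything, and spark-shell is not especially convenient either. Being able to program in Jupyter notebook, as with Python, would be much handier, and it also suits presenting code. I searched online and tried things out, and ran into many pitfalls: some instructions targeted old versions, and others failed because of version mismatches. So here I record the installation process.
1. Environment
Hardware: Mac
Pre-installed: Jupyter notebook, Spark 2.1.0, Scala 2.11.8 (this version matters, because it affects the installation steps below)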
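Before starting, it can help to confirm what is actually installed. The following is a minimal sketch, assuming the three tools are expected on your PATH; report_version is a hypothetical helper name, not part of any of these tools.

```shell
# Hypothetical helper: run "<tool> <version flag>" if the tool is on PATH,
# otherwise say that it is missing.
report_version() {
  if command -v "$1" >/dev/null 2>&1; then
    # Some tools (scala, spark-submit) print their version to stderr.
    "$@" 2>&1
  else
    echo "$1: not found on PATH"
  fi
}

report_version jupyter --version
report_version scala -version
report_version spark-submit --version
```

The Scala version reported by spark-submit is the one that matters for the kernel installation later on.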
2. Installation
2.1. Scala kernel
Clone it from GitHub
git clone https://github.com/jupyter-scala/jupyter-scala.git
Go into the downloaded jupyter-scala directory and run
bash jupyter-scala
Then check
jupyter kernelspec list
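The output shows that Scala has been registered as a kernel in Jupyter notebook
2.2. Spark kernel
This one is also fairly easy to install, but watch out for version issues. We install it through Toree, so Toree has to be installed first
Online tutorials usually just run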
pip install toree
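However, this installs version 0.1.0. The problem with that version is that once the Spark kernel is installed and you run Spark in Jupyter, it defaults to Scala 2.10.4 and fails with the following error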
[I 03:15:16.677 NotebookApp] Kernel started: 94a63354-d294-4de7-a12c-2e05905e0c45
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.1.0.dev8-incubating-SNAPSHOT
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.10.4)
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.2
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
    at akka.actor.RootActorPath.$div(ActorPath.scala:185)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:453)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:192)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:231)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:109)
    at org.apache.toree.boot.layer.StandardBareInitialization$class.createActorSystem(BareInitialization.scala:71)
    at org.apache.toree.Main$$anon$1.createActorSystem(Main.scala:35)
    at org.apache.toree.boot.layer.StandardBareInitialization$class.initializeBare(BareInitialization.scala:60)
    at org.apache.toree.Main$$anon$1.initializeBare(Main.scala:35)
    at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:72)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:40)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at org.apache.toree.Main$.main(Main.scala:24)
    at org.apache.toree.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[W 03:15:26.738 NotebookApp] Timeout waiting for kernel_info reply from 94a63354-d294-4de7-a12c-2e05905e0c45
The error looks scary, but it is simply a version mismatch: the kernel runs Scala 2.10.4, while Spark 2.1.0 is built against Scala 2.11
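The mismatch can be stated mechanically: two Scala 2.x versions are binary compatible only when their first two components agree. A minimal sketch of that comparison, using the two versions from the log and the environment above (scala_binary_version and same_scala_binary are hypothetical helper names):

```shell
# Hypothetical helper: reduce a full Scala version such as 2.11.8
# to its binary version 2.11.
scala_binary_version() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Hypothetical helper: succeed only if both versions share a binary version.
same_scala_binary() {
  [ "$(scala_binary_version "$1")" = "$(scala_binary_version "$2")" ]
}

KERNEL_SCALA=2.10.4   # reported by Toree 0.1.0 in the log above
SPARK_SCALA=2.11.8    # the Scala version used with Spark 2.1.0 here

if same_scala_binary "$KERNEL_SCALA" "$SPARK_SCALA"; then
  echo "compatible"
else
  echo "mismatch: kernel uses Scala $(scala_binary_version "$KERNEL_SCALA"), Spark uses Scala $(scala_binary_version "$SPARK_SCALA")"
fi
```

With these values the script reports a mismatch between 2.10 and 2.11, which is why the kernel dies with NoSuchMethodError at startup.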
So install version 0.2.0 instead, as follows
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
After that, you can install the Spark kernel
jupyter toree install --spark_home=/usr/local/Cellar/apache-spark/2.1.0/libexec --user --kernel_name=apache_toree --interpreters=PySpark,SparkR,Scala,SQL
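Here spark_home is your Spark installation directory. Note that it must be the directory that directly contains Spark's python folder (the python folder shipped inside Spark, not a Python you installed yourself). For example, on my machine that folder sits under /usr/local/Cellar/apache-spark/2.1.0/libexec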
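A quick way to check a candidate directory before passing it to --spark_home is to test for the python subfolder. This is only a sketch: has_spark_python_dir is a hypothetical helper, and the default path is the Homebrew location used in this post, which may differ on your machine.

```shell
# Hypothetical helper: succeed if the directory directly contains
# Spark's bundled python folder.
has_spark_python_dir() {
  [ -d "$1/python" ]
}

# Use SPARK_HOME if it is set; otherwise fall back to the path from this post.
CANDIDATE="${SPARK_HOME:-/usr/local/Cellar/apache-spark/2.1.0/libexec}"

if has_spark_python_dir "$CANDIDATE"; then
  echo "ok: $CANDIDATE contains a python folder"
else
  echo "not usable as spark_home: $CANDIDATE/python does not exist"
fi
```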
Check the result
jupyter kernelspec list
The installation succeeded
3. Open Jupyter notebook and check the result
With all these kernels to choose from, you can now happily use Jupyter notebook for Spark