如何將Column.isin與List使用(判斷column中的值是否在List中)--filter(Column.isin(List))
阿新 • • 發佈:2018-11-19
spark datafream 中某列的值進行過濾
val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
.filter($"c1".isin(items))
.collect
.foreach(println)
直接傳入list時報錯:
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(a, b, c) at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49) at org.apache.spark.sql.functions$.lit(functions.scala:89) at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642) at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.Column.isin(Column.scala:642)
根據文件,isin
採取vararg,而不是列表。List在這裡實際上是一個別名。你可以嘗試將List轉換為vararg,如下所示:
val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
.filter($"c1".isin(items:_*))
.collect
.foreach(println)