Null value appeared in non-nullable field java.lang.NullPointerException
By 阿新 • Published: 2019-08-27
The error
Null value appeared in non-nullable field java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
Dataset schema:
root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
Cause
Some fields declared as non-nullable were assigned null values.
Solutions
1. Filter out the rows where these fields are null
2. Declare the fields with a nullable type
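The first option can be sketched as follows. This is a minimal, self-contained sketch assuming a local SparkSession; the DataFrame contents and the "age" column name are illustrative, not from the original post:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: drop rows whose non-nullable column is null
// before converting to a typed Dataset (names are assumed).
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

val df = Seq(("abc", Some(35L)), ("xyz", None)).toDF("name", "age")

// Option 1: drop rows with a null "age"
val cleaned = df.na.drop(Seq("age"))
// equivalently: df.where($"age".isNotNull)
```

After this, `cleaned` contains only rows with a non-null age, so decoding it into a case class with a primitive field no longer hits a null.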
Example
val path: String = ???
val peopleDF = spark.read
.option("inferSchema","true")
.option("header", "true")
.option("delimiter", ",")
.csv(path)
peopleDF.printSchema
Output:
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
 |-- stat: string (nullable = true)
peopleDF.where($"age".isNull).show
Output:
+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null| s|
+----+----+----+
Next, convert the Dataset[Row] to a Dataset[Person]:
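The Person case class is not shown in the post; presumably it was first declared with a primitive Long for age, along these lines:

```scala
// Assumed declaration (not shown in the original post):
// age is a primitive Long, so the decoded field can never hold null.
case class Person(name: String, age: Long, stat: String)
```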
val peopleDS = peopleDF.as[Person]
peopleDS.printSchema
Run the following code:
peopleDS.where($"age" > 30).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
SQL treats null as a valid value: the comparison null > 30 evaluates to null, so the row is simply filtered out rather than raising an error.
Run the following code:
peopleDS.filter(_.age > 30)
This throws the error shown above, because Scala's Long is a primitive type and cannot hold null.
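This can be seen outside Spark as well: unboxing a null java.lang.Long into a Scala Long throws the same NullPointerException. A minimal sketch, independent of Spark:

```scala
// Scala's Long is a JVM primitive (long); only the boxed
// java.lang.Long can hold null. Unboxing null throws an NPE,
// which is what Spark's decoder hits when a null column is
// mapped onto a primitive case-class field.
val boxed: java.lang.Long = null
val threw =
  try { val prim: Long = boxed; false }
  catch { case _: NullPointerException => true }
// threw is now true
```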
Solution: declare the field with Option:
case class Person(name: String, age: Option[Long], stat: String)
peopleDS.filter(_.age.map(_ > 30).getOrElse(false))
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
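As a usage note, Option.exists expresses the map(...).getOrElse(false) pattern in one call. A small pure-Scala check of the equivalence:

```scala
// Option.exists(p) is equivalent to map(p).getOrElse(false):
val some: Option[Long] = Some(35L)
val none: Option[Long] = None
val sameForSome = some.exists(_ > 30) == some.map(_ > 30).getOrElse(false)
val sameForNone = none.exists(_ > 30) == none.map(_ > 30).getOrElse(false)
// both comparisons hold
```

With this, the filter above can be written as peopleDS.filter(_.age.exists(_ > 30)).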