Null value appeared in non-nullable field java.lang.NullPointerException

Error

Null value appeared in non-nullable field
java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
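The distinction the message points at can be seen in plain Scala: scala.Long (like Java's long) is a primitive and can never hold null, while java.lang.Integer, java.lang.Long, and Option[_] are reference types that can represent a missing value. A minimal illustration:

```scala
// A boxed java.lang.Long may legally be null...
val boxed: java.lang.Long = null
// ...and Option(...) turns that null into None safely.
val opt: Option[Long] = Option(boxed).map(_.longValue)
// val primitive: Long = null   // does not compile: scala.Long is a primitive
```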

dataset schema

root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)

Cause

A field whose type cannot hold null was assigned a null value.

Solutions

1. Filter out the records where these fields are null

2. Declare the fields with a nullable type (e.g. Option[_] or java.lang.Long)
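Both solutions can be sketched in plain Scala using an Option-typed field; the Person case class matches the fix shown later, and the sample rows here are illustrative assumptions:

```scala
// Record with a nullable age, declared via Option (solution 2)
case class Person(name: String, age: Option[Long], stat: String)

// Hypothetical sample data: one row with a missing age
val people = Seq(Person("xyz", None, "s"), Person("abc", Some(42L), "t"))

// Solution 1: filter out records where the field is missing
val withAge = people.filter(_.age.isDefined)

// Solution 2: keep the nullable type and handle absence explicitly
val over30 = people.filter(_.age.exists(_ > 30))
```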

Example

val path: String = ???

val peopleDF = spark.read
  .option("inferSchema","true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv(path)

peopleDF.printSchema

Output:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
 |-- stat: string (nullable = true)

peopleDF.where($"age".isNull).show

Output:

+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null|   s|
+----+----+----+

Next, convert the Dataset[Row] to a Dataset[Person], where Person is declared as case class Person(name: String, age: Long, stat: String)

val peopleDS = peopleDF.as[Person]

peopleDS.printSchema

Run the following code

peopleDS.where($"age" > 30).show

Result

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+

No error here: the SQL-style filter treats null as a valid value. Comparing a null age with > 30 yields null (unknown), and where simply drops those rows.
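That three-valued logic can be modeled in plain Scala; this is a hypothetical sketch of the semantics, not Spark's implementation:

```scala
// Model of SQL three-valued logic: a null operand makes the comparison
// result unknown (None), and WHERE keeps a row only when the predicate
// is definitely true.
def sqlGt(age: Option[Long], bound: Long): Option[Boolean] = age.map(_ > bound)

val rows = Seq(("xyz", None: Option[Long]), ("abc", Some(40L)))
val kept = rows.filter { case (_, age) => sqlGt(age, 30L).contains(true) }
// the null-aged row is silently dropped rather than raising an exception
```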

Run the following code

peopleDS.filter(_.age > 30)

This raises the error above.

The reason is that Scala's Long is a primitive type and cannot hold null, so deserializing the row with a null age into Person fails.

Solution: use Option

case class Person(name: String, age: Option[Long], stat: String)
peopleDS.filter(_.age.map(_ > 30).getOrElse(false))

Result

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
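A slightly more idiomatic form of the same Option-based predicate uses exists instead of map/getOrElse; this plain-Scala sketch shows the equivalence (sample data is illustrative):

```scala
case class Person(name: String, age: Option[Long], stat: String)

// One row mirroring the example above: its age is missing
val people = Seq(Person("xyz", None, "s"))

// exists is false on None, so null-aged rows are dropped without an NPE;
// equivalent to _.age.map(_ > 30).getOrElse(false)
val over30 = people.filter(_.age.exists(_ > 30))
```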