Spark: errors when building a DataFrame from an RDD
阿新 • Published: 2019-02-11
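Problem 1
The following error is thrown:
Caused by: java.lang.RuntimeException: The 4th field 'action_time' of input row cannot be null.
The action_time field, defined on line 4, is declared as follows:
StructField("action_time", StringType, nullable = false)
That is, the field is declared as non-nullable, but some rows in the data being converted have no value for it, so the conversion fails.
The field index in the message is zero-based, so "4th" here actually refers to the contents of the fifth column.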
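There are two common ways around this error: drop the rows that are missing a required value before the conversion, or declare the column as nullable. The following is a minimal sketch of both, assuming a reduced two-column schema and a local SparkSession. The names NullFieldSketch, dropInvalidRows and keepAllRows are hypothetical and only serve the illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object NullFieldSketch {
  // Reduced schema for illustration only; both columns are non-nullable.
  val strictSchema: StructType = StructType(Array(
    StructField("session_id", StringType, nullable = false),
    StructField("action_time", StringType, nullable = false)
  ))

  // Option 1: keep nullable = false and drop rows that have a null
  // in any non-nullable column before building the DataFrame.
  def dropInvalidRows(spark: SparkSession, rows: RDD[Row]): DataFrame = {
    val required: Array[Int] =
      strictSchema.fields.zipWithIndex.filter(!_._1.nullable).map(_._2)
    val cleaned = rows.filter(row => required.forall(i => !row.isNullAt(i)))
    spark.createDataFrame(cleaned, strictSchema)
  }

  // Option 2: keep every row and declare all columns nullable instead.
  def keepAllRows(spark: SparkSession, rows: RDD[Row]): DataFrame = {
    val relaxed = StructType(strictSchema.fields.map(_.copy(nullable = true)))
    spark.createDataFrame(rows, relaxed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-field-sketch")
      .getOrCreate()

    // Hypothetical sample data: the second row has no action_time.
    val rows = spark.sparkContext.parallelize(Seq(
      Row("s1", "2019-02-11 10:00:00"),
      Row("s2", null)
    ))

    dropInvalidRows(spark, rows).show()
    keepAllRows(spark, rows).show()
    spark.stop()
  }
}
```

Which option fits depends on the data: dropping rows is appropriate when a missing action_time makes the record useless, while relaxing the schema keeps the record and leaves the null handling to later queries.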
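Problem 2
The following error is thrown:
Caused by: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of string
The Spark schema definition does not match the actual data: a java.lang.Long value was supplied for a field declared as StringType, so the conversion fails.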
import org.apache.spark.sql.types.{LongType, StringType, StructField}

// Each value in a Row must have the JVM type its field declares:
// String for StringType, Long for LongType.
val schema = Array(
  StructField("session_id", StringType, nullable = false),
  StructField("user_id", LongType, nullable = false),
  StructField("page_id", LongType, nullable = false),
  StructField("action_time", StringType, nullable = false),
  StructField("search_keyword", StringType, nullable = true),
  StructField("click_category_id", LongType, nullable = true),
  StructField("click_procduct_id", LongType, nullable = true),
  StructField("order_category_ids", StringType, nullable = true),
  StructField("order_product_ids", StringType, nullable = true),
  StructField("pay_category_ids", StringType, nullable = true),
  StructField("pay_product_ids", StringType, nullable = true),
  StructField("date", StringType, nullable = false)
)
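To avoid the type mismatch, each value placed in a Row needs to be converted to the type its column declares at the point where the Row is built. The sketch below illustrates this on a reduced three-column schema. It assumes comma-separated input lines, and the names SchemaTypeSketch and toRow are hypothetical, not part of the original job.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object SchemaTypeSketch {
  // Reduced schema for illustration only.
  val schema: StructType = StructType(Array(
    StructField("session_id", StringType, nullable = false),
    StructField("user_id", LongType, nullable = false),
    StructField("action_time", StringType, nullable = false)
  ))

  // Hypothetical parser: assumes well-formed "session_id,user_id,action_time"
  // lines and gives each value the JVM type that the schema declares.
  def toRow(line: String): Row = {
    val parts = line.split(",", -1)
    Row(
      parts(0),        // StringType -> String
      parts(1).toLong, // LongType   -> Long
      parts(2)         // StringType -> String
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-type-sketch")
      .getOrCreate()

    // Hypothetical sample input.
    val lines = spark.sparkContext.parallelize(Seq("s1,42,2019-02-11 10:00:00"))

    val df = spark.createDataFrame(lines.map(toRow), schema)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

The alternative fix goes the other way: if the data really holds a Long in that position, change the field in the schema to LongType rather than converting the value.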