
Spark: errors when constructing a DataFrame from an RDD

Problem 1

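The following error occurs.
The action_time field, defined on line 4, is declared as:
 StructField("action_time", StringType, nullable = false)
In other words, the field must not be null, but some of the converted records have no value for it, so Spark raises an error:
Caused by: java.lang.RuntimeException: The 4th field 'action_time' of input row cannot be null.
The "4th" here is a zero-based field index (Spark counts fields from 0), so it actually refers to the fifth column.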
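To make the zero-based numbering concrete, here is a small Spark-free sketch in Scala. `NullCheckSketch`, its `Field` type and the column names `col0` to `col3` are hypothetical; the helper only imitates the null check Spark applies when it encodes a `Row` against fields declared with `nullable = false`, and it is not a Spark API.

```scala
// Hypothetical, Spark-free sketch: imitates the check behind
// "The Nth field '...' of input row cannot be null." It is not a Spark API.
object NullCheckSketch {
  // Simplified stand-in for StructField: only the name and nullability matter here.
  final case class Field(name: String, nullable: Boolean)

  // Returns a Spark-style message for the first non-nullable field whose value
  // is null, or None if the row passes. `i` is zero-based, as in Spark's message.
  // Assumes `row` has one value per field (zip stops at the shorter sequence).
  def firstNullViolation(fields: Seq[Field], row: Seq[Any]): Option[String] =
    fields.zip(row).zipWithIndex.collectFirst {
      case ((f, null), i) if !f.nullable =>
        s"The ${i}th field '${f.name}' of input row cannot be null. " +
          s"(column ${i + 1} when counting from 1)"
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical layout in which action_time is the fifth column.
    val fields = Seq(
      Field("col0", nullable = false),
      Field("col1", nullable = false),
      Field("col2", nullable = false),
      Field("col3", nullable = true),
      Field("action_time", nullable = false)
    )
    // col3 may be null; action_time may not, so this row is rejected.
    println(firstNullViolation(fields, Seq("s1", 1L, 2L, null, null)))
    // prints: Some(The 4th field 'action_time' of input row cannot be null. (column 5 when counting from 1))
  }
}
```

The null in `col3` is skipped because that field is nullable, which mirrors why only non-nullable columns such as `action_time` trigger this error.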

Problem 2

The following error occurs:
Caused by: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of string

The Spark schema definition does not match the data: a column declared as StringType received a java.lang.Long value, so Spark raises an error. The schema in question:

    import org.apache.spark.sql.types._

    val schema = StructType(Array(
      StructField("session_id", StringType, nullable = false),
      StructField("user_id", LongType, nullable = false),
      StructField("page_id", LongType, nullable = false),
      StructField("action_time", StringType, nullable = false),
      StructField("search_keyword", StringType, nullable = true),
      StructField("click_category_id", LongType, nullable = true),
      StructField("click_procduct_id", LongType, nullable = true),
      StructField("order_category_ids", StringType, nullable = true),
      StructField("order_product_ids", StringType, nullable = true),
      StructField("pay_category_ids", StringType, nullable = true),
      StructField("pay_product_ids", StringType, nullable = true),
      StructField("date", StringType, nullable = false)
    ))
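A Spark-free sketch of this type check may help locate the offending column. `TypeCheckSketch`, `StringT` and `LongT` below are hypothetical simplifications (not Spark classes) covering only the two types used in this schema: a `StringType` column expects a `java.lang.String` value and a `LongType` column expects a `java.lang.Long`. The sketch reports mismatches in the style of the error above and shows one possible fix, converting each value to the declared type before building the `Row`. The sample record, with `action_time` arriving as a numeric timestamp, is an illustrative assumption.

```scala
// Hypothetical, Spark-free sketch of the external-type check behind
// "java.lang.Long is not a valid external type for schema of string".
// Only two column types are modelled; this is not a Spark API.
object TypeCheckSketch {
  sealed trait DType { def sqlName: String }
  case object StringT extends DType { val sqlName = "string" } // stands in for StringType
  case object LongT extends DType { val sqlName = "bigint" }   // stands in for LongType

  final case class Field(name: String, dtype: DType, nullable: Boolean)

  // A value is acceptable if its runtime class matches the declared type.
  private def accepts(t: DType, v: Any): Boolean = t match {
    case StringT => v.isInstanceOf[String]
    case LongT   => v.isInstanceOf[java.lang.Long] // Scala Longs are boxed to java.lang.Long inside Seq[Any]
  }

  // One message per non-null value whose runtime type does not match its field.
  def mismatches(fields: Seq[Field], row: Seq[Any]): Seq[String] =
    fields.zip(row).collect {
      case (f, v) if v != null && !accepts(f.dtype, v) =>
        s"${v.getClass.getName} is not a valid external type for schema of ${f.dtype.sqlName} (field '${f.name}')"
    }

  // One possible fix: convert each value to the declared type before building the Row.
  // Strings in a LongT column must hold an integer, otherwise toLong throws NumberFormatException.
  def coerce(fields: Seq[Field], row: Seq[Any]): Seq[Any] =
    fields.zip(row).map {
      case (_, null)                       => null
      case (Field(_, StringT, _), v)       => v.toString
      case (Field(_, LongT, _), s: String) => s.trim.toLong
      case (_, v)                          => v
    }

  def main(args: Array[String]): Unit = {
    val fields = Seq(
      Field("session_id", StringT, nullable = false),
      Field("user_id", LongT, nullable = false),
      Field("action_time", StringT, nullable = false)
    )
    // Hypothetical bad record: action_time arrives as a Long timestamp.
    val bad = Seq("s1", 42L, 1546300800L)
    mismatches(fields, bad).foreach(println)
    // prints: java.lang.Long is not a valid external type for schema of string (field 'action_time')
    println(mismatches(fields, coerce(fields, bad)).isEmpty) // true
  }
}
```

Converting the data is only one option; if the column really holds numbers, declaring it as `LongType` in the schema fixes the mismatch from the other side.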