Resolving the BigDecimal precision issue when saving Hive table results to a MySQL table with Spark

Tags: Spark

Problem description:

When converting the Rows of a Hive-result DataFrame to a case class, the precision cast fails:
Cannot up cast xxx from decimal(29,2) to decimal(38,18) as it may truncate

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `zskpje` from decimal(29,2) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "scala.math.BigDecimal", name: "zskpje") - root class: "com.xxx.bean.Inovice_Monthly"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:2292)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$37$$anonfun$applyOrElse$15.applyOrElse(Analyzer.scala:2308)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$37$$anonfun$applyOrElse$15.applyOrElse(Analyzer.scala:2303)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)

Cause:

  1. The Hive query result is converted from Rows to a case class to produce a Dataset:

 val result = DwdDataDao.getmonthlyStatisticsData(sparkSession).as[Inovice_Monthly]

  2. Schema of the result DataFrame:

root
 |-- NSR_SBH: string (nullable = true)
 |-- INVOICE_TYPE: decimal(10,0) (nullable = true)
 |-- TAX_RATE: decimal(18,2) (nullable = true)
 |-- zskpje: decimal(29,2) (nullable = true)
	......

The case class we defined (Inovice_Monthly) declares these fields as BigDecimal, which Spark maps to decimal(38,18) by default.
Up-casting DecimalType(10,0) -> DecimalType(38,18) or DecimalType(29,2) -> DecimalType(38,18) is therefore rejected: decimal(38,18) leaves only 20 digits for the integer part, while decimal(29,2) allows 27, so the cast could truncate large values.

case class Inovice_Monthly(
    NSR_SBH: String,
    INVOICE_TYPE: BigDecimal,
    TAX_RATE: BigDecimal,
    zskpje: BigDecimal,
    ......
)

The Spark developers felt that inferring the schema from Scala types was convenient enough, and chose not to support specifying the precision and scale of Decimal/BigDecimal fields in a case class; see https://issues.apache.org/jira/browse/SPARK-18484
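To see why the analyzer refuses this cast, compare how many integer digits each decimal type can hold. Below is a minimal, Spark-free sketch using only java.math.BigDecimal; the sample value is made up for illustration:

```scala
import java.math.BigDecimal

object DecimalCapacity {
  def main(args: Array[String]): Unit = {
    // decimal(29,2): 29 total digits, 2 fractional -> up to 27 integer digits
    // decimal(38,18): 38 total digits, 18 fractional -> only 20 integer digits
    val integerDigits29_2  = 29 - 2   // 27
    val integerDigits38_18 = 38 - 18  // 20
    println(s"decimal(29,2) holds up to $integerDigits29_2 integer digits, " +
            s"decimal(38,18) only $integerDigits38_18")

    // A legal decimal(29,2) value with 27 integer digits...
    val v = new BigDecimal("123456789012345678901234567.89")
    // ...rescaled to 18 fractional digits needs 27 + 18 = 45 total digits,
    // which overflows the 38-digit limit, so Spark refuses the implicit up cast.
    val rescaled = v.setScale(18)
    println(s"precision after rescaling: ${rescaled.precision()}") // 45 > 38
  }
}
```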

Solution:

I changed the BigDecimal fields in the case class to Double, and cast the corresponding columns of the result set to match:

 import org.apache.spark.sql.types.DoubleType

 result.withColumn("INVOICE_TYPE", result("INVOICE_TYPE").cast(DoubleType))
      .withColumn("TAX_RATE", result("TAX_RATE").cast(DoubleType))
      .withColumn("zskpje", result("zskpje").cast(DoubleType))
		......
      .as[Inovice_Monthly]
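One caveat on this workaround (my own note, not part of the original fix): a Double carries only about 15-17 significant decimal digits, so amounts anywhere near the full decimal(29,2) range can silently lose their low-order digits. A quick Spark-free check:

```scala
object DoublePrecisionCheck {
  def main(args: Array[String]): Unit = {
    // Double has ~15-17 significant decimal digits; a decimal(29,2)
    // amount with 29 significant digits cannot survive the round trip.
    val amount = BigDecimal("123456789012345678901234567.89")
    val roundTripped = BigDecimal(amount.toDouble)
    println(roundTripped == amount) // false: low-order digits were lost
    println(roundTripped - amount)  // the rounding error introduced by Double
  }
}
```

If exact cents matter at that magnitude, an alternative worth considering (not tested here) is to keep BigDecimal in the case class and instead cast the columns explicitly with `.cast(DecimalType(38,18))`, provided the integer part fits in 20 digits.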

The Row-to-case-class conversion now completes without errors.