Processing JSON in Scala (arrays nested inside arrays, values with differing data types, and map keys that may not exist)

Tags: scala-spark

Goal:

Handle badly irregular JSON: keys that may not exist, values whose data types vary, and similar problems.
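The program in this post turns any value that is not a string into an empty string. Some data may need values of other scalar types kept instead. The sketch below shows one way to convert them to text. It is an assumption, not part of the original approach, and the names CoerceDemo and asText are hypothetical.

```scala
import play.api.libs.json._

object CoerceDemo {
  // Hypothetical helper (not from the original article): render a looked-up
  // value as text whether it is a JSON string, boolean or number.
  def asText(result: JsLookupResult): String = result.toOption match {
    case Some(JsString(s))  => s
    case Some(JsBoolean(b)) => b.toString
    case Some(JsNumber(n))  => n.toString
    case _                  => "" // missing key, null, object or array
  }

  def main(args: Array[String]): Unit = {
    val entry = Json.parse("""{"name": "Latin", "is_active": true, "completeness": 232}""")
    val row = (asText(entry \ "name"), asText(entry \ "is_active"), asText(entry \ "completeness"))
    println(row) // (Latin,true,232)
  }
}
```

Nested values such as objects and arrays still map to an empty string. Whether that is right depends on the data.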

Topics: processing nested JSON, processing irregular JSON, pattern matching.
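The pattern-matching idea used throughout this post can be shown in isolation. In play-json, validate[String] returns a JsError in three cases:

- the key is missing,
- the value is not a string,
- the value being looked into is not an object at all.

A single match therefore covers all three. The names ValidateDemo and stringOrEmpty below are hypothetical.

```scala
import play.api.libs.json._

object ValidateDemo {
  // Hypothetical helper: "" whenever the lookup or the type check fails.
  def stringOrEmpty(js: JsValue, key: String): String =
    (js \ key).validate[String] match {
      case JsSuccess(v, _) => v
      case JsError(_)      => ""
    }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      Json.parse("""{"name": "English"}"""),  // key present, string value
      Json.parse("""{"aa": "distractor"}"""), // key missing
      Json.parse("""{"name": 42}"""),         // key present, wrong type
      JsString("distractor string")           // not an object at all
    )
    // Prints [English], then [] three times
    samples.foreach(js => println(s"[${stringOrEmpty(js, "name")}]"))
  }
}
```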

Reference documentation (play-json): https://www.playframework.com/documentation/2.8.x/ScalaJson

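pom.xml dependency (the scala.binary.version property, e.g. 2.12, must be defined in the pom's <properties> section)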

        <!-- https://mvnrepository.com/artifact/com.typesafe.play/play-json -->
        <dependency>
            <groupId>com.typesafe.play</groupId>
            <artifactId>play-json_${scala.binary.version}</artifactId>
            <version>2.7.4</version>
        </dependency>
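A project that builds with sbt instead of Maven can presumably declare the same dependency in build.sbt. The %% operator appends the Scala binary version, playing the role of ${scala.binary.version} in the pom.

```scala
// build.sbt: %% appends the Scala binary version (e.g. _2.12) to the artifact name
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.7.4"
```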

Example

package JsonParse

// Uses the play-json module. Documentation recommended on Stack Overflow:
// https://www.playframework.com/documentation/2.8.x/ScalaJson
import play.api.libs.json._

import scala.collection.mutable.ListBuffer

object JsonTest {
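  /*
  Goal: collect (name, is_active, completeness) triples from the data.
  Any missing field is filled with an empty string.
  Note: the raw data is irregular. Arrays are nested (each element of the "root" array holds a
  "languages" array), and value types differ (some entries under "languages" are maps, others are strings).

  Core idea: use match together with validate to check both that a key exists and that its value has
  the expected type. Anything that does not match becomes an empty array (or an empty string), and a
  final filter removes the rows produced by those empty values.
   */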


  def main(args: Array[String]): Unit = {

    val jsonString =
      """
         {
         "root":[
      {
        "languages": [
        {
            "name": "English",
            "is_active": "true",
            "completeness": "asdf"
        },
        {"aa":"a map added as a distractor"}
        ,
        {
            "name": "Latin",
            "is_active": "asdf",
            "completeness": "232"
        }
            ,{
                "name": "Latin",
                "is_active": "0009"
            }
                  ]
      },
       {
        "languages": [
        {
            "name": "English1",
            "is_active": "true1",
            "completeness": "asdf1"
        },
        {
            "name": "Latin1",
            "is_active": "asdf1",
            "completeness": "2321"
        },
        "a string added as a distractor"
        ,
        {
                "name": "Latin1",
                "is_active": "00091"
        }
         ]
      },
      {
      "notLanguage":"some maps have no languages key"
      }
]
}
    """.stripMargin

    val listb: ListBuffer[Tuple3[String, String, String]] = ListBuffer.empty

    val json = Json.parse(jsonString)
    val list1 = (json \ "root").as[Seq[JsValue]]
    list1.foreach(
      root2list => {

        val list0 = (root2list \ "languages").validate[JsArray] match {
          // If "languages" is missing or is not an array, fall back to an empty JsArray
          case JsSuccess(v, p) => v
          case _ => JsArray.empty
        }
        // Convert the JsArray to a Seq[JsValue]
        val list = list0.as[Seq[JsValue]]


        val names = list.map(x => ((x \ "name").validate[String] match {
          case JsSuccess(v, p) => v
          case _ => ""
        }
          ))

        val isActives = list.map(x => ((x \ "is_active").validate[String] match {
          case JsSuccess(v, p) => v
          case _ => ""
        }
          ))

        val completeness = list.map(x => ((x \ "completeness").validate[String] match {
          case JsSuccess(v, p) => v
          case _ => ""
        }
          ))

        // Zip the three columns back into (name, is_active, completeness) rows
        val res = for (idx <- list.indices) yield (names(idx), isActives(idx), completeness(idx))
        listb ++= res

      }
    )


    // Drop rows whose name is empty (these come from the distractor entries)
    println(listb.filter(_._1.nonEmpty))


  }

}
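The same idea can be written more compactly. The sketch below is not the author's code, and the names JsonTestCompact, field and extract are hypothetical. It differs from the program above in three ways:

- It uses asOpt, which returns None both for a missing key and for a type mismatch, instead of an explicit validate/match.
- It uses a for-comprehension instead of the index loop.
- It does not throw when "root" itself is missing; the original calls as[Seq[JsValue]], which does.

```scala
import play.api.libs.json._

object JsonTestCompact {
  // String at `key`, or "" if the key is missing or the value is not a string.
  private def field(js: JsValue, key: String): String =
    (js \ key).asOpt[String].getOrElse("")

  def extract(jsonString: String): List[(String, String, String)] = {
    val json = Json.parse(jsonString)
    // A missing or non-array "root" yields no rows.
    val roots = (json \ "root").asOpt[Seq[JsValue]].getOrElse(Seq.empty)
    val rows = for {
      root <- roots
      // A missing or non-array "languages" contributes no entries.
      lang <- (root \ "languages").asOpt[Seq[JsValue]].getOrElse(Seq.empty)
    } yield (field(lang, "name"), field(lang, "is_active"), field(lang, "completeness"))
    // Same final filter as the original: drop rows whose name is empty.
    rows.filter(_._1.nonEmpty).toList
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """{"root":[{"languages":[{"name":"English","is_active":"true","completeness":"asdf"},
        |{"aa":"x"},"s",{"name":"Latin","is_active":"0009"}]},{"notLanguage":"x"}]}""".stripMargin
    println(extract(sample)) // List((English,true,asdf), (Latin,0009,))
  }
}
```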

Output of JsonTest.main:

ListBuffer((English,true,asdf), (Latin,asdf,232), (Latin,0009,), (English1,true1,asdf1), (Latin1,asdf1,2321), (Latin1,00091,))
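play-json can also describe the expected shape of one "languages" entry once, as a Reads. The sketch below is an alternative, not the method used in this post; the object LanguageReadsDemo and the case class Language are hypothetical. It works as follows:

- name is required, so entries without a string name (the distractor map and the distractor string) fail validation and are skipped. This plays the role of the final filter.
- is_active and completeness use readNullable, so a missing key becomes None.

There is one behavioral difference from the original. If is_active or completeness is present but not a string, this version drops the whole entry, whereas the original keeps it with an empty field. On the sample data in this post it should yield the same six triples, as a List.

```scala
import play.api.libs.json._
import play.api.libs.functional.syntax._

object LanguageReadsDemo {
  // Hypothetical model of one entry in a "languages" array.
  case class Language(name: String, isActive: Option[String], completeness: Option[String])

  // "name" is required; the other two fields become None when the key is absent.
  implicit val languageReads: Reads[Language] = (
    (JsPath \ "name").read[String] and
    (JsPath \ "is_active").readNullable[String] and
    (JsPath \ "completeness").readNullable[String]
  )(Language.apply _)

  def extract(jsonString: String): List[(String, String, String)] = {
    val roots = (Json.parse(jsonString) \ "root").asOpt[Seq[JsValue]].getOrElse(Seq.empty)
    val languages = for {
      root  <- roots
      entry <- (root \ "languages").asOpt[Seq[JsValue]].getOrElse(Seq.empty)
      lang  <- entry.validate[Language].asOpt // entries that fail validation are skipped
    } yield lang
    languages.map(l => (l.name, l.isActive.getOrElse(""), l.completeness.getOrElse(""))).toList
  }
}
```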