Analyzing Beijing points-based household registration (積分落戶) data with Spark: analysis by user age
阿新 • Published: 2018-12-15
Load the CSV file that was produced earlier by parsing the JSON-formatted data and saving the result.
Analysis by user age
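# `spark` is the SparkSession (predefined in the pyspark shell)
# Read the CSV, treating the first row as column names
df = spark.read.format("csv").option("header", "true").load("jifenluohu.csv")
# df.show()
df.createOrReplaceTempView("jflh")

# Group by age: characters 7-10 of idCard hold the birth year, so age = 2018 - birth year.
# substring() returns a string, which Spark casts to double for the subtraction,
# which is why ages print as 42.0 rather than 42.

# Ordered by count, descending
spark.sql("select 2018-substring(idCard,7,4) as age,count(*) as num from jflh group by age order by num desc").show(30)

# Ordered by age, ascending
spark.sql("select 2018-substring(idCard,7,4) as age,count(*) as num from jflh group by age order by age asc").show(30)

Output. The first table is the total number of records (6019, which equals the sum of the num column in the tables that follow); the query that produced it is not included in the code above.

+----+
| num|
+----+
|6019|
+----+

Ordered by count, descending:

+----+---+
| age|num|
+----+---+
|42.0|813|
|41.0|799|
|40.0|773|
|43.0|757|
|44.0|586|
|39.0|507|
|45.0|507|
|46.0|378|
|38.0|302|
|47.0|238|
|37.0|162|
|36.0|109|
|35.0| 39|
|34.0| 13|
|49.0|  9|
|54.0|  5|
|48.0|  4|
|51.0|  4|
|52.0|  3|
|33.0|  3|
|53.0|  2|
|50.0|  1|
|60.0|  1|
|58.0|  1|
|59.0|  1|
|57.0|  1|
|55.0|  1|
+----+---+

Ordered by age, ascending:

+----+---+
| age|num|
+----+---+
|33.0|  3|
|34.0| 13|
|35.0| 39|
|36.0|109|
|37.0|162|
|38.0|302|
|39.0|507|
|40.0|773|
|41.0|799|
|42.0|813|
|43.0|757|
|44.0|586|
|45.0|507|
|46.0|378|
|47.0|238|
|48.0|  4|
|49.0|  9|
|50.0|  1|
|51.0|  4|
|52.0|  3|
|53.0|  2|
|54.0|  5|
|55.0|  1|
|57.0|  1|
|58.0|  1|
|59.0|  1|
|60.0|  1|
+----+---+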
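The age calculation in the SQL above can be sketched in plain Python to show what `2018-substring(idCard,7,4)` does. This is only an illustrative sketch, not the post's own code: the function name `age_from_id_card` and the sample ID strings are made up for the example, and it assumes `idCard` follows the 18-character resident ID layout in which characters 7-10 are the birth year.

```python
from collections import Counter

REFERENCE_YEAR = 2018  # the same constant the SQL subtracts from


def age_from_id_card(id_card: str, reference_year: int = REFERENCE_YEAR) -> int:
    """Return reference_year minus the birth year held in characters 7-10."""
    # SQL substring(idCard, 7, 4) is 1-based; the slice [6:10] is the 0-based equivalent.
    birth_year = int(id_card[6:10])
    return reference_year - birth_year


# Hypothetical, made-up ID strings (not real records); only characters 7-10 matter here.
sample_ids = [
    "110101197603150000",
    "110101197611020000",
    "110101197707080000",
    "110101198501010000",
]

# Equivalent of "group by age" with count(*)
age_counts = Counter(age_from_id_card(i) for i in sample_ids)

# Equivalent of "order by num desc"
by_count = sorted(age_counts.items(), key=lambda kv: kv[1], reverse=True)
# Equivalent of "order by age asc"
by_age = sorted(age_counts.items())

print(by_count)
print(by_age)
```

Because the birth year is converted with `int()`, the ages here are integers. In the Spark query, wrapping the substring in `cast(... as int)` should likewise give `42` instead of `42.0`. Note that both versions use only the birth year, so the month and day of birth are ignored.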