Spark in Practice (3): DataFrame Basics, Row and Column Operations, and SQL
阿新 • Published: 2018-12-18
Row and Column Operations
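# `spark` (a SparkSession) and `df` (a DataFrame with `name` and `age` columns) are assumed to exist already
df['age']  # returns only a Column object (an expression, not data)
df.select('age').show()  # returns a DataFrame with one column, so show() can be called on it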
# look at the first two rows
df.head(2)  # returns a list of Row objects
df.select(['age', 'name']).show()  # select two columns
# create a new column
df.withColumn('double_age', df['age'] * 2).show()  # not in place: returns a new DataFrame and leaves df unchanged
# rename a column (this also returns a new DataFrame)
df.withColumnRenamed('age', 'my_new_age').show()
SQL Operations
# very useful if you are already familiar with SQL
# first, register the DataFrame as a temporary view
df.createOrReplaceTempView('people')  # the view is named people
# run a SQL query; the result is a DataFrame
results = spark.sql("SELECT * FROM people")
results.show()
# run another query, this time with a WHERE filter
new_results = spark.sql("SELECT * FROM people WHERE age = 30")
new_results.show()