Spark in Action (4): DataFrame Basics - Filtering Data
阿新 • Published: 2018-11-03
Writing filter, style 1
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('appe_stock.csv', inferSchema=True, header=True)
df.printSchema()
df.show()
# The first way: pass a SQL-style condition string
df.filter("Close < 500").show()
df.filter("Close < 500").select('Open').show()
df.filter("Close < 500").select(['Open','Close']).show()
Writing filter, style 2
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('appe_stock.csv', inferSchema=True, header=True)
df.printSchema()
df.show()
# The second way: pass Column expressions
df.filter(df['Close'] < 500).select('Volume').show()
df.filter(df['Close'] < 200 and df['Open'] > 200).show() # wrong: Python's `and` cannot combine Column objects and raises an error
df.filter((df['Close'] < 200) & (df['Open'] > 200)).show() # right: use & and wrap each condition in parentheses
Condition operators
# negation: ~ inverts a condition
df.filter((df['Close'] < 200) & ~(df['Open'] > 200)).show()
# equality: use == (a single = would be assignment)
df.filter(df['Low'] == 197.16).show()
Retrieving results
# show() only prints; to keep the result, use collect()
result = df.filter(df['Low'] == 197.16).collect()
# each element is a Row, which asDict() converts to a plain dict
result[0].asDict()
# from which a specific field can be read
result[0].asDict()['Volume']