Spark SQL Notes (19): Spark SQL Summary (2), DataFrame vs. SQL

1 DataFrame

  • DataFrame = RDD + Schema
  • In the Scala API, DataFrame is just a type alias for Dataset[Row]
  • Advantages of DataFrame over RDD: Catalyst optimization and schemas
  • DataFrame can handle many formats: text, JSON, Parquet, …
  • Both SQL queries and DataFrame API functions are optimized by Catalyst
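
The points above can be sketched in a few lines of Scala. This is a minimal illustration, assuming Spark 2.x running in local mode; the object name, the sample rows, and the view name `people` are made up for the example. It expresses the same query once through the DataFrame API and once through SQL, then prints both plans so they can be compared.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameVsSql {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumption: Spark 2.x on the classpath)
    val spark = SparkSession.builder()
      .appName("DataFrameVsSql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; the column names form the schema
    val people: DataFrame = Seq(("Alice", 29), ("Bob", 17)).toDF("name", "age")

    // DataFrame is an alias for Dataset[Row], so no conversion is needed here
    val rows: Dataset[Row] = people

    // The query expressed with DataFrame API functions
    val adultsApi = people.filter($"age" >= 18).select("name")

    // The same query expressed in SQL against a temporary view
    people.createOrReplaceTempView("people")
    val adultsSql = spark.sql("SELECT name FROM people WHERE age >= 18")

    // Both go through Catalyst; print the physical plans to compare them
    adultsApi.explain()
    adultsSql.explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs is a quick way to check that choosing SQL or the API is a matter of style rather than performance.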

2 Schema

https://spark.apache.org/docs/2.1.3/sql-programming-guide.html#interoperating-with-rdds

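  • Inferred: the schema is derived by reflection from the RDD's element type (for example, a case class)
  • Explicit: the schema is specified programmatically with StructType and applied to an RDD of Row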
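
A minimal sketch of both approaches, following the pattern in the programming guide linked above. The `Person` case class, the object name, and the sample strings are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Defined at top level so Spark can derive an encoder for it by reflection
case class Person(name: String, age: Int)

object SchemaExamples {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaExamples")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical raw input: one "name,age" string per record
    val raw = spark.sparkContext.parallelize(Seq("Alice,29", "Bob,17"))

    // 1) Inferred schema: column names and types come from the case class
    val inferred = raw
      .map(_.split(","))
      .map(a => Person(a(0), a(1).trim.toInt))
      .toDF()
    inferred.printSchema()

    // 2) Explicit schema: build a StructType and apply it to an RDD[Row]
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))
    val rowRdd = raw
      .map(_.split(","))
      .map(a => Row(a(0), a(1).trim.toInt))
    val explicit = spark.createDataFrame(rowRdd, schema)
    explicit.printSchema()

    spark.stop()
  }
}
```

The explicit form is the usual choice when the columns are only known at runtime, since a case class has to be defined at compile time.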

3 Loading & Saving Results

https://spark.apache.org/docs/2.1.3/sql-programming-guide.html#save-modes
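
The save modes described at the link above can be tried with a short sketch like the following. The output path `/tmp/people_parquet`, the object name, and the sample rows are assumptions made for the example; Parquet is used only because it is Spark's default format.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadAndSave {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadAndSave")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data and output location
    val people = Seq(("Alice", 29), ("Bob", 17)).toDF("name", "age")
    val out = "/tmp/people_parquet"

    // Overwrite: replace whatever already exists at the path
    people.write.mode(SaveMode.Overwrite).parquet(out)

    // Append: add the new rows to the existing data
    people.write.mode(SaveMode.Append).parquet(out)

    // Ignore: do nothing because data already exists at the path
    people.write.mode(SaveMode.Ignore).parquet(out)

    // SaveMode.ErrorIfExists is the default and would throw here,
    // since the path exists:
    // people.write.parquet(out)

    // Load the result back; format("parquet").load(out) is the generic form
    val loaded = spark.read.format("parquet").load(out)
    loaded.show()

    spark.stop()
  }
}
```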

4 SQL Function Coverage

SQL coverage

  • SQL:2003 support
  • Runs all 99 TPC-DS benchmark queries
  • Subquery support
  • Vectorized execution
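
As an illustration of the subquery support listed above, here is a minimal sketch assuming Spark 2.x. The views `people` and `vip`, the object name, and their contents are hypothetical. It shows one scalar subquery and one IN subquery.

```scala
import org.apache.spark.sql.SparkSession

object SubqueryExamples {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SubqueryExamples")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical tables registered as temporary views
    Seq(("Alice", 29), ("Bob", 17), ("Carol", 41))
      .toDF("name", "age")
      .createOrReplaceTempView("people")
    Seq("Alice", "Carol")
      .toDF("name")
      .createOrReplaceTempView("vip")

    // Scalar subquery: compare each row against a single aggregated value
    spark.sql(
      """SELECT name, age
        |FROM people
        |WHERE age > (SELECT avg(age) FROM people)""".stripMargin
    ).show()

    // IN subquery: keep only rows whose name appears in another table
    spark.sql(
      """SELECT name
        |FROM people
        |WHERE name IN (SELECT name FROM vip)""".stripMargin
    ).show()

    spark.stop()
  }
}
```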

5 External Data Sources

https://spark-packages.org/

  • RDBMS: requires the JDBC driver JARs on the classpath
  • Parquet, Phoenix, CSV, Avro, …
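
Reading from an RDBMS and from a CSV file might look like the sketch below. Everything specific in it is an assumption: the MySQL URL, the database `testdb`, the table `people`, the driver class, the environment variable names, and the CSV path are placeholders. The JDBC part only runs against a reachable database, with the driver JAR supplied to Spark (for example through `--jars` and `--driver-class-path`).

```scala
import org.apache.spark.sql.SparkSession

object ExternalSources {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalSources")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical connection settings; credentials come from the environment
    val jdbcDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/testdb")
      .option("dbtable", "people")
      .option("user", sys.env.getOrElse("DB_USER", "root"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .option("driver", "com.mysql.jdbc.Driver")
      .load()
    jdbcDf.printSchema()

    // CSV is built in since Spark 2.0; the path is a placeholder
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/people.csv")
    csvDf.printSchema()

    spark.stop()
  }
}
```

Once loaded, both results are ordinary DataFrames, so they can be joined, registered as temporary views, and queried with SQL like any other source.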