[Spark Basics] -- Option names for Spark's built-in data sources

The following options are supported in Spark 2.1.0 and later:

--------- JDBC’s options  ---------
 user
 password
 url
 dbtable
 driver
 partitionColumn
 lowerBound
 upperBound
 numPartitions
 fetchsize
 truncate
 createTableOptions
 batchsize
 isolationLevel
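
As an illustration of how the JDBC options above fit together, here is a minimal PySpark-style sketch of a partitioned read. The helper names `jdbc_read_options` and `read_jdbc`, the default fetch size, and any URL, table, or column values passed in are hypothetical; `spark` is assumed to be an existing SparkSession. `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` are set together because Spark uses them jointly to split the read into ranges.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column, lower_bound, upper_bound,
                      num_partitions, fetch_size=1000):
    """Build the option map for a partitioned JDBC read (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # These four options are supplied together to split the read
        # into ranges of partition_column.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Number of rows requested from the driver per round trip.
        "fetchsize": str(fetch_size),
    }


def read_jdbc(spark, options):
    """Load a DataFrame through the JDBC source.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.read.format("jdbc").options(**options).load()
```

The bounds only determine how the partition ranges are cut; they are not a row filter, so rows outside the range are still read.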

--------- CSV’s options  ---------
 path
 sep
 delimiter
 mode
 encoding
 charset
 quote
 escape
 comment
 header
 inferSchema
 ignoreLeadingWhiteSpace
 ignoreTrailingWhiteSpace
 nullValue
 nanValue
 positiveInf
 negativeInf
 compression
 codec
 dateFormat
 timestampFormat
 maxColumns
 maxCharsPerColumn
 escapeQuotes
 quoteAll
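
A short sketch of passing several of the CSV options above to a read, again in PySpark style. The names `CSV_READ_OPTIONS` and `read_csv` and the chosen values are illustrative assumptions, and `spark` is assumed to be an existing SparkSession. In the list above, `sep`/`delimiter`, `encoding`/`charset`, and `compression`/`codec` are understood to be alternative names for the same setting, so only one of each pair is needed.

```python
# Illustrative defaults; the values are assumptions, not recommendations.
CSV_READ_OPTIONS = {
    "sep": ";",                  # field separator
    "header": "true",            # first line holds the column names
    "inferSchema": "true",       # sample the data to guess column types
    "nullValue": "NA",           # string that should be read as null
    "mode": "DROPMALFORMED",     # skip rows that cannot be parsed
    "timestampFormat": "yyyy-MM-dd HH:mm:ss",
}


def read_csv(spark, path, options=None):
    """Read a CSV file, letting the caller override the defaults above.

    `spark` is assumed to be an existing SparkSession.
    """
    merged = dict(CSV_READ_OPTIONS)
    merged.update(options or {})
    return spark.read.format("csv").options(**merged).load(path)
```

Copying the defaults before merging keeps one call's overrides from leaking into the next.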

--------- JSON’s options ---------
 path
 samplingRatio
 primitivesAsString
 prefersDecimal
 allowComments
 allowUnquotedFieldNames
 allowSingleQuotes
 allowNumericLeadingZeros
 allowNonNumericNumbers
 allowBackslashEscapingAnyCharacter
 compression
 mode
 columnNameOfCorruptRecord
 dateFormat
 timestampFormat
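
The JSON options above are mostly parser switches. The sketch below shows one lenient configuration that keeps unparseable records instead of failing; the helper names `lenient_json_options` and `read_json_leniently` and the column name `_corrupt_record` passed as a default are assumptions for illustration, and `spark` is assumed to be an existing SparkSession.

```python
def lenient_json_options(corrupt_column="_corrupt_record"):
    """Option map for a forgiving JSON read (hypothetical helper)."""
    return {
        # Keep malformed records and store their raw text in corrupt_column.
        "mode": "PERMISSIVE",
        "columnNameOfCorruptRecord": corrupt_column,
        # Accept input that strict JSON would reject.
        "allowComments": "true",
        "allowSingleQuotes": "true",
        "allowUnquotedFieldNames": "true",
    }


def read_json_leniently(spark, path):
    """Apply the options one by one with `option`, then load the path."""
    reader = spark.read.format("json")
    for name, value in lenient_json_options().items():
        reader = reader.option(name, value)
    return reader.load(path)
```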

--------- Parquet’s options  ---------
 path
 compression
 mergeSchema
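
For Parquet, `mergeSchema` matters when reading and `compression` when writing. A minimal sketch of both, with hypothetical helper names; `spark` and `df` are assumed to be an existing SparkSession and DataFrame:

```python
def read_parquet_merged(spark, path):
    """Read Parquet files and merge the schemas found across them."""
    return spark.read.option("mergeSchema", "true").parquet(path)


def write_parquet_compressed(df, path, codec="snappy"):
    """Write a DataFrame as Parquet using the given compression codec."""
    df.write.option("compression", codec).parquet(path)
```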

--------- ORC’s options ---------
 path
 compression
 orc.compress

--------- FileStream’s options ---------
 path
 maxFilesPerTrigger
 maxFileAge
 latestFirst
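
The FileStream options apply to streaming reads from a directory. Below is a minimal sketch using the text format, which has a fixed schema; the helper name `stream_text_files` and its default values are assumptions, and `spark` is assumed to be an existing SparkSession.

```python
def stream_text_files(spark, directory, max_files_per_trigger=10,
                      latest_first=False, max_file_age="7d"):
    """Return a streaming DataFrame over text files in `directory`."""
    return (
        spark.readStream.format("text")
        # Upper bound on new files picked up in each micro-batch.
        .option("maxFilesPerTrigger", str(max_files_per_trigger))
        # Whether the newest files are processed first.
        .option("latestFirst", str(latest_first).lower())
        # Files older than this are ignored.
        .option("maxFileAge", max_file_age)
        .load(directory)
    )
```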

--------- Text’s options ---------
 path 
 compression

--------- LibSVM’s options ---------
 path
 vectorType 
 numFeatures

Note: before Spark 2.1.0, all of these option names were case-sensitive.
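
To make the difference concrete, here is a plain-Python illustration of case-insensitive option lookup. It is only a sketch of the idea and not Spark's own implementation; the function names are hypothetical. It assumes, as the note implies, that from 2.1.0 on a name such as `inferschema` is treated the same as `inferSchema`.

```python
def normalize_option_keys(options):
    """Lower-case every option name so lookups can ignore case."""
    return {key.lower(): value for key, value in options.items()}


def get_option(options, name, default=None):
    """Look up `name` in `options` without regard to letter case."""
    return normalize_option_keys(options).get(name.lower(), default)


user_options = {"InferSchema": "true", "HEADER": "false"}
infer = get_option(user_options, "inferSchema")      # found despite the casing
header = get_option(user_options, "header")          # found despite the casing
missing = get_option(user_options, "sep", ",")       # falls back to the default
```

With a case-sensitive lookup, as described for versions before 2.1.0, the first two lookups would miss and the option would silently fall back to its default.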

Reference: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/DataFrameReader.html