Spark Operators: Actions saveAsTextFile, saveAsSequenceFile, and saveAsObjectFile
1. saveAsTextFile
1) def saveAsTextFile(path: String): Unit
2) def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

saveAsTextFile writes an RDD to the file system as a text file.
scala> var rdd1 = sc.makeRDD(1 to 10, 2)
scala> rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/")  // save to HDFS

hadoop fs -ls /tmp/lxw1234.com
Found 2 items
-rw-r--r--   2 lxw1234 supergroup          0 2015-07-10 09:15 /tmp/lxw1234.com/_SUCCESS
-rw-r--r--   2 lxw1234 supergroup         21 2015-07-10 09:15 /tmp/lxw1234.com/part-00000

hadoop fs -cat /tmp/lxw1234.com/part-00000
1
2
3
4
5
// save with a specified compression codec
rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/",classOf[com.hadoop.compression.lzo.LzopCodec]) hadoop fs -ls /tmp/lxw1234.com -rw-r--r-- 2 lxw1234 supergroup 0 2015-07-10 09:20 /tmp/lxw1234.com/_SUCCESS -rw-r--r-- 2 lxw1234 supergroup 71 2015-07-10 09:20 /tmp/lxw1234.com/part-00000.lzo hadoop fs -text /tmp/lxw1234.com/part-00000.lzo 1 2 3 4 5
2. saveAsSequenceFile
saveAsSequenceFile saves an RDD to HDFS in the SequenceFile format; its usage mirrors saveAsTextFile (see the sketch below).
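
Note that saveAsSequenceFile comes from SequenceFileRDDFunctions, so it is only available on pair RDDs whose key and value types can be converted to Hadoop Writables. A minimal sketch, with a hypothetical pair RDD and output path:

scala> val pairs = sc.makeRDD(1 to 10, 2).map(i => (i, "value-" + i))
scala> pairs.saveAsSequenceFile("hdfs://cdh5/tmp/lxw1234.com/seq/")
// Int and String are mapped to IntWritable and Text by Spark's implicit
// Writable converters, so each record is stored as an (IntWritable, Text) pair.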
3. saveAsObjectFile: def saveAsObjectFile(path: String): Unit
saveAsObjectFile serializes the elements of an RDD and saves them to a file. On HDFS it uses the SequenceFile format by default.
scala> var rdd1 = sc.makeRDD(1 to 10, 2)
scala> rdd1.saveAsObjectFile("hdfs://cdh5/tmp/lxw1234.com/")

hadoop fs -cat /tmp/lxw1234.com/part-00000
SEQ !org.apache.hadoop.io.NullWritable"org.apache.hadoop.io.BytesWritableT
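
The counterpart for reading such a file back is sc.objectFile, which deserializes the records into an RDD of the requested element type. A minimal sketch, assuming the path written above:

scala> val restored = sc.objectFile[Int]("hdfs://cdh5/tmp/lxw1234.com/")
scala> restored.collect()   // yields the original elements 1 to 10 (partition order may vary)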