Structured Streaming vs. the native Spark Streaming API: reading file data from HDFS
阿新 • Published: 2018-10-31
Published on the NetEase Cloud Community with the authorization of the author, Yue Meng (嶽猛).
Structured Streaming access
Code example
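import org.apache.spark.sql.streaming._

// Treat every new text file that appears under /home/testhdfs as streaming input
val df = spark.readStream.text("/home/testhdfs")
// Print each micro-batch of newly read lines to the console
val ps = df.writeStream.format("console").outputMode(OutputMode.Append).start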
Conclusion
basedir = /home/testhdfs
Supported: mv a file into basedir (/home/testhdfs)
Not supported: mv a directory into basedir
Adding a directory under basedir raises the following ERROR:
java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
    hdfs://172.17.1.180:9000/home/testhdfs/data1
    hdfs://172.17.1.180:9000/home/testhdfs

If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
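The last sentence of the error message suggests loading multiple root directories separately and then unioning them. A minimal sketch of that suggestion follows, assuming a spark-shell session where `spark` is already defined. The two sibling directories `/home/testhdfs_a` and `/home/testhdfs_b` are hypothetical names that are not part of the original test, and both must exist before the query starts. This was not part of the original experiment, and it does not make a directory nested under a single basedir readable.

```scala
import org.apache.spark.sql.streaming._

// Hypothetical sibling directories (assumption, not from the original test).
// Each one is read as its own streaming source.
val dfA = spark.readStream.text("/home/testhdfs_a")
val dfB = spark.readStream.text("/home/testhdfs_b")

// Both sources share the same single-column schema ("value"), so they can be unioned.
val merged = dfA.union(dfB)

// Print newly arrived lines from either directory to the console.
val query = merged.writeStream
  .format("console")
  .outputMode(OutputMode.Append)
  .start()
```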
Spark Streaming access
Testing the textFileStream interface
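import org.apache.spark.streaming._

// Reuse the active StreamingContext if there is one, otherwise create one with a 120-second batch interval
val ssc = StreamingContext.getActiveOrCreate(() => new StreamingContext(sc, Seconds(120)))
// Monitor /home/testhdfs2 for new text files
val ds1 = ssc.textFileStream("/home/testhdfs2")
// Print the first elements of each batch
ds1.print
ssc.start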
Conclusion
Supported: mv a file into basedir (/home/testhdfs2)
Supported: mv a directory into basedir
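The "mv a file into basedir" step in the tests above can also be done from the same spark-shell session through the Hadoop FileSystem API. The sketch below assumes that `sc` is the shell's SparkContext and that the file has already been fully written to a staging location. The staging path `/home/staging/part-0001.txt` is a hypothetical name used only for illustration.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Use the same Hadoop configuration (and default file system) as the Spark session.
val fs = FileSystem.get(sc.hadoopConfiguration)

// Hypothetical staging file (assumption): written completely before the move.
val staged = new Path("/home/staging/part-0001.txt")
// Destination inside the directory watched by textFileStream.
val target = new Path("/home/testhdfs2/part-0001.txt")

// rename is the HDFS equivalent of mv; it returns false if the move did not happen.
val moved: Boolean = fs.rename(staged, target)
println(s"moved = $moved")
```

Writing the file elsewhere and renaming it into place means the stream never sees a half-written file, which is why the tests use mv rather than writing into basedir directly.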
Link: https://www.jianshu.com/p/9eb8ff8f0660