PySpark error: invalid stream header
阿新 • Published: 2018-11-01
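I hit this error while running a map operation on an RDD. The map simply added one new field indicating whether two of the existing fields were equal.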
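For reference, the kind of map function described above might look like the following minimal sketch. The record layout (a tuple with the two compared fields at positions 0 and 1) and the name add_equal_flag are assumptions for illustration, not the original code.

```python
# Hypothetical record layout: each record is a tuple, and the two
# fields to compare sit at positions 0 and 1 (an assumption).
def add_equal_flag(record, i=0, j=1):
    """Return the record with one extra field: True if fields i and j are equal."""
    return tuple(record) + (record[i] == record[j],)


# Plain-Python check of the function; no Spark is needed for this part.
sample = [("a", "a", 1), ("a", "b", 2)]
flagged = [add_equal_flag(r) for r in sample]

# With a live SparkContext named `sc` (assumed), the same function
# would be passed to map:
#   rdd = sc.parallelize(sample)
#   rdd.map(add_equal_flag).collect()
```

Nothing in a function like this touches Java streams directly, which suggests the failure comes from the serialization layer underneath rather than from the comparison logic.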
Googling did not turn up a definitive fix. I program in Python and am not familiar with Java, so my guess is that the problem occurs when Java objects are being read or written. The closest answer I found on Google was:

I can tell you that this usually means somewhere something wrote
objects to the same OutputStream with multiple ObjectOutputStreams. AC
is a header value.
I don't obviously see where/how that could happen, but maybe it rings
a bell for someone. This could happen if an OutputStream is reused
across object serializations but new ObjectOutputStreams are opened,
for example.
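To make the "AC is a header value" remark concrete: a Java serialization stream begins with the two-byte magic number 0xACED followed by the two-byte version 0x0005, and each new ObjectOutputStream writes that header when it is created. The sketch below only imitates the byte layout in Python. The fake_stream helper and its payloads are placeholders, not real serialized Java objects, and this is not what Spark does internally.

```python
import struct

# Constants from java.io.ObjectStreamConstants.
STREAM_MAGIC = 0xACED
STREAM_VERSION = 5
JAVA_HEADER = struct.pack(">HH", STREAM_MAGIC, STREAM_VERSION)  # b'\xac\xed\x00\x05'


def has_valid_header(data):
    """True if the bytes start with the Java serialization stream header."""
    return data[:4] == JAVA_HEADER


def fake_stream(payload):
    # Stand-in for a fresh ObjectOutputStream: header first, then data.
    # The payload is a placeholder, not a real serialized object.
    return JAVA_HEADER + payload


# Two "writers" appending to the same underlying buffer.
combined = fake_stream(b"first") + fake_stream(b"second")

# The header now appears twice: once at the start, once mid-stream.
header_count = combined.count(JAVA_HEADER)
```

A reader that expects a single header at the start would meet the second 0xAC byte in the middle of the data, which is the situation the quoted answer describes. Conversely, bytes that do not begin with the header at all (for example has_valid_header(b"hello") is False) are what a Java reader rejects as an invalid stream header.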
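With that in mind, I restarted pyspark to see whether it would help. Sure enough, the error did not come back after the restart.
My guess afterward is that, while the two elements in the RDD were being compared, ObjectOutputStream objects built on a FileOutputStream were used more than once. I have not confirmed this.
If anyone reading this knows the actual cause, please leave your answer!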