
sqlContext.filter() returns an empty RDD

The table records already exists in Hive:

hive> desc records;
OK
year                string                                 
temperature         int                                    
quality             int 

hive> select * from records;
OK
2013    15      18
2014    23      32
2015    19      91

The goal is to select the rows of records whose temperature is != 15, and create a new table to store the filtered data. The code:

from pyspark import SparkContext
from pyspark.sql import HiveContext

def inside(row):
<span style="color:#ff0000;">        if int(row[1]) == 15:
                print "*******************************" +str( row[1])
                return False</span>

if __name__ == "__main__":
        sc = SparkContext(appName = "records")
        sqlContext = HiveContext(sc)

        table_df = sqlContext.sql("select * from records").rdd
        rltrdd = table_df.filter(lambda row : inside(row))
        count = rltrdd.count()
        if count == 0:
                print "**************************nothing****************"
        else:
                print "********************************************" + str(count)
 <span style="white-space:pre">	</span>tablename = "temp"
        newdf = sqlContext.createDataFrame(rltrdd)
        newdf.registerAsTable(tablename)

        sql_create = "create table  temptable like records"
        sql_insert = "insert into table temptable select * from temp"
        sqlContext.sql(sql_create)
        sqlContext.sql(sql_insert)
        sc.stop()
                                               
Running this reports that the RDD is empty, with the following error:
Traceback (most recent call last):
  File "/home/sky/spark/bin/workspace/query.py", line 24, in <module>
    newdf = sqlContext.createDataFrame(rltrdd)
  File "/home/sky/spark/python/pyspark/sql/context.py", line 284, in createDataFrame
    schema = self._inferSchema(rdd, samplingRatio)
  File "/home/sky/spark/python/pyspark/sql/context.py", line 164, in _inferSchema
    first = rdd.first()
  File "/home/sky/spark/python/pyspark/rdd.py", line 1245, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
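
The traceback points at schema inference: createDataFrame is called without an explicit schema, so it samples the first row with rdd.first(), and first() raises ValueError when the RDD has no elements. A minimal standalone sketch (hypothetical data, independent of Hive) reproduces the behaviour:

from pyspark import SparkContext

sc = SparkContext(appName = "empty-rdd-demo")
# A predicate that never returns a truthy value drops every element
rdd = sc.parallelize([("2013", 15, 18)]).filter(lambda row: None)
print rdd.count()   # 0
rdd.first()         # ValueError: RDD is empty -- the same call createDataFrame makes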

The corrected code:

from pyspark import SparkContext
from pyspark.sql import HiveContext

def inside(row):
<span style="color:#ff0000;">        if int(row[1]) != 15:
                print "*******************************" +str( row[1])
                return True
        else:   
                return False</span>
if __name__ == "__main__":
        sc = SparkContext(appName = "records")
        sqlContext = HiveContext(sc)
      
        table_df = sqlContext.sql("select * from records").rdd
        print "*************************************" 
        print  table_df
        rltrdd = table_df.filter(lambda row : inside(row))
        print "*************************************"+str(rltrdd)
        count = rltrdd.count()
        if count == 0:
                print "**************************nothing****************"
        else:
<span style="white-space:pre">	</span>print "********************************************" + str(count)
        tablename = "temp"
        newdf = sqlContext.createDataFrame(rltrdd)
        newdf.registerAsTable(tablename)

        sql_create = "create table  temptable like records"
        sql_insert = "insert into table temptable select * from temp"
        sqlContext.sql(sql_create)
        sqlContext.sql(sql_insert)
        sc.stop()
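
As an aside, the round trip through an RDD (and thus through schema inference) is not strictly necessary here; the same filtering can be pushed into Hive SQL directly. A sketch, assuming the same table names as above:

sql_create = "create table temptable like records"
sql_insert = "insert into table temptable select * from records where temperature != 15"
sqlContext.sql(sql_create)
sqlContext.sql(sql_insert)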

Cause of the error:

The function passed to filter must return True (or some other truthy value) for the rows that should be kept. The original inside() only ever returned False (for temperature == 15) and fell off the end for every other row, implicitly returning None; since None is falsy, every row was dropped and the resulting RDD was empty.
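
A minimal plain-Python sketch of the two predicates (hypothetical values) makes the difference visible; Python 2's built-in filter follows the same truthiness rule as the RDD's filter:

def buggy(row):
        if row[1] == 15:
                return False
        # falls through: implicit None, so nothing is ever kept

def fixed(row):
        return row[1] != 15   # True exactly for the rows to keep

rows = [("2013", 15, 18), ("2014", 23, 32), ("2015", 19, 91)]
print filter(buggy, rows)   # []
print filter(fixed, rows)   # [('2014', 23, 32), ('2015', 19, 91)]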