

# Differences Between Spark TempView and GlobalTempView

TempView and GlobalTempView are both frequently used with Spark DataFrames. How do they differ, and when is each appropriate? The example below compares the two.

```python
from pyspark.sql import SparkSession
import numpy as np
import pandas as pd

spark = SparkSession.builder.getOrCreate()
```

```python
# Build a 5x5 DataFrame of random integers with columns a..e
d = np.random.randint(1, 100, 5 * 5).reshape(5, -1)
data = pd.DataFrame(d, columns=list('abcde'))
df = spark.createDataFrame(data)
df.show()
```

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    | 32| 23| 24|  7|  7|
    | 47|  6|  4| 95| 34|
    | 50| 69| 83| 21| 46|
    | 52| 12| 83| 49| 85|
    +---+---+---+---+---+

## Reading data from a TempView

```python
# createTempView returns None, so there is nothing to assign
df.createTempView('temp')
temp_sql = "select * from temp where a = 50"
res = spark.sql(temp_sql)
res.show()
```

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 50| 69| 83| 21| 46|
    +---+---+---+---+---+

## Reading data from a GlobalTempView

```python
# Global temp views are registered in the global_temp database
df.createGlobalTempView('glob')
glob_sql = "select * from global_temp.glob where a = 17"
res2 = spark.sql(glob_sql)
res2.show()
```

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    +---+---+---+---+---+

## GlobalTempView data can be shared across multiple SparkSessions

```python
# Create a new SparkSession
spark2 = spark.newSession()
spark2 == spark
```

    False

```python
# The new SparkSession can read data from the GlobalTempView
new_sql = "select * from global_temp.glob where a = 47"
temp = spark2.sql(new_sql)
temp.show()
```

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 47|  6|  4| 95| 34|
    +---+---+---+---+---+

```python
# The new SparkSession cannot read data from the TempView:
# the query fails because table temp cannot be found
new_sql2 = "select * from temp where a = 47"
temp = spark2.sql(new_sql2)
temp.show()
```

```python
# Adding the global_temp prefix does not help either
new_sql2 = "select * from global_temp.temp where a = 47"
temp = spark2.sql(new_sql2)
temp.show()
```

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)

    # (several lines of exception details omitted here)

    AnalysisException: "Table or view not found: `global_temp`.`temp`; line 1 pos 14;\n'Project [*]\n+- 'Filter ('a = 47)\n   +- 'UnresolvedRelation `global_temp`.`temp`\n"

## Views cannot be used after they are dropped

```python
spark.catalog.dropTempView('temp')
spark.catalog.dropGlobalTempView('glob')

# Each query below raises an error, so run them one at a time.

# Error: table temp cannot be found
temp_sql2 = "select * from temp where a = 47"
temp = spark.sql(temp_sql2)

# Error: global_temp.glob cannot be found, in both spark and spark2
glob_sql2 = "select * from global_temp.glob where a = 47"
temp = spark.sql(glob_sql2)
temp = spark2.sql(glob_sql2)
```

## Summary

**Spark has four temp view methods**

- df.createGlobalTempView
- df.createOrReplaceGlobalTempView
- df.createOrReplaceTempView
- df.createTempView

**The replace variants create the view if it does not exist and replace it if it does** (the plain create variants raise an error when a view with that name already exists).

------

**A temp view cannot be used after it is dropped**

The two drop methods:

- spark.catalog.dropTempView('temp')
- spark.catalog.dropGlobalTempView('glob')

----

Similarities and differences between TempView and GlobalTempView

1. A TempView can only be used within a single SparkSession.
2. A GlobalTempView can be shared across multiple SparkSessions.
3. However, neither of them can be used across Applicatio