Two Ways to Work with Hive from Python: A Summary
阿新 • Published: 2018-11-01
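Method 1: Using the PyHive library
Install the dependencies. Installing sasl may fail with an error; if it does, download the matching prebuilt version from https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl and install that instead.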
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
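Because the sasl step is the one most likely to fail, it can help to confirm that every package is actually visible to the interpreter before running any Hive code. The following is a minimal sketch, not part of PyHive: the helper name `missing_modules` and the `REQUIRED_MODULES` list are hypothetical, and the list assumes the packages above import as `sasl`, `thrift`, `thrift_sasl`, and `pyhive`.

```python
import importlib.util

# Import names assumed for the packages installed above
# (the PyPI names thrift-sasl and PyHive import as thrift_sasl and pyhive).
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the names from `names` that the interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; a package with a broken native extension could still fail when it is actually imported.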
Python script example:
from pyhive import hive

conn = hive.Connection(host='****', port=****, username='****', database='****')
cursor = conn.cursor()  # a cursor must be created before executing statements
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')

# value and username must be defined for each row being inserted
for i in range(****):
    sql = "INSERT INTO **** VALUES ({},'username{}')".format(value, str(username))
    cursor.execute(sql)

# The following is the example from the official site:
from pyhive import presto  # or import hive
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
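The INSERT loop above builds each statement with `str.format`, which breaks as soon as a value contains a quote character. PyHive cursors follow the DB-API `pyformat` parameter style, so the values can be passed separately and escaped by the driver. The sketch below is one possible way to do that; the function name `insert_users` and the two-column row shape are hypothetical, chosen only to mirror the loop above.

```python
def insert_users(cursor, table, rows):
    """Insert (value, username) pairs using DB-API parameter binding.

    `cursor` is any DB-API cursor that uses the pyformat paramstyle, for
    example the one returned by hive.Connection(...).cursor().
    `table` is formatted into the statement directly (identifiers cannot be
    bound as parameters), so it must come from trusted code, not user input.
    Returns the number of rows submitted.
    """
    sql = "INSERT INTO {} VALUES (%s, %s)".format(table)
    count = 0
    for value, username in rows:
        # The driver escapes and substitutes the parameters itself.
        cursor.execute(sql, (value, username))
        count += 1
    return count
```

With a real connection this would be called as `insert_users(conn.cursor(), '****', rows)`, where `rows` is a list of `(value, username)` tuples. Each call still issues one INSERT statement per row, exactly as the original loop does.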
Method 2: Using the impyla library
impyla dependencies:
pip install six
pip install bitarray
pip install thriftpy
To support Hive, the following two packages are also required:
pip install sasl
pip install thrift_sasl
The source code for impyla and its dependencies can be downloaded from PyPI.
from impala.dbapi import connect

conn = connect(host='****', port=****)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # print the schema of the result set
results = cursor.fetchall()
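The schema in `cursor.description` and the rows from `fetchall()` can be combined so that each row is addressed by column name rather than by position. The sketch below relies only on the DB-API convention that the first item of each description entry is the column name; the helper name `rows_as_dicts` is hypothetical and not part of impyla.

```python
def rows_as_dicts(cursor):
    """Return the remaining rows of an executed query as a list of dicts.

    `cursor` is any DB-API cursor on which execute() has already been
    called, such as the impyla cursor above.
    """
    # Per PEP 249, each description entry is a sequence whose first item
    # is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

After `cursor.execute('SELECT * FROM mytable LIMIT 100')`, calling `rows_as_dicts(cursor)` would return one dictionary per row. Note that `fetchall()` consumes the result set, so this helper should be used instead of, not after, the `fetchall()` call above.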