Python cannot read HDFS files: requests.exceptions.ConnectionError: HTTPConnectionPool
阿新 • Published: 2018-12-11
1. Problem 1: When using Python's hdfs library to work with HDFS, listing the HDFS directory works as expected:
from hdfs import *
client = Client("http://10.0.30.9:50070")
print(client.list('/'))
['test.txt']
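However, reading the file fails with hdfs.util.HdfsError: File /user/dr.who/test.txt not found. Trying pyhdfs instead gives the same error, and the same is true of the second problem described further down.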
from hdfs import *
client = Client("http://10.0.30.9:50070")
print(client.list('/'))
with client.read('test.txt') as reader:
    content = reader.read()
print(content)
Traceback (most recent call last):
  File "E:/pycharm/workspace/hadoopforwin/myhdfs.py", line 5, in <module>
    with client.read('test.txt') as reader:
  File "D:\python3.6\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 678, in read
    buffersize=buffer_size,
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 112, in api_handler
    raise err
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 107, in api_handler
    **self.kwargs
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 210, in _request
    _on_error(response)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 50, in _on_error
    raise HdfsError(message, exception=exception)
hdfs.util.HdfsError: File /user/dr.who/test.txt not found.
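2. Solution to problem 1: The error occurs because no root path was specified, so the relative path test.txt is looked up under the default user's home directory (/user/dr.who, as the error message shows). Specify the root path when calling Client to connect to HDFS: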
from hdfs import *
client = Client("http://10.0.30.9:50070", root='/')
print(client.list('/'))
with client.read('test.txt') as reader:
    content = reader.read()
print(content)
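To see why passing root='/' changes which file is requested, here is a minimal sketch of the path resolution involved. It is a simplified model, not the hdfs library's actual code: the function name resolve_hdfs_path and its home parameter are hypothetical, and the default home of /user/dr.who is taken from the error message above.

```python
import posixpath


def resolve_hdfs_path(path, root=None, home="/user/dr.who"):
    """Simplified model of how a client-side path becomes an absolute HDFS path.

    Hypothetical helper for illustration only; `home` stands in for the
    home directory of the user the request runs as.
    """
    # With no root, relative paths fall back to the user's home directory.
    # A relative root is itself taken relative to the home directory.
    base = home if root is None else posixpath.join(home, root)
    # posixpath.join discards earlier components when a later one is absolute,
    # so an absolute root (or an absolute path) wins over the home directory.
    return posixpath.normpath(posixpath.join(base, path))


print(resolve_hdfs_path("test.txt"))            # /user/dr.who/test.txt
print(resolve_hdfs_path("test.txt", root="/"))  # /test.txt
print(resolve_hdfs_path("/test.txt"))           # /test.txt
```

Under this model, passing the absolute path '/test.txt' to client.read should also avoid the problem, though that variant was not tested in this post.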
Running this code then produced a new problem.
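3. Problem 2: The last line of the error output is shown below. Here hmaster is the hostname of the Hadoop host, which means the machine running the program could not resolve that hostname to the correct IP address.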
requests.exceptions.ConnectionError: HTTPConnectionPool(host='hmaster', port=50075): Max retries exceeded with url: /webhdfs/v1/test.txt?op=OPEN&namenoderpcaddress=hMaster:9000&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000035BAB38>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
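The URL in that error can be taken apart with the standard library to see exactly which host and port the client was trying to reach. The sketch below is only an illustration: describe_webhdfs_url is a hypothetical helper, and the http:// scheme is assumed from the HTTPConnection mentioned in the message.

```python
from urllib.parse import urlsplit, parse_qs


def describe_webhdfs_url(url):
    """Split a WebHDFS URL into the parts that matter for debugging.

    Hypothetical helper for illustration; it only parses the string and
    makes no network request.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,  # urlsplit lowercases the hostname
        "port": parts.port,
        "path": parts.path,
        "op": query.get("op", [None])[0],
        "namenode_rpc": query.get("namenoderpcaddress", [None])[0],
    }


# URL rebuilt from the host, port and path in the error message above.
failed_url = (
    "http://hmaster:50075/webhdfs/v1/test.txt"
    "?op=OPEN&namenoderpcaddress=hMaster:9000&offset=0"
)
print(describe_webhdfs_url(failed_url))
```

The request went to hmaster on port 50075 rather than to 10.0.30.9:50070. Port 50075 is typically a DataNode's HTTP port in Hadoop 2.x, which suggests the read was redirected to a host identified by name, so listing a directory could work while reading a file could not.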
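4. Solution to problem 2: Add a hostname-to-IP mapping to the hosts file on the machine that runs the Python program. On the Windows system I use, the hosts file is at C:\Windows\System32\drivers\etc\hosts. Append a line of this form to the end of the file:
ip hostname
For the setup in this post, that is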
10.0.30.9 hmaster
Remember to save the file after editing. With that change, the program runs and reads the file successfully.
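A quick way to confirm the hosts entry has taken effect, before rerunning the HDFS script, is to ask Python to resolve the hostname directly. This is a minimal sketch using only the standard library; resolve_or_none is a hypothetical helper name.

```python
import socket


def resolve_or_none(hostname):
    """Return the IPv4 address the OS resolves `hostname` to, or None on failure.

    Hypothetical helper for illustration; it relies on the same system
    resolver (hosts file, then DNS) that requests uses under the hood.
    """
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # Same class of failure as "getaddrinfo failed" in the traceback.
        return None


# An IP literal is returned unchanged, without any lookup.
print(resolve_or_none("127.0.0.1"))
```

With the mapping from this post in place, resolve_or_none("hmaster") would be expected to return 10.0.30.9; a result of None means the hosts entry is still not being picked up.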
5. When using the hdfs and pyhdfs libraries, some methods other than reading files can fail in the same way; the fix is the same.