Phoenix Study Notes: Bulk Data Loading, Importing CSV Files into Phoenix

Permissions issues when uploading HFiles

There can be issues due to file permissions on the created HFiles in the final stage of a bulk load, when the created HFiles are handed over to HBase. HBase needs to be able to move the created HFiles, which means that it needs to have write access to the directories where the files have been written. If this is not the case, the uploading of HFiles will hang for a very long time before finally failing.

There are two main workarounds for this issue: running the bulk load process as the hbase user, or creating the output files so that they are readable and writable by all users.

For the first option, prefix the hadoop command with sudo -u hbase, for example:

sudo -u hbase hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv

For the second option, set the fs.permissions.umask-mode configuration setting to "000", so that no permission bits are masked off when the output files are created. This can be set in the Hadoop configuration on the machine used to submit the job, or for a single job at submission time on the command line, as follows:

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 --table EXAMPLE --input /data/example.csv
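To see why a umask of 000 helps, it may be useful to work through the permission arithmetic. The sketch below assumes the usual rule that the effective mode is the base mode (777 for directories, 666 for files) with the umask bits cleared, and compares the common default umask of 022 with 000. The function name effective_mode is hypothetical and is not part of Hadoop or Phoenix; it only reproduces the arithmetic locally.

```shell
#!/bin/sh
# effective_mode BASE UMASK
# Hypothetical helper (not a Hadoop or Phoenix command): prints the octal
# mode that results from clearing the UMASK bits from BASE.
effective_mode() {
  base=$1   # 777 for directories, 666 for files
  mask=$2   # for example 022 or 000
  # The leading 0 makes the shell read both values as octal constants.
  printf '%03o\n' $(( 0$base & ~0$mask & 0777 ))
}

effective_mode 777 022   # directory under umask 022: prints 755
effective_mode 666 022   # file under umask 022: prints 644
effective_mode 777 000   # directory under umask 000: prints 777
effective_mode 666 000   # file under umask 000: prints 666
```

Under a umask of 022, only the owner keeps write access (755 and 644), so a different user such as hbase cannot move the files. Under 000, the write bit is kept for everyone (777 and 666).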


