Hive 0.14: testing INSERT, UPDATE, and DELETE
阿新 • Published: 2019-02-19
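Reading guide
1. The INSERT test fails with an error; how can it be fixed?
2. Hive DELETE and UPDATE fail with errors; how can they be fixed?
3. Under what conditions are DELETE and UPDATE allowed?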
First, create a table with a plain CREATE TABLE statement and insert two rows:
- hive> create table test(id int, name string) row format delimited fields terminated by ',';
- hive> insert into table test values (1,'row1'),(2,'row2');
The insert fails:
- java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:9000/home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/curator-client-2.6.0.jar
- at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
- at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
- at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
- at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
- at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
- at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
- at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
- at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
- at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
- at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
- at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
- at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
- at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
- at java.security.AccessController.doPrivileged(Native Method)
- ......
Hive appears to be looking for its jars on HDFS. That is a minor problem: upload the jars under lib to the same path on HDFS.
- hadoop fs -mkdir -p /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/
- hadoop fs -put $HIVE_HOME/lib/* /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/
Running the insert again works. Next, test delete:
- hive>delete from test where id = 1;
It fails:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
The message says that the transaction manager in use does not support update and delete.
It turns out that supporting them requires some extra configuration; see:
https://cwiki.apache.org/conflue ... tersforTransactions
Following that page, configure hive-site.xml:
- hive.support.concurrency – true
- hive.enforce.bucketing – true
- hive.exec.dynamic.partition.mode – nonstrict
- hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on – true
- hive.compactor.worker.threads – 1
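In hive-site.xml each of these name/value pairs becomes a `<property>` element. As a minimal sketch of that mapping, the following prints the six settings in that form. The class and method names are hypothetical, and plain string building is used rather than any Hive or Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: prints the settings listed above
// as hive-site.xml <property> elements.
class HiveSiteSnippet {

    // The six name/value pairs from the list above, in the same order.
    static Map<String, String> transactionSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hive.support.concurrency", "true");
        settings.put("hive.enforce.bucketing", "true");
        settings.put("hive.exec.dynamic.partition.mode", "nonstrict");
        settings.put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        settings.put("hive.compactor.initiator.on", "true");
        settings.put("hive.compactor.worker.threads", "1");
        return settings;
    }

    // Renders one <property> element per setting.
    static String toPropertyXml(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            xml.append("<property>\n")
               .append("  <name>").append(e.getKey()).append("</name>\n")
               .append("  <value>").append(e.getValue()).append("</value>\n")
               .append("</property>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        // Paste the printed elements inside <configuration> in hive-site.xml.
        System.out.print(toPropertyXml(transactionSettings()));
    }
}
```

The printed elements go inside the `<configuration>` element of hive-site.xml.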
I expected everything to run smoothly after this, but instead got the following error:
- FAILED: LockException [Error 10280]: Error communicating with the metastore
Something is wrong with the metastore database. Setting the log level to DEBUG shows the actual error:
- 2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(265)) - Going to execute query <select cq_id, cq_database, cq_table, cq_partition, cq_type, cq_run_as from COMPACTION_QUEUE where cq_state = 'r'>
- 2014-11-04 14:20:14,367 ERROR [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(285)) - Unable to select next element for cleaning, Table 'hive.COMPACTION_QUEUE' doesn't exist
- 2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(287)) - Going to rollback
- 2014-11-04 14:20:14,368 ERROR [Thread-8]: compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist
- at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
- at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
- at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
- at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
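The COMPACTION_QUEUE table cannot be found in the metastore database. Checking in MySQL confirmed that the table does not exist. Why not? After a long search turned up no explanation, I went to the source code.
The CREATE TABLE statements are in the TxnDbUtil class in org.apache.hadoop.hive.metastore.txn. Tracing the callers leads to the following method, which invokes them: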
- private void checkQFileTestHack() {
-   boolean hackOn = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) ||
-       HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEZ_TEST);
-   if (hackOn) {
-     LOG.info("Hacking in canned values for transaction manager");
-     // Set up the transaction/locking db in the derby metastore
-     TxnDbUtil.setConfValues(conf);
-     try {
-       TxnDbUtil.prepDb();
-     } catch (Exception e) {
-       // We may have already created the tables and thus don't need to redo it.
-       if (!e.getMessage().contains("already exists")) {
-         throw new RuntimeException("Unable to set up transaction database for" +
-             " testing: " + e.getMessage());
-       }
-     }
-   }
- }
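In other words, this method only runs the table-creation statements when HIVE_IN_TEST or HIVE_IN_TEZ_TEST is set. I took this to mean that delete and update only work in a test environment, which seemed understandable since the feature was not yet fully developed. Note that these flags exist for Hive's own tests; on a regular installation the transaction tables are normally created by the metastore schema scripts that ship with Hive, so what follows is a workaround.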
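To make the gating behaviour easier to see in isolation, here is a simplified, self-contained stand-in for the quoted method. It is only a sketch: `TxnSetupGate`, `setUpIfInTest`, and the in-memory `createdTables` list are hypothetical names used for illustration and are not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the gating logic quoted above; none of these
// names are Hive APIs.
class TxnSetupGate {

    // Records what the stand-in prepDb() created, so the effect is visible.
    static final List<String> createdTables = new ArrayList<>();

    // Stand-in for TxnDbUtil.prepDb(): fails if the table is already there.
    static void prepDb() {
        if (createdTables.contains("COMPACTION_QUEUE")) {
            throw new IllegalStateException("Table COMPACTION_QUEUE already exists");
        }
        createdTables.add("COMPACTION_QUEUE");
    }

    // Mirrors the quoted method: set up only in test mode and tolerate
    // an "already exists" failure on repeated calls.
    static boolean setUpIfInTest(boolean hiveInTest, boolean hiveInTezTest) {
        boolean hackOn = hiveInTest || hiveInTezTest;
        if (!hackOn) {
            return false; // normal mode: nothing is created
        }
        try {
            prepDb();
        } catch (RuntimeException e) {
            if (!e.getMessage().contains("already exists")) {
                throw e;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(setUpIfInTest(false, false) + " " + createdTables); // false []
        System.out.println(setUpIfInTest(true, false) + " " + createdTables);  // true [COMPACTION_QUEUE]
        System.out.println(setUpIfInTest(true, false) + " " + createdTables);  // true [COMPACTION_QUEUE]
    }
}
```

With both flags off nothing is created, which matches the missing COMPACTION_QUEUE table seen in the log.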
With the cause finally found, the fix is simple. Add the following to hive-site.xml:
- <property>
-   <name>hive.in.test</name>
-   <value>true</value>
- </property>
OK. Restart the services and run delete again:
- hive>delete from test where id = 1;
- FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.test that does not use an AcidOutputFormat or is not bucketed
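Searching online confirms this. At present only the ORC file format supports AcidOutputFormat, and on top of that the table must be created with the table property 'transactional'='true'. It all feels rather cumbersome.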
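The three requirements (ORC storage, bucketing, and the transactional table property) can be summarised as a small checklist. The sketch below is only a model of the conditions described in this article; `AcidTableCheck` and `allowsUpdateDelete` are hypothetical names and this is not the check Hive itself performs.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical checklist for the table requirements described above.
// This models the article's conditions; it is not Hive's own validation.
class AcidTableCheck {

    static boolean allowsUpdateDelete(String storedAs, int numBuckets,
                                      Map<String, String> tblProperties) {
        boolean isOrc = "orc".equalsIgnoreCase(storedAs);      // stored as orc
        boolean isBucketed = numBuckets > 0;                   // clustered by ... into N buckets
        boolean isTransactional =
            "true".equalsIgnoreCase(tblProperties.get("transactional"));
        return isOrc && isBucketed && isTransactional;
    }

    public static void main(String[] args) {
        // The first table in this article: delimited text, no buckets, no properties.
        System.out.println(allowsUpdateDelete("textfile", 0,
            Collections.<String, String>emptyMap()));          // false
        // A table stored as ORC, with 2 buckets and 'transactional'='true'.
        System.out.println(allowsUpdateDelete("orc", 2,
            Collections.singletonMap("transactional", "true"))); // true
    }
}
```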
So I created the table following an example found online:
- hive>create table test(id int ,name string )clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
- hive>insert into table test values (1,'row1'),(2,'row2'),(3,'row3');
Delete:
- hive>delete from test where id = 1;
Update:
- hive>update test set name = 'Raj' where id = 2;
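OK! Everything ran successfully, although it seems slow: each statement takes roughly 30 s. There is probably room for tuning, which needs more investigation.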