Storing Hive's metadata in Oracle
The embedded Derby database is awkward to work with, so I want to keep the metadata in Oracle instead (MySQL works too); at least then I know how to back it up. Below is what I worked out over one evening of tinkering. Thanks to 國寶 and 斑豬.
=========================================
How to store Hive's metadata in Oracle
Hive by default uses an embedded Derby database. In real-world scenarios where multiple Hive queries are executed across multiple sessions, a standalone database server such as MySQL or Oracle is needed to hold the metastore.
1) Create an Oracle user
sql> create user hive identified by hive
default tablespace hive;
sql> grant dba to hive; (note: granting DBA is convenient but over-privileged; narrow this in production)
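Since `grant dba` above is more than the metastore needs, a narrower grant along these lines should suffice (a sketch; the tablespace name and quota are assumptions to adjust to your environment):

```sql
-- Minimal-ish privileges for the Hive metastore schema
CREATE USER hive IDENTIFIED BY hive
  DEFAULT TABLESPACE hive
  QUOTA UNLIMITED ON hive;
-- CONNECT gives CREATE SESSION; RESOURCE allows creating tables/sequences
GRANT CONNECT, RESOURCE TO hive;
```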
2) Edit hive/conf/hive-default.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->

<!-- Hive Execution Parameters -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts. Ignored when
  mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default.
  By setting this property to -1, Hive will automatically figure out the number of reducers.
  </description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>Max number of reducers that will be used. If the one
  specified in the configuration parameter mapred.reduce.tasks is
  negative, Hive will use this as the max number of reducers when
  automatically determining the number of reducers.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>Whether Hive is running in test mode. If yes, it turns on sampling and prefixes the output table name</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>If Hive is running in test mode, prefixes the output table name with this string</description>
</property>

<!-- If the input table is not bucketed, the denominator of the tablesample is determined by the parameter below -->
<!-- For example, the following query: -->
<!--   INSERT OVERWRITE TABLE dest -->
<!--   SELECT col1 from src -->
<!-- would be converted to -->
<!--   INSERT OVERWRITE TABLE test_dest -->
<!--   SELECT col1 from src TABLESAMPLE (BUCKET 1 out of 32 on rand(1)) -->
<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>If Hive is running in test mode and the table is not bucketed, the sampling frequency</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>If Hive is running in test mode, don't sample the above comma-separated list of tables</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value> <!-- this class is in the ojdbc jar downloaded in step 3 -->
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
  <description>Class implementing the JDO persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>Detaches all objects from the session so that they can be used after the transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>Reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value> <!-- user name -->
  <description>Username to use against the metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value> <!-- password -->
  <description>Password to use against the metastore database</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>Metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
  <description>Creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>Throw an exception if metadata tables are incorrect</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description></description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>true</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of the default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to the metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
  <description>Default file format for CREATE TABLE statements. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
  <description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in the data, to optimize group by queries</description>
</property>

<property>
  <name>hive.groupby.mapaggr.checkinterval</name>
  <value>100000</value>
  <description>Number of rows after which the size check of the grouping keys/aggregation classes is performed</description>
</property>

<property>
  <name>hive.mapred.local.mem</name>
  <value>0</value>
  <description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
  <description>Portion of total memory to be used by the map-side group aggregation hash table</description>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
  <description>Hash aggregation will be turned off if the ratio between hash
  table size and input rows is bigger than this number. Set to 1 to make sure
  hash aggregation is never turned off.</description>
</property>

<property>
  <name>hive.optimize.cp</name>
  <value>true</value>
  <description>Whether to enable the column pruner</description>
</property>

<property>
  <name>hive.optimize.ppd</name>
  <value>true</value>
  <description>Whether to enable predicate pushdown</description>
</property>

<property>
  <name>hive.optimize.pruner</name>
  <value>true</value>
  <description>Whether to enable the new partition pruner, which depends on predicate pushdown. If this is disabled,
  the old partition pruner, which is based on the AST, will be enabled.</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
  <description>How many rows in the right-most join operand Hive should buffer before emitting the join result.</description>
</property>

<property>
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>The mode in which Hive operations are performed. In strict mode, some risky queries are not allowed to run</description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
  <description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity</description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
  <description>This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from the hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
  <description>This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>${HIVE_HOME}/lib/hive-hwi.war</value>
  <description>This is the WAR file with the JSP content for the Hive Web Interface</description>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>Pre-execute hook for tests</description>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
  <description>Merge small files at the end of any job (map-only or map-reduce)</description>
</property>

<property>
  <name>hive.heartbeat.interval</name>
  <value>1000</value>
  <description>Send a heartbeat after this interval - used by mapjoin and filter operators</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.script.auto.progress</name>
  <value>false</value>
  <description>Whether Hive Transform/Map/Reduce clauses should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent a script stuck in an infinite loop from being killed by the TaskTracker.</description>
</property>

</configuration>
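Of everything in the long file above, only a handful of javax.jdo.* properties actually point the metastore at Oracle. As a quick sanity check after editing, a short script can pull them back out of the saved config (a sketch; the function name and file path are my own):

```python
import xml.etree.ElementTree as ET

# The properties that actually switch the metastore to Oracle.
WANTED = {
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
}

def metastore_settings(path):
    """Return {name: value} for the JDBC metastore properties in a Hive config file."""
    settings = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name in WANTED:
            settings[name] = prop.findtext("value") or ""
    return settings
```

Running it against hive-default.xml should print exactly the four JDBC settings; if ConnectionURL or ConnectionDriverName is missing, Hive will silently fall back to Derby.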
3) Download the Oracle JDBC driver (ojdbc14.jar) and copy it into hive/lib so the driver class configured above can be loaded:
http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/htdocs/jdbc_10201.html
4) Test it:
[[email protected] ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>
5) Log in to Oracle and check whether the metastore tables were created automatically:
[[email protected] ~]$ sqlplus hive/[email protected]:1521/spgjmega
SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> show user;
USER is "HIVE"
SQL> select * from tab;
TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES TABLE
A TABLE
DBS TABLE
SEQUENCE_TABLE TABLE
SERDES TABLE
TBLS TABLE
SDS TABLE
PARTITION_KEYS TABLE
COLUMNS TABLE
BUCKETING_COLS TABLE
SD_PARAMS TABLE
TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
SORT_COLS TABLE
SERDE_PARAMS TABLE
TABLE_PARAMS TABLE
14 rows selected.
SQL> set line 1000
SQL> r
1* select * from TBLS
TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME
---------- ----------- ---------- ---------------- -------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- --------------------------------------------------------------------------------------------------------------------------------
1 1263655926 1 0 root 0 1 test
SQL>
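Since the original motivation was being able to back the metadata up, note that once it lives in an Oracle schema a plain schema export covers it. A sketch with the classic exp/imp tools (the hive/hive credentials and ORCL service name are taken from the config above; adjust file paths to taste):

```shell
# Export the whole HIVE schema (all metastore tables) to a dump file
exp hive/hive@ORCL owner=HIVE file=hive_metastore.dmp log=hive_metastore.log

# Restore later with imp:
# imp hive/hive@ORCL fromuser=HIVE touser=HIVE file=hive_metastore.dmp
```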
====== Reference links ======
http://code.google.com/p/hacdb/wiki/HiveTry
http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx
http://www.jpox.org/docs/1_0/rdbms.html
http://bbs.hadoopor.com/thread-219-1-1.html
http://wiki.apache.org/hadoop/HiveDerbyServerMode