
A General-Purpose Program for Incrementally Extracting Hive Tables into an Oracle Database (Part 2)


Previous post: A General-Purpose Program for Incrementally Extracting Hive Tables into an Oracle Database (Part 1)

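The previous post covered how to write and use the Java program and which dependency packages it references. This post picks up from there and shows how to use that Java program in Oozie.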

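In my use case, the job has two stages:

1. Query the data from the Hive table; Oozie variables can be set to supply the condition for the incremental query.

2. Write the data returned by the Hive query into Oracle.

The corresponding Oozie workflow file is as follows: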

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="wf_${job_name}_day">
    <start to="hive-node"/>

    <!-- Daily statistics: dm_guba_loginlog -->
    <action name="hive-node" retry-max="10" retry-interval="3">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${hive_site_path}</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>tmp_table=tmp_dm_xxx_day</param>
            <param>params_dt=${params_dt}</param>
        </hive>
        <ok to="java-node"/>
        <error to="senderror"/>
    </action>

    <!-- Note: the hive_hql and rdms_presql statements must not end with a semicolon -->
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>com.exe.Hive2RMDS</main-class>
            <arg>--hive_url</arg>
            <arg>jdbc:hive2://xx.xx.xx.xx:10000/default</arg>
            <arg>--hive_hql</arg>
            <arg>select field1,field2,field3,dim,period,period_value from dw_dm.dm_xxx where period = 'day' and period_value = ${params_dt}</arg>
            <arg>--rdms_driver</arg>
            <arg>oracle.jdbc.driver.OracleDriver</arg>
            <arg>--rdms_url</arg>
            <arg>jdbc:oracle:thin:@xx.xx.xx.xx:1521:test001</arg>
            <arg>--rdms_username</arg>
            <arg>DW_test</arg>
            <arg>--rdms_password</arg>
            <arg>DW_test</arg>
            <arg>--rdms_tableName</arg>
            <arg>DW_DM.DM_xxx_TEST</arg>
            <arg>--rdms_columnNames</arg>
            <arg>field1,field2,field3,dim,period,period_value</arg>
            <arg>--rdms_presql</arg>
            <arg>delete from DW_DM.DM_xxx_TEST where period = 'day' and period_value = '${params_dt}'</arg>
        </java>
        <ok to="end"/>
        <error to="senderror"/>
    </action>

    <!-- Send an email on error -->
    <action name="senderror">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>[email protected]</to>
            <subject>${job_name} log error in ${params_dt}</subject>
            <body>Error: ${params_dt} ,error message[${wf:errorMessage(wf:lastErrorNode())}]</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>workflow: ${wf:id()}, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
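
The workflow relies on `${params_dt}` being supplied from outside as the incremental condition. One common way to supply such a variable is an Oozie coordinator that computes it for each scheduled run. The fragment below is only a sketch under assumptions that the original post does not state: the coordinator name, the `start_time`, `end_time` and `workflow_app_path` properties, the `Asia/Shanghai` timezone, the choice of "yesterday" as the period, and the `yyyyMMdd` format for `params_dt` are all hypothetical and should be adjusted to match how `period_value` is actually stored.

```xml
<!-- Hypothetical coordinator: runs the workflow once a day and passes
     yesterday's date to it as params_dt (format assumed to be yyyyMMdd). -->
<coordinator-app name="coord_${job_name}_day"
                 frequency="${coord:days(1)}"
                 start="${start_time}" end="${end_time}"
                 timezone="Asia/Shanghai"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory that contains the workflow.xml (assumed property) -->
            <app-path>${workflow_app_path}</app-path>
            <configuration>
                <property>
                    <name>params_dt</name>
                    <!-- nominal time of this run, shifted back one day, then formatted -->
                    <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyyMMdd')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

With a setup like this, the remaining variables (`jobTracker`, `nameNode`, `queueName`, `hive_site_path`, `job_name`) would still come from the job's properties file, and re-running a past coordinator action would recompute the same `params_dt`, so the `rdms_presql` delete keeps the reload idempotent.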

The following shows the progress messages from a run in Oozie:

[Image: screenshot of the Oozie execution messages]

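Summary:

The program is fairly general-purpose: before importing into Oracle, it can first run a SQL statement to clear the records that already exist there.

The import is done in batches, so it runs relatively efficiently.

It is typically used to export aggregated results to a relational database, from which they are then served directly to a BI reporting system.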
