Scheduling a Sqoop 1 Action with Oozie: Importing Data from MySQL into Hive
By 阿新 • Published 2019-01-29
Prerequisites:
1. MySQL, Hadoop, Oozie, and Hive are installed.
2. All of the above run correctly.
Steps:
Scheduling a Sqoop 1 action with Oozie requires three basic files: workflow.xml, job.properties, and hive-site.xml (the last one is actually optional, as explained later).
1. Create a workflow application directory for Oozie on HDFS. I used /user/oozieDemo/workflows/sq2hiveDemo, and created a lib subdirectory under it (the screenshot is not reproduced here).
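The directory setup and the driver upload in steps 1 and 2 can be scripted. A minimal sketch that only assembles the `hdfs dfs` command lines (the driver jar filename is a hypothetical example; substitute whatever JDBC driver version you actually have):

```python
# Build the HDFS shell commands for creating the Oozie app directory,
# its lib/ subdirectory, and uploading the MySQL JDBC driver jar.
APP_PATH = "/user/oozieDemo/workflows/sq2hiveDemo"

def hdfs_setup_commands(app_path, driver_jar="mysql-connector-java-5.1.34.jar"):
    # -mkdir -p creates the app path and lib/ in one call;
    # the jar name is a placeholder, not taken from the article.
    return [
        ["hdfs", "dfs", "-mkdir", "-p", app_path + "/lib"],
        ["hdfs", "dfs", "-put", driver_jar, app_path + "/lib/"],
    ]

for cmd in hdfs_setup_commands(APP_PATH):
    print(" ".join(cmd))
```

The commands are built but not executed here, since they need a live HDFS cluster.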
2. Upload the MySQL JDBC driver jar to the lib directory.
3. Write the job.properties file with the following content:

    oozie.wf.application.path=hdfs://NODE3:8020/user/oozieDemo/workflows/sq2hiveDemo
    #Shell Script to run
    EXEC=sq2hive.sh
    jobTracker=NODE3:8032
    nameNode=hdfs://NODE3:8020
    queueName=default
    oozie.use.system.libpath=true
    oozie.libpath=/user/oozie/share/lib/lib_20150708191612
    user.name=root
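Oozie reads this as a plain Java-style properties file: lines starting with `#` are comments, and each `key=value` pair becomes a parameter available to the workflow (the `EXEC` entry looks like a leftover from a shell-action template and is unused by this Sqoop workflow). A minimal sketch of that parsing, using the values from this article:

```python
# Parse a Java-style .properties string: skip comments and blank lines,
# split each remaining line on the first '=' only.
def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

JOB_PROPERTIES = """\
oozie.wf.application.path=hdfs://NODE3:8020/user/oozieDemo/workflows/sq2hiveDemo
#Shell Script to run
EXEC=sq2hive.sh
jobTracker=NODE3:8032
nameNode=hdfs://NODE3:8020
queueName=default
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib/lib_20150708191612
user.name=root
"""

props = parse_properties(JOB_PROPERTIES)
print(props["nameNode"])  # hdfs://NODE3:8020
```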
4. Write the workflow.xml file, i.e. the Sqoop action definition. The Sqoop invocation can be expressed in two ways: command mode (a single <command> element) or argument mode (one <arg> element per token).

workflow.xml in command mode:

    <workflow-app xmlns='uri:oozie:workflow:0.1' name='sq2hive-wf'>
        <start to='sq2hive' />
        <action name='sq2hive'>
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <property>
                        <name>hive.metastore.uris</name>
                        <value>thrift://172.17.20.2:9083</value>
                    </property>
                </configuration>
                <command>import --connect jdbc:mysql://172.17.20.4/scm --username root --password root --table ROLES --columns "ROLE_ID,NAME,HOST_ID" --delete-target-dir --hive-import --hive-overwrite --hive-table sun.roles -m 2</command>
            </sqoop>
            <ok to="end" />
            <error to="fail" />
        </action>
        <kill name="fail">
            <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name='end' />
    </workflow-app>
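Note that Oozie does not pass <command> through a shell: it splits the string on whitespace, so the quotes around "ROLE_ID,NAME,HOST_ID" survive as literal characters (harmless here only because the value contains no spaces). A sketch of that tokenization, which also shows how command mode maps onto the per-token argument mode below:

```python
# The exact <command> string from the workflow above.
COMMAND = ('import --connect jdbc:mysql://172.17.20.4/scm --username root '
           '--password root --table ROLES --columns "ROLE_ID,NAME,HOST_ID" '
           '--delete-target-dir --hive-import --hive-overwrite '
           '--hive-table sun.roles -m 2')

# Oozie-style splitting: plain whitespace split, no shell quoting rules.
args = COMMAND.split()
print(args[0])                             # import
print(args[args.index("--columns") + 1])   # "ROLE_ID,NAME,HOST_ID" (quotes kept)
```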
workflow.xml in argument mode:

    <workflow-app xmlns='uri:oozie:workflow:0.1' name='sq2hive-wf'>
        <start to='sq2hive' />
        <action name='sq2hive'>
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <property>
                        <name>hive.metastore.uris</name>
                        <value>thrift://172.17.20.2:9083</value>
                    </property>
                </configuration>
                <arg>import</arg>
                <arg>--connect</arg>
                <arg>jdbc:mysql://172.17.20.4/scm</arg>
                <arg>--username</arg>
                <arg>root</arg>
                <arg>--password</arg>
                <arg>root</arg>
                <arg>--table</arg>
                <arg>ROLES</arg>
                <arg>--columns</arg>
                <arg>ROLE_ID,NAME,HOST_ID</arg>
                <arg>--delete-target-dir</arg>
                <arg>--hive-import</arg>
                <arg>--hive-overwrite</arg>
                <arg>--hive-table</arg>
                <arg>sun.roles</arg>
                <arg>-m</arg>
                <arg>2</arg>
                <file>/user/oozieDemo/workflows/sq2hiveDemo/hive-site.xml#hive-site.xml</file>
            </sqoop>
            <ok to="end" />
            <error to="fail" />
        </action>
        <kill name="fail">
            <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name='end' />
    </workflow-app>
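Before uploading, it is worth checking that the workflow file is well-formed XML, since a broken file only fails at submission time. A sanity-check sketch over a trimmed version of the arg-mode file (note the <arg> elements live in the sqoop-action namespace, not the workflow namespace):

```python
import xml.etree.ElementTree as ET

# A trimmed arg-mode workflow, just enough to show the namespace handling.
WORKFLOW_XML = """<workflow-app xmlns='uri:oozie:workflow:0.1' name='sq2hive-wf'>
  <start to='sq2hive'/>
  <action name='sq2hive'>
    <sqoop xmlns='uri:oozie:sqoop-action:0.2'>
      <arg>import</arg>
      <arg>--table</arg>
      <arg>ROLES</arg>
      <file>/user/oozieDemo/workflows/sq2hiveDemo/hive-site.xml#hive-site.xml</file>
    </sqoop>
    <ok to='end'/>
    <error to='fail'/>
  </action>
  <kill name='fail'><message>failed</message></kill>
  <end name='end'/>
</workflow-app>"""

SQOOP_NS = "{uri:oozie:sqoop-action:0.2}"
root = ET.fromstring(WORKFLOW_XML)   # raises ParseError if the XML is malformed
args = [a.text for a in root.iter(SQOOP_NS + "arg")]
print(args)  # ['import', '--table', 'ROLES']
```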
5. Upload workflow.xml to the Oozie workflow application path on HDFS, /user/oozieDemo/workflows/sq2hiveDemo, and upload hive-site.xml to the same path. job.properties may also be uploaded, but it is not required, since the local copy is what you pass when running the Oozie job (the screenshot is not reproduced here).
6. Submit and start the workflow from the command line:

    oozie job --oozie http://node1:11000/oozie --config job.properties -run

This single command both submits and starts the job.
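If you drive this from a script, the same invocation can be assembled as an argument list. A sketch; the actual subprocess call is left commented out because it needs a live Oozie server:

```python
# Build (but do not run) the Oozie CLI invocation used above.
# The URL and properties filename come from this article.
def oozie_run_cmd(oozie_url, properties_file):
    return ["oozie", "job", "--oozie", oozie_url,
            "--config", properties_file, "-run"]

cmd = oozie_run_cmd("http://node1:11000/oozie", "job.properties")
print(" ".join(cmd))
# To actually submit: subprocess.run(cmd, check=True)
```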
7. Monitor the job through the Oozie web console or Hue's workflow view (the Hue screenshot is not reproduced here).