
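Installing and Using Oozie, a Workflow Engine

This article is based on CentOS 6.x + CDH 5.x.

What Is Oozie

Put simply, Oozie is a workflow engine, specifically one built on Hadoop. It is very handy in practice when you need to run a series of operations on data: instead of writing your own glue code, you define each action and chain the actions into a workflow, which then runs automatically. This makes it very useful for big data analysis work.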

Installing Oozie

Oozie consists of a server and a client. Here I use host1 as the server and host2 as the client, so on host1 run:
yum install oozie

On host2 run:
yum install oozie-client

Configuring Oozie

Choose the MapReduce version Oozie uses. There are two options: MRv1 and YARN. We use YARN, and to keep things simple I am not using SSL for now, so switch to the YARN configuration without SSL:
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http

Setting Up the Database for Oozie

The database here is a relational database used to store Oozie's data. Oozie ships with Derby, but Derby is only suitable for experimenting, not for production. Here I use MySQL as Oozie's database. Assuming MySQL is already installed, the next step is to create the database for Oozie:
$ mysql -u root -p
Enter password: ******

mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)

Edit oozie-site.xml to configure the MySQL connection properties:
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
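
Before creating the schema, it can help to confirm that all four JDBC properties are actually present. The following is only a sketch in Python: the helper names `read_properties` and `missing_properties` are made up for illustration, and the sample fragment is wrapped in the `<configuration>` root element that Hadoop-style site files use.

```python
import xml.etree.ElementTree as ET

# Sample fragment mirroring the properties above, wrapped in the
# <configuration> root element used by Hadoop-style site files.
OOZIE_SITE = """
<configuration>
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
</configuration>
"""

PREFIX = "oozie.service.JPAService.jdbc."
REQUIRED = [PREFIX + key for key in ("driver", "url", "username", "password")]


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_properties(props, required=REQUIRED):
    """List the required property names that are absent or empty."""
    return [name for name in required if not props.get(name)]


props = read_properties(OOZIE_SITE)
print("missing:", missing_properties(props))
```

To check a real installation, pass the contents of your own oozie-site.xml to `read_properties` instead of the sample string.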

Symlink the MySQL JDBC driver into /var/lib/oozie/:
$ sudo yum install mysql-connector-java  
$ ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar

If mysql-connector-java is already installed, you can skip the first line.


Create the database schema Oozie needs:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run

Enabling the Web Console

Step1

Download ext-2.2.zip, the ExtJS 2.2 library that the Oozie web console depends on.

Step2

Unzip ext-2.2.zip and move it to /var/lib/oozie:
# unzip ext-2.2.zip 
# mv ext-2.2 /var/lib/oozie/

Installing the Oozie Libraries on HDFS

Grant Oozie access to HDFS: edit /etc/hadoop/conf/core-site.xml on every machine and add the following configuration:
   <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>*</value>
   </property>
   <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>*</value>
   </property>

Then restart the Hadoop services (the NameNode and DataNodes are enough).

Copy the Oozie JARs to HDFS so that DistCp, Pig, Hive, and Sqoop actions can use them:
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs hdfs://mycluster/user/oozie -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz

Replace mycluster here with your own cluster ID.

Starting Oozie

$ sudo service oozie start

Using Oozie

Ways to Connect to Oozie

There are three ways to connect to Oozie.

Connecting with the Client

Since my client is installed on host2, run the following on host2:
$ oozie admin -oozie http://host1:11000/oozie -status
System mode: NORMAL

For convenience, so that you do not have to type the Oozie server address every time, set an environment variable:
$ export OOZIE_URL=http://host1:11000/oozie
$ oozie admin -version
Oozie server build version: 4.0.0-cdh5.0.0
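
The client commands above talk to the server over HTTP, so the same status check can be scripted. Below is a rough Python sketch using only the standard library. It assumes the `OOZIE_URL` variable set above and the `v1/admin/status` endpoint of Oozie's web services API; the function names are hypothetical, and `fetch_system_mode` needs a reachable server, so it is defined but not called here.

```python
import json
import os
import urllib.request

# Same server address as used in the text; adjust for your cluster.
DEFAULT_OOZIE_URL = "http://host1:11000/oozie"


def oozie_base_url(environ=os.environ):
    """Use OOZIE_URL if it is set, otherwise fall back to the default."""
    return environ.get("OOZIE_URL", DEFAULT_OOZIE_URL).rstrip("/")


def status_endpoint(base_url):
    """URL of the admin status resource (assumed v1 web services API)."""
    return base_url + "/v1/admin/status"


def fetch_system_mode(base_url, timeout=10):
    """Ask the server for its system mode, e.g. "NORMAL".

    Requires network access to a running Oozie server.
    """
    with urllib.request.urlopen(status_endpoint(base_url), timeout=timeout) as resp:
        return json.load(resp)["systemMode"]


print(status_endpoint(oozie_base_url()))
```

Reading the address from the environment mirrors what the oozie client itself does once `OOZIE_URL` is exported.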

Accessing via a Browser

Open http://host1:11000/oozie in a browser.

Accessing via HUE

The previous lesson covered how to use HUE. Now we can configure the Oozie parameters in HUE and use Oozie through it. Edit /etc/hue/conf/hue.ini, find the oozie_url property, and change it to the real address:
[liboozie]
  # The URL where the Oozie service runs on. This is required in order for
  # users to submit jobs. Empty value disables the config check.
  oozie_url=http://host1:11000/oozie

Restart the HUE service and open the Oozie module in HUE.
Click Workflow to see the workflow interface.

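The 3 Oozie Concepts

Oozie has 3 main concepts:
  • workflow: a workflow
  • coordinator: multiple workflows can be combined into a coordinator; the output of earlier workflows can serve as the input of a later one, and trigger conditions can be defined so that workflows run on a schedule
  • bundle: an abstraction over a set of coordinators

The following figure illustrates the relationships among these Oozie components.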

hPDL

Oozie uses an XML specification called hPDL to define workflows. Here is an hPDL XML example for a wordcount job:
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>
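
An hPDL file is essentially a directed graph of named nodes, and it can be read back as one. The sketch below, in Python, parses a trimmed-down workflow of the same shape and lists its transitions. It is an illustration only: it handles just `start` and `action` nodes (no decision, fork, or join), the `<map-reduce>` body is left empty, the error transition in this sample is routed to the kill node, and the function names are my own.

```python
import xml.etree.ElementTree as ET

# Trimmed-down workflow with the same node names as the example above.
# In this sample the error transition points at the kill node.
WORKFLOW = """
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce/>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong</message>
    </kill>
    <end name='end'/>
</workflow-app>
"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def transitions(xml_text):
    """Return (from_node, label, to_node) edges in document order."""
    root = ET.fromstring(xml_text)
    edges = []
    for node in root:
        kind = local(node.tag)
        if kind == "start":
            edges.append(("start", "to", node.get("to")))
        elif kind == "action":
            for child in node:
                label = local(child.tag)
                if label in ("ok", "error"):
                    edges.append((node.get("name"), label, child.get("to")))
    return edges


for source, label, target in transitions(WORKFLOW):
    print(f"{source} --{label}--> {target}")
```

Listing the edges this way makes it easy to spot nodes that no transition ever reaches.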

This example can be represented by the following figure.

Anatomy of an Oozie Job

An Oozie job generally consists of the following files:
  • job.properties: records the job's properties
  • workflow.xml: defines the job's flow and branches in hPDL
  • class files: carry out the actual tasks

The command to start a job generally looks like this:
$ oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run

As you can see, a job is started by calling the oozie job command with the Oozie server address and the path to job.properties. job.properties is the entry point for running a job.
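
If jobs are submitted from a script rather than by hand, the same command can be assembled as an argument list. This is a small Python sketch, with a hypothetical helper name; it only builds and prints the command and does not run it.

```python
import shlex


def oozie_run_command(oozie_url, properties_path):
    """Build the argument list that submits and starts a job."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", properties_path, "-run"]


cmd = oozie_run_command(
    "http://localhost:11000/oozie",
    "examples/apps/map-reduce/job.properties",
)
# shlex.join renders the list as a shell-safe command line.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a machine where the oozie client is installed; keeping the arguments as a list avoids shell quoting problems.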

A MapReduce Example

Here we use the official example.

Step1

Download the Oozie package on host1:
wget http://apache.fayea.com/oozie/4.1.0/oozie-4.1.0.tar.gz

Unpack it. Inside is an examples folder; copy this folder somewhere else and rename it oozie-examples. Enter the folder, then edit pom.xml and add a plugin inside plugins:
<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-surefire-plugin</artifactId>
	<version>2.5</version>
		<configuration>
			<skipTests>false</skipTests>
			<testFailureIgnore>true</testFailureIgnore>
			<forkMode>once</forkMode>
		</configuration>
</plugin>

Then run mvn package; you should see oozie-examples-4.1.0.jar in the target folder.

Step2

Edit oozie-examples/src/main/apps/map-reduce/job.properties. Change nameNode to the address of the HDFS NameNode; because our cluster runs in HA mode, this is hdfs://mycluster. Change jobTracker to the address of the ResourceManager, here host1:8032. After the changes, job.properties looks like this:
nameNode=hdfs://mycluster
jobTracker=host1:8032
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce

Here user.name is the Linux user who runs the oozie command. I use root, so the final path becomes hdfs://mycluster/user/root/examples/apps/map-reduce
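
To see how that path comes about, the `${...}` placeholders can be expanded by hand. The Python sketch below mimics the substitution for illustration only; Oozie performs its own resolution, and `parse_properties` and `resolve` are hypothetical helper names. The value of `user.name` is supplied explicitly as root, matching the text.

```python
import re

JOB_PROPERTIES = """\
nameNode=hdfs://mycluster
jobTracker=host1:8032
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
"""

_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props, max_passes=10):
    """Expand ${name} references; unknown names are left untouched."""
    def lookup(match):
        name = match.group(1)
        return props[name] if name in props else match.group(0)

    for _ in range(max_passes):  # bounded, so cyclic definitions cannot loop forever
        expanded = _VAR.sub(lookup, value)
        if expanded == value:
            break
        value = expanded
    return value


props = parse_properties(JOB_PROPERTIES)
props["user.name"] = "root"  # assumption: the job is submitted as root
app_path = resolve(props["oozie.wf.application.path"], props)
print(app_path)
```

Changing the `user.name` entry shows immediately where the workflow would have to live for a different submitting user.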

Step3

Following the path configured above, first create the /user/root/examples/apps/map-reduce/ directory on HDFS:
hdfs dfs -mkdir -p /user/root/examples/apps/map-reduce

Then upload src/main/apps/map-reduce/workflow.xml to /user/root/examples/apps/map-reduce:
hdfs dfs -put oozie-examples/src/main/apps/map-reduce/workflow.xml /user/root/examples/apps/map-reduce/

In /user/root/examples/apps/map-reduce/ create a lib folder and upload the packaged oozie-examples-4.1.0.jar into it:
hdfs dfs -mkdir /user/root/examples/apps/map-reduce/lib
hdfs dfs -put oozie-examples/target/oozie-examples-4.1.0.jar /user/root/examples/apps/map-reduce/lib


Create an /examples folder on HDFS:
sudo -u hdfs hdfs dfs -mkdir /examples

Upload the src/main/apps folder from the examples folder into it:
hdfs dfs -put examples/src/main/apps /examples

Create the input and output folders and upload the test data:
hdfs dfs -mkdir -p /user/root/examples/input-data/text
hdfs dfs -mkdir -p /user/root/examples/output-data
hdfs dfs -put oozie-examples/src/main/data/data.txt /user/root/examples/input-data/text
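
With this many uploads it is easy to miss one, so a checklist of the paths the job expects can be handy. The sketch below, in Python, derives them from the user and examples root used above and pairs each with an `hdfs dfs -test -e` existence check. The helper names are hypothetical, and the commands are only built here, not executed.

```python
import posixpath


def expected_paths(user="root", examples_root="examples", app="map-reduce"):
    """HDFS paths created in this step, derived from the values in the text."""
    base = posixpath.join("/user", user, examples_root)
    app_dir = posixpath.join(base, "apps", app)
    return [
        posixpath.join(app_dir, "workflow.xml"),
        posixpath.join(app_dir, "lib", "oozie-examples-4.1.0.jar"),
        posixpath.join(base, "input-data", "text", "data.txt"),
        posixpath.join(base, "output-data"),
    ]


def existence_checks(paths):
    """One `hdfs dfs -test -e <path>` argument list per path.

    The command exits with status 0 when the path exists.
    """
    return [["hdfs", "dfs", "-test", "-e", path] for path in paths]


for command in existence_checks(expected_paths()):
    print(" ".join(command))
```

Each printed command can be run on a cluster node before submitting the job.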

Step4

Run the job:
oozie job -oozie http://host1:11000/oozie -config oozie-examples/src/main/apps/map-reduce/job.properties -run

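Once the job is created successfully, a job ID is returned, for example job: 0000017-150302164219871-oozie-oozi-W. You can then use any of the 3 ways of connecting to Oozie described earlier to check the job status. Here I check it in HUE: click Workflow -> Dashboard -> Workflow at the top.
You will see a job running.
Click it to watch the job status in real time; when it finishes, the status changes to SUCCEEDED.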
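
When the submission is scripted, the job ID has to be pulled out of the client's output before the job can be tracked. A minimal Python sketch follows, assuming the output contains a `job: <id>` line as shown above; the helper names are my own, and the mapping of the trailing letter to a job type reflects the usual Oozie ID suffixes.

```python
import re


def parse_job_id(cli_output):
    """Return the ID from a 'job: <id>' line, or None if there is none."""
    match = re.search(r"job:\s*(\S+)", cli_output)
    return match.group(1) if match else None


def job_kind(job_id):
    """Guess the job type from the trailing letter of the ID."""
    kinds = {"W": "workflow", "C": "coordinator", "B": "bundle"}
    return kinds.get(job_id[-1], "unknown")


job_id = parse_job_id("job: 0000017-150302164219871-oozie-oozi-W")
print(job_id, job_kind(job_id))
```

With the ID in hand, `oozie job -info <id>` on the client shows the same status that HUE displays.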
Now look at the result in /user/root/examples/output-data/map-reduce/part-00000:
0	To be or not to be, that is the question;
42	Whether 'tis nobler in the mind to suffer
84	The slings and arrows of outrageous fortune,
129	Or to take arms against a sea of troubles,
172	And by opposing, end them. To die, to sleep;
217	No more; and by a sleep to say we end
255	The heart-ache and the thousand natural shocks
302	That flesh is heir to ? 'tis a consummation
346	Devoutly to be wish'd. To die, to sleep;
387	To sleep, perchance to dream. Ay, there's the rub,
438	For in that sleep of death what dreams may come,
487	When we have shuffled off this mortal coil,
531	Must give us pause. There's the respect
571	That makes calamity of so long life,
608	For who would bear the whips and scorns of time,
657	Th'oppressor's wrong, the proud man's contumely,
706	The pangs of despised love, the law's delay,
751	The insolence of office, and the spurns
791	That patient merit of th'unworthy takes,
832	When he himself might his quietus make
871	With a bare bodkin? who would fardels bear,
915	To grunt and sweat under a weary life,
954	But that the dread of something after death,
999	The undiscovered country from whose bourn
1041	No traveller returns, puzzles the will,
1081	And makes us rather bear those ills we have
1125	Than fly to others that we know not of?
1165	Thus conscience does make cowards of us all,
1210	And thus the native hue of resolution
1248	Is sicklied o'er with the pale cast of thought,
1296	And enterprises of great pitch and moment
1338	With this regard their currents turn awry,
1381	And lose the name of action.
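
The number in front of each line is consistent with the byte offset at which that line starts in the input file, which is the key Hadoop's text input format hands to the mapper; the output therefore looks like the input records passed through unchanged. The following Python sketch recomputes the first few offsets, assuming the input uses single-byte `\n` line endings.

```python
# First four lines of the input text, as they appear in the output above.
LINES = [
    "To be or not to be, that is the question;",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune,",
    "Or to take arms against a sea of troubles,",
]


def line_offsets(lines, newline=b"\n"):
    """Byte offset at which each line starts, counting line terminators."""
    offsets = []
    position = 0
    for line in lines:
        offsets.append(position)
        position += len(line.encode("utf-8")) + len(newline)
    return offsets


for offset, line in zip(line_offsets(LINES), LINES):
    print(f"{offset}\t{line}")
```

The computed values match the keys in the first four lines of part-00000.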

Understanding workflow.xml

Let's open the workflow.xml from the example we just ran:
  1. <workflow-appxmlns="uri:oozie:workflow:0.2"name="map-reduce-wf">
  2.     <startto="mr-node"/>
  3.     <actionname="mr-node">
  4.         <map-reduce>
  5.             <job-tracker>${jobTracker}</job-tracker>
  6.             <name-node>${nameNode}</name-node>
  7.             <prepare>
  8.                 <deletepath="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
  9.             </prepare>
  10.             <configuration>
  11.                 <property>
  12.                     <name>mapred.job.queue.name</name>
  13.                     <value>${queueName}</value>
  14.                 </