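Installing and Using Oozie: the Oozie Workflow Engine
阿新 • Published: 2019-01-26
This article is based on CentOS 6.x + CDH 5.x.
What is Oozie
Simply put, Oozie is a workflow engine, specifically one built on Hadoop. It is very practical whenever you need to run a chain of operations on your data. Instead of writing your own glue code, you define each action and chain the actions into a workflow, which then runs automatically. This makes it very useful for big-data analysis.
Installing Oozie
Oozie has a server and a client. I chose host1 as the server and host2 as the client, so run the following on host1: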
yum install oozie
On host2, run:
yum install oozie-client
Configuring Oozie
Configure which MapReduce version Oozie uses. There are two: MRv1 and YARN. We are using YARN, and to keep things simple while getting started I am not using SSL for now, so switch to the non-SSL YARN configuration:
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http
Configuring the Oozie database
This is the relational database Oozie uses to store its data. Oozie ships with Derby, but Derby is only good for experiments, not for production. Here I chose MySQL as Oozie's database. Assuming MySQL is already installed, the next step is to create the database for Oozie:
$ mysql -u root -p
Enter password: ******
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
Edit oozie-site.xml and configure the MySQL connection properties:
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
</property>
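If you manage several Oozie servers, generating the four JPAService properties from one place can avoid typos. The following is a minimal Python sketch using the standard-library xml.etree.ElementTree; the function name jpa_properties and its defaults (copied from the values above) are hypothetical, and the printed elements still have to be pasted inside the <configuration> element of oozie-site.xml by hand.

```python
import xml.etree.ElementTree as ET
from typing import List


def jpa_properties(host: str = "localhost", port: int = 3306, db: str = "oozie",
                   user: str = "oozie", password: str = "oozie") -> List[ET.Element]:
    """Build the <property> elements for Oozie's MySQL connection (hypothetical helper)."""
    values = {
        "oozie.service.JPAService.jdbc.driver": "com.mysql.jdbc.Driver",
        "oozie.service.JPAService.jdbc.url": f"jdbc:mysql://{host}:{port}/{db}",
        "oozie.service.JPAService.jdbc.username": user,
        "oozie.service.JPAService.jdbc.password": password,
    }
    props = []
    for name, value in values.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        props.append(prop)
    return props


if __name__ == "__main__":
    # Print each element so it can be copied into oozie-site.xml
    for prop in jpa_properties():
        print(ET.tostring(prop, encoding="unicode"))
```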
Symlink the MySQL JDBC driver into /var/lib/oozie/:
$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar
You can skip the first command if mysql-connector-java is already installed.
Create the tables Oozie needs:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
Enabling the web console
Step1
Step2
Unzip ext-2.2.zip and move it to /var/lib/oozie:
# unzip ext-2.2.zip
# mv ext-2.2 /var/lib/oozie/
Installing the Oozie libraries on HDFS
Grant Oozie proxy-user permissions on HDFS: edit /etc/hadoop/conf/core-site.xml on every machine and add the following configuration:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
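Because this setting has to be present on every machine, a quick check helps catch a node that was missed. A minimal Python sketch, assuming you feed it the text of each node's core-site.xml (how you collect those files is up to you); missing_proxyuser_props is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

# The two proxy-user keys added above
REQUIRED = ("hadoop.proxyuser.oozie.hosts", "hadoop.proxyuser.oozie.groups")


def missing_proxyuser_props(core_site_xml: str) -> List[str]:
    """Return the required proxy-user keys absent from a core-site.xml document."""
    root = ET.fromstring(core_site_xml)
    present = {prop.findtext("name") for prop in root.iter("property")}
    return [key for key in REQUIRED if key not in present]
```

An empty list means both properties are present; their values are not checked.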
Then restart the Hadoop services (restarting the NameNode and DataNodes is enough).
Copy the Oozie jars to HDFS so that DistCp, Pig, Hive, and Sqoop actions can use them:
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs hdfs://mycluster/user/oozie -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz
Replace mycluster with your own cluster ID.
Starting Oozie
$ sudo service oozie start
Using Oozie
Ways to connect to Oozie
There are three ways to connect to Oozie.
Connecting with the client
Since my client is installed on host2, run this on host2:
$ oozie admin -oozie http://host1:11000/oozie -status
System mode: NORMAL
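The same check can be scripted against Oozie's REST interface. A minimal Python sketch, assuming the v1 Web Services API endpoint /v1/admin/status under the Oozie base URL, which answers with JSON such as {"systemMode":"NORMAL"}; the helper names are hypothetical. Only the URL building and JSON parsing work offline; system_mode() needs a reachable server.

```python
import json
from urllib.request import urlopen


def status_url(oozie_url: str) -> str:
    # e.g. http://host1:11000/oozie -> http://host1:11000/oozie/v1/admin/status
    return oozie_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body: str) -> str:
    # The response body is expected to look like {"systemMode":"NORMAL"}
    return json.loads(body)["systemMode"]


def system_mode(oozie_url: str, timeout: float = 10.0) -> str:
    """Ask a running Oozie server for its system mode (needs network access)."""
    with urlopen(status_url(oozie_url), timeout=timeout) as resp:
        return parse_system_mode(resp.read().decode("utf-8"))
```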
For convenience, so you don't have to type the Oozie server address every time, set the OOZIE_URL environment variable:
$ export OOZIE_URL=http://host1:11000/oozie
$ oozie admin -version
Oozie server build version: 4.0.0-cdh5.0.0
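Scripts can follow the same convention. A minimal sketch, assuming the precedence the oozie CLI appears to use (an explicitly given URL wins, otherwise OOZIE_URL is read); resolve_oozie_url is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_oozie_url(explicit: Optional[str] = None) -> str:
    """Return the explicit URL if given, otherwise the OOZIE_URL environment variable."""
    url = explicit or os.environ.get("OOZIE_URL")
    if not url:
        raise ValueError("no Oozie URL: pass one explicitly or set OOZIE_URL")
    return url
```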
Connecting with a browser
Open http://host1:11000/oozie in a browser.
Connecting through HUE
The previous article covered how to use HUE; now we can configure the Oozie settings in HUE and use Oozie through it. Edit /etc/hue/conf/hue.ini, find the oozie_url property, and set it to the real address:
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://host1:11000/oozie
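Restart the HUE service, then open the Oozie module in HUE.
Click Workflow to see the workflow interface.
Oozie's three concepts
Oozie has three main concepts:
- workflow: a chain of actions, defined in workflow.xml
- coordinator: triggers workflows on a schedule or when their input data becomes available; by feeding one workflow's output in as the next one's input, several workflows can be chained together
- bundle: an abstraction over a group of coordinators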
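To make the nesting concrete, here is a purely conceptual Python model of the three levels. It is not Oozie's API: the class and field names are hypothetical, and a real coordinator's frequency is written in coordinator.xml as an EL expression such as ${coord:days(1)} rather than as a number of minutes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    """A chain of actions, defined in workflow.xml (hPDL)."""
    name: str


@dataclass
class Coordinator:
    """Triggers its workflow on a schedule and/or when input data is available."""
    name: str
    frequency_minutes: int  # illustrative only
    workflow: Workflow


@dataclass
class Bundle:
    """Groups coordinators so they can be started and managed together."""
    name: str
    coordinators: List[Coordinator] = field(default_factory=list)

    def workflow_names(self) -> List[str]:
        return [c.workflow.name for c in self.coordinators]
```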
hPDL
Oozie defines workflows with an XML specification called hPDL (Hadoop Process Definition Language). Here is a wordcount example written in hPDL:
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
<start to='wordcount'/>
<action name='wordcount'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>org.myorg.WordCount.Map</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.myorg.WordCount.Reduce</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
</map-reduce>
<ok to='end'/>
<error to='kill'/>
</action>
<kill name='kill'>
<message>Something went wrong: ${wf:errorCode('wordcount')}</message>
</kill>
<end name='end'/>
</workflow-app>
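Because hPDL is plain XML, simple transition mistakes can be caught before uploading. The sketch below uses only xml.etree.ElementTree to list transition targets that do not exist and nodes that cannot be reached from start, for instance a kill node that no error transition points to. It is a hypothetical helper, not Oozie's own validator, and it only understands start, action, kill, and end nodes (no fork, join, or decision).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def local(tag: str) -> str:
    # Strip the "{uri:oozie:workflow:0.x}" namespace prefix from a tag
    return tag.rsplit("}", 1)[-1]


def check_transitions(workflow_xml: str) -> Tuple[List[str], List[str]]:
    """Return (dangling targets, unreachable nodes) for a simple hPDL workflow."""
    root = ET.fromstring(workflow_xml)
    nodes = set()
    edges: Dict[str, List[str]] = {}
    start = None
    for el in root:
        kind = local(el.tag)
        if kind == "start":
            start = el.get("to")
        elif kind in ("action", "kill", "end"):
            name = el.get("name")
            nodes.add(name)
            # Only <ok>/<error> children carry transitions in this sketch
            edges[name] = [c.get("to") for c in el if local(c.tag) in ("ok", "error")]
    targets = {start} if start else set()
    for outgoing in edges.values():
        targets.update(outgoing)
    dangling = sorted(targets - nodes)
    # Depth-first walk from the start node
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node not in nodes:
            continue
        seen.add(node)
        stack.extend(edges[node])
    return dangling, sorted(nodes - seen)
```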
Figure: diagram of the wordcount workflow above.
What makes up an Oozie job
An Oozie job typically consists of the following files:
- job.properties: records the job's properties
- workflow.xml: defines the job's flow and branches in hPDL
- class files: carry out the actual work
$ oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
As you can see, a job is started by calling the oozie job command with the Oozie server address and the path to job.properties. job.properties is the entry point of a job.
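When submitting from a script, passing the arguments as a list is safer than building a shell string. A minimal Python sketch wrapping the same oozie job ... -run invocation with subprocess; the helper names are hypothetical, parse_job_id assumes the "job: <id>" output line shown later in this article, and submit() assumes the oozie client is on PATH.

```python
import subprocess
from typing import List


def oozie_run_command(oozie_url: str, job_properties: str) -> List[str]:
    """Argument list equivalent to: oozie job -oozie URL -config FILE -run"""
    return ["oozie", "job", "-oozie", oozie_url, "-config", job_properties, "-run"]


def parse_job_id(cli_output: str) -> str:
    # Look for a line like "job: 0000017-150302164219871-oozie-oozi-W"
    for line in cli_output.splitlines():
        if line.startswith("job:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no job ID in output")


def submit(oozie_url: str, job_properties: str) -> str:
    """Submit and start the job; raises CalledProcessError if the CLI fails."""
    result = subprocess.run(oozie_run_command(oozie_url, job_properties),
                            check=True, capture_output=True, text=True)
    return parse_job_id(result.stdout)
```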
A MapReduce example
Here we use the official example.
Step1
Download the Oozie package on host1:
wget http://apache.fayea.com/oozie/4.1.0/oozie-4.1.0.tar.gz
Unpack it. Inside is an examples folder; copy that folder somewhere else and rename it oozie-examples. Go into the folder, edit pom.xml, and add the following plugin under plugins:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.5</version>
<configuration>
<skipTests>false</skipTests>
<testFailureIgnore>true</testFailureIgnore>
<forkMode>once</forkMode>
</configuration>
</plugin>
Then run mvn package; oozie-examples-4.1.0.jar appears in the target folder.
Step2
Edit oozie-examples/src/main/apps/map-reduce/job.properties. Set nameNode to the HDFS NameNode address; since our cluster runs in HA mode, write hdfs://mycluster. Set jobTracker to the address of the ResourceManager, here host1:8032. The edited job.properties looks like this:
nameNode=hdfs://mycluster
jobTracker=host1:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
Here user.name is the Linux user name you run oozie as. I used root, so the final path becomes hdfs://mycluster/user/root/examples/apps/map-reduce.
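To check which HDFS path a job.properties will point to, the ${...} references can be expanded locally. A minimal Python sketch: the parser only handles simple key=value lines (no line continuations or escapes), and user.name is supplied by hand here as a stand-in for the submitting user's name. The helper names are hypothetical.

```python
import re
from typing import Dict

_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser; skips blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value: str, props: Dict[str, str], max_rounds: int = 10) -> str:
    """Expand ${name} references from props; unknown names are left as-is."""
    for _ in range(max_rounds):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value
```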
Step3
Based on the path configured above, first create the /user/root/examples/apps/map-reduce/ directory on HDFS:
hdfs dfs -mkdir -p /user/root/examples/apps/map-reduce
Then upload src/main/apps/map-reduce/workflow.xml to /user/root/examples/apps/map-reduce:
hdfs dfs -put oozie-examples/src/main/apps/map-reduce/workflow.xml /user/root/examples/apps/map-reduce/
Create a lib folder inside /user/root/examples/apps/map-reduce/ and upload the packaged oozie-examples-4.1.0.jar into it:
hdfs dfs -mkdir /user/root/examples/apps/map-reduce/lib
hdfs dfs -put oozie-examples/target/oozie-examples-4.1.0.jar /user/root/examples/apps/map-reduce/lib
Create an /examples folder on HDFS:
sudo -u hdfs hdfs dfs -mkdir /examples
Upload the src/main/apps folder from the examples folder into it:
hdfs dfs -put examples/src/main/apps /examples
Create the input and output folders and upload the test data:
hdfs dfs -mkdir -p /user/root/examples/input-data/text
hdfs dfs -mkdir -p /user/root/examples/output-data
hdfs dfs -put oozie-examples/src/main/data/data.txt /user/root/examples/input-data/text
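The Step3 uploads under /user/root all derive from one base directory, so a script can compute them from that path. A minimal Python sketch that returns the hdfs dfs commands for the per-user directories (the sudo -u hdfs /examples steps are left out) and runs them with subprocess only when asked; the names and defaults are hypothetical.

```python
import subprocess
from typing import List


def staging_commands(user_root: str = "/user/root/examples",
                     local_dir: str = "oozie-examples",
                     jar: str = "oozie-examples-4.1.0.jar") -> List[List[str]]:
    """Return the per-user hdfs dfs commands from Step3, in order (not executed)."""
    app_dir = f"{user_root}/apps/map-reduce"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", app_dir],
        ["hdfs", "dfs", "-put", f"{local_dir}/src/main/apps/map-reduce/workflow.xml", app_dir + "/"],
        ["hdfs", "dfs", "-mkdir", app_dir + "/lib"],
        ["hdfs", "dfs", "-put", f"{local_dir}/target/{jar}", app_dir + "/lib"],
        ["hdfs", "dfs", "-mkdir", "-p", f"{user_root}/input-data/text"],
        ["hdfs", "dfs", "-mkdir", "-p", f"{user_root}/output-data"],
        ["hdfs", "dfs", "-put", f"{local_dir}/src/main/data/data.txt", f"{user_root}/input-data/text"],
    ]


def run_all(commands: List[List[str]]) -> None:
    # Stop at the first failing command (check=True raises CalledProcessError)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```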
Step4
Run the job:
oozie job -oozie http://host1:11000/oozie -config oozie-examples/src/main/apps/map-reduce/job.properties -run
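Once the job has been created, the command returns a job ID, for example job: 0000017-150302164219871-oozie-oozi-W. You can then check the job status with any of the three connection methods described earlier. Here I use HUE: at the top, click Workflow -> Dashboard -> Workflow.
You will see a job running.
Click it to follow the job status in real time; when the job finishes, its status changes to SUCCEEDED.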
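Instead of watching the dashboard, a script can poll the job until it finishes. A minimal Python sketch, assuming the v1 Web Services endpoint /v1/job/<id>?show=info returns JSON with a status field, and that SUCCEEDED, KILLED, and FAILED are the terminal states of a workflow job (coordinators and bundles have other states). The trailing W in the ID marks a workflow job; C and B mark coordinators and bundles. Helper names are hypothetical.

```python
import json
import time
from urllib.request import urlopen

JOB_TYPES = {"W": "workflow", "C": "coordinator", "B": "bundle"}
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}  # final states of a workflow job


def job_type(job_id: str) -> str:
    # Drop an action suffix such as "@mr-node", then read the last "-X" part
    return JOB_TYPES[job_id.split("@", 1)[0].rsplit("-", 1)[-1]]


def job_info_url(oozie_url: str, job_id: str) -> str:
    return f"{oozie_url.rstrip('/')}/v1/job/{job_id}?show=info"


def wait_for(oozie_url: str, job_id: str, interval: float = 10.0,
             timeout: float = 3600.0) -> str:
    """Poll until the workflow reaches a terminal state; return that state."""
    deadline = time.monotonic() + timeout
    while True:
        with urlopen(job_info_url(oozie_url, job_id), timeout=30) as resp:
            status = json.loads(resp.read().decode("utf-8"))["status"]
        if status in TERMINAL:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"{job_id} is still {status}")
        time.sleep(interval)
```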
Now check the result in /user/root/examples/output-data/map-reduce/part-00000:
0 To be or not to be, that is the question;
42 Whether 'tis nobler in the mind to suffer
84 The slings and arrows of outrageous fortune,
129 Or to take arms against a sea of troubles,
172 And by opposing, end them. To die, to sleep;
217 No more; and by a sleep to say we end
255 The heart-ache and the thousand natural shocks
302 That flesh is heir to ? 'tis a consummation
346 Devoutly to be wish'd. To die, to sleep;
387 To sleep, perchance to dream. Ay, there's the rub,
438 For in that sleep of death what dreams may come,
487 When we have shuffled off this mortal coil,
531 Must give us pause. There's the respect
571 That makes calamity of so long life,
608 For who would bear the whips and scorns of time,
657 Th'oppressor's wrong, the proud man's contumely,
706 The pangs of despised love, the law's delay,
751 The insolence of office, and the spurns
791 That patient merit of th'unworthy takes,
832 When he himself might his quietus make
871 With a bare bodkin? who would fardels bear,
915 To grunt and sweat under a weary life,
954 But that the dread of something after death,
999 The undiscovered country from whose bourn
1041 No traveller returns, puzzles the will,
1081 And makes us rather bear those ills we have
1125 Than fly to others that we know not of?
1165 Thus conscience does make cowards of us all,
1210 And thus the native hue of resolution
1248 Is sicklied o'er with the pale cast of thought,
1296 And enterprises of great pitch and moment
1338 With this regard their currents turn awry,
1381 And lose the name of action.
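The first column appears to be the byte offset at which each input line starts (the key TextInputFormat hands to the mapper), which the example's mapper and reducer seem to pass through unchanged. For instance, the first line is 41 characters plus a newline, so the second line starts at 42. A minimal Python sketch that reproduces those offsets for a local copy of the input; offset_lines is a hypothetical helper.

```python
from typing import List, Tuple


def offset_lines(text: str) -> List[Tuple[int, str]]:
    """Pair each line with the byte offset where it starts in the UTF-8 input."""
    pairs, offset = [], 0
    for raw in text.encode("utf-8").splitlines(keepends=True):
        pairs.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw)  # advance past the line and its line terminator
    return pairs
```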
Examining workflow.xml
Let's open the workflow.xml from this example and take a look: