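[Repost] Pitfalls encountered while deploying Azkaban (deployment part)
Note: I had already hit one Azkaban pitfall, a default configuration that requires more than 6 GB of free memory. After solving it, today I hit another one: the code hard-codes a requirement of more than 3 GB of memory. Searching for the error message led me to this article. The author's home page, https://my.oschina.net/u/2988360, has several other articles on Azkaban, which I recommend.
1. Downloading the Azkaban source code
2. Installing and deploying Azkaban
After downloading the MyAzkaban project, you will find a deployment guide inside it, "MyAzkaban-3.0.0使用文件.doc". Follow that document.
Once installation is complete, open the following URL (replace ip with the server address): https://ip:8443
3. Pitfalls you may run into during deployment
I ran into several pitfalls while deploying the project, and they took a long time to solve. I am sharing them here so that you can avoid some detours in your own deployment.
3.1 The official project is not a Maven project
The official source is not a Maven project and cannot be compiled or packaged with Maven. If you want to build with Maven, download the source from the link in section 1 instead.
3.2 Pitfall when starting the server after installation
After installation, always start the server from the directory one level above bin: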
./bin/start-web.sh
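Do not cd into the bin directory and start it from there, because the start script relies on the current working directory.
3.3 Setting execute permission on the start scripts
Start scripts uploaded to the server do not have execute permission by default, so you need to grant it: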
sudo chmod 755 xxx.sh
3.4 Handling whitespace differences between Windows and Linux
3.5 Executor host memory limits in Multiple Executor Mode
azkaban.use.multiple.executors=true
# executor host filter configuration
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
The MinimumFreeMemory filter checks whether the executor host has more than 6 GB of free memory. If it does not, the web server will not dispatch jobs to that host. The relevant source is:
/**
 * <pre>
 * function to register the static Minimum Reserved Memory filter.
 * NOTE : this is a static filter which means the filter will be filtering based on the system standard which is not
 *        Coming for the passed flow.
 * This filter will filter out any executors that has the remaining memory below 6G
 * </pre>
 * */
private static FactorFilter<Executor, ExecutableFlow> getMinimumReservedMemoryFilter(){
    return FactorFilter.create(MINIMUMFREEMEMORY_FILTER_NAME, new FactorFilter.Filter<Executor, ExecutableFlow>() {
        private static final int MINIMUM_FREE_MEMORY = 6 * 1024;
        public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
            if (null == filteringTarget){
                logger.debug(String.format("%s : filtering out the target as it is null.", MINIMUMFREEMEMORY_FILTER_NAME));
                return false;
            }
            ExecutorInfo stats = filteringTarget.getExecutorInfo();
            if (null == stats) {
                logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
                    MINIMUMFREEMEMORY_FILTER_NAME,
                    filteringTarget.toString()));
                return false;
            }
            return stats.getRemainingMemoryInMB() > MINIMUM_FREE_MEMORY ;
        }
    });
}
The CpuStatus filter checks whether the executor host's CPU usage has reached 95%. If it has, the web server will not dispatch jobs to that host either:
/**
 * <pre>
 * function to register the static Minimum Reserved Memory filter.
 * NOTE : this is a static filter which means the filter will be filtering based on the system standard which
 *        is not Coming for the passed flow.
 * This filter will filter out any executors that the current CPU usage exceed 95%
 * </pre>
 * */
private static FactorFilter<Executor, ExecutableFlow> getCpuStatusFilter(){
    return FactorFilter.create(CPUSTATUS_FILTER_NAME, new FactorFilter.Filter<Executor, ExecutableFlow>() {
        private static final int MAX_CPU_CURRENT_USAGE = 95;
        public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
            if (null == filteringTarget){
                logger.debug(String.format("%s : filtering out the target as it is null.", CPUSTATUS_FILTER_NAME));
                return false;
            }
            ExecutorInfo stats = filteringTarget.getExecutorInfo();
            if (null == stats) {
                logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
                    MINIMUMFREEMEMORY_FILTER_NAME,
                    filteringTarget.toString()));
                return false;
            }
            return stats.getCpuUsage() < MAX_CPU_CURRENT_USAGE ;
        }
    });
}
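To make the two thresholds concrete, here is a minimal standalone sketch of the same comparisons. The class ExecutorFilterSketch and its method names are hypothetical and are not part of Azkaban; only the two constants (6 * 1024 MB and 95) are taken from the source quoted above. The StaticRemainingFlowSize filter is left out.

```java
// Hypothetical illustration only: mirrors the comparisons in the two
// quoted filters without depending on any Azkaban classes.
class ExecutorFilterSketch {
    // Thresholds copied from the quoted Azkaban source.
    static final int MINIMUM_FREE_MEMORY_MB = 6 * 1024;
    static final int MAX_CPU_CURRENT_USAGE = 95;

    // Same strict comparison as the MinimumFreeMemory filter.
    static boolean passesMemoryFilter(long remainingMemoryInMB) {
        return remainingMemoryInMB > MINIMUM_FREE_MEMORY_MB;
    }

    // Same strict comparison as the CpuStatus filter.
    static boolean passesCpuFilter(double cpuUsagePercent) {
        return cpuUsagePercent < MAX_CPU_CURRENT_USAGE;
    }

    // An executor stays a candidate only if both filters accept it.
    static boolean isEligible(long remainingMemoryInMB, double cpuUsagePercent) {
        return passesMemoryFilter(remainingMemoryInMB) && passesCpuFilter(cpuUsagePercent);
    }

    public static void main(String[] args) {
        System.out.println(isEligible(8192, 40.0)); // true
        System.out.println(isEligible(6144, 40.0)); // false: exactly 6 GB is not more than 6 GB
        System.out.println(isEligible(8192, 95.0)); // false: 95% is not below 95%
    }
}
```

Because both comparisons are strict, a host with exactly 6 GB free or exactly 95% CPU usage is filtered out. Since the filters are applied only when they are named in azkaban.executorselector.filters, leaving MinimumFreeMemory out of that list appears to be one way to use hosts with less free memory; treat that as an assumption to verify against your Azkaban version.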
3.6 Job cannot obtain memory at execution time
If a job fails with an error like the following:
14-09-2017 13:50:01 CST A INFO - Starting job A at 1505368201283
14-09-2017 13:50:01 CST A INFO - azkaban.webserver.url property was not set
14-09-2017 13:50:01 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=184 -Dazkaban.jobid=A
14-09-2017 13:50:01 CST A INFO - Building command job executor.
14-09-2017 13:50:01 CST A ERROR - pluginLoadProps is null
14-09-2017 13:50:01 CST A ERROR - Job run failed!
java.lang.Exception: Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:86)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:590)
at azkaban.execapp.JobRunner.run(JobRunner.java:443)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14-09-2017 13:50:01 CST A ERROR - Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A cause: null
14-09-2017 13:50:01 CST A INFO - Finishing job A attempt: 0 at 1505368201336 with status FAILED
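This is most likely because every executor host is short of memory. The Azkaban source only lets a job run when the executor host has at least 3 GB of free memory, and at least 3 GB would still remain after subtracting the job's Xmx.
The corresponding Azkaban source is: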
private static final long LOW_MEM_THRESHOLD = 3L*1024L*1024L; //3 GB
/**
 * @param xms
 * @param xmx
 * @return System can satisfy the memory request or not
 *
 * Given Xms/Xmx values (in kb) used by java process, determine if system can
 * satisfy the memory request
 */
public synchronized static boolean canSystemGrantMemory(long xms, long xmx, long freeMemDecrAmt) {
    if (!memCheckEnabled) {
        return true;
    }
    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
        logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb), memory request declined.",
            freeMemAmount, LOW_MEM_THRESHOLD));
        return false;
    }
    //let's get newest mem info
    if (freeMemAmount >= LOW_MEM_THRESHOLD && freeMemAmount < 2 * LOW_MEM_THRESHOLD) {
        logger.info(String.format("Free memory amount (%d kb) is less than 2x low mem threshold (%d kb), re-read /proc/meminfo",
            freeMemAmount, LOW_MEM_THRESHOLD));
        readMemoryInfoFile();
    }
    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
        logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb), memory request declined.",
            freeMemAmount, LOW_MEM_THRESHOLD));
        return false;
    }
    if (freeMemAmount - xmx < LOW_MEM_THRESHOLD) {
        logger.info(String.format("Free memory amount minus xmx (%d - %d kb) is less than low mem threshold (%d kb), memory request declined.",
            freeMemAmount, xmx, LOW_MEM_THRESHOLD));
        return false;
    }
    if (freeMemDecrAmt > 0) {
        freeMemAmount -= freeMemDecrAmt;
        logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", freeMemDecrAmt, freeMemAmount));
    } else {
        freeMemAmount -= xms;
        logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", xms, freeMemAmount));
    }
    return true;
}
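The arithmetic in canSystemGrantMemory can be checked with a few worked numbers. The sketch below is a simplified, hypothetical restatement (the class MemoryGrantSketch and the method wouldGrant are not Azkaban names): it keeps only the two threshold comparisons and omits the memCheckEnabled switch, the re-read of /proc/meminfo, and the bookkeeping that decrements the free amount.

```java
// Hypothetical illustration only: a side-effect-free restatement of the
// two rejection conditions in the quoted canSystemGrantMemory method.
class MemoryGrantSketch {
    // 3 GB expressed in kb, as in the quoted source.
    static final long LOW_MEM_THRESHOLD_KB = 3L * 1024L * 1024L;

    // Returns true when a request with the given Xmx (in kb) would pass
    // both threshold checks for the given free memory (in kb).
    static boolean wouldGrant(long freeMemKb, long xmxKb) {
        if (freeMemKb < LOW_MEM_THRESHOLD_KB) {
            return false; // less than 3 GB free: always declined
        }
        // at least 3 GB must remain after reserving Xmx
        return freeMemKb - xmxKb >= LOW_MEM_THRESHOLD_KB;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L; // 1 GB in kb
        System.out.println(wouldGrant(2 * gb, 0));      // false
        System.out.println(wouldGrant(4 * gb, 2 * gb)); // false
        System.out.println(wouldGrant(6 * gb, 2 * gb)); // true
    }
}
```

The first case matches the log above: even a request with Xms 0 kb and Xmx 0 kb is declined when the host has less than 3 GB free.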
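3.7 Multiple Executor Mode does not yet support configuring host and port mappings
Multiple Executor Mode does not yet support configuring the mapping between executor hosts and ports, so you have to insert the rows into the database table manually with SQL:
insert into executors(host,port) values("EXECUTOR_HOST",EXECUTOR_PORT);
4. Building the source package directly on Windows (requires a local git client)
1. In the Windows command line, switch to the target directory
2. git clone https://github.com/azkaban/azkaban
3. After the download completes, run gradlew build -x test to build (skipping the tests)
4. After the build succeeds, the packages are in the distributions directory under the build directory of the server and executor modules
5. Fixing errors in Azkaban 3.35
5.1 Fixing the Missing required property 'azkaban.native.lib' error
The error message is as follows: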
16-09-2017 19:48:28 CST A INFO - Starting job A at 1505562508575
16-09-2017 19:48:28 CST A INFO - azkaban.webserver.url property was not set
16-09-2017 19:48:28 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=1 -Dazkaban.jobid=A
16-09-2017 19:48:28 CST A INFO - Building command job executor.
16-09-2017 19:48:28 CST A INFO - Memory granted for job A
16-09-2017 19:48:28 CST A INFO - 2 commands to execute.
16-09-2017 19:48:28 CST A INFO - cwd=/app/azkaban/source_buit/azkaban-exec-server-3.35.0/executions/1
16-09-2017 19:48:28 CST A INFO - effective user is: azkaban
16-09-2017 19:48:28 CST A ERROR - Job run failed!
azkaban.utils.UndefinedPropertyException: Missing required property 'azkaban.native.lib'
at azkaban.utils.Props.getString(Props.java:420)
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:234)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:748)
at azkaban.execapp.JobRunner.doRun(JobRunner.java:591)
at azkaban.execapp.JobRunner.run(JobRunner.java:552)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16-09-2017 19:48:28 CST A ERROR - Missing required property 'azkaban.native.lib' cause: null
16-09-2017 19:48:28 CST A INFO - Finishing job A at 1505562508845 with status FAILED
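Solution:
Configure commonprivate.properties so that the missing property is defined.
5.2 Fixing the UI styling problem
After switching to the latest source (3.35.0) and packaging it, the deployed UI had styling problems.
The cause was that the web folder under the web-server directory on the server had been copied from a directory that does not contain the azkaban.css stylesheet.
Fix:
Upload the web folder from the install directory produced by the build to the server instead.
After reconfiguring and restarting, the UI displays correctly.
Note:
Every job in Azkaban is a process, and Azkaban decides whether a job succeeded by whether that process completed successfully. However, when the code of an MR or Spark job fails during execution, the task running on the cluster stops and nothing is written to the target file, yet the process reported back to Azkaban finishes successfully, so the job node is marked as succeeded. This contradicts the actual result of the task.
For example:
While running a jar, a NullPointerException occurred and the MR job stopped, but the process was still shown as successful, and the node's final status was also success.
To prevent downstream nodes from running even though a node they depend on has failed, job success needs to be judged by a different criterion, for example by checking the MR or Spark job's log file, or by checking whether the job succeeded on the cluster.
Summary: after execution finishes, check HDFS for the expected output file. If it exists, the job succeeded; if not, it failed.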
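One way to apply this check is to end the Azkaban job with a small program that exits non-zero when the expected output is missing. The sketch below is an assumption-laden example rather than the author's method: it assumes the hdfs command-line client is on the PATH of the executor host and uses hdfs dfs -test -e, which exits with status 0 when the path exists. The class name OutputCheckSketch, its methods, and the path passed on the command line are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: turns "does the HDFS output exist?"
// into the exit code of the process that Azkaban observes.
class OutputCheckSketch {
    // Builds the "hdfs dfs -test -e <path>" command line.
    static List<String> hdfsExistsCommand(String path) {
        return Arrays.asList("hdfs", "dfs", "-test", "-e", path);
    }

    // Runs a command and reports whether it exited with status 0.
    static boolean succeeds(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: OutputCheckSketch <hdfs-path>");
            System.exit(2);
        }
        // Exit 0 when the output exists, 1 otherwise, so the job node
        // fails whenever the expected file was not produced.
        System.exit(succeeds(hdfsExistsCommand(args[0])) ? 0 : 1);
    }
}
```

Run as the last command of the job and pointed at the expected output (for MapReduce jobs with default settings, the _SUCCESS marker file in the output directory is a natural candidate), this makes the node fail and stops dependent nodes from running.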