Distributed interface calls: how to design a compensation mechanism
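Most applications today are distributed; systems built as a single application have become rare. In a distributed system, interface calls between services will sometimes fail, and how we handle a failure depends on the type of result that comes back. What we discuss today is one such mechanism: retrying within a limited time, and how to design it.
The basic idea is to retry in a loop while checking whether a specified duration, say 3 minutes, has elapsed. After the first call fails, we keep retrying the interface within those 3 minutes. If a call succeeds, we return the result; if the calls are still failing after 3 minutes, we send a warning email or fall back to a remedy such as a default return value. So how should these repeated, spaced-out calls within the 3 minutes be designed? The approach I describe below comes from the way load balancing in Spring Cloud Ribbon selects a service provider.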
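First, naturally, comes the maximum retry time, in milliseconds: maxRetryMillis = 500; (for the 3-minute window in the example above the value would be 180000).
Next, modify the method that calls the interface: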
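public Result getResultFromRemoteServer(String param) {
    // Current system time
    long currentTime = System.currentTimeMillis();
    // Deadline after which retries stop
    long deadTime = currentTime + maxRetryMillis;
    // Call the remote interface
    Result result = getResult(param);
    // Retry only if the first call failed and the deadline has not passed
    if (result == null && System.currentTimeMillis() < deadTime) {
        // Schedule a timer task that interrupts this thread at the deadline.
        // InterruptTask takes a delay in milliseconds, not an absolute time
        // (the class definition is given later).
        InterruptTask task = new InterruptTask(deadTime - System.currentTimeMillis());
        // Keep looping until the timer task interrupts the current thread
        while (!Thread.interrupted()) {
            result = getResult(param);
            // On failure with time remaining, yield and try again; otherwise leave the loop
            if (result == null && System.currentTimeMillis() < deadTime) {
                Thread.yield(); // give up the CPU and compete for it again before the next call
            } else {
                break;
            }
        }
        task.cancel(); // got a valid result or timed out: cancel the timer task
    }
    // Finally, act on the value of result
    if (result != null) {
        return result;
    } else {
        // No value was obtained within the retry window:
        // send an alert email or SMS to the person responsible here,
        // then return a default value or null
        return null;
    }
}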
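OK, that is essentially the whole approach. Two points matter. First, the task interrupts the thread in order to stop the retries. Second, Thread.yield() is used to space the calls out instead of calling nonstop, with the aim of lowering the load on our server and the number of concurrent requests hitting the remote interface. Note that Thread.yield() is only a hint to the scheduler and guarantees no pause, so the loop can still spin quickly when no other thread is waiting to run.
Here is the definition of InterruptTask: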
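Because Thread.yield() gives no guaranteed delay, one possible variant replaces it with a short sleep between attempts. The sketch below is not Ribbon's code: the class DeadlineRetry, the method callWithRetry, and the pauseMillis parameter are hypothetical names introduced only for illustration, and the remote call is simulated with a lambda.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not Ribbon code: retry until a deadline, sleeping between attempts. */
final class DeadlineRetry {

    /**
     * Calls {@code call} until it returns non-null or {@code maxRetryMillis} has elapsed.
     * Returns null if no attempt succeeded in time or if the thread was interrupted.
     */
    static <T> T callWithRetry(Supplier<T> call, long maxRetryMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + maxRetryMillis;
        T result = call.get();
        while (result == null && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pauseMillis); // a real pause, unlike Thread.yield()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag for the caller
                return null;
            }
            result = call.get();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated remote call: returns null (failure) twice, then succeeds.
        String value = callWithRetry(() -> ++attempts[0] < 3 ? null : "ok", 500, 20);
        System.out.println(value + " after " + attempts[0] + " attempts");
    }
}
```

With a sleep in the loop the deadline check alone is enough to end the retries, so this variant needs no timer task; the trade-off is that a result can arrive up to pauseMillis later than it would with a tight loop.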
public class InterruptTask extends TimerTask {

    static Timer timer = new Timer("InterruptTimer", true);

    protected Thread target = null;

    public InterruptTask(long millis) {
        target = Thread.currentThread();
        timer.schedule(this, millis);
    }

    /* Auto-scheduling constructor */
    public InterruptTask(Thread target, long millis) {
        this.target = target;
        timer.schedule(this, millis);
    }

    public boolean cancel() {
        try {
            /* This shouldn't throw exceptions, but... */
            return super.cancel();
        } catch (Exception e) {
            return false;
        }
    }

    public void run() {
        if ((target != null) && (target.isAlive())) {
            target.interrupt();
        }
    }
}
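To see the interrupt mechanism in isolation, here is a minimal stand-alone sketch that uses only java.util.Timer and java.util.TimerTask. The names InterruptDemo and spinUntilInterrupted are hypothetical and exist only for this demo. It schedules a one-shot task that interrupts the calling thread after a delay, then spins on Thread.interrupted() the same way the retry loop does.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Stand-alone demo; InterruptDemo and spinUntilInterrupted are hypothetical names. */
final class InterruptDemo {

    // Daemon timer, so the timer thread never keeps the JVM alive.
    private static final Timer TIMER = new Timer("InterruptDemoTimer", true);

    /** Spins with Thread.yield() until a timer interrupts this thread; returns elapsed ms. */
    static long spinUntilInterrupted(long delayMillis) {
        final Thread target = Thread.currentThread();
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                target.interrupt(); // sets the interrupt flag of the spinning thread
            }
        };
        long start = System.currentTimeMillis();
        TIMER.schedule(task, delayMillis);  // second argument is a delay, not a timestamp
        while (!Thread.interrupted()) {     // interrupted() also clears the flag
            Thread.yield();
        }
        task.cancel();                      // harmless here: the task has already run
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("interrupted after about " + spinUntilInterrupted(100) + " ms");
    }
}
```

Since Thread.interrupted() clears the flag as it reads it, the calling thread leaves the loop with its interrupt status reset and can safely make blocking calls afterwards.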
Next, the definition of TimerTask:
public abstract class TimerTask implements Runnable {
    /**
     * This object is used to control access to the TimerTask internals.
     */
    final Object lock = new Object();

    /**
     * The state of this task, chosen from the constants below.
     */
    int state = VIRGIN;

    /**
     * This task has not yet been scheduled.
     */
    static final int VIRGIN = 0;

    /**
     * This task is scheduled for execution.  If it is a non-repeating task,
     * it has not yet been executed.
     */
    static final int SCHEDULED = 1;

    /**
     * This non-repeating task has already executed (or is currently
     * executing) and has not been cancelled.
     */
    static final int EXECUTED = 2;

    /**
     * This task has been cancelled (with a call to TimerTask.cancel).
     */
    static final int CANCELLED = 3;

    /**
     * Next execution time for this task in the format returned by
     * System.currentTimeMillis, assuming this task is scheduled for execution.
     * For repeating tasks, this field is updated prior to each task execution.
     */
    long nextExecutionTime;

    /**
     * Period in milliseconds for repeating tasks.  A positive value indicates
     * fixed-rate execution.  A negative value indicates fixed-delay execution.
     * A value of 0 indicates a non-repeating task.
     */
    long period = 0;

    /**
     * Creates a new timer task.
     */
    protected TimerTask() {
    }

    /**
     * The action to be performed by this timer task.
     */
    public abstract void run();

    /**
     * Cancels this timer task.  If the task has been scheduled for one-time
     * execution and has not yet run, or has not yet been scheduled, it will
     * never run.  If the task has been scheduled for repeated execution, it
     * will never run again.  (If the task is running when this call occurs,
     * the task will run to completion, but will never run again.)
     *
     * <p>Note that calling this method from within the <tt>run</tt> method of
     * a repeating timer task absolutely guarantees that the timer task will
     * not run again.
     *
     * <p>This method may be called repeatedly; the second and subsequent
     * calls have no effect.
     *
     * @return true if this task is scheduled for one-time execution and has
     *         not yet run, or this task is scheduled for repeated execution.
     *         Returns false if the task was scheduled for one-time execution
     *         and has already run, or if the task was never scheduled, or if
     *         the task was already cancelled.  (Loosely speaking, this method
     *         returns <tt>true</tt> if it prevents one or more scheduled
     *         executions from taking place.)
     */
    public boolean cancel() {
        synchronized(lock) {
            boolean result = (state == SCHEDULED);
            state = CANCELLED;
            return result;
        }
    }

    /**
     * Returns the <i>scheduled</i> execution time of the most recent
     * <i>actual</i> execution of this task.  (If this method is invoked
     * while task execution is in progress, the return value is the scheduled
     * execution time of the ongoing task execution.)
     *
     * <p>This method is typically invoked from within a task's run method, to
     * determine whether the current execution of the task is sufficiently
     * timely to warrant performing the scheduled activity:
     * <pre>{@code
     *   public void run() {
     *       if (System.currentTimeMillis() - scheduledExecutionTime() >=
     *           MAX_TARDINESS)
     *               return;  // Too late; skip this execution.
     *       // Perform the task
     *   }
     * }</pre>
     * This method is typically <i>not</i> used in conjunction with
     * <i>fixed-delay execution</i> repeating tasks, as their scheduled
     * execution times are allowed to drift over time, and so are not terribly
     * significant.
     *
     * @return the time at which the most recent execution of this task was
     *         scheduled to occur, in the format returned by Date.getTime().
     *         The return value is undefined if the task has yet to commence
     *         its first execution.
     * @see Date#getTime()
     */
    public long scheduledExecutionTime() {
        synchronized(lock) {
            return (period < 0 ? nextExecutionTime + period
                               : nextExecutionTime - period);
        }
    }
}
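The return value of cancel() described in the Javadoc above can be checked with a small experiment. This is a minimal sketch using only java.util.Timer and java.util.TimerTask; the class CancelDemo and its methods are hypothetical names made up for the demo.

```java
import java.util.Arrays;
import java.util.Timer;
import java.util.TimerTask;

/** Stand-alone demo of TimerTask.cancel() return values; CancelDemo is a hypothetical name. */
final class CancelDemo {

    /** Creates a task whose run() does nothing. */
    static TimerTask noop() {
        return new TimerTask() {
            @Override
            public void run() { }
        };
    }

    /** Returns {cancel on a never-scheduled task, first cancel after scheduling, second cancel}. */
    static boolean[] cancelResults() {
        Timer timer = new Timer("CancelDemoTimer", true);
        TimerTask neverScheduled = noop();
        TimerTask scheduled = noop();
        timer.schedule(scheduled, 60_000); // one minute away, so it cannot run during the demo
        boolean[] results = {
            neverScheduled.cancel(), // false: state was VIRGIN, not SCHEDULED
            scheduled.cancel(),      // true: a pending execution was prevented
            scheduled.cancel()       // false: state is already CANCELLED
        };
        timer.cancel();              // stop the timer thread
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cancelResults())); // prints [false, true, false]
    }
}
```

In the retry method, this means task.cancel() returns true when a result arrived before the deadline and false when the timer task has already fired.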
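That concludes the description of this approach. For a concrete example, look at the choose method of RetryRule in the Spring Cloud Ribbon source code. The source of InterruptTask is in the package com.netflix.loadbalancer, and TimerTask is a class in the JDK's java.util package.
To state it once more, I learned the approach above from the book Spring Cloud 微服務實戰. If anything is unclear you can refer to the book, which is well written, or leave a comment to discuss.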