Optimization tip: hoist an if check to help CPU branch prediction
Branch prediction
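There is a very famous question on Stack Overflow: "Why is processing a sorted array faster than processing an unsorted array?" It shows how strongly branch prediction can affect the runtime performance of code.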
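To make the sorted-versus-unsorted effect concrete, here is a minimal Java sketch in the spirit of that question. The class and method names are hypothetical, and the timing loop is deliberately naive (no warm-up control, no JMH), so treat the numbers as an illustration rather than a measurement. Depending on the JVM and CPU, the JIT compiler may replace the branch with a conditional move, in which case the gap shrinks or disappears.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; names are hypothetical, not from the original post.
class SortedBranchDemo {

    // Sums every element >= 128. The `if` is the branch the CPU has to predict.
    static long sumAtLeast128(int[] data) {
        long sum = 0;
        for (int value : data) {
            if (value >= 128) {
                sum += value;
            }
        }
        return sum;
    }

    // Runs several passes over the array, prints and returns the elapsed nanoseconds.
    static long timeNanos(String label, int[] data, int passes) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < passes; i++) {
            checksum += sumAtLeast128(data);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(label + ": checksum=" + checksum + ", ns=" + elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        // Same values in both arrays; only the order differs.
        int[] unsorted = new Random(42).ints(32_768, 0, 256).toArray();
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);
        timeNanos("unsorted", unsorted, 1_000);
        timeNanos("sorted  ", sorted, 1_000);
    }
}
```

Both arrays produce the same checksum; with sorted data the branch outcome changes only once per pass, which is the easiest possible pattern for a predictor.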
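Modern CPUs support branch prediction and instruction pipelining, and the combination of the two greatly improves CPU efficiency. For a simple if jump, the CPU can usually predict the branch quite well. A switch jump gives it much less to work with: in essence, a switch uses an index to fetch an address from an array of addresses and then jumps to that address.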
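The "index into an array of addresses, then jump" description can be modeled in plain Java. The sketch below is only a conceptual model with hypothetical names: it uses an array of lambdas where real machine code would hold jump targets, and it does not claim to show what javac or the JIT compiler actually emits for a given switch.

```java
import java.util.function.IntUnaryOperator;

// Conceptual model of a jump table; names are hypothetical.
class JumpTableSketch {
    enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    // One "target" per case, indexed by the enum ordinal.
    static final IntUnaryOperator[] TABLE =
            new IntUnaryOperator[ChannelState.values().length];

    static {
        for (ChannelState state : ChannelState.values()) {
            final int ordinal = state.ordinal();
            // Each target adds its own ordinal, mirroring the benchmark's case bodies.
            TABLE[ordinal] = acc -> acc + ordinal;
        }
    }

    static int dispatch(ChannelState state, int acc) {
        IntUnaryOperator target = TABLE[state.ordinal()]; // fetch the "address" by index
        return target.applyAsInt(acc);                    // indirect "jump" to it
    }

    public static void main(String[] args) {
        System.out.println(dispatch(ChannelState.RECEIVED, 10));
    }
}
```

The point of the model is that the destination is data-dependent: the CPU has to guess the target address itself, not just a taken/not-taken bit.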
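To make code run faster, one important principle is to avoid forcing the CPU to flush its pipeline as far as possible, so raising the branch prediction hit rate matters a great deal.
So if one branch of a switch is taken with very high probability, can we help the CPU at the source level by checking for that case first, and thereby make the code run faster?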
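The switch in Dubbo's ChannelEventRunnable
ChannelEventRunnable contains a switch that checks the channel state and then runs the corresponding logic.
Once a channel has been established, its state is ChannelState.RECEIVED in more than 99.9% of cases, so it is worth considering hoisting that check ahead of the switch.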
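The shape of the change can be sketched as follows. This is not Dubbo's actual code: the class name, the `handle` method, and the returned labels are hypothetical stand-ins for the real handler calls, and only the enum constants are taken from the benchmark in this post.

```java
// Hypothetical sketch of hoisting the hot case out of a switch; not Dubbo's code.
class HoistedCheckSketch {
    enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    static String handle(ChannelState state) {
        // Hot path first: a plain conditional branch that is almost always taken.
        if (state == ChannelState.RECEIVED) {
            return "received";
        }
        // Rare states still go through the switch.
        switch (state) {
            case CONNECTED:
                return "connected";
            case DISCONNECTED:
                return "disconnected";
            case SENT:
                return "sent";
            case CAUGHT:
                return "caught";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(ChannelState.RECEIVED));
        System.out.println(handle(ChannelState.SENT));
    }
}
```

The behavior is unchanged for every state; only the order in which the checks are made differs.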
Benchmark verification
Let's verify this with JMH:
package io.github.hengyunabc.jmh;

import java.util.Date;
import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

public class TestBenchMarks {
    public enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    @State(Scope.Benchmark)
    public static class ExecutionPlan {
        @Param({ "1000000" })
        public int size;
        public ChannelState[] states = null;

        // Fill the array so that almost every entry is RECEIVED.
        @Setup
        public void setUp() {
            ChannelState[] values = ChannelState.values();
            states = new ChannelState[size];
            Random random = new Random(new Date().getTime());
            for (int i = 0; i < size; i++) {
                int nextInt = random.nextInt(1000000);
                if (nextInt > 100) {
                    states[i] = ChannelState.RECEIVED;
                } else {
                    states[i] = values[nextInt % values.length];
                }
            }
        }
    }

    // Baseline: every state goes through the switch.
    @Fork(value = 5)
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void benchSiwtch(ExecutionPlan plan, Blackhole bh) {
        int result = 0;
        for (int i = 0; i < plan.size; ++i) {
            switch (plan.states[i]) {
            case CONNECTED: result += ChannelState.CONNECTED.ordinal(); break;
            case DISCONNECTED: result += ChannelState.DISCONNECTED.ordinal(); break;
            case SENT: result += ChannelState.SENT.ordinal(); break;
            case RECEIVED: result += ChannelState.RECEIVED.ordinal(); break;
            case CAUGHT: result += ChannelState.CAUGHT.ordinal(); break;
            }
        }
        bh.consume(result);
    }

    // Variant: RECEIVED is checked with an if before falling back to the switch.
    @Fork(value = 5)
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void benchIfAndSwitch(ExecutionPlan plan, Blackhole bh) {
        int result = 0;
        for (int i = 0; i < plan.size; ++i) {
            ChannelState state = plan.states[i];
            if (state == ChannelState.RECEIVED) {
                result += ChannelState.RECEIVED.ordinal();
            } else {
                switch (state) {
                case CONNECTED: result += ChannelState.CONNECTED.ordinal(); break;
                case SENT: result += ChannelState.SENT.ordinal(); break;
                case DISCONNECTED: result += ChannelState.DISCONNECTED.ordinal(); break;
                case CAUGHT: result += ChannelState.CAUGHT.ordinal(); break;
                }
            }
        }
        bh.consume(result);
    }
}
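As a sanity check on the input data, the generation logic from setUp can be run on its own to see how skewed the states really are. The sketch below copies that logic into a hypothetical helper, `receivedFraction`, and uses a fixed seed (an assumption made here for repeatability; the benchmark itself seeds from the current time). Only values 0 to 100 reach the else branch, and some of those still map to RECEIVED, so the fraction should come out at roughly 99.99%.

```java
import java.util.Random;

// Hypothetical helper that mirrors the benchmark's setUp logic; not part of the benchmark.
class StateDistributionCheck {
    enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    // Returns the fraction of generated states that are RECEIVED.
    static double receivedFraction(int size, long seed) {
        ChannelState[] values = ChannelState.values();
        Random random = new Random(seed);
        int received = 0;
        for (int i = 0; i < size; i++) {
            int nextInt = random.nextInt(1000000);
            ChannelState state = nextInt > 100
                    ? ChannelState.RECEIVED
                    : values[nextInt % values.length];
            if (state == ChannelState.RECEIVED) {
                received++;
            }
        }
        return (double) received / size;
    }

    public static void main(String[] args) {
        System.out.println("fraction RECEIVED: " + receivedFraction(1_000_000, 1L));
    }
}
```

This matches the "more than 99.9%" assumption about real channels, so the benchmark measures the case the optimization is aimed at.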
benchSiwtch uses a pure switch.
benchIfAndSwitch first uses an if to check whether the state is ChannelState.RECEIVED, and falls back to the switch otherwise.
The benchmark results are:
Result "io.github.hengyunabc.jmh.TestBenchMarks.benchSiwtch":
  576.745 ±(99.9%) 6.806 ops/s [Average]
  (min, avg, max) = (490.348, 576.745, 618.360), stdev = 20.066
  CI (99.9%): 569.939, 583.550
Run complete. Total time: 00:06:48
Benchmark                        (size)   Mode  Cnt     Score    Error  Units
TestBenchMarks.benchIfAndSwitch  1000000  thrpt  100  1535.867 ± 61.212  ops/s
TestBenchMarks.benchSiwtch       1000000  thrpt  100   576.745 ±  6.806  ops/s
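As the numbers show, hoisting the if check does make the code faster: in this benchmark throughput rises from 576.745 to 1535.867 ops/s, roughly 2.7 times. The technique is worth using in places with strict performance requirements.
Benchmark code: https://github.com/hengyunabc/jmh-demo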
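Summary
A switch is hard for the CPU to branch-predict.
If one case of a switch is taken with high probability, consider checking for it separately with an if beforehand, to make full use of the CPU's branch prediction.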