
[Branch Prediction] Notes on the Processor Branch Prediction Literature (2)

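[Paper] Casey, Kevin, M. Anton Ertl, and David Gregg. “Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters.” ACM Trans. Program. Lang. Syst. 29, no. 6 (October 2007). doi:10.1145/1286821.1286828.

[Key points]

1. For some applications, such as interpreters, a BTB achieves a hit rate of only 2%-50%.

2. The paper's focus is optimization: improving the indirect branch prediction accuracy of VM interpreters.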
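A toy model shows why a BTB does so badly on interpreters. It is an illustration only, not the paper's experimental setup. The BTB is idealized: it has unlimited capacity and stores the last target per branch. The opcode names and the 4-instruction loop are hypothetical. In a switch-based interpreter every VM instruction is dispatched through one shared indirect jump, so that jump's target changes almost every time. With threaded dispatch each handler ends in its own jump, which gives the BTB some context.

```python
from typing import Dict, List, Tuple


def btb_accuracy(trace: List[Tuple[str, str]]) -> float:
    """Fraction of (branch, target) events a last-target BTB predicts correctly.

    Idealized BTB: unlimited capacity, one stored target per branch.
    """
    btb: Dict[str, str] = {}
    hits = 0
    for branch, target in trace:
        if btb.get(branch) == target:
            hits += 1
        btb[branch] = target  # remember the most recent target
    return hits / len(trace)


def dispatch_trace(program: List[str], threaded: bool) -> List[Tuple[str, str]]:
    """Indirect-branch events produced by interpreting `program` once.

    Switch dispatch: every VM instruction goes through one shared indirect jump.
    Threaded dispatch: each handler ends in its own indirect jump, so the
    branch identity is the handler of the previous instruction.
    """
    trace = []
    prev = "entry"
    for op in program:
        branch = f"dispatch_after_{prev}" if threaded else "switch_dispatch"
        trace.append((branch, op))  # the target is the handler for `op`
        prev = op
    return trace


program = ["load", "add", "store", "jump"] * 100  # hypothetical loop body, 100 iterations
print(btb_accuracy(dispatch_trace(program, threaded=False)))  # 0.0
print(btb_accuracy(dispatch_trace(program, threaded=True)))   # 0.9875
```

Real bytecode reuses the same opcode in many different contexts. The threaded figure is therefore an upper bound for this toy loop, not a realistic estimate.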

[Paper] Li, Tao, L.K. John, Anand Sivasubramaniam, N. Vijaykrishnan, and J. Rubio. “OS-Aware Branch Prediction: Improving Microprocessor Control Flow Prediction for Operating Systems.” IEEE Transactions on Computers 56, no. 1 (January 2007): 2–17. doi:10.1109/TC.2007.250619.

[Key points]

1. T. Yeh and Y.N. Patt, “Two-Level Adaptive Branch Prediction,” Proc. 24th Int’l Symp. Microarchitecture, pp. 51-61, 1991.

Most current high-performance processors use dynamic branch prediction.

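2. Points out that when an operating system runs, OS and user code share the branch prediction resources. Under their combined influence prediction accuracy drops, and the increase in mispredictions is not negligible.

3. Simply adding more predictor resources brings no clear improvement.

4. Rather than developing a new predictor, the paper improves existing ones while staying compatible with them.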


6. Explains: “However, the fixed sizes of branch predictor tables, constrained by chip die area and access latency, make it impossible to hold all of the dynamic branch information.”

7. Citing [6] and [21]: because of these resource limits, branch aliasing can be destructive.

8. On Gshare: “Gshare [12] uses the ‘exclusive or’ (XOR) of the global history with the low-order address bits of a branch to form a more randomized BHT index.”

9. On Agree: “The Agree predictor [23] converts instances of destructive aliasing into either constructive or neutral aliasing by attaching each branch with a biasing bit that predicts the most likely outcome of that branch. The 2-bit BHT counter is then evaluated as to whether or not the branch will go in the direction indicated by the biasing bit. The concept behind the Agree predictor is that most branches are highly biased.”

10. The main technique is partitioning the predictor resources (e.g., keeping OS and user branches apart).
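Items 7, 8 and 10 can be made concrete with a minimal sketch. It is not the paper's OS-aware design. It models a gshare table of 2-bit counters that can optionally be partitioned by privilege mode, so that OS and user branches never share a counter. These parameters are assumptions chosen so that two hypothetical branches collide in the shared table:

- the table size (16 entries);
- the 2-bit history;
- the rule "top index bit = mode";
- the branch addresses.

```python
from typing import List, Tuple


class GsharePredictor:
    """Gshare-style PHT of 2-bit saturating counters (illustrative sketch).

    Index = (pc XOR global history), truncated to the table size. With
    partition_by_mode=True the top index bit is the privilege mode
    (0 = user, 1 = OS), so OS and user branches never share a counter.
    """

    def __init__(self, index_bits: int = 4, history_bits: int = 2,
                 partition_by_mode: bool = False):
        self.index_bits = index_bits
        self.history_mask = (1 << history_bits) - 1
        self.partition = partition_by_mode
        self.counters = [1] * (1 << index_bits)  # all start weakly not-taken
        self.ghr = 0  # global history register, newest outcome in bit 0

    def _index(self, pc: int, os_mode: bool) -> int:
        if self.partition:
            low = self.index_bits - 1  # one index bit is spent on the mode
            return (int(os_mode) << low) | ((pc ^ self.ghr) & ((1 << low) - 1))
        return (pc ^ self.ghr) & ((1 << self.index_bits) - 1)

    def step(self, pc: int, os_mode: bool, taken: bool) -> bool:
        """Predict one branch, train counter and history, return True if correct."""
        i = self._index(pc, os_mode)
        correct = (self.counters[i] >= 2) == taken
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.history_mask
        return correct


def count_mispredictions(pred: GsharePredictor,
                         trace: List[Tuple[int, bool, bool]]) -> int:
    return sum(not pred.step(pc, os_mode, taken) for pc, os_mode, taken in trace)


# Hypothetical trace: an always-taken user branch interleaved with an
# always-not-taken OS branch. In steady state the user branch sees GHR=0b10
# and the OS branch GHR=0b01, so 0x100^2 and 0x203^1 both map to index 2.
USER_PC, OS_PC = 0x100, 0x203
trace = [(USER_PC, False, True), (OS_PC, True, False)] * 100

shared = count_mispredictions(GsharePredictor(partition_by_mode=False), trace)
split = count_mispredictions(GsharePredictor(partition_by_mode=True), trace)
print(shared, split)  # 100 2
```

In the shared table the two branches keep pushing the same counter in opposite directions. As a result, the user branch is mispredicted on every execution, which is destructive aliasing. With the mode bit in the index, each branch trains its own counter. The Agree predictor from item 9 attacks the same problem differently and is not modeled here.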

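[Paper] Sendag, R., J.J. Yi, and Peng-fei Chuang. “Branch Misprediction Prediction: Complementary Branch Predictors.” Computer Architecture Letters 6, no. 2 (February 2007): 49–52. doi:10.1109/L-CA.2007.13.

[Key points]

1. Uses an MPBT to record mispredictions and correct later predictions.

2. The correction works well on the floating-point benchmarks.

3. It is very effective for loops whose behavior varies.

4. It still relies on a cache-style table structure.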
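Item 1 can be pictured with the following sketch. A primary bimodal predictor is paired with a small correction table. The table learns where the primary predictor tends to be wrong and flips its prediction there. Indexing the table by (pc, local history) and using 2-bit counters are assumptions; the paper's MPBT organization may differ. The hypothetical test branch is a loop with four iterations (pattern T, T, T, N), whose exit a bimodal predictor alone always misses.

```python
from collections import defaultdict
from typing import List


def _bump(counter: int, up: bool) -> int:
    """Saturating 2-bit counter update (0..3)."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class CorrectedBimodal:
    """Bimodal predictor with an optional misprediction-correction table."""

    def __init__(self, use_corrector: bool, history_bits: int = 3):
        self.use_corrector = use_corrector
        self.hist_mask = (1 << history_bits) - 1
        self.bimodal = defaultdict(lambda: 1)  # pc -> counter, starts weakly not-taken
        self.local_hist = defaultdict(int)     # pc -> recent outcomes, newest in bit 0
        self.corrector = defaultdict(int)      # (pc, history) -> "primary is wrong" counter

    def step(self, pc: int, taken: bool) -> bool:
        """Predict one branch, train both tables, return True if the final prediction was right."""
        key = (pc, self.local_hist[pc])
        primary = self.bimodal[pc] >= 2
        final = primary
        if self.use_corrector and self.corrector[key] >= 2:
            final = not primary  # table expects the primary prediction to be wrong here
        # The correction table learns whether the *primary* predictor erred.
        self.corrector[key] = _bump(self.corrector[key], primary != taken)
        self.bimodal[pc] = _bump(self.bimodal[pc], taken)
        self.local_hist[pc] = ((self.local_hist[pc] << 1) | int(taken)) & self.hist_mask
        return final == taken


def mispredictions(use_corrector: bool, outcomes: List[bool], pc: int = 0x400) -> int:
    pred = CorrectedBimodal(use_corrector)
    return sum(not pred.step(pc, t) for t in outcomes)


loop_branch = [True, True, True, False] * 50  # hypothetical 4-iteration loop, run 50 times
print(mispredictions(False, loop_branch), mispredictions(True, loop_branch))  # 51 3
```

After two loop executions the correction counter for the history "three takens in a row" saturates. From then on, the exit is predicted correctly.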

[Paper] Biggar, Paul, Nicholas Nash, Kevin Williams, and David Gregg. “An Experimental Study of Sorting and Branch Prediction.” J. Exp. Algorithmics 12 (June 2008): 1.8:1–1.8:39. doi:10.1145/1227161.1370599.

[Key points]

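1. For example, Intel Pentium 4 processors [Intel 2004, 2001] have pipelines of up to 31 stages.

2. Further reading: Computer Architecture: A Quantitative Approach (Hennessy and Patterson).

3. Static heuristic: predict backward branches taken and forward branches not taken.

4. Semi-static: a hint bit in the branch instruction, set at compile time to indicate the expected direction.

5. Also discusses the aliasing caused by limited predictor resources.

6. Source of true random numbers: random.org.

7. Simulated configuration: “We used a variety of cache configurations; generally speaking we used an 8-KB level-1 data cache, 8-KB level-1 instruction cache, and a shared 2-MB instruction and data level-2 cache, all with 32-byte cache lines.”

8. When the data values are randomly distributed, pattern-based predictors perform poorly.

9. Certain algorithms, such as bubble sort, suffer noticeably more mispredictions.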
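Items 1, 8 and 9 can be checked with a small experiment. The sketch runs bubble sort and feeds its data-dependent swap branch `a[j] > a[j + 1]` through a single 2-bit counter, first on sorted input and then on random input. It makes several assumptions:

- Loop-control branches and table aliasing are ignored.
- The 20-cycle misprediction penalty is a hypothetical figure for a deep pipeline, not a measured Pentium 4 value.
- The input size and seed are arbitrary.

```python
import random
from typing import List, Tuple


def bubble_sort_branch_misses(data: List[int]) -> Tuple[List[int], int, int]:
    """Bubble-sort a copy of `data`, predicting its swap branch with one 2-bit counter.

    Returns (sorted list, times the branch executed, mispredictions).
    """
    a = list(data)
    counter = 1  # 2-bit saturating counter, starts weakly not-taken
    executed = misses = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            taken = a[j] > a[j + 1]  # the data-dependent branch being predicted
            executed += 1
            if (counter >= 2) != taken:
                misses += 1
            counter = min(3, counter + 1) if taken else max(0, counter - 1)
            if taken:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, executed, misses


PENALTY_CYCLES = 20  # hypothetical misprediction penalty for a deep pipeline

rng = random.Random(42)
random_input = [rng.randrange(1000) for _ in range(200)]
for name, data in [("sorted", sorted(random_input)), ("random", random_input)]:
    result, n_exec, n_miss = bubble_sort_branch_misses(data)
    assert result == sorted(data)
    print(f"{name}: {n_miss}/{n_exec} mispredicted, about {n_miss * PENALTY_CYCLES} cycles lost")
```

On sorted input the branch is never taken and the counter never misses. On random input the outcome depends on the values, so the counter keeps being wrong. The deeper the pipeline, the more each of those misses costs.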

[Paper] Kwak, J.W., and C. S. Jhon. “High-Performance Embedded Branch Predictor by Combining Branch Direction History and Global Branch History.” IET Computers & Digital Techniques 2, no. 2 (March 2008): 142–54. doi:10.1049/iet-cdt:20060130.

[Key points]

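1. Stresses the importance of branch prediction for mobile devices.

2. Earlier predictors were indexed by the branch address and the global history. This paper adds the branch direction history (BDH) as an extra input.

3. New predictor: direction-gshare.

4. Explains that a predictor whose PHT is indexed by the branch address alone is called a bimodal predictor.

5. Points out the PHT aliasing problem, which leads to a discussion of combining the inputs with XOR or other hash functions.

6. The bi-mode predictor separates a taken table from a not-taken table.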

7. Cites the following:

[7] Sprangle E, Chappell RS, Alsup M, et al.: ‘The agree predictor: a mechanism for reducing negative branch history interference’. IEEE ISCA ’97, pp. 284–291

[8] Lee C-C, Chen I-CK, Mudge TN: ‘The bi-mode branch predictor’. IEEE Int. Symp. Microarchitecture ’97, pp. 4–13

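8. Cites many ARM technical reference documents.

9. Cites Loh G, Henry DS: ‘Predicting conditional branches with fusion-based hybrid predictors’. 11th Conf. Parallel Architectures and Compilation Techniques (PACT), September 2002.

10. Notes that earlier ARM processors, including ARM7, ARM9, ARM10, and ARM11, all used static prediction or a simple bimodal predictor. Some high-end mobile processors have since adopted dynamic branch prediction.

11. Cites several papers showing that neural-network predictors are highly accurate.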

12. Uses code examples to argue that branch direction information is a usable predictor input:

the low-level assembly code of the loop-style branch instruction is usually backward-taken, whereas the if-style branch instruction is usually forward-taken. Therefore we propose the additional use of the BDH information as a new component of input vectors for the branch prediction.

13. Cites McFarling's work.
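Item 6 (reference [8]) is sketched below as a minimal bi-mode organization. A choice table indexed by the branch address alone selects one of two gshare-indexed direction tables: one starts biased toward taken, the other toward not-taken. Only the selected direction table is trained. The choice counter is left unchanged when it pointed at the "wrong" bias but the selected table still predicted correctly. The following are assumptions:

- the table size;
- the history length;
- the initial counter values;
- the two hypothetical branch addresses.

The addresses are chosen so that an oppositely biased pair shares one gshare entry.

```python
from typing import List, Tuple


def _bump(counter: int, taken: bool) -> int:
    """Saturating 2-bit counter update (0..3)."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


class Gshare:
    """Baseline: one PHT indexed by (pc XOR global history)."""

    def __init__(self, index_bits: int = 4, history_bits: int = 2):
        self.mask = (1 << index_bits) - 1
        self.hmask = (1 << history_bits) - 1
        self.pht = [1] * (1 << index_bits)
        self.ghr = 0

    def step(self, pc: int, taken: bool) -> bool:
        i = (pc ^ self.ghr) & self.mask
        pred = self.pht[i] >= 2
        self.pht[i] = _bump(self.pht[i], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask
        return pred == taken


class BiMode:
    """Bi-mode sketch: a choice table picks a taken-biased or not-taken-biased PHT."""

    def __init__(self, index_bits: int = 4, history_bits: int = 2):
        size = 1 << index_bits
        self.mask = size - 1
        self.hmask = (1 << history_bits) - 1
        self.choice = [1] * size         # indexed by branch address only
        self.taken_pht = [2] * size      # starts weakly taken
        self.not_taken_pht = [1] * size  # starts weakly not-taken
        self.ghr = 0

    def step(self, pc: int, taken: bool) -> bool:
        c = pc & self.mask
        d = (pc ^ self.ghr) & self.mask  # gshare-style direction index
        use_taken = self.choice[c] >= 2
        pht = self.taken_pht if use_taken else self.not_taken_pht
        pred = pht[d] >= 2
        pht[d] = _bump(pht[d], taken)    # only the selected direction PHT is trained
        # Skip the choice update when it chose the "wrong" bias but the
        # selected PHT still predicted correctly.
        if not (use_taken != taken and pred == taken):
            self.choice[c] = _bump(self.choice[c], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask
        return pred == taken


def mispredictions(pred, trace: List[Tuple[int, bool]]) -> int:
    return sum(not pred.step(pc, taken) for pc, taken in trace)


# Two hypothetical branches with opposite bias that share a gshare entry.
trace = [(0x100, True), (0x203, False)] * 100
gshare_misses = mispredictions(Gshare(), trace)
bimode_misses = mispredictions(BiMode(), trace)
print(gshare_misses, bimode_misses)  # 100 1
```

The two branches still collide on the same direction index. However, the choice table steers them into different tables, so the collision becomes harmless.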

[Paper] Jiménez, Daniel A. “Generalizing Neural Branch Prediction.” ACM Trans. Archit. Code Optim. 5, no. 4 (March 2009): 17:1–17:27. doi:10.1145/1498690.1498692.

[Key points]

1. Notes that branch prediction has a large impact on deeply pipelined processors (Sprangle and Carmean 2002).

Cites many IEEE papers.

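2. Evaluates hardware budgets of 32 KB and 256 KB.

3. The perceptron predictor originates with [Jiménez and Lin 2001].

4. Notes that neural approaches have high prediction latency.

5. States that neural approaches achieve the highest accuracy of any single (non-hybrid) predictor.

6. The original neural design could not be adopted in practice because of its high latency.

7. Includes a literature review of neural approaches.

8. Uses a three-dimensional weight array to gain the ability to learn piecewise-linear decision boundaries. Ideally the array would be large enough to hold everything; in the experiments it is scaled down.

9. Path-based neural predictor: “We simulate the path-based neural predictor [Jiménez 2003].”

10. Some applications still perform poorly, with misprediction rates above 10%, which motivates the paper.

My own summary of the open problems: high resource cost and high latency.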
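Item 3 is illustrated below with a minimal sketch of the original perceptron predictor (Jiménez and Lin 2001). This is not the piecewise-linear generalization this paper proposes, which per item 8 uses a three-dimensional weight array. The following details are assumptions:

- the table size;
- the history length;
- the 8-bit weight clamp;
- the two hypothetical branches.

The training threshold theta = floor(1.93h + 14) follows the 2001 paper. Branch X is random. Branch Y repeats X's outcome, a correlation the perceptron should learn through the weight on the newest history bit.

```python
import random
from typing import List


class PerceptronPredictor:
    """Perceptron predictor after Jiménez and Lin (2001), simplified."""

    def __init__(self, entries: int = 64, history_len: int = 8):
        self.entries = entries
        # One row per table entry: [bias, w1, ..., wh]
        self.weights: List[List[int]] = [[0] * (history_len + 1) for _ in range(entries)]
        self.history = [1] * history_len  # +1 taken, -1 not taken, newest first
        self.theta = int(1.93 * history_len + 14)  # training threshold

    def step(self, pc: int, taken: bool) -> bool:
        """Predict, train if needed, shift history; return True if correct."""
        w = self.weights[pc % self.entries]
        y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))  # dot product
        pred = y >= 0
        t = 1 if taken else -1
        if pred != taken or abs(y) <= self.theta:
            w[0] = max(-128, min(127, w[0] + t))  # clamp to 8-bit weights
            for i, xi in enumerate(self.history, start=1):
                w[i] = max(-128, min(127, w[i] + t * xi))
        self.history = [t] + self.history[:-1]  # shift in the newest outcome
        return pred == taken


rng = random.Random(1)
predictor = PerceptronPredictor()
X_PC, Y_PC = 0x40, 0x41  # hypothetical branch addresses (rows 0 and 1)
y_results = []
for _ in range(3000):
    x_taken = rng.random() < 0.5  # branch X: data-dependent, unpredictable
    predictor.step(X_PC, x_taken)
    y_results.append(predictor.step(Y_PC, x_taken))  # branch Y repeats X's outcome
late_accuracy = sum(y_results[-1000:]) / 1000
print(f"branch Y accuracy over the last 1000 executions: {late_accuracy:.3f}")
```

The dot product over all h weights sits on the prediction path. This is roughly where the latency concern in items 4 and 6 comes from, and the longer the history, the worse it gets.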

[Paper] Kim, Hyesoon, J. Joao, O. Mutlu, Chang Joo Lee, Y.N. Patt, and R. Cohn. “Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware.” IEEE Transactions on Computers 58, no. 9 (September 2009): 1153–70. doi:10.1109/TC.2008.227.

[Key points]

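1. Points out that a BTB can be used to predict indirect branches, but it is only about 50% accurate.

2. Core idea: “A VPC predictor treats an indirect branch as a sequence of multiple conditional branches.”

3. These virtual conditional branches are predicted with the existing conditional-branch PHT.

4. This is also frontier research that probes the limits of prediction performance.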
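Items 2 and 3 are illustrated by the simplified sketch below. It is not the paper's exact algorithm. Each indirect branch is treated as up to MAX_ITER virtual conditional branches. Iteration i has a virtual PC (pc XOR a per-iteration constant) and a virtual history (the global history shifted left by i). The first virtual branch predicted taken supplies its stored BTB target. The following are hypothetical:

- the hash constants;
- MAX_ITER;
- the 4-bit history;
- the replace-at-last-iteration policy;
- the addresses.

The conditional predictor is an idealized, alias-free stand-in for the PHT. In the test the indirect target depends on the newest conditional-branch outcome, which a last-target BTB cannot capture.

```python
import random
from collections import defaultdict
from typing import Dict, Optional, Tuple

MAX_ITER = 4                               # max virtual branches per indirect branch (assumed)
HASH_VALS = [0x0, 0x5A5A, 0x3C3C, 0x6969]  # hypothetical per-iteration hash constants
HIST_MASK = 0xF                            # 4-bit global history (assumed)


class VPCPredictor:
    def __init__(self) -> None:
        self.btb: Dict[int, int] = {}  # virtual PC -> stored target
        # Idealized, alias-free stand-in for the conditional predictor's PHT.
        self.cond: Dict[Tuple[int, int], int] = defaultdict(lambda: 1)

    @staticmethod
    def _virtual(pc: int, ghr: int, i: int) -> Tuple[int, int]:
        """Virtual PC and virtual global history for iteration i."""
        return pc ^ HASH_VALS[i], (ghr << i) & HIST_MASK

    def predict(self, pc: int, ghr: int) -> Optional[int]:
        for i in range(MAX_ITER):
            vpc, vghr = self._virtual(pc, ghr, i)
            if vpc not in self.btb:
                return None               # ran out of stored targets: no prediction
            if self.cond[(vpc, vghr)] >= 2:
                return self.btb[vpc]      # first virtual branch predicted taken wins
        return None

    def update(self, pc: int, ghr: int, target: int) -> None:
        for i in range(MAX_ITER):
            vpc, vghr = self._virtual(pc, ghr, i)
            key = (vpc, vghr)
            if vpc not in self.btb or self.btb[vpc] == target or i == MAX_ITER - 1:
                self.btb[vpc] = target                        # insert, confirm, or replace
                self.cond[key] = min(3, self.cond[key] + 1)   # train this one as taken
                return
            self.cond[key] = max(0, self.cond[key] - 1)       # earlier ones as not taken


def run(n: int = 2000, seed: int = 3) -> Tuple[float, float]:
    """Return (VPC accuracy, last-target BTB accuracy) over the final 1000 executions."""
    rng = random.Random(seed)
    pc, target_a, target_b = 0x4000, 0x8000, 0x9000  # hypothetical addresses
    vpc, last_target = VPCPredictor(), None
    vpc_ok, btb_ok = [], []
    for _ in range(n):
        ghr = rng.randrange(16)                      # recent conditional outcomes
        target = target_a if ghr & 1 else target_b   # target follows the newest outcome
        vpc_ok.append(vpc.predict(pc, ghr) == target)
        vpc.update(pc, ghr, target)
        btb_ok.append(last_target == target)         # BTB baseline: repeat last target
        last_target = target
    return sum(vpc_ok[-1000:]) / 1000, sum(btb_ok[-1000:]) / 1000


print(run())
```

No new storage is needed beyond BTB entries and PHT counters. This reuse of existing hardware is what the title calls "very low cost".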

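[Paper] Panda, R., P.V. Gratz, and D.A. Jimenez. “B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors.” Computer Architecture Letters 11, no. 2 (July 2012): 41–44. doi:10.1109/L-CA.2011.33.

[Key points]

1. Notes that processor clock frequencies keep rising while memory speed has not kept pace.

2. Cites several processor technical documents.

3. Notes that some processors rely on aggregating many in-order cores to achieve low power and higher throughput.

4. Uses a dedicated hardware structure, directed by branch prediction, to prefetch into the cache.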