
the hardware designer's favorite solution

Excerpted from Computer Architecture, Sixth Edition: A Quantitative Approach by John L. Hennessy and David A. Patterson (z-lib.org)

2.3 Ten Advanced Optimizations of Cache Performance

... This magical reduction comes from optimized software - the hardware designer's favorite solution! The increasing performance gap between processors and main memory has inspired compiler writers to scrutinize the memory hierarchy to see if compile-time optimizations can improve performance. Once again, research is split between improvements in instruction misses and improvements in data misses. The optimizations presented next are found in many modern compilers.
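One example of the kind of data-miss optimization this section goes on to describe is loop interchange: reordering nested loops so that the inner loop walks through memory sequentially. The sketch below is illustrative rather than the book's own code; the array dimensions are arbitrary assumptions.

```c
#include <stddef.h>

#define ROWS 5000
#define COLS 100

/* Illustrative data: C stores x in row-major order, so x[i][j] and
 * x[i][j+1] are adjacent in memory, while x[i][j] and x[i+1][j] are
 * COLS elements apart. */
static double x[ROWS][COLS];

/* Before: the inner loop strides through memory COLS elements at a
 * time, touching a different cache line on almost every access. */
void scale_column_order(void) {
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            x[i][j] = 2 * x[i][j];
}

/* After loop interchange: the inner loop visits consecutive elements,
 * so each cache line brought in is fully used before it is evicted.
 * The arithmetic is identical; only the access order changes. */
void scale_row_order(void) {
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            x[i][j] = 2 * x[i][j];
}
```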

The i7 referred to above is the i7 6700. The i7 supports the x86-64 instruction set architecture, a 64-bit extension of the 80x86 architecture. The i7 is an out-of-order execution processor that includes four cores. In this chapter, we focus on the memory system design and performance from the viewpoint of a single core. The system performance of multiprocessor designs, including the i7 multicore, is examined in detail in Chapter 5. Each core in an i7 can execute up to four 80x86 instructions per clock cycle, using a multiple issue, dynamically scheduled, 16-stage pipeline, which we describe in detail in Chapter 3. The i7 can also support up to two simultaneous threads per processor, using a technique called simultaneous multithreading, described in Chapter 4. In 2017 the fastest i7 had a clock rate of 4.0 GHz (in Turbo Boost mode), which yielded a peak instruction execution rate of 16 billion instructions per second, or 64 billion instructions per second for the four-core design. Of course, there is a big gap between peak and sustained performance, as we will see over the next few chapters. The i7 can support up to three memory channels, each consisting of a separate set of DIMMs, and each of which can transfer in parallel. Using DDR3-1066 (DIMM PC8500), the i7 has a peak memory bandwidth of just over 25 GB/s.
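For reference, the peak-bandwidth figure follows from the stated configuration: each DDR3-1066 channel moves 8 bytes per transfer at 1066 MT/s, or about 8.5 GB/s per channel, and three channels transferring in parallel give 3 × 8.5 GB/s ≈ 25.6 GB/s, i.e. just over 25 GB/s.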


All the techniques in this chapter exploit parallelism among instructions. The amount of parallelism available within a basic block - a straight-line code sequence with no branches in except to the entry and no branches out except at the exit - is quite small. For typical RISC programs, the average dynamic branch frequency is often between 15% and 25%, meaning that between three and six instructions execute between a pair of branches. Because these instructions are likely to depend upon one another, the amount of overlap we can exploit within a basic block is likely to be less than the average basic block size. To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks.
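As a rough illustration (not taken from the book), the C fragment below contrasts a single basic block whose statements form a dependence chain with a loop whose iterations are independent; the function names and values are made up for the example.

```c
/* A single basic block: straight-line code, one entry, one exit.
 * Each statement depends on the previous one, so there is little
 * overlap to exploit no matter how many functional units exist. */
long chain(long a, long b, long c) {
    long t1 = a + b;     /* t1 depends on the inputs              */
    long t2 = t1 * c;    /* t2 depends on t1                      */
    long t3 = t2 - a;    /* t3 depends on t2                      */
    return t3;           /* every step waits on the one before it */
}

/* ILP beyond the basic block: each iteration below touches a
 * different element, so a compiler or an out-of-order core can
 * overlap work from many iterations at once. */
void axpy(long n, double alpha, const double *x, double *y) {
    for (long i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];   /* iterations are independent */
}
```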