Learning how caches work (author: gooogleman)
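The cache is one of the hardest parts of ARM to understand, and also one of its most brilliant features. Now it's time to tackle it.
For something this classic, I'd rather quote books written by ARM engineers, so I don't mislead anyone.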
Introduction to the cache and the write buffer
A cache is a small, fast array of memory placed between the processor core and main
memory that stores portions of recently referenced main memory. The processor uses
cache memory instead of main memory whenever possible to increase system performance.
The goal of a cache is to reduce the memory access bottleneck imposed on the processor
core by slow memory.
Often used with a cache is a write buffer: a very small first-in-first-out (FIFO) memory
placed between the processor core and main memory. The purpose of a write buffer is to
free the processor core and cache memory from the slow write time associated with writing
to main memory.
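To make the write buffer concrete, here is a minimal C sketch of a FIFO sitting between the core and memory. The 4-entry depth, the `write_buffer` type and the `wb_push`/`wb_drain_one` names are my own assumptions for illustration, not ARM920T details. The core side returns immediately unless the FIFO is full. The memory side retires writes oldest first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WB_DEPTH 4 /* illustrative depth, not the ARM920T's real size */

typedef struct {
    uint32_t addr; /* byte address of the pending write */
    uint32_t data; /* word to be written */
} wb_entry;

typedef struct {
    wb_entry slot[WB_DEPTH];
    unsigned head;  /* index of the oldest pending write */
    unsigned count; /* number of pending writes */
} write_buffer;

/* Core side: queue a write and carry on without waiting for memory.
   Returns false if the FIFO is full, i.e. the core would have to stall. */
static bool wb_push(write_buffer *wb, uint32_t addr, uint32_t data) {
    if (wb->count == WB_DEPTH)
        return false;
    unsigned tail = (wb->head + wb->count) % WB_DEPTH;
    wb->slot[tail].addr = addr;
    wb->slot[tail].data = data;
    wb->count++;
    return true;
}

/* Memory side: retire the oldest write (FIFO order) into a word array.
   Returns false if nothing is pending. */
static bool wb_drain_one(write_buffer *wb, uint32_t *mem, size_t mem_words) {
    if (wb->count == 0)
        return false;
    wb_entry e = wb->slot[wb->head];
    assert(e.addr % 4 == 0 && e.addr / 4 < mem_words);
    mem[e.addr / 4] = e.data;
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
    return true;
}
```

Until `wb_drain_one` runs, main memory still holds the old value. That is exactly the window in which the core is "freed" from the slow write.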
Cache line validity, enabling, and their consequences
The basic unit of storage in a cache is the cache line. A cache line is said to be valid when it contains cached
data or instructions, and invalid when it does not. All cache lines in a cache are invalidated on reset. A cache
line becomes valid when data or instructions are loaded into it from memory.
When a cache line is valid, it contains up-to-date values for a block of consecutive main memory locations.
The length of a cache line is always a power of two, and is typically in the range of 16 to 64 bytes. If the
cache line length is 2^L bytes, the block of main memory locations is always 2^L-byte aligned. Because of this
alignment requirement, virtual address bits[31:L] are identical for all bytes in a cache line.
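The alignment rule above is easy to check with a few bit operations. This is a small sketch; the helper names are mine, and L is a parameter (L = 5 gives the 32-byte lines of the ARM920T discussed later).

```c
#include <assert.h>
#include <stdint.h>

/* Start of the 2^L-byte line containing addr: clear the low L bits. */
static uint32_t line_base(uint32_t addr, unsigned L) {
    assert(L > 0 && L < 32);
    return addr & ~((UINT32_C(1) << L) - 1u);
}

/* Position of addr inside its line: the low L bits. */
static uint32_t line_offset(uint32_t addr, unsigned L) {
    assert(L > 0 && L < 32);
    return addr & ((UINT32_C(1) << L) - 1u);
}

/* Two addresses share a line exactly when bits [31:L] are equal. */
static int same_line(uint32_t a, uint32_t b, unsigned L) {
    return (a >> L) == (b >> L);
}
```

For example, with L = 5, 0x3004 and 0x301F share the line starting at 0x3000, while 0x3020 starts the next line.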
Where the cache sits
——————————————————————————————————————————
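From this we can see that the cache can sit in different places, which gives physical caches and virtual (logical) caches. The 2440 uses a logical cache, as the figure below shows.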
++++++++++++++++++++++++++++++++++++++==========================================+++++++++++++++++
Multi-way (set-associative) caches (a single-way cache is inefficient, so it isn't covered here)
____________________________________________________________________________________________
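The tag identifies where the data lives in memory. The status field has two bits. The valid bit says whether the cache line is in use.
The dirty bit says whether the line's contents differ from main memory. Note that any such difference must eventually be resolved, otherwise it causes endless trouble.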
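Here is a minimal sketch of how tag, valid bit and dirty bit work together in a set-associative lookup. The geometry (4 ways, 64 sets, 32-byte lines) and all names are assumptions chosen for illustration, not a specific ARM part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative geometry: 4 ways x 64 sets x 32-byte lines = 8 KB. */
#define WAYS       4
#define SETS       64
#define LINE_SHIFT 5  /* log2(32-byte line) */
#define SET_BITS   6  /* log2(SETS) */

typedef struct {
    uint32_t tag;
    bool valid; /* line holds the data for 'tag' */
    bool dirty; /* line was written and now differs from main memory */
} cache_line;

typedef struct {
    cache_line set[SETS][WAYS];
} cache;

static uint32_t set_of(uint32_t addr) { return (addr >> LINE_SHIFT) & (SETS - 1); }
static uint32_t tag_of(uint32_t addr) { return addr >> (LINE_SHIFT + SET_BITS); }

/* Compare the tag against every way of the selected set; only valid lines hit.
   Returns the hitting way, or -1 on a miss. */
static int cache_lookup(const cache *c, uint32_t addr) {
    const cache_line *s = c->set[set_of(addr)];
    for (int w = 0; w < WAYS; w++)
        if (s[w].valid && s[w].tag == tag_of(addr))
            return w;
    return -1;
}

/* Linefill into a chosen way: the line becomes valid and clean. */
static void cache_fill(cache *c, uint32_t addr, int way) {
    assert(way >= 0 && way < WAYS);
    cache_line *l = &c->set[set_of(addr)][way];
    l->tag = tag_of(addr);
    l->valid = true;
    l->dirty = false;
}

/* Write hit in a write-back cache: the line is updated and marked dirty.
   Returns false on a miss. */
static bool cache_write(cache *c, uint32_t addr) {
    int w = cache_lookup(c, addr);
    if (w < 0)
        return false;
    c->set[set_of(addr)][w].dirty = true;
    return true;
}
```

A dirty line must be written back (cleaned) before it is evicted or invalidated. Otherwise the change is lost, which is the "endless trouble" mentioned above.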
===================================================================================
Now let's look at the documentation that actually applies to the 2440 (ARM920T)
=====ICache=====
The ARM920T includes a 16KB ICache. The ICache has 512 lines of 32 bytes (8 words), arranged as a 64-way set-associative cache, and uses MVAs, translated by CP15 register 13 (see Address translation on page 3-6), from the ARM9TDMI core. The ICache implements allocate-on-read-miss. Random or round-robin replacement can be selected under software control using the RR bit (CP15 register 1, bit 14). Random replacement is selected at reset. Instructions can also be locked in the ICache so that they cannot be overwritten by a linefill. This operates with a granularity of 1/64th of the cache, which is 64 words (256 bytes). All instruction accesses are subject to MMU permission and translation checks. Instruction fetches that are aborted by the MMU do not cause linefills or instruction fetches to appear on the AMBA ASB interface.
Note
————————————————————
For clarity, the I bit (bit 12 in CP15 register 1) is called the Icr bit throughout the
following text. The C bit from the MMU translation table descriptor corresponding to
the address being accessed is called the Ctt bit.
ICache organization
——————————————————————————————————————————————————
The ICache is organized as eight segments, each containing 64 lines, and each line
containing eight words. The position of the line within the segment is a number from 0
to 63. This is called the index. A line in the cache can be uniquely identified by its
segment and index. The index is independent of the MVA. The segment is selected by
bits [7:5] of the MVA.
————————————————
Bits [4:2] of the MVA specify the word within a cache line that is accessed. For
halfword operations, bit [1] of the MVA specifies the halfword that is accessed within
the word. For byte operations, bits [1:0] specify the byte within the word that is
accessed.
—————————————————
Bits [31:8] of the MVA of each cache line are called the TAG. The MVA TAG is stored
in the cache, along with the 8 words of data, when the line is loaded by a linefill. → Every cache reads and writes on this same principle.
—————————————————
Cache lookups compare bits [31:8] of the MVA of the access with the stored TAG to
determine whether the access is a hit or miss. The cache is therefore said to be virtually
addressed. The logical model of the 16KB ICache is shown in Figure 4-1 on page 4-5.
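The MVA split described above can be written out in C. This is a sketch: the `icache_addr` type and `split_mva` name are mine, while the bit positions and the geometry are the ones quoted from the ARM920T manual. The 6-bit index is not computed from the MVA at all. Within the selected segment, the line is found by comparing the TAG against all 64 ways, which is why the cache is called 64-way set-associative.

```c
#include <assert.h>
#include <stdint.h>

/* ARM920T ICache geometry as quoted: 8 segments x 64 lines x 32 bytes. */
#define SEGMENTS      8
#define LINES_PER_SEG 64
#define LINE_BYTES    32

_Static_assert(SEGMENTS * LINES_PER_SEG * LINE_BYTES == 16 * 1024, "16KB ICache");
_Static_assert(SEGMENTS * LINES_PER_SEG == 512, "512 lines");

typedef struct {
    uint32_t tag;     /* MVA[31:8], stored next to the line's 8 words */
    unsigned segment; /* MVA[7:5], selects one of the 8 segments */
    unsigned word;    /* MVA[4:2], word within the 8-word line */
    unsigned byte;    /* MVA[1:0], byte within the word */
} icache_addr;

static icache_addr split_mva(uint32_t mva) {
    icache_addr a;
    a.tag     = mva >> 8;
    a.segment = (mva >> 5) & 0x7u;
    a.word    = (mva >> 2) & 0x7u;
    a.byte    = mva & 0x3u;
    return a;
}
```

For instance, MVA 0x30001234 gives TAG 0x300012, segment 1, word 5, byte 0.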
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Enabling and disabling the ICache
————————————————
On reset, the ICache entries are all invalidated and the ICache is disabled.
You can enable the ICache by writing 1 to the Icr bit, and disable it by writing 0 to the
Icr bit.
When the ICache is disabled, the cache contents are ignored and all instruction fetches
appear on the AMBA ASB interface as separate nonsequential accesses. The ICache is
usually used with the MMU enabled. In this case the Ctt in the relevant MMU
translation table descriptor indicates whether an area of memory is cachable.
If the cache is disabled after having been enabled, all cache contents are ignored. All
instruction fetches appear on the AMBA ASB interface as separate nonsequential
accesses and the cache is not updated. If the cache is subsequently re-enabled its
contents are unchanged. If the contents are no longer coherent with main memory, you
must invalidate the ICache before you re-enable it (see Register 7, cache operations
register on page 2-17). → In other words, if main memory and the cache have drifted apart, the ICache must be invalidated before it is re-enabled.
————————————————————————————————————
If the cache is enabled with the MMU disabled, all instruction fetches are treated as
cachable. No protection checks are made, and the physical address is flat-mapped to the
modified virtual address. (So with the cache enabled but the MMU disabled, instruction fetches are cachable, no protection
checks are performed, and the physical address equals the virtual address.)
You can enable the MMU and ICache simultaneously by writing a 1 to the M bit and a
1 to the Icr bit in CP15 register 1 with a single MCR instruction.
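Here is a sketch of the bit arithmetic behind that single MCR, modelled as pure C functions so it can run anywhere. On the target, the value would be read and written with MRC/MCR on CP15 c1, as the comment shows. The bit positions are M = bit 0, I (Icr) = bit 12, and RR = bit 14, as quoted above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CP15 register 1 (control register) bits used in this section. */
#define CR_M  (1u << 0)  /* MMU enable */
#define CR_I  (1u << 12) /* ICache enable (Icr) */
#define CR_RR (1u << 14) /* 1 = round-robin replacement, 0 = random */

/* Read-modify-write so that MMU and ICache come on together. On target:
     MRC p15, 0, r0, c1, c0, 0   ; read control register
     ORR r0, r0, #0x1000         ; set I
     ORR r0, r0, #0x1            ; set M
     MCR p15, 0, r0, c1, c0, 0   ; one write turns both on
   All other bits are preserved. */
static uint32_t enable_mmu_and_icache(uint32_t cr) {
    return cr | CR_M | CR_I;
}

/* Clear only the Icr bit; cache contents are kept but ignored. */
static uint32_t disable_icache(uint32_t cr) {
    return cr & ~CR_I;
}
```

Preserving the other bits matters. For example, an RR bit chosen earlier must survive the enable.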
————————————————————————————————————
If the ICache is disabled, each instruction fetch results in a separate nonsequential
memory access on the AMBA ASB interface, giving very low bus and memory
performance. Therefore, you must enable the ICache as soon as possible after reset.
————————————————————————————————————
Note
The Prefetch ICache Line operation uses MVA format, because address aliasing is not
performed on the address in Rd. It is advisable for the associated TLB entry to be locked
into the TLB to avoid page table walks during execution of the locked code.
————————————————————————————————————
Enabling and disabling the DCache and write buffer
On reset, the DCache entries are invalidated and the DCache is disabled, and the write
buffer contents are discarded.
——————————————
There is no explicit write buffer enable bit implemented in ARM920T. The write buffer
is used in the following ways:
• You can enable the DCache by writing 1 to the Ccr bit, and disable it by writing
0 to the Ccr bit.
• You must only enable the DCache when the MMU is enabled. This is because the
MMU translation tables define the cache and write buffer configuration for each
memory region.
• If the DCache is disabled after having been enabled, the cache contents are
ignored and all data accesses appear on the AMBA ASB interface as separate
nonsequential accesses and the cache is not updated. If the cache is subsequently
re-enabled its contents are unchanged. Depending on the software system design,
you might have to clean the cache after it is disabled, and invalidate it before you
re-enable it. See Cache coherence on page 4-16.
• You can enable or disable the MMU and DCache simultaneously with a single
MCR that changes the M bit and the C bit in the control register (CP15 register 1).
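The rule "DCache only with the MMU on" can be captured as a guard on the control-register value. This is a sketch under my own assumptions. `set_mmu_dcache` is a hypothetical helper, and the real write would again be one MCR to CP15 c1. Bit positions: M = bit 0, C (Ccr) = bit 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR_M (1u << 0) /* MMU enable */
#define CR_C (1u << 2) /* DCache enable (Ccr) */

/* Compute the new control value with M and C changed together.
   Refuses (returns false, *cr untouched) to enable the DCache without the
   MMU, since the translation tables supply each region's C/B settings. */
static bool set_mmu_dcache(uint32_t *cr, bool mmu_on, bool dcache_on) {
    if (dcache_on && !mmu_on)
        return false;
    uint32_t v = *cr & ~(CR_M | CR_C);
    if (mmu_on)
        v |= CR_M;
    if (dcache_on)
        v |= CR_C;
    *cr = v; /* on target: a single MCR p15, 0, rX, c1, c0, 0 */
    return true;
}
```

Because both bits change in one write, there is never an instant where the DCache is on but the MMU is off.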
————————————————————————————————————————————
for seg = 0 to 7
    for index = 0 to 63
        Rd = (index << 26) | (seg << 5)   ; index in Rd[31:26], segment in Rd[7:5]
        MCR p15,0,Rd,c7,c10,2             ; Clean DCache single entry (using index)
        or
        MCR p15,0,Rd,c7,c14,2             ; Clean and Invalidate DCache single entry (using index)
    next index
next seg
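The loop above just walks every (segment, index) pair of the 8 × 64 DCache. This C sketch generates the same Rd values, assuming the ARM920T index format used in the loop (index in bits [31:26], segment in bits [7:5]). The function names are hypothetical, and the code computes values only; it does not issue any MCR.

```c
#include <assert.h>
#include <stdint.h>

/* Rd for the CP15 c7 "by index" operations: index in [31:26], segment in [7:5]. */
static uint32_t dcache_index_rd(unsigned seg, unsigned index) {
    assert(seg < 8 && index < 64);
    return ((uint32_t)index << 26) | ((uint32_t)seg << 5);
}

/* Every Rd the clean loop issues, in loop order (segment outer, index inner).
   Returns how many were written: 8 * 64 = 512. */
static unsigned clean_loop_rds(uint32_t out[512]) {
    unsigned n = 0;
    for (unsigned seg = 0; seg < 8; seg++)
        for (unsigned index = 0; index < 64; index++)
            out[n++] = dcache_index_rd(seg, index);
    return n;
}
```

This also explains the constants you often see in hand-written assembly for this loop. The index steps by 0x04000000, and the segment steps by 0x20.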
DCache, ICache, and memory coherence is generally achieved by:
• cleaning the DCache to ensure memory is up to date with all changes
• invalidating the ICache to ensure that the ICache is forced to re-fetch instructions
from memory.
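Here is a toy model of why both steps are needed, and in that order. It follows one instruction word that a loader rewrites through the data side. Everything here is a simplified assumption: one line per cache, write-back DCache, and the line is assumed already resident in the DCache so the store hits. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid, dirty; uint32_t data; } dline;
typedef struct { bool valid; uint32_t data; } iline;

typedef struct {
    uint32_t mem; /* main-memory copy of the instruction word */
    dline d;      /* write-back DCache line */
    iline i;      /* ICache line */
} toy_system;

/* Data-side store (e.g. a loader writing new code). Assumes the line is
   already in the DCache, so the store hits and only marks it dirty. */
static void store_word(toy_system *s, uint32_t v) {
    s->d.valid = true;
    s->d.dirty = true;
    s->d.data = v;
}

/* Instruction fetch: an ICache hit returns its copy; a miss refills from memory. */
static uint32_t fetch(toy_system *s) {
    if (!s->i.valid) {
        s->i.data = s->mem;
        s->i.valid = true;
    }
    return s->i.data;
}

/* Clean: write a dirty DCache line back to memory. */
static void clean_dcache(toy_system *s) {
    if (s->d.valid && s->d.dirty) {
        s->mem = s->d.data;
        s->d.dirty = false;
    }
}

/* Invalidate: drop the ICache copy so the next fetch goes to memory. */
static void invalidate_icache(toy_system *s) {
    s->i.valid = false;
}
```

Invalidating the ICache alone is not enough, because the refill still reads the old word from memory. Only clean-then-invalidate makes the new instruction visible.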
————————————————————————————————————————————
Situations that necessitate cache cleaning and invalidating include:
References:
——————————————————————————————————————————————
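I had heard of the famous MMU for a long time but never knew how it actually works. A few months ago I only half understood it; now that the year is nearly over, it's time to settle it. I'll quote a lot of English in this article without translating it, because I'm afraid my English would mislead others. O(∩_∩)O haha~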
One of the key services provided by an MMU is the ability to manage tasks as independent programs running in their own private memory space. A task written to run under the control of an operating system with an MMU does not need to know the memory
requirements of unrelated tasks. This simplifies the design requirements of individual tasks running under the control of an operating system.
→ The MMU gives each task its own independent execution space.
The MMU simplifies the programming of application tasks because it provides the resources needed to enable virtual memory: an additional memory space that is independent of the physical memory attached to the system. The MMU acts as a translator, which
converts the addresses of programs and data that are compiled to run in virtual memory to the actual physical addresses where the programs are stored in physical main memory. This translation process allows programs to run with the same virtual addresses while being
held in different locations in physical memory. → The MMU acts as a translator: programs can run at the same virtual addresses while each one is stored in a different place in physical memory.
We begin with a review of the protection features of an MPU and then present the additional features provided by an MMU. We introduce relocation registers, which hold the conversion data to translate virtual memory addresses to physical memory addresses,
and the Translation Lookaside Buffer (TLB), which is a cache of recent address relocations. We then explain the use of pages and page tables to configure the behavior of the relocation registers.
→ This introduces three things. First, relocation registers, which hold the data for translating virtual addresses into physical addresses. Second, the Translation Lookaside Buffer (TLB), a cache holding the most recent address relocations. Third, how pages and page tables are used to configure the relocation registers.
We then discuss how to create regions by configuring blocks of pages in virtual memory. We end the overview of the MMU and its support of virtual memory by showing how to manipulate the MMU and page tables to support multitasking.
→ It then discusses building regions by configuring blocks of pages in virtual memory, and finally shows how to use the MMU and page tables to support a multitasking operating system.
___________________________________________________________________________
Now let's see what is actually inside the MMU, and what special hardware structures it has.
—————————————————————————————————————
To permit tasks to have their own virtual memory map, the MMU hardware performs address relocation, translating the memory address output by the processor core before it reaches main memory. The easiest way to understand the translation process is to imagine
a relocation register located in the MMU between the core and main memory. → The address relocation register is really just an address translator.
Figure 14.1 shows an example of a task compiled to run at a starting address of 0x4000000 in virtual memory. The relocation register translates the virtual addresses of Task 1 to physical addresses starting at 0x8000000.
A second task compiled to run at the same virtual address, in this case 0x400000, can be placed in physical memory at any other multiple of 0x10000 (64 KB) and mapped to 0x400000 simply by changing the value in the relocation register.
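→ Why must the task be placed in physical memory at an address that is a multiple of 64KB? Is this dictated by some special hardware structure in the MMU?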
———————————————————————————————————
A single relocation register can only translate a single area of memory, which is set by
the number of bits in the offset portion of the virtual address. This area of virtual memory
is known as a page. The area of physical memory pointed to by the translation process is
known as a page frame.——頁和頁幀
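The paragraph above also answers the 64KB question. A relocation register only replaces the page-number bits and passes the offset bits through unchanged, so a task can only be moved in steps of one page. The sketch below assumes a 16-bit offset, which is my inference from the "multiple of 0x10000 (64 KB)" wording, and uses the 0x4000000 → 0x8000000 example from Figure 14.1. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: a 16-bit offset, i.e. one relocation register covers a 64 KB page. */
#define OFFSET_BITS 16
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1u)

typedef struct {
    uint32_t vpage; /* virtual page number this register translates */
    uint32_t frame; /* physical page frame number it maps to */
} reloc_reg;

/* Replace the page number, keep the offset. Fails for addresses outside
   the single page this register covers. */
static bool translate(const reloc_reg *r, uint32_t va, uint32_t *pa) {
    if ((va >> OFFSET_BITS) != r->vpage)
        return false;
    *pa = (r->frame << OFFSET_BITS) | (va & OFFSET_MASK);
    return true;
}
```

Moving the task to another physical location only means loading a different `frame` value. The task itself is unchanged, and it still has to land on a 64KB boundary because the low 16 bits are never altered.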
------ The above is the virtual-memory translation process ------
The set of relocation registers that temporarily store the translations in an ARM MMU
are really a fully associative cache of 64 relocation registers. This cache is known as a
Translation Lookaside Buffer (TLB). The TLB caches translations of recently accessed pages.
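→ The relocation registers are really a fully associative cache of 64 relocation registers, called the Translation Lookaside Buffer (TLB), which caches the translations of recently accessed pages. So what it caches is address-translation data. (The ARM920T, with its separate instruction and data buses, actually has two such 64-entry TLBs: one for instructions and one for data.)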
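A minimal sketch of a 64-entry fully associative TLB. "Fully associative" means any entry can hold any page, so a lookup compares against every entry. The round-robin victim choice and all names are assumptions for illustration; I'm not claiming this is the ARM920T's exact replacement policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

typedef struct {
    bool valid;
    uint32_t vpage; /* virtual page number */
    uint32_t frame; /* physical frame number */
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_ENTRIES];
    unsigned next; /* round-robin victim pointer (assumed policy) */
} tlb;

/* Compare against every entry; a hit returns the cached translation. */
static bool tlb_lookup(const tlb *t, uint32_t vpage, uint32_t *frame) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].vpage == vpage) {
            *frame = t->e[i].frame;
            return true;
        }
    return false;
}

/* After a page-table walk, cache the translation, evicting round-robin. */
static void tlb_insert(tlb *t, uint32_t vpage, uint32_t frame) {
    t->e[t->next] = (tlb_entry){ true, vpage, frame };
    t->next = (t->next + 1) % TLB_ENTRIES;
}

/* Invalidate everything, e.g. when switching to another task's page tables. */
static void tlb_flush(tlb *t) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

On a miss, the MMU walks the page tables in main memory and then inserts the result. `tlb_flush` is the step a context switch needs, as described later.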
In addition to having relocation registers, the MMU uses tables in main memory to store
the data describing the virtual memory maps used in the system. These tables of translation
data are known as page tables. An entry in a page table represents all the information needed
to translate a page in virtual memory to a page frame in physical memory.
→ Besides the relocation registers, the MMU uses tables in main memory to store the data describing the virtual memory maps; these tables are called page tables.
Each entry in a page table holds all the information needed to translate one virtual page into one page frame in physical memory.
—————————————————————————————————
A page table entry (PTE) in a page table contains the following information about a virtual
page: the physical base address used to translate the virtual page to the physical page frame,
the access permission assigned to the page, and the cache and write buffer configuration for
the page. If you refer to Table 14.1, you can see that most of the region configuration data
in an MPU is now held in a page table entry. This means access permission and cache and
write buffer behavior are controlled at a granularity of the page size, which provides finer
control over the use of memory. Regions in an MMU are created in software by grouping
blocks of virtual pages in memory. → MMU regions are built in software out of groups of virtual pages in memory.
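On ARMv4/v5 cores such as the ARM920T, the simplest concrete PTE is the first-level "section" descriptor, which maps 1MB. It carries exactly the fields listed above: physical base, access permission (AP), and the cache (C) and write buffer (B) bits, plus a domain. This is a sketch based on my reading of the ARM Architecture Reference Manual layout. `section_desc` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv4/v5 first-level section descriptor (1 MB):
   base[31:20] | AP[11:10] | domain[8:5] | 1 [4] | C [3] | B [2] | 0b10 [1:0] */
static uint32_t section_desc(uint32_t pa_base, unsigned ap, unsigned domain,
                             bool cacheable, bool bufferable) {
    assert((pa_base & 0xFFFFFu) == 0); /* physical base must be 1 MB aligned */
    assert(ap < 4 && domain < 16);
    return pa_base
         | ((uint32_t)ap << 10)        /* access permission */
         | ((uint32_t)domain << 5)     /* domain number */
         | (1u << 4)                   /* bit 4 written as 1 on these cores */
         | ((uint32_t)cacheable << 3)  /* C: cacheable */
         | ((uint32_t)bufferable << 2) /* B: write buffer / write-back */
         | 0x2u;                       /* descriptor type: section */
}
```

For example, mapping 0x30000000 (SDRAM on the 2440) with AP = 3, domain 0, C and B set gives 0x30000C1E. Clearing C and B gives 0x30000C12 for an uncached, unbuffered region.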
___________________________________________________________________
Since a page in virtual memory has a corresponding entry in a page table, a block of
virtual memory pages map to a set of sequential entries in a page table. Thus, a region can
be defined as a sequential set of page table entries. The location and size of a region can be
held in a software data structure while the actual translation data and attribute information
is held in the page tables.
Figure 14.3 shows an example of a single task that has three regions: one for text, one
for data, and a third to support the task stack. Each region in virtual memory is mapped
to different areas in physical memory. In the figure, the executable code is located in flash
memory, and the data and stack areas are located in RAM. This use of regions is typical of
operating systems that support sharing code between tasks. → So this is how operating systems are designed?
With the exception of the master level 1 (L1) page table, all page tables represent 1 MB
areas of virtual memory. If a region’s size is greater than 1 MB or crosses over the 1 MB
boundary addresses that separate page tables, then the description of a region must also
include a list of page tables. The page tables for a region will always be derived from
sequential page table entries in the master L1 page table. However, the locations of the L2
page tables in physical memory do not need to be located sequentially. Page table levels are
explained more fully in Section 14.4.
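→ Is this why the virtual memory sizes in WinCE's OEMAddressTable are all multiples of 1MB and contiguous?
(In fact WinCE can use non-contiguous ones, but only with special tricks.)
Page tables in physical memory can be non-contiguous; Youlong's bootloader is proof of that!
See the figure below: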
————————————————————————————————————————
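To make the 1MB rule concrete: entry n of the master L1 table covers virtual addresses [n MB, (n+1) MB). A region therefore always maps to a run of consecutive L1 indices, while the L2 tables those entries point to can sit anywhere in physical memory. This sketch lists the L1 entries a region touches. `region_l1_entries` is a hypothetical helper, not an API from the book.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES 4096u /* 4096 x 1 MB = the full 4 GB space */

static uint32_t l1_index(uint32_t va) { return va >> 20; }

/* L1 entries (one per 1 MB, hence one L2 table each) touched by the region
   [start, start + size). Returns the number of indices written to out. */
static unsigned region_l1_entries(uint32_t start, uint32_t size,
                                  uint32_t *out, unsigned max) {
    assert(size > 0);
    uint32_t first = l1_index(start);
    uint32_t last  = l1_index(start + (size - 1));
    assert(last >= first && last < L1_ENTRIES); /* no wrap past 4 GB */
    unsigned n = 0;
    for (uint32_t i = first; i <= last; i++) {
        assert(n < max);
        out[n++] = i;
    }
    return n;
}
```

A 128KB region starting at 0x001F0000 crosses the 1MB boundary at 0x00200000. It therefore needs two L1 entries, 1 and 2, so its description must include a list of two page tables.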
How does the MMU make multitasking possible?
——————————————
Page tables can reside in memory and not be mapped to MMU hardware. One way to build
a multitasking system is to create separate sets of page tables, each mapping a unique virtual
memory space for a task. To activate a task, the set of page tables for the specific task and
its virtual memory space are mapped into use by the MMU. The other sets of inactive page
tables represent dormant tasks. This approach allows all tasks to remain resident in physical
memory and still be available immediately when a context switch occurs to activate it.
→ Page tables can live in memory without being mapped into the MMU hardware. One way to build a multitasking system is to create
separate sets of page tables, each mapping a unique task space. To activate a task, the MMU starts using that task's set of page tables
and its virtual memory space; the inactive sets of page tables represent sleeping tasks. This approach lets all tasks
stay resident in memory and be used immediately when a context switch occurs.
By activating different page tables during a context switch, it is possible to execute
multiple tasks with overlapping virtual addresses. The MMU can relocate the execution
address of a task without the need to move it in physical memory. The task’s physical
memory is simply mapped into virtual memory by activating and deactivating page tables.
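→ Activating different page tables at a context switch makes it possible to run multiple tasks
at overlapping virtual addresses. The MMU can relocate a task's addresses without moving the task in memory. A task's
physical memory is mapped into virtual memory simply by activating and deactivating page tables.
→ My God! Now I understand how the MMU works!!!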
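Here is a toy model of that idea. Two tasks use the same virtual address, and switching tasks only swaps which table the MMU points at; no page-table data moves. The table size (16 pages of 4KB), the flat single-level table and all names are assumptions for illustration. On the ARM920T, the "pointer" would be the translation table base in CP15 register 2.

```c
#include <assert.h>
#include <stdint.h>

/* Toy flat page table: 16 virtual pages of 4 KB; frame[vpage] = physical frame. */
#define PAGES      16u
#define PAGE_SHIFT 12

typedef struct { uint32_t frame[PAGES]; } page_table;

/* The MMU only holds a pointer to the active table. */
typedef struct { const page_table *active; } mmu;

static uint32_t mmu_translate(const mmu *m, uint32_t va) {
    uint32_t vpage = va >> PAGE_SHIFT;
    assert(vpage < PAGES);
    return (m->active->frame[vpage] << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
}

/* A context switch changes the pointer, not the tables. On real hardware
   the caches and TLB also need attention at this point. */
static void switch_to(mmu *m, const page_table *next) {
    m->active = next;
}
```

With task 1's page 4 in frame 0x80 and task 2's page 4 in frame 0x90, the same virtual address 0x4123 resolves to 0x80123 or 0x90123, depending on which table is active.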
——————————————————————————————————————
When the page tables are activated or deactivated, the virtual-to-physical address mappings
change. Thus, accessing an address in virtual memory may suddenly translate to a
different address in physical memory after the activation of a page table. As mentioned in
Chapter 12, the ARM processor cores have a logical cache and store cached data in virtual
memory. When this translation occurs, the caches will likely contain invalid virtual data
from the old page table mapping. To ensure memory coherency, the caches may need
cleaning and flushing. The TLB may also need flushing because it will have cached old
translation data. → Remember to clean and flush the caches!
————————————————————
The effect of cleaning and flushing the caches and the TLB will slow system operation.
However, cleaning and flushing stale code or data from cache and stale translated physical
addresses from the TLB keep the system from using invalid data and breaking.
→ Although cleaning and flushing the caches and the TLB slows the system down, removing stale code and data from the caches,
and stale physical addresses from the TLB, keeps the system from using invalid data and crashing.
———————————————————————————————
During a context switch, page table data is not moved in physical memory; only pointers
to the locations of the page tables change.
Switching between tasks requires the following steps:
1. Save the active task context and place the task in a dormant state.
2. Flush the caches; possibly clean the D-cache if using a writeback policy.
3. Flush the TLB to remove translations for the retiring task.
4. Configure the MMU to use new page tables translating the virtual memory execution
area to the awakening task’s location in physical memory.
5. Restore the context of the awakening task.
6. Resume execution of the restored task.
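The ordering of those six steps is the part that is easy to get wrong. In this sketch, each step is a hypothetical hook that only records itself, so the order can be checked. The extra clean step before the flush applies only to a write-back D-cache, as step 2 says. Real hooks would issue the CP15 operations shown earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step hooks; here they only record the order they ran in. */
enum step { SAVE_CTX, CLEAN_DCACHE, FLUSH_CACHES, FLUSH_TLB, SET_TTB, RESTORE_CTX, RESUME };

typedef struct {
    enum step log[8];
    unsigned n;
} trace;

static void record(trace *t, enum step s) {
    assert(t->n < 8);
    t->log[t->n++] = s;
}

static void context_switch(trace *t, bool writeback_dcache) {
    record(t, SAVE_CTX);          /* 1: save context, task goes dormant */
    if (writeback_dcache)
        record(t, CLEAN_DCACHE);  /* 2: write dirty lines back before flushing */
    record(t, FLUSH_CACHES);      /* 2: flush the caches */
    record(t, FLUSH_TLB);         /* 3: drop the retiring task's translations */
    record(t, SET_TTB);           /* 4: point the MMU at the new page tables */
    record(t, RESTORE_CTX);       /* 5: restore the awakening task */
    record(t, RESUME);            /* 6: resume it */
}
```

With a write-through D-cache, the clean step disappears and the switch records six steps instead of seven. That is the saving the following note describes.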
Note: to reduce the time it takes to perform a context switch, a writethrough cache
policy can be used in the ARM9 family. Cleaning the data cache can require hundreds of
writes to CP15 registers. By configuring the data cache to use a writethrough policy, there is
no need to clean the data cache during a context switch, which will provide better context
switch performance. Using a writethrough policy distributes these writes over the life of
the task. Although a writeback policy will provide better overall performance, it is simply
easier to write code for small embedded systems using a writethrough policy.
→ Systems that use a file system should use the writeback policy, which is more efficient.
++++++++++++++++++++++ Why must the virtual-to-physical mapping here stay fixed? ===========
Typically, page tables reside in an area of main memory where the virtual-to-physical
address mapping is fixed. By “fixed,” we mean data in a page table doesn’t change during
normal operation, as shown in Figure 14.5. This fixed area of memory also contains the
operating system kernel and other processes. The MMU, which includes the TLB shown
in Figure 14.5, is hardware that operates outside the virtual or physical memory space; its
function is to translate addresses between the two memory spaces.
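→ The mapping between the two must not change while the system is running, otherwise errors come easily. WinCE works this way;
I don't know how Linux does it. The bootloader under ADS also works this way.
------ In the figure, the boxed areas are the fixed mappings ------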
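References
ARM System Developer's Guide: Designing and Optimizing System Software
→ This is the English original of the Chinese edition "ARM Embedded System Development: Software Design and Optimization". I personally think that translation is one of the best translated ARM books in China, a thousand times better than Du XX's "ARM Architecture and Programming". Although the book is about software design and optimization, it also covers a lot of hardware, such as the MMU and caches, and explains them brilliantly. I just tried to write a blog post about the MMU and cache and found the topic too big; I'll have to reread this book for a while before I can write it.
Download: http://download.csdn.net/source/904273
ARM920T Technical Reference Manual → Enough said: anyone who wants to understand bootloaders for the 2440 and similar chips must read it. It explains some of the coprocessor instructions in great detail.
Download: http://download.csdn.net/source/903240
ARM Architecture Reference Manual (2nd Edition) → A fairly valuable English-language ARM book.
Download: http://download.csdn.net/source/901433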
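When reposting, please credit: author [email protected], Science and Technology Association of Department 1, Guilin University of Electronic Technology. Original: http://blog.csdn.net/gooogleman. If you find any errors, please leave a comment; if you have a better approach, please also comment at the end of the post. I'll be grateful for your criticism and sharing.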