Summary of ARM and AArch64 Instruction Set Optimization

阿新 • Published: 2018-12-12

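Preface

The earlier articles 《arm64》 and 《arm32》 covered the basics of ARM and AArch64 optimization. This article focuses on the points that are easy to confuse, or that need particular care, during optimization.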

1. Instruction encoding length

1.1 aarch32

		A32 mode (ARM instruction set): every instruction has a fixed 32-bit encoding.
		T32 mode (Thumb instruction set): an instruction is encoded in either 16 bits or 32 bits.

1.2 aarch64

		Every instruction has a fixed 32-bit encoding.

Reference: https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf, A1.3.2 The ARM instruction sets
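To make the T32 rule concrete: whether a Thumb instruction is 16 or 32 bits long is decided by the top five bits of its first halfword. The sketch below is only an illustration; the function name t32_instruction_size is made up for this example, and it assumes the halfword has already been read into an integer.

```python
def t32_instruction_size(first_halfword: int) -> int:
    """Return the length in bytes (2 or 4) of a T32 instruction.

    A first halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111
    starts a 32-bit encoding; any other value is a complete 16-bit one.
    """
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


if __name__ == "__main__":
    # Sample halfwords chosen for illustration only.
    for hw in (0x4770, 0xE7FE, 0xF000, 0xE92D):
        print(f"0x{hw:04X}: {t32_instruction_size(hw)} bytes")
```

A32 and A64 need no such check, since every instruction there is exactly 4 bytes.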

2. Address of the current instruction

2.1 aarch32

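In the ARM (A32) state, the address of the currently executing instruction is normally pc-8; in the Thumb state it is normally pc-4. Reference: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0013d/index.html, Program counter (pc).

Question: an instruction is encoded in 32 bits, i.e. 4 bytes, so why is the current instruction at pc-8 in ARM mode?

Take the classic three-stage pipeline (fetch, decode, execute) of early cores such as ARM7 as an example; later cores have deeper pipelines but keep the same offset for compatibility. Suppose the add instruction is fetched from address pc1. While add is being decoded, the next instruction, sub, enters the fetch stage, so pc2 = pc1 + 4. While add is being executed, cmp, the instruction after sub, enters the fetch stage, so pc = pc2 + 4. The address of add at the moment it executes is therefore pc1 = pc - 8.

[Figure: three-stage pipeline example]

Reference: https://blog.csdn.net/lee244868149/article/details/49488575/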
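The pipeline argument can be replayed with a tiny model. This is a rough sketch under the three-stage assumption described above, not a description of any real core; pc_seen_at_execute and trace are hypothetical helper names.

```python
def pc_seen_at_execute(instr_addr: int, instr_size: int = 4) -> int:
    """Address in the fetch stage while the instruction at instr_addr executes.

    In a three-stage pipeline, fetch runs two instructions ahead of execute,
    so the PC reads as instr_addr + 2 * instr_size.
    """
    return instr_addr + 2 * instr_size


def trace(start_addr, mnemonics, instr_size=4):
    """Return (mnemonic, own address, pc value seen) for straight-line code."""
    rows = []
    for i, name in enumerate(mnemonics):
        addr = start_addr + i * instr_size
        rows.append((name, addr, pc_seen_at_execute(addr, instr_size)))
    return rows


if __name__ == "__main__":
    for name, addr, pc in trace(0x1000, ["add", "sub", "cmp"]):
        print(f"{name}: at 0x{addr:X}, reads pc = 0x{pc:X}")
```

Passing instr_size=2 reproduces the pc-4 figure for 16-bit Thumb code. In the Thumb state the offset stays at 4 even for 32-bit encodings, which this simple model does not capture.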

2.2 aarch64

In the AArch64 state, the address of the currently executing instruction is simply pc, with no offset. The original English text:

Program counter

The current Program Counter (PC) cannot be referred to by number as if part of the general register file and therefore cannot be used as the source or destination of arithmetic instructions, or as the base, index or transfer register of load and store instructions.

The only instructions that read the PC are those whose function it is to compute a PC-relative address (ADR, ADRP, literal load, and direct branches), and the branch-and-link instructions that store a return address in the link register (BL and BLR). The only way to modify the program counter is using branch, exception generation and exception return instructions.

Where the PC is read by an instruction to compute a PC-relative address, then its value is the address of that instruction. Unlike A32 and T32, there is no implied offset of 4 or 8 bytes.

Reference: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch05s01s03.html, 5.1.3. Registers
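The practical consequence shows up when working out by hand where a PC-relative instruction (ADR or a literal load) points. A minimal sketch follows, assuming the instruction address and the encoded byte offset are already known; PC_READ_OFFSET and pc_relative_target are names invented for this example.

```python
# Implied offset added to the instruction's own address when the PC is read.
PC_READ_OFFSET = {"A32": 8, "T32": 4, "A64": 0}


def pc_relative_target(state: str, instr_addr: int, imm: int) -> int:
    """Address produced by an ADR-style 'pc + imm' computation."""
    pc = instr_addr + PC_READ_OFFSET[state]
    if state in ("A32", "T32"):
        # A32/T32 ADR and literal loads use Align(PC, 4) as the base.
        pc &= ~3
    return pc + imm


if __name__ == "__main__":
    for state in ("A32", "T32", "A64"):
        print(state, hex(pc_relative_target(state, 0x1000, 16)))
```

The same offset of 16 from the same instruction address lands on three different targets, which is exactly the kind of mistake that creeps in when porting hand-written assembly between the states.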

3. Accessing arguments when there are more parameters than argument registers

3.1 aarch32

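On arm32, the first four arguments are passed in r0~r3. Counting from zero, argument 4 is read from sp, argument 5 from sp + 4, and argument n (n >= 4) from sp + 4*(n-4). These offsets apply at function entry, before sp has been adjusted.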

3.2 aarch64

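On arm64, the first eight arguments are passed in x0~x7. Counting from zero, argument 8 is read from sp, argument 9 from sp + 8, and argument n (n >= 8) from sp + 8*(n-8). These offsets apply at function entry, before sp has been adjusted.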
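The two formulas can be folded into one lookup. The sketch below is an illustration only: it assumes integer or pointer arguments that each fill one register or one stack slot, zero-based argument numbering, and the value of sp at function entry. Some platforms (Apple's arm64 ABI, for one) pack smaller stack arguments more tightly, so the 8-byte stride would not hold there. ABI and arg_location are hypothetical names.

```python
# Argument registers and stack slot size (bytes) for each calling convention.
ABI = {
    "arm32": {"regs": [f"r{i}" for i in range(4)], "slot": 4},
    "arm64": {"regs": [f"x{i}" for i in range(8)], "slot": 8},
}


def arg_location(abi: str, n: int) -> str:
    """Where zero-based integer argument n lives at function entry."""
    regs, slot = ABI[abi]["regs"], ABI[abi]["slot"]
    if n < len(regs):
        return regs[n]
    # Stack arguments start at sp, one slot per argument.
    offset = slot * (n - len(regs))
    return "[sp]" if offset == 0 else f"[sp, #{offset}]"


if __name__ == "__main__":
    for abi in ("arm32", "arm64"):
        print(abi, [arg_location(abi, n) for n in range(10)])
```

Once the function pushes registers or allocates locals, every stack offset grows by the number of bytes sp has moved.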

4. Using <Vn>.<Ts>[<index2>] on aarch64

Example: mov <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]. Pay attention to the value of Ts, which can only be one of the following:

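B
H
S
D

Do not write Ts as 8B, 2S and so on. The operand selects a single element of the vector register (Vn), so a lane count is redundant, and it also causes build problems on iOS.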
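A quick way to catch this before the assembler does is to check element operands with a small script. The following is a sketch, not part of any toolchain; is_valid_element is a made-up helper, and the lane limits assume 128-bit vector registers.

```python
import re

# Number of lanes of each element size in a 128-bit vector register.
LANES = {"b": 16, "h": 8, "s": 4, "d": 2}

# v<n>.<Ts>[<index>] with Ts a bare size letter, no lane count in front.
_ELEMENT = re.compile(r"^v(\d+)\.([bhsdBHSD])\[(\d+)\]$")


def is_valid_element(operand: str) -> bool:
    """True if operand looks like a well-formed vector element reference."""
    m = _ELEMENT.match(operand.strip())
    if not m:
        return False
    reg, ts, index = int(m.group(1)), m.group(2).lower(), int(m.group(3))
    return reg <= 31 and index < LANES[ts]


if __name__ == "__main__":
    for op in ("v1.s[2]", "v1.2s[2]", "v1.d[2]"):
        print(op, is_valid_element(op))
```

Here v1.2s[2] is rejected because of the lane count, and v1.d[2] because a 128-bit register holds only two D elements.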

5. Immediates (imm) on aarch64

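Example: cmp <Wn|WSP>, #<imm>{, <shift>}, where imm is an unsigned immediate in the range [0, 4095]. The point here is that whenever you use an immediate, you need to check the immediate range that the instruction supports.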
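For CMP (immediate) the check is simple enough to write down. The sketch below assumes the 12-bit immediate with the optional shift being LSL #0 or LSL #12; encode_cmp_imm is a hypothetical helper, and it only models this one encoding.

```python
def encode_cmp_imm(value: int):
    """Return (imm12, shift) for CMP (immediate), or None if not encodable."""
    if 0 <= value <= 0xFFF:
        return (value, 0)
    # A multiple of 4096 can use the LSL #12 form if the upper part fits.
    if value > 0 and (value & 0xFFF) == 0 and (value >> 12) <= 0xFFF:
        return (value >> 12, 12)
    return None


if __name__ == "__main__":
    for v in (4095, 4096, 4097):
        print(v, encode_cmp_imm(v))
```

A value such as 4097 fits neither form, so it would first have to be loaded into a register and compared with the register form of cmp.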

6. Writing v registers on aarch64

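Example: is there anything wrong with the instruction add v4.4H, v4.4H, V5.4H? Assembling it on Linux reports no error, but on iOS (building on a Mac) the instruction is reported as invalid. The correct form is add v4.4H, v4.4H, v5.4H. Did you spot it? V5.4H became v5.4H: the case of the register name matters!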
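If a code base already contains a mix of upper-case and lower-case register names, a small script can normalize them. This is a minimal sketch that only handles vector registers written as V<n> followed by a dot; lowercase_vregs is a made-up name, and the arrangement suffix (4H here) is left untouched.

```python
import re

# An upper-case vector register name directly followed by '.', e.g. "V5.4H".
_UPPER_VREG = re.compile(r"\bV(\d+)(?=\.)")


def lowercase_vregs(line: str) -> str:
    """Rewrite V<n>.<arrangement> operands as v<n>.<arrangement>."""
    return _UPPER_VREG.sub(r"v\1", line)


if __name__ == "__main__":
    print(lowercase_vregs("add v4.4H, v4.4H, V5.4H"))
```

Running it over each line of an assembly file before building on a Mac would catch the case described above.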
