Under The Hood: Assembly instructions used often

阿新 • Published: 2019-01-18

Apparently, there's quite a bit more interest in Win32® assembly language than I had originally thought. After the February 1998 issue of MSJ hit the stands, I received quite a bit of positive email and favorable comments from folks at trade shows. Many readers said, "Have you also thought about covering...?"

My February 1998 column could have been called "Just Enough Assembly Language to Get By." Since it was such a hit, it's time for the sequel: "Just Enough Assembly Language to Get By, Part II." I'll look at additional instructions and instruction sequences that come up often. I'll also describe some of the most common scenarios when an instruction faults, and what to look for.

Before JMPing into the details, make sure you're at least familiar with the Intel x86 registers and instruction addressing modes. I covered both subjects in my February column. Also note that none of the instructions mentioned in my February column, and none of the ones I'll mention here, require anything more than an 80386 system because the subset of instructions that compilers typically use was standardized at least 12 years ago.

Common Instructions

Instructions INC value, DEC value
Purpose Increments or decrements integer value by 1
Example
INC ESI
INC DWORD PTR [EBP-8]
DEC DWORD PTR [EAX+4]

The INC and DEC instructions are used to increment and decrement values kept in memory or registers. As you might imagine, these instructions map precisely to the ++ and -- operators in C++ for standard integer operations.

You could use the ADD or SUB instructions to achieve the same effect as INC and DEC, although it would be more expensive in terms of size. Since they are so commonly used, the smallest versions of the INC/DEC instructions take only a single byte. Looking at the Intel opcode map, you'll see that there's an opcode for each of the eight general-purpose registers that INC can be used against (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP). Another eight opcodes are used for the DEC instruction and the same set of registers.

Instructions MUL value, value; DIV value, value
Purpose Multiplication and division
Example
 MUL EAX,EDX
 MUL AL,BYTE PTR [EBP-14h]
 DIV EAX,EBX

I didn't cover the ADD and SUB instructions in my February column since their operation is straightforward. However, the MUL and DIV instructions have some quirks that make them difficult to read and downright quirky to write. Throughout this column, when I mention (E)AX, I'm referring to AL, AX, or EAX. Likewise, when I mention (E)DX, I'm referring to DL, DX, or EDX.

Both MUL and DIV treat their operands as unsigned values. The operands can't be immediate values (such as 3); rather, they must be in registers or memory. You may have noticed that the destination value (the first argument) always seems to be (E)AX. This is by design. The use of the (E)AX register is an implicit part of the instruction. Beyond the implicit use of (E)AX, the (E)DX register is also silently involved. The high bits of the MUL result end up in (E)DX. Likewise, for the DIV instruction, the dividend is taken from (E)DX:(E)AX, and afterward (E)DX holds the remainder and (E)AX holds the quotient. (The byte-sized forms are the one exception: they use AH rather than DL, so an 8-bit MUL leaves its result in AX, and an 8-bit DIV divides AX, leaving the quotient in AL and the remainder in AH.)

If you write any assembler code, MUL and DIV get even weirder. The assembler (both MASM and the Visual C++® inline assembler) won't let you specify the (E)AX operand. Thus, if you want the instruction MUL EAX,ECX, you would write MUL ECX—just another example of the intuitive language syntax that's made assembly language wildly popular in recent years.
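To make the implicit (E)AX and (E)DX traffic concrete, here is a minimal C++ sketch of what the 32-bit forms of MUL and DIV compute. The names RegisterPair, mul32, and div32 are made up for illustration; they are not part of any compiler or library.

```cpp
#include <cstdint>

// Hypothetical helper type: the EAX and EDX values left behind by an instruction.
struct RegisterPair {
    std::uint32_t eax;
    std::uint32_t edx;
};

// Models the 32-bit "MUL operand": EDX:EAX = EAX * operand, all unsigned.
RegisterPair mul32(std::uint32_t eax, std::uint32_t operand) {
    const std::uint64_t product = static_cast<std::uint64_t>(eax) * operand;
    return {static_cast<std::uint32_t>(product),         // low 32 bits  -> EAX
            static_cast<std::uint32_t>(product >> 32)};  // high 32 bits -> EDX
}

// Models the 32-bit "DIV operand": divides the 64-bit value EDX:EAX by operand.
// The real instruction faults when the divisor is 0 or the quotient does not
// fit in EAX; this sketch does not model that, so avoid those inputs.
RegisterPair div32(std::uint32_t eax, std::uint32_t edx, std::uint32_t operand) {
    const std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    return {static_cast<std::uint32_t>(dividend / operand),   // quotient  -> EAX
            static_cast<std::uint32_t>(dividend % operand)};  // remainder -> EDX
}
```

For instance, mul32(0x80000000, 4) leaves 0 in EAX and 2 in EDX, which is why code that ignores EDX after a MUL silently loses the top half of a large product.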

Instructions IMUL value, value; IDIV value, value
Purpose Signed multiplication and division
Example
 IMUL WORD PTR [EBP+8]
 IMUL EDX,ECX,8
 IDIV EAX,DWORD PTR [EDX]

The IMUL and IDIV instructions treat the operands as signed values. Contrast this to MUL and DIV, which work on unsigned values. IDIV uses (E)AX as the implicit first operand, just as DIV does. Also, like its DIV counterpart, IDIV only works with register or memory values. IMUL, on the other hand, doesn't fit the general patterns of MUL, DIV, and IDIV. It can work with immediate values and it can have a non-(E)AX register as the destination. There's even a form of the IMUL instruction that takes three operands. Three-operand instructions are rare in the Intel opcode set; the SHLD and SHRD double shifts are the other ones you're likely to run into.
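The signed/unsigned distinction is easiest to see by feeding the same bit pattern to both kinds of division. The following is a rough C++ sketch; the function and type names are hypothetical, and idiv_like assumes the 32-bit dividend has already been sign-extended into EDX:EAX (which is what compilers arrange before an IDIV).

```cpp
#include <cstdint>

struct SignedDivision   { std::int32_t  quotient; std::int32_t  remainder; };
struct UnsignedDivision { std::uint32_t quotient; std::uint32_t remainder; };

// IDIV-style division: signed, quotient truncated toward zero, remainder
// carries the sign of the dividend. Dividing by 0, or INT32_MIN by -1,
// faults on the CPU and is undefined here, so avoid those inputs.
SignedDivision idiv_like(std::int32_t dividend, std::int32_t divisor) {
    return {dividend / divisor, dividend % divisor};
}

// DIV-style division: the very same bits, interpreted as unsigned.
UnsignedDivision div_like(std::uint32_t dividend, std::uint32_t divisor) {
    return {dividend / divisor, dividend % divisor};
}

// Three-operand form, as in "IMUL EDX,ECX,8": destination = source * immediate.
// The CPU keeps only the low 32 bits; this sketch assumes the product fits.
std::int32_t imul_three_operand(std::int32_t source, std::int32_t immediate) {
    const std::int64_t product = static_cast<std::int64_t>(source) * immediate;
    return static_cast<std::int32_t>(product);
}
```

Here idiv_like(-7, 2) gives a quotient of -3 and a remainder of -1, while div_like on the same bits (0xFFFFFFF9) gives 0x7FFFFFFC and 1. Spotting which of the two instructions the compiler chose tells you whether the C++ variable was declared signed or unsigned.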

Instructions PUSHAD, POPAD
Purpose Saves or restores all general-purpose registers via the stack

PUSHAD pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI onto the stack, in that order (the ESP value pushed is the one from before the instruction began). POPAD pops them in the reverse order, discarding the saved ESP value rather than loading it. These instructions are used in situations where many registers may be modified and the programmer wants the code to leave no trace of its execution in the registers. Although interrupt handlers are passé for most programmers, they're a perfect example of where PUSHAD and POPAD come in handy. Besides taking fewer code bytes than eight individual PUSH instructions, they also execute faster (five clock cycles on a Pentium).

Instructions PUSHFD, POPFD
Purpose Push or pop the EFLAGS register

In some cases, it's inconvenient to use the flags set by a prior operation immediately. Alternatively, you may want to make sure that some operation you're about to execute won't change the current flag values. For these situations, PUSHFD and POPFD are the easiest methods to save and restore those bits.

PUSHFD is one of the atomic components of an interrupt. When an interrupt or an exception occurs, the following code effectively executes:

PUSHFD, PUSH CS, PUSH EIP.

Following the three pushes, the EIP register changes to the interrupt handler address contained in the appropriate slot in the Interrupt Descriptor Table (IDT). Likewise, the IRETD effectively does a POPFD as part of returning from an interrupt.

Instructions SHL, SHR, SHLD, SHRD
Purpose Shift bits to the left or right
Example
 SHL EBX,3
 SHR EBX,CL
 SHLD EDX,ECX,4
 SHRD ESI,EDI,CL

The SHL and SHR instructions are logically equivalent to the C++ << and >> operators applied to unsigned values. Many of you probably recall that bitwise shifting is a quick way to perform multiplication and division by powers of 2. For example, the SHL EBX,3 instruction has the same effect as multiplying EBX by 8 (2^3 == 8). Indeed, if you write C++ code that multiplies or divides an unsigned value by 2, 4, 8, 16, and so on, it will most likely compile to a SHL or SHR instruction.

When shifting left, the low-order bits are filled with zeroes. The final high-order bit that's "shifted out" is moved to the carry flag (CF). In other words, the carry flag is like a virtual 33rd bit. When shifting right, the high-order bits are filled with zeroes, and the last bit shifted out moves to the carry flag.
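The carry-flag behavior, and the double-register SHLD form from the example list, can be sketched in C++ as follows. The names are invented for illustration, and the sketch only handles shift counts from 1 to 31.

```cpp
#include <cstdint>

// Hypothetical result type: the shifted value plus the carry flag (CF).
struct ShiftResult {
    std::uint32_t value;
    bool carry;
};

// Models "SHL reg,count" for counts 1..31: zeroes enter from the right and
// the last bit shifted out of the top lands in the carry flag.
ShiftResult shl32(std::uint32_t value, unsigned count) {
    const bool carry = ((value >> (32u - count)) & 1u) != 0;
    return {value << count, carry};
}

// Models "SHR reg,count" for counts 1..31: zeroes enter from the left and
// the last bit shifted out of the bottom lands in the carry flag.
ShiftResult shr32(std::uint32_t value, unsigned count) {
    const bool carry = ((value >> (count - 1u)) & 1u) != 0;
    return {value >> count, carry};
}

// Models "SHLD dest,src,count" for counts 1..31: dest shifts left, and the
// vacated low bits are filled from the high bits of src (src is unchanged).
std::uint32_t shld32(std::uint32_t dest, std::uint32_t src, unsigned count) {
    return (dest << count) | (src >> (32u - count));
}
```

As a quick check, shld32(0x12345678, 0xABCDEF01, 4) yields 0x2345678A: the top hex digit of the source slides into the bottom of the destination. SHRD is the mirror image, filling the high bits of the destination from the low bits of the source.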

Instruction ADD [EAX],AL
Purpose None

You may see a lot of this particular instruction, and you'll probably see it repeated. However, ADD [EAX],AL has no special significance. The opcode bytes for this instruction are 00 00. In other words, it's what you'll see if you're viewing a series of data bytes that all contain the value 0. Nothing to see here. You can all go home now.

Instruction CLD
Purpose Clears the direction flag

In my February 1998 column, I described the string instructions LODSx, SCASx, STOSx, and MOVSx. Each of these instructions uses the ESI or EDI register to point at the memory to be read or written to. These instructions are typically used in conjunction with the REP, REPE, or REPNE prefixes, which cause the string instruction to execute several times until some specific condition is met.

After each REPx-induced iteration, the CPU changes the ESI or EDI register to point to an adjacent memory location. The direction in which the registers move is given by the direction flag. If the direction flag is clear, ESI or EDI is incremented after each instruction (thus causing the next higher memory location to be referenced in the next iteration). When the direction flag is set, ESI or EDI decrements after each iteration.

Most of the time it's easiest to work moving forward in memory (toward higher addresses) so that the direction flag is usually clear. However, it's generally not safe to assume that the flag is clear. Thus, you'll often see the CLD instruction somewhere before a string operation such as REP MOVSB.
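A small C++ model shows what the direction flag changes. This is only a sketch: memory is a plain byte vector, ESI and EDI are indexes into it, and the function name and signature are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Models "REP MOVSB": copies ECX bytes from memory[ESI] to memory[EDI],
// stepping both indexes up when the direction flag is clear (after CLD)
// and down when it is set. Returns the resulting memory image.
std::vector<std::uint8_t> rep_movsb(std::vector<std::uint8_t> memory,
                                    std::size_t esi, std::size_t edi,
                                    std::size_t ecx, bool direction_flag) {
    while (ecx != 0) {
        memory[edi] = memory[esi];   // the MOVSB itself
        if (direction_flag) {        // DF set: walk toward lower addresses
            --esi;
            --edi;
        } else {                     // DF clear: walk toward higher addresses
            ++esi;
            ++edi;
        }
        --ecx;                       // REP counts ECX down to zero
    }
    return memory;
}
```

With the flag clear, the copy starts at the first byte of each buffer. With it set, the caller has to point ESI and EDI at the last byte instead, which is exactly the mistake that bites code assuming the flag is clear when it isn't.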

Instructions NOT value, NEG value
Purpose Negation of values
Example
 NOT DWORD PTR [EBP-8]
 NEG EDX

The NOT instruction does ones-complement negation. That is, it applies the NOT operation to each bit in the operand. An initial value of 0 will become 0xFFFFFFFF after a NOT instruction. The C++ ~ operator is typically implemented via the NOT instruction.

The NEG instruction does twos-complement negation. (If you're not 100 percent up on ones versus twos-complement negation, don't feel bad. I learned this stuff 10 years ago in college, and I've completely forgotten it!) An easier way to think of the NEG instruction is that it puts a - sign in front of the value. Thus, using NEG on -3 yields 3, while NEG applied to 4 yields -4. To summarize, you can think of NOT as affecting individual bits, while NEG operates on the entire value.
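If the ones versus twos-complement distinction is hazy, the relationship fits in two lines of C++: NEG is NOT followed by adding 1. The function names below are made up for the sketch.

```cpp
#include <cstdint>

// Models "NOT reg": flips every bit (ones-complement), like the C++ ~ operator.
std::uint32_t not32(std::uint32_t value) {
    return ~value;
}

// Models "NEG reg": twos-complement negation, which is the bitwise NOT plus 1.
std::uint32_t neg32(std::uint32_t value) {
    return ~value + 1u;
}
```

So not32(0) is 0xFFFFFFFF, while neg32(4) is 0xFFFFFFFC, the bit pattern a signed 32-bit integer uses for -4.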

Instruction NOP
Purpose No operation

The NOP instruction does nothing and affects nothing. It's a single-byte opcode that executes in one clock cycle and is primarily used to pad code. For example, a compiler might want the beginning of a procedure to start on a 16-byte boundary. The compiler/linker would insert enough NOP instructions between the end of one procedure and the beginning of the next procedure to create the desired alignment.

If you're confident in your assembler abilities, the NOP instruction can be applied to code in memory or in the executable file. You might know that some instruction you're about to execute will cause a fault in a debugger. If you want to skip that instruction, use the debugger to write enough NOP opcodes (0x90) to eliminate the instruction. This is useful to squash hardcoded INT 3 breakpoint instructions while you're running under the debugger, effectively not stopping at the breakpoint. Really advanced users can use NOP instructions to obliterate entire regions of code in an executable. (Warning! Harder than it looks.)
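The patching idea amounts to overwriting bytes with 0x90. Here is a minimal sketch that works on a copy of some code bytes held in a vector, not on a live process; nop_out is a hypothetical helper, and 0xCC is the single-byte encoding of INT 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kNopOpcode  = 0x90;  // NOP
constexpr std::uint8_t kInt3Opcode = 0xCC;  // INT 3

// Returns a copy of `code` with `length` bytes starting at `offset` replaced
// by NOPs. If the range does not fit inside the buffer, the copy is returned
// unchanged. The caller must pass the full length of the instruction being
// removed; leaving part of an instruction behind produces garbage opcodes.
std::vector<std::uint8_t> nop_out(std::vector<std::uint8_t> code,
                                  std::size_t offset, std::size_t length) {
    if (offset > code.size() || length > code.size() - offset) {
        return code;
    }
    for (std::size_t i = 0; i < length; ++i) {
        code[offset + i] = kNopOpcode;
    }
    return code;
}
```

Squashing a hardcoded breakpoint is the easy case because INT 3 is one byte long, so a single NOP replaces it exactly. Multi-byte instructions are where the "harder than it looks" warning applies.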

Another advanced use of the NOP instruction is when you want to make it easy to patch or hook into your code. At the beginning of a procedure or block of code, put in enough NOP instructions for the desired goal. Subsequent patching or hooking code can write JMPs, CALLs, or whatever into the NOP area.

Instruction INT 3
Purpose Debugger interrupt

INT 3 has two uses—one intended by the original CPU designers, the other accidental. The INT 3 instruction is the standard method to suspend a program and transfer control to a debugger. In normal use, programs don't include INT 3 instructions in their code. Rather, when you set a traditional breakpoint with a debugger, it temporarily overwrites the target instruction with an INT 3 instruction. (The LODPRF32 program from my July 1995 column illustrates this.) Note that an INT 3 instruction is the heart of the DebugBreak API for Intel CPUs.

The other offbeat use of the INT 3 instruction is as a paranoid NOP. In those cases where a NOP would be used for padding (and theoretically never executed), an INT 3 can be used instead. Like NOP, an INT 3 instruction is only a single byte. The key difference is that if a bug crept in and you executed the INT 3 instruction, you'd pop into the debugger. In the same scenario, the CPU would blithely sail through NOP instructions and wreak havoc someplace farther away from the original error.

The Microsoft® linker uses INT 3s as paranoid NOPs when creating padding for incremental linking. The linker also uses them as padding between procedures it wants to align on a particular memory boundary. Usually this alignment is on a multiple of 16 bytes unless you have the "optimize for size" compiler option set. Figure 1 shows a section of code from CALC.EXE that illustrates INT 3 padding in action.

Instruction LOCK
Purpose This instruction locks the memory bus during the next instruction
Example
LOCK INC DWORD PTR [EDX+04] 

Technically speaking, LOCK is an instruction prefix rather than an instruction in its own right. In a multiprocessor environment, multiple processors could access the same memory location at the same time. The LOCK prefix ensures that the instruction associated with it will have exclusive access to the destination memory location.

If you've ever examined the EnterCriticalSection API, you'll see that if the critical section isn't currently held, the code essentially just increments a counter. A LOCK prefix is used with an INC instruction to guarantee that one thread won't increment the counter while another thread on another CPU is reading it. You'll also see the LOCK prefix used with multiprocessor synchronization APIs such as InterlockedExchange and InterlockedIncrement.
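In portable C++ the same guarantee is spelled std::atomic. The sketch below is not the Win32 implementation; it is a hypothetical stand-in that mirrors the InterlockedIncrement convention of returning the new value. On x86, compilers typically turn the fetch_add into a LOCK-prefixed instruction, though the exact instruction chosen is up to the compiler.

```cpp
#include <atomic>

// Atomically adds 1 to `counter` and returns the incremented value.
// No other thread can observe or modify the counter halfway through
// the read-modify-write, which is the job the LOCK prefix does.
long interlocked_increment_sketch(std::atomic<long>& counter) {
    return counter.fetch_add(1) + 1;  // fetch_add returns the old value
}
```

Disassembling a call to a function like this is a quick way to see a LOCK prefix in your own code rather than in the operating system's.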

A final thought on the LOCK prefix: you may recall a bug on older Pentium CPUs where a particular instruction sequence could cause the CPU to freeze up. (See the February 1998 Editor's Note if you need a refresher.) That instruction sequence isn't a valid sequence, and the LOCK prefix plays a vital role in the ensuing CPU meltdown.
Common Instruction Sequences

Sequence CMP register_X, immediate_value_A
                  JE XXXXXXXX
                  CMP register_X, immediate_value_B
                  JE XXXXXXXX
Purpose C++ switch statement
Example
 CMP EAX,1
 JE  00400248
 CMP EAX,3
 JE  0040026E
 CMP EAX,7
 JE  004002A0

This sequence (compare and JMP if equal) is the most straightforward encoding of a C++ switch statement that I've seen. It's also very easy to pick out when you encounter it in a debugger. In the example code, the switch statement would look something like this:
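The original listing did not survive in this copy of the column. As a stand-in, here is a minimal sketch of a switch that would plausibly produce the three CMP/JE pairs in the example. Only the case values 1, 3, and 7 come from the disassembly; the function name and the bodies of the cases are hypothetical.

```cpp
// Each case label corresponds to one CMP EAX,n / JE pair in the example,
// and each JE target is the start of that case's code.
int handle_value(int value) {
    switch (value) {
        case 1:              // CMP EAX,1 / JE 00400248
            return 100;
        case 3:              // CMP EAX,3 / JE 0040026E
            return 300;
        case 7:              // CMP EAX,7 / JE 004002A0
            return 700;
        default:             // falls past all three compares
            return 0;
    }
}
```

A handful of widely spaced case values, as here, tends to come out as a chain of compares. Whether a particular compiler emits exactly this sequence depends on its version and optimization settings.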
