CS:APP--Chapter03 : machine-level representation of program - part 1 basic(1)

Tags (space-separated): CS:APP

prologue

A computer can only execute machine code, which is a sequence of bytes encoding low-level operations such as manipulating data, managing memory, reading and writing data on storage devices, and communicating over networks.

Most machine code is produced by a compiler, which generates it according to the rules of the programming language, the instruction set of the target machine, and the conventions of the operating system.

Generally speaking, the preprocessor first expands the source code to include any files named with #include and to expand any macros declared with #define. The compiler then converts the source code to assembly code, and an assembler and a linker are invoked in turn to generate the executable machine code from the assembly code.

Chapter 3 is about studying the generated code and working backwards from it (a form of reverse engineering) to take a closer look at what the compiler does.


some terms:

| term | description |
| --- | --- |
| x86-64 | the 64-bit extension of IA32; Intel's name for it is Intel64 |

3.1 historical perspective

3.2 program encodings

GNU provides tools for compiling and assembling code through various commands:

3.2.1 some commands

1.gcc -Og -o exe_name source1_name.c source2_name.c

gcc compiles the source code then outputs the executable file.

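| option | description | similar options |
| --- | --- | --- |
| -Og | optimization level; produces machine code that follows the overall structure of the original C code | -O1, -O2 |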

2.gcc -Og -S source_name.c

gcc runs the compiler to generate an assembly file (source_name.s) and goes no further.

3.gcc -Og -c source_name.c

gcc compiles and assembles the code to generate an object file in binary format with the extension .o.
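To try the three commands, a small source file is enough. The sketch below is only an illustration: the file name scale.c and the names SCALE, scale and scale_store are made up for these notes, not taken from the book. It has no main on purpose, so it suits -S and -c; linking it into an executable would need a second file that defines main.

```c
/* scale.c: a hypothetical file for experimenting with gcc -Og -S and -c. */
#include <stddef.h>   /* handled by the preprocessor */

#define SCALE 3L      /* replaced textually by the preprocessor */

/* Multiply a value by the compile-time constant SCALE. */
long scale(long x) {
    return x * SCALE;
}

/* Scale every element of src and store the results in dest. */
void scale_store(const long *src, long *dest, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dest[i] = scale(src[i]);
    }
}
```

With this file, gcc -Og -S scale.c should leave scale.s next to it, and gcc -Og -c scale.c should leave scale.o.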

3.2.2 machine-level code

1.several different forms of abstraction

a) The behavior of a machine-level program is described by the ISA (instruction set architecture).

b) The ISA defines:

1. the processor state
2. the format of the instructions
3. the effect each instruction has on the processor state

c) virtual address space

This abstraction treats memory as one very large byte array. (The actual implementation is covered in chapter 9; for now the virtual view is enough for understanding.)
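The byte-array view can be tried from C itself. A minimal sketch, assuming nothing beyond standard C (the function names object_bytes and long_from_bytes are made up for these notes): it reads an object one byte at a time and then rebuilds the value from those bytes. It makes no claim about the order of the bytes, which depends on the machine.

```c
#include <stddef.h>
#include <string.h>

/* Copy the n bytes that make up an object into out, one byte at a time,
   the way machine code sees memory: as a plain array of bytes. */
void object_bytes(const void *obj, size_t n, unsigned char *out) {
    const unsigned char *p = (const unsigned char *)obj;
    for (size_t i = 0; i < n; i++) {
        out[i] = p[i];
    }
}

/* Rebuild a long from sizeof(long) raw bytes. */
long long_from_bytes(const unsigned char *bytes) {
    long value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```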

2.ISA : assembly language

The short path from source code to machine code also shows how close assembly code and machine code are. Their main difference is that assembly code is represented as text in a readable form, whereas machine code is represented in binary format.

3.machine code versus C language

Four parts of the processor state are visible in machine code and assembly code but hidden from C:

| No. | name | description |
| --- | --- | --- |
| 1 | PC (program counter) | holds the address in memory of the next instruction to be executed |
| 2 | register file | 16 named registers, each 64 bits wide [x86-64] |
| 3 | condition code register (flag register) | holds status information about the most recently executed arithmetic or logical instruction |
| 4 | vector registers | each can hold one or more integer or floating-point values |

Even though C provides many data types, for example integers, floats, arrays, and user-defined types, machine code treats them all as a contiguous sequence of bytes in the virtual address space.
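One way to see that arrays and structures are just bytes at fixed offsets is to measure the distance between their parts. This is a rough sketch with made-up names (struct point and byte_distance are not from the book):

```c
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Distance in bytes between two addresses inside the same object. */
size_t byte_distance(const void *from, const void *to) {
    return (size_t)((const char *)to - (const char *)from);
}
```

For an int array a, byte_distance(&a[0], &a[1]) equals sizeof(int), and for a struct point p, byte_distance(&p, &p.y) equals offsetof(struct point, y). At machine level, the type information is gone and only such offsets remain.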

4.run-time stack

The run-time stack is crucial for managing procedure calls and returns, parameters, and so forth.

3.2.3 codes example

The core of chapter 3 is that the program executed by the computer is simply a sequence of bytes. Nothing more, nothing less.

the whole process of the C compiler:

code.c -> code.s -> code.o -> code.exe

1. one tool : disassembler

A disassembler generates assembly code from the bytes of an object file or executable file. CS:APP uses objdump -d for this, and gdb can disassemble code as well.

One important point: the disassembly may show nop instructions (no operation) after the end of a function. They are padding, and their purpose is to give the next block of code a better placement in terms of memory system performance.

2. x86-64 instructions

1.Instructions range in length from 1 to 15 bytes. Commonly used instructions and those with fewer operands take fewer bytes than less common ones or ones with more operands.

2.From a given starting position, there is a unique decoding of the bytes into machine instructions.

3.The disassembler decodes only the bytes in the object or executable file; it needs no access to the source code or to the assembly code generated by the compiler.

4.The disassembler and the compiler use slightly different naming conventions; for example, the disassembler omits the size suffix q from many instructions.
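To make the variable instruction length concrete, here is a small table written as C data. It is a sketch modeled on the kind of listing a disassembler such as objdump -d prints for a short function in an object file; the struct and function names are made up for these notes, and the call target is left as a placeholder because the linker fills in that displacement later. The byte encodings themselves are the standard x86-64 ones.

```c
#include <stddef.h>

/* One disassembled instruction: encoded length, hex bytes, and text. */
struct insn {
    size_t length;       /* number of bytes in the encoding */
    const char *bytes;   /* hex bytes as a disassembler prints them */
    const char *text;    /* assembly text, without size suffixes */
};

static const struct insn listing[] = {
    {1, "53",             "push %rbx"},
    {3, "48 89 d3",       "mov %rdx,%rbx"},
    {5, "e8 00 00 00 00", "call <target>"},  /* displacement still zero before linking */
    {3, "48 89 03",       "mov %rax,(%rbx)"},
    {1, "5b",             "pop %rbx"},
    {1, "c3",             "ret"},
};

/* Number of instructions in the listing. */
size_t listing_count(void) {
    return sizeof listing / sizeof listing[0];
}

/* Total number of bytes the listed instructions occupy. */
size_t total_length(void) {
    size_t total = 0;
    for (size_t i = 0; i < listing_count(); i++) {
        total += listing[i].length;
    }
    return total;
}
```

Six instructions take 14 bytes here: push, pop and ret need a single byte each, while the call needs five.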

3.2.4 assembly code

The raw assembly output of gcc contains information we need not be concerned with, and it carries no readable explanation of what the program does; both can impede understanding. So it's important to learn how to read assembly code.

  1. easy instructions like mov, add and so on

  2. any line beginning with "." is a directive that guides the assembler and linker. (CS:APP suggests that it's better to ignore these directives.)

  3. a brilliant stylized, annotated version is provided on page 212

  4. how to incorporate assembly code into C code? => write one complete function in assembly code and combine it with the C code DURING the linking stage.

3.3 Data formats

With the assembly language we have learnt so far, this topic is similar to the question of how to move an immediate value into main memory.

One question arises:
If the int value 1 is to be stored in main memory, how do we specify that it is 4 bytes long?

solution provided by asm (assuming the number 1 occupies 4 bytes):

;solution
mov dword ptr ds:[ea] , 1

=> the size of the operand must be specified.

3.3.1 several types of data formats(x86-64)

bit: the smallest unit of computer storage.

| data size | description |
| --- | --- |
| byte | 8 bits |
| word | 16 bits = 2 bytes |
| dword (double word) | 2 words = 4 bytes |
| qword (quad word) | 4 words = 8 bytes |
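The same sizes can be written down in C with the fixed-width types from stdint.h. The aliases below (byte_t, word_t, dword_t, qword_t) and the helper size_in_bits are made-up names for these notes, not standard ones:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aliases that mirror Intel's size names. */
typedef uint8_t  byte_t;   /* byte:  8 bits  */
typedef uint16_t word_t;   /* word:  16 bits */
typedef uint32_t dword_t;  /* dword: 32 bits */
typedef uint64_t qword_t;  /* qword: 64 bits */

/* Convert a size in bytes to a size in bits. */
size_t size_in_bits(size_t nbytes) {
    return nbytes * CHAR_BIT;
}
```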

3.3.2 the suffixes to the instructions in assembly code

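movw $1, (ea)            ;AT&T syntax as generated by gcc: source first, size given by the suffix w
;equivalent to
mov word ptr ds:[ea],1   ;Intel syntax: destination first, size given by "word ptr"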
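In C the operand size comes from the type of the pointer, and the compiler turns it into the suffix. A minimal sketch, assuming gcc on x86-64 Linux where long is 8 bytes; the function names are made up, and the instructions in the comments are what gcc typically emits at -Og, which may differ with other compilers or flags.

```c
/* Each function stores the constant 1 through a pointer of a different type.
   The pointed-to type tells the compiler how many bytes to write. */
void store_byte(char *p)  { *p = 1; }  /* typically movb $1, (%rdi) */
void store_word(short *p) { *p = 1; }  /* typically movw $1, (%rdi) */
void store_dword(int *p)  { *p = 1; }  /* typically movl $1, (%rdi) */
void store_qword(long *p) { *p = 1; }  /* typically movq $1, (%rdi) */
```

Compiling this with gcc -Og -S and reading the .s file is a quick way to check which suffix each store actually gets on a given machine.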

3.3.3 assembly instruction versus C language

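| C data type | Intel data type | asm suffix | bytes |
| --- | --- | --- | --- |
| char | byte | b | 1 |
| short | word | w | 2 |
| int | double word | l[1] | 4 |
| long | quad word | q | 8 |
| char* | quad word | q | 8 |
| float | single precision | s | 4 |
| double | double precision | l | 8 |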

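Question:
Integer and floating-point numbers appear to use different instructions... no details so far! (CS:APP notes that floating-point code uses an entirely different set of instructions and registers.)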


[1] l stands for "long word"