CS:APP--Chapter03 : machine-level representation of program - part 1 basic(1)
Tags (space-separated): CS:APP
prologue
A computer can only execute machine code, which is a sequence of bytes encoding low-level operations such as manipulating data, managing memory, reading and writing data on storage devices, and communicating over the network.
Most machine code is produced by a compiler, which generates it according to the rules of the programming language, the instruction set of the target machine, and the conventions of the operating system.
Generally speaking, the preprocessor first expands the source code to include any files named with #include and to expand any macros defined with #define. The compiler then converts the source code to assembly code, after which an assembler and a linker are invoked to generate the executable machine code from the assembly code.
What chapter 3 teaches is how to read this generated code and work backwards from it, to take a closer look at what the compiler does.
some terms:
terms | description |
---|---|
x86-64 | the 64-bit extension of IA32, which Intel calls Intel64 |
3.1 historical perspective
3.2 program encodings
GNU provides many tools for compiling and assembling code via various commands:
3.2.1 some commands
1.gcc -Og -o exe_name source1_name.c source2_name.c
gcc compiles the source files, then outputs the executable file.
options | description | similar options |
---|---|---|
-Og | optimization level that keeps the generated code close to the structure of the source | -O1,-O2 |
2.gcc -Og -S source_name.c
gcc runs the compiler to generate an assembly file (.s) and goes no further.
3.gcc -Og -c source_name.c
gcc compiles and assembles the code to generate an object file in binary format with the extension .o.
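To try the three commands above, a small source file is enough. The sketch below assumes a file called mstore.c; the file name and the function names are illustrative choices, not something fixed by these notes.

```c
/* mstore.c: a small file to try the gcc commands on.
 * File and function names are illustrative assumptions. */

/* Multiply two longs. */
long mult2(long a, long b) {
    return a * b;
}

/* Store the product of x and y at the address dest. */
void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}

/*
 * gcc -Og -S mstore.c   -> mstore.s (assembly text)
 * gcc -Og -c mstore.c   -> mstore.o (binary object file)
 * Building an executable with -o additionally needs a file that defines main.
 */
```

Comparing mstore.s with the disassembly of mstore.o is a quick way to see that both describe the same instructions, once as text and once as bytes.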
3.2.2 machine-level code
1.several different forms of abstraction
a) What the computer does can be described at the machine level by the ISA (instruction set architecture).
b) the ISA defines:
1. the processor state
2. the format of the instructions
3. the effect each instruction has on the processor state when it is executed
c) virtual address space
this abstraction treats the whole memory as a very large byte array. (The actual implementation is covered in chapter 9; for now the virtual view is accepted for easier understanding.)
2.ISA : assembly language
This brief walk from source code to machine code also shows how closely assembly code and machine code are related. Their main difference is that assembly code is represented as text in an easily readable form, whereas machine code is represented in binary.
3.machine code versus C language
There are four parts of the processor state visible in machine code and assembly code but hidden from C:
No. | name | description |
---|---|---|
1 | PC (program counter) | holds the address of the next instruction to be executed |
2 | register file | 16 named registers, each 64 bits long [x86-64] |
3 | condition code registers (flags register) | hold status information about the most recently executed arithmetic or logical instruction |
4 | vector registers | each can hold one or more integer or floating-point values |
Even though C provides many data types, for example integers, floats, arrays and user-defined types, machine code just treats them all as a contiguous array of bytes in the virtual address space.
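The "everything is just bytes" view can be seen from C itself by reading an object through an unsigned char pointer. This is a minimal sketch; the helper name hex_bytes is made up for illustration.

```c
#include <stddef.h>

/* Write the n bytes of the object at p as lowercase hex into out.
 * out must have room for 2*n + 1 characters.
 * hex_bytes is a hypothetical helper, not a library function. */
void hex_bytes(const void *p, size_t n, char *out) {
    static const char digits[] = "0123456789abcdef";
    const unsigned char *bytes = p;   /* any object may be read as bytes */
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0f];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

The same function works on an int, a double or a struct, because at this level the type is gone and only the address and the byte count remain. The order in which the bytes of a multi-byte value appear depends on the machine's byte ordering.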
4.run-time stack
The run-time stack is crucial for managing procedure calls and returns, parameters and so forth.
3.2.3 codes example
The core of this chapter is that the program executed by a computer is simply a sequence of bytes. Nothing more, nothing less.
the whole process of the C compiler:
code.c -> code.s -> code.o -> code.exe
1. one tool : disassembler
A disassembler is a tool that generates assembly code from the bytes of an object or executable file. objdump -d does this, and gdb offers a disassemble command as well.
One important point: the disassembly may show nop instructions padding the end of a code segment. A nop does nothing; the padding is only there so that the next segment is placed better in terms of memory system performance.
2. x86-64 instructions
1.Instructions range in length from 1 to 15 bytes. Commonly used instructions and those with fewer operands need fewer bytes than less common ones or ones with more operands.
2.The encoding is designed so that, from a given starting position, the bytes decode uniquely into machine instructions.
3.The disassembler decodes only the bytes in the object or executable file; it needs no access to the source code or to the assembly code generated by the compiler.
4.The disassembler uses a slightly different naming convention for instructions than the assembly code generated by the compiler.
3.2.4 assembly code
Compiler-generated assembly contains information we do not need to care about and no readable explanation of what the code does, both of which get in the way of understanding it. So it is important to learn how to read assembly code.
- easy instructions like mov, add and so on
- any line beginning with "." is a directive that guides the assembler and linker (CSAPP suggests simply ignoring these directives)
- a brilliant stylized version is provided on page 212
- how to incorporate assembly code into C code? => write one complete function in assembly code and combine it with the C code DURING the linking stage
3.3 Data formats
With the assembly language we already know, this topic comes down to a familiar question: how do we move an immediate value into main memory?
One question arises:
If the int value 1 is to be stored in main memory, how do we specify that it occupies 4 bytes?
Solution provided by asm (assuming the value 1 occupies 4 bytes):
;solution
mov dword ptr ds:[ea] , 1
=> the size of the operand must be specified.
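In C the operand size is never written out; it comes from the type of the object being stored into. The sketch below stores the constant 1 through pointers of four different widths. The function names are made up for illustration, and the instructions in the comments are what gcc -Og typically emits on x86-64, not a guarantee.

```c
#include <stdint.h>

/* Each function stores the constant 1; only the pointee type differs,
 * and that type is what selects the operand size of the store.
 * The instructions in the comments are typical gcc -Og output on x86-64. */
void store_byte(int8_t *p)   { *p = 1; }   /* 1 byte:  movb $1, (%rdi) */
void store_word(int16_t *p)  { *p = 1; }   /* 2 bytes: movw $1, (%rdi) */
void store_dword(int32_t *p) { *p = 1; }   /* 4 bytes: movl $1, (%rdi) */
void store_qword(int64_t *p) { *p = 1; }   /* 8 bytes: movq $1, (%rdi) */
```

Compiling this file with gcc -Og -S and reading the resulting .s file shows the four suffixes side by side.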
3.3.1 several types of data formats(x86-64)
bit: the smallest unit of computer storage.
data size | description |
---|---|
byte | 8 bits |
word | 16 bits = 2 bytes |
dword (double word) | 2 words = 4 bytes |
qword (quad word) | 4 words = 8 bytes |
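3.3.2 size suffixes on instructions in assembly code
movw $1, ea                ;AT&T syntax: the w suffix sets the operand size to 2 bytes
;equivalent to
mov word ptr ds:[ea], 1    ;Intel syntax: the size is given by "word ptr"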
3.3.3 assembly instruction versus C language
C data type | Intel data type | assembly-code suffix | bytes |
---|---|---|---|
char | byte | b | 1 |
short | word | w | 2 |
int | dword | l[1] | 4 |
long | qword | q | 8 |
char* | qword | q | 8 |
float | single precision | s | 4 |
double | double precision | l | 8 |
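The integer and pointer rows of the table follow one rule: the suffix depends only on the operand size in bytes. A minimal sketch of that rule, with int_suffix as a made-up helper name:

```c
#include <stddef.h>

/* Map the size in bytes of an integer or pointer operand to its
 * assembly-code suffix; returns '?' for sizes that have no suffix.
 * int_suffix is a hypothetical helper for illustration only. */
char int_suffix(size_t bytes) {
    switch (bytes) {
    case 1:  return 'b';   /* byte */
    case 2:  return 'w';   /* word */
    case 4:  return 'l';   /* double word ("long word") */
    case 8:  return 'q';   /* quad word */
    default: return '?';
    }
}
```

For example, int_suffix(sizeof(int)) gives 'l' wherever int is 4 bytes, and int_suffix(sizeof(char *)) gives 'q' on x86-64, where pointers are 8 bytes. Floating-point types are deliberately left out of this sketch.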
Question:
Integer and floating-point numbers appear to use different instructions.... no detail so far!
[1] l stands for "long word"