Study Notes of CS:APP (Till Book 3.8 & Lecture 8.1, Regularly Updated)
Study Notes of the Book CS:APP and its ICS+ Course 15-213
Book & Course Information
Instructors |
Randal E. Bryant and David R. O'Hallaron |
Textbooks |
Randal E. Bryant and David R. O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition, Pearson, 2016; Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, Second Edition, Prentice Hall, 1988 |
Home |
Related Materials
Reading Notes:
•嵌入式與Linux那些事's cnblogs posts
•北洛's cnblogs posts
•FannieGirl's cnblogs posts
•頔瀟's CSDN posts
Learning Materials:
•CS:APP3e Book Site
•15-213/18-213 Fall 2019 (Latest Course Lectured by Randy Bryant, Up-to-date Slides, etc. Available)
•SJTU ICS SE101 2019
•Compiler Explorer
My Foreword
Content Covered
For now I have decided to study only the common parts of the five suggested system courses and the 15-213 implementation, as listed in the table below.
Chapter | Sections
1 (Overview) | 1.1–1.10
2 | 2.1–2.3, 2.5
3 | 3.1–3.10, 3.12
4 | —
5 | —
6 | 6.1–6.4, 6.7
7 | —
8 | —
9 | 9.1–9.8, 9.13
10 | —
11 | —
12 | —
First, read a section or sections of the book with the assistance of Eudic and note down the key points. Second, watch the corresponding 15-213 video(s) if available and add the complementary material provided by the lecture slides.
Book errata will be included.
Notes Organization
The structural style of my notes lies between that of a normal article and of a normal piece of slides.
Notes of "Summary" sections are no more and no less than the original text with topic words highlighted.
Formatting
Follow the style scheme of the global edition as closely as possible, to the degree that formatting does not become a burden on me.
Text size and font have been adjusted for online reading on a browser.
Fixing Post Style
•Macros for Blog Post on Word
•Reminder for myself: do not apply these macros to the original Word document!
Sub BeforePublish()
Call ConvertNumbersToText
Call PictureResize(110)
Call AddSmallCapsMarks
Call AddAllCapsMarks
Call IndentToBlankSpaces
End Sub
•Replacements for HTML on VS Code.
Replace | With
\$MALL([^$@]*)CAP\$ | <span style="font-variant: small-caps;">$1</span>
@LL([^$@]*)C@PS | <span style="text-transform: uppercase;">$1</span>
<p(>(<.*>*)?) | <p style="text-indent:2em"$1
(<p style=")margin-left: (\d)0pt;(">(?:<strong>)?<span style=".*)(">(?:<strong>)?• ) | $1padding-left:$2em;$3margin-left:-1em$4
•Alternatively, replacements using a Java program (unstable).
To-do List
•Cover more useful chapters or sections. By "useful" I mean content that may matter for my personal progress; I may include parts required by some exam or some job.
•Refine my notes. The first versions are definitely far too rambling. Also, avoid overusing line breaks and list items! In particular:
•Remove all derivations.
•…
•Remove in-line CSS style of
•Fully match labels (such as '<', '>' and '/', etc.) for accuracy.
Course Overview: Topics
Programs and Data
•Bits operations, arithmetic, assembly language programs
•Representation of C control and data structures
•Includes aspects of architecture and compilers
The Memory Hierarchy
•Memory technology, memory hierarchy, caches, disks, locality
•Includes aspects of architecture and OS
Exceptional Control Flow
•Hardware exceptions, processes, process control, Unix signals, nonlocal jumps
•Includes aspects of compilers, OS, and architecture
Virtual Memory
•Virtual memory, address translation, dynamic storage allocation
•Includes aspects of architecture and OS
Networking and Concurrency
•High level and low-level I/O, network programming
•Internet services, Web servers
•Concurrency, concurrent server design, threads
•I/O multiplexing with select
•Includes aspects of networking, OS, and architecture
Chapter 1 A Tour of Computer Systems
A computer system consists of hardware and systems software that work together to run application programs.
1.1 Information Is Bits + Context
A program begins life as a source program (or source file). The source program is a sequence of bits, each with a value of 0 or 1, organized in bytes (8-bit chunks). Each byte represents some text character in the program.
Most computer systems represent text characters using the ASCII standard that represents each character with a unique byte-size integer value.
Text files: files that consist exclusively of ASCII characters. Binary files: all other files.
A fundamental idea: All information in a system is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them.
1.2 Programs Are Translated by Other Programs into Different Forms
In order to run a .c program on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in an executable object program (or an executable object file) and stored as a binary disk file.
The compilation system: the programs that perform the four phases (preprocessor, compiler, assembler, and linker).
•Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the '#' character. The result is another C program, typically with the .i suffix.
•Compilation phase. The compiler (cc1) translates .i into .s, which contains an assembly-language program. Assembly language is useful because it provides a common output language for different compilers for different high-level languages.
•Assembly phase. Next, the assembler (as) translates .s into machine language instructions, packages them in a relocatable object program, and stores the result in the object file .o.
•Linking phase. The printf function is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file printf.o, which is merged with our .o program by the linker (ld). The result is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system.
1.3 It Pays to Understand How Compilation Systems Work
Some important reasons:
•Optimizing program performance.
•Understanding link-time errors.
•Avoiding security holes.
•Buffer overflow vulnerabilities
1.4 Processors Read and Interpret Instructions Stored in Memory
The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command. If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run.
1.4.1 Hardware Organization of a System
Buses
Buses: a collection of electrical conduits running throughout the system that carry bytes of information back and forth between the components.
•Typically designed to transfer words (fixed-size chunks of bytes).
•The word size (the number of bytes in a word) is a fundamental system parameter that varies across systems. Most machines: either 4 bytes (32 bits) or 8 bytes (64 bits).
I/O Devices
Input/output (I/O) devices: the system's connection to the external world.
•The example system has four: a keyboard, a mouse, a display, and a disk drive (or simply disk)
•Each is connected to the I/O bus by either a controller or an adapter.
The distinction: packaging.
•Controllers are chip sets in the device itself or on the motherboard (the system's main printed circuit board).
•An adapter is a card that plugs into a slot on the motherboard.
•The purpose of each: to transfer information back and forth between the I/O bus and an I/O device.
Main Memory
The main memory: a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program.
•Physically, consists of a collection of dynamic random access memory (DRAM) chips.
•Logically, is organized as a linear array of bytes, each with its own unique address (array index) starting at zero.
•In general, each of the machine instructions that constitute a program can consist of a variable number of bytes. The sizes of data items that correspond to C program variables vary according to type.
Processor
The central processing unit (CPU), or simply processor: the engine that executes (interprets) instructions stored in main memory.
•The program counter (PC): a register (a word-size storage device) at its core.
•At any point in time, points at (contains the address of) some machine-language instruction in main memory.
•Repeatedly
•executes the instruction pointed at by the PC
•updates the PC to point to the next instruction
•Appears to operate according to a very simple instruction execution model, defined by its instruction set architecture.
•Instructions execute in strict sequence, and executing a single instruction involves performing a series of steps.
•The processor
•reads the instruction from memory pointed at by the program counter (PC),
•interprets the bits in the instruction,
•performs some simple operation dictated by the instruction, and then
•updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.
•Operations revolve around
•main memory,
•the register file, and
•the arithmetic/logic unit (ALU).
•The register file: a small storage device that consists of a collection of word-size registers, each with its own unique name.
•The ALU: computes new data and address values.
•Examples of operations:
•Load: Copy a byte or a word from main memory into a register, overwriting the previous contents of the register.
•Store: Copy a byte or a word from a register to a location in main memory, overwriting the previous contents of that location.
•Operate: Copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register, overwriting the previous contents of that register.
•Jump: Extract a word from the instruction itself and copy that word into the PC, overwriting the previous value of the PC.
•Distinguish the processor's instruction set architecture,
•describing the effect of each machine-code instruction,
•from its microarchitecture,
•describing how the processor is actually implemented.
1.4.2 Running the hello Program
Using direct memory access (DMA), the data travel directly from disk to main memory, without passing through the processor.
1.5 Caches Matter
Lesson: A system spends a lot of time moving information from one place to another.
The machine instructions in the program
•Stored on disk
•Copied to main memory
•Copied into the processor
The data string
•On disk
•Copied to main memory
•Copied to the display device
Much of this copying is overhead: slows down the "real work" of the program.
A major goal for system designers: to make these copy operations run as fast as possible.
Larger storage devices are slower than smaller ones. And faster devices are more expensive to build than their slower counterparts.
The processor–memory gap: The processor can read data from the register file much faster than from memory.
To deal with it: cache memories (or simply caches): smaller, faster storage devices that serve as temporary staging areas for information that the processor is likely to need in the near future.
•An L1 cache
•On the processor chip
•Holds tens of thousands of bytes
•Can be accessed nearly as fast as the register file
•An L2 cache
•Larger, holding hundreds of thousands to millions of bytes
•Connected to the processor by a special bus
•Slower to access than L1, but still much faster than main memory
•The L1 and L2 caches are implemented with static random access memory (SRAM, a hardware technology).
•Newer and more powerful systems: 3 levels.
•The idea behind:
•Exploit locality (the tendency for programs to access data and code in localized regions) to get a very large and fast memory.
•Set up caches to hold data that are likely to be accessed often.
1.6 Storage Devices Form a Hierarchy
The storage devices in every computer system are organized as a memory hierarchy similar to the figure.
The main idea: storage at one level serves as a cache for storage at the next lower level.
1.7 The Operating System Manages the Hardware
Programs rely on the services provided by the operating system to access the hardware.
The operating system:
•A layer of software interposed between the application program and the hardware.
•All attempts by an application program to manipulate the hardware must go through the operating system.
•Two primary purposes:
(1) to protect the hardware from misuse by runaway applications and
(2) to provide
applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.
•Both goals are achieved via the fundamental abstractions: processes, virtual memory, and files.
1.7.1 Processes
A process: the operating system's abstraction for a running program.
•Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware.
Concurrently: The instructions of one process are interleaved with the instructions of another process.
•In either case, a single CPU can appear to execute multiple processes concurrently by having the processor switch among them:
•Traditional systems could only execute one program at a time,
while newer multicore processors can execute several programs simultaneously.
•The operating system performs this interleaving with context switching (a mechanism).
•A uniprocessor system contains a single CPU; a multiprocessor system contains several
Context switching of a uniprocessor system:
•The operating system keeps track of all the context (state) information that the process needs in order to run, including the current values of the PC, the register file, and the contents of main memory.
•At any point in time, a uniprocessor system can only execute the code for a single process.
When the operating system decides to transfer control from the current process to some new process, it performs a context switch by
•saving the context of the current process,
•restoring the context of the new process,
•and then passing control to the new process.
The new process picks up exactly where it left off.
•The basic idea for the example scenario:
The kernel: the portion of the operating system code that is always resident in memory.
•The transition from one process to another is managed by the operating system kernel.
•When an application program requires some action by the operating system, it executes a special system call instruction, transferring control to the kernel.
The kernel then performs the requested operation and returns control to the application program.
•Note:
•Not a separate process.
•Instead, a collection of code and data structures that the system uses to manage all the processes.
1.7.2 Threads
•A process can actually consist of multiple threads (execution units)
•Each running in the context of the process and sharing the same code and global data.
•Increasingly important:
•Requirement for concurrency in network servers
•Easier to share data between threads than between processes
•More efficient than processes
•Multi-threading: make programs run faster
1.7.3 Virtual Memory
Virtual memory: an abstraction that provides each process with the illusion that it has exclusive use of the main memory.
Each process has the same uniform view of memory: virtual address space.
•For Linux processes:
Areas (starting with the lowest addresses and working the way up):
•Program code and data.
•Code begins at the same fixed address for all processes, followed by data locations that correspond to global C variables.
•Are initialized directly from the contents of an executable object file.
•Heap.
•Follows the code and data areas immediately.
•Expands and contracts dynamically at run time as a result of calls to C standard library routines.
•Shared libraries.
•Near the middle of the address space.
•Holds the code and data for shared libraries.
•Powerful but difficult.
•Stack (user stack).
•At the top of the user's virtual address space.
•Used by the compiler to implement function calls.
•Expands and contracts dynamically during the execution of the program.
In particular, each time:
•call a function, grows.
•return from a function, contracts.
•Kernel virtual memory.
•The top region reserved.
•Applications must invoke the kernel to read or write the contents of this area or to directly call functions defined in the kernel code.
For virtual memory to work, a sophisticated interaction is required between the hardware and the operating system software.
•The basic idea: to store the contents of a process's virtual memory on disk and then use the main memory as a cache for the disk.
1.7.4 Files
A file: a sequence of bytes.
•Every I/O device is modeled as a file.
•All input and output in the system is performed by reading and writing files, using Unix I/O (a small set of system calls).
•Very powerful
•Provides applications with a uniform view of all the varied I/O devices that might be contained in the system.
1.8 Systems Communicate with Other Systems Using Networks
From the point of view of an individual system, the network can be viewed as just another I/O device.
•When the system copies a sequence of bytes from main memory to the network adapter, the data flow across the network to another machine.
•Similarly, the system can read data sent from other machines and copy these data to its main memory.
1.9 Important Themes
A system is a collection of intertwined hardware and systems software that must cooperate in order to achieve the ultimate goal of running application programs.
1.9.1 Amdahl's Law
Amdahl's law:
•An observation about the effectiveness of improving the performance of one part of a system.
•The main idea: when we speed up one part of a system, the effect on the overall system performance depends on both how significant this part was and how much it sped up.
•Consider a system in which executing some application requires time T_old.
Suppose some part of the system requires a fraction α of this time, and that we improve its performance by a factor of k.
The overall execution time would be T_new = T_old[(1 − α) + α/k].
The speedup S = T_old/T_new = 1/[(1 − α) + α/k].
•The major insight: to significantly speed up the entire system, we must improve the speed of a very large fraction of the overall system. One interesting special case: setting k to ∞ gives S_∞ = 1/(1 − α).
•A general principle for improving any process.
•Most meaningful for computers. Performance is routinely improved by high factors.
1.9.2 Concurrency and Parallelism
Concurrency: the general concept of a system with multiple, simultaneous activities.
Parallelism: the use of concurrency to make a system run faster.
•Can be exploited at multiple levels of abstraction in a computer system.
•Highlight 3 here from the highest to the lowest level in the system hierarchy.
Thread-Level Concurrency
With the process abstraction, multiple programs execute at the same time (concurrency). With threads, multiple control flows execute within a single process.
A uniprocessor system: a system in which a single processor must switch among multiple tasks.
•Since the advent of time-sharing
•Only simulated, by having a single computer rapidly switch among its executing processes
•Allows:
•multiple users to interact with a system at the same time
•a single user to engage in multiple tasks concurrently
A multiprocessor system: a system consisting of multiple processors all under the control of a single operating system kernel.
•Have become commonplace with the advent of multi-core processors and hyperthreading
•Multi-core processors have several CPUs ("cores") integrated onto a single integrated-circuit chip.
•Each L1 cache is split into two parts—
•one to hold recently fetched instructions
•one to hold data
•The cores share higher levels of cache as well as the interface to main memory.
•Hyperthreading (simultaneous multi-threading): a technique that allows a single CPU to execute multiple flows of control.
•It involves having multiple copies of some of the CPU hardware,
•such as program counters and register files,
•while having only single copies of other parts of the hardware,
•such as the units that perform floating-point arithmetic.
•A hyperthreaded processor decides which of its threads to execute on a cycle-by-cycle basis.
•It enables the CPU to take better advantage of its processing resources.
•Improve system performance—
•Reduces the need to simulate concurrency when performing multiple tasks.
•Can run a single application program faster, but only if that program is expressed in terms of multiple threads that can effectively execute in parallel.
Instruction-Level Parallelism
Instruction-level parallelism: the property that processors can execute multiple instructions at one time.
•Pipelining: the actions required to execute an instruction are partitioned into different steps and the processor hardware is organized as a series of stages, each performing one of these steps.
•The stages can operate in parallel, working on different parts of different instructions.
•Superscalar processors: Processors that can sustain execution rates faster than 1 instruction per cycle.
•Most modern processors support superscalar operation.
•Application programmers can use a high-level model to understand the performance of their programs.
•They can then write programs such that the generated code achieves higher degrees of instruction-level parallelism and therefore runs faster.
Single-Instruction, Multiple-Data (SIMD) Parallelism
Single-instruction, multiple-data (SIMD) parallelism: the mode in which special hardware allows a single instruction to cause multiple operations to be performed in parallel.
1.9.3 The Importance of Abstractions in Computer Systems
The use of abstractions is one of the most important concepts in computer science.
•On the processor side
•The instruction set architecture: an abstraction of the actual processor hardware
•A machine-code program behaves as if it were executed on a processor that performs just one instruction at a time.
•The underlying hardware may execute several instructions at once, but always in a way consistent with this simple, sequential model.
•Different processor implementations can execute the same machine code while offering a range of cost and performance.
•On the operating system side
•Files: an abstraction of I/O devices
•Virtual memory: an abstraction of program memory
•Processes: an abstraction of a running program
•The virtual machine: an abstraction of the entire computer
•A way to manage computers that must be able to run programs designed for multiple operating systems or different versions of the same operating system.
1.10 Summary
A computer system consists of hardware and systems software that cooperate to run application programs. Information inside the computer is represented as groups of bits that are interpreted in different ways, depending on the context. Programs are translated by other programs into different forms, beginning as ASCII text and then translated by compilers and linkers into binary executable files.
Processors read and interpret binary instructions that are stored in main memory. Since computers spend most of their time copying data between memory, I/O devices, and the CPU registers, the storage devices in a system are arranged in a hierarchy, with the CPU registers at the top, followed by multiple levels of hardware cache memories, DRAM main memory, and disk storage. Storage devices that are higher in the hierarchy are faster and more costly per bit than those lower in the hierarchy. Storage devices that are higher in the hierarchy serve as caches for devices that are lower in the hierarchy. Programmers can optimize the performance of their C programs by understanding and exploiting the memory hierarchy.
The operating system kernel serves as an intermediary between the application and the hardware. It provides three fundamental abstractions: (1) Files are abstractions for I/O devices. (2) Virtual memory is an abstraction for both main memory and disks. (3) Processes are abstractions for the processor, main memory, and I/O devices.
Finally, networks provide ways for computer systems to communicate with one another. From the viewpoint of a particular system, the network is just another I/O device.
Part I Program Structure and Execution
How application programs are represented and executed.
Chapter 2 Representing and Manipulating Information
Bits
•Computers store and process information represented as two-valued signals.
•Bits form the basis of the digital revolution.
•The decimal, or base-10, representation: natural for humans. Binary values: when building machines that store and process information.
•Two-valued signals can readily be represented, stored, and transmitted
Encodings
•Group bits together and apply some interpretation and represent the elements of any finite set.
•The 3 most important representations of numbers.
•Unsigned encodings
•Based on traditional binary notation
•Represent numbers ≥ 0
•Two's-complement encodings
•Represent signed integers
•Either positive or negative
•Floating-point encodings
•A base-2 version of scientific notation for representing real numbers
•Computers implement arithmetic operations with them.
•Some operations can overflow when the results are too large to be represented.
•The different mathematical properties of integer versus floating-point arithmetic:
•Integer computer arithmetic satisfies many of the familiar properties of true integer arithmetic.
•Floating-point arithmetic has altogether different mathematical properties.
•Stem from the difference in how they handle the finiteness of their representations—
•Integer representations encode a comparatively small range of values precisely
•Floating-point representations encode a wide range of values approximately
•A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.
•Computers use several different binary representations to encode numeric values.
2.1 Information Storage
Computers use bytes (blocks of 8 bits) as the smallest addressable unit of memory.
A machine-level program views memory as a very large array of bytes — virtual memory.
Every byte of memory is identified by a unique number — its address.
The virtual address space: the set of all possible addresses. Just a conceptual image presented to the machine-level program.
Program objects: program data, instructions, and control information. The management of the storage is all performed within the virtual address space. Example: The value of a pointer in C is the virtual address of the first byte of some block of storage. Type information also is associated with each pointer.
2.1.1 Hexadecimal Notation
A Byte in Different Notations
•= 8 bits.
•Binary
•00000000₂ to 11111111₂.
•Decimal
•0₁₀ to 255₁₀.
•Hexadecimal (or "hex", base-16):
•very convenient for describing bit patterns.
•Use '0' through '9' along with 'A' through 'F' to represent 16 possible values
•00₁₆ to FF₁₆.
•In C, starting with 0x or 0X.
•Example: write FA1D37B₁₆ as 0xFA1D37B, as 0xfa1d37b, or even mixing cases
Manually converting between decimal, binary, and hexadecimal representations of bit patterns:
•Converting between binary and hexadecimal: straightforward.
•Convert hexadecimal to binary by expanding each hex digit into 4 bits. Example: 0x173A4C = 0001 0111 0011 1010 0100 1100₂.
•Convert binary to hexadecimal by grouping bits into blocks of 4. Note: if the total number of bits is not a multiple of 4, make the leftmost group be the one with fewer than 4 bits, effectively padding the number with leading 0s. Example: 1111001010110110110011₂ = 0x3CADB3.
When x = 2^n for some nonnegative integer n, the binary representation of x is simply 1 followed by n 0s. A hexadecimal digit 0 represents 4 binary 0s. So, for n = i + 4j, where 0 ≤ i ≤ 3, write x with a leading hex digit of 2^i (1, 2, 4, or 8), followed by j hexadecimal 0s. An example: x = 2,048 = 2^11, n = 11 = 3 + 4 · 2, giving 0x800.
•Converting between decimal and hexadecimal: multiplication or division.
•Convert decimal to hexadecimal. To convert a decimal number x to hexadecimal, repeatedly write x = q · 16 + r. Use r as the least significant digit and generate the remaining digits by repeating the process on q. Example: decimal 314,156 becomes 0x4CB2C.
•Convert hexadecimal to decimal. Multiply each of the hexadecimal digits by the appropriate power of 16. Example: 0x7AF = 7 · 16² + 10 · 16 + 15 = 7 · 256 + 10 · 16 + 15 = 1,967.
2.1.2 Data Sizes
Every computer has a word size (the nominal size of pointer data). For a w-bit word size, the virtual addresses range from 0 to 2^w − 1.
A widespread shift from 32-bit machines to 64-bit ones (a virtual address space of 16 exabytes).
32-bit programs vs 64-bit programs: the distinction lies in how a program is compiled, rather than the type of machine on which it runs.
Computers and compilers support multiple data formats using different ways to encode data.
The C language supports multiple data formats for both integer and floating point data.
•Integer data:
•Signed: negative, zero, and positive values.
•Unsigned: nonnegative values.
•char: a single byte. Can also be used to store integer values.
•short, int, and long: provide a range of sizes.
•A pointer (e.g., char *): uses the full word size.
•Two different floating-point formats:
•float: single precision, 4 bytes
•double: double precision, 8 bytes
Fixed-size integer types: int32_t and int64_t. Using them is the best way for programmers to have close control over data representations.
Most of the data types encode signed values, unless prefixed by unsigned or using the specific unsigned declaration for fixed-size data types. The exception: char. The C standard does not guarantee that char is signed. Use signed char to guarantee a 1-byte signed value. In many contexts, however, programs are insensitive to the distinction.
The C language allows a variety of ways to order the keywords and to include or omit optional keywords.
One aspect of portability is to make the program insensitive to the exact sizes of the different data types. The C standards set lower bounds on the numeric ranges of the different data types, but there are no upper bounds (except with the fixed-size types).
2.1.3 Addressing and Byte Ordering
Must establish two conventions for multi-byte objects: addressing and byte ordering.
Addressing
•A multi-byte object is stored as a contiguous sequence of bytes. Address: the smallest address of the bytes used.
•Example: int x at 0x100. (Assuming 32-bit) The 4 bytes would be stored at 0x100, 0x101, 0x102, and 0x103.
Byte Ordering
•Two common conventions.
A w-bit integer has bit representation [x_{w−1}, x_{w−2}, …, x_1, x_0], where x_{w−1} is the most significant bit and x_0 the least.
Assuming w is a multiple of 8: the most significant byte is [x_{w−1}, x_{w−2}, …, x_{w−8}], the least significant byte is [x_7, x_6, …, x_0], and the other bytes contain bits from the middle.
•Little endian: the least significant byte comes first
•Big endian: the most significant byte comes first
•Example: int x at 0x100 with value 0x01234567. The ordering depends on the type of machine:
Note: The high-order byte is 0x01, while the low-order byte is 0x67.
•Machines:
•Little-endian: most Intel-compatible machines, machines that use Intel-compatible processors manufactured by IBM or Oracle, Android, iOS.
•Big-endian: most machines from IBM and Oracle
•Bi-endian: ARM microprocessors
•At times, it becomes an issue.
•When binary data are communicated over a network between different machines.
•When looking at the byte sequences representing integer data.
•Often when inspecting machine-level programs
•4004d3: 01 05 43 0b 20 00 add %eax,0x200b43(%rip)
generated by a disassembler
•Disassembler: a tool that determines the instruction sequence represented by an executable program file
•The address is computed by adding 0x200b43 to the current value of the program counter
•Having bytes appear in reverse order is common when reading machine-level program representations generated for little-endian machines
•When programs are written that circumvent the normal type system.
•In C, a cast or a union
•size_t: the preferred data type for expressing the sizes of data structures
•sizeof(T) returns the number of bytes required to store an object of type T.
•To write portable code
•The different machine/operating system configurations use different conventions for storage allocation
2.1.4 Representing Strings
A string in C
•Encoded by an array of characters
•Terminated by the null (having value 0) character.
•Each character represented by the ASCII character code.
•The ASCII code for decimal digit x happens to be 0x3x.
•The terminating byte = 0x00
•Independent of conventions
2.1.5 Representing Code
Binary code is seldom portable across different combinations of machine and operating system.
A fundamental concept: a program is simply a sequence of bytes.
2.1.6 Introduction to Boolean Algebra
Boolean algebra.
•The work of George Boole
•Encode true and false as 1 and 0
Boolean Operations
The simplest Boolean algebra: defined over {0, 1}.
•The 4 Boolean operations
Boolean operation |
Corresponding logical operation |
|
~ |
not |
¬ |
& |
and |
∧ |
| |
or |
∨ |
^ |
exclusive-or |
⊕ |
Boolean Operations over Bit Vectors
Extending the 4 Boolean operations to bit vectors (strings of 0's and 1's of fixed length w).
•Defined according to applications to the matching elements.
•Examples:
•Application: to represent finite sets.
•Encode A ⊆ {0, 1, …, w− 1} with [aw−1, …, a1, a0], ai = 1 if and only if i ∈ A.
•Example: a = [01101001] encodes A = {0, 3, 5, 6}, b = [01010101] encodes B = {0, 2, 4, 6}.
Boolean operation |
Corresponding set operation |
| |
Set union |
& |
Set intersection |
~ |
Set complement |
•Continuing the example: a & b yields [01000001], A ∩ B = {0, 6}.
•Practical applications example: There are a number of different signals that can interrupt the execution of a program. Selectively enable or disable different signals by specifying a bit-vector mask, where a 1 in bit position i indicates that signal i is enabled and a 0 indicates that it is disabled. Thus, the mask represents the set of enabled signals.
2.1.7 Bit-Level Operations in C
The symbols for the Boolean operations can be applied to any "integral" data type.
•Examples: expression evaluation for char:
•Evaluating bit-level expression: (1) Expand hexadecimal to binary, (2) perform the operations, and (3) convert back to hexadecimal.
•Use: to implement masking operations
•A mask: a bit pattern that indicates a selected set of bits within a word.
•Example:
•x & 0xFF: the least significant byte of x and 0's (all others).
•Example: for x = 0x89ABCDEF, x & 0xFF yields 0x000000EF.
•~0: all 1's
•Can be written as 0xFFFFFFFF when int is 32 bits, but that form is not as portable.
2.1.8 Logical Operations in C
Logical operator |
Corresponding logic operation |
|| |
or |
&& |
and |
! |
not |
•Any nonzero argument as true, argument 0 as false
•Return either 1 or 0, indicating a result of either true or false
•A bitwise operation matches that of its logical counterpart only when the arguments are restricted to 0 or 1.
•The logical operators do not evaluate the second argument if the result can be determined by evaluating the first.
•Examples: a && 5/a, p && *p++.
2.1.9 Shift Operations in C
Shift operations shift bit patterns to the left and to the right.
•x: [xw−1, xw−2, …, x0]
•Left shift operation: x << k
•[xw−k−1, xw−k−2, …, x0, 0, …, 0]
•x is shifted k bits to the left, dropping off the k most significant bits and filling the right end with k 0's.
•0 ≤ k ≤ w − 1
•Associates left to right: x << j << k is equivalent to (x << j) << k.
•Right shift operation: x >> k, 2 forms
•Logical.
•The left end is filled with k 0's
•[0, …, 0, xw−1, xw−2, …, xk]
•Arithmetic.
•The left end is filled with k repetitions of the most significant bit
•[xw−1, …, xw−1, xw−1, xw−2, …, xk]
•Useful for operating on signed integer data.
•Example (the italicized digits fill the ends):
•Not precisely defined by the C standards
•In practice, arithmetic for signed data, logical (must) for unsigned data.
•The definition in Java: x >> k arithmetically, x >>> k logically
2.2 Integer Representations
2.2.1 Integral Data Types
Integral data types represent finite ranges of integers.
•Size: char, short, long
•All nonnegative or possibly negative: unsigned or the default
•The only machine-dependent: long (8-byte with 64-bit, 4-byte with 32-bit)
•Typical ranges: asymmetric (the negative range extends one further than the positive)
•Guaranteed ranges set by the C standards: ≤ the typical ranges, and symmetric (except for the fixed-size data types)
•int could be 2-byte (mostly for 16-bit programs), long can be 4-byte (typically for 32-bit programs)
•The fixed-size data types (int32_t, int64_t, etc.)
•Ranges = those of the typical numbers
•Asymmetric
2.2.2 Unsigned Encodings
Consider an integer data type of w bits.
•A bit vector is written as:
•x⃗, to denote the entire vector
•[xw−1, xw−2, …, x0], to denote the individual bits within the vector
•The unsigned interpretation of x⃗: treated as a binary number.
•xi = 0 or 1 (2i is part of the value).
principle: Definition of unsigned encoding
For vector x⃗ = [xw−1, xw−2, …, x0]:
B2Uw(x⃗) ≐ Σi=0…w−1 xi·2^i
•≐: the left-hand side is defined to be equal to the right-hand side.
•Examples:
•UMaxw ≐ Σi=0…w−1 2^i = 2^w − 1
•A mapping: B2Uw: {0, 1}^w → {0, …, UMaxw}.
principle: Uniqueness of unsigned encoding
Function B2Uw is a bijection.
•Bijection: a function f that goes two ways: it maps a value x to a value y where y = f (x), but it can also operate in reverse, since for every y, there is a unique value x such that f (x) = y, which is given by the inverse function x = f−1(y).
•U2Bw: the inverse of B2Uw
2.2.3 Two's-Complement Encodings
Two's-complement form: the most common representation of signed numbers
principle: Definition of two's-complement encoding
For vector x⃗ = [xw−1, xw−2, …, x0]:
B2Tw(x⃗) ≐ −xw−1·2^(w−1) + Σi=0…w−2 xi·2^i
The sign bit: xw−1
•"Weight": −2^(w−1)
•1: negative; 0: nonnegative.
•TMinw ≐ −2^(w−1), TMaxw ≐ Σi=0…w−2 2^i = 2^(w−1) − 1.
•A mapping: B2Tw: {0, 1}^w → {TMinw, …, TMaxw}.
principle: Uniqueness of two's-complement encoding
Function B2Tw is a bijection.
•T2Bw: the inverse of B2Tw
Drop w: UMax, TMin, and TMax
Points worth highlighting:
•Asymmetric Range: |TMin| = |TMax| + 1
•UMax = 2TMax + 1
•−1: same representation as UMax
•0: a string of all 0's in both
Two's-complement in languages:
•Not required by C. <limits.h> defines a set of constants delimiting the ranges of the different integer data types for the particular machine.
•Example: for a two's-complement machine, INT_MAX = TMaxw, INT_MIN = TMinw and UINT_MAX = UMaxw.
•Required by Java.
Example to get a better understanding:
2.2.4 Conversions between Signed and Unsigned
Casting example:
•int x, unsigned u
•(unsigned) x converts x to unsigned, (int) u converts u to int.
A general rule of handling conversions between signed and unsigned numbers with the same word size—the numeric values might change, but the bit patterns do not.
U2Bw and T2Bw
•U2Bw
•0 ≤ x ≤ UMaxw, U2Bw(x)
•Unique unsigned
•T2Bw
•TMinw ≤ x ≤ TMaxw, T2Bw(x)
•Unique two's-complement
T2Uw and U2Tw
•T2Uw
•TMinw ≤ x ≤ TMaxw, T2Uw(x) ≐ B2Uw(T2Bw(x))
•0 ≤ T2Uw(x) ≤ UMaxw, same representation
principle: Conversion from two's complement to unsigned
For x such that TMinw ≤ x ≤ TMaxw:
T2Uw(x) = x + 2^w, if x < 0; x, if x ≥ 0
derivation: Conversion from two's complement to unsigned
B2Uw(T2Bw(x)) = T2Uw(x) = x + xw−1·2^w
In a two's-complement representation of x, bit xw−1 determines whether or not x is negative.
•Examples:
•The behavior of T2U:
•< 0: converted to large positive
•≥ 0: unchanged
•U2Tw(u)
•0 ≤ u ≤ UMaxw, U2Tw(u) ≐ B2Tw(U2Bw(u))
•TMinw ≤ U2Tw(u) ≤ TMaxw, same representation
principle: Unsigned to two's-complement conversion
For u such that 0 ≤ u ≤ UMaxw:
U2Tw(u) = u, if u ≤ TMaxw; u − 2^w, if u > TMaxw
derivation: Unsigned to two's-complement conversion
U2Tw(u) = −uw−1·2^w + u
In the unsigned representation of u, bit uw−1 determines whether or not u is greater than TMaxw = 2^(w−1) − 1.
•The behavior of U2T:
•≤ TMaxw, unchanged
•> TMaxw, converted to negative
Summary
The effects of converting in both directions:
•0 ≤ x ≤ TMaxw
•T2Uw(x) = x, U2Tw(x) = x, identical
•Outside that range
•Add or subtract 2^w
•2 extremes:
•T2Uw (−1) = UMaxw
•T2Uw(TMinw) = TMaxw + 1
2.2.5 Signed versus Unsigned in C
Numbers
•Signed: by default
•Example: 12345 or 0x1A2B
•Unsigned: adding 'U' or 'u' as a suffix
•Example: 12345U or 0x1A2Bu
Conversion between:
•Allowed but not specified by the C standards. Most systems follow U2Tw and T2Uw (the bit patterns stay the same).
•Explicit casting and implicit casting (when an expression of one type is assigned to a variable of another)
•printf does not use type information.
•Possibly nonintuitive behavior:
2.2.6 Expanding the Bit Representation of a Number
One common operation: to convert between integers having different word sizes while retaining the same numeric value. Converting from a smaller to a larger data type should always be possible.
Converting Unsigned to Larger
Zero extension: adding leading zeros to the representation.
principle: Expansion of an unsigned number by zero extension
Define bit vectors u⃗ = [uw−1, uw−2, …, u0] of width w and u⃗′ = [0, …, 0, uw−1, uw−2, …, u0] of width w′, where w′ > w. Then B2Uw(u⃗) = B2Uw′(u⃗′).
Converting Two's-complement to Larger
Sign extension: adding copies of the most significant bit to the representation.
principle: Expansion of a two's-complement number by sign extension
Define bit vectors x⃗ = [xw−1, xw−2, …, x0] of width w and x⃗′ = [xw−1, …, xw−1, xw−1, xw−2, …, x0] of width w′, where w′ > w. Then B2Tw(x⃗) = B2Tw′(x⃗′).
•The value is preserved:
derivation: Expansion of a two's-complement number by sign extension
•The relative order of conversion: The program first changes the size and then the type.
2.2.7 Truncating Numbers
Truncating x⃗ = [xw−1, xw−2, …, x0] to k bits: drop the high-order w − k bits, giving x⃗′ = [xk−1, xk−2, …, x0]. Truncating a number can alter its value—a form of overflow.
Truncating Unsigned
principle: Truncation of an unsigned number
Let x⃗ be the bit vector [xw−1, xw−2, …, x0], and let x⃗′ be the result of truncating it to k bits: x⃗′ = [xk−1, xk−2, …, x0]. Let x = B2Uw(x⃗) and x′ = B2Uk(x⃗′). Then x′ = x mod 2^k.
Truncating Two's-complement
principle: Truncation of a two's-complement number
Let x⃗ be the bit vector [xw−1, xw−2, …, x0], and let x⃗′ be the result of truncating it to k bits: x⃗′ = [xk−1, xk−2, …, x0]. Let x = B2Uw(x⃗) and x′ = B2Tk(x⃗′). Then x′ = U2Tk(x mod 2^k).
derivation: Truncation of a two's-complement number
B2Tw([xw−1, xw−2, …, x0]) mod 2^k = B2Uk([xk−1, xk−2, …, x0])
Summary
The effect of truncation
•For unsigned:
B2Uk([xk−1, xk−2, …, x0]) = B2Uw([xw−1, xw−2, …, x0]) mod 2^k
•For two's-complement:
B2Tk([xk−1, xk−2, …, x0]) = U2Tk(B2Uw([xw−1, xw−2, …, x0]) mod 2^k)
2.2.8 Advice on Signed versus Unsigned
The implicit conversion can lead to errors or vulnerabilities.
•One way to avoid: to never use unsigned.
•Example: Java.
Unsigned values are useful
•When words = collections of bits. Example:
•Packing a word with flags describing various Boolean conditions
•Addresses (naturally unsigned)
•When implementing mathematical packages for modular / multiprecision arithmetic.
2.3 Integer Arithmetic
2.3.1 Unsigned Addition
"Word size inflation":
•Some programming languages support arbitrary-size arithmetic.
•More commonly, programming languages support fixed-size arithmetic.
x +ᵘw y for arguments x and y, where 0 ≤ x, y < 2^w: the result of truncating the integer sum x + y to be w bits long, viewed as an unsigned number.
•Characterized as a form of modular arithmetic: discarding any bits with weight greater than 2^(w−1)
principle: Unsigned addition
For x and y such that 0 ≤ x, y < 2^w:
x +ᵘw y = x + y, if x + y < 2^w; x + y − 2^w, if 2^w ≤ x + y < 2^(w+1)
•Illustration:
Overflow: an arithmetic operation whose integer result cannot fit within the word size limits of the data type.
•Occurs when the two operands sum to 2^w or more
•Not signaled as errors
principle: Detecting overflow of unsigned addition
For x and y in the range 0 ≤ x, y ≤ UMaxw, let s ≐ x +ᵘw y. Then the computation of s overflowed if and only if s < x (or equivalently, s < y).
Modular addition forms an abelian group (a mathematical structure).
•Commutative and associative
•The identity element: 0; every element has an additive inverse
•Value -ᵘw x for every value x: x +ᵘw (-ᵘw x) = 0
principle: Unsigned negation
For any number x such that 0 ≤ x < 2^w, its w-bit unsigned negation -ᵘw x is given by the following:
-ᵘw x = x, if x = 0; 2^w − x, if x > 0
2.3.2 Two's-Complement Addition
x +ᵗw y, given integer values x and y where −2^(w−1) ≤ x, y ≤ 2^(w−1) − 1: the result of truncating the integer sum x + y to w bits, viewed as a two's-complement number.
principle: Two's-complement addition
For integer values x and y in the range −2^(w−1) ≤ x, y ≤ 2^(w−1) − 1:
x +ᵗw y = x + y − 2^w, if 2^(w−1) ≤ x + y (positive overflow); x + y, if −2^(w−1) ≤ x + y < 2^(w−1) (normal); x + y + 2^w, if x + y < −2^(w−1) (negative overflow)
•Illustration:
•Positive overflow: x + y exceeds TMaxw (case 4)
•Negative overflow: x + y is less than TMinw (case 1)
•Has the same bit-level representation as the unsigned sum
•Examples:
principle: Detecting overflow in two's-complement addition
For x and y in the range TMinw ≤ x, y ≤ TMaxw, let s ≐ x +ᵗw y. Then the computation of s has had positive overflow if and only if x > 0 and y > 0 but s ≤ 0. The computation has had negative overflow if and only if x < 0 and y < 0 but s ≥ 0.
•Illustrations:
2.3.3 Two's-Complement Negation
-ᵗw x: the additive inverse under +ᵗw
principle: Two's-complement negation
For x in the range TMinw ≤ x ≤ TMaxw, its two's-complement negation -ᵗw x is given by the formula
-ᵗw x = TMinw, if x = TMinw; −x, if x > TMinw
2.3.4 Unsigned Multiplication
x *ᵘw y for integers x and y where 0 ≤ x, y ≤ 2^w − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as an unsigned number.
principle: Unsigned multiplication
For x and y such that 0 ≤ x, y ≤ UMaxw:
x *ᵘw y = (x · y) mod 2^w
2.3.5 Two's-Complement Multiplication
x *ᵗw y for integers x and y where −2^(w−1) ≤ x, y ≤ 2^(w−1) − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as a two's-complement number.
principle: Two's-complement multiplication
For x and y such that TMinw ≤ x, y ≤ TMaxw:
x *ᵗw y = U2Tw((x · y) mod 2^w)
principle: Bit-level equivalence of unsigned and two's-complement multiplication
Let x⃗ and y⃗ be bit vectors of length w. Define integers x and y as the values represented by these bits in two's-complement form: x = B2Tw(x⃗) and y = B2Tw(y⃗). Define nonnegative integers x′ and y′ as the values represented by these bits in unsigned form: x′ = B2Uw(x⃗) and y′ = B2Uw(y⃗). Then
T2Bw(x *ᵗw y) = U2Bw(x′ *ᵘw y′)
•Illustrations:
2.3.6 Multiplying by Constants
Integer multiply is slow. Optimization: to replace multiplications by constants with combinations of shift and addition operations.
principle: Multiplication by a power of 2
Let x be the unsigned integer represented by bit pattern [xw−1, xw−2, …, x0]. Then for any k ≥ 0, the (w + k)-bit unsigned representation of x·2^k is given by [xw−1, xw−2, …, x0, 0, …, 0], where k zeros have been added to the right.
principle: Unsigned multiplication by a power of 2
For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x << k yields the value x *ᵘw 2^k.
principle: Two's-complement multiplication by a power of 2
For C variables x and k with two's-complement value x and unsigned value k, such that 0 ≤ k < w, the C expression x << k yields the value x *ᵗw 2^k.
The task of generating code for the expression x * K, for some constant K.
•K: [(0…0) (1…1) (0…0) . . . (1…1)].
•Compute a run of 1's from bit position n down to bit position m (n ≥ m) using either form:
•Form A: (x<<n) + (x<<(n − 1)) + . . . + (x<<m)
•Form B: (x<<(n + 1)) - (x<<m)
•Compute x * K by adding together the results for each run.
2.3.7 Dividing by Powers of 2
Performed using a right shift:
•Logical: unsigned
•Arithmetic: two's-complement
Integer division always rounds toward 0. Some notation for any real number a:
•⌊a⌋: the unique integer a′ such that a′ ≤ a < a′ + 1.
•⌈a⌉: the unique integer a′ such that a′ − 1 < a ≤ a′.
Dividing by a Power of 2 with Unsigned Arithmetic
principle: Unsigned division by a power of 2
For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x >> k yields the value ⌊x/2^k⌋.
•Examples:
Dividing by a Power of 2 with Two's-complement Arithmetic
Using an arithmetic right shift:
principle: Two's-complement division by a power of 2, rounding down
Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression x >> k, when the shift is performed arithmetically, yields the value ⌊x/2^k⌋.
•Examples:
Correcting for the improper rounding that occurs when a negative number is shifted right by "biasing" the value before shifting.
principle: Two's-complement division by a power of 2, rounding up
Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression (x + (1 << k) - 1) >> k, when the shift is performed arithmetically, yields the value ⌈x/2^k⌉.
•Demonstration:
(x<0 ? x+(1<<k)-1 : x) >> k will compute x/2^k, rounding toward zero as C integer division does.
2.3.8 Final Thoughts on Integer Arithmetic
2.4 Floating Point
2.4.1 Fractional Binary Numbers
2.4.2 IEEE Floating-Point Representation
2.4.3 Example Numbers
2.4.4 Rounding
2.4.5 Floating-Point Operations
2.4.6 Floating Point in C
2.5 Summary
Computers encode information as bits, generally organized as sequences of bytes. Different encodings are used for representing integers, real numbers, and character strings. Different models of computers use different conventions for encoding numbers and for ordering the bytes within multi-byte data.
The C language is designed to accommodate a wide range of different implementations in terms of word sizes and numeric encodings. Machines with 64-bit word sizes have become increasingly common, replacing the 32-bit machines that dominated the market for around 30 years. Because 64-bit machines can also run programs compiled for 32-bit machines, we have focused on the distinction between 32- and 64-bit programs, rather than machines. The advantage of 64-bit programs is that they can go beyond the 4 GB address limitation of 32-bit programs.
Most machines encode signed numbers using a two's-complement representation and encode floating-point numbers using IEEE Standard 754. Understanding these encodings at the bit level, as well as understanding the mathematical characteristics of the arithmetic operations, is important for writing programs that operate correctly over the full range of numeric values.
When casting between signed and unsigned integers of the same size, most C implementations follow the convention that the underlying bit pattern does not change. On a two's-complement machine, this behavior is characterized by functions T2Uw and U2Tw, for a w-bit value. The implicit casting of C gives results that many programmers do not anticipate, often leading to program bugs.
Due to the finite lengths of the encodings, computer arithmetic has properties quite different from conventional integer and real arithmetic. The finite length can cause numbers to overflow, when they exceed the range of the representation. Floating-point values can also underflow, when they are so close to 0.0 that they are changed to zero.
The finite integer arithmetic implemented by C, as well as most other programming languages, has some peculiar properties compared to true integer arithmetic. For example, the expression x*x can evaluate to a negative number due to overflow. Nonetheless, both unsigned and two's-complement arithmetic satisfy many of the other properties of integer arithmetic, including associativity, commutativity, and distributivity. This allows compilers to do many optimizations. For example, in replacing the expression 7*x by (x<<3)-x, we make use of the associative, commutative, and distributive properties, along with the relationship between shifting and multiplying by powers of 2.
We have seen several clever ways to exploit combinations of bit-level operations and arithmetic operations. For example, we saw that with two's-complement arithmetic, ~x+1 is equivalent to -x. As another example, suppose we want a bit pattern of the form [0, …, 0, 1, …, 1], consisting of w − k zeros followed by k ones. Such bit patterns are useful for masking operations. This pattern can be generated by the C expression (1<<k)-1, exploiting the property that the desired bit pattern has numeric value 2k − 1. For example, the expression (1<<8)-1 will generate the bit pattern 0xFF.
Floating-point representations approximate real numbers by encoding numbers of the form x × 2^y. IEEE Standard 754 provides for several different precisions, with the most common being single (32 bits) and double (64 bits). IEEE floating point also has representations for special values representing plus and minus infinity, as well as not-a-number.
Floating-point arithmetic must be used very carefully, because it has only limited range and precision, and because it does not obey common mathematical properties such as associativity.
Chapter 3 Machine-Level Representation of Programs
Machine Code and Assembly Code
•Computers execute machine code.
•Machine code: sequences of bytes encoding the low-level operations that manipulate data, manage memory, read and write data on storage devices, and communicate over networks.
•A compiler generates machine code through a series of stages, based on the rules of the programming language, the instruction set of the target machine, and the conventions followed by the operating system.
•The gcc C compiler
•Generates its output in the form of assembly code.
•Assembly code: a textual representation of the machine code giving the individual instructions in the program.
•Then invokes both an assembler and a linker to generate the executable machine code from the assembly code.
A High-level Language versus Low-level instructions
A high-level language |
Low-level instructions |
Shields programmers from the detailed machine-level implementation. Much more productive and reliable. |
Must be specified by a programmer. |
A program can be compiled and executed on a number of different machines. |
Assembly code is highly machine specific. |
The Importance of Learning Machine Code
•Code optimization.
•Understanding the run-time behavior of a program.
•Concurrent programming.
•Guarding against attacks.
The Relation between Source Code and the Generated Assembly
Understanding the relation between source code and the generated assembly: a form of reverse engineering.
•Reverse engineering: trying to understand the process by which a system was created by studying the system and working backward.
•The system is a machine-generated assembly language program.
x86-64
•The machine language for most processors in laptop and desktop machines, data centers and supercomputers.
•Started as Intel's 16-bit architecture, expanded to 32 bits, and most recently to 64 bits.
•Its rival: Advanced Micro Devices (AMD).
The Transition from 32-bit to 64-bit Machines
•A 32-bit machine:
•Can only use around 4 gigabytes (2^32 bytes) of RAM.
•Current 64-bit machines:
•Can use up to 256 terabytes (2^48 bytes)
•Could readily be extended to use up to 16 exabytes (2^64 bytes)
3.1 A Historical Perspective
The Intel processor line (x86) has followed a long evolutionary development.
Some models of Intel processors and some of their key features:
•8086
•One of the first 16-bit microprocessors.
•A variant 8088: IBM PCs and MS-DOS.
•i386
•Expanded the architecture to 32 bits.
•Added the flat addressing model. The first to fully support Unix.
•PentiumPro
•Introduced P6 microarchitecture (a radically new processor design).
•Pentium 4E
•Added hyperthreading and EM64T.
•Hyperthreading: a method to run two programs simultaneously on a single processor.
•EM64T (x86-64): Intel's implementation of a 64-bit extension to IA32 developed by Advanced Micro Devices (AMD).
•Core 2.
•First multi-core Intel microprocessor.
•Multi-core processor: multiple processors are implemented on a single chip.
•Core i7, Nehalem.
•Incorporated both hyperthreading and multi-core, with the initial version supporting two executing programs on each core and up to four cores on each chip.
Backward compatible: able to run code compiled for any earlier version.
Intel's names for their processor line:
•IA32: "Intel Architecture 32-bit"
•Intel64 (x86-64): the 64-bit extension to IA32
•"x86" (colloquial): the overall line
Advanced Micro Devices (AMD) have produced Intel-compatible processors. Introduced x86-64.
3.2 Program Encodings
Suppose a C program with two source files, p1.c and p2.c, compiled using a Unix command line:
linux> gcc -Og -o p p1.c p2.c
•The command gcc: the gcc C compiler.
•Since gcc is the default compiler on Linux, it can also be invoked as cc.
•The command-line option -Og: a level of optimization that yields machine code following the overall structure of the original C code.
•Higher levels of optimization: -O1, -O2.
•The command-line directive -o p: place the final executable code file in p.
The gcc command invokes an entire sequence of programs to turn the source code into executable code.
•First, the C preprocessor expands the source code
•To include any files specified with #include commands
•To expand any macros, specified with #define declarations
•Second, the compiler generates assembly-code versions of the two source files, p1.s and p2.s.
•Next, the assembler converts the assembly code into binary object-code files p1.o and p2.o.
•Object code: One form of machine code
•Contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.
•Finally, the linker merges these two object-code files along with code implementing library functions (e.g., printf) and generates the final executable code file p.
•Executable code: the second form of machine code
•The exact form of code that is executed by the processor.
3.2.1 Machine-Level Code
Computer systems employ several different forms of abstraction, hiding details of an implementation through the use of a simpler abstract model. Two important forms of abstraction for machine-level programming:
•The instruction set architecture, or ISA defines the format and behavior of a machine-level program.
•Defines the processor state, the format of the instructions, and the effect each of these instructions will have on the state.
•Most ISAs describe the behavior of a program as if each instruction is executed in sequence.
•The processor hardware: Executes instructions concurrently.
•Virtual addresses: the memory addresses used by a machine-level program.
•Providing a memory model that appears to be a very large byte array.
•The actual implementation of the memory system: A combination of multiple hardware memories and operating system software
The compiler does most of the work in the overall compilation sequence, transforming programs into instructions. The main feature of the assembly-code representation: in a more readable textual format.
Visible parts of the x86-64 processor state:
•The program counter (the PC, %rip in x86-64) indicates the address in memory of the next instruction to be executed.
•The integer register file contains 16 registers.
•Registers: named locations storing 64-bit values.
•Hold addresses or integer data.
•Some keep track of critical parts of the program state. Others hold temporary data.
•The condition code registers hold status information about the most recently executed arithmetic or logical instruction.
•Implement conditional changes in the control or data flow.
•A set of vector registers can each hold one or more values.
Machine code views the memory as a large byte-addressable array.
•Aggregate data types: contiguous collections of bytes.
•Scalar data types: no distinctions.
The program memory
•Contains
•The executable machine code for the program
•Some information required by the operating system
•A run-time stack for managing procedure calls and returns
•Blocks of memory allocated by the user (e.g., malloc).
•Addressed using virtual addresses.
•Only limited subranges are valid.
•The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.
A single machine instruction performs only a very elementary operation.
3.2.2 Code Examples
Suppose mstore.c:
Generating mstore.s
The assembly-code file:
•Each indented line: a single machine instruction.
•All information about local variable names or data types has been stripped away.
Generating mstore.o
The object-code file:
•In binary format
•The hexadecimal representation:
•A key lesson: the program executed by the machine is simply a sequence of bytes encoding a series of instructions.
Generating prog
Requires running a linker on the set of object-code files, one of which must contain main.
•Suppose main.c:
•Contains not just the machine code for the procedures but also code used to start and terminate the program as well as to interact with the operating system.
Disassembling mstore.o and prog
Disassemblers:
•To inspect the contents of machine-code files
•Generates a format similar to assembly code from the machine code.
•With Linux systems, objdump (for "object dump") given -d:
•The result:
Features about machine code and its disassembled representation:
•x86-64 instructions can range in length from 1 to 15 bytes.
•Commonly used instructions and those with fewer operands require a smaller number of bytes
•The instruction format: from a given starting position, there is a unique decoding of the bytes into machine instructions.
•Example: pushq %rbx is encoded as the single byte 53.
•The disassembler determines the assembly code based purely on the byte sequences in the machine-code file.
•The disassembler uses a slightly different naming convention for the instructions than does the assembly code generated by gcc.
•Example: the omissions or additions of the suffix 'q'.
Disassembling prog:
Extract various code sequences:
•Almost identical to that generated by the disassembly of mstore.o.
•Differences:
•The addresses—the linker has shifted the location of this code to a different range of addresses.
•The linker has filled in the address that callq should use in calling mult2.
•One task for the linker: to match function calls with the locations of the executable code for those functions.
•Two additional lines of code.
•No effect on the program. Memory system performance.
3.2.3 Notes on Formatting
•Generates mstore.s
•The full content:
•Lines beginning with '.': directives
•A clearer presentation:
3.3 Data Formats
"Word": 2 bytes (16 bits), owing to the architecture's 16-bit origins
•"Double words": 4 bytes
•"Quad words": 8 bytes
3.4 Accessing Information
An x86-64 central processing unit (CPU) contains a set of 16 general-purpose registers storing 64-bit values.
•General-purpose registers store integer data and pointers.
2 conventions for instructions for copying and generating values, having registers as destinations:
Number of bytes generated |
The remaining bytes |
1 or 2 |
Unchanged |
4 |
0 |
Different registers serve different roles in typical programs.
•Most unique: %rsp
•Used to indicate the end position in the run-time stack
•Specifically read and written by some instructions
•The other 15 registers: more flexible
•A small number of instructions make specific use of certain registers.
•A set of standard programming conventions governs how the registers are to be used for managing the stack, passing function arguments, returning values from functions, and storing local and temporary data.
3.4.1 Operand Specifiers
Operands: the source values to use in performing an operation and the destination location into which to place the result.
Operand types:
•Immediate
•Constant values
•A '$' followed by an integer using standard C notation
•Example: $-577 or $0x1F.
•Different instructions allow different ranges of immediate values; the assembler will automatically select the most compact way of encoding a value.
•Register
•The contents of the low-order 1, 2, 4, or 8 bytes of one of the 16 registers.
•ra: An arbitrary register a
•R[ra]: Viewing the set of registers as an array R indexed by register identifiers.
•A memory reference
•Memory location is accessed according to the effective address (a computed address)
•Mb[Addr]: A reference to the b-byte value stored in memory starting at address Addr
•Drop b.
•An addressing mode: a form of memory references.
•The most general form: Imm(rb,ri,s)
•4 components:
•An immediate offset Imm
•A base register rb (64-bit)
•An index register ri (64-bit)
•A scale factor s (1, 2, 4, or 8)
•The effective address = Imm + R[rb]+ R[ri]·s
•Often seen when referencing elements of arrays.
•The other forms: special cases.
•The more complex addressing modes are useful when referencing array and structure elements.
3.4.2 Data Movement Instructions
Grouping different instructions into instruction classes: The instructions in a class perform the same operation but with different operand sizes.
Simple Data Movement Instructions
mov: Copy data from a source location to a destination location, without any transformation.
•Operands
Operand | Description | Type
S | Source | Immediate / register / memory
D | Destination | Register / memory
•Copying from one memory location to another
•mov Memory, Register
•mov Register, Memory
•Register operands for these instructions can be the labeled portions of any of the 16 registers
•These instructions update only the specific register bytes or memory locations indicated by D; the exception is movl with a register destination.
•Convention: Any instruction that generates a 32-bit value for a register also sets the high-order portion of the register to 0.
•Examples showing 5 possible combinations:
movabsq: For dealing with 64-bit immediate values
•movq: Can take only immediate values that fit in 32 bits, which it then sign-extends to 64 bits
•Operands:
Operand | Description | Type
I | Source | Immediate (64-bit)
R | Destination | Register
Zero- and Sign-extending Data Movement Instructions
Copying a smaller source value to a larger destination
movz & movs: Zero extension and sign extension
•Operands
Operand | Description | Type
S | Source | Register / memory
R | Destination | Register
•Final 2 characters of each instruction: Size designators
•The absence of explicit "movzlq".
•Instead, movl having D: a register
•Property: an instruction generating a 4-byte value with a register as the destination will fill the upper 4 bytes with zeros.
cltq: The same effect as movslq %eax, %rax.
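The zero- vs. sign-extension distinction maps directly onto C casts; a minimal sketch (the function names are mine):

```c
#include <stdint.h>

/* Widening an unsigned type zero-extends; gcc compiles this with movzbl. */
int32_t zero_extend_byte(uint8_t b) { return (int32_t)b; }

/* Widening a signed type sign-extends; gcc compiles this with movsbl. */
int32_t sign_extend_byte(int8_t b) { return (int32_t)b; }
```

The same byte pattern 0xFF thus widens to 255 or to -1, depending only on the declared signedness of the source.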
3.4.3Data Movement Example
•C "Pointers" = addresses
•Dereferencing a pointer involves copying that pointer into a register, and then using this register in a memory reference.
•Local variables are often kept in registers rather than stored in memory locations.
•Register access is much faster than memory access.
3.4.4Pushing and Popping Stack Data
Push and pop instructions
•The stack data structure
•Discipline: "Last-in, first-out"
•Operations:
•Push: Add data to a stack
•Pop: Remove data
•Array implementation
•Insert and remove elements from top (one end of the array)
•The program stack
•Stored in some region of memory.
•The stack pointer %rsp holds the address of the top stack element.
•pushq: Push data
•Operand:
Operand | Description | Type
S | Source | Register
•Behavior: pushq %rbp is equivalent to subq $8,%rsp followed by movq %rbp,(%rsp)
•popq: Pop data
•Operand:
Operand | Description | Type
R | Destination | Register
•Behavior: popq %rax is equivalent to movq (%rsp),%rax followed by addq $8,%rsp
•The popped value remains until overwritten
•Arbitrary stack positions can be addressed
•Example: movq 8(%rsp),%rdx
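A sketch of the push/pop behavior in C, using a small array as the stack and a pointer playing the role of %rsp (all names are illustrative):

```c
#include <stdint.h>

/* A downward-growing stack: sp plays the role of %rsp and starts one
 * past the end of the array, i.e., the stack is initially empty. */
static uint64_t stack_mem[16];
static uint64_t *sp = stack_mem + 16;

/* pushq: decrement the stack pointer, then store (subq $8,%rsp; movq S,(%rsp)) */
void pushq(uint64_t v) { sp -= 1; *sp = v; }

/* popq: load from the top, then increment (movq (%rsp),R; addq $8,%rsp) */
uint64_t popq(void) { uint64_t v = *sp; sp += 1; return v; }
```

Note that, as in the real machine, popping only moves sp; the popped value remains in stack_mem until a later push overwrites it.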
3.5Arithmetic and Logical Operations
The x86-64 integer and logic operations.
•leaq (no other size variants) + instruction classes (having 4 size variants)
•4 groups:
•Load effective address
•Unary: 2 operands
•Binary: 1 operand
•Shifts
3.5.1Load Effective Address
leaq (the load effective address instruction): Copy the effective address to the destination
•Operands
Operand | Description | Type
S | Source | Memory
D | Destination | Register
•Uses
•To generate pointers for later memory references.
•To compactly describe common arithmetic operations.
•Example: if %rdx = x, then leaq 7(%rdx,%rdx,4), %rax sets %rax = 5x + 7.
•Clever uses by compilers.
•Illustration:
•C program:
•The arithmetic operations:
•The ability to perform addition and limited forms of multiplication proves useful when compiling simple arithmetic expressions.
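The leaq example above can be checked with a C model of the computation (the function name is mine):

```c
#include <stdint.h>

/* leaq 7(%rdx,%rdx,4), %rax computes x + 4*x + 7 = 5*x + 7 in a single
 * instruction, using the addressing-mode hardware without touching memory. */
int64_t lea_5x_plus_7(int64_t x) {
    return x + x * 4 + 7;
}
```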
3.5.2Unary and Binary Operations
Unary Operations
Unary operations:
•Operand:
Operand | Description | Type
D | Both source and destination | Register / memory
•Example: incq (%rsp)
Binary Operations
Binary operations:
•Operands:
Operand | Description | Type
S | Source | Immediate / register / memory
D | Both source and destination | Register / memory
•Cannot both be memory
•When D is a memory location, the processor must read the value from memory, perform the operation, and then write the result back to memory.
•Example: subq %rax,%rdx
3.5.3Shift Operations
Shift operations:
•Operands
Operand | Description | Type
k | Shift amount | Immediate / register (%cl)
D | Value to shift | Register / memory
•With x86-64, when D is w bits wide, the shift amount is taken from the low-order m bits of %cl, where 2^m = w.
•Example: When %cl = 0xFF, then
•salb would shift by 7
•salw would shift by 15
•sall would shift by 31
•salq would shift by 63
•Left shift
•sal and shl: Fill from the right with zeros.
•Right shift
•sar (arithmetic, >>A): Fill with copies of the sign bit
•shr (logical, >>L): Fill with zeros
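The two right-shift flavors correspond to C shifts on signed vs. unsigned operands; a sketch (C leaves right-shifting a negative signed value implementation-defined, but gcc and clang on x86-64 perform an arithmetic shift, matching sar):

```c
#include <stdint.h>

/* Arithmetic right shift: fills with copies of the sign bit (sarl). */
int32_t sar32(int32_t x, int k) { return x >> k; }

/* Logical right shift: fills with zeros (shrl); unsigned in C is always logical. */
uint32_t shr32(uint32_t x, int k) { return x >> k; }
```

On the same bit pattern the results differ: shifting 0xFFFFFFF8 right by one gives 0xFFFFFFFC arithmetically (-8 >> 1 = -4) but 0x7FFFFFFC logically.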
3.5.4Discussion
Most instructions shown (except sar and shr) can be used for either unsigned or two's-complement arithmetic.
•This is one reason two's-complement arithmetic is the preferred way to implement signed integer arithmetic.
Example:
•In general, compilers generate code that uses individual registers for multiple program values and moves program values among the registers.
3.5.5Special Arithmetic Operations
Operations involving 128-bit (16-byte) numbers:
•Oct word: A 16-byte quantity
Full Multiply Operations
mulq (for unsigned) and imulq (for two's-complement): Compute the full 128-bit product of two 64-bit values.
•Operand: 1-operand
Operand | Description | Type
S | Source | Register / memory
•Arguments: %rax and S.
•Product: %rdx (high-order 64 bits) and %rax (low-order 64 bits).
•2 forms of imulq:
•Another: A member of the imul instruction class; the 2-operand form computes the product truncated to 64 bits
•Tell by counting operand number
•mulq example:
•Declarations:
•uint64_t: Declared in inttypes.h (part of ISO C99)
•__int128: Support provided by gcc
•Assembly code:
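A sketch of the full unsigned multiply in C, assuming the gcc/clang __int128 extension mentioned above (the function name is mine):

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply, as mulq computes it:
 * the high half lands in %rdx and the low half in %rax.
 * unsigned __int128 is a gcc/clang extension on 64-bit targets. */
void full_umul(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);  /* what mulq leaves in %rdx */
    *lo = (uint64_t)p;          /* what mulq leaves in %rax */
}
```

As a check, (2^64 - 1)^2 = 2^128 - 2^65 + 1, whose high half is 0xFFFFFFFFFFFFFFFE and low half is 1.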
Division or Modulus Operations
The 1-operand divide instructions:
idivq: Signed division instruction
•Parts
•Dividend: %rdx (high-order 64 bits) and %rax (low-order 64 bits)
•Divisor: S
•Quotient: %rax
•Remainder: %rdx
•64-bit division:
•Dividend: %rax (64-bit)
•Set bits of %rdx =
•0's (unsigned arithmetic)
•Sign bit of %rax (signed arithmetic), using cqto
•cqto: Reads the sign bit from %rax and copies it across all of %rdx
•No operands
•Illustration:
•Function:
•Assembly code:
divq: Unsigned division instruction.
•Set %rdx = 0 beforehand
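C's division operators match idivq's semantics: the quotient truncates toward zero and the remainder takes the sign of the dividend. A small sketch (the function name is mine):

```c
#include <stdint.h>

/* idivq divides the 128-bit value %rdx:%rax by S; cqto first sign-extends
 * %rax into %rdx. C's / and % on 64-bit integers compile to exactly this. */
void sdiv64(int64_t x, int64_t y, int64_t *q, int64_t *r) {
    *q = x / y;  /* idivq quotient  -> %rax */
    *r = x % y;  /* idivq remainder -> %rdx */
}
```

For example, -7 / 2 yields quotient -3 (truncated toward zero) and remainder -1, satisfying x = q·y + r.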
3.6Control
Sequential and Conditional Behavior
•Sequential behavior:
•Straight-line code
•Instructions follow one another in sequence
•Conditional behavior:
•C control constructs
•Such as conditionals, loops, and switches
•Conditional execution
•The sequence of operations that get performed depends on the outcomes of tests applied to the data
•2 strategies for implementing conditional operations
•Conditional control transfers
•The execution order:
•Normally, sequential: Statements are in the order they appear in the program.
•Alternatively, a jump instruction: Control should pass to some other part of the program.
•Conditional data transfers
3.6.1Condition Codes
Condition code registers:
•Single-bit
•Describe attributes of the most recent arithmetic or logical operation
•Tested to perform conditional branches.
Most useful condition codes:
•CF: Carry flag.
•The most recent operation generated a carry out of the most significant bit.
•Used to detect overflow for unsigned operations.
•ZF: Zero flag.
•The most recent operation yielded zero.
•SF: Sign flag.
•The most recent operation yielded a negative value.
•OF: Overflow flag.
•The most recent operation caused a two's-complement overflow—either negative or positive.
•Example: add, t = a+b, integers
•Condition codes:
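For the add example t = a + b, the four flags can be written out in C (a sketch; the struct and function names are mine, and the sum goes through unsigned arithmetic so the wraparound is well defined, matching what the hardware computes):

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } flags_t;

/* Flag settings produced by addq b,a (t = a + b). */
flags_t add_flags(int64_t a, int64_t b) {
    /* Wrapping add via unsigned arithmetic; the conversion back to
     * int64_t wraps on mainstream compilers. */
    int64_t t = (int64_t)((uint64_t)a + (uint64_t)b);
    flags_t f;
    f.cf = (uint64_t)t < (uint64_t)a;                    /* carry: unsigned overflow */
    f.zf = (t == 0);                                     /* zero */
    f.sf = (t < 0);                                      /* sign */
    f.of = ((a < 0) == (b < 0)) && ((t < 0) != (a < 0)); /* two's-complement overflow */
    return f;
}
```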
The setting of conditional codes by instructions:
•Integer arithmetic operations.
•leaq does not alter any; address computations
•The remaining ones
Operation type / instruction class | CF | OF
Logical operations | Set to 0 | Set to 0
Shift operations | Last bit shifted out | Set to 0
inc and dec | Unchanged | Set
•cmp and test: Set the condition codes without altering any other registers
•cmp: Set the condition codes according to the differences of their two operands
•Behavior: sub without updating destinations
•ATT: Operands are in reverse order
•Flags:
•ZF: Set if S1 = S2
•The others: Determine ordering relation
•test:
•Behavior: and without altering destinations
•Operands:
•Typically, S1 = S2
•E.g., testq %rax,%rax: Whether %rax >, = or < 0
•Or one is a mask indicating which bits should be tested
3.6.2Accessing the Condition Codes
Using conditional codes:
•The set instructions
•The conditional jump instructions
•Conditional data transfers
set: Set a single byte to 0 or to 1 depending on some combination of the condition codes.
•The suffixes: Different conditions
•Operand:
Operand | Description | Type | Length
D | Destination | Register / memory | 1 byte
•To generate a 32-bit or 64-bit result: Clear the high-order bits
•Typical instruction sequence to compute a < b (a, b: long)
•"Synonyms"
•Set condition codes according to the computation t = a-b
•Comparison tests
•Signed comparisons: Combinations of SF ^ OF and ZF
•Unsigned comparisons: Combinations of CF and ZF
How machine code does or does not distinguish between signed and unsigned values:
•It mostly uses the same instructions
•Some circumstances require different instructions
•Different versions of right shifts, division and multiplication instructions
•Different combinations of condition codes
3.6.3Jump Instructions
A jump instruction: Causes the execution to switch to a completely new position in the program.
•Jump destinations: Indicated in assembly code by a label.
•Example:
•In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.
•The different jump instructions
•jmp
•Unconditional
•Either direct or indirect
•A direct jump: The jump target is encoded as part of the instruction
•The jump target: A label
•Example: .L1
•An indirect jump: The jump target is read from a register or a memory location.
•Written as '*' followed by an operand specifier
•Examples:
•jmp *%rax uses %rax as the jump target
•jmp *(%rax) uses %rax as the read address
•The remaining: conditional—they either jump or continue executing at the next instruction in the code sequence, depending on some combination of the condition codes.
•The names and the conditions match those of set.
•"Synonyms"
•Can only be direct
3.6.4Jump Instruction Encodings
Jump encodings:
•PC-relative addressing.
•Encode the difference between the address of the target instruction and the address of the instruction immediately following the jump.
•These offsets can be encoded using 1, 2, or 4 bytes.
•Absolute addressing.
•Give an "absolute" address, using 4 bytes to directly specify the target.
•The assembler and linker select the appropriate encodings of the jump destinations.
•PC-relative addressing example: branch.c
•The assembly code:
•2 jumps: jmp, jg
•The disassembled version of .o
•PC = The address of the instruction following the jump
•The disassembled version of the program after linking:
The jump instructions provide a means to implement conditional execution (if), as well as several different loop constructs.
3.6.5Implementing Conditional Branches with Conditional Control
Implementing conditional branches
•Most general: Conditional control transfers
•Alternative: Conditional data transfers
Conditional control transfers:
•Example:
•The assembly implementation of if-else
•The general form of if-else in C
•The form of assembly implementation:
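The general form can be made concrete as runnable C using goto, mirroring the assembly structure; the absdiff-style function below is illustrative:

```c
/* Conditional-control translation of if-else: a test, a conditional
 * jump to the else-branch, a fall-through then-branch, and labels. */
long absdiff_goto(long x, long y) {
    long result;
    if (x < y) goto x_lt_y;  /* cmpq %rsi,%rdi ; jl */
    result = x - y;          /* then-branch falls through */
    goto done;               /* jmp */
x_lt_y:
    result = y - x;          /* else-branch */
done:
    return result;
}
```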
3.6.6Implementing Conditional Branches with Conditional Moves
Implementing conditional operations:
•Conventional: Conditional transfer of control
•The program follows one execution path when a condition holds and another when it does not.
•Simple and general, but very inefficient on modern processors.
•Alternate: Conditional transfer of data.
•Computes both outcomes of a conditional operation and then selects one based on whether or not the condition holds.
•Makes sense only in restricted cases
•Implemented by a simple conditional move instruction
•Better matched to the performance characteristics of modern processors.
Conditional control transfer
•Example:
The relative performance of using conditional data transfers versus conditional control transfers
•Processors achieve high performance through pipelining
•Pipelining: An instruction is processed via a sequence of stages, each operating concurrently
•E.g.,
•Fetching the instruction from memory
•Determining the instruction type
•Reading from memory
•Performing an arithmetic operation
•Writing to memory
•Updating the program counter
•Achieves high performance by overlapping the steps of the successive instructions
•Such as fetching one while performing the arithmetic operations for a previous one.
•Requires being able to determine the sequence well ahead of time in order to keep the pipeline full.
•When the machine encounters a conditional jump (a "branch"), it cannot determine which way the branch will go until it has evaluated the branch condition.
•Branch prediction logic is employed.
•Guessing reliably: The pipeline stays full of instructions.
•Mispredicting a jump: The processor must discard the work done past the jump and begin fetching from the correct location.
•Misprediction incurs a significant performance penalty.
Conditional move instructions:
•Operands:
Operand | Description | Type | Length
S | Source | Register / memory | 16, 32, or 64 bits
R | Destination | Register
•The outcome: depends on the values of the condition codes.
•As with the different set and jump instructions
•S is copied to D only if the specified condition holds.
•Single-byte: Not supported.
•The operand length: Inferred from R
•Unlike the unconditional instructions: Explicitly encoded
•The processor can execute conditional move instructions without having to predict the outcome of the test.
•The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.
•Unlike conditional jumps
Implementing conditional operations via conditional data transfers
•The general form of conditional expression and assignment:
•Conditional control transfer:
•Combines conditional and unconditional jumps
•Conditional move:
•The final statement: A conditional move
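A sketch of the compute-both-then-select pattern in C (gcc typically, though not always, compiles such a ternary to a cmov rather than a branch):

```c
/* Conditional-data-transfer strategy: evaluate both outcomes
 * unconditionally, then select one based on the condition. */
long cmov_max(long x, long y) {
    long if_x = x;                       /* one outcome */
    long if_y = y;                       /* the other outcome */
    return if_x >= if_y ? if_x : if_y;   /* cmovge-style select */
}
```

Because both sides are evaluated, this only pays off when each outcome is cheap and side-effect-free.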
Bad cases for conditional moves.
•Invalid behavior
•The case for the earlier example
•Illustration:
•C function:
•Invalid implementation:
•Null pointer dereferencing error.
•Must be compiled using branching code
•Code efficiency.
•Example: wasted computation.
•Compilers must take into account the relative performance of wasted computation versus the potential for performance penalty due to branch misprediction.
•Used by gcc only when computations are easy
3.6.7Loops
C looping constructs: do-while, while, and for.
•Implementation
•No corresponding instructions
•Instead, combinations of conditional tests and jumps
•Compilers generate loop code based on the two basic loop patterns.
Do-While Loops
The general do-while Translation:
•C code
•Equivalent goto version
•Example:
•Reverse engineering assembly code requires determining which registers are used for which program values
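The do-while pattern as a runnable goto version, in the style of the book's factorial example:

```c
/* do-while translation: body first, test at the bottom,
 * conditional jump back to the top. */
long fact_do_goto(long n) {
    long result = 1;
loop:
    result *= n;            /* body-statement */
    n = n - 1;
    if (n > 1) goto loop;   /* test-expr; jg loop */
    return result;
}
```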
While Loops
The general while translation:
•while version
•2 translation methods:
•Jump-to-middle translation: Performs the initial test via an unconditional jump to the test at the end of the loop.
•-Og
•Equivalent goto version:
•Example:
•Guarded do translation: First transforms the code into a do-while loop by using a conditional branch to skip over the loop if the initial test fails.
•≥ -O1
•Equivalent do-while version:
•Equivalent goto version:
•The compiler can often optimize the initial test, for example, determining that the test condition will always hold.
•Example:
For Loops
The general for translation:
•for version:
•Equivalent while version:
•Equivalent goto version:
•Following the jump-to-middle strategy:
•Following the guarded-do strategy:
•Examples:
•for version:
•Components:
•Equivalent while version
•Equivalent goto version (jump-to-middle):
•Corresponding assembly-language code (-Og)
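The jump-to-middle for-loop translation can likewise be written as runnable C with goto (the summing function is illustrative):

```c
/* for (init; test; update) body, jump-to-middle style:
 * init, unconditional jump to the test, body + update, test at the bottom. */
long sum_to_n(long n) {
    long sum = 0;
    long i = 0;             /* init-expr */
    goto test;              /* jmp to the test */
loop:
    sum += i;               /* body-statement */
    i++;                    /* update-expr */
test:
    if (i <= n) goto loop;  /* test-expr; conditional jump back */
    return sum;
}
```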
3.6.8Switch Statements
switch statements allow jump table implementation.
•Jump table: An array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i.
•The code performs a jump table reference using the switch index to determine the jump target.
•Advantage over if-else: The time taken to perform the switch is independent of the number of switch cases.
•Used by gcc when there are a number of cases and they span a small range of values.
Example:
•switch_eg and switch_eg_impl
•Features of (a)
•Case labels that do not span a contiguous range
•Cases with multiple labels
•Cases that fall through to other cases
•Assembly code for switch statement
•Gcc operator &&: Used to create a pointer for a code location
•The range is shifted
•Treating index as unsigned
•Simplifies the branching possibilities
•Key step in executing: To access a code location through the jump table.
•In (b), computed goto (gcc's extension): goto *jt[index];
•In assembly code for switch, indirect jmp
•Jump table
•In (b), an array
•Duplicate cases: Same code label
•Missing cases: Default label
•In assembly code, declarations
•.rodata (for "read-only data"): Segment of the object-code file
•A sequence of 7 "quad" words:
•Value of each = address associated with the labels.
•.L4: The start of this allocation.
•The address associated: Base for the indirect jump.
•The use of a jump table allows a very efficient way to implement a multiway branch.
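A portable sketch of the jump-table idea using function pointers instead of gcc's && code labels (the handlers and all names are mine):

```c
/* A jump table: an array indexed by the switch value, where each entry
 * leads to the code for that case. */
typedef long (*handler_t)(long);

static long h_double(long x)  { return 2 * x; }  /* case 0 */
static long h_square(long x)  { return x * x; }  /* case 1 */
static long h_default(long x) { return x; }      /* default */

long dispatch(long idx, long x) {
    static handler_t jt[2] = { h_double, h_square };
    /* Treating the index as unsigned folds idx < 0 into the same
     * out-of-range check, as the compiled switch code does. */
    if ((unsigned long)idx >= 2)
        return h_default(x);
    return jt[idx](x);  /* the indirect jump through the table */
}
```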
3.7Procedures
Procedures: Key abstraction
•Suppose procedure P calls procedure Q, and Q then executes and returns back to P.
•Mechanisms:
•Passing control.
•The program counter must be set to the starting address of the code for Q upon entry and then set to the instruction in P following the call to Q upon return.
•Passing data.
•P must be able to provide one or more parameters to Q, and Q must be able to return a value back to P.
•Allocating and deallocating memory.
•Q may need to allocate space for local variables when it begins and then free that storage before it returns.
•The x86-64 implementation: Minimalist strategy
•Only as much as is required.
3.7.1The Run-Time Stack
Storage management using a stack:
•LIFO discipline
•The stack and the registers store the information required for:
•Passing control and data
•Allocating memory
The x86-64 stack
•Grows toward lower addresses
•%rsp points to the top element
•Storing data on and retrieving it from the stack:
•pushq
•popq
•Allocating and deallocating space:
•Decrement %rsp
•Increment %rsp
The procedure's stack frame: The region where a procedure allocates space on the stack
•When it requires storage beyond what it can hold in registers
•General structure:
•The frame for the executing procedure is always at the top.
•Frame for the caller
•Portions:
•Arguments 7-n
•If required by the callee
•The return address: Where within the caller the program should resume execution once the callee returns.
•Pushed by the caller when call
•Frame for the callee
•Allocated by extending the current stack boundary
•Portions:
•Saved registers
•Local variables
•Argument build area
•Sizes
•Fixed size frames
•Allocated at the beginning of the procedure.
•Variable-size frames
•Procedures allocate only the portions of stack frames they require.
•A leaf procedure: All of the local variables can be held in registers and the function does not call any other functions.
3.7.2Control Transfer
call and ret:
•call: Pushes the return address onto the stack and sets the PC to the beginning of the callee.
•The return address: The address of the instruction immediately following call.
•ret: Pops the return address off the stack and sets the PC to it.
•The general forms:
•In the disassembly by objdump: callq and retq
•'q': x86-64 versions
•call
•The target: The address of the instruction where the called procedure starts.
•Either direct or indirect
•The target of a direct call: Label
•The target of an indirect call: *Operand
•Example: The execution of call and ret for multstore and main
•Excerpts of disassembly:
•More detailed example: Detailed execution of top and leaf
The standard call/return mechanism conveniently matches the LIFO memory management discipline.
3.7.3Data Transfer
Data passing
•Calls may involve passing data as arguments
•Returning may also involve returning a value
•Mostly via registers
Passing integral (i.e., integer and pointer) arguments
•Passing up to 6 arguments via registers
•The registers, in order: %rdi, %rsi, %rdx, %rcx, %r8, and %r9
•Passing arguments 7–n on the stack (n > 6)
•Stack top: Argument 7.
•All data sizes are rounded up to be multiples of 8.
•The portion "Argument build area": Space allocated within a procedure's stack frame for these arguments.
•Example:
3.7.4Local Storage on the Stack
Common cases where local data must be stored in memory:
•Not enough registers
•The address operator '&'
•Arrays or structures
The portion of the stack frame labeled "Local variables": Space allocated by a procedure on the stack frame by decrementing the stack pointer.
Example of the handling of '&'
•The run-time stack provides a simple mechanism for allocating local storage when it is required and deallocating it when the function completes.
More complex example:
3.7.5Local Storage in Registers
The set of program registers acts as a single resource shared by all of the procedures.
The conventions must ensure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later.
The uniform set of conventions for register usage:
•Callee-saved registers: %rbx, %rbp, and %r12–%r15.
•Their values must be preserved by the callee.
•By not changing it at all
•By pushq-ing it, altering it, and then popq-ing it before ret.
•The portion "Saved registers": Created by the pushing of register values.
•With this convention, the caller can safely store a value in a callee-saved register, call, and then use it without risk of corruption.
•Caller-saved registers: %rax, %rdi, %rsi, %rdx, %rcx, and %r8–%r11
•Can be modified by any function.
•Example: P
3.7.6Recursive Procedures
Procedures can call themselves recursively
•Provided by the stack discipline
•Example: rfact
•Mechanism: Each invocation of a function has its own private storage for state information
•Return address
•Callee-saved registers
•The stack discipline of allocation and deallocation naturally matches the call-return ordering of functions.
•Even works for mutual recursion
•E.g., when P calls Q, which in turn calls P.
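A minimal recursive factorial in the rfact style; each call's state (here, the value of n that must survive the recursive call) lives in its own frame or a callee-saved register:

```c
/* Recursive factorial: every invocation gets private storage for its
 * return address and saved registers on the run-time stack. */
long rfact(long n) {
    if (n <= 1)
        return 1;
    /* n is needed after the call returns, so the compiled code keeps it
     * in a callee-saved register (e.g., %rbx) across the call. */
    return n * rfact(n - 1);
}
```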
3.8Array Allocation and Access
Pointers to elements within arrays are translated into address computations in machine code.
3.8.1Basic Principles
Declaration
•For data type T and integer constant N
•xA: the starting location
•2 effects:
•Allocates a contiguous region of L · N bytes in memory
•L: the size (in bytes) of data type T
•Introduces an identifier A that can be used as a pointer to xA.
•For 0 ≤ i ≤ N−1, A[i] is at
&A[i] = xA + L · i
•Examples
•Declarations:
•Arrays generated:
Array access
•Example: Evaluating E[i]
•Suppose E: an int array
•E in %rdx, and i in %rcx
•Address computation:
3.8.2Pointer Arithmetic
Arithmetic on pointers: If p is a pointer to T with value xp, then p+i has value xp + L·i, where L is the size of T.
The generation and dereferencing of pointers: '&' and '*'.
Example:
•Expressions involving E each with an assembly-code implementation
•E in %rdx, and i in %rcx
•Result: Data in %eax, and pointers in %rax
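The scaling rule &A[i] = xA + L·i can be verified directly with C pointer arithmetic (the function name is mine):

```c
#include <stddef.h>

/* Byte distance from A to &A[i]: pointer arithmetic scales by the
 * element size L, so this equals sizeof(int) * i. */
ptrdiff_t byte_offset_of(int *A, long i) {
    return (char *)&A[i] - (char *)A;
}
```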
3.8.3Nested Arrays
The general principles hold even for arrays of arrays
•Example
•Declaration
•Elements order in memory
•Row-major
•A[0], followed by A[1], and so on.
•Illustration:
•Consequence of the nested declaration.
•To access elements of multidimensional arrays
•Compute the offset
•mov (xD, C · i + j,L), D
•In general
•Declaration
•D[i][j] is at
&D[i][j] = xD + L (C · i + j)
•L: the size of data type T in bytes
•Example
•Declaration:
•Copying A[i][j] to %eax:
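The row-major formula &D[i][j] = xD + L(C·i + j) checked in C (the array dimensions below are illustrative):

```c
#include <stddef.h>

#define NROWS 5
#define NCOLS 3

/* Byte distance from the start of D to &D[i][j]; row-major layout
 * makes this sizeof(int) * (NCOLS*i + j). */
ptrdiff_t nested_offset(int D[NROWS][NCOLS], long i, long j) {
    return (char *)&D[i][j] - (char *)D;
}
```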
3.8.4Fixed-Size Arrays
Optimizing code operating on multidimensional arrays of fixed size.
Example: fix_prod_ele (-O1)
•Declaration of fix_matrix
•fix_prod_ele and fix_prod_ele_opt
•Optimizations:
•Generating Aptr
•Generating Bptr
•Generating Bend
•Assembly code
3.8.5Variable-Size Arrays
Variable-size arrays
•Array dimension expressions: Computed as the array is being allocated
•Declaration
•expr1 and expr2 are evaluated as the declaration is encountered
•Example:
•var_ele: Access A[i][j] of A[n][n]
•Code
•&A[i][j] = xA + 4(n · i) + 4j = xA + 4(n · i + j)
•Must use imul: Can incur significant performance penalty (unavoidable)
•Optimized when referenced within a loop:
•Optimize the index computations by exploiting the regularity of the access patterns.
•Example: var_prod_ele
•var_prod_ele and var_prod_ele_opt
•Assembly code for the loop
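A C99 variable-length-array version of the var_ele access; the address arithmetic needs a runtime multiply by n, which is where the imul comes from:

```c
/* Access A[i][j] of an n-by-n variable-size array (C99 VLA parameter).
 * The element address is x_A + 4*(n*i + j); since n is not a compile-time
 * constant, computing n*i requires an imul instruction. */
int var_ele(long n, int A[n][n], long i, long j) {
    return A[i][j];
}
```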
3.9Heterogeneous Data Structures
3.9.1Structures
3.9.2Unions
3.9.3Data Alignment
3.10Combining Control and Data in Machine-Level Programs
3.10.1Understanding Pointers
3.10.2Life in the Real World: Using the gdb Debugger
3.10.3Out-of-Bounds Memory References and Buffer Overflow
3.10.4Thwarting Buffer Overflow Attacks
3.10.5Supporting Variable-Size Stack Frames
3.11Floating-Point Code
3.11.1Floating-Point Movement and Conversion Operations
3.11.2Floating-Point Code in Procedures
3.11.3Floating-Point Arithmetic Operations
3.11.4Defining and Using Floating-Point Constants
3.11.5Using Bitwise Operations in Floating-Point Code
3.11.6Floating-Point Comparison Operations
3.11.7Observations about Floating-Point Code
3.12Summary
In this chapter, we have peered beneath the layer of abstraction provided by the C language to get a view of machine-level programming. By having the compiler generate an assembly-code representation of the machine-level program, we gain insights into both the compiler and its optimization capabilities, along with the machine, its data types, and its instruction set. In Chapter 5, we will see that knowing the characteristics of a compiler can help when trying to write programs that have efficient mappings onto the machine. We have also gotten a more complete picture of how the program stores data in different memory regions. In Chapter 12, we will see many examples where application programmers need to know whether a program variable is on the run-time stack, in some dynamically allocated data structure, or part of the global program data. Understanding how programs map onto machines makes it easier to understand the differences between these kinds of storage.
Machine-level programs, and their representation by assembly code, differ in many ways from C programs. There is minimal distinction between different data types. The program is expressed as a sequence of instructions, each of which performs a single operation. Parts of the program state, such as registers and the run-time stack, are directly visible to the programmer. Only low-level operations are provided to support data manipulation and program control. The compiler must use multiple instructions to generate and operate on different data structures and to implement control constructs such as conditionals, loops, and procedures. We have covered many different aspects of C and how it gets compiled. We have seen that the lack of bounds checking in C makes many programs prone to buffer overflows. This has made many systems vulnerable to attacks by malicious intruders, although recent safeguards provided by the run-time system and the compiler help make programs more secure.
We have only examined the mapping of C onto x86-64, but much of what we have covered is handled in a similar way for other combinations of language and machine. For example, compiling C++ is very similar to compiling C. In fact, early implementations of C++ first performed a source-to-source conversion from C++ to C and generated object code by running a C compiler on the result. C++ objects are represented by structures, similar to a C struct. Methods are represented by pointers to the code implementing the methods. By contrast, Java is implemented in an entirely different fashion. The object code of Java is a special binary representation known as Java byte code. This code can be viewed as a machine-level program for a virtual machine. As its name suggests, this machine is not implemented directly in hardware. Instead, software interpreters process the byte code, simulating the behavior of the virtual machine. Alternatively, an approach known as just-in-time compilation dynamically translates byte code sequences into machine instructions. This approach provides faster execution when code is executed multiple times, such as in loops. The advantage of using byte code as the low-level representation of a program is that the same code can be "executed" on many different machines, whereas the machine code we have considered runs only on x86-64 machines.
Chapter 4
Chapter 5
Chapter 6The Memory Hierarchy
6.1Storage Technologies
6.1.1Random Access Memory
6.1.2Disk Storage
6.1.3Solid State Disks
6.1.4Storage Technology Trends
6.2Locality
6.2.1Locality of References to Program Data
6.2.2Locality of Instruction Fetches
6.2.3Summary of Locality
6.3The Memory Hierarchy
6.3.1Caching in the Memory Hierarchy
6.3.2Summary of Memory Hierarchy Concepts
6.4Cache Memories
6.4.1Generic Cache Memory Organization
6.4.2Direct-Mapped Caches
6.4.3Set Associative Caches
6.4.4Fully Associative Caches
6.4.5Issues with Writes
6.4.6Anatomy of a Real Cache Hierarchy
6.4.7Performance Impact of Cache Parameters
6.5
6.6
6.7Summary
Part IIRunning Programs on a System
The interaction between your programs and the hardware.
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 9Virtual Memory
9.1Physical and Virtual Addressing
9.2Address Spaces
9.3VM as a Tool for Caching
9.3.1DRAM Cache Organization
9.3.2Page Tables
9.3.3Page Hits
9.3.4Page Faults
9.3.5Allocating Pages
9.3.6Locality to the Rescue Again
9.4VM as a Tool for Memory Management
9.5VM as a Tool for Memory Protection
9.6Address Translation
9.6.1Integrating Caches and VM
9.6.2Speeding Up Address Translation with a TLB
9.6.3Multi-Level Page Tables
9.6.4Putting It Together: End-to-End Address Translation
9.7Case Study: The Intel Core i7/Linux Memory System
9.7.1Core i7 Address Translation
9.7.2Linux Virtual Memory System
9.8Memory Mapping
9.8.1Shared Objects Revisited
9.8.2The fork Function Revisited
9.8.3The execve Function Revisited
9.8.4User-Level Memory Mapping with the mmap Function
9.9
9.10Garbage Collection
9.10.1Garbage Collector Basics
9.10.2Mark&Sweep Garbage Collectors
9.10.3Conservative Mark&Sweep for C Programs
9.11Common Memory-Related Bugs in C Programs
9.11.1Dereferencing Bad Pointers
9.11.2Reading Uninitialized Memory
9.11.3Allowing Stack Buffer Overflows
9.11.4Assuming That Pointers and the Objects They Point to Are the Same Size
9.11.5Making Off-by-One Errors
9.11.6Referencing a Pointer Instead of the Object It Points To
9.11.7Misunderstanding Pointer Arithmetic
9.11.8Referencing Nonexistent Variables
9.11.9Referencing Data in Free Heap Blocks
9.11.10Introducing Memory Leaks
9.12Summary
Part IIIInteraction and Communication between Programs
The basic I/O services provided by Unix operating systems and how to use these services to build applications.