Study Notes of CS:APP (Till Book 3.8 & Lecture 8.1, Regularly Updated)

Computer Systems: A Programmer's Perspective, Third Edition, Pearson, 2016

15-213/18-213: Introduction to Computer Systems (ICS)

Study Notes of the Book CS:APP and its ICS+ Course 15-213

Book & Course Information

Instructors

Randal E. Bryant and David R. O'Hallaron

Textbooks

Randal E. Bryant and David R. O'Hallaron,

Computer Systems: A Programmer's Perspective, Third Edition, Pearson, 2016

Brian W. Kernighan and Dennis M. Ritchie,

The C Programming Language, Second Edition, Prentice Hall, 1988

Home

http://www.cs.cmu.edu/~213

Related Materials

Reading Notes:

嵌入式與Linux那些事's cnblogs posts

北洛's cnblogs posts

FannieGirl's cnblogs posts

頔瀟's CSDN posts

Learning Materials:

CS:APP3e Book Site

15-213/18-213 Fall 2019 (Latest Course Lectured by Randy Bryant, Up-to-date Slides, etc. Available)

SJTU ICS SE101 2019

Compiler Explorer

My Foreword

Content Covered

Currently I have decided to study only the common parts of the five suggested system courses and the 15-213 implementation, as listed in the table below.

Chapter   Phase   Sections
1         1       Overview: 1.1-1.10
2                 2.1-2.3, 2.5
3                 3.1-3.10, 3.12
4
5
6                 6.1-6.4, 6.7
7
8
9                 9.1-9.8, 9.13
10
11
12

First, read a section or sections of the book with the assistance of Eudic and take down the key points. Second, watch the corresponding 15-213 video(s) if available and add the complementary material provided by the lecture slides.

Book errata will be included.

Notes Organization

The structural style of my notes lies between that of a normal article and that of a set of slides.

Notes of "Summary" sections are no more and no less than the original text with topic words highlighted.

Formatting

Follow the style scheme of the global edition as closely as possible, to the degree that formatting does not become a burden on me.

Text size and font have been adjusted for online reading on a browser.

Fixing Post Style

Macros for Blog Post on Word

Reminder to myself: do not apply these to the original Word document!

Sub BeforePublish()
    Call ConvertNumbersToText
    Call PictureResize(110)
    Call AddSmallCapsMarks
    Call AddAllCapsMarks
    Call IndentToBlankSpaces
End Sub

Replacements for HTML on VS Code.

Replace: \$MALL([^$@]*)CAP\$
With: <span style="font-variant: small-caps;">$1</span>

Replace: @LL([^$@]*)C@PS
With: <span style="text-transform: uppercase;">$1</span>

Replace: <p(>(<.*>*)?)
With: <p style="text-indent:2em"$1

Replace: &nbsp;&nbsp;&nbsp;&nbsp;
With: &nbsp;&nbsp;

Replace: (<p style=")margin-left: (\d)0pt;(">(?:<strong>)?<span style=".*)(">(?:<strong>)?&bull;&nbsp;&nbsp;)
With: $1padding-left:$2em;$3margin-left:-1em$4

Alternatively, replacements using a Java program (unstable).

To-do List

Have more useful chapters or sections covered. By "useful" I mean content that may contribute to my personal advancement; I may include parts that are required by some exam or some job.

Refine my notes. The first versions are definitely way too rambling. Also, avoid abusing \r\ns and list items! In particular:

Remove all derivations.

Remove in-line CSS styles.

Fully match labels (such as '<', '>' and '/', etc.) for accuracy.

Course Overview: Topics

Programs and Data

Bits operations, arithmetic, assembly language programs

Representation of C control and data structures

Includes aspects of architecture and compilers

The Memory Hierarchy

Memory technology, memory hierarchy, caches, disks, locality

Includes aspects of architecture and OS

Exceptional Control Flow

Hardware exceptions, processes, process control, Unix signals, nonlocal jumps

Includes aspects of compilers, OS, and architecture

Virtual Memory

Virtual memory, address translation, dynamic storage allocation

Includes aspects of architecture and OS

Networking and Concurrency

High level and low-level I/O, network programming

Internet services, Web servers

Concurrency, concurrent server design, threads

I/O multiplexing with select

Includes aspects of networking, OS, and architecture

Chapter 1 A Tour of Computer Systems

A computer system consists of hardware and systems software that work together to run application programs.

1.1 Information Is Bits + Context

A program begins life as a source program (or source file). The source program is a sequence of bits, each with a value of 0 or 1, organized in bytes (8-bit chunks). Each byte represents some text character in the program.

Most computer systems represent text characters using the ASCII standard that represents each character with a unique byte-size integer value.

Text files: files that consist exclusively of ASCII characters. Binary files: all other files.

A fundamental idea: All information in a system is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them.

1.2 Programs Are Translated by Other Programs into Different Forms

In order to run a .c program on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in an executable object program (or an executable object file) and stored as a binary disk file.

The compilation system: the programs that perform the four phases (preprocessor, compiler, assembler, and linker).

Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the '#' character. The result is another C program, typically with the .i suffix.

Compilation phase. The compiler (cc1) translates .i into .s, which contains an assembly-language program. Assembly language is useful because it provides a common output language for different compilers for different high-level languages.

Assembly phase. Next, the assembler (as) translates .s into machine language instructions, packages them in a relocatable object program, and stores the result in the object file .o.

Linking phase. The printf function is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file printf.o, which is merged with our .o program by the linker (ld). The result is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system.

1.3 It Pays to Understand How Compilation Systems Work

Some important reasons:

Optimizing program performance.

Understanding link-time errors.

Avoiding security holes.

Buffer overflow vulnerabilities

1.4 Processors Read and Interpret Instructions Stored in Memory

The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command. If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run.

1.4.1 Hardware Organization of a System

Buses

Buses: a collection of electrical conduits running throughout the system that carry bytes of information back and forth between the components.

Typically designed to transfer words (fixed-size chunks of bytes).

The word size (the number of bytes in a word) is a fundamental system parameter that varies across systems. Most machines: either 4 bytes (32 bits) or 8 bytes (64 bits).

I/O Devices

Input/output (I/O) devices: the system's connection to the external world.

The example system has four: a keyboard, a mouse, a display, and a disk drive (or simply disk).

Each is connected to the I/O bus by either a controller or an adapter.
The distinction: packaging.

Controllers are chip sets in the device itself or on the motherboard (the system's main printed circuit board).

An adapter is a card that plugs into a slot on the motherboard.

The purpose of each: to transfer information back and forth between the I/O bus and an I/O device.

Main Memory

The main memory: a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program.

Physically, consists of a collection of dynamic random access memory (DRAM) chips.

Logically, is organized as a linear array of bytes, each with its own unique address (array index) starting at zero.

In general, each of the machine instructions that constitute a program can consist of a variable number of bytes. The sizes of data items that correspond to C program variables vary according to type.

Processor

The central processing unit (CPU), or simply processor: the engine that executes (interprets) instructions stored in main memory.

The program counter (PC): a register (a word-size storage device) at its core.

At any point in time, points at (contains the address of) some machine-language instruction in main memory.

Repeatedly

executes the instruction pointed at by the PC

updates the PC to point to the next instruction

Appears to operate according to a very simple instruction execution model, defined by its instruction set architecture.

Instructions execute in strict sequence, and executing a single instruction involves performing a series of steps.

The processor

reads the instruction from memory pointed at by the program counter (PC),

interprets the bits in the instruction,

performs some simple operation dictated by the instruction, and then

updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.

Operations revolve around

main memory,

the register file, and

the arithmetic/logic unit (ALU).

The register file: a small storage device that consists of a collection of word-size registers, each with its own unique name.

The ALU: computes new data and address values.

Examples of operations:

Load: Copy a byte or a word from main memory into a register, overwriting the previous contents of the register.

Store: Copy a byte or a word from a register to a location in main memory, overwriting the previous contents of that location.

Operate: Copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register, overwriting the previous contents of that register.

Jump: Extract a word from the instruction itself and copy that word into the PC, overwriting the previous value of the PC.

Distinguish the processor's instruction set architecture,

describing the effect of each machine-code instruction,

from its microarchitecture,

describing how the processor is actually implemented.

1.4.2 Running the hello Program

Using direct memory access (DMA), the data travel directly from disk to main memory, without passing through the processor.

1.5 Caches Matter

Lesson: A system spends a lot of time moving information from one place to another.

The machine instructions in the program

Stored on disk

Copied to main memory

Copied into the processor

The data string

On disk

Copied to main memory

Copied to the display device

Much of this copying is overhead: slows down the "real work" of the program.

A major goal for system designers: to make these copy operations run as fast as possible.

Larger storage devices are slower than smaller ones. And faster devices are more expensive to build than their slower counterparts.

The processor–memory gap: The processor can read data from the register file much faster than from memory.

To deal with it: Cache memories (or simply caches): smaller, faster storage devices that serve as temporary staging areas for information that the processor is likely to need in the near future.

An L1 cache

On the processor chip

Holds many bytes

Can be accessed nearly as fast as the register file

An L2 cache

Larger, holding hundreds of thousands to millions of bytes

Connected to the processor by a special bus

Slower to access than L1, but still much faster than accessing the main memory.

The L1 and L2 caches are implemented with static random access memory (SRAM, a hardware technology).

Newer and more powerful systems: 3 levels.

The idea behind:

Exploit locality (the tendency for programs to access data and code in localized regions) to get a very large and fast memory.

Set up caches to hold data that are likely to be accessed often.

1.6 Storage Devices Form a Hierarchy

The storage devices in every computer system are organized as a memory hierarchy, as shown in the book's figure.

The main idea: storage at one level serves as a cache for storage at the next lower level.

1.7 The Operating System Manages the Hardware

Programs rely on the services provided by the operating system to access the hardware.

The operating system:

A layer of software interposed between the application program and the hardware.

All attempts by an application program to manipulate the hardware must go through the operating system.

Two primary purposes:
(1) to protect the hardware from misuse by runaway applications and
(2) to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.

Both goals are achieved via the fundamental abstractions: processes, virtual memory, and files.

1.7.1Processes

A process: the operating system's abstraction for a running program.

Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware.
Concurrently: The instructions of one process are interleaved with the instructions of another process.

Traditional systems could only execute one program at a time, while newer multicore processors can execute several programs simultaneously.

In either case, a single CPU can appear to execute multiple processes concurrently by having the processor switch among them.

The operating system performs this interleaving with context switching (a mechanism).

Uniprocessor systems contain a single CPU; multiprocessor systems contain several.

Context switching of a uniprocessor system:

The operating system keeps track of all the context (state) information that the process needs in order to run, including the current values of the PC, the register file, and the contents of main memory.

At any point in time, a uniprocessor system can only execute the code for a single process.
When the operating system decides to transfer control from the current process to some new process, it performs a context switch by

saving the context of the current process,

restoring the context of the new process,

and then passing control to the new process.

The new process picks up exactly where it left off.

The basic idea for the example scenario:

The kernel: the portion of the operating system code that is always resident in memory.

The transition from one process to another is managed by the operating system kernel.

When an application program requires some action by the operating system, it executes a special system call instruction, transferring control to the kernel.
The kernel then performs the requested operation and returns control to the application program.

Note:

Not a separate process.

Instead, a collection of code and data structures that the system uses to manage all the processes.

1.7.2 Threads

A process can actually consist of multiple threads (execution units)

Each running in the context of the process and sharing the same code and global data.

Increasingly important:

Requirement for concurrency in network servers

Easier to share data between threads than between processes

More efficient than processes

Multi-threading: make programs run faster

1.7.3 Virtual Memory

Virtual memory: an abstraction that provides each process with the illusion that it has exclusive use of the main memory.

Each process has the same uniform view of memory: virtual address space.

For Linux processes:

Areas (starting with the lowest addresses and working our way up):

Program code and data.

Code begins at the same fixed address for all processes, followed by data locations that correspond to global C variables.

Are initialized directly from the contents of an executable object file.

Heap.

Follows the code and data areas immediately.

Expands and contracts dynamically at run time as a result of calls to C standard library routines.

Shared libraries.

Near the middle of the address space.

Holds the code and data for shared libraries.

Powerful but difficult.

Stack (user stack).

At the top of the user's virtual address space.

Used by the compiler to implement function calls.

Expands and contracts dynamically during the execution of the program.
In particular, each time:

call a function, grows.

return from a function, contracts.

Kernel virtual memory.

The top region reserved.

Applications must invoke the kernel to read or write the contents of this area or to call functions defined in the kernel code.

For virtual memory to work, a sophisticated interaction is required between the hardware and the operating system software.

The basic idea: to store the contents of a process's virtual memory on disk and then use the main memory as a cache for the disk.

1.7.4 Files

A file: a sequence of bytes.

Every I/O device is modeled as a file.

All input and output in the system is performed by reading and writing files, using Unix I/O (a small set of system calls).

Very powerful

Provides applications with a uniform view of all the varied I/O devices that might be contained in the system.

1.8 Systems Communicate with Other Systems Using Networks

From the point of view of an individual system, the network can be viewed as just another I/O device.

When the system copies a sequence of bytes from main memory to the network adapter, the data flow across the network to another machine.

Similarly, the system can read data sent from other machines and copy these data to its main memory.

1.9 Important Themes

A system is a collection of intertwined hardware and systems software that must cooperate in order to achieve the ultimate goal of running application programs.

1.9.1 Amdahl's Law

Amdahl's law:

An observation about the effectiveness of improving the performance of one part of a system.

The main idea: when we speed up one part of a system, the effect on the overall system performance depends on both how significant this part was and how much it sped up.

Consider a system in which executing some application requires time T_old. Suppose some part of the system requires a fraction α of this time, and that we improve its performance by a factor of k. The overall execution time becomes T_new = T_old · [(1 − α) + α/k], so the speedup is S = T_old/T_new = 1/[(1 − α) + α/k].

The major insight: to significantly speed up the entire system, we must improve the speed of a very large fraction of the overall system. One interesting special case: setting k to ∞ gives S = 1/(1 − α).
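A quick worked example (mine, for concreteness): if a part consuming α = 0.6 of the time is sped up by k = 3, then S = 1/(0.4 + 0.6/3) = 1/0.6 ≈ 1.67, even though that part itself now runs 3 times as fast.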

A general principle for improving any process.

Most meaningful for computers. Performance is routinely improved by high factors.

1.9.2 Concurrency and Parallelism

Concurrency: the general concept of a system with multiple, simultaneous activities.

Parallelism: the use of concurrency to make a system run faster.

Can be exploited at multiple levels of abstraction in a computer system.

Three levels are highlighted here, from the highest to the lowest level in the system hierarchy.

Thread-Level Concurrency

With the process abstraction, multiple programs execute at the same time (concurrency). With threads, multiple control flows execute within a single process.

A uniprocessor system: a configuration in which a single processor must switch among multiple tasks.

Since the advent of time-sharing

Only simulated, by having a single computer rapidly switch among its executing processes

Allows:

multiple users to interact with a system at the same time

a single user to engage in multiple tasks concurrently

A multiprocessor system: a system consisting of multiple processors all under the control of a single operating system kernel.

Have become commonplace with the advent of multi-core processors and hyperthreading

Multi-core processors have several CPUs ("cores") integrated onto a single integrated-circuit chip.

Each L1 cache is split into two parts—

one to hold recently fetched instructions

one to hold data

The cores share higher levels of cache as well as the interface to main memory.

Hyperthreading (simultaneous multi-threading): a technique that allows a single CPU to execute multiple flows of control.

It involves having multiple copies of some of the CPU hardware,

such as program counters and register files,

while having only single copies of other parts of the hardware,

such as the units that perform floating-point arithmetic.

A hyperthreaded processor decides which of its threads to execute on a cycle-by-cycle basis.

It enables the CPU to take better advantage of its processing resources.

Improve system performance—

Reduces the need to simulate concurrency when performing multiple tasks.

Can run a single application program faster, but only if that program is expressed in terms of multiple threads that can effectively execute in parallel.

Instruction-Level Parallelism

Instruction-level parallelism: the property that processors can execute multiple instructions at one time.

Pipelining: the actions required to execute an instruction are partitioned into different steps and the processor hardware is organized as a series of stages, each performing one of these steps.

The stages can operate in parallel, working on different parts of different instructions.

Superscalar processors: Processors that can sustain execution rates faster than 1 instruction per cycle.

Most modern processors support superscalar operation.

Application programmers can use a high-level model to understand the performance of their programs.

They can then write programs such that the generated code achieves higher degrees of instruction-level parallelism and therefore runs faster.

Single-Instruction, Multiple-Data (SIMD) Parallelism

Single-instruction, multiple-data (SIMD) parallelism: a mode in which special hardware allows a single instruction to cause multiple operations to be performed in parallel.

1.9.3 The Importance of Abstractions in Computer Systems

The use of abstractions is one of the most important concepts in computer science.

On the processor side

The instruction set architecture: an abstraction of the actual processor hardware

A machine-code program behaves as if it were executed on a processor that performs just one instruction at a time.

The underlying hardware may execute instructions far more elaborately, but always in a way consistent with this simple model.

Different processor implementations can execute the same machine code while offering a range of cost and performance.

On the operating system side

Files: an abstraction of I/O devices

Virtual memory: an abstraction of program memory

Processes: an abstraction of a running program

The virtual machine: an abstraction of the entire computer

A way to manage computers that must be able to run programs designed for multiple operating systems or different versions of the same operating system.

1.10 Summary

A computer system consists of hardware and systems software that cooperate to run application programs. Information inside the computer is represented as groups of bits that are interpreted in different ways, depending on the context. Programs are translated by other programs into different forms, beginning as ASCII text and then translated by compilers and linkers into binary executable files.

Processors read and interpret binary instructions that are stored in main memory. Since computers spend most of their time copying data between memory, I/O devices, and the CPU registers, the storage devices in a system are arranged in a hierarchy, with the CPU registers at the top, followed by multiple levels of hardware cache memories, DRAM main memory, and disk storage. Storage devices that are higher in the hierarchy are faster and more costly per bit than those lower in the hierarchy. Storage devices that are higher in the hierarchy serve as caches for devices that are lower in the hierarchy. Programmers can optimize the performance of their C programs by understanding and exploiting the memory hierarchy.

The operating system kernel serves as an intermediary between the application and the hardware. It provides three fundamental abstractions: (1) Files are abstractions for I/O devices. (2) Virtual memory is an abstraction for both main memory and disks. (3) Processes are abstractions for the processor, main memory, and I/O devices.

Finally, networks provide ways for computer systems to communicate with one another. From the viewpoint of a particular system, the network is just another I/O device.

Part I Program Structure and Execution

How application programs are represented and executed.

Chapter 2 Representing and Manipulating Information

Bits

Computers store and process information represented as two-valued signals.

Bits form the basis of the digital revolution.

The decimal, or base-10, representation: natural for humans. Binary values: when building machines that store and process information.

Two-valued signals can readily be represented, stored, and transmitted

Encodings

Group bits together and apply some interpretation and represent the elements of any finite set.

The 3 most important representations of numbers.

Unsigned encodings

Based on traditional binary notation

Represent numbers ≥ 0

Two's-complement encodings

Represent signed integers

Either positive or negative

Floating-point encodings

A base-2 version of scientific notation for representing real numbers

Computers implement arithmetic operations with them.

Some operations can overflow when the results are too large to be represented.

The different mathematical properties of integer versus floating-point arithmetic:

Integer computer arithmetic satisfies many of the familiar properties of true integer arithmetic.

Floating-point arithmetic has altogether different mathematical properties.

Stem from the difference in how they handle the finiteness of their representations—

Integer representations encode a comparatively small range of values precisely

Floating-point representations encode a wide range of values approximately

A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.

Computers use several different binary representations to encode numeric values.

2.1 Information Storage

Computers use bytes (blocks of 8 bits) as the smallest addressable unit of memory.

A machine-level program views memory as a very large array of bytes — virtual memory.

Every byte of memory is identified by a unique number — its address.

The virtual address space: the set of all possible addresses. Just a conceptual image presented to the machine-level program.

Program objects: program data, instructions, and control information. The management of the storage is all performed within the virtual address space. Example: The value of a pointer in C is the virtual address of the first byte of some block of storage. Type information also is associated with each pointer.

2.1.1 Hexadecimal Notation

A Byte in Different Notations

A byte = 8 bits.

Binary: 00000000_2 to 11111111_2.

Decimal: 0_10 to 255_10.

Hexadecimal (or "hex", base-16): very convenient for describing bit patterns.

Uses '0' through '9' along with 'A' through 'F' to represent the 16 possible values.

00_16 to FF_16.

In C, hexadecimal constants start with 0x or 0X.

Example: write FA1D37B_16 as 0xFA1D37B, as 0xfa1d37b, or even mixing cases.

Manually converting between decimal, binary, and hexadecimal representations of bit patterns:

Converting between binary and hexadecimal: straightforward, since each hexadecimal digit corresponds to 4 bits.

Convert hexadecimal to binary. Example: 0x173A4C expands digit by digit to 0001 0111 0011 1010 0100 1100.

Convert binary to hexadecimal. Note: if the total number of bits is not a multiple of 4, make the leftmost group be the one with fewer than 4 bits, effectively padding the number with leading 0s. Example: 1111001010110110110011 groups as 11 1100 1010 1101 1011 0011, i.e., 0x3CADB3.

When x = 2^n for some nonnegative integer n, the binary representation of x is simply 1 followed by n 0s. A hexadecimal digit 0 represents 4 binary 0s. So, for n = i + 4j, where 0 ≤ i ≤ 3, write x with a leading hex digit of 2^i (that is, 1, 2, 4, or 8), followed by j hexadecimal 0s. Example: x = 2,048 = 2^11; n = 11 = 3 + 4 · 2, giving 0x800.

Converting between decimal and hexadecimal: requires multiplication or division.

Convert decimal to hexadecimal. To convert a decimal number x to hexadecimal, repeatedly write x = q · 16 + r. Use r as the least significant digit and generate the remaining digits by repeating the process on q. Example: decimal 314,156 converts to 0x4CB2C.

Convert hexadecimal to decimal. Multiply each of the hexadecimal digits by the appropriate power of 16. Example: 0x7AF = 7 · 16^2 + 10 · 16 + 15 = 7 · 256 + 10 · 16 + 15 = 1,967.
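A minimal C sketch of both directions (my own illustration; printf and strtol from the standard library would normally do this):

#include <stdio.h>
#include <stdlib.h>

/* Decimal to hex by repeated division: x = q * 16 + r. */
static void print_hex(unsigned long x)
{
    char digits[16];
    int n = 0;
    do {
        digits[n++] = "0123456789ABCDEF"[x % 16]; /* r: next least significant digit */
        x /= 16;                                  /* repeat the process on q */
    } while (x != 0);
    printf("0x");
    while (n > 0)
        putchar(digits[--n]);
    putchar('\n');
}

int main(void)
{
    print_hex(314156);                        /* prints 0x4CB2C */
    printf("%ld\n", strtol("7AF", NULL, 16)); /* hex to decimal: prints 1967 */
    return 0;
}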

2.1.2 Data Sizes

Every computer has a word size (the nominal size of pointer data). For a w-bit word size, the virtual addresses range from 0 to 2^w − 1.

A widespread shift from 32-bit machines to 64-bit ones (with a virtual address space of 16 exabytes).

32-bit programs vs 64-bit programs: the distinction lies in how a program is compiled, rather than the type of machine on which it runs.

Computers and compilers support multiple data formats using different ways to encode data.

The C language supports multiple data formats for both integer and floating point data.

Integer data:

Signed: negative, zero, and positive values.

Unsigned: nonnegative values.

char: a single byte. Can also be used to store integer values.

short, int, and long: provide a range of sizes.

A pointer (e.g., char *): uses the full word size.

Two different floating-point formats:

float: single precision, 4 bytes

double: double precision, 8 bytes

Fixed-size integer types: int32_t and int64_t. Using them is the best way for programmers to have close control over data representations.

Most of the data types encode signed values, unless prefixed by unsigned or using the specific unsigned declaration for fixed-size data types. The exception: char. The C standard does not guarantee that a plain char is signed. Use signed char to guarantee a 1-byte signed value. In many contexts, however, programs are insensitive to whether char is signed.

The C language allows a variety of ways to order the keywords and to include or omit optional keywords.

One aspect of portability is to make the program insensitive to the exact sizes of the different data types. The C standards set lower bounds on the numeric ranges of the different data types, but there are no upper bounds (except with the fixed-size types).

2.1.3 Addressing and Byte Ordering

Must establish two conventions for multi-byte objects: addressing and byte ordering.

Addressing

A multi-byte object is stored as a contiguous sequence of bytes. Address: the smallest address of the bytes used.

Example: an int x at address 0x100. (Assuming a 32-bit int) the 4 bytes would be stored at addresses 0x100, 0x101, 0x102, and 0x103.

Byte Ordering

Two common conventions.
A w-bit integer [xw −1, xw −2, . . . , x1, x0], xw−1: the most significant bit, x0: the least.

Assuming w: a multiple of 8. The most significant byte: [xw−1, xw−2, …, xw−8], the least significant byte: [x7, x6, …, x0] and the other bytes: bits from the middle.

Little endian: the least significant byte comes first

Big endian: the most significant byte comes first

Example: an int x at address 0x100 with value 0x01234567. The ordering of the bytes depends on the type of machine:

Note: The high-order byte is 0x01, while the low-order byte is 0x67. A big-endian machine stores 01 23 45 67 at addresses 0x100 through 0x103; a little-endian machine stores 67 45 23 01.

Machines:

Little-endian: most Intel-compatible machines, machines that use Intel-compatible processors manufactured by IBM or Oracle, Android, iOS.

Big-endian: most machines from IBM and Oracle

Bi-endian: ARM microprocessors

At times, it becomes an issue.

When binary data are communicated over a network between different machines.

When looking at the byte sequences representing integer data.

Often when inspecting machine-level programs

4004d3: 01 05 43 0b 20 00 add %eax,0x200b43(%rip)
generated by a disassembler

Disassembler: a tool that determines the instruction sequence represented by an executable program file

Add 0x200b43 to the current value of the program counter

Having bytes appear in reverse order is common when reading machine-level program representations generated for little-endian machines

When programs are written that circumvent the normal type system.

In C, a cast or a union

size_t: the preferred data type for expressing the sizes of data structures

sizeof(T) returns the number of bytes required to store an object of type T.

To write portable code

The different machine/operating system configurations use different conventions for storage allocation
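A sketch in the spirit of the book's show_bytes example (reconstructed from memory; details may differ from the book's exact code):

#include <stdio.h>

typedef unsigned char *byte_pointer;

/* Print the bytes of an object, lowest address first. */
void show_bytes(byte_pointer start, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

int main(void)
{
    int x = 0x01234567;
    show_bytes((byte_pointer) &x, sizeof(x));
    /* Little-endian machines print " 67 45 23 01"; big-endian, " 01 23 45 67". */
    return 0;
}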

2.1.4 Representing Strings

A string in C

Encoded by an array of characters

Terminated by the null (having value 0) character.

Each character represented by the ASCII character code.

The ASCII code for decimal digit x happens to be 0x3x.

The terminating byte = 0x00

Independent of byte ordering and word size conventions

2.1.5 Representing Code

Binary code is seldom portable across different combinations of machine and operating system.

A fundamental concept: a program is simply a sequence of bytes.

2.1.6 Introduction to Boolean Algebra

Boolean algebra.

The work of George Boole

Encode true and false as 1 and 0

Boolean Operations

The simplest Boolean algebra: defined over {0, 1}.

The 4 Boolean operations and their corresponding logical operations:

~   not   (¬)
&   and   (∧)
|   or    (∨)
^   exclusive-or   (⊕)

Boolean Operations over Bit Vectors

Extending the 4 Boolean operations to bit vectors (strings of 0's and 1's of fixed length w).

Defined by applying the operation to the matching elements of the two arguments.

Examples:

Application: to represent finite sets.

Encode A ⊆ {0, 1, …, w − 1} with [a_{w−1}, …, a_1, a_0], where a_i = 1 if and only if i ∈ A.

Example: a = [01101001] encodes A = {0, 3, 5, 6}, b = [01010101] encodes B = {0, 2, 4, 6}.

Boolean operations and their corresponding set operations:

|   set union
&   set intersection
~   set complement

Continuing the example: a & b yields [01000001], i.e., A ∩ B = {0, 6}.

Practical applications example: There are a number of different signals that can interrupt the execution of a program. Selectively enable or disable different signals by specifying a bit-vector mask, where a 1 in bit position i indicates that signal i is enabled and a 0 indicates that it is disabled. Thus, the mask represents the set of enabled signals.
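A tiny C illustration of the set view (my own sketch, reusing the example vectors above):

#include <stdio.h>

int main(void)
{
    unsigned char a = 0x69; /* [01101001] encodes A = {0, 3, 5, 6} */
    unsigned char b = 0x55; /* [01010101] encodes B = {0, 2, 4, 6} */
    printf("%.2x\n", a & b); /* 41: [01000001], the intersection {0, 6} */
    printf("%.2x\n", a | b); /* 7d: [01111101], the union {0, 2, 3, 4, 5, 6} */
    return 0;
}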

2.1.7 Bit-Level Operations in C

The symbols for the Boolean operations can be applied to any "integral" data type.

Examples: expression evaluation for char:

Evaluating bit-level expression: (1) Expand hexadecimal to binary, (2) perform the operations, and (3) convert back to hexadecimal.

Use: to implement masking operations

A mask: a bit pattern that indicates a selected set of bits within a word.

Example:

x & 0xFF: the least significant byte of x and 0's (all others).

Example: x = 0x89ABCDEF, 0x000000EF.

~0: all 1's

Could be written 0xFFFFFFFF when int is 32 bits, but that is not as portable.
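A short C sketch of these masking idioms (my own illustration):

#include <stdio.h>

int main(void)
{
    unsigned x = 0x89ABCDEF;
    printf("%.8x\n", x & 0xFF); /* 000000ef: select the low-order byte */
    printf("%.8x\n", ~0u);      /* ffffffff: all ones, regardless of word size */
    return 0;
}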

2.1.8 Logical Operations in C

The logical operators and their corresponding logic operations:

||   or
&&   and
!    not

Treat any nonzero argument as true and argument 0 as false.

Return either 1 or 0, indicating a result of either true or false.

A bitwise operation matches that of its logical counterpart only when the arguments are restricted to 0 or 1.

The logical operators do not evaluate the second argument if the result can be determined by evaluating the first.

Examples: a && 5/a, p && *p++.

2.1.9 Shift Operations in C

Shift operations shift bit patterns to the left and to the right.

x: [x_{w−1}, x_{w−2}, …, x_0]

Left shift operation: x << k

[x_{w−k−1}, x_{w−k−2}, …, x_0, 0, …, 0]

x is shifted k bits to the left, dropping off the k most significant bits and filling the right end with k 0's.

0 ≤ k ≤ w − 1

Left associative: x << j << k is equivalent to (x << j) << k.

Right shift operation: x >> k, 2 forms

Logical.

The left end is filled with k 0's

[0, …, 0, x_{w−1}, x_{w−2}, …, x_k]

Arithmetic.

The left end is filled with k repetitions of the most significant bit

[x_{w−1}, …, x_{w−1}, x_{w−1}, x_{w−2}, …, x_k]

Useful for operating on signed integer data.

Example (the italicized digits fill the ends):

Not precisely defined by the C standards

In practice, arithmetic shifts are used for signed data; for unsigned data, right shifts must be logical.

The definition in Java: x >> k arithmetically, x >>> k logically
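A small C sketch contrasting the two right-shift forms (my own illustration; arithmetic shifting of signed values is the common machine behavior, not a C-standard guarantee):

#include <stdio.h>

int main(void)
{
    int sx = -16;              /* 0xfffffff0 on a 32-bit two's-complement machine */
    unsigned ux = 0xfffffff0u;
    printf("%d\n", sx >> 2);   /* -4: arithmetic shift, sign bit replicated */
    printf("%u\n", ux >> 2);   /* 1073741820: logical shift, zeros shifted in */
    return 0;
}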

2.2 Integer Representations

2.2.1 Integral Data Types

Integral data types represent finite ranges of integers.

Size: char, short, long

All nonnegative or possibly negative: unsigned or the default

The only machine-dependent: long (8-byte with 64-bit, 4-byte with 32-bit)

Typical ranges: asymmetric (the negative range is one larger than the positive range)

Guaranteed ranges (set by the C standards): ≤ the typical ranges; symmetric, except for the fixed-size data types

int could be 2-byte (mostly for 16-bit machines); long can be 4-byte (typically for 32-bit programs)

The fixed-size data types

Ranges = those of the typical ones

Asymmetric

2.2.2 Unsigned Encodings

Consider an integer data type of w bits.

A bit vector is written as:

x⃗, to denote the entire vector

[x_{w−1}, x_{w−2}, …, x_0], to denote the individual bits within the vector

The unsigned interpretation of x⃗: treated as a number written in binary notation.

x_i = 0 or 1 (x_i = 1 indicates that 2^i is part of the value).

principle: Definition of unsigned encoding

For vector x⃗ = [x_{w−1}, x_{w−2}, …, x_0]:

B2U_w(x⃗) ≐ Σ_{i=0}^{w−1} x_i · 2^i

≐: the left-hand side is defined to be equal to the right-hand side.

Examples:

UMax_w ≐ Σ_{i=0}^{w−1} 2^i = 2^w − 1

A mapping: B2U_w: {0, 1}^w → {0, …, UMax_w}.

principle: Uniqueness of unsigned encoding

Function B2Uw is a bijection.

Bijection: a function f that goes two ways: it maps a value x to a value y where y = f(x), but it can also operate in reverse, since for every y there is a unique value x such that f(x) = y, given by the inverse function x = f^{−1}(y).

U2Bw: the inverse of B2Uw

2.2.3 Two's-Complement Encodings

Two's-complement form: the most common representation of signed numbers

principle: Definition of two's-complement encoding

For vector x⃗ = [x_{w−1}, x_{w−2}, …, x_0]:

B2T_w(x⃗) ≐ −x_{w−1} · 2^{w−1} + Σ_{i=0}^{w−2} x_i · 2^i

The sign bit: x_{w−1}

"Weight": −2^{w−1}

1: negative; 0: nonnegative.

TMin_w ≐ −2^{w−1}; TMax_w ≐ Σ_{i=0}^{w−2} 2^i = 2^{w−1} − 1.

A mapping: B2T_w: {0, 1}^w → {TMin_w, …, TMax_w}.

principle: Uniqueness of two's-complement encoding

Function B2Tw is a bijection.

T2Bw: the inverse of B2Tw

Drop w: UMax, TMin, and TMax

Points worth highlighting:

Asymmetric Range: |TMin| = |TMax| + 1

UMax = 2 · TMax + 1

−1: same representation as UMax

0: a string of all 0's in both

Two's-complement in languages:

Not required by C. <limits.h> defines a set of constants delimiting the ranges of the different integer data types for the particular machine.

Example: for a two's-complement machine, INT_MAX = TMax_w, INT_MIN = TMin_w, and UINT_MAX = UMax_w.

Required by Java.

Example to get a better understanding:
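A minimal C illustration (mine), printing the <limits.h> constants just mentioned:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* On a two's-complement machine with 32-bit int: */
    printf("%d\n", INT_MIN);  /* -2147483648 = TMin_32 */
    printf("%d\n", INT_MAX);  /*  2147483647 = TMax_32 */
    printf("%u\n", UINT_MAX); /*  4294967295 = UMax_32 = 2 * TMax_32 + 1 */
    /* Note the asymmetry: |INT_MIN| = INT_MAX + 1. */
    return 0;
}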

2.2.4 Conversions between Signed and Unsigned

Casting example:

int x, unsigned u

(unsigned) x converts x to unsigned, (int) u converts u to int.

A general rule of handling conversions between signed and unsigned numbers with the same word size—the numeric values might change, but the bit patterns do not.

U2B_w and T2B_w

U2B_w

For 0 ≤ x ≤ UMax_w, U2B_w(x):

the unique unsigned bit representation

T2B_w

For TMin_w ≤ x ≤ TMax_w, T2B_w(x):

the unique two's-complement bit representation

T2U_w and U2T_w

T2U_w

For TMin_w ≤ x ≤ TMax_w, T2U_w(x) ≐ B2U_w(T2B_w(x))

0 ≤ T2U_w(x) ≤ UMax_w, same bit representation

principle: Conversion from two's complement to unsigned

For x such that TMin_w ≤ x ≤ TMax_w:

T2U_w(x) = x + 2^w, if x < 0; x, if x ≥ 0

derivation: Conversion from two's complement to unsigned

B2U_w(T2B_w(x)) = T2U_w(x) = x + x_{w−1} · 2^w

In a two's-complement representation of x, bit x_{w−1} determines whether or not x is negative.

Examples:

The behavior of T2U:

< 0: converted to large positive

≥ 0: unchanged

U2T_w(u)

U2T_w: for 0 ≤ x ≤ UMax_w, U2T_w(x) ≐ B2T_w(U2B_w(x))

TMin_w ≤ U2T_w(x) ≤ TMax_w, same bit representation

principle: Unsigned to two's-complement conversion

For u such that 0 ≤ u ≤ UMax_w:

U2T_w(u) = u, if u ≤ TMax_w; u − 2^w, if u > TMax_w

derivation: Unsigned to two's-complement conversion

U2T_w(u) = −u_{w−1} · 2^w + u

In the unsigned representation of u, bit u_{w−1} determines whether or not u is greater than TMax_w = 2^{w−1} − 1.

The behavior of U2T:

≤ TMax_w: unchanged

> TMax_w: converted to negative

Summary

The effects of converting in both directions:

0 ≤ x ≤ TMax_w

T2U_w(x) = x, U2T_w(x) = x: identical

Outside this range

Add or subtract 2^w

2 extremes:

T2U_w(−1) = UMax_w

T2U_w(TMin_w) = TMax_w + 1

2.2.5 Signed versus Unsigned in C

Numbers

Signed: by default

Example: 12345 or 0x1A2B

Unsigned: adding 'U' or 'u' as a suffix

Example: 12345U or 0x1A2Bu

Conversion between:

Allowed but not fully specified by the standard. Most systems: U2T_w and T2U_w (the bit pattern is preserved).

Explicit casting and implicit casting (when an expression of one type is assigned to a variable of another)

printf does not use type information, so directives such as %d and %u print the same bit pattern under different interpretations.

Possibly nonintuitive behavior:
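For instance (my sketch, assuming 32-bit int), the implicit conversion of signed operands to unsigned makes some comparisons surprising:

#include <stdio.h>

int main(void)
{
    /* When one operand is unsigned, the signed one converts to unsigned. */
    printf("%d\n", -1 < 0u);                       /* 0: -1 becomes UMax = 4294967295 */
    printf("%d\n", 2147483647 > -2147483647 - 1);  /* 1: plain signed comparison */
    printf("%d\n", 2147483647u > -2147483647 - 1); /* 0: TMin converts to TMin + 2^32 */
    return 0;
}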

2.2.6 Expanding the Bit Representation of a Number

One common operation: to convert between integers having different word sizes while retaining the same numeric value. Converting from a smaller to a larger data type should always be possible.

Converting Unsigned to Larger

Zero extension: adding leading zeros to the representation.

principle: Expansion of an unsigned number by zero extension

Define bit vectors u⃗ = [u_{w−1}, u_{w−2}, …, u_0] of width w and u⃗′ = [0, …, 0, u_{w−1}, u_{w−2}, …, u_0] of width w′, where w′ > w. Then B2U_w(u⃗) = B2U_{w′}(u⃗′).

Converting Two's-complement to Larger

Sign extension: adding copies of the most significant bit to the representation.

principle: Expansion of a two's-complement number by sign extension

Define bit vectors x⃗ = [x_{w−1}, x_{w−2}, …, x_0] of width w and x⃗′ = [x_{w−1}, …, x_{w−1}, x_{w−1}, x_{w−2}, …, x_0] of width w′, where w′ > w. Then B2T_w(x⃗) = B2T_{w′}(x⃗′).

The value is preserved:

derivation: Expansion of a two's-complement number by sign extension

The relative order of conversion when both size and type change (e.g., short to unsigned): the program first changes the size and then the type.
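A C sketch of both extensions (my own illustration, assuming 16-bit short and 32-bit int):

#include <stdio.h>

int main(void)
{
    short sx = -12345;         /* 16-bit pattern 0xcfc7 */
    unsigned short ux = 53191; /* the same 16-bit pattern 0xcfc7 */
    int x = sx;                /* sign extension: 0xffffcfc7 */
    unsigned u = ux;           /* zero extension: 0x0000cfc7 */
    printf("%x\n", x);         /* ffffcfc7, value -12345 */
    printf("%x\n", u);         /* cfc7, value 53191 */
    return 0;
}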

2.2.7 Truncating Numbers

Truncating x⃗ = [x_{w−1}, x_{w−2}, …, x_0] to k bits: drop the high-order w − k bits, giving x⃗′ = [x_{k−1}, x_{k−2}, …, x_0]. Truncating a number can alter its value, a form of overflow.

Truncating Unsigned

principle: Truncation of an unsigned number

Let x⃗ be the bit vector [x_{w−1}, x_{w−2}, …, x_0], and let x⃗′ be the result of truncating it to k bits: x⃗′ = [x_{k−1}, x_{k−2}, …, x_0]. Let x = B2U_w(x⃗) and x′ = B2U_k(x⃗′). Then x′ = x mod 2^k.

Truncating Two's-complement

principle: Truncation of a two's-complement number

Let x⃗ be the bit vector [x_{w−1}, x_{w−2}, …, x_0], and let x⃗′ be the result of truncating it to k bits: x⃗′ = [x_{k−1}, x_{k−2}, …, x_0]. Let x = B2T_w(x⃗) and x′ = B2T_k(x⃗′). Then x′ = U2T_k(x mod 2^k).

derivation: Truncation of a two's-complement number

B2T_w([x_{w−1}, x_{w−2}, …, x_0]) mod 2^k = B2U_k([x_{k−1}, x_{k−2}, …, x_0])

Summary

The effect of truncation

For unsigned:

B2U_k([x_{k−1}, x_{k−2}, …, x_0]) = B2U_w([x_{w−1}, x_{w−2}, …, x_0]) mod 2^k

For two's-complement:

B2T_k([x_{k−1}, x_{k−2}, …, x_0]) = U2T_k(B2U_w([x_{w−1}, x_{w−2}, …, x_0]) mod 2^k)

2.2.8 Advice on Signed versus Unsigned

The implicit conversion can lead to errors or vulnerabilities.

One way to avoid: to never use unsigned.

Example: Java.

Unsigned values are useful

When words are treated as collections of bits. Examples:

Packing a word with flags describing various Boolean conditions

Addresses (naturally unsigned)

When implementing mathematical packages for modular / multiprecision arithmetic.

2.3 Integer Arithmetic

2.3.1 Unsigned Addition

"Word size inflation":

Some programming languages support arbitrary size arithmetic;
More commonly, programming languages support fixed-size arithmetic.

x +^u_w y for arguments x and y, where 0 ≤ x, y < 2^w: the result of truncating the integer sum x + y to w bits, viewed as an unsigned number.

Characterized as a form of modular arithmetic: discarding any bits with weight greater than 2^{w−1}.

principle: Unsigned addition

For x and y such that 0 ≤ x, y < 2w:

x y =

Illustration:

Overflow: an arithmetic operation whose integer result cannot fit within the word size limits of the data type.

Occurs when the two operands sum to 2w or more

Not signaled as errors

principle: Detecting overflow of unsigned addition

For x and y in the range 0 ≤ x, y ≤ UMax_w, let s ≐ x +^u_w y. Then the computation of s overflowed if and only if s < x (or equivalently, s < y).

Modular addition forms an abelian group (a mathematical structure).

Commutative and associative

The identity element: 0; every element has an additive inverse

A value -^u_w x for every value x: x +^u_w (-^u_w x) = 0

principle: Unsigned negation

For any number x such that 0 ≤ x < 2w, its w-bit unsigned negation x is given by the following:

x =
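A sketch of the overflow test as C code (uadd_ok is the name used in the book's practice problems; this body is my own):

#include <stdio.h>

/* Can x + y be computed without unsigned overflow? */
int uadd_ok(unsigned x, unsigned y)
{
    unsigned s = x + y;
    return s >= x;          /* overflowed if and only if s < x */
}

int main(void)
{
    printf("%d\n", uadd_ok(0xffffffffu, 1u)); /* 0: the sum wraps to 0 */
    printf("%d\n", uadd_ok(100u, 200u));      /* 1 */
    return 0;
}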

2.3.2 Two's-Complement Addition

x +^t_w y, given integer values x and y where −2^{w−1} ≤ x, y ≤ 2^{w−1} − 1: the result of truncating the integer sum x + y to w bits, viewed as a two's-complement number.

principle: Two's-complement addition

For integer values x and y in the range −2^{w−1} ≤ x, y ≤ 2^{w−1} − 1:

x +^t_w y = x + y − 2^w, if 2^{w−1} ≤ x + y (positive overflow); x + y, if −2^{w−1} ≤ x + y < 2^{w−1} (normal); x + y + 2^w, if x + y < −2^{w−1} (negative overflow)

Illustration:

Positive overflow: x + y exceeds TMaxw (case 4)

Negative overflow: x + y is less than TMinw (case 1)

Has the same bit-level representation as the unsigned sum

Examples:

principle: Detecting overflow in two's-complement addition

For x and y in the range TMin_w ≤ x, y ≤ TMax_w, let s ≐ x +^t_w y. Then the computation of s has had positive overflow if and only if x > 0 and y > 0 but s ≤ 0. The computation has had negative overflow if and only if x < 0 and y < 0 but s ≥ 0.

Illustrations:
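A corresponding C sketch (tadd_ok is the book's practice-problem name, the body is mine; note that signed overflow is technically undefined behavior in C, so this relies on the wraparound behavior of typical machines):

#include <stdio.h>

/* Can x + y be computed without two's-complement overflow? */
int tadd_ok(int x, int y)
{
    int s = x + y;          /* wraps around on typical hardware */
    int pos_over = x > 0 && y > 0 && s <= 0;
    int neg_over = x < 0 && y < 0 && s >= 0;
    return !pos_over && !neg_over;
}

int main(void)
{
    printf("%d\n", tadd_ok(0x7fffffff, 1)); /* 0: positive overflow */
    printf("%d\n", tadd_ok(-5, 10));        /* 1 */
    return 0;
}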

2.3.3 Two's-Complement Negation

-^t_w x: the additive inverse under +^t_w

principle: Two's-complement negation

For x in the range TMin_w ≤ x ≤ TMax_w, its two's-complement negation -^t_w x is given by the formula

-^t_w x = TMin_w, if x = TMin_w; −x, if x > TMin_w

2.3.4 Unsigned Multiplication

x *^u_w y for integers x and y where 0 ≤ x, y ≤ 2^w − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as an unsigned number.

principle: Unsigned multiplication

For x and y such that 0 ≤ x, y ≤ UMax_w:

x *^u_w y = (x · y) mod 2^w

2.3.5 Two's-Complement Multiplication

x *^t_w y for integers x and y where −2^{w−1} ≤ x, y ≤ 2^{w−1} − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as a two's-complement number.

principle: Two's-complement multiplication

For x and y such that TMin_w ≤ x, y ≤ TMax_w:

x *^t_w y = U2T_w((x · y) mod 2^w)

principle: Bit-level equivalence of unsigned and two's-complement multiplication

Let x⃗ and y⃗ be bit vectors of length w. Define integers x and y as the values represented by these bits in two's-complement form: x = B2T_w(x⃗) and y = B2T_w(y⃗). Define nonnegative integers x′ and y′ as the values represented by these bits in unsigned form: x′ = B2U_w(x⃗) and y′ = B2U_w(y⃗). Then

T2B_w(x *^t_w y) = U2B_w(x′ *^u_w y′)

Illustrations:

2.3.6 Multiplying by Constants

Integer multiply is slow. Optimization: to replace multiplications by constants with combinations of shift and addition operations.

principle: Multiplication by a power of 2

Let x be the unsigned integer represented by bit pattern [x_{w−1}, x_{w−2}, …, x_0]. Then for any k ≥ 0, the (w + k)-bit unsigned representation of x · 2^k is given by [x_{w−1}, x_{w−2}, …, x_0, 0, …, 0], where k zeros have been added to the right.

principle: Unsigned multiplication by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x << k yields the value x *^u_w 2^k.

principle: Two's-complement multiplication by a power of 2

For C variables x and k with two's-complement value x and unsigned value k, such that 0 ≤ k < w, the C expression x << k yields the value x *^t_w 2^k.

The task of generating code for the expression x * K, for some constant K.

K: [(0…0) (1…1) (0…0) . . . (1…1)].

Compute a run of 1's from bit position n down to bit position m (n ≥ m) using either form:

Form A: (x<<n) + (x<<(n − 1)) + . . . + (x<<m)

Form B: (x<<(n + 1)) - (x<<m)

Compute x * K by adding together the results for each run.
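For instance, with K = 14 = [1110], the run of 1's goes from n = 3 down to m = 1; a C sketch of both forms (my own illustration):

#include <stdio.h>

int main(void)
{
    long x = 5;
    long a = (x << 3) + (x << 2) + (x << 1); /* form A: 14 = 2^3 + 2^2 + 2^1 */
    long b = (x << 4) - (x << 1);            /* form B: 14 = 2^4 - 2^1 */
    printf("%ld %ld %ld\n", x * 14, a, b);   /* 70 70 70 */
    return 0;
}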

2.3.7 Dividing by Powers of 2

Performed using a right shift:

Logical: unsigned

Arithmetic: two's-complement

Integer division always rounds toward 0. Some notation for any real number a:

⌊a⌋ (floor): the unique integer a′ such that a′ ≤ a < a′ + 1.

⌈a⌉ (ceiling): the unique integer a′ such that a′ − 1 < a ≤ a′.

Dividing by a Power of 2 with Unsigned Arithmetic

principle: Unsigned division by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x >> k yields the value ⌊x/2^k⌋.

Examples:

Dividing by a Power of 2 with Two's-complement Arithmetic

Using an arithmetic right shift:

principle: Two's-complement division by a power of 2, rounding down

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression x >> k, when the shift is performed arithmetically, yields the value ⌊x/2^k⌋.

Examples:

Correcting for the improper rounding that occurs when a negative number is shifted right by "biasing" the value before shifting.

principle: Two's-complement division by a power of 2, rounding up

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression (x + (1 << k) - 1) >> k, when the shift is performed arithmetically, yields the value ⌈x/2^k⌉.

Demonstration:

(x<0 ? x+(1<<k)-1 : x) >> k will compute x/2k.

2.3.8 Final Thoughts on Integer Arithmetic

2.4 Floating Point

2.4.1 Fractional Binary Numbers

2.4.2 IEEE Floating-Point Representation

2.4.3 Example Numbers

2.4.4 Rounding

2.4.5 Floating-Point Operations

2.4.6 Floating Point in C

2.5 Summary

Computers encode information as bits, generally organized as sequences of bytes. Different encodings are used for representing integers, real numbers, and character strings. Different models of computers use different conventions for encoding numbers and for ordering the bytes within multi-byte data.

The C language is designed to accommodate a wide range of different implementations in terms of word sizes and numeric encodings. Machines with 64-bit word sizes have become increasingly common, replacing the 32-bit machines that dominated the market for around 30 years. Because 64-bit machines can also run programs compiled for 32-bit machines, we have focused on the distinction between 32- and 64-bit programs, rather than machines. The advantage of 64-bit programs is that they can go beyond the 4 GB address limitation of 32-bit programs.

Most machines encode signed numbers using a two's-complement representation and encode floating-point numbers using IEEE Standard 754. Understanding these encodings at the bit level, as well as understanding the mathematical characteristics of the arithmetic operations, is important for writing programs that operate correctly over the full range of numeric values.

When casting between signed and unsigned integers of the same size, most C implementations follow the convention that the underlying bit pattern does not change. On a two's-complement machine, this behavior is characterized by functions T2Uw and U2Tw, for a w-bit value. The implicit casting of C gives results that many programmers do not anticipate, often leading to program bugs.

Due to the finite lengths of the encodings, computer arithmetic has properties quite different from conventional integer and real arithmetic. The finite length can cause numbers to overflow, when they exceed the range of the representation. Floating-point values can also underflow, when they are so close to 0.0 that they are changed to zero.

The finite integer arithmetic implemented by C, as well as most other programming languages, has some peculiar properties compared to true integer arithmetic. For example, the expression x*x can evaluate to a negative number due to overflow. Nonetheless, both unsigned and two's-complement arithmetic satisfy many of the other properties of integer arithmetic, including associativity, commutativity, and distributivity. This allows compilers to do many optimizations. For example, in replacing the expression 7*x by (x<<3)-x, we make use of the associative, commutative, and distributive properties, along with the relationship between shifting and multiplying by powers of 2.

We have seen several clever ways to exploit combinations of bit-level operations and arithmetic operations. For example, we saw that with two's-complement arithmetic, ~x+1 is equivalent to -x. As another example, suppose we want a bit pattern of the form [0, …, 0, 1, …, 1], consisting of w − k zeros followed by k ones. Such bit patterns are useful for masking operations. This pattern can be generated by the C expression (1<<k)-1, exploiting the property that the desired bit pattern has numeric value 2^k − 1. For example, the expression (1<<8)-1 will generate the bit pattern 0xFF.

Floating-point representations approximate real numbers by encoding numbers of the form x × 2^y. IEEE Standard 754 provides for several different precisions, with the most common being single (32 bits) and double (64 bits). IEEE floating point also has representations for special values representing plus and minus infinity, as well as not-a-number.

Floating-point arithmetic must be used very carefully, because it has only limited range and precision, and because it does not obey common mathematical properties such as associativity.

Chapter 3 Machine-Level Representation of Programs

Machine Code and Assembly Code

Computers execute machine code.

Machine code: sequences of bytes encoding the low-level operations that manipulate data, manage memory, read and write data on storage devices, and communicate over networks.

A compiler generates machine code through a series of stages, based on the rules of the programming language, the instruction set of the target machine, and the conventions followed by the operating system.

The gcc C compiler

Generates its output in the form of assembly code.

Assembly code: a textual representation of the machine code giving the individual instructions in the program.

Then invokes both an assembler and a linker to generate the executable machine code from the assembly code.

A High-level Language versus Low-level instructions

A high-level language

Low-level instructions

Shields programmers from the detailed machine-level implementation. Much more productive and reliable.

Must be specified by a programmer.

A program can be compiled and executed on a number of different machines.

Assembly code is highly machine specific.

The Importance of Learning Machine Code

Code optimization.

Understanding the run-time behavior of a program.

Concurrent programming.

Guarding against attacks.

The Relation between Source Code and the Generated Assembly

Understanding the relation between source code and the generated assembly: a form of reverse engineering.

Reverse engineering: trying to understand the process by which a system was created by studying the system and working backward.

The system is a machine-generated assembly language program.

x86-64

The machine language for most processors in laptop and desktop machines, data centers and supercomputers.

Started as Intel's 16-bit architecture, expanded to 32 bits, and most recently to 64 bits.

Its rival: Advanced Micro Devices (AMD).

The Transition from 32-bit to 64-bit Machines

A 32-bit machine:

Can only use around 4 gigabytes (2^32 bytes) of RAM.

Current 64-bit machines:

Can use up to 256 terabytes (2^48 bytes)

Could readily be extended to use up to 16 exabytes (2^64 bytes)

3.1 A Historical Perspective

The Intel processor line (x86) has followed a long evolutionary development.

Some models of Intel processors and some of their key features:

8086

One of the first 16-bit microprocessors.

A variant 8088: IBM PCs and MS-DOS.

i386

Expanded the architecture to 32 bits.

Added the flat addressing model. The first to fully support Unix.

PentiumPro

Introduced P6 microarchitecture (a radically new processor design).

Pentium 4E

Added hyperthreading and EM64T.

Hyperthreading: a method to run two programs simultaneously on a single processor.

EM64T(x86-64): Intel's implementation of a 64-bit extension to IA32 developed by Advanced Micro Devices (AMD).

Core 2.

First multi-core Intel microprocessor.

Multi-core processor: multiple processors are implemented on a single chip.

Core i7, Nehalem.

Incorporated both hyperthreading and multi-core, with the initial version supporting two executing programs on each core and up to four cores on each chip.

Backward compatible: able to run code compiled for any earlier version.

Intel's names for their processor line:

IA32: "Intel Architecture 32-bit"

Intel64 (x86-64): the 64-bit extension to IA32

"x86" (colloquial): the overall line

Advanced Micro Devices (AMD) have produced Intel-compatible processors. Introduced x86-64.

3.2 Program Encodings

Suppose a C program consists of two files, p1.c and p2.c. Compiling using a Unix command line: linux> gcc -Og -o p p1.c p2.c

The command gcc: the gcc C compiler.

Since default on Linux, also cc

The command-line option -Og: a level of optimization.

Higher levels of optimization: -O1, -O2.

The command-line directive -o p: name the final executable code file p.

The gcc command invokes an entire sequence of programs to turn the source code into executable code.

First, the C preprocessor expands the source code

To include any files specified with #include commands

To expand any macros, specified with #define declarations

Second, the compiler generates assembly-code versions of the two source files: p1.s and p2.s.

Next, the assembler converts the assembly code into binary object-code files p1.o and p2.o.

Object code: One form of machine code

Contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.

Finally, the linker merges these two object-code files along with code implementing library functions (e.g., printf) and generates the final executable code file p.

Executable code: the second form of machine code

The exact form of code that is executed by the processor.
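The stages can also be run individually; a minimal sketch of the corresponding commands (file names follow the p1.c/p2.c example above, and the flags are standard gcc options):

linux> gcc -Og -S p1.c          # stop after compiling: produces p1.s
linux> gcc -Og -c p1.c          # stop after assembling: produces p1.o
linux> gcc -Og -o p p1.o p2.o   # link the object files into p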

3.2.1 Machine-Level Code

Computer systems employ several different forms of abstraction, hiding details of an implementation through the use of a simpler abstract model. Two important forms of abstraction for machine-level programming:

The instruction set architecture, or ISA, defines the format and behavior of a machine-level program.

Defines the processor state, the format of the instructions, and the effect each of these instructions will have on the state.

Most ISAs describe the behavior of a program as if each instruction is executed in sequence.

The processor hardware: executes instructions concurrently, but employs safeguards so that the overall behavior matches the sequential execution the ISA describes.

Virtual addresses: the memory addresses used by a machine-level program.

Providing a memory model that appears to be a very large byte array.

The actual implementation of the memory system: A combination of multiple hardware memories and operating system software

The compiler does most of the work in the overall compilation sequence, transforming programs into instructions. The main feature of the assembly-code representation: in a more readable textual format.

Visible parts of the x86-64 processor state:

The program counter (the PC, %rip in x86-64) indicates the address in memory of the next instruction to be executed.

The integer register file contains 16 registers.

Registers: named locations storing 64-bit values.

Hold addresses or integer data.

Some keep track of critical parts of the program state. Others hold temporary data.

The condition code registers hold status information about the most recently executed arithmetic or logical instruction.

Implement conditional changes in the control or data flow.

A set of vector registers can each hold one or more values.

Machine code views the memory as a large byte-addressable array.

Aggregate data types: contiguous collections of bytes.

Scalar data types: machine code makes no distinctions among them (e.g., between signed and unsigned integers, or between pointers and integers).

The program memory

Contains

The executable machine code for the program

Some information required by the operating system

A run-time stack for managing procedure calls and returns

Blocks of memory allocated by the user (e.g., malloc).

Addressed using virtual addresses.

Only limited subranges are valid.

The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.

A single machine instruction performs only a very elementary operation.

3.2.2 Code Examples

Suppose mstore.c:
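The book's mstore.c (reproduced from memory; treat as a sketch):

long mult2(long, long);

void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);   /* call mult2, then store the result at *dest */
    *dest = t;
}

It can be compiled to assembly with gcc -Og -S mstore.c.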

Generating mstore.s

The assembly-code file:

Each indented line: a single machine instruction.

All information about local variable names or data types has been stripped away.

Generating mstore.o

The object-code file:

In binary format

The hexadecimal representation:

A key lesson: the program executed by the machine is simply a sequence of bytes encoding a series of instructions.

Generating prog

Requires running a linker on the set of object-code files, one of which must contain main.

Suppose main.c:
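The book's main.c (again reproduced from memory as a sketch):

#include <stdio.h>

void multstore(long, long, long *);

int main() {
    long d;
    multstore(2, 3, &d);
    printf("2 * 3 --> %ld\n", d);
    return 0;
}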

The resulting file prog contains not just the machine code for the procedures but also code used to start and terminate the program as well as to interact with the operating system.

Disassembling mstore.o and prog

Disassemblers:

Used to inspect the contents of machine-code files.

Generate a format similar to assembly code from the machine code.

With Linux systems, objdump (for "object dump") given -d:
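The book's command (a sketch):

linux> objdump -d mstore.o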

The result:

Features about machine code and its disassembled representation:

x86-64 instructions can range in length from 1 to 15 bytes.

Commonly used instructions and those with fewer operands require a smaller number of bytes

The instruction format: from a given starting position, there is a unique decoding of the bytes into machine instructions.

Example: only pushq %rbx can start with the byte value 53.

The disassembler determines the assembly code based purely on the byte sequences in the machine-code file.

The disassembler uses a slightly different naming convention for the instructions than does the assembly code generated by gcc.

Example: the omissions or additions of the suffix 'q'.

Disassembling prog:

Extract various code sequences:

Almost identical to that generated by the disassembly of mstore.o.

Differences:

The addresses—the linker has shifted the location of this code to a different range of addresses.

The linker has filled in the address that callq should use in calling mult2.

One task for the linker: to match function calls with the locations of the executable code for those functions.

Two additional lines of code (nop instructions) were inserted.

They have no effect on the program; they pad the code to improve memory system performance.

3.2.3 Notes on Formatting

Generating mstore.s

The full content:

Lines beginning with '.': directives to guide the assembler and linker.

A clearer presentation:

3.3 Data Formats

"Word": 2 bytes

"Double words": 4 bytes

"Quad words": 8 bytes

3.4 Accessing Information

An x86-64 central processing unit (CPU) contains a set of 16 general-purpose registers storing 64-bit values.

General-purpose registers store integer data and pointers.

2 conventions for instructions that copy and generate values with registers as destinations:

Instructions generating 1 or 2 bytes: the remaining bytes of the register are unchanged.

Instructions generating 4 bytes: the remaining (high-order) bytes are set to 0.

Different registers serve different roles in typical programs.

Most unique: %rsp

Used to indicate the end position in the run-time stack

Specifically read and written by some instructions

The other 15 registers: more flexible

A small number of instructions make specific use of certain registers.

A set of standard programming conventions governs how the registers are to be used for managing the stack, passing function arguments, returning values from functions, and storing local and temporary data.

3.4.1 Operand Specifiers

Operands: the source values to use in performing an operation and the destination location into which to place the result.

Operand types:

Immediate

Constant values

A '$' followed by an integer using standard C notation

Example: $-577 or $0x1F.

Different instructions allow different ranges of immediate values; the assembler will automatically select the most compact way of encoding a value.

Register

The contents of a register: either one of the sixteen 8-byte registers or one of their low-order 1-, 2-, or 4-byte portions.

r_a: an arbitrary register a.

R[r_a]: its value, viewing the set of registers as an array R indexed by register identifiers.

A memory reference

Memory location is accessed according to the effective address (a computed address)

M_b[Addr]: a reference to the b-byte value stored in memory starting at address Addr.

The subscript b is usually dropped.

An addressing mode: a form of memory references.

The most general form: Imm(r_b, r_i, s)

4 components:

An immediate offset Imm

A base register r_b (64-bit)

An index register r_i (64-bit)

A scale factor s (1, 2, 4, or 8)

The effective address = Imm + R[r_b] + R[r_i] · s

Often seen when referencing elements of arrays.

The other forms: special cases.

The more complex addressing modes are useful when referencing array and structure elements.

3.4.2 Data Movement Instructions

Grouping different instructions into instruction classes: The instructions in a class perform the same operation but with different operand sizes.

Simple Data Movement Instructions

mov: Copy data from a source location to a destination location, without any transformation.

Operands:

S (source): immediate, register, or memory

D (destination): register or memory

Copying from one memory location to another requires two instructions, since both operands cannot be memory:

mov Memory, Register

mov Register, Memory

Register operands for these instructions can be the labeled portions of any of the 16 registers

These instructions update only the specific register bytes or memory locations indicated by D; the exception is movl with a register destination.

Convention: Any instruction that generates a 32-bit value for a register also sets the high-order portion of the register to 0.

Examples showing 5 possible combinations:
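The book's five combinations (reproduced from memory as a sketch):

movl $0x4050,%eax       # Immediate -> Register, 4 bytes
movw %bp,%sp            # Register  -> Register, 2 bytes
movb (%rdi,%rcx),%al    # Memory    -> Register, 1 byte
movb $-17,(%esp)        # Immediate -> Memory,   1 byte
movq %rax,-12(%rbp)     # Register  -> Memory,   8 bytes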

movabsq: can have an arbitrary 64-bit immediate value as its source.

movq: can only take a 32-bit immediate, which is then sign-extended to 64 bits.

Operands:

I (source): immediate (64-bit)

R (destination): register

Zero- and Sign-extending Data Movement Instructions

Copying a smaller source value to a larger destination

movz & movs: Zero extension and sign extension

Operands:

S (source): register or memory

R (destination): register

Final 2 characters of each instruction: Size designators

The absence of explicit "movzlq".

Instead, movl having D: a register

Property: an instruction generating a 4-byte value with a register as the destination will fill the upper 4 bytes with zeros.

cltq: The same effect as movslq %eax, %rax.

3.4.3 Data Movement Example

C "Pointers" = addresses

Dereferencing a pointer involves copying that pointer into a register, and then using this register in a memory reference.

Local variables are often kept in registers rather than stored in memory locations.

Register access is much faster than memory access.

3.4.4 Pushing and Popping Stack Data

Push and pop instructions

The stack data structure

Discipline: "Last-in, first-out"

Operations:

Push: Add data to a stack

Pop: Remove data

Array implementation

Insert and remove elements from top (one end of the array)

The program stack

Stored in some region of memory.

The stack pointer %rsp holds the address of the top stack element.

pushq: Push data

Operand:

S (source): register

Behavior: pushq %rbp is equivalent to
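As the book shows, it behaves like the following pair (except that pushq is encoded in a single byte):

subq $8, %rsp        # decrement stack pointer
movq %rbp, (%rsp)    # store %rbp on stack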

popq: Pop data

Operand:

R (destination): register

Behavior: popq %rax is equivalent to
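Again following the book:

movq (%rsp), %rax    # read %rax from stack
addq $8, %rsp        # increment stack pointer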

The popped value remains until overwritten

Arbitrary stack positions can be addressed

Example: movq 8(%rsp),%rdx

3.5 Arithmetic and Logical Operations

The x86-64 integer and logic operations.

leaq (no size variants) + instruction classes (each having 4 size variants: b, w, l, q)

4 groups:

Load effective address

Unary: 1 operand

Binary: 2 operands

Shifts

3.5.1 Load Effective Address

leaq (the load effective address instruction): Copy the effective address to the destination

Operands:

S (source): memory

D (destination): register

Uses

To generate pointers for later memory references.

To compactly describe common arithmetic operations.

Example: if %rdx holds x, then leaq 7(%rdx,%rdx,4), %rax sets %rax to 5x + 7.

Clever uses by compilers.

Illustration: a C program and the arithmetic operations it compiles to:
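Reproduced from the book's illustration from memory, as a sketch; the register assignments (x in %rdi, y in %rsi, z in %rdx) follow the standard argument conventions:

long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}

/* Generated arithmetic (shown as comments):
   leaq (%rdi,%rsi,4), %rax      x + 4*y
   leaq (%rdx,%rdx,2), %rdx      z + 2*z = 3*z
   leaq (%rax,%rdx,4), %rax      (x + 4*y) + 4*(3*z) = x + 4*y + 12*z
*/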

The ability to perform addition and limited forms of multiplication proves useful when compiling simple arithmetic expressions.

3.5.2 Unary and Binary Operations

Unary Operations

Unary operations: inc (increment), dec (decrement), neg (negate), and not (complement).

Operand:

D (both source and destination): register or memory

Example: incq (%rsp) adds 1 to the 8-byte element on the top of the stack.

Binary Operations

Binary operations: add, sub, imul (multiply), xor, or, and.

Operands:

S (source): immediate, register, or memory

D (both source and destination): register or memory

S and D cannot both be memory locations.

When D is a memory location, the processor must read the value from memory, perform the operation, and then write the result back to memory.

Example: subq %rax,%rdx sets %rdx to %rdx - %rax (note the operand order).

3.5.3 Shift Operations

Shift operations:

Operands:

k (shift amount): immediate or the single-byte register %cl

D (value to shift): register or memory

With x86-64, a shift instruction operating on w-bit data takes its shift amount from the low-order m bits of %cl, where 2^m = w.

Example: When %cl = 0xFF, then

salb would shift by 7

salw would shift by 15

sall would shift by 31

salq would shift by 63

Left shift

sal and shl: Fill from the right with zeros.

Right shift

sar (arithmetic, >>A): Fill with copies of the sign bit

shr (logical, >>L): Fill with zeros

3.5.4 Discussion

Most instructions shown (except sar and shr) can be used for either unsigned or two's-complement arithmetic.

This is one reason why two's-complement arithmetic is the preferred way to implement signed integer arithmetic.

Example:

In general, compilers generate code that uses individual registers for multiple program values and moves program values among the registers.

3.5.5 Special Arithmetic Operations

Operations involving 128-bit (16-byte) numbers:

Oct word: A 16-byte quantity

Full Multiply Operations

mulq (for unsigned) and imulq (for two's-complement): compute the full 128-bit product of two 64-bit values.

Operand (1-operand form):

S (source): register or memory

Arguments: %rax and S.

Product: %rdx (high-order 64 bits) and %rax (low-order 64 bits).

2 forms of imulq:

The other: a member of the imul instruction class, a 2-operand multiply computing the low-order 64 bits of the product (the same for unsigned and two's-complement multiplication).

The assembler can tell which is intended by counting the number of operands.

mulq example:

Declarations:

uint64_t: declared in inttypes.h (part of ISO C99).

__int128: Support provided by gcc

Assembly code:
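A sketch reproducing the book's example from memory, with the generated assembly shown in comments:

#include <inttypes.h>

typedef unsigned __int128 uint128_t;   /* 128-bit type via gcc's __int128 */

void store_uprod(uint128_t *dest, uint64_t x, uint64_t y) {
    *dest = (uint128_t) x * (uint128_t) y;   /* full 64x64 -> 128-bit product */
}

/* Generated code (dest in %rdi, x in %rsi, y in %rdx):
   movq %rsi, %rax      copy x to multiplicand
   mulq %rdx            unsigned multiply by y
   movq %rax, (%rdi)    store lower 8 bytes at dest
   movq %rdx, 8(%rdi)   store upper 8 bytes at dest+8
*/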

Division or Modulus Operations

The 1-operand divide instructions:

idivq: Signed division instruction

Parts

Dividend: %rdx (high-order 64 bits) and %rax (low-order 64 bits)

Divisor: S

Quotient: %rax

Remainder: %rdx

64-bit division:

Dividend: %rax (64-bit)

Set bits of %rdx =

0's (unsigned arithmetic)

Sign bit of %rax (signed arithmetic), using cqto

cqto: reads the sign bit from %rax and copies it across all of %rdx.

No operands

Illustration: a function computing both quotient and remainder:
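The book's remdiv, reproduced from memory as a sketch:

void remdiv(long x, long y, long *qp, long *rp) {
    long q = x / y;    /* quotient: left by idivq in %rax */
    long r = x % y;    /* remainder: left by idivq in %rdx */
    *qp = q;
    *rp = r;
}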

divq: Unsigned division instruction.

Set %rdx = 0 beforehand

3.6 Control

Sequential and Conditional Behavior

Sequential behavior:

Straight-line code

Instructions follow one another in sequence

Conditional behavior:

C control constructs

Such as conditionals, loops, and switches

Conditional execution

The sequence of operations that get performed depends on the outcomes of tests applied to the data

2 strategies for implementing conditional operations

Conditional control transfers

The execution order:

Normally, sequential: Statements are in the order they appear in the program.

Alternatively, a jump instruction: Control should pass to some other part of the program.

Conditional data transfers

3.6.1 Condition Codes

Condition code registers:

Single-bit

Describe attributes of the most recent arithmetic or logical operation

Tested to perform conditional branches.

Most useful condition codes:

CF: Carry flag.

The most recent operation generated a carry out of the most significant bit.

Used to detect overflow for unsigned operations.

ZF: Zero flag.

The most recent operation yielded zero.

SF: Sign flag.

The most recent operation yielded a negative value.

OF: Overflow flag.

The most recent operation caused a two's-complement overflow—either negative or positive.

Example: add, t = a+b, integers

Condition codes:
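A sketch of the flag settings for t = a + b, written as C expressions (following the lecture slides; a, b, and t are same-size signed integers):

CF:  (unsigned) t < (unsigned) a                   /* carry out: unsigned overflow */
ZF:  t == 0                                        /* zero */
SF:  t < 0                                         /* negative */
OF:  ((a < 0) == (b < 0)) && ((t < 0) != (a < 0))  /* signed overflow */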

The setting of condition codes by instructions:

Integer arithmetic operations set the condition codes.

leaq does not alter any condition codes; it is intended for address computations.

The remaining ones set them as follows:

Logical operations (e.g., xor): CF and OF are set to 0.

Shift operations: CF is set to the last bit shifted out; OF is set to 0.

inc and dec: OF (and ZF) are set; CF is left unchanged.

cmp and test: Set without altering any other registers

cmp: Set the condition codes according to the differences of their two operands

Behavior: sub without updating destinations

ATT format: the operands appear in reverse order (cmp S1, S2 sets the codes according to S2 - S1).

Flags:

ZF: Set if S1 = S2

The others: Determine ordering relation

test:

Behavior: and without altering destinations

Operands:

Typically, S1 = S2

E.g., testq %rax,%rax: tests whether %rax is negative, zero, or positive.

Or one is a mask indicating which bits should be tested

3.6.2 Accessing the Condition Codes

Ways of using the condition codes:

The set instructions

The conditional jump instructions

Conditional data transfers

set: Set a single byte to 0 or to 1 depending on some combination of the condition codes.

The suffixes: Different conditions

Operand:

D (destination): register or memory; length: 1 byte

To generate a 32-bit or 64-bit result: Clear the high-order bits

Typical instruction sequence to compute a < b (a, b: long)
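A sketch of the book's sequence (a in %rdi, b in %rsi; result returned in %eax):

comp:
  cmpq   %rsi, %rdi     # compare a:b
  setl   %al            # set low-order byte of %eax to 0 or 1
  movzbl %al, %eax      # clear the rest of %eax (and of %rax)
  ret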

"Synonyms"

Set condition codes according to the computation t = a-b

Comparison tests

Signed comparisons: Combinations of SF ^ OF and ZF

Unsigned comparisons: Combinations of CF and ZF

How machine code does or does not distinguish between signed and unsigned values:

It mostly uses the same instructions

Some circumstances require different instructions

Different versions of right shifts, division and multiplication instructions

Different combinations of condition codes

3.6.3 Jump Instructions

A jump instruction: Causes the execution to switch to a completely new position in the program.

Jump destinations: Indicated in assembly code by a label.

Example:
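The book's small example, reproduced from memory as a sketch:

  movq $0,%rax        # set %rax to 0
  jmp .L1             # jump to .L1
  movq (%rax),%rdx    # null pointer dereference (skipped over)
.L1:
  popq %rdx           # jump target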

In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.

The different jump instructions

jmp

Unconditional

Either direct or indirect

A direct jump: The jump target is encoded as part of the instruction

The jump target: A label

Example: .L1

An indirect jump: The jump target is read from a register or a memory location.

Written as '*' followed by an operand specifier.

Examples:

jmp *%rax uses %rax as the jump target

jmp *(%rax) reads the jump target from memory, using the value in %rax as the read address

The remaining: conditional—they either jump or continue executing at the next instruction in the code sequence, depending on some combination of the condition codes.

The names and the conditions match those of set.

"Synonyms"

Can only be direct

3.6.4 Jump Instruction Encodings

Jump encodings:

PC-relative addressing.

Encode the difference between the address of the target instruction and the address of the instruction immediately following the jump.

These offsets can be encoded using 1, 2, or 4 bytes.

Absolute addressing.

Give an "absolute" address, using 4 bytes to directly specify the target.

The assembler and linker select the appropriate encodings of the jump destinations.

PC-relative addressing example: branch.c

The assembly code:

2 jumps: jmp, jg

The disassembled version of .o

PC = The address of the instruction following the jump

The disassembled version of the program after linking:

The jump instructions provide a means to implement conditional execution (if), as well as several different loop constructs.

3.6.5 Implementing Conditional Branches with Conditional Control

Implementing conditional branches

Most general: Conditional control transfers

Alternative: Conditional data transfers

Conditional control transfers:

Example:

The assembly implementation of if-else

The general form of if-else in C

The form of assembly implementation:
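A sketch of the book's running example, reproduced from memory (lt_cnt and ge_cnt are global counters that give the branches side effects):

long lt_cnt = 0;
long ge_cnt = 0;

long absdiff_se(long x, long y) {
    long result;
    if (x < y) {
        lt_cnt++;
        result = y - x;
    } else {
        ge_cnt++;
        result = x - y;
    }
    return result;
}

/* Goto version mirroring the generated control flow */
long gotodiff_se(long x, long y) {
    long result;
    if (x >= y)
        goto x_ge_y;
    lt_cnt++;
    result = y - x;
    return result;
x_ge_y:
    ge_cnt++;
    result = x - y;
    return result;
}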

3.6.6 Implementing Conditional Branches with Conditional Moves

Implementing conditional operations:

Conventional: Conditional transfer of control

The program follows one execution path when a condition holds and another when it does not.

Simple and general, but very inefficient on modern processors.

Alternate: Conditional transfer of data.

Computes both outcomes of a conditional operation and then selects one based on whether or not the condition holds.

Makes sense only in restricted cases

Implemented by a simple conditional move instruction

Better matched to the performance characteristics of modern processors.

Conditional control transfer

Example:

The relative performance of using conditional data transfers versus conditional control transfers

Processors achieve high performance through pipelining

Pipelining: An instruction is processed via a sequence of stages, each operating concurrently

E.g.,

Fetching the instruction from memory

Determining the instruction type

Reading from memory

Performing an arithmetic operation

Writing to memory

Updating the program counter

Achieves high performance by overlapping the steps of the successive instructions

Such as fetching one while performing the arithmetic operations for a previous one.

Requires being able to determine the sequence well ahead of time in order to keep the pipeline full.

When the machine encounters a conditional jump (a "branch"), it cannot determine which way the branch will go until it has evaluated the branch condition.

Branch prediction logic is employed.

Guessing reliably: the pipeline stays full of instructions.

Mispredicting a jump: the processor must discard the work done on the wrongly guessed path and begin fetching instructions from the correct location.

This incurs a serious misprediction performance penalty.

Conditional move instructions:

Operands:

S (source): register or memory; length: 16, 32, or 64 bits

R (destination): register

The outcome: depends on the values of the condition codes.

As with the different set and jump instructions

S is copied to D only if the specified condition holds.

Single-byte: Not supported.

The operand length: Inferred from R

Unlike the unconditional mov instructions, where the operand length is encoded explicitly in the instruction name.

The processor can execute conditional move instructions without having to predict the outcome of the test.

The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.

Unlike conditional jumps

Implementing conditional operations via conditional data transfers

The general form of conditional expression and assignment:

Conditional control transfer:

Combines conditional and unconditional jumps

Conditional move:

The final statement: A conditional move
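A sketch of the two translations of v = test-expr ? then-expr : else-expr, using the book's absdiff example reproduced from memory (the name cmovdiff is mine, for illustration):

/* Branching version: only one side is evaluated */
long absdiff(long x, long y) {
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* Conditional-move-friendly version: compute both, then select */
long cmovdiff(long x, long y) {
    long rval = y - x;
    long eval = x - y;
    if (x >= y)
        rval = eval;    /* compiles to a single cmov */
    return rval;
}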

Bad cases for conditional moves.

Invalid behavior

The case for the earlier example

Illustration:

C function:
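The book's example (a sketch):

long cread(long *xp) {
    /* Unsafe as a conditional move: *xp would be read even when xp is NULL */
    return (xp ? *xp : 0);
}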

Invalid implementation:

Null pointer dereferencing error.

Must be compiled using branching code

Code efficiency.

Example: wasted computation.

Compilers must take into account the relative performance of wasted computation versus the potential for performance penalty due to branch misprediction.

Used by gcc only when the two expressions can be computed very easily.

3.6.7 Loops

C looping constructs: do-while, while, and for.

Implementation

No corresponding instructions

Instead, combinations of conditional tests and jumps

Compilers generate loop code based on the two basic loop patterns.

Do-While Loops

The general do-while Translation:

C code

Equivalent goto version

Example:
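The book's example, reproduced from memory as a sketch:

long fact_do(long n) {
    long result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);   /* test at the bottom: the body always runs at least once */
    return result;
}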

Reverse engineering assembly code requires determining which registers are used for which program values

While Loops

The general while translation:

while version

2 translation methods:

Jump to middle translation: Performs the initial test by performing an unconditional jump to the test at the end of the loop.

-Og

Equivalent goto version:

Example:
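A sketch of the jump-to-middle translation applied to the book's fact_while (reproduced from memory):

long fact_while_jm_goto(long n) {
    long result = 1;
    goto test;       /* initial test via an unconditional jump to the bottom */
loop:
    result *= n;
    n = n - 1;
test:
    if (n > 1)
        goto loop;
    return result;
}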

Guarded do translation: First transforms the code into a do-while loop by using a conditional branch to skip over the loop if the initial test fails.

-O1

Equivalent do-while version:

Equivalent goto version:
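A sketch of the guarded-do translation for the same function:

long fact_while_gd_goto(long n) {
    long result = 1;
    if (n <= 1)
        goto done;   /* guard: skip the loop if the initial test fails */
loop:
    result *= n;
    n = n - 1;
    if (n != 1)
        goto loop;
done:
    return result;
}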

The compiler can often optimize the initial test, for example, determining that the test condition will always hold.

Example:

For Loops

The general for translation:

for version:

Equivalent while version:

Equivalent goto version:

Following the jump-to-middle strategy:

Following the guarded-do strategy:

Examples:

for version:
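The book's example, reproduced from memory as a sketch:

long fact_for(long n) {
    long i;
    long result = 1;
    for (i = 2; i <= n; i++)   /* init: i = 2; test: i <= n; update: i++ */
        result *= i;
    return result;
}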

Components:

Equivalent while version

Equivalent goto version (jump-to-middle):

Corresponding assembly-language code (-Og)

3.6.8 Switch Statements

switch statements allow jump table implementation.

Jump table: An array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i.

The code performs a jump table reference using the switch index to determine the jump target.

Advantage over if-else: The time taken to perform the switch is independent of the number of switch cases.

Used by gcc when there are a number of cases and they span a small range of values.

Example:

switch_eg and switch_eg_impl
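A sketch of the book's switch_eg, i.e., part (a), reproduced from memory:

void switch_eg(long x, long n, long *dest) {
    long val = x;
    switch (n) {
    case 100:
        val *= 13;
        break;
    case 102:
        val += 10;
        /* fall through */
    case 103:
        val += 11;
        break;
    case 104:
    case 106:
        val *= val;
        break;
    default:
        val = 0;
    }
    *dest = val;
}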

Features of (a)

Case labels that do not span a contiguous range

Cases with multiple labels

Cases that fall through to other cases

Assembly code for switch statement

gcc's && operator (a language extension): creates a pointer to a code location.

The case range is shifted so that it starts at 0 (by subtracting the smallest label from the switch index).

Treating the shifted index as unsigned lets a single comparison catch values both below and above the range.

This simplifies the branching possibilities.

Key step in executing: To access a code location through the jump table.

In (b), computed goto (gcc's extension): goto *jt[index];

In assembly code for switch, indirect jmp

Jump table

In (b), an array

Duplicate cases: Same code label

Missing cases: Default label

In assembly code, declarations

.rodata (for "read-only data"): Segment of the object-code file

A sequence of 7 "quad" words:

Value of each = address associated with the labels.

.L4: The start of this allocation.

The address associated: Base for the indirect jump.

The use of a jump table allows a very efficient way to implement a multiway branch.

3.7 Procedures

Procedures: Key abstraction

Suppose procedure P calls procedure Q, and Q then executes and returns back to P.

Mechanisms:

Passing control.

The program counter must be set to the starting address of the code for Q upon entry and then set to the instruction in P following the call to Q upon return.

Passing data.

P must be able to provide one or more parameters to Q, and Q must be able to return a value back to P.

Allocating and deallocating memory.

Q may need to allocate space for local variables when it begins and then free that storage before it returns.

The x86-64 implementation: Minimalist strategy

Only as much as is required.

3.7.1 The Run-Time Stack

Storage management using a stack:

LIFO discipline

The stack and the registers store the information required for:

Passing control and data

Allocating memory

The x86-64 stack

Grows toward lower addresses

%rsp points to the top element

Storing data on and retrieving it from the stack:

pushq

popq

Allocating and deallocating space:

Decrement %rsp

Increment %rsp

The procedure's stack frame: The region where a procedure allocates space on the stack

When it requires storage beyond what it can hold in registers

General structure:

The frame for the executing procedure is always at the top.

Frame for the caller

Portions:

Arguments 7-n

If required by the callee

The return address: where within the caller the program should resume execution once the callee returns.

Pushed onto the stack by the call instruction in the caller.

Frame for the callee

Allocated by extending the current stack boundary.

Portions:

Saved registers

Local variables

Argument build area

Sizes

Fixed-size frames

Allocated at the beginning of the procedure.

Variable-size frames

Procedures allocate only the portions of stack frames they require.

A leaf procedure: All of the local variables can be held in registers and the function does not call any other functions.

3.7.2 Control Transfer

call and ret:

call: Pushes the return address onto the stack and sets the PC to the beginning of the callee.

The return address: The address of the instruction immediately following call.

ret: Pops the return address off the stack and sets the PC to it.

The general forms: call Label (direct), call *Operand (indirect), and ret.

In the disassembly by objdump: callq and retq

'q': x86-64 versions

call

The target: The address of the instruction where the called procedure starts.

Either direct or indirect

The target of a direct call: Label

The target of an indirect call: *Operand

Example: The execution of call and ret for multstore and main

Excerpts of disassembly:

More detailed example: Detailed execution of top and leaf

The standard call/return mechanism conveniently matches the LIFO memory management discipline.

3.7.3 Data Transfer

Data passing

Calls may involve passing data as arguments

Returning may also involve returning a value

Mostly via registers

Passing integral (i.e., integer and pointer) arguments

Passing up to 6 arguments via registers

The registers, in argument order: %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (the return value goes in %rax).

Passing arguments 7–n on the stack (n > 6)

Stack top: Argument 7.

All data sizes are rounded up to be multiples of 8.

The portion "Argument build area": Space allocated within a procedure's stack frame for these arguments.

Example:

3.7.4 Local Storage on the Stack

Common cases where local data must be stored in memory:

Not enough registers

The address operator '&'

Arrays or structures

The portion of the stack frame labeled "Local variables": space allocated by a procedure on the stack frame by decrementing the stack pointer.

Example of the handling of '&'

The run-time stack provides a simple mechanism for allocating local storage when it is required and deallocating it when the function completes.

More complex example:

3.7.5 Local Storage in Registers

The set of program registers acts as a single resource shared by all of the procedures.

When one procedure (the caller) calls another (the callee), the callee must not overwrite any register value that the caller planned to use later.

The uniform set of conventions for register usage:

Callee-saved registers: %rbx, %rbp, and %r12-%r15.

Their values must be preserved by the callee:

By not changing it at all

By pushq-ing it, altering it, and then popq-ing it before ret.

The portion "Saved registers": Created by the pushing of register values.

With this convention, the caller can safely store a value in a callee-saved register, call, and then use it without risk of corruption.

Caller-saved registers: %rax, %rdi, %rsi, %rdx, %rcx, and %r8-%r11

Can be modified by any function.

Example: P

3.7.6 Recursive Procedures

Procedures can call themselves recursively

Provided by the stack discipline

Example: rfact

Mechanism: Each invocation of a function has its own private storage for state information

Return address

Callee-saved registers

The stack discipline of allocation and deallocation naturally matches the call-return ordering of functions.

Even works for mutual recursion

E.g., when P calls Q, which in turn calls P.

3.8 Array Allocation and Access

Pointers to elements within arrays are translated into address computations in machine code.

3.8.1 Basic Principles

Declaration

For data type T and integer constant N

x_A: the starting location

2 effects:

Allocates a contiguous region of L · N bytes in memory

L: the size (in bytes) of data type T

Introduces an identifier A that can be used as a pointer to x_A.

For 0 ≤ i ≤ N−1, A[i] is at address

&A[i] = x_A + L · i

Examples

Declarations:

Arrays generated:

Array access

Example: Evaluating E[i]

Suppose E: an int array

E in %rdx, and i in %rcx

Address computation:
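A sketch of the book's address computation:

movl (%rdx,%rcx,4), %eax    # read E[i] = M[x_E + 4*i] into %eax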

3.8.2 Pointer Arithmetic

Arithmetic on pointers: if a pointer p of type T * has value x_p, then p + i has value x_p + L · i, where L is the size of T.

The generation and dereferencing of pointers: '&' and '*'.

Example:

Expressions involving E each with an assembly-code implementation

E in %rdx, and i in %rcx

Result: Data in %eax, and pointers in %rax

3.8.3 Nested Arrays

The general principles hold even for arrays of arrays

Example

Declaration

Elements order in memory

Row-major

A[0], followed by A[1], and so on.

Illustration:

Consequence of the nested declaration.

To access elements of multidimensional arrays

Compute the offset

The generated mov instruction computes the address with the array start x_D as base, C · i + j as index, and L as the scale factor.

In general

Declaration: T D[R][C]

D[i][j] is at address

&D[i][j] = x_D + L(C · i + j)

L: the size of data type T in bytes

Example

Declaration:

Copying A[i][j] to %eax:
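A sketch assuming the book's int A[5][3] (so C = 3, L = 4), with A in %rdi, i in %rsi, and j in %rdx:

leaq (%rsi,%rsi,2), %rax    # 3*i
leaq (%rdi,%rax,4), %rax    # x_A + 12*i
movl (%rax,%rdx,4), %eax    # read M[x_A + 12*i + 4*j]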

3.8.4 Fixed-Size Arrays

Optimizing code operating on multidimensional arrays of fixed size.

Example: fix_prod_ele (-O1)

Declaration of fix_matrix

fix_prod_ele and fix_prod_ele_opt

Optimizations:

Generating Aptr

Generating Bptr

Generating Bend

Assembly code

3.8.5 Variable-Size Arrays

Variable-size arrays

Array dimension expressions: Computed as the array is being allocated

Declaration

expr1 and expr2 are evaluated as the declaration is encountered

Example:

var_ele: Access A[i][j] of A[n][n]

Code
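The book's var_ele, reproduced from memory as a sketch (n in %rdi, A in %rsi, i in %rdx, j in %rcx):

int var_ele(long n, int A[n][n], long i, long j) {
    return A[i][j];
}

/* Generated code (as comments):
   imulq %rdx, %rdi            n * i
   leaq (%rsi,%rdi,4), %rax    x_A + 4*(n*i)
   movl (%rax,%rcx,4), %eax    read M[x_A + 4*(n*i) + 4*j]
   ret
*/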

&A[i][j] = x_A + 4(n · i) + 4j = x_A + 4(n · i + j)

Must use imul to compute n · i: can incur a significant performance penalty, but is unavoidable here.

Optimized when referenced within a loop:

Optimize the index computations by exploiting the regularity of the access patterns.

Example: var_prod_ele

var_prod_ele and var_prod_ele_opt

Assembly code for the loop

3.9 Heterogeneous Data Structures

3.9.1 Structures

3.9.2 Unions

3.9.3 Data Alignment

3.10 Combining Control and Data in Machine-Level Programs

3.10.1 Understanding Pointers

3.10.2 Life in the Real World: Using the gdb Debugger

3.10.3 Out-of-Bounds Memory References and Buffer Overflow

3.10.4 Thwarting Buffer Overflow Attacks

3.10.5 Supporting Variable-Size Stack Frames

3.11 Floating-Point Code

3.11.1 Floating-Point Movement and Conversion Operations

3.11.2 Floating-Point Code in Procedures

3.11.3 Floating-Point Arithmetic Operations

3.11.4 Defining and Using Floating-Point Constants

3.11.5 Using Bitwise Operations in Floating-Point Code

3.11.6 Floating-Point Comparison Operations

3.11.7 Observations about Floating-Point Code

3.12 Summary

In this chapter, we have peered beneath the layer of abstraction provided by the C language to get a view of machine-level programming. By having the compiler generate an assembly-code representation of the machine-level program, we gain insights into both the compiler and its optimization capabilities, along with the machine, its data types, and its instruction set. In Chapter 5, we will see that knowing the characteristics of a compiler can help when trying to write programs that have efficient mappings onto the machine. We have also gotten a more complete picture of how the program stores data in different memory regions. In Chapter 12, we will see many examples where application programmers need to know whether a program variable is on the run-time stack, in some dynamically allocated data structure, or part of the global program data. Understanding how programs map onto machines makes it easier to understand the differences between these kinds of storage.

Machine-level programs, and their representation by assembly code, differ in many ways from C programs. There is minimal distinction between different data types. The program is expressed as a sequence of instructions, each of which performs a single operation. Parts of the program state, such as registers and the run-time stack, are directly visible to the programmer. Only low-level operations are provided to support data manipulation and program control. The compiler must use multiple instructions to generate and operate on different data structures and to implement control constructs such as conditionals, loops, and procedures. We have covered many different aspects of C and how it gets compiled. We have seen that the lack of bounds checking in C makes many programs prone to buffer overflows. This has made many systems vulnerable to attacks by malicious intruders, although recent safeguards provided by the run-time system and the compiler help make programs more secure.

We have only examined the mapping of C onto x86-64, but much of what we have covered is handled in a similar way for other combinations of language and machine. For example, compiling C++ is very similar to compiling C. In fact, early implementations of C++ first performed a source-to-source conversion from C++ to C and generated object code by running a C compiler on the result. C++ objects are represented by structures, similar to a C struct. Methods are represented by pointers to the code implementing the methods. By contrast, Java is implemented in an entirely different fashion. The object code of Java is a special binary representation known as Java byte code. This code can be viewed as a machine-level program for a virtual machine. As its name suggests, this machine is not implemented directly in hardware. Instead, software interpreters process the byte code, simulating the behavior of the virtual machine. Alternatively, an approach known as just-in-time compilation dynamically translates byte code sequences into machine instructions. This approach provides faster execution when code is executed multiple times, such as in loops. The advantage of using byte code as the low-level representation of a program is that the same code can be "executed" on many different machines, whereas the machine code we have considered runs only on x86-64 machines.

Chapter 4

Chapter 5

Chapter 6 The Memory Hierarchy

6.1 Storage Technologies

6.1.1 Random Access Memory

6.1.2 Disk Storage

6.1.3 Solid State Disks

6.1.4 Storage Technology Trends

6.2 Locality

6.2.1 Locality of References to Program Data

6.2.2 Locality of Instruction Fetches

6.2.3 Summary of Locality

6.3 The Memory Hierarchy

6.3.1 Caching in the Memory Hierarchy

6.3.2 Summary of Memory Hierarchy Concepts

6.4 Cache Memories

6.4.1 Generic Cache Memory Organization

6.4.2 Direct-Mapped Caches

6.4.3 Set Associative Caches

6.4.4 Fully Associative Caches

6.4.5 Issues with Writes

6.4.6 Anatomy of a Real Cache Hierarchy

6.4.7 Performance Impact of Cache Parameters

6.5

6.6

6.7 Summary

Part II Running Programs on a System

The interaction between your programs and the hardware.

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 9 Virtual Memory

9.1 Physical and Virtual Addressing

9.2 Address Spaces

9.3 VM as a Tool for Caching

9.3.1 DRAM Cache Organization

9.3.2 Page Tables

9.3.3 Page Hits

9.3.4 Page Faults

9.3.5 Allocating Pages

9.3.6 Locality to the Rescue Again

9.4 VM as a Tool for Memory Management

9.5 VM as a Tool for Memory Protection

9.6 Address Translation

9.6.1 Integrating Caches and VM

9.6.2 Speeding Up Address Translation with a TLB

9.6.3 Multi-Level Page Tables

9.6.4 Putting It Together: End-to-End Address Translation

9.7 Case Study: The Intel Core i7/Linux Memory System

9.7.1 Core i7 Address Translation

9.7.2 Linux Virtual Memory System

9.8 Memory Mapping

9.8.1 Shared Objects Revisited

9.8.2 The fork Function Revisited

9.8.3 The execve Function Revisited

9.8.4 User-Level Memory Mapping with the mmap Function

9.9

9.10 Garbage Collection

9.10.1 Garbage Collector Basics

9.10.2 Mark&Sweep Garbage Collectors

9.10.3 Conservative Mark&Sweep for C Programs

9.11 Common Memory-Related Bugs in C Programs

9.11.1 Dereferencing Bad Pointers

9.11.2 Reading Uninitialized Memory

9.11.3 Allowing Stack Buffer Overflows

9.11.4 Assuming That Pointers and the Objects They Point to Are the Same Size

9.11.5 Making Off-by-One Errors

9.11.6 Referencing a Pointer Instead of the Object It Points To

9.11.7 Misunderstanding Pointer Arithmetic

9.11.8 Referencing Nonexistent Variables

9.11.9 Referencing Data in Free Heap Blocks

9.11.10 Introducing Memory Leaks

9.12 Summary

Part III Interaction and Communication between Programs

The basic I/O services provided by Unix operating systems and how to use these services to build applications.