CS 208 s22 — Compilation, Memory Layout, and the Stack
1 Compilation
- We will start by zooming out a little from assembly to look briefly at the whole compilation process
- Let's say our code is in two files: p1.c and p2.c
- We compile our program with the command: gcc -Og p1.c p2.c -o p
- this produces a binary file p containing the resulting machine code
- we can run the program with the command: ./p
- compilation consists of 3 primary phases:
- translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
- translating assembly to binary object files (each source file is assembled separately)
- linking object files (and any standard library files) together into the final executable
1.1 Producing Machine Language
- how do we go from C code to machine code? How do we get the information we need?
- simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
- but consider
- conditional jump
- accessing static data
- call
- all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
- this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values
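To make these "holes" concrete, here is a minimal sketch of what the two source files might contain. The contents are hypothetical (the notes do not show p1.c or p2.c), and the names counter, bump, and use_bump are invented for illustration. Both halves are shown in one listing so that it also compiles as a single file.

```c
/* --- hypothetical contents of p2.c: the definitions --- */
long counter = 0;        /* static data: its final address is chosen at link time */

long bump(long by) {     /* procedure: its final address is chosen at link time */
    counter += by;
    return counter;
}

/* --- hypothetical contents of p1.c: declarations and uses only --- */
extern long counter;     /* declared here, defined in p2.c */
long bump(long by);      /* declared here, defined in p2.c */

long use_bump(void) {
    bump(2);             /* call: target unknown while p1.c is assembled on its own */
    bump(3);
    return counter;      /* static data access: address also filled in by the linker */
}
```

If the two halves really were separate files, assembling p1.c alone (for example with gcc -Og -c p1.c) would leave the call target and the address of counter as placeholders in p1.o, and the linking step would fill them in once it knows where bump and counter end up in the executable.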
2 Stack Operations
- pushq and popq instructions push/pop quad words onto/off of the program stack
- pushq has a source operand, popq has a destination
- the stack is a region of memory used to facilitate local variables and procedure calls
- top of the stack is the lowest memory address, and is conventionally drawn at the bottom
- each of these instructions combines a data move (copy the source to a memory location for push, copy a value in memory to the destination for pop) with a modification of the stack pointer
- %rsp, the stack pointer, always contains the address of the top of the stack
- %rsp is either decremented by 8 (push, stack grows down) or incremented by 8 (pop)
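The behavior described above can be sketched in C by modeling memory as a byte array and %rsp as an integer. This is only an illustrative model, not how the hardware is implemented; StackSim, sim_pushq, sim_popq, and SIM_MEM_BYTES are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_MEM_BYTES 0x400  /* hypothetical size of the simulated memory */

/* Hypothetical model: a byte-addressed memory plus the %rsp register. */
typedef struct {
    uint8_t  mem[SIM_MEM_BYTES];
    uint64_t rsp;            /* address of the current top of the stack */
} StackSim;

/* pushq src: decrement %rsp by 8, then store the 8-byte source there. */
void sim_pushq(StackSim *s, uint64_t src) {
    assert(s->rsp >= 8 && s->rsp <= SIM_MEM_BYTES);
    s->rsp -= 8;                                  /* stack grows down */
    memcpy(&s->mem[s->rsp], &src, sizeof src);
}

/* popq dest: load the 8 bytes at %rsp, then increment %rsp by 8. */
uint64_t sim_popq(StackSim *s) {
    uint64_t dest;
    assert(s->rsp + 8 <= SIM_MEM_BYTES);
    memcpy(&dest, &s->mem[s->rsp], sizeof dest);
    s->rsp += 8;                                  /* stack shrinks */
    return dest;                                  /* memory is left untouched */
}
```

For instance, starting with rsp at 0x208, sim_pushq(&s, 0x20) leaves rsp at 0x200, and a following sim_popq(&s) returns 0x20 and restores rsp to 0x208 without clearing the bytes that were popped.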
3 Procedures
3.1 Are Jumps Enough?
- could we implement procedure calls using jumps?
- maybe we could use jmp to go into a function call, and then return by jmp-ing to a label right after the call
    func1:
        ...
        jmp func2
    back:
        ...
    done:
    func2:
        ...
        jmp back
    done:
- but what if func1 calls func2 twice?
    func1:
        ...
        jmp func2
    back1:
        ...
        jmp func2
    back2:
        ...
    done:
    func2:
        ...
        jmp back?   // which label do we jump to?? Have to choose at compile time
    done:
- we would need to compile a different version of func2 for every call, so that we could jump back to the right place
- there's got to be a better way…
3.2 Overview
- mechanisms needed to facilitate procedures (e.g., procedure P calls procedure Q, then Q executes and returns back to P):
- passing control: instruction pointer (%rip) must be set to the start of Q (call) and then set to the instruction following the call to Q in P (return)
- passing data: P has to provide arguments to Q and Q has to return a value to P
- allocating and deallocating memory: Q needs to acquire space for local variables and then free that space
- requires separate storage per call (not just per procedure)
3.3 The Run-Time Stack
- a stack data structure (last-in, first-out) is a natural fit for managing run-time procedure memory
- only the most recent procedure call needs to allocate space for local variables or make a new procedure call
- when a procedure returns, we want to free the memory used by this most recent call
- hence it's a natural fit to push and pop procedure data from a stack
- the space a procedure allocates on the stack is called that procedure's stack frame
- x86-64 only allocates what a procedure actually needs
- if a procedure's local variables can all be held in registers and it calls no other procedures, no stack frame is needed
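The need for separate storage per call (not just per procedure) can be observed from C with a recursive function that records where its local variable lives at each depth. This is a sketch under assumptions: record_frames is a hypothetical helper, volatile is used only to force the local into memory, and the C language itself does not promise any particular ordering of the addresses (on x86-64 they typically decrease as the recursion gets deeper).

```c
#include <stdint.h>

/* Hypothetical helper: records the address of each call's local variable. */
long record_frames(int depth, int max_depth, uintptr_t addrs[]) {
    volatile long local = depth;        /* volatile keeps it in memory */
    long below = 0;

    addrs[depth] = (uintptr_t)&local;   /* this call's own storage */
    if (depth + 1 < max_depth) {
        below = record_frames(depth + 1, max_depth, addrs);
    }
    /* local is read after the nested call returns, so it must survive it */
    return local + below;
}
```

Calling record_frames(0, 4, addrs) leaves four different addresses in addrs, one per active call, even though all four calls run the same procedure.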
3.4 Control Transfer
- processor needs to know where it should resume execution after a procedure call returns
- the call instruction pushes the return address (the address of the instruction immediately following the call) onto the stack (part of the calling procedure's stack frame) and sets the instruction pointer (%rip) to the start of the new procedure
- call operand can either be direct (a label) or indirect (* followed by one of the standard operand formats)
- the ret instruction pops the return address off the stack and copies it to %rip
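As a rough model of call and ret, the sketch below extends the idea of a simulated stack with a simulated %rip. It is illustrative only: CpuSim, sim_call, and sim_ret are hypothetical names, and the length of the call instruction is passed in as a parameter (an assumption made so the model can compute the address of the following instruction).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CPU_MEM_BYTES 256    /* hypothetical size of the simulated memory */

/* Hypothetical model of the state that call and ret touch. */
typedef struct {
    uint8_t  mem[CPU_MEM_BYTES];
    uint64_t rsp;            /* stack pointer */
    uint64_t rip;            /* address of the instruction being executed */
} CpuSim;

/* call target: push the address of the next instruction, then jump. */
void sim_call(CpuSim *m, uint64_t target, uint64_t call_len) {
    uint64_t return_addr = m->rip + call_len;
    assert(m->rsp >= 8 && m->rsp <= CPU_MEM_BYTES);
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &return_addr, sizeof return_addr);
    m->rip = target;
}

/* ret: pop the return address off the stack into %rip. */
void sim_ret(CpuSim *m) {
    uint64_t return_addr;
    assert(m->rsp + 8 <= CPU_MEM_BYTES);
    memcpy(&return_addr, &m->mem[m->rsp], sizeof return_addr);
    m->rsp += 8;
    m->rip = return_addr;
}
```

Because the return address is stored at run time rather than fixed at compile time, the same procedure can be called from address 0x10 and later from 0x30 and return to the right place each time, which is exactly what the jmp-only approach could not do.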
3.4.1 Example
- At the start of a function, the return address will be stored at the memory location %rsp points to (the top of the stack).
- In gdb, p $rsp will show something like 0x7fffffffdfa8
- We can dereference the pointer with x $rsp, which shows that an address like 0x004012e9 is stored at the top of the stack. This is the return address.
- If we dereference that address, and tell gdb to interpret the bytes as an instruction (x /i 0x004012e9), we will see the instruction following the original call (e.g., 0x4012e9 <main+82>: mov %rax,%rdi). This makes sense as the instruction after the callq instruction is where the program should return to.
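The same return address can also be read from inside a running C program, which gives a value to compare against what gdb shows. The sketch below relies on __builtin_return_address, a GCC/Clang extension rather than standard C; current_return_address is a hypothetical name, and noinline is requested so that the function keeps a real call site.

```c
/* Hypothetical helper, GCC/Clang only: returns the address this call
   will return to, i.e. the value that call pushed onto the stack. */
__attribute__((noinline))
void *current_return_address(void) {
    return __builtin_return_address(0);   /* level 0 = the current function */
}
```

Printing the result with printf("%p\n", ...) and then running x /i on that address in gdb should show the instruction right after the call to current_return_address.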
4 Exercise
Top of the stack is at 0x200, and the 8 bytes stored there contain 0x20. What changes about registers or memory as a result of popq %r8? [1]
5 Practice
CSPP practice problem 3.32 (p. 244)
Footnotes:
[1] popq copies the top 8 bytes of the stack to %r8, so %r8 will now contain 0x20. This also "pops" these 8 bytes off the stack, so 8 is added to %rsp, making it 0x208 (remember, the stack grows down to lower memory addresses, so moving %rsp up in memory shrinks the stack). Nothing changes in terms of values stored in memory: %rsp tracks the top of the stack, but the system doesn't zero-out popped values or anything like that.