CS 208 s22 — Introduction to x86-64 Assembly
You can access a pdf of the slides here.
1 Review
Before we dive in, a quick review of some C concepts (answers in footnote 1)
void swap(long* xp, long* yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
- what type of data are these parameters, what size?
- what is the effect of this function?
How would we call swap?
long a = 10;
long b = 42;
swap(???);
2 Introduction
- We’ve covered how and where data is stored and how numbers are represented
- Starting today: what operations the CPU uses to actually execute your program
- Start of digital programmable computers
- Colossus in the 1940s (code breaking)
- manual input of machine code
- later that decade saw the first assembly languages, providing text representation of machine code
- In the decades since, computer scientists have created layers of abstraction
- much nicer to program in C than machine code
- why study assembly—shouldn’t we stand on the shoulders of giants?
- understand optimizations
- understand exactly how data is accessed
- understand and defend against malware
3 Assembly Programming
- Build up assembly programming picture
- CPU
- PC, registers
- Memory
- Code, data
- remember: memory is one huge array of bytes, each with unique address
- no types, just contiguous bytes of binary data
- Incomplete, we’ll add pieces in future lectures
- Just as the details of this picture are dependent on the hardware, the assembly language operating on it will be specific to the hardware
- instruction set architecture
- CPU
- Registers
- Small amount of data that CPU can access extremely quickly
- Unlike locations in memory, registers have names not addresses
- begin with % (e.g., %rdi)
- slide with the 16 x86-64 registers; %rsp is reserved as the stack pointer
- not necessary to memorize all these names, listed here for reference
- memory vs registers
- addresses vs names
- big (8GB) vs small (16 x 8B)
- slow (50ns) vs fast (<1ns)
- dynamic (can grow as needed while the program runs) vs static (fixed number in hardware)
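To make the "memory is one huge array of bytes" picture concrete, here is a minimal C sketch that views any object as raw, untyped bytes. The helper names byte_at and dump_bytes are made up for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return the byte stored `offset` bytes past `start`.
   Memory itself has no types: any object can be read as plain bytes. */
unsigned char byte_at(const void *start, size_t offset) {
    assert(start != NULL);
    const unsigned char *bytes = (const unsigned char *)start;
    return bytes[offset];
}

/* Print every byte of an object next to the address it lives at. */
void dump_bytes(const void *start, size_t size) {
    for (size_t i = 0; i < size; i++) {
        const unsigned char *addr = (const unsigned char *)start + i;
        printf("%p: 0x%02x\n", (void *)addr, byte_at(start, i));
    }
}
```

Calling dump_bytes(&x, sizeof x) on a long x prints one line per byte, each with its own consecutive address. The order of the bytes depends on the machine's endianness.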
4 swap example
- see the slides here: swap-slides.pdf
- what happens when you only use one temporary variable in the C implementation of swap?
- i.e., *xp = *yp instead of lines 2 and 3
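One way to explore this question is to write both versions in C and compare them. The sketch below assumes "lines 2 and 3" means the second and third statements in the body of swap. The function names are hypothetical. At the C level the two versions behave the same. Since mov cannot go memory to memory, a compiler would be expected to route *yp through a register in either case, though this has not been checked against a particular compiler here.

```c
#include <assert.h>

/* Two temporaries, as in the review section. */
void swap_two_temps(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}

/* One temporary: statements 2 and 3 replaced by *xp = *yp. */
void swap_one_temp(long *xp, long *yp) {
    long t0 = *xp;
    *xp = *yp;   /* in assembly: load *yp into a register, then store it */
    *yp = t0;
}

/* Hypothetical check that both versions exchange the two values. */
void compare_swaps(void) {
    long a = 10, b = 42;
    long c = 10, d = 42;
    swap_two_temps(&a, &b);
    swap_one_temp(&c, &d);
    assert(a == c && b == d);
}
```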
4.1 Moving Data
- Moving data is a fundamental operation: mov_ src, dest
- missing letter _ specifies size of data
- b(yte) = 1 byte, w(ord) = 2 bytes, l(ong word) = 4 bytes, q(uad word) = 8 bytes
- “word” is 2 bytes (16 bits) to be backwards compatible with 8086 programs (16-bit predecessor to x86 hardware)
- operand types
- immediate (constant integer value): $0x400, $-533
- register (name of 1 of 16 registers): %rax, %r13
- memory (consecutive bytes of memory starting at given address): (%rax), full form is D(reg_base, reg_index, s)
- refers to memory address reg_base + reg_index * s + D
- D and s are immediate values, s can only be 1, 2, 4, or 8 (why?)
- Various components can be omitted, see book section 3.4.1
- cannot mov memory to memory: how would you do it?
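The address computation for a memory operand D(reg_base, reg_index, s) can be sketched in C. This is an illustration, not how the hardware is implemented. The function name effective_address is hypothetical, and an omitted component is modeled as base = 0, index = 0, s = 1, or D = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Address named by D(reg_base, reg_index, s):
   reg_base + reg_index * s + D. */
uint64_t effective_address(int64_t disp, uint64_t base,
                           uint64_t index, uint64_t scale) {
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)disp;
}
```

With the register values used in the practice problem answers (%rax = 0x100, %rcx = 0x1, %rdx = 0x3), effective_address(260, 0x1, 0x3, 1) gives 0x108 for 260(%rcx, %rdx), and effective_address(0, 0x100, 0x3, 4) gives 0x10C for (%rax, %rdx, 4).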
5 Practice
- CSPP practice problems 3.1 (p. 182) and 3.5 (p. 189)
What is the value of each of the operands? (answers in footnote 2)
- Write C code for mystery() (one possibility in footnote 3)
void mystery(long* xp, long* yp, long *zp) // xp in %rdi, yp in %rsi, zp in %rdx
mystery:
    movq (%rdi), %r8
    movq (%rsi), %rcx
    movq (%rdx), %rax
    movq %r8, (%rsi)
    movq %rcx, (%rdx)
    movq %rax, (%rdi)
    ret
Footnotes:
1
- what type of data are these parameters, what size? xp and yp are pointers to long. As pointers, they each take 8 bytes.
- what is the effect of this function? swap switches the values that xp and yp point to. They still point to the same locations in memory, but the values at those locations have been switched.
- How would we call swap? Use the & operator to get the address of a and b. After the call to swap, a would be 42 and b would be 10.
long a = 10;
long b = 42;
swap(&a, &b);
2
- $0x108 is 0x108
- 260(%rcx, %rdx) is the value at the memory address 260 + %rcx + %rdx = 260 + 0x1 + 0x3. 260 is 0x104 (256 + 4), so the computed address is 0x108. The value at that address is 0x13.
- (%rax, %rdx, 4) is the value at the memory address %rax + %rdx * 4 = 0x100 + 0x3 * 4. 0x3 * 4 is 12 or 0xC, so the computed address is 0x10C. The value at that address is 0x11.
3
There are many different C implementations of mystery that would be consistent with the given assembly; here is one possibility:
void mystery(long *xp, long *yp, long *zp) {
    long t0 = *xp;
    long t1 = *yp;
    long t2 = *zp;
    *yp = t0;
    *zp = t1;
    *xp = t2;
}
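As a quick check of the implementation above, a small test driver shows that mystery rotates the three values: the old *xp ends up in *yp, the old *yp in *zp, and the old *zp in *xp. The driver name mystery_demo and the sample values 1, 2, 3 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

void mystery(long *xp, long *yp, long *zp) {
    long t0 = *xp;
    long t1 = *yp;
    long t2 = *zp;
    *yp = t0;
    *zp = t1;
    *xp = t2;
}

/* Hypothetical driver: rotate three sample values and report them. */
void mystery_demo(void) {
    long x = 1, y = 2, z = 3;
    mystery(&x, &y, &z);
    assert(x == 3 && y == 1 && z == 2);
    printf("x = %ld, y = %ld, z = %ld\n", x, y, z);  /* x = 3, y = 1, z = 2 */
}
```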