CS 208 s22 — Introduction to x86-64 Assembly
You can access a pdf of the slides here.
1 Review
Before we dive in, a quick review of some C concepts (see footnote 1):
    void swap(long* xp, long* yp) {
        long t0 = *xp;
        long t1 = *yp;
        *xp = t1;
        *yp = t0;
    }
- what type of data are these parameters, what size?
- what is the effect of this function?
How would we call swap?
    long a = 10;
    long b = 42;
    swap(???);
2 Introduction
- We’ve covered how and where data is stored and how numbers are represented
- Starting today: what operations does the CPU use to actually execute your program?
- Start of digital programmable computers
  - Colossus in the 1940s (code breaking)
  - manual input of machine code
  - later that decade saw the first assembly languages, providing a text representation of machine code
- In the decades since, computer scientists have created layers of abstraction
  - much nicer to program in C than machine code
  - why study assembly? shouldn’t we stand on the shoulders of giants?
    - understand optimizations
    - understand exactly how data is accessed
    - prevent malware
3 Assembly Programming
- Build up assembly programming picture
  - CPU
    - PC (program counter), registers
  - Memory
    - Code, data
    - remember: memory is one huge array of bytes, each with a unique address
    - no types, just contiguous bytes of binary data
  - Incomplete, we’ll add pieces in future lectures
- Just as the details of this picture depend on the hardware, the assembly language operating on it is specific to the hardware
  - instruction set architecture
- CPU
  - Registers
    - Small amount of data that the CPU can access extremely quickly
    - Unlike locations in memory, registers have names, not addresses
      - names begin with % (e.g., %rdi)
    - slide with the 16 x86-64 registers; %rsp is reserved (stack pointer)
      - not necessary to memorize all these names, listed here for reference
  - memory vs registers
    - addresses vs names
    - big (~8 GB) vs small (16 x 8 B)
    - slow (~50 ns) vs fast (<1 ns)
    - dynamic (can grow as the program runs) vs static (fixed number in hardware)
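To make the "one huge array of bytes, no types" picture concrete, here is a minimal C sketch. The helper names long_to_bytes and bytes_to_long are made up for this illustration and do not come from the slides. It copies the bytes of a long into a plain byte array and back, which suggests that the type is an interpretation we apply to the bytes rather than something memory itself records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the raw bytes of a long into buf, one byte
   per array slot. buf must have room for sizeof(long) bytes.
   Returns the number of bytes written. */
size_t long_to_bytes(long value, unsigned char *buf) {
    assert(buf != NULL);
    memcpy(buf, &value, sizeof value);
    return sizeof value;
}

/* Hypothetical helper: rebuild a long from the same bytes. Nothing in
   buf says "this is a long"; we choose to read it that way. */
long bytes_to_long(const unsigned char *buf) {
    assert(buf != NULL);
    long value;
    memcpy(&value, buf, sizeof value);
    return value;
}
```

Each element of buf sits at its own address, one byte after the previous element, which is the same view of memory the assembly code works with.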
4 swap example
- see the slides here: swap-slides.pdf
- what happens when you only use one temporary variable in the C implementation of swap?
  - i.e., *xp = *yp instead of lines 2 and 3 of the function body (long t1 = *yp; and *xp = t1;)
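For reference, the one-temporary variant described above might look like the following sketch. The name swap_one_temp is chosen here only to keep it distinct from swap; it is not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* swap written with a single temporary: *xp = *yp stands in for the
   two statements that used t1 in the original version. */
void swap_one_temp(long *xp, long *yp) {
    assert(xp != NULL && yp != NULL);
    long t0 = *xp;   /* save the value xp points to */
    *xp = *yp;       /* copy the value at yp directly into *xp */
    *yp = t0;        /* finish the exchange with the saved value */
}
```

From the caller's point of view the result matches the two-temporary version. Whether the generated assembly differs is the question the slides explore.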
4.1 Moving Data
- Moving data is a fundamental operation
- mov_ src, dest
  - missing letter _ specifies size of data
    - b(yte) = 1 byte, w(ord) = 2 bytes, l(ong word) = 4 bytes, q(uad word) = 8 bytes
    - “word” is 2 bytes (16 bits) to be backwards compatible with 8086 programs (16-bit predecessor to x86 hardware)
  - operand types
    - immediate (constant integer value)
      - $0x400, $-533
    - register (name of 1 of 16 registers)
      - %rax, %r13
    - memory (consecutive bytes of memory starting at given address)
      - (%rax), full form is D(reg_base, reg_index, s)
        - refers to memory address reg_base + reg_index * s + D
        - D and s are immediate values, s can only be 1, 2, 4, or 8 (why?)
        - Various components can be omitted, see book section 3.4.1
  - cannot mov memory to memory: how would you do it?
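As a rough C-level analogy for the size suffixes and for the memory-to-memory restriction, consider the sketch below. The typedef names (b_val, w_val, l_val, q_val) and the function copy_quad are hypothetical, introduced only for this example. It is one possible way to think about the problem, not the only answer to the question above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width C types whose sizes line up with the four mov suffixes. */
typedef int8_t  b_val;  /* b: 1 byte  */
typedef int16_t w_val;  /* w: 2 bytes */
typedef int32_t l_val;  /* l: 4 bytes */
typedef int64_t q_val;  /* q: 8 bytes */

/* Copy 8 bytes from one memory location to another in two steps,
   passing through a temporary. In assembly the temporary would be a
   register, so each step has at most one memory operand. */
void copy_quad(const q_val *src, q_val *dest) {
    assert(src != NULL && dest != NULL);
    q_val tmp = *src;  /* first move: memory -> temporary */
    *dest = tmp;       /* second move: temporary -> memory */
}
```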
5 Practice
- CSPP practice problems 3.1 (p. 182) and 3.5 (p. 189)
  - What is the value of each of the operands? (see footnote 2)
  - Write C code for mystery() (see footnote 3)
    void mystery(long* xp, long* yp, long *zp);
    // xp in %rdi, yp in %rsi, zp in %rdx

    mystery:
        movq (%rdi), %r8
        movq (%rsi), %rcx
        movq (%rdx), %rax
        movq %r8, (%rsi)
        movq %rcx, (%rdx)
        movq %rax, (%rdi)
        ret
Footnotes:
1
- what type of data are these parameters, what size?
  xp and yp are pointers to long. As pointers, they each take 8 bytes.
- what is the effect of this function?
  swap switches the values that xp and yp point to. They still point to the same locations in memory, but the values at those locations have been switched.
- How would we call swap?
  Use the & operator to get the address of a and b. After the call to swap, a would be 42 and b would be 10.
    long a = 10;
    long b = 42;
    swap(&a, &b);
2
- $0x108 is 0x108
- 260(%rcx, %rdx) is the value at the memory address 260 + %rcx + %rdx = 260 + 0x1 + 0x3. 260 is 0x104 (256 + 4), so the computed address is 0x108. The value at that address is 0x13
- (%rax, %rdx, 4) is the value at the memory address %rax + %rdx * 4 = 0x100 + 0x3 * 4. 0x3 * 4 is 12 or 0xC, so the computed address is 0x10C. The value at that address is 0x11.
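The address arithmetic in this footnote can be checked with a small C sketch. The functions effective_address and load are hypothetical helpers written for this example. The register values (%rax = 0x100, %rcx = 0x1, %rdx = 0x3) and the two memory values are the ones the footnote itself uses, and load knows about nothing else.

```c
#include <assert.h>
#include <stdint.h>

/* Address named by D(reg_base, reg_index, s):
   reg_base + reg_index * s + D. An omitted s counts as 1. */
uint64_t effective_address(int64_t d, uint64_t base, uint64_t index, uint64_t s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);
    return base + index * s + (uint64_t)d;
}

/* Hypothetical memory, limited to the two values the footnote states. */
uint64_t load(uint64_t address) {
    switch (address) {
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:
        assert(0 && "address not listed in this sketch");
        return 0;
    }
}
```

With these helpers, effective_address(260, 0x1, 0x3, 1) corresponds to 260(%rcx, %rdx), and effective_address(0, 0x100, 0x3, 4) corresponds to (%rax, %rdx, 4).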
3
There are many different C implementations of mystery that would be consistent with the given assembly; here is one possibility:
    void mystery(long *xp, long *yp, long *zp) {
        long t0 = *xp;
        long t1 = *yp;
        long t2 = *zp;
        *yp = t0;
        *zp = t1;
        *xp = t2;
    }
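As another possibility, the sketch below names each local variable after the register the assembly uses, so every C statement lines up with one movq instruction. The local names r8, rcx, and rax are just labels chosen for this illustration; a compiler is free to assign registers differently.

```c
#include <assert.h>
#include <stddef.h>

/* Same behavior as the version above, written so each statement
   mirrors one instruction. xp is in %rdi, yp in %rsi, zp in %rdx. */
void mystery(long *xp, long *yp, long *zp) {
    assert(xp != NULL && yp != NULL && zp != NULL);
    long r8  = *xp;  /* movq (%rdi), %r8  */
    long rcx = *yp;  /* movq (%rsi), %rcx */
    long rax = *zp;  /* movq (%rdx), %rax */
    *yp = r8;        /* movq %r8, (%rsi)  */
    *zp = rcx;       /* movq %rcx, (%rdx) */
    *xp = rax;       /* movq %rax, (%rdi) */
}
```

Calling it with a = 1, b = 2, c = 3 (passing &a, &b, &c) leaves a = 3, b = 1, c = 2: the three values rotate by one position.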