CS 208 s20 — Introduction to x86-64 Assembly
Here's a recorded lecture for today's topic. Some folks have expressed a strong preference for video over written notes, so I'm giving this a try. Please let me know if you have any issues viewing the video; I don't want slow or unreliable internet to be a barrier for anyone! I'd also appreciate any feedback you have on video vs detailed text notes, either via email or anonymous feedback on Moodle. My outline for this lecture is included below, and you can access a PDF of the slides here.
1 Introduction
- We’ve covered how and where data is stored and how numbers are represented
- Starting today: what operations does the CPU use to actually execute your program
- Start of digital programmable computers
- Colossus in the 1940s (code breaking)
- manual input of machine code
- later that decade saw the first assembly languages, providing a text representation of machine code
- In the decades since, computer scientists have created layers of abstraction
- much nicer to program in C than machine code
- why study assembly—shouldn’t we stand on the shoulders of giants?
- understand optimizations
- understand exactly how data is accessed
- understand and defend against malware
- Before we dive in, a quick review of some C concepts
void f(long* xp, long* yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
- what type of data are these parameters, what size?
- what is the effect of this function?
- How would we call?
long a = 10; long b = 42; f(???);
- today we will see how this function gets translated to assembly
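One way to work through the review questions, as a minimal sketch: both parameters are pointers to long, so each one holds an address. On x86-64 Linux an address is 8 bytes, but that is an assumption about the platform, not something C guarantees. The function exchanges the two longs, and the call passes the addresses of a and b. The wrapper name swap_demo is hypothetical and exists only to hold the call.

```c
#include <stdio.h>

/* f from the review above: exchanges the two longs that xp and yp point to. */
void f(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}

/* Hypothetical wrapper (not from the lecture) that fills in the f(???) call. */
void swap_demo(void) {
    long a = 10;
    long b = 42;

    /* Size of each parameter: a pointer, whatever size the platform uses. */
    printf("each parameter is %zu bytes\n", sizeof(long *));

    /* Pass the addresses of a and b so f can modify the caller's variables. */
    f(&a, &b);
    printf("a = %ld, b = %ld\n", a, b);
}
```

After the call, a holds 42 and b holds 10. Passing a and b directly would hand f copies of the values rather than the locations to update, and it would not compile without a warning or error.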
2 Assembly Programming
- Build up assembly programming picture
- CPU
- PC, registers
- Memory
- Code, data
- remember: memory is one huge array of bytes, each with unique address
- no types, just contiguous bytes of binary data
- This picture is incomplete; we’ll add pieces in future lectures
- Just as the details of this picture are dependent on the hardware, the assembly language operating on it will be specific to the hardware
- this hardware-specific interface is the instruction set architecture (ISA)
- CPU
- Registers
- Small amount of data that CPU can access extremely quickly
- Unlike locations in memory, registers have names not addresses
- names begin with % (e.g., %rdi)
- slide with the 16 x86-64 registers; %rsp is reserved for the stack pointer
- not necessary to memorize all these names, listed here for reference
- memory vs registers
- addresses vs names
- big (8GB) vs small (16 x 8B)
- slow (50ns) vs fast (<1ns)
- dynamic (can grow as the program runs) vs static (fixed number built into the hardware)
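The point that memory is one huge array of untyped bytes can be seen from C itself. The sketch below is illustrative only: dump_bytes and load_long are hypothetical helper names, not part of the lecture. The first walks a region of memory one byte at a time, and the second reads consecutive bytes starting at an address and chooses to interpret them as a long.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each of the n bytes starting at start,
   one "address: value" pair per line. */
void dump_bytes(const void *start, size_t n) {
    const unsigned char *p = start;
    for (size_t i = 0; i < n; i++) {
        printf("%p: 0x%02x\n", (void *)(p + i), p[i]);
    }
}

/* Hypothetical helper: reads sizeof(long) consecutive bytes starting at
   addr and interprets them as a long. The bytes carry no type of their
   own; the interpretation comes from the code doing the read. */
long load_long(const unsigned char *addr) {
    long value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

Calling dump_bytes(&x, sizeof x) on a long x shows one address per byte. On x86-64, which is little-endian, the low-order byte of x appears at the lowest address.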
2.1 Moving Data
- Moving data is a fundamental operation
mov_ src, dest
- the missing letter _ specifies the size of the data: b(yte) = 1 byte, w(ord) = 2 bytes, l(ong word) = 4 bytes, q(uad word) = 8 bytes
- “word” is 2 bytes (16 bits) to be backwards compatible with 8086 programs (the 8086 is the 16-bit predecessor of today's x86-64 hardware)
- operand types
- immediate (constant integer value): $0x400, $-533
- register (name of 1 of 16 registers): %rax, %r13
- memory (consecutive bytes of memory starting at given address): (%rax); the full form is D(reg_base, reg_index, s)
- the full form refers to memory address reg_base + reg_index * s + D
- D and s are immediate values; s can only be 1, 2, 4, or 8 (why?)
- various components can be omitted; see book section 3.4.1
- quick check: if memory looks like this and registers look like this, what is %rax, and what is (%rax)?
- cannot mov memory to memory: how would you do it?
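The address arithmetic behind the full memory operand form can be sketched in C. This is only an illustration of the formula reg_base + reg_index * s + D; the helper names effective_address and element_address are hypothetical, and the register values in the worked example are made up. For instance, with %rdx = 0x100 and %rcx = 0x4, the operand 8(%rdx,%rcx,4) would refer to address 0x100 + 0x4 * 4 + 8 = 0x118.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes the address named by D(reg_base, reg_index, s).
   reg_base and reg_index stand in for register contents; s and d are constants. */
uintptr_t effective_address(uintptr_t reg_base, uintptr_t reg_index,
                            unsigned s, intptr_t d) {
    /* Unsigned arithmetic wraps, so a negative d subtracts as expected. */
    return reg_base + reg_index * s + (uintptr_t)d;
}

/* Hypothetical helper: address of arr[i], computed the way an operand of the
   form (reg_base, reg_index, 8) would be when long is 8 bytes:
   base address + i * sizeof(long). */
const long *element_address(const long *arr, size_t i) {
    uintptr_t addr = effective_address((uintptr_t)arr, i, sizeof(long), 0);
    return (const long *)addr;
}
```

The second helper hints at one answer to the "why?" above: the allowed scale factors match the sizes of the primitive data types, so indexing into an array of them fits in a single operand.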
- swap example
- give header, initial registers, try to write assembly
- step through example line by line
- gcc -Og -S swap.c produces swap.s
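For stepping through the example, here is a sketch of what swap.c might contain, assuming it holds the function f from the introduction under the name swap. Each statement is annotated with the instruction that gcc -Og typically emits for it on x86-64 Linux, where the first pointer argument arrives in %rdi and the second in %rsi. The exact registers and instruction order can differ between compiler versions, so treat the comments as a reading guide and compare them with your own swap.s.

```c
/* swap.c (assumed contents): exchange the longs that xp and yp point to. */
void swap(long *xp, long *yp) {
    long t0 = *xp;   /* movq (%rdi), %rax    load 8 bytes at address xp into %rax */
    long t1 = *yp;   /* movq (%rsi), %rdx    load 8 bytes at address yp into %rdx */
    *xp = t1;        /* movq %rdx, (%rdi)    store %rdx to memory at address xp   */
    *yp = t0;        /* movq %rax, (%rsi)    store %rax to memory at address yp   */
}                    /* ret                  return to the caller                 */
```

Each transfer between the two memory locations takes two instructions, a load into a register followed by a store, because a single mov cannot have memory as both source and destination.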
3 Reading: Data Formats, Accessing Information
Sections 3.3 and 3.4 through 3.4.3 (p. 177–188) of the CSPP book cover today's material. Read through them to solidify your understanding of the material.