CS 208 s22 — Introduction to x86-64 Assembly

You can access a pdf of the slides here.

1 Review

  • Before we dive in, a quick review of some C concepts (see footnote 1)

    void swap(long* xp, long* yp) {
      long t0 = *xp;
      long t1 = *yp;
      *xp = t1;
      *yp = t0;
    }
    
    • what type of data are these parameters, and what size are they?
    • what is the effect of this function?
    • how would we call swap?

      long a = 10;
      long b = 42;
      swap(???);    
      

2 Introduction

  • We’ve covered how and where data is stored and how numbers are represented
  • Starting today: what operations does the CPU use to actually execute your program?
  • Start of digital programmable computers
    • Colossus in the 1940s (code breaking)
    • manual input of machine code
    • later that decade saw the first assembly languages, providing text representation of machine code
  • In the decades since, computer scientists have created layers of abstraction
    • much nicer to program in C than machine code
  • why study assembly—shouldn’t we stand on the shoulders of giants?
    • understand optimizations
    • understand exactly how data is accessed
    • understand and defend against malware

3 Assembly Programming

  • Build up assembly programming picture
    • CPU
      • PC, registers
    • Memory
      • Code, data
      • remember: memory is one huge array of bytes, each with unique address
    • no types, just contiguous bytes of binary data
    • Incomplete, we’ll add pieces in future lectures
    • Just as the details of this picture are dependent on the hardware, the assembly language operating on it will be specific to the hardware
      • instruction set architecture
  • Registers
    • Small amount of data that CPU can access extremely quickly
    • Unlike locations in memory, registers have names not addresses
      • names begin with % (e.g., %rdi)
    • slide with the 16 x86-64 registers; %rsp is reserved for the stack pointer
      • not necessary to memorize all these names, listed here for reference
    • memory vs registers
      • addresses vs names
      • big (8GB) vs small (16 x 8B)
      • slow (50ns) vs fast (<1ns)
      • dynamic (can grow as needed while the program runs) vs static (fixed number in hardware)

registers.png

4 swap example

  • see the slides here: swap-slides.pdf
  • what happens when you only use one temporary variable in the C implementation of swap?
    • i.e., a single *xp = *yp in place of the second and third statements of the body (long t1 = *yp; and *xp = t1;)

4.1 Moving Data

  • Moving data is a fundamental operation
    • mov_ src, dest
      • missing letter _ specifies size of data
        • b(yte) = 1 byte, w(ord) = 2 bytes, l(ong word) = 4 bytes, q(uad word) = 8 bytes
        • “word” is 2 bytes (16 bits) for backwards compatibility with programs written for the 16-bit 8086, the first processor in the x86 family
      • operand types
        • immediate (constant integer value)
          • $0x400, $-533
        • register (name of 1 of 16 registers)
          • %rax, %r13
        • memory (consecutive bytes of memory starting at given address)
          • (%rax), full form is D(reg_base, reg_index, s)
            • refers to memory address reg_base + reg_index * s + D
            • D and s are immediate values, s can only be 1, 2, 4, or 8 (why?)
            • Various components can be omitted; see book section 3.4.1
        • a single mov cannot have memory as both source and destination; how would you do it?

5 Practice

  • CS:APP practice problems 3.1 (p. 182) and 3.5 (p. 189)
  • What is the value of each of the operands? (see footnote 2)

    operand-practice.png

  • Write C code for mystery() (see footnote 3)

    void mystery(long* xp, long* yp, long *zp)
    // xp in %rdi, yp in %rsi, zp in %rdx
    mystery:
        movq (%rdi), %r8
        movq (%rsi), %rcx
        movq (%rdx), %rax
        movq %r8, (%rsi)
        movq %rcx, (%rdx)
        movq %rax, (%rdi)
        ret

Footnotes:

1
  • what type of data are these parameters, what size? xp and yp are pointers to long. As pointers, they each take 8 bytes.
  • what is the effect of this function? swap switches the values that xp and yp point to. They still point to the same locations in memory, but the values at those locations have been switched.
  • How would we call swap? Use the & operator to get the address of a and b. After the call to swap, a would be 42 and b would be 10.

    long a = 10;
    long b = 42;
    swap(&a, &b);
2
  • $0x108 is the immediate value 0x108
  • 260(%rcx, %rdx) is the value at the memory address 260 + %rcx + %rdx = 260 + 0x1 + 0x3. 260 is 0x104 (256 + 4), so the computed address is 0x108. The value at that address is 0x13.
  • (%rax, %rdx, 4) is the value at the memory address %rax + %rdx * 4 = 0x100 + 0x3 * 4. 0x3 * 4 is 12 or 0xC, so the computed address is 0x10C. The value at that address is 0x11.
3

There are many different C implementations of mystery that would be consistent with the given assembly; here is one possibility:

void mystery(long *xp, long *yp, long *zp) {
    long t0 = *xp;
    long t1 = *yp;
    long t2 = *zp;
    *yp = t0;
    *zp = t1;
    *xp = t2;
}