CS 208 s21 — Data Structure Representation: Arrays

Table of Contents

1 Compilation

  • We will start by zooming out a little from assembly to look briefly at the whole compilation process
  • Let's say out code is in two files: p1.c and p2.c
  • We compile our program with the command: gcc -Og p1.c p2.c -o p
    • this produces a binary file p containing the resulting machine code
    • we can run the program with the command: ./p

compilation.png

  • compilation consists of 3 primary phases:
    1. translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
    2. translating assembly to binary object files (each source file is assembled separately)
    3. linking object files (and any standard library files) together into the final executable

1.1 Producing Machine Language

  • how do we go from C code to machine code? How do we get the information we need?
  • simple cases: arithmetic and logical operations, shifts, etc.
    • all necessary information is contained in the instruction itself (source, destination, size)
  • but consider
    • conditional jump
    • accessing static data
    • call
  • all of these depend on addresses and/or labels
    • these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
  • this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values

linking.png

2 Array Basics

  • T A[N] means an array of N elements of type T
  • contiguously allocated region—how big in terms of N?1
  • A is a pointer (T*) to the start of the array

array-types.png

2.1 Array Access

  • int x[5] = {3, 7, 1, 9, 5};
  • indexes 0, 1, 2, 3, 4
  • addresses a, a+4, a+8, a+12, a+16, a+20
    • where a is the address of the start and a+20 is the address of the end of the array
Expression Type Value
x[4] int 5
x int* a
x + 1 int* a+4
&x[2] int* a+8
x[5] int ?? (whatever's there in memory)
*(x + 1) int 7
x + i int* a + 4*i

2.2 Pointer Arithmetic

  • C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
  • array subscripting is the combination of pointer arithmetic and dereference (e.g., A[i] is equivalent to *(A+i))
  • int *nums and int nums[] are nearly identical declarations
    • subtle differences include initialization, sizeof
  • an array name is an expression (not a variable) that returns the address of the array
    • it looks like a pointer to the first (0th) element
      • *ar same as ar[0], *(ar+2) same as ar[2]
  • an array name is read-only (no assignment) because it is a label
    • cannot use ar = <anything>
int get_digit(int z[5], int digit) {
    return z[digit];
}
get_digit:
        movq    (%rdi,%rsi,4), %rax
        ret

3 Arrays and Functions

  • an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) {
    ... arr[size - 1] ...
}
  • arr is really an int* (%rdi can only fit 8 bytes)
  • without an explicite size parameter, no way to determine the length of the array

4 Nested Arrays

  • T A[R][C]
    • 2D array to data type T
    • R rows, C columns
    • What's the array's total size? R*C*sizeof(T)
  • single contigious block of memory
  • stored in row-major order
    • all elements in row 0, followed by all elements in row 1, etc.
    • address of row i is A + i*(C * sizeof(T))
int sea[4][5] = 
  {{ 9, 8, 1, 9, 5},
   { 9, 8, 1, 0, 5},
   { 9, 8, 1, 0, 3},
   { 9, 8, 1, 1, 5}}

https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing

int* get_sea_zip(int index) {
    return sea[index];
}
get_sea_zip:
        leaq    (%rdi,%rdi,4), %rax  # 5 * index
        leaq    sea(,%rax,4), %rax   # sea + 20 * index
        ret
sea:
        .long   9
        .long   8
        .long   1
        .long   9
        .long   5
        ...
  • A[i][j] to access an individual element of a nested array
    • the address works out to A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
int get_sea_digit (int index, int digit) {
    return sea[index][digit];
}
get_sea_digit:
        leaq (%rdi,%rdi,4), %rax  # 5 * index
        addl %rax, %rsi           # 5 * index + digit
        movl sea(,%rsi,4), %eax   # *(sea + 4 * (5 * index + digit))

5 Activity

What affect does this assembly program have on registers and memory given the initial values below?2

f:
        movl    $1, (%rdi)
        movl    $1, 4(%rdi)
        movl    $2, %edx
        jmp     .L2
.L3:
        movslq  %edx, %rax
        salq    $2, %rax
        movl    -8(%rdi,%rax), %ecx
        addl    -4(%rdi,%rax), %ecx
        movl    %ecx, (%rdi,%rax)
        addl    $1, %edx
.L2:
        cmpl    %esi, %edx
        jl      .L3
        rep ret
main:
        subq    $32, %rsp
        movl    $7, %esi
        movq    %rsp, %rdi
        call    f
        movl    $0, %eax
        addq    $32, %rsp
        ret

lb12-activity-diagram-start.png

6 Exercise

Given the C code and the register to variable mapping below, see how far you can get filling in the corresponding assembly:3

for (long i = 0; i < size; i++) {
    total += arr[i];
}
Register Use
%rdi arr
%rsi size
%rdx i
%rax total
init:
        ________________
        ________________
body:
        ________________
        ________________
test:
        ________________
        ________________

7 Practice

CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)

Footnotes:

1

N * sizeof(T)

2

Here's video walkthrough. The panopto video is below (view it in panopto here), along with the initial memory/register diagram, the C code and assembly side-by-side in godbolt, and the final memory/register diagram.






3
init:
        movl $0, %edx
        jmp  test
body:
        addl (%rdi, %rdx, 4), %eax
        addq $1, %rdx
test:
        cmpq %rsi, %rdx
        jl   body