CS 208 s21 — Data Structure Representation: Arrays

1. Video
2. Compilation
- 2.1. Producing Machine Language
3. Array Basics
- 3.1. Array Access
- 3.2. Pointer Arithmetic
4. Arrays and Functions
5. Nested Arrays

1 Video

Here is a video lecture for the material outlined below. It covers CSPP section 3.8 (p. 255–265). It contains sections on

compilation overview (0:00)
linking (4:10)
array basics (9:58)
example array expressions (14:07)
array access in assembly (25:32)
arrays and functions (28:09)
2D arrays (30:18)

The Panopto viewer has table of contents entries for these sections. Link to the Panopto viewer: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=e8a98333-9b7d-46b3-93ac-ac470131154f

2 Compilation

We will start by zooming out a little from assembly to look briefly at the whole compilation process
Let's say out code is in two files: p1.c and p2.c
We compile our program with the command: gcc -Og p1.c p2.c -o p
- this produces a binary file p containing the resulting machine code
- we can run the program with the command: ./p

compilation consists of 3 primary phases:
1. translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
2. translating assembly to binary object files (each source file is assembled separately)
3. linking object files (and any standard library files) together into the final executable

2.1 Producing Machine Language

how do we go from C code to machine code? How do we get the information we need?
simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
but consider
- conditional jump
- accessing static data
- call
all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values

3 Array Basics

T A[N] means an array of N elements of type T
contiguously allocated region—how big in terms of N?¹
A is a pointer (T*) to the start of the array

3.1 Array Access

int x[5] = {3, 7, 1, 9, 5};
indexes 0, 1, 2, 3, 4
addresses a, a+4, a+8, a+12, a+16, a+20
- where a is the address of the start and a+20 is the address of the end of the array

Expression	Type	Value
`x[4]`	`int`	`5`
`x`	`int*`	`a`
`x + 1`	`int*`	`a+4`
`&x[2]`	`int*`	`a+8`
`x[5]`	`int`	?? (whatever's there in memory)
`*(x + 1)`	`int`	`7`
`x + i`	`int*`	`a + 4*i`

3.2 Pointer Arithmetic

C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
array subscripting is the combination of pointer arithmetic and dereference (e.g., A[i] is equivalent to *(A+i))
int *nums and int nums[] are nearly identical declarations
- subtle differences include initialization, sizeof
an array name is an expression (not a variable) that returns the address of the array
- it looks like a pointer to the first (0th) element
  - *ar same as ar[0], *(ar+2) same as ar[2]
an array name is read-only (no assignment) because it is a label
- cannot use ar = <anything>

int get_digit(int z[5], int digit) {
    return z[digit];
}

get_digit:
        movq    (%rdi,%rsi,4), %rax
        ret

4 Arrays and Functions

an array is passed to a function as a pointer—this means the size gets lost!

int foo(int arr[], unsigned int size) {
    ... arr[size - 1] ...
}

arr is really an int* (%rdi can only fit 8 bytes)
without an explicite size parameter, no way to determine the length of the array

5 Nested Arrays

T A[R][C]
- 2D array to data type T
- R rows, C columns
- What's the array's total size? R*C*sizeof(T)
single contigious block of memory
stored in row-major order
- all elements in row 0, followed by all elements in row 1, etc.
- address of row i is A + i*(C * sizeof(T))

int sea[4][5] = 
  {{ 9, 8, 1, 9, 5},
   { 9, 8, 1, 0, 5},
   { 9, 8, 1, 0, 3},
   { 9, 8, 1, 1, 5}}

https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing

int* get_sea_zip(int index) {
    return sea[index];
}

get_sea_zip:
        leaq    (%rdi,%rdi,4), %rax  # 5 * index
        leaq    sea(,%rax,4), %rax   # sea + 20 * index
        ret
sea:
        .long   9
        .long   8
        .long   1
        .long   9
        .long   5
        ...

A[i][j] to access an individual element of a nested array
- the address works out to A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)

int get_sea_digit (int index, int digit) {
    return sea[index][digit];
}

get_sea_digit:
        leaq (%rdi,%rdi,4), %rax  # 5 * index
        addl %rax, %rsi           # 5 * index + digit
        movl sea(,%rsi,4), %eax   # *(sea + 4 * (5 * index + digit))

Footnotes:

N * sizeof(T)

CS 208 s21 — Data Structure Representation: Arrays

Table of Contents

1 Video

2 Compilation

2.1 Producing Machine Language

3 Array Basics

3.1 Array Access

3.2 Pointer Arithmetic

4 Arrays and Functions

5 Nested Arrays

Footnotes: