CS 208 s21 — Data Structure Representation: Arrays
Table of Contents
1 Compilation
- We will start by zooming out a little from assembly to look briefly at the whole compilation process
- Let's say out code is in two files:
p1.c
andp2.c
- We compile our program with the command:
gcc -Og p1.c p2.c -o p
- this produces a binary file
p
containing the resulting machine code - we can run the program with the command:
./p
- this produces a binary file
- compilation consists of 3 primary phases:
- translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
- translating assembly to binary object files (each source file is assembled separately)
- linking object files (and any standard library files) together into the final executable
1.1 Producing Machine Language
- how do we go from C code to machine code? How do we get the information we need?
- simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
- but consider
- conditional jump
- accessing static data
call
- all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
- this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values
2 Array Basics
T A[N]
means an array ofN
elements of typeT
- contiguously allocated region—how big in terms of
N
?1 A
is a pointer (T*
) to the start of the array
2.1 Array Access
int x[5] = {3, 7, 1, 9, 5};
- indexes
0
,1
,2
,3
,4
- addresses
a
,a+4
,a+8
,a+12
,a+16
,a+20
- where
a
is the address of the start anda+20
is the address of the end of the array
- where
Expression | Type | Value |
---|---|---|
x[4] |
int |
5 |
x |
int* |
a |
x + 1 |
int* |
a+4 |
&x[2] |
int* |
a+8 |
x[5] |
int |
?? (whatever's there in memory) |
*(x + 1) |
int |
7 |
x + i |
int* |
a + 4*i |
2.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g.,
A[i]
is equivalent to*(A+i)
) int *nums
andint nums[]
are nearly identical declarations- subtle differences include initialization,
sizeof
- subtle differences include initialization,
- an array name is an expression (not a variable) that returns the address of the array
- it looks like a pointer to the first (0th) element
*ar
same asar[0]
,*(ar+2)
same asar[2]
- it looks like a pointer to the first (0th) element
- an array name is read-only (no assignment) because it is a label
- cannot use
ar = <anything>
- cannot use
int get_digit(int z[5], int digit) { return z[digit]; }
get_digit: movq (%rdi,%rsi,4), %rax ret
3 Arrays and Functions
- an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
arr
is really anint*
(%rdi
can only fit 8 bytes)- without an explicite
size
parameter, no way to determine the length of the array
4 Nested Arrays
T A[R][C]
- 2D array to data type
T
R
rows,C
columns- What's the array's total size?
R*C*sizeof(T)
- 2D array to data type
- single contigious block of memory
- stored in row-major order
- all elements in row 0, followed by all elements in row 1, etc.
- address of row
i
isA + i*(C * sizeof(T))
int sea[4][5] = {{ 9, 8, 1, 9, 5}, { 9, 8, 1, 0, 5}, { 9, 8, 1, 0, 3}, { 9, 8, 1, 1, 5}}
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
int* get_sea_zip(int index) { return sea[index]; }
get_sea_zip: leaq (%rdi,%rdi,4), %rax # 5 * index leaq sea(,%rax,4), %rax # sea + 20 * index ret sea: .long 9 .long 8 .long 1 .long 9 .long 5 ...
A[i][j]
to access an individual element of a nested array- the address works out to
A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
- the address works out to
int get_sea_digit (int index, int digit) { return sea[index][digit]; }
get_sea_digit: leaq (%rdi,%rdi,4), %rax # 5 * index addl %rax, %rsi # 5 * index + digit movl sea(,%rsi,4), %eax # *(sea + 4 * (5 * index + digit))
5 Activity
What affect does this assembly program have on registers and memory given the initial values below?2
f: movl $1, (%rdi) movl $1, 4(%rdi) movl $2, %edx jmp .L2 .L3: movslq %edx, %rax salq $2, %rax movl -8(%rdi,%rax), %ecx addl -4(%rdi,%rax), %ecx movl %ecx, (%rdi,%rax) addl $1, %edx .L2: cmpl %esi, %edx jl .L3 rep ret main: subq $32, %rsp movl $7, %esi movq %rsp, %rdi call f movl $0, %eax addq $32, %rsp ret
6 Exercise
Given the C code and the register to variable mapping below, see how far you can get filling in the corresponding assembly:3
for (long i = 0; i < size; i++) { total += arr[i]; }
Register | Use |
---|---|
%rdi |
arr |
%rsi |
size |
%rdx |
i |
%rax |
total |
init: ________________ ________________ body: ________________ ________________ test: ________________ ________________
7 Practice
CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)
Footnotes:
1
N * sizeof(T)
2
Here's video walkthrough. The panopto video is below (view it in panopto here),
along with the initial memory/register diagram, the C code and assembly side-by-side in godbolt, and the final memory/register diagram.
3
init: movl $0, %edx jmp test body: addl (%rdi, %rdx, 4), %eax addq $1, %rdx test: cmpq %rsi, %rdx jl body