CS 208 s21 — Data Structure Representation: Arrays
Table of Contents
1 Video
Here is a video lecture for the material outlined below. It covers CSPP section 3.8 (p. 255–265). It contains sections on
- compilation overview (0:00)
- linking (4:10)
- array basics (9:58)
- example array expressions (14:07)
- array access in assembly (25:32)
- arrays and functions (28:09)
- 2D arrays (30:18)
The Panopto viewer has table of contents entries for these sections. Link to the Panopto viewer: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=e8a98333-9b7d-46b3-93ac-ac470131154f
2 Compilation
- We will start by zooming out a little from assembly to look briefly at the whole compilation process
- Let's say out code is in two files:
p1.c
andp2.c
- We compile our program with the command:
gcc -Og p1.c p2.c -o p
- this produces a binary file
p
containing the resulting machine code - we can run the program with the command:
./p
- this produces a binary file
- compilation consists of 3 primary phases:
- translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
- translating assembly to binary object files (each source file is assembled separately)
- linking object files (and any standard library files) together into the final executable
2.1 Producing Machine Language
- how do we go from C code to machine code? How do we get the information we need?
- simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
- but consider
- conditional jump
- accessing static data
call
- all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
- this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values
3 Array Basics
T A[N]
means an array ofN
elements of typeT
- contiguously allocated region—how big in terms of
N
?1 A
is a pointer (T*
) to the start of the array
3.1 Array Access
int x[5] = {3, 7, 1, 9, 5};
- indexes
0
,1
,2
,3
,4
- addresses
a
,a+4
,a+8
,a+12
,a+16
,a+20
- where
a
is the address of the start anda+20
is the address of the end of the array
- where
Expression | Type | Value |
---|---|---|
x[4] |
int |
5 |
x |
int* |
a |
x + 1 |
int* |
a+4 |
&x[2] |
int* |
a+8 |
x[5] |
int |
?? (whatever's there in memory) |
*(x + 1) |
int |
7 |
x + i |
int* |
a + 4*i |
3.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g.,
A[i]
is equivalent to*(A+i)
) int *nums
andint nums[]
are nearly identical declarations- subtle differences include initialization,
sizeof
- subtle differences include initialization,
- an array name is an expression (not a variable) that returns the address of the array
- it looks like a pointer to the first (0th) element
*ar
same asar[0]
,*(ar+2)
same asar[2]
- it looks like a pointer to the first (0th) element
- an array name is read-only (no assignment) because it is a label
- cannot use
ar = <anything>
- cannot use
int get_digit(int z[5], int digit) { return z[digit]; }
get_digit: movq (%rdi,%rsi,4), %rax ret
4 Arrays and Functions
- an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
arr
is really anint*
(%rdi
can only fit 8 bytes)- without an explicite
size
parameter, no way to determine the length of the array
5 Nested Arrays
T A[R][C]
- 2D array to data type
T
R
rows,C
columns- What's the array's total size?
R*C*sizeof(T)
- 2D array to data type
- single contigious block of memory
- stored in row-major order
- all elements in row 0, followed by all elements in row 1, etc.
- address of row
i
isA + i*(C * sizeof(T))
int sea[4][5] = {{ 9, 8, 1, 9, 5}, { 9, 8, 1, 0, 5}, { 9, 8, 1, 0, 3}, { 9, 8, 1, 1, 5}}
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
int* get_sea_zip(int index) { return sea[index]; }
get_sea_zip: leaq (%rdi,%rdi,4), %rax # 5 * index leaq sea(,%rax,4), %rax # sea + 20 * index ret sea: .long 9 .long 8 .long 1 .long 9 .long 5 ...
A[i][j]
to access an individual element of a nested array- the address works out to
A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
- the address works out to
int get_sea_digit (int index, int digit) { return sea[index][digit]; }
get_sea_digit: leaq (%rdi,%rdi,4), %rax # 5 * index addl %rax, %rsi # 5 * index + digit movl sea(,%rsi,4), %eax # *(sea + 4 * (5 * index + digit))
Footnotes:
1
N * sizeof(T)