CS 208 s20 — Data Structure Representation: Arrays
Table of Contents
1 Video
Here is a video lecture for the material outlined below. It covers CSPP section 3.8 (p. 255–265). It contains sections on
- the Sora (0:00)
- compilation overview (0:38)
- linking (4:52
- warmup exercise (11:15)
- array basics (13:03)
- example array expressions (17:11)
- array access in assembly (28:39)
- for loop/array access exercise (31:15)
- arrays and functions (33:26)
- 2D arrays (42:00)
- end notes (52:17)
The Panopto viewer has table of contents entries for these sections. Link to the Panopto viewer: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=ca2b2f65-5029-4466-810c-abb80107011c
2 Warmup
If %rsp
is 0x100
and the current stack frame consists of a local array of 4 ints, where is the return address for the current function call stored?1
3 Compilation
- We will start by zooming out a little from assembly to look briefly at the whole compilation process
- Let's say out code is in two files:
p1.c
andp2.c
- We compile our program with the command:
gcc -Og p1.c p2.c -o p
- this produces a binary file
p
containing the resulting machine code - we can run the program with the command:
./p
- this produces a binary file
- compilation consists of 3 primary phases:
- translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
- translating assembly to binary object files (each source file is assembled separately)
- linking object files (and any standard library files) together into the final executable
3.1 Producing Machine Language
- how do we go from C code to machine code? How do we get the information we need?
- simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
- but consider
- conditional jump
- accessing static data
call
- all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
- this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values
4 Array Basics
T A[N]
means an array ofN
elements of typeT
- contiguously allocated region—how big in terms of
N
?2 A
is a pointer (T*
) to the start of the array
4.1 Array Access
int x[5] = {3, 7, 1, 9, 5};
- indexes
0
,1
,2
,3
,4
- addresses
a
,a+4
,a+8
,a+12
,a+16
,a+20
- where
a
is the address of the start anda+20
is the address of the end of the array
- where
Expression | Type | Value |
---|---|---|
x[4] |
int |
5 |
x |
int* |
a |
x + 1 |
int* |
a+4 |
&x[2] |
int* |
a+8 |
x[5] |
int |
?? (whatever's there in memory) |
*(x + 1) |
int |
7 |
x + i |
int* |
a + 4*i |
4.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g.,
A[i]
is equivalent to*(A+i)
) int *nums
andint nums[]
are nearly identical declarations- subtle differences include initialization,
sizeof
- subtle differences include initialization,
- an array name is an expression (not a variable) that returns the address of the array
- it looks like a pointer to the first (0th) element
*ar
same asar[0]
,*(ar+2)
same asar[2]
- it looks like a pointer to the first (0th) element
- an array name is read-only (no assignment) because it is a label
- cannot use
ar = <anything>
- cannot use
int get_digit(int z[5], int digit) { return z[digit]; }
get_digit: movq (%rdi,%rsi,4), %rax ret
4.3 Exercise
Given the C code and the register to variable mapping below, see how far you can get filling in the corresponding assembly:3
for (long i = 0; i < size; i++) { total += arr[i]; }
Register | Use |
---|---|
%rdi |
arr |
%rsi |
size |
%rdx |
i |
%rax |
total |
init: ________________ ________________ body: ________________ ________________ test: ________________ ________________
5 Arrays and Functions
- arrays declared as local variables are allocated in the current stack frame
char* foo() { char string[32]; ...; return string; } void bar() { int nums[6]; ... } int main() { char *str = foo(); // after foo returns, its stack frame (containing str) is popped off // the string is still there for now... bar(); // stack-allocated nums will use the same memory as string from foo ... using str ... // program can still use str (refers to valid memory address), but its // contents have been reused in the meantime }
- the above code is broken, future function calls will overwrite
string
(draw stack frame) - an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
arr
is really anint*
(%rdi
can only fit 8 bytes)- without an explicite
size
parameter, no way to determine the length of the array
6 Nested Arrays
T A[R][C]
- 2D array to data type
T
R
rows,C
columns- What's the array's total size?
R*C*sizeof(T)
- 2D array to data type
- single contigious block of memory
- stored in row-major order
- all elements in row 0, followed by all elements in row 1, etc.
- address of row
i
isA + i*(C * sizeof(T))
int sea[4][5] = {{ 9, 8, 1, 9, 5}, { 9, 8, 1, 0, 5}, { 9, 8, 1, 0, 3}, { 9, 8, 1, 1, 5}}
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
int* get_sea_zip(int index) { return sea[index]; }
get_sea_zip: leaq (%rdi,%rdi,4), %rax # 5 * index leaq sea(,%rax,4), %rax # sea + 20 * index ret sea: .long 9 .long 8 .long 1 .long 9 .long 5 ...
A[i][j]
to access an individual element of a nested array- the address works out to
A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
- the address works out to
int get_sea_digit (int index, int digit) { return sea[index][digit]; }
get_sea_digit: leaq (%rdi,%rdi,4), %rax # 5 * index addl %rax, %rsi # 5 * index + digit movl sea(,%rsi,4), %eax # *(sea + 4 * (5 * index + digit))
7 Homework
- CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)
- Keep working on Lab 2! The handin script has proved too unstable, so we'll do submissions via Moodle from here on out. For lab 2, upload your
solutions.txt
to the lab 2 assignment on Moodle.
Footnotes:
The return address is located at 0x110
(16 bytes above the current %rsp
, as an array of 4 ints takes up 16 bytes and the return address is stored immediately above the stack frame).
N * sizeof(T)
init: movl $0, %edx jmp test body: addl (%rdi, %rdx, 4), %eax addq $1, %rdx test: cmpq %rsi, %rdx jl body