CS 208 s20 — Data Structure Representation: Arrays
Table of Contents
1 Video
Here is a video lecture for the material outlined below. It covers CSPP section 3.8 (p. 255–265). It contains sections on
- the Sora (0:00)
- compilation overview (0:38)
- linking (4:52
- warmup exercise (11:15)
- array basics (13:03)
- example array expressions (17:11)
- array access in assembly (28:39)
- for loop/array access exercise (31:15)
- arrays and functions (33:26)
- 2D arrays (42:00)
- end notes (52:17)
The Panopto viewer has table of contents entries for these sections. Link to the Panopto viewer: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=ca2b2f65-5029-4466-810c-abb80107011c
2 Warmup
If %rsp is 0x100 and the current stack frame consists of a local array of 4 ints, where is the return address for the current function call stored?1
3 Compilation
- We will start by zooming out a little from assembly to look briefly at the whole compilation process
- Let's say out code is in two files:
p1.candp2.c - We compile our program with the command:
gcc -Og p1.c p2.c -o p- this produces a binary file
pcontaining the resulting machine code - we can run the program with the command:
./p
- this produces a binary file
- compilation consists of 3 primary phases:
- translating source code (e.g., C code) to assembly (still in human-readable text form at this point)
- translating assembly to binary object files (each source file is assembled separately)
- linking object files (and any standard library files) together into the final executable
3.1 Producing Machine Language
- how do we go from C code to machine code? How do we get the information we need?
- simple cases: arithmetic and logical operations, shifts, etc.
- all necessary information is contained in the instruction itself (source, destination, size)
- but consider
- conditional jump
- accessing static data
call
- all of these depend on addresses and/or labels
- these are a problem because we don't have the final executable yet (i.e., we don't know where these will be located in the final executable)
- this is one purpose of the linking step: connect everything together and fill in these "holes" with the appropriate values
4 Array Basics
T A[N]means an array ofNelements of typeT- contiguously allocated region—how big in terms of
N?2 Ais a pointer (T*) to the start of the array
4.1 Array Access
int x[5] = {3, 7, 1, 9, 5};- indexes
0,1,2,3,4 - addresses
a,a+4,a+8,a+12,a+16,a+20- where
ais the address of the start anda+20is the address of the end of the array
- where
| Expression | Type | Value |
|---|---|---|
x[4] |
int |
5 |
x |
int* |
a |
x + 1 |
int* |
a+4 |
&x[2] |
int* |
a+8 |
x[5] |
int |
?? (whatever's there in memory) |
*(x + 1) |
int |
7 |
x + i |
int* |
a + 4*i |
4.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g.,
A[i]is equivalent to*(A+i)) int *numsandint nums[]are nearly identical declarations- subtle differences include initialization,
sizeof
- subtle differences include initialization,
- an array name is an expression (not a variable) that returns the address of the array
- it looks like a pointer to the first (0th) element
*arsame asar[0],*(ar+2)same asar[2]
- it looks like a pointer to the first (0th) element
- an array name is read-only (no assignment) because it is a label
- cannot use
ar = <anything>
- cannot use
int get_digit(int z[5], int digit) { return z[digit]; }
get_digit: movq (%rdi,%rsi,4), %rax ret
4.3 Exercise
Given the C code and the register to variable mapping below, see how far you can get filling in the corresponding assembly:3
for (long i = 0; i < size; i++) { total += arr[i]; }
| Register | Use |
|---|---|
%rdi |
arr |
%rsi |
size |
%rdx |
i |
%rax |
total |
init: ________________ ________________ body: ________________ ________________ test: ________________ ________________
5 Arrays and Functions
- arrays declared as local variables are allocated in the current stack frame
char* foo() { char string[32]; ...; return string; } void bar() { int nums[6]; ... } int main() { char *str = foo(); // after foo returns, its stack frame (containing str) is popped off // the string is still there for now... bar(); // stack-allocated nums will use the same memory as string from foo ... using str ... // program can still use str (refers to valid memory address), but its // contents have been reused in the meantime }
- the above code is broken, future function calls will overwrite
string(draw stack frame) - an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
arris really anint*(%rdican only fit 8 bytes)- without an explicite
sizeparameter, no way to determine the length of the array
6 Nested Arrays
T A[R][C]- 2D array to data type
T Rrows,Ccolumns- What's the array's total size?
R*C*sizeof(T)
- 2D array to data type
- single contigious block of memory
- stored in row-major order
- all elements in row 0, followed by all elements in row 1, etc.
- address of row
iisA + i*(C * sizeof(T))
int sea[4][5] = {{ 9, 8, 1, 9, 5}, { 9, 8, 1, 0, 5}, { 9, 8, 1, 0, 3}, { 9, 8, 1, 1, 5}}
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
int* get_sea_zip(int index) { return sea[index]; }
get_sea_zip: leaq (%rdi,%rdi,4), %rax # 5 * index leaq sea(,%rax,4), %rax # sea + 20 * index ret sea: .long 9 .long 8 .long 1 .long 9 .long 5 ...
A[i][j]to access an individual element of a nested array- the address works out to
A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
- the address works out to
int get_sea_digit (int index, int digit) { return sea[index][digit]; }
get_sea_digit: leaq (%rdi,%rdi,4), %rax # 5 * index addl %rax, %rsi # 5 * index + digit movl sea(,%rsi,4), %eax # *(sea + 4 * (5 * index + digit))
7 Homework
- CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)
- Keep working on Lab 2! The handin script has proved too unstable, so we'll do submissions via Moodle from here on out. For lab 2, upload your
solutions.txtto the lab 2 assignment on Moodle.
Footnotes:
The return address is located at 0x110 (16 bytes above the current %rsp, as an array of 4 ints takes up 16 bytes and the return address is stored immediately above the stack frame).
N * sizeof(T)
init: movl $0, %edx jmp test body: addl (%rdi, %rdx, 4), %eax addq $1, %rdx test: cmpq %rsi, %rdx jl body