CS 208 s22 — Data Structure Representation: Arrays
1 Array Basics
T A[N] means an array of N elements of type T
- contiguously allocated region: how big in terms of N? [1]
- A can be used as a pointer (T*) to the start of the array
1.1 Array Access
int x[5] = {3, 7, 1, 9, 5};
- indexes 0, 1, 2, 3, 4
- addresses a, a+4, a+8, a+12, a+16, a+20
  - where a is the address of the start of the array and a+20 is the address just past its end (x[4] occupies a+16 through a+19)
Expression | Type | Value |
---|---|---|
x[4] | int | 5 |
x | int* | a |
x + 1 | int* | a+4 |
&x[2] | int* | a+8 |
x[5] | int | ?? (whatever's there in memory) |
*(x + 1) | int | 7 |
x + i | int* | a + 4*i |
1.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g., A[i] is equivalent to *(A+i))
- int *nums and int nums[] are nearly identical declarations
  - subtle differences include initialization, sizeof
- an array name is an expression (not a variable) that returns the address of the array
  - it looks like a pointer to the first (0th) element
    - *ar same as ar[0], *(ar+2) same as ar[2]
- an array name is read-only (no assignment) because it is a label
  - cannot use ar = <anything>
int get_digit(int z[5], int digit) {
    return z[digit];
}

get_digit:
    movl (%rdi,%rsi,4), %eax    # load the 4-byte int at z + 4*digit
    ret
2 Arrays and Functions
- an array is passed to a function as a pointer—this means the size gets lost!
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
- arr is really an int* (%rdi can only fit 8 bytes)
  - without an explicit size parameter, there is no way to determine the length of the array
3 Nested Arrays
T A[R][C]
- 2D array of data type T
  - R rows, C columns
  - What's the array's total size? R*C*sizeof(T)
- single contiguous block of memory
- stored in row-major order
  - all elements in row 0, followed by all elements in row 1, etc.
  - address of row i is A + i*(C * sizeof(T))
int sea[4][5] = {{ 9, 8, 1, 9, 5},
                 { 9, 8, 1, 0, 5},
                 { 9, 8, 1, 0, 3},
                 { 9, 8, 1, 1, 5}};
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
int* get_sea_zip(int index) {
    return sea[index];
}

get_sea_zip:
    leaq (%rdi,%rdi,4), %rax    # 5 * index
    leaq sea(,%rax,4), %rax     # sea + 20 * index
    ret
sea:
    .long 9
    .long 8
    .long 1
    .long 9
    .long 5
    ...
- A[i][j] to access an individual element of a nested array
  - the address works out to A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)
int get_sea_digit(int index, int digit) {
    return sea[index][digit];
}

get_sea_digit:
    leaq (%rdi,%rdi,4), %rax    # 5 * index
    addq %rax, %rsi             # 5 * index + digit
    movl sea(,%rsi,4), %eax     # *(sea + 4 * (5 * index + digit))
    ret
4 Multilevel Arrays
- is this multi-dimensional array equivalent to the previous sea?
int sea0[5] = {9, 8, 1, 9, 5};
int sea1[5] = {9, 8, 1, 0, 5};
int sea2[5] = {9, 8, 1, 0, 3};
int sea3[5] = {9, 8, 1, 1, 5};
int *sea_m[4] = {sea0, sea1, sea2, sea3};
- it contains the same 20 ints
- however, each of the four elements of sea_m is a pointer; none of the elements of sea were pointers
  - within each of the rows (sea0, sea1, sea2, sea3), the 5 ints are allocated as a contiguous block of memory, but each row could be put anywhere
  - see the difference visually in this spreadsheet
- the C code for get_sea_m_digit is the same as get_sea_digit for 2D arrays
int get_sea_m_digit(int index, int digit) {
    return sea_m[index][digit];
}
- but the assembly for accessing an element will be different
get_sea_digit:
    leaq (%rdi,%rdi,4), %rax      # 5 * index
    addq %rax, %rsi               # 5 * index + digit
    movl sea(,%rsi,4), %eax       # *(sea + 4 * (5 * index + digit))
    ret

get_sea_m_digit:
    salq $2, %rsi                 # rsi = 4*digit
    addq sea_m(,%rdi,8), %rsi     # p = sea_m[index] + 4*digit
    movl (%rsi), %eax             # return *p
    ret
- accessing an element now requires two memory accesses
- the benefit of this multilevel structure is that the rows can be different lengths
- array access looks the same (sea[3][2] and sea_m[3][2]), but underneath:
  - Mem[sea + 20*index + 4*digit] vs Mem[Mem[sea_m + 8*index] + 4*digit]
5 Warmup
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns. [2]
- sea[2][5]
- sea[4][-1]
- sea[0][19]
Which of the following statements is FALSE? [3]
1. sea[4][-2] is a valid array reference
2. sea[1][1] makes two memory accesses
3. sea[2][1] will always be a higher address than sea[1][2]
4. sea[2] is calculated using only lea
6 Multi-level array exercise
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns. [4]
- sea[2][3]
- sea[1][5]
- sea[2][-2]
7 Background for Lab 2
- register uses: %rax, %rsp, %rdi, %rsi, %rdx, %rcx, %r8, %r9
- starting lab 2
- hacking vs analyzing
- either way, practice the skill of finding the relevant detail and ignoring the rest
- basic commands: r(un), b(reak), stepi (si), nexti (ni), c(ontinue), layout asm, layout reg, disas, p(rint), x, finish
- blank line at the end of defuse.txt
- push/pop/moving %rsp at the beginning and end of functions
- sscanf
8 gdb activity
- Get started with these commands:
wget http://cs.carleton.edu/faculty/awb/cs208/s22/notes/gdb-activity.tar
tar xvf gdb-activity.tar
cd gdb-activity
make
- Try running the program with ./gdb-activity. What happens?
- Open gdb-activity.c
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int compare(int a, int b);

int main(int argc, char** argv) {
    int a, b, n;
    char input[100];
    printf("enter good args: ");
    if (fgets(input, 100, stdin) == NULL) {
        printf("I said good args\n");
    }
    n = sscanf(input, "%d %d", &a, &b);
    if (n == 2 && compare(a,b) == 1) {
        printf("good args!\n");
    } else {
        printf("bad args, try harder!\n");
    }
    return 0;
}
Observations:
- four functions are called: printf, fgets, sscanf, and compare
  - the first three are C library functions (since they aren't declared in this file, their declarations must come from the #include of library headers)
    - look each library function up in the terminal with man 3 FUNCTION, or consult cplusplus.com/FUNCTION (the latter is often easier to understand)
- to get the program to print "good args!", we need fgets to return something other than NULL, sscanf to return 2, and compare(a, b) to return 1
- fgets reads a line from standard input (stdin) and stores the string in input, reading at most 99 characters so there is room for the terminating '\0'
  - it only returns NULL on failure (or at end of file before any characters are read), so we probably don't have to worry about that
- sscanf is a super useful function: it parses a string (the first parameter) according to a format string given by the second parameter
  - %d is the format specifier for an integer, so this sscanf call will parse input as two integers separated by whitespace
  - the parsed items (i.e., each %d) will be written to the corresponding pointers provided as arguments after the format string
    - so the first integer in input will be stored in a and the second will be stored in b
  - sscanf returns the number of items it successfully assigned, which is why main checks n == 2
- compare is actually implemented in raw assembly in gdb-activity.s
- Let's use gdb to get a sense for how everything fits together. (This tutorial video goes over using gdb if you want to review.)
- Running disas compare from within gdb gives
0x00000000004006d7 <+0>:  push   %rbx
0x00000000004006d8 <+1>:  mov    %rdi,%rbx
0x00000000004006db <+4>:  add    $0x5,%rbx
0x00000000004006df <+8>:  add    %rsi,%rbx
0x00000000004006e2 <+11>: cmp    $0xd0,%rbx
0x00000000004006e9 <+18>: sete   %al
0x00000000004006ec <+21>: movzbq %al,%rax
0x00000000004006f0 <+25>: pop    %rbx
0x00000000004006f1 <+26>: retq
- We can reverse engineer the C code to be something like
int compare(int a, int b) {
    // we add %rdi, %rsi, and 5 together in %rbx and then compare it to $0xd0
    // sete writes 1 to the given register if the cmp indicates the operands are equal
    // since this is the return value, and we want compare to return 1,
    // we should choose inputs to make these equal
    return a + b + 5 == 0xd0;
}
9 Activity
What effect does this assembly program have on registers and memory given the initial values below? [5]
f:
    movl $1, (%rdi)
    movl $1, 4(%rdi)
    movl $2, %edx
    jmp .L2
.L3:
    movslq %edx, %rax
    salq $2, %rax
    movl -8(%rdi,%rax), %ecx
    addl -4(%rdi,%rax), %ecx
    movl %ecx, (%rdi,%rax)
    addl $1, %edx
.L2:
    cmpl %esi, %edx
    jl .L3
    rep ret
main:
    subq $32, %rsp
    movl $7, %esi
    movq %rsp, %rdi
    call f
    movl $0, %eax
    addq $32, %rsp
    ret
10 Exercise
Given the C code and the register-to-variable mapping below, see how far you can get filling in the corresponding assembly: [6]
for (long i = 0; i < size; i++) {
    total += arr[i];
}
Register | Use |
---|---|
%rdi | arr |
%rsi | size |
%rdx | i |
%rax | total |
init:
    ________________
    ________________
body:
    ________________
    ________________
test:
    ________________
    ________________
11 Practice
CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)
Footnotes:
[1] N * sizeof(T)
[2]
- sea[2][5]: valid, 9 (start of row 2, then 5 ints forward)
- sea[4][-1]: valid, 5 (start of the non-existent row 4, then 1 int backwards, thereby referencing the valid element at the end of row 3)
- sea[0][19]: valid, 5 (start of row 0, then 19 ints forward, referencing the final int in the array)
[3] Statement 2 is the false one. The assembly for sea[1][1] will compute the address of that specific element and then make a single memory access to that address.
[4]
- sea[2][3]: valid, 0
- sea[1][5]: invalid, past the end of row 1 (and rows are not adjacent in memory in this multilevel array)
- sea[2][-2]: invalid, past the start of row 2 (and rows are not adjacent in memory in this multilevel array)
[6]
init:
    movl $0, %edx
    jmp test
body:
    addl (%rdi,%rdx,4), %eax
    addq $1, %rdx
test:
    cmpq %rsi, %rdx
    jl body