CS 208 s22 — Data Structure Representation: Arrays
1 Array Basics
- `T A[N]` means an array of `N` elements of type `T`
  - contiguously allocated region: how big in terms of `N`? [1]
  - `A` is a pointer (`T*`) to the start of the array
1.1 Array Access
```c
int x[5] = {3, 7, 1, 9, 5};
```

- indexes: 0, 1, 2, 3, 4
- addresses: `a`, `a+4`, `a+8`, `a+12`, `a+16`, `a+20`
  - where `a` is the address of the start and `a+20` is the address just past the end of the array (the last element, `x[4]`, starts at `a+16`)
| Expression | Type | Value |
|---|---|---|
| `x[4]` | `int` | 5 |
| `x` | `int*` | `a` |
| `x + 1` | `int*` | `a+4` |
| `&x[2]` | `int*` | `a+8` |
| `x[5]` | `int` | ?? (whatever's there in memory) |
| `*(x + 1)` | `int` | 7 |
| `x + i` | `int*` | `a + 4*i` |
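To check the table's address arithmetic on a real machine, here is a small C sketch. The helper name `byte_offset` is hypothetical, and the numbers assume a 4-byte `int`, which holds on x86-64. Casting both pointers to `char *` makes the subtraction count bytes instead of ints, so `x + 1` shows up 4 bytes past `x`.

```c
#include <stddef.h>

/* The array from the table above. */
int x[5] = {3, 7, 1, 9, 5};

/* Hypothetical helper: how many bytes x + i lies past x (0 <= i <= 5).
   Converting to char * makes the subtraction count bytes, not ints. */
ptrdiff_t byte_offset(int i) {
    return (char *)(x + i) - (char *)x;
}
```

On x86-64, `byte_offset(2)` is 8 (matching `&x[2]` at `a+8`) and `byte_offset(5)` is 20 (the end address). Computing `x + 5` is allowed because it points one past the end, but reading `x[5]` is not something a well-behaved program should do.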
1.2 Pointer Arithmetic
- C allows pointer arithmetic where the result is scaled according to the size of the data type referenced by the pointer
- array subscripting is the combination of pointer arithmetic and dereference (e.g., `A[i]` is equivalent to `*(A+i)`)
- `int *nums` and `int nums[]` are nearly identical declarations
  - subtle differences include initialization and `sizeof`
- an array name is an expression (not a variable) that returns the address of the array
  - it looks like a pointer to the first (0th) element
  - `*ar` is the same as `ar[0]`, and `*(ar+2)` is the same as `ar[2]`
- an array name is read-only (no assignment) because it is a label
  - cannot use `ar = <anything>`
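A short C sketch of the points above, using only standard C: subscripting and pointer arithmetic give the same access, and `sizeof` is one place where an array name and an `int *` differ. The names `ar`, `subscript`, `deref`, `array_bytes`, and `pointer_bytes` are illustrative.

```c
#include <stddef.h>

int ar[5] = {3, 7, 1, 9, 5};

/* ar[i] and *(ar + i) are the same access. */
int subscript(int i) { return ar[i]; }
int deref(int i)     { return *(ar + i); }

/* sizeof on the array name sees all 5 ints... */
size_t array_bytes(void) { return sizeof(ar); }

/* ...but a pointer to the same memory only has the size of a pointer. */
size_t pointer_bytes(void) {
    int *p = ar;          /* the array name converts to a pointer to ar[0] */
    return sizeof(p);
}

/* ar = p; would not compile: an array name cannot be assigned to. */
```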
```c
int get_digit(int z[5], int digit) {
    return z[digit];
}
```

```asm
get_digit:
    movl (%rdi,%rsi,4), %eax    # return Mem[z + 4*digit]; movl since an int is 4 bytes
    ret
```
2 Arrays and Functions
- an array is passed to a function as a pointer, which means the size gets lost!

```c
int foo(int arr[], unsigned int size) { ... arr[size - 1] ... }
```

- `arr` is really an `int*` (`%rdi` can only fit 8 bytes)
  - without an explicit `size` parameter, there is no way to determine the length of the array
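A minimal sketch of the lost size: inside the function, `sizeof(arr)` is just the size of an `int *`, so the caller has to pass the length. `param_bytes` and `last_element` are hypothetical names; `last_element` is a separate complete function in the spirit of `foo` and assumes `size >= 1`. Many compilers warn about `sizeof` on an array parameter, which is exactly the point being made.

```c
#include <stddef.h>

/* arr is really an int *, so this is sizeof(int *), not the caller's array size. */
size_t param_bytes(int arr[]) {
    return sizeof(arr);
}

/* The length must come from the caller (assumes size >= 1). */
int last_element(int arr[], unsigned int size) {
    return arr[size - 1];
}
```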
3 Nested Arrays
- `T A[R][C]`
  - 2D array of data type `T`
  - `R` rows, `C` columns
  - What's the array's total size? `R*C*sizeof(T)`
- single contiguous block of memory
- stored in row-major order
  - all elements in row 0, followed by all elements in row 1, etc.
- address of row `i` is `A + i*(C * sizeof(T))`
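The size and row-address rules can be checked with a small C sketch. It assumes a 4x5 array of `int` with the same contents as the `sea` example in these notes, named `grid` here with hypothetical `ROWS`/`COLS` macros; `row_offset` (also hypothetical) measures in bytes how far row `i` starts from the beginning of the array.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 5

int grid[ROWS][COLS] = {
    {9, 8, 1, 9, 5},
    {9, 8, 1, 0, 5},
    {9, 8, 1, 0, 3},
    {9, 8, 1, 1, 5},
};

/* Byte offset of row i from the start of grid: i * (COLS * sizeof(int)). */
ptrdiff_t row_offset(int i) {
    return (char *)grid[i] - (char *)grid;
}
```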
```c
int sea[4][5] = {{ 9, 8, 1, 9, 5},
                 { 9, 8, 1, 0, 5},
                 { 9, 8, 1, 0, 3},
                 { 9, 8, 1, 1, 5}};
```
https://docs.google.com/spreadsheets/d/17HGr47X1Q8EqkFmZ4Fv8Y54mO8o-wIPaabQBf_bORZ8/edit?usp=sharing
```c
int* get_sea_zip(int index) {
    return sea[index];
}
```

```asm
get_sea_zip:
    leaq (%rdi,%rdi,4), %rax     # 5 * index
    leaq sea(,%rax,4), %rax      # sea + 20 * index
    ret
sea:
    .long 9
    .long 8
    .long 1
    .long 9
    .long 5
    ...
```
- `A[i][j]` to access an individual element of a nested array
  - the address works out to `A + i*(C*sizeof(T)) + j*sizeof(T) == A + (i*C + j)*sizeof(T)`
```c
int get_sea_digit(int index, int digit) {
    return sea[index][digit];
}
```

```asm
get_sea_digit:
    leaq (%rdi,%rdi,4), %rax     # 5 * index
    addq %rax, %rsi              # 5 * index + digit
    movl sea(,%rsi,4), %eax      # *(sea + 4 * (5 * index + digit))
    ret
```
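To confirm that `(i*C + j)*sizeof(T)` is the byte offset of `A[i][j]`, this sketch compares the formula against the address the compiler computes for every element of `sea`. The helper names are hypothetical. Comparing addresses instead of reading memory keeps it well-defined, and it also shows that the warmup's `sea[2][5]` lands on the same address as `sea[3][0]`.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 5

int sea[ROWS][COLS] = {
    {9, 8, 1, 9, 5},
    {9, 8, 1, 0, 5},
    {9, 8, 1, 0, 3},
    {9, 8, 1, 1, 5},
};

/* The offset the compiler uses for &sea[i][j], in bytes. */
ptrdiff_t element_offset(int i, int j) {
    return (char *)&sea[i][j] - (char *)sea;
}

/* The same offset from the row-major formula (i*C + j) * sizeof(T). */
ptrdiff_t formula_offset(int i, int j) {
    return (ptrdiff_t)(i * COLS + j) * (ptrdiff_t)sizeof(int);
}
```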
4 Multilevel Arrays
- is this multi-dimensional array equivalent to the previous `sea`?
```c
int sea0[5] = {9, 8, 1, 9, 5};
int sea1[5] = {9, 8, 1, 0, 5};
int sea2[5] = {9, 8, 1, 0, 3};
int sea3[5] = {9, 8, 1, 1, 5};
int *sea_m[4] = {sea0, sea1, sea2, sea3};
```
- it contains the same 20 ints
- however, each of the four elements of `sea_m` is a pointer; none of the elements of `sea` were pointers
- within each of the rows (`sea0`, `sea1`, `sea2`, `sea3`), the 5 ints are allocated as a contiguous block of memory, but each row could be put anywhere
- see the difference visually in this spreadsheet
- the C code for `get_sea_m_digit` is the same as `get_sea_digit` for 2D arrays

```c
int get_sea_m_digit(int index, int digit) {
    return sea_m[index][digit];
}
```

- but the assembly for accessing an element will be different
```asm
get_sea_digit:
    leaq (%rdi,%rdi,4), %rax     # 5 * index
    addq %rax, %rsi              # 5 * index + digit
    movl sea(,%rsi,4), %eax      # *(sea + 4 * (5 * index + digit))
    ret

get_sea_m_digit:
    salq $2, %rsi                # rsi = 4*digit
    addq sea_m(,%rdi,8), %rsi    # p = sea_m[index] + 4*digit
    movl (%rsi), %eax            # return *p
    ret
```
- accessing an element now requires two memory accesses
- the benefit of this multilevel structure is that the rows can be different lengths
- array access looks the same (`sea[3][2]` and `sea_m[3][2]`), but underneath:
  - `Mem[sea + 20*index + 4*digit]` vs `Mem[Mem[sea_m + 8*index] + 4*digit]`
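The multilevel layout is what makes rows of different lengths possible. Here is a hedged C sketch; all names are hypothetical, and the row lengths live in a separate array because C does not store them. `get_jagged_digit` spells out the two memory reads that `rows[index][digit]` performs. The `8*index` and `4*digit` in the comments assume 8-byte pointers and 4-byte ints, as on x86-64.

```c
/* Three rows of different lengths; only the pointers in rows are contiguous. */
int r0[1] = {9};
int r1[3] = {9, 8, 1};
int r2[5] = {9, 8, 1, 0, 3};
int *rows[3] = {r0, r1, r2};
int row_len[3] = {1, 3, 5};   /* lengths kept separately */

/* Equivalent to rows[index][digit], written as its two memory reads. */
int get_jagged_digit(int index, int digit) {
    int *row = rows[index];   /* first read:  Mem[rows + 8*index] */
    return row[digit];        /* second read: Mem[row + 4*digit]  */
}

/* Sum of one row, using the separately stored length. */
int row_sum(int index) {
    int total = 0;
    for (int k = 0; k < row_len[index]; k++) {
        total += rows[index][k];
    }
    return total;
}
```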
5 Warmup
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns. [2]

1. `sea[2][5]`
2. `sea[4][-1]`
3. `sea[0][19]`
Which of the following statements is FALSE? [3]

1. `sea[4][-2]` is a valid array reference
2. `sea[1][1]` makes two memory accesses
3. `sea[2][1]` will always be a higher address than `sea[1][2]`
4. `sea[2]` is calculated using only `lea`
6 Multi-level array exercise
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns. [4]

1. `sea[2][3]`
2. `sea[1][5]`
3. `sea[2][-2]`
7 Background for Lab 2
- register uses: `%rax` (return value), `%rsp` (stack pointer), `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9` (first six integer/pointer arguments, in that order)
- starting lab 2
  - hacking vs analyzing
    - either way, practice the skill of finding the relevant detail and ignoring the rest
  - basic `gdb` commands: `r(un)`, `b(reak)`, `stepi` (`si`), `nexti` (`ni`), `c(ontinue)`, `layout asm`, `layout reg`, `disas`, `p(rint)`, `x`, `finish`
  - blank line at the end of `defuse.txt`
  - push/pop/moving `%rsp` at the beginning and end of functions
  - `sscanf`
8 gdb activity
- Get started with these commands:
```shell
wget http://cs.carleton.edu/faculty/awb/cs208/s22/notes/gdb-activity.tar
tar xvf gdb-activity.tar
cd gdb-activity
make
```
- Try running the program with `./gdb-activity`. What happens?
- Open `gdb-activity.c`:
```c
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int compare(int a, int b);

int main(int argc, char** argv) {
    int a, b, n;
    char input[100];

    printf("enter good args: ");
    if (fgets(input, 100, stdin) == NULL) {
        printf("I said good args\n");
    }

    n = sscanf(input, "%d %d", &a, &b);
    if (n == 2 && compare(a,b) == 1) {
        printf("good args!\n");
    } else {
        printf("bad args, try harder!\n");
    }
    return 0;
}
```
Observations:
- four functions are called: `printf`, `fgets`, `sscanf`, and `compare`
  - the first three are C library functions (since they aren't declared anywhere in this file, they must come from the `#include` of library headers)
    - look each library function up in the terminal with `man 3 FUNCTION`, or consult cplusplus.com/FUNCTION (the latter is often easier to understand)
- to get the program to print "good args!", we need `fgets` to return something other than `NULL`, `sscanf` to return 2, and `compare(a, b)` to return 1
  - `fgets` will read a line from standard input (`stdin`, i.e., what you type into the terminal) and store the string in `input`, up to 99 characters plus the terminating `'\0'`
    - it only returns `NULL` on failure (or end-of-file before any characters are read), so we probably don't have to worry about that
  - `sscanf` is a super useful function: it parses a string (the first parameter) according to a format string given by the second parameter
    - `%d` is the format specifier for an integer, so this `sscanf` call will parse `input` as two integers separated by whitespace
    - the parsed items (i.e., each `%d`) will be written to the corresponding pointers provided as arguments after the format string
      - so the first integer in `input` will be stored in `a` and the second will be stored in `b`
    - it returns the number of items successfully parsed, which is why `main` checks `n == 2`
  - `compare` is actually implemented in raw assembly in `gdb-activity.s`
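To see the return values that `main` depends on, here is a small C sketch around `sscanf`; the wrapper `parse_two` is hypothetical but uses the same format string as `gdb-activity.c`. The return value counts how many `%d` conversions succeeded, and `EOF` comes back if the input runs out before the first conversion.

```c
#include <stdio.h>

/* Same parsing step as gdb-activity.c: returns sscanf's count of parsed ints. */
int parse_two(const char *line, int *a, int *b) {
    return sscanf(line, "%d %d", a, b);
}
```

For `"3 4\n"` it returns 2 (with `a = 3`, `b = 4`), for `"3 x"` it returns 1 (only `a` is written), for `"hello"` it returns 0, and for an empty string it returns `EOF`. Because `%d` skips leading whitespace, `"7\t  8"` also parses as two integers.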
- Let's use `gdb` to get a sense for how everything is fitting together. (This tutorial video goes over using `gdb` if you want to review.)
- Running `disas compare` from within `gdb` gives
```
   0x00000000004006d7 <+0>:     push   %rbx
   0x00000000004006d8 <+1>:     mov    %rdi,%rbx
   0x00000000004006db <+4>:     add    $0x5,%rbx
   0x00000000004006df <+8>:     add    %rsi,%rbx
   0x00000000004006e2 <+11>:    cmp    $0xd0,%rbx
   0x00000000004006e9 <+18>:    sete   %al
   0x00000000004006ec <+21>:    movzbq %al,%rax
   0x00000000004006f0 <+25>:    pop    %rbx
   0x00000000004006f1 <+26>:    retq
```
- We can reverse engineer the C code to be something like

```c
int compare(int a, int b) {
    // we add %rdi, %rsi, and 5 together in %rbx and then compare it to $0xd0
    // sete writes 1 to the given register if the cmp indicates the operands are equal
    // since this is the return value, and we want compare to return 1,
    // we should choose inputs to make these equal
    return a + b + 5 == 0xd0;
}
```
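As a check on the reverse engineering, here is a hypothetical C mirror of the disassembly, named `compare_c` so it does not clash with the real `compare` in `gdb-activity.s`. It uses `long` because the assembly does its arithmetic in 64-bit registers; that matches the real function for non-negative inputs, while for negative `int`s the result depends on what the caller left in the upper halves of `%rdi` and `%rsi`.

```c
/* Mirrors compare's assembly one instruction at a time. */
long compare_c(long a, long b) {
    long sum = a;          /* mov %rdi,%rbx */
    sum += 5;              /* add $0x5,%rbx */
    sum += b;              /* add %rsi,%rbx */
    return sum == 0xd0;    /* cmp, sete, movzbq: 1 if equal, else 0 */
}
```

Since `0xd0` is 208, `compare` returns 1 exactly when `a + b + 5 == 208`, i.e., when `a + b == 203`.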
9 Activity
What effect does this assembly program have on registers and memory given the initial values below? [5]
```asm
f:
    movl $1, (%rdi)
    movl $1, 4(%rdi)
    movl $2, %edx
    jmp .L2
.L3:
    movslq %edx, %rax
    salq $2, %rax
    movl -8(%rdi,%rax), %ecx
    addl -4(%rdi,%rax), %ecx
    movl %ecx, (%rdi,%rax)
    addl $1, %edx
.L2:
    cmpl %esi, %edx
    jl .L3
    rep ret
main:
    subq $32, %rsp
    movl $7, %esi
    movq %rsp, %rdi
    call f
    movl $0, %eax
    addq $32, %rsp
    ret
```
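Reading the assembly back into C is one way to approach the activity. The sketch below is one possible translation, with the name `fill` and its parameter names being hypothetical: `%rdi` holds the array pointer, `%esi` the count, and `%edx` the loop index `i`. Since `%rax` is `4*i`, the operands `-8(%rdi,%rax)` and `-4(%rdi,%rax)` are `arr[i-2]` and `arr[i-1]`.

```c
/* One possible C reading of f: arr in %rdi, n in %esi, i in %edx. */
void fill(int *arr, int n) {
    arr[0] = 1;                             /* movl $1, (%rdi)  */
    arr[1] = 1;                             /* movl $1, 4(%rdi) */
    for (int i = 2; i < n; i++) {           /* movl $2, %edx; test at .L2 */
        arr[i] = arr[i - 2] + arr[i - 1];   /* -8 and -4 bytes from arr + 4*i */
    }
}
```

`main` reserves 32 bytes of stack and passes `%rsp` as `arr` and 7 as `n`, so under this reading the 7 ints starting at `%rsp` become 1, 1, 2, 3, 5, 8, 13, and the last 4 of the 32 bytes are untouched. When `f` returns, `%edx` is 7 and `%ecx` holds 13; `main` then sets `%eax` to 0 and restores `%rsp`.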
10 Exercise
Given the C code and the register-to-variable mapping below, see how far you can get filling in the corresponding assembly: [6]
```c
for (long i = 0; i < size; i++) {
    total += arr[i];
}
```
| Register | Use |
|---|---|
| `%rdi` | `arr` |
| `%rsi` | `size` |
| `%rdx` | `i` |
| `%rax` | `total` |
```asm
init:
    ________________
    ________________
body:
    ________________
    ________________
test:
    ________________
    ________________
```
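One way to fill in the blanks is to first rewrite the `for` loop in C with `goto`s that follow the `init`/`body`/`test` labels, jumping straight to the test first (the "jump to middle" translation). This is a hedged sketch: the name `sum_goto` is hypothetical, and it assumes `arr` is an `int *` and `total` starts at 0, neither of which the exercise states.

```c
/* The loop from the exercise, with labels matching the assembly skeleton. */
int sum_goto(int *arr, long size) {
    int total = 0;      /* assumed starting value */
    long i;
/* init: */
    i = 0;
    goto test;
body:
    total += arr[i];
    i++;
test:
    if (i < size)
        goto body;
    return total;
}
```

Each C statement in the labeled sections then corresponds to one of the blank instructions, using the registers from the table.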
11 Practice
CSPP practice problems 3.36 (p. 256) and 3.37 (p. 258)
Footnotes:

[1] `N * sizeof(T)`

[2]
1. valid, 9 (start of row 2, then 5 ints forward)
2. valid, 5 (start of the non-existent row 4, then 1 int backwards, thereby referencing the valid element at the end of row 3)
3. valid, 5 (start of row 0, then 19 ints forward, referencing the final int in the array)

[3] Statement 2 is the false statement. The assembly for `sea[1][1]` will compute the address of that specific element and then make a single memory access to that address.

[4]
1. valid, 0
2. invalid, past the end of row 1 (and rows are not adjacent in memory in this multilevel array)
3. invalid, past the start of row 2 (and rows are not adjacent in memory in this multilevel array)

[6]
```asm
init:
    movl $0, %edx
    jmp test
body:
    addl (%rdi, %rdx, 4), %eax
    addq $1, %rdx
test:
    cmpq %rsi, %rdx
    jl body
```