CS 208 s20 — Data Structure Representation: structs
Table of Contents
1 Video
Here is a video lecture for the material outlined below. It covers CSPP section 3.9.1 and 3.9.3 (p. 265–267, 273–275). It contains sections on
- introduction (0:03)
- multi-level arrays (0:56)
- spreadsheet visualization (3:10)
- assembly for multi-level array access (5:41)
- multi-level array exercise (14:11)
- struct basics (14:54)
- typedef with structs (20:09)
- accessing struct fields (22:37)
- size of a struct (25:46)
- assembly for struct access (28:04)
- data alignment (33:34)
- alignment for structs (40:36)
- struct alignment exercises (50:25)
- end notes (54:34)
The Panopto viewer has table of contents entries for these sections. Link to the Panopto viewer: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=c97fa161-5c36-4d3c-b26d-abbc01028ee4
2 Warmup
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns.1
sea[2][5]
sea[4][-1]
sea[0][19]
Which of the following statements is FALSE?2
sea[4][-2]
is a valid array referencesea[1][1]
makes two memory accessessea[2][1]
will always be a higher address thansea[1][2]
sea[2]
is calculated using only lea
3 Multilevel Arrays
- is this multi-dimensional array equivalent to previous
sea
?
int sea0[5] = {9, 8, 1, 9, 5}; int sea1[5] = {9, 8, 1, 0, 5}; int sea2[5] = {9, 8, 1, 0, 3}; int sea3[5] = {9, 8, 1, 1, 5}; int *sea_m[4] = {sea0, sea1, sea2, sea3};
- it contains the same 20 ints
- however, each of the four elements of
sea_m
is a pointer—none of the elements ofsea
were pointers - within each of the rows (
sea0
,sea1
,sea2
,sea3
), the 5 ints are allocated as a contiguous block of memory, but each row could be put anywhere - see the difference visually in this spreadsheet
- the C code for
get_sea_m_digit
is the same asget_sea_digit
from topic 15
int get_sea_m_digit (int index, int digit) { return sea_m[index][digit]; }
- but the assembly for accessing an element will be different
get_sea_digit: leaq (%rdi,%rdi,4), %rax # 5 * index addl %rax, %rsi # 5 * index + digit movl sea(,%rsi,4), %eax # *(sea + 4 * (5 * index + digit)) ret get_sea_m_digit: salq $2, %rsi # rsi = 4*digit addq sea_m(,%rdi,8), %rsi # p = sea_m[index] + 4*digit movl (%rsi), %eax # return *p ret
- accessing an element now requies two memory accesses
- the benefit of this multilevel structure is that the rows can be different lengths
- array access looks the same
sea[3][2]
andsea_m[3][2]
, but underneath- Mem[
sea + 20*index + 4*digit
] vs Mem[ Mem[sea_m + 8*index
]+ 4*digit
]
- Mem[
3.1 Exercise
For each of the following array accesses to the array pictured below, determine if it is a valid access and, if so, what value it returns.3
sea[2][3]
sea[1][5]
sea[2][-2]
4 Structures
Two ways to create data types in C: structures (struct
) and unions (union
) (we won't worry about unions in this course)
// a way of combining different types of data together struct song { char *title; int length_in_seconds; int year_released; }; struct song song1; song1.title = "What is Urinetown?"; song1.length_in_seconds = 213; song1.year_released = 2001;
- variable declarations like any other type:
struct name name1;
struct name *pn;
struct name name_ar[3];
- common to use
typedef
to give the struct type a more concise aliastypedef struct song song_t;
makes it so we can usesong_t
anywhere we would need to usestruct song
- access fields with
.
or->
in the case of the pointer (p->field
is shorthand for(*p).field)
- like arrays, struct elements are stored in a contiguous region with a pointer to the first byte
- what will
sizeof(struct song)
return? (sizeof
is a built-in operator that returns the size, in bytes, of a type)- 16 bytes
- compiler maintains the byte offset information needed to access additional fields (e.g.,
length_in_seconds
is 8 bytes from the start of the struct) - can find offset of individual fields using
offsetof(type, member)
- what will
4.1 Examples
struct rec { int a[4]; long i; struct rec *next; };
- fields are laid out in memory in same order in which they were declared
- machine code knows nothing about structures, all just byte offsets from a pointer
long get_i(struct rec *r) { return r->i; }
- the compiler knows the field
i
is always 16 bytes (the size of the array of 4 intsa
) from the start of the struct:
get_i: movq 16(%rdi), %rax ret
long* addr_of_i(struct rec *r) { return &(r->i); }
- note to get the address of the
i
field, we uselea
to add 16 to the address of the struct
addr_of_i: leaq 16(%rdi), %rax ret
struct rec** addr_of_next(struct rec *r) { return &(r->next); }
- the
next
field is 24 bytes from the start of the struct (16 fora
+ 8 fori
)
addr_of_next: leaq 24(%rdi), %rax ret
int get_array_elem (struct rec *r, long index) { return r->a[index]; }
- since
a
is located at the start of the struct, accessing an element requires no offset, just normal array indexing
get_array_elem: movl (%rdi, %rsi, 4), %eax ret
4.2 Data Alignment
- suppose a processor always fetches 8 bytes from an address that must be a multiple of 8
- if every
double
is guaranteed to have a memory address that is a multiple of 8, then it's guaranteed to take only a single operation to read - otherwise, it would take two operations if a
double
were split across two 8-byte blocks
- if every
- this kind of behavior is typical of hardware interfacing between the processor and memory
- hence, systems institute alignment restrictions to improve memory performance
- Intel recommends data alignment to improve performance
- x86-64 alignment principle: any primitive object of \(K\) bytes must have an address that is a multiple of \(K\)
- this means for structures, the compiler sometimes must insert gaps between fields to maintain alignment (internal fragmentation)
- even if this padding isn't required within a structure, it sometimes must be added to the end to ensure an array of structures is aligned (external fragmentation)
- each structure has alignment requirement \(K_{max}\) = largest alignment of any element
- counts array elements individually as elements
- even if this padding isn't required within a structure, it sometimes must be added to the end to ensure an array of structures is aligned (external fragmentation)
5 Homework
- CSPP practice problems 3.41 (p. 268) and 3.45 (p. 275)
- The week 6 quiz is due 9pm Wednesday, May 20
- Make a post in the lab 3 check-in forum, due 9pm Monday, May 18 (let this be a reason to start the lab by Monday!)
Footnotes:
- valid, 9 (start of row 2, then 5 ints forward)
- valid, 5 (start of the non-existent row 4, then 1 int backwards, thereby reference the valid element at the end of row 3)
- valid, 5 (start of row 0, then 19 ints forward, referencing the final int in the array)
2. is the false statement.
The assembly for sea[1][1]
will compute the address of that specific element and then make a single memory access to that address.
- valid, 0
- invalid, past the end of row 1 (and rows are not adjacent in memory in this multilevel array)
- invalid, past the start of row 2 (and rows are not adjacent in memory in this multilevel array)
The int i
is 4 bytes, so it must have an address that's a multiple of 4.
Hence, the compiler adds 3 bytes of padding after the 1-byte char c
to achieve this.
The size of the struct as a whole must be a multiple of the size of the largest field.
Hence, the compiler adds 3 bytes of padding to the end.
S4
has both internal and external fragmentation, while S5
has just external.
struct new { int i; float f; char *c; short s[3]; }
This would eliminate the internal fragmentation and leave just 2 bytes of external fragmentation.
sizeof(struct old)
= 32 bytes (6 bytes of internal fragmentation, 6 bytes external)sizeof(struct new)
= 24 bytes (2 bytes external)