CS 208 s22 — Data Structure Representation: structs
1 Exercise
Write the assembly for the following function
#include <stdio.h>

void parse(char *input, char *name, int *age) {
    sscanf(input, "%s, %d", name, age);
}
Assume the string "%s, %d" will be stored at address .LC0.
2 read_six_numbers
Here we see leaq used to actually compute pointers, rather than just do efficient arithmetic.
Arguments to sscanf beyond the sixth are passed via the stack; the function is compiled to read them from there.
Forms of stack allocation:
- subq $___, %rsp
- pushq
- (also callq, more about this next class)
Forms of stack deallocation:
- addq $___, %rsp
- popq
- (also ret, more about this next class)
3 Structures
Two ways to create data types in C: structures (struct) and unions (union). We won't worry about unions in this course.
// a way of combining different types of data together
struct song {
    char *title;
    int length_in_seconds;
    int year_released;
};

struct song song1;
song1.title = "What is Urinetown?";
song1.length_in_seconds = 213;
song1.year_released = 2001;
- variable declarations like any other type:
  struct name name1;
  struct name *pn;
  struct name name_ar[3];
- common to use typedef to give the struct type a more concise alias
  - typedef struct song song_t; makes it so we can use song_t anywhere we would need to use struct song
- access fields with . or -> in the case of a pointer (p->field is shorthand for (*p).field)
- like arrays, struct elements are stored in a contiguous region with a pointer to the first byte
- what will sizeof(struct song) return? (sizeof is a built-in operator that returns the size, in bytes, of a type)
  - 16 bytes
- compiler maintains the byte offset information needed to access additional fields (e.g., length_in_seconds is 8 bytes from the start of the struct)
  - can find offset of individual fields using offsetof(type, member)
3.1 Examples
struct rec {
    int a[4];
    long i;
    struct rec *next;
};
- fields are laid out in memory in same order in which they were declared
- machine code knows nothing about structures, all just byte offsets from a pointer
long get_i(struct rec *r) {
    return r->i;
}
- the compiler knows the field i is always 16 bytes (the size of the array of 4 ints a) from the start of the struct:
get_i:
    movq 16(%rdi), %rax
    ret
long *addr_of_i(struct rec *r) {
    return &(r->i);
}
- note to get the address of the i field, we use lea to add 16 to the address of the struct
addr_of_i:
    leaq 16(%rdi), %rax
    ret
struct rec **addr_of_next(struct rec *r) {
    return &(r->next);
}
- the next field is 24 bytes from the start of the struct (16 for a + 8 for i)
addr_of_next:
    leaq 24(%rdi), %rax
    ret
int get_array_elem(struct rec *r, long index) {
    return r->a[index];
}
- since a is located at the start of the struct, accessing an element requires no offset, just normal array indexing
get_array_elem:
    movl (%rdi, %rsi, 4), %eax
    ret
3.2 Data Alignment
- suppose a processor always fetches 8 bytes from an address that must be a multiple of 8
  - if every double is guaranteed to have a memory address that is a multiple of 8, then it's guaranteed to take only a single operation to read
  - otherwise, it would take two operations if a double were split across two 8-byte blocks
- this kind of behavior is typical of hardware interfacing between the processor and memory
- hence, systems institute alignment restrictions to improve memory performance
- Intel recommends data alignment to improve performance
- x86-64 alignment principle: any primitive object of \(K\) bytes must have an address that is a multiple of \(K\)
- this means for structures, the compiler sometimes must insert gaps between fields to maintain alignment (internal fragmentation)
- even if this padding isn't required within a structure, it sometimes must be added to the end to ensure an array of structures is aligned (external fragmentation)
- each structure has alignment requirement \(K_{max}\) = largest alignment of any element
  - an array field counts by the alignment of its individual elements, not by its total size
4 struct exercises
5 Practice
CSPP practice problems 3.41 (p. 268) and 3.45 (p. 275)
Footnotes:
The int i is 4 bytes, so it must have an address that's a multiple of 4. Hence, the compiler adds 3 bytes of padding after the 1-byte char c to achieve this.
The size of the struct as a whole must be a multiple of the largest alignment requirement among its fields. Hence, the compiler adds 3 bytes of padding to the end.
S4 has both internal and external fragmentation, while S5 has just external.
struct new {
    int i;
    float f;
    char *c;
    short s[3];
};
This would eliminate the internal fragmentation and leave just 2 bytes of external fragmentation.
sizeof(struct old) = 32 bytes (6 bytes of internal fragmentation, 6 bytes external)
sizeof(struct new) = 24 bytes (2 bytes external)