CS 208 s22 — Data Structure Representation: structs

1. Exercise
2. read_six_numbers
3. Structures
- 3.1. Examples
- 3.2. Data Alignment
4. struct exercises
5. Practice

1 Exercise

Write the assembly for the following function

#include <stdio.h>

void parse(char *input, char *name, int *age) {
    sscanf(input, "%s, %d", name, age);
}

Assume the string "%s, %d" will be stored at address .LC0.¹

2 `read_six_numbers`

Here we see leaq used to actually compute pointers, rather than just to do efficient arithmetic. Arguments to sscanf beyond the sixth are passed via the stack, so the function is compiled to read them from there.
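To make the stack-argument point concrete, here is a sketch of what the C source for a function like read_six_numbers might look like. The signature and format string are assumptions for illustration, not the actual source. The call passes eight arguments (the input string, the format string, and six pointers). Under the x86-64 calling convention the first six go in registers, so the last two pointers would be passed on the stack. Each &nums[k] is just nums plus 4*k bytes, which is the kind of pointer computation leaq performs.

```c
#include <stdio.h>

/*
 * Hypothetical reconstruction: parse six integers from input into nums.
 * Returns the number of integers successfully parsed (sscanf's return value).
 */
int read_six_numbers(const char *input, int *nums) {
    /* 8 arguments in total: input, the format string, and 6 pointers. */
    return sscanf(input, "%d %d %d %d %d %d",
                  &nums[0], &nums[1], &nums[2],
                  &nums[3], &nums[4], &nums[5]);
}
```

A caller would declare int nums[6] and check that the return value is 6 before using the array.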

Forms of stack allocation:

subq $___, %rsp
pushq
(also callq, more about this next class)

Forms of stack deallocation:

addq $___, %rsp
popq
(also ret, more about this next class)

3 Structures

Two ways to create compound data types in C: structures (struct) and unions (union); we won't worry about unions in this course

// a way of combining different types of data together
struct song {
    char *title;
    int length_in_seconds;
    int year_released;
};
struct song song1;
song1.title = "What is Urinetown?";
song1.length_in_seconds = 213;
song1.year_released = 2001;

variable declarations like any other type:
- struct name name1;
- struct name *pn;
- struct name name_ar[3];
common to use typedef to give the struct type a more concise alias
- typedef struct song song_t; makes it so we can use song_t anywhere we would need to use struct song
access fields with ., or with -> when going through a pointer (p->field is shorthand for (*p).field)
like arrays, struct fields are stored in a contiguous region of memory, and a pointer to a struct holds the address of its first byte
- what will sizeof(struct song) return? (sizeof is a built-in operator that returns the size, in bytes, of a type)
  - 16 bytes
- compiler maintains the byte offset information needed to access additional fields (e.g., length_in_seconds is 8 bytes from the start of the struct)
- can find the offset of individual fields using offsetof(type, member), a macro defined in <stddef.h>
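The points above can be checked with a short sketch. It reuses struct song and the song_t alias from these notes; the helper names song_length and print_song_layout are made up for illustration. The sizes and offsets mentioned assume a typical x86-64 system (8-byte pointers, 4-byte ints).

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

struct song {
    char *title;
    int length_in_seconds;
    int year_released;
};
typedef struct song song_t;  /* song_t can now be used in place of struct song */

/* Field access through a pointer: p->field is shorthand for (*p).field. */
int song_length(const song_t *p) {
    return p->length_in_seconds;
}

/* Print the total size and the byte offset of each field. */
void print_song_layout(void) {
    printf("sizeof(song_t)    = %zu\n", sizeof(song_t));
    printf("title             @ %zu\n", offsetof(song_t, title));
    printf("length_in_seconds @ %zu\n", offsetof(song_t, length_in_seconds));
    printf("year_released     @ %zu\n", offsetof(song_t, year_released));
}
```

On an x86-64 Linux machine, print_song_layout should report a size of 16 and offsets 0, 8, and 12; other platforms may differ.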

3.1 Examples

struct rec {
    int a[4];
    long i;
    struct rec *next;
};

fields are laid out in memory in same order in which they were declared
machine code knows nothing about structures, all just byte offsets from a pointer

long get_i(struct rec *r) {
    return r->i;
}

the compiler knows the field i is always 16 bytes (the size of the array of 4 ints a) from the start of the struct:

get_i:
        movq 16(%rdi), %rax
        ret

long* addr_of_i(struct rec *r) {
    return &(r->i);
}

note that to get the address of the i field, we use leaq to add 16 to the address of the struct, without reading memory

addr_of_i:
        leaq 16(%rdi), %rax
        ret

struct rec** addr_of_next(struct rec *r) {
    return &(r->next);
}

the next field is 24 bytes from the start of the struct (16 for a + 8 for i)

addr_of_next:
        leaq 24(%rdi), %rax
        ret

int get_array_elem (struct rec *r, long index) {
    return r->a[index];
}

since a is located at the start of the struct, accessing an element requires no offset, just normal array indexing

get_array_elem:
        movl (%rdi, %rsi, 4), %eax
        ret
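The "just byte offsets from a pointer" view can also be written out in C. The sketch below uses the struct rec from this section; the function names get_i_by_offset and get_array_elem_by_offset are hypothetical. Each function does by hand roughly what the compiler emits: start from the struct's address, add a byte offset, and load. The specific offsets (16 and 24) assume x86-64.

```c
#include <stddef.h>  /* offsetof, NULL */

struct rec {
    int a[4];
    long i;
    struct rec *next;
};

/* Same result as r->i, spelled out as "base address + byte offset". */
long get_i_by_offset(struct rec *r) {
    char *base = (char *)r;  /* char * so that arithmetic is in bytes */
    return *(long *)(base + offsetof(struct rec, i));
}

/* Same result as r->a[index]: base + offset of a + 4 * index. */
int get_array_elem_by_offset(struct rec *r, long index) {
    char *base = (char *)r;
    return *(int *)(base + offsetof(struct rec, a) + index * (long)sizeof(int));
}
```

Using offsetof instead of hard-coding 16 keeps the sketch correct even on a platform where the layout differs.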

3.2 Data Alignment

suppose a processor always fetches 8 bytes from an address that must be a multiple of 8
- if every double is guaranteed to have a memory address that is a multiple of 8, then it's guaranteed to take only a single operation to read
- otherwise, it would take two operations if a double were split across two 8-byte blocks
this kind of behavior is typical of hardware interfacing between the processor and memory
- hence, systems institute alignment restrictions to improve memory performance
- x86-64 hardware works correctly regardless of alignment, but Intel recommends aligning data to improve performance
x86-64 alignment principle: any primitive object of $K$ bytes must have an address that is a multiple of $K$
this means for structures, the compiler sometimes must insert gaps between fields to maintain alignment (internal fragmentation)
- even if this padding isn't required within a structure, it sometimes must be added to the end to ensure an array of structures is aligned (external fragmentation)
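  - each structure has alignment requirement $K_{max}$ = largest alignment of any element, and its total size must be a multiple of $K_{max}$
  - an array field counts by its element type, not its total size (e.g., short s[3] has alignment 2, not 6)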
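A small sketch of these rules in action. Both structs and the helper function are hypothetical examples, not the structs from the exercises; the offsets in the comments assume x86-64, where a double needs 8-byte alignment and a short needs 2-byte alignment.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct with a gap between fields and padding at the end. */
struct padded {
    char tag;      /* offset 0, followed by 7 bytes of padding   */
    double value;  /* offset 8: must be a multiple of 8          */
    short count;   /* offset 16, followed by 6 bytes of padding  */
};                 /* size 24, a multiple of K_max = 8           */

/* Hypothetical struct showing that an array aligns by element type. */
struct with_array {
    char c;        /* offset 0, followed by 1 byte of padding    */
    short s[3];    /* offset 2: alignment is 2 (short), not 6    */
};                 /* size 8, a multiple of K_max = 2            */

/* Total padding in struct padded: its size minus the sizes of its fields. */
size_t padded_total_padding(void) {
    return sizeof(struct padded)
         - (sizeof(char) + sizeof(double) + sizeof(short));
}
```

Under the stated assumptions, padded_total_padding returns 13: 7 bytes between tag and value, plus 6 bytes after count.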

4 struct exercises

why does struct S4 have 3 bytes of padding added after c and after d?²
what kinds of fragmentation do struct S4 and struct S5 each have?³

how would you reorder the fields of struct old in struct new to minimize the overall size of the struct?⁴

5 Practice

CSAPP practice problems 3.41 (p. 268) and 3.45 (p. 275)

Footnotes:

²

The int i is 4 bytes, so it must have an address that's a multiple of 4. Hence, the compiler adds 3 bytes of padding after the 1-byte char c to achieve this. The size of the struct as a whole must be a multiple of the largest alignment requirement among its fields. Hence, the compiler adds 3 bytes of padding to the end.

³

S4 has both internal and external fragmentation, while S5 has just external.

⁴

struct new {
    int i;
    float f;
    char *c;
    short s[3];
};

This would eliminate the internal fragmentation and leave just 2 bytes of external fragmentation.

sizeof(struct old) = 32 bytes (the fields take 22 bytes, leaving 10 bytes of padding, internal and external combined)
sizeof(struct new) = 24 bytes (2 bytes external)
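
The figures for struct new can be checked with a short sketch. The helper name new_trailing_padding is made up for illustration, and the numbers assume x86-64 (4-byte int and float, 8-byte pointer, 2-byte short). Note that new is a valid identifier in C but a keyword in C++, so this compiles as C only.

```c
#include <stddef.h>  /* offsetof, size_t */

struct new {
    int i;       /* offset 0                                      */
    float f;     /* offset 4                                      */
    char *c;     /* offset 8                                      */
    short s[3];  /* offset 16, 6 bytes, then padding at the end   */
};

/* Bytes of padding after the last field (external fragmentation). */
size_t new_trailing_padding(void) {
    return sizeof(struct new)
         - (offsetof(struct new, s) + sizeof(short[3]));
}
```

Under these assumptions sizeof(struct new) is 24 and new_trailing_padding returns 2, matching the figures above.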