CS 332 w22 — Caches and Address Translation

1 Caching

Cache
- Copy of data that is faster to access than the original
- Hit: if cache has copy
  - Desired data is resident in the cache
- Miss: if cache does not have copy
  - Load missing data into the cache
  - Overwrite (evict) resident data to make room, if necessary
Cache block
- Unit of cache storage (range of memory locations)
Temporal locality
- Programs tend to reference the same memory locations multiple times
- Example: instructions in a loop
Spatial locality
- Programs tend to reference nearby locations
- Example: data in a loop

Two kinds of cache write behavior
- Write through: changes sent immediately to next level of storage
- Write back: changes stored in cache until cache block is replaced

Intel Core i7 (pictured above)
- L1 caches are 32 or 48 KB
- L2 caches range from 256 KB to 2 MB depending on specific model
- L3 caches ranges from 4 MB to 24 MB
Apple M1 chip:
- 4 high-performance cores and 4 energy-efficient cores
  - M1 Pro and M1 Max have 8 high-performance and 2 energy-efficient
- Each performance core has a 192 KB instruction cache and 128 KB data cache (L1)
- Each efficient core has a 128 KB instruction cache and 64 KB data cache (L1)
- Performance cores share at 12 MB L2 cache, efficient cores share a 4 MB L2 cache
- The entire system on a chip (SoC) has a shared 16 MB cache

Going up the hierarchy: memory gets smaller, faster, and more expensive (per byte)
Going down the hierarchy: memory gets bigger, slower, and cheaper (per byte)

Events like context switches can cause a burst of misses:

Fully Associative cache
- Cache checks address against every entry and returns matching value, if any
Direct Mapped cache
- Cache hashes address to determine which location to check for a match
Set Associative cache
- Cache hashes address to determine which location to check
- Checks each set in parallel for a match

Conversion from memory address the program thinks it's referencing to the physical location of that memory cell
- Virtual address to physical address
- The range of memory addresses a program can use is called its address space
Program behaves correctly despite its memory being stored somewhere completely different than where it thinks it is stored
A useful fiction, transparent to the programmer
Goals:
- Memory protection
- Memory sharing: shared libraries, interprocess communication
- Sparse addresses: multiple regions of dynamic allocation (stack/heap)
- Efficiency: memory placement, runtime lookup, compact translation tables
- Portability

Virtual addresses range from 0 to bound
Physical addresses from base to base + bound
base and bound specific to each process, stored in special registers
- Saved on context switch
In a nutshell: each process reserves a range of physical memory
Pros:
- Simple, fast, safe
- Can relocate physical memory without changing process
Cons:
- Can't prevent process from overwriting its own code (only check exceeding bound)
- Can't share code/data with other processes
- Can't grow stack/heap as needed

OSTEP Chapter 15 (p. 151–161) provides a good walkthrough of address translation with examples and additional context.