CS 332 w22 — Race Conditions and Locks
1 Too Much Milk
An example of a race condition and why simple solutions fail.
| | roommate 1 | roommate 2 |
|---|---|---|
| 3:00 | Look in fridge, out of milk | |
| 3:05 | Leave for store | |
| 3:10 | Arrive at store | Look in fridge, out of milk |
| 3:15 | Buy milk | Leave for store |
| 3:20 | Arrive home, put milk away | Arrive at store |
| 3:25 | | Buy milk |
| 3:30 | | Arrive home, put milk away |
| 3:35 | Oh no! | |
We want a solution with two key properties:
- safety: never more than one person buys milk
- liveness: if milk is needed, someone eventually buys it
1.1 Solution 1
Leave a note!
```c
if (milk == 0) {
    if (note == 0) {
        note = 1;
        milk++;
        note = 0;
    }
}
```
This violates safety, since one thread could check the milk and then get context switched.
1.2 Solution 2
Have each roommate write a note that they "might buy milk" before deciding to buy milk.
```c
// roommate 1              // roommate 2
noteA = 1;                 noteB = 1;
if (noteB == 0) {          if (noteA == 0) {
    if (milk == 0) {           if (milk == 0) {
        milk++;                    milk++;
    }                          }
}                          }
noteA = 0;                 noteB = 0;
```
This ensures safety (only one milk bought)! Unfortunately, it's at the cost of liveness—it's possible for both threads to set their respective notes, and for both to check and decide not to buy milk.
1.3 Solution 3
Add a loop to have one roommate wait until the other has made a decision about whether to buy milk.
```c
// roommate 1              // roommate 2
noteA = 1;                 noteB = 1;
while (noteB == 1) {       if (noteA == 0) {
    ;                          if (milk == 0) {
}                                  milk++;
if (milk == 0) {               }
    milk++;                }
}                          noteB = 0;
noteA = 0;
```
This solution gives us both safety and liveness. It is a special case of a more general approach, Peterson's algorithm, which works with any fixed number of threads. It is not without problems, however:
- the solution is complex and requires careful reasoning
- the solution is inefficient—while the roommate 1 thread is waiting, it is busy-waiting and consuming CPU resources. Busy-waiting can become a serious problem if the waiting thread is holding the processor waiting for an event that cannot occur until some preempted thread is re-scheduled to run.
- the solution may fail if the compiler or hardware reorders instructions (more on this in a bit)
There is a better way! What if we had a lock to control access to the shared state (i.e., amount of milk)?
```c
lock.acquire();
if (milk == 0) {
    milk++;
}
lock.release();
```
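As a concrete sketch, here is the same idea in C with a pthread mutex standing in for the pseudocode lock (the names `milk_lock` and `buy_milk_if_needed` are illustrative):

```c
#include <pthread.h>

// Hypothetical pthread version of the pseudocode above.
pthread_mutex_t milk_lock = PTHREAD_MUTEX_INITIALIZER;
int milk = 0;

void buy_milk_if_needed(void) {
    pthread_mutex_lock(&milk_lock);   // acquire: at most one roommate in here
    if (milk == 0) {
        milk++;                       // the check and the buy are now atomic together
    }
    pthread_mutex_unlock(&milk_lock); // release
}
```

No matter how many roommates (threads) call `buy_milk_if_needed()`, the check and the increment happen as one atomic unit, so at most one milk is bought.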
2 Reading: Locks
Read Chapter 28 through section 28.11 (p. 331–343) of the OSTEP book. It will introduce a basic kind of mutual exclusion lock called a spinlock, and describe several hardware instructions that can support such a lock.
3 Locks
- Use locks to provide mutual exclusion—when one thread holds the lock, no other thread can hold it (i.e., other threads are excluded)
  - control access to shared state
  - prevent other threads from observing intermediate state, making an arbitrary set of operations appear atomic to other threads
- Examples:
  - bank account transactions and queries
  - multiple threads calling `printf()`
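To make the bank-account example concrete, here is a minimal sketch using a pthread mutex (the `Account` type and function names are made up for illustration):

```c
#include <pthread.h>

// Hypothetical bank account protected by a pthread mutex.
typedef struct {
    pthread_mutex_t lock;
    long balance;
} Account;

void account_init(Account *a, long initial) {
    pthread_mutex_init(&a->lock, NULL);
    a->balance = initial;
}

// Without the lock, a concurrent read-modify-write of balance
// could lose updates; with it, deposit appears atomic to other threads.
void deposit(Account *a, long amount) {
    pthread_mutex_lock(&a->lock);
    a->balance += amount;
    pthread_mutex_unlock(&a->lock);
}

long query(Account *a) {
    pthread_mutex_lock(&a->lock);
    long b = a->balance;
    pthread_mutex_unlock(&a->lock);
    return b;
}
```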
3.1 Properties
- provide two methods: `Lock::acquire()` and `Lock::release()` (sometimes called lock/unlock)
- lock can be BUSY or FREE
- lock is initially FREE
- `acquire()` waits until lock is FREE, then atomically makes it BUSY
  - even if multiple threads try to acquire the lock, at most one will succeed
  - others will see the lock as BUSY and wait
- `release()` makes the lock FREE
Formally:
- mutual exclusion: at most one thread holds the lock
- progress: if no thread holds the lock and any thread attempts to acquire the lock, eventually some thread succeeds
- bounded waiting: if thread T attempts to acquire the lock, then there exists a bound on the number of times other threads can successfully acquire the lock before T does
  - does not guarantee FIFO behavior
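For contrast, a lock that does serve threads in FIFO order is the ticket lock from the OSTEP reading. A minimal sketch using gcc's fetch-and-add builtin (the type and function names are illustrative):

```c
// Ticket lock: each thread takes a ticket, then waits for its turn.
// Threads are served in the order they took tickets (FIFO).
typedef struct {
    int ticket;  // next ticket to hand out
    int turn;    // ticket currently being served
} ticket_lock_t;

void ticket_acquire(ticket_lock_t *l) {
    // Atomically grab a ticket and advance the counter.
    int my = __sync_fetch_and_add(&l->ticket, 1);
    while (__sync_fetch_and_add(&l->turn, 0) != my)
        ;  // spin until it's our turn (adding 0 is an atomic read)
}

void ticket_release(ticket_lock_t *l) {
    __sync_fetch_and_add(&l->turn, 1);  // serve the next ticket holder
}
```

This buys FIFO ordering at the cost of extra spinning traffic; plain spinlocks only promise bounded waiting in the sense above.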
3.2 Spinlock Implementation
A spinlock is a lock where the processor waits in a loop for the lock to become free.
- Assumes lock will be held for a short time
- Used to protect the CPU scheduler and to implement locks
`osv` implementation:

```c
void spinlock_acquire(struct spinlock* lock)
{
    if (!synch_enabled) {
        return;
    }
    kassert(lock);
    intr_set_level(INTR_OFF);

    // can't grab the same lock again
    struct thread *curr = thread_current();
    if (lock->holder != NULL && lock->holder == curr) {
        panic("lock holder trying to grab the same lock again");
    }
    kassert(lock->holder == NULL || lock->holder != curr);

    while (lock->lock_status || __sync_lock_test_and_set(&lock->lock_status, 1) != 0)
        ;

    // Tell the C compiler and the processor to not move loads or stores
    // past this point, to ensure that the critical section's memory
    // references happen after the lock is acquired.
    __sync_synchronize();
    lock->holder = curr;
}

void spinlock_release(struct spinlock* lock)
{
    if (!synch_enabled) {
        return;
    }
    kassert(lock);
    lock->holder = NULL;

    // Tell the C compiler and the CPU to not move loads or stores
    // past this point, to ensure that all the stores in the critical
    // section are visible to other CPUs before the lock is released.
    __sync_synchronize();
    __sync_lock_release(&lock->lock_status);
    __sync_synchronize();
    intr_set_level(INTR_ON);
}
```
3.2.1 Instruction Reordering
Can this panic?
```c
// Thread 1                      // Thread 2
p = some_computation();          while (!p_init)
p_init = true;                       ;
                                 q = some_function(p);
                                 if (q != some_function(p))
                                     panic();
```
It turns out that it can, because the compiler or hardware might reorder the `p_init` assignment to happen before the call to `some_computation()`.
Why do compilers reorder instructions?
- Efficient code generation requires analyzing control/data dependency
- If variables can spontaneously change, most compiler optimizations become impossible
Why do CPUs reorder instructions?
- Write buffering: allow next instruction to execute while write is being completed
Fix: memory barrier
- Instruction to compiler/CPU
- All ops before barrier complete before barrier returns
- No op after barrier starts until barrier returns
`__sync_synchronize()` is a built-in gcc function that acts as a memory barrier.
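To see the barrier in action, here is one way the `p_init` example can be repaired (the bodies of `some_computation` and `some_function` are made-up stubs so the sketch is self-contained):

```c
#include <pthread.h>
#include <stdlib.h>

static int *p;
static int p_init = 0;

// Hypothetical stubs standing in for the real computation.
static int *some_computation(void) {
    int *v = malloc(sizeof *v);
    *v = 42;
    return v;
}

static int some_function(int *x) { return *x; }

static void *thread1_fn(void *arg) {
    p = some_computation();
    __sync_synchronize();  // barrier: p's stores complete before p_init is set
    p_init = 1;
    return NULL;
}

static void *thread2_fn(void *arg) {
    while (!__sync_fetch_and_add(&p_init, 0))
        ;                  // spin until thread 1 publishes p (atomic read)
    __sync_synchronize();  // barrier: don't let the read of p move above the spin
    return (void *)(long)some_function(p);
}
```

The first barrier stops `p_init = 1` from becoming visible before `p` is fully written; the second stops thread 2 from reading `p` before it has seen `p_init` set.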
4 Case Study: Thread-Safe Bounded Queue
A bounded, thread-safe queue (aside: see `man malloc`) is a kind of data structure frequently used in OS kernels.
```c
// TSQueue.h
// Thread-safe queue interface
#define MAX_QSIZE 10

typedef struct {
    // Synchronization variables
    Lock lock;  // in practice, pthread_mutex_t (or an osv-defined lock)

    // State variables
    int items[MAX_QSIZE];
    int front;
    int nextEmpty;
} TSQueue;

void initTSQueue(TSQueue *q);
bool tryInsert(TSQueue *q, int item);
bool tryRemove(TSQueue *q, int *item);
```
```c
// Initialize the queue to empty
// and the lock to free.
void initTSQueue(TSQueue *q) {
    q->front = 0;
    q->nextEmpty = 0;
    lock_init(&q->lock);  // matching init for the lock_acquire/lock_release calls below
}

// Try to insert an item. If the queue is
// full, return false; otherwise return true.
bool tryInsert(TSQueue *q, int item) {
    bool success = false;
    lock_acquire(&q->lock);
    if ((q->nextEmpty - q->front) < MAX_QSIZE) {
        q->items[q->nextEmpty % MAX_QSIZE] = item;
        q->nextEmpty++;
        success = true;
    }
    lock_release(&q->lock);
    return success;
}

// Try to remove an item. If the queue is
// empty, return false; otherwise return true.
bool tryRemove(TSQueue *q, int *item) {
    bool success = false;
    lock_acquire(&q->lock);
    if (q->front < q->nextEmpty) {
        *item = q->items[q->front % MAX_QSIZE];
        q->front++;
        success = true;
    }
    lock_release(&q->lock);
    return success;
}
```
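A quick single-threaded check of the interface semantics. This is a self-contained sketch that maps `Lock` onto `pthread_mutex_t` (an assumption; in osv the kernel's own lock would be used):

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_QSIZE 10

// Same queue as above, with the Lock mapped to pthread_mutex_t
// so the sketch compiles outside the kernel.
typedef struct {
    pthread_mutex_t lock;
    int items[MAX_QSIZE];
    int front;
    int nextEmpty;
} TSQueue;

void initTSQueue(TSQueue *q) {
    q->front = 0;
    q->nextEmpty = 0;
    pthread_mutex_init(&q->lock, NULL);
}

bool tryInsert(TSQueue *q, int item) {
    bool success = false;
    pthread_mutex_lock(&q->lock);
    if ((q->nextEmpty - q->front) < MAX_QSIZE) {
        q->items[q->nextEmpty % MAX_QSIZE] = item;
        q->nextEmpty++;
        success = true;
    }
    pthread_mutex_unlock(&q->lock);
    return success;
}

bool tryRemove(TSQueue *q, int *item) {
    bool success = false;
    pthread_mutex_lock(&q->lock);
    if (q->front < q->nextEmpty) {
        *item = q->items[q->front % MAX_QSIZE];
        q->front++;
        success = true;
    }
    pthread_mutex_unlock(&q->lock);
    return success;
}
```

Note that `front` and `nextEmpty` grow without bound and are reduced modulo `MAX_QSIZE` only on array access; the difference `nextEmpty - front` is the current number of items.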
5 Summary
- shared object layer
- synchronization variable layer
- atomic instruction layer