CS 332 w22 — Microkernels and Unikernels
Adapted from Robert Morris. 6.S081: https://pdos.csail.mit.edu/6.S081/2020/schedule.html
- Topic:
- What should a kernel do?
- What should its abstractions / system calls look like?
- Answers depend on the application, and on programmer taste!
- There is no single best answer
- This topic is more about ideas and less about specific mechanisms
- The traditional approach
- powerful abstractions, and
- a "monolithic" kernel implementation
- UNIX, Linux, osv
- The philosophy behind traditional kernels is powerful abstractions:
- portable interfaces
- files, not disk controller registers
- address spaces, not MMU access
- simple interfaces that hide complexity
- all I/O via FDs and read/write, not specialized for each device &c
- address spaces with transparent disk paging
- abstractions help the kernel manage and share resources
- process abstraction lets kernel be in charge of scheduling
- file/directory abstraction lets kernel be in charge of disk layout
- abstractions help the kernel enforce security
- file permissions
- processes with private address spaces
- lots of indirection
- e.g. FDs, virtual addresses, file names, PIDs
- helps kernel virtualize, hide, revoke, schedule, &c
- Powerful abstractions have led to big "monolithic" kernels
- kernel is one big program, like osv
- easy for kernel sub-systems to cooperate – no irritating boundaries
- exec() and mmap() are part of both the FS and the VM system
- relatively easy to add sym links, COW fork, mmap, &c
- all kernel code runs with high privilege – no internal security restrictions
- What's wrong with traditional kernels?
- big => complex => buggy/insecure
- perhaps over-general and thus slow
- how much code executes to send one byte via a UNIX pipe?
- buffering, locks, sleep/wakeup, scheduler
- many design decisions are baked in, can't be changed, may be awkward
- maybe I want to wait for a process that's not my child
- maybe I want to change another process's address space
- maybe DB is better at laying out B-Tree files on disk than kernel FS
- hard to create kernel "extensions" that others can use
- new device drivers, file systems, &c
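The pipe-cost question above can be made concrete. A small sketch using POSIX pipes (via Python's os module): it round-trips one byte, then times many round trips to estimate the cost of the kernel path (buffering, locks, sleep/wakeup).

```python
import os
import time

# One byte via a UNIX pipe: each write()/read() is a full system call
# into the monolithic kernel (buffer management, locks, wakeups).
r, w = os.pipe()
os.write(w, b"x")
data = os.read(r, 1)
assert data == b"x"

# Rough per-transfer cost: average over many one-byte round trips.
n = 10_000
start = time.perf_counter()
for _ in range(n):
    os.write(w, b"x")
    os.read(r, 1)
per_op_us = (time.perf_counter() - start) / n * 1e6
print(f"~{per_op_us:.2f} us per one-byte write+read pair")
os.close(r)
os.close(w)
```

A microsecond or more per one-byte transfer is typical; the data moved is tiny, so almost all of the cost is the general-purpose kernel machinery.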
- Microkernels – a different approach
- big idea: move most O/S functionality to user-space service processes
- [diagram: h/w, kernel, services (FS disk VM TCP NIC display), apps]
- kernel can be small
- address spaces, threads, IPC (inter-process communication)
- IPC lets threads send each other messages
- 1980s saw big burst of research on microkernel designs
- CMU's Mach perhaps the most influential
- used today in embedded systems, phone chips, car entertainment
- ideas (esp. user-level servers and IPC) were influential, e.g. on Windows and MacOS
- Why the interest in microkernels?
- focused, elegant, clean slate
- small -> more security – less code means fewer bugs to exploit
- small -> verifiable (see seL4)
- small -> easier to optimize
- you don't have to pay for features you don't use
- small -> avoid forcing design decisions on applications
- user-level -> may encourage modularity of O/S services
- user-level -> easier to extend / customize / replace user-level services
- user-level -> more robust – restart individual user-level services
- most bugs are in drivers, get them out of the kernel!
- can run/emulate multiple O/Ses, like a VMM
- Microkernel challenges
- What's a minimum kernel API?
- Need simple primitives on which to build exec, fork, mmap, &c
- Need to build the rest of the O/S at user level
- How to get good performance, despite IPC and less integration?
- L4
- has evolved over time, many versions and re-implementations
- used commercially today, in phones and embedded controllers
- representative of the micro-kernel approach
- emphasis on minimality:
- 7 system calls (Linux has 300+, osv has 19 or 22, depending on how you count)
- 13,000 lines of code
- L4 basic abstractions
- [diagram]
- address space ("task")
- thread
- IPC
- L4 system calls:
- create an address space
- create/destroy a thread in [another] address space
- send/recv message via IPC (addresses are thread IDs)
- map pages of your memory into another address space
- the receiving address space must agree
- this happens via IPC – one task can modify another task's page table
- used to create new tasks, share memory
- intercept another address space's page faults – "pager"
- kernel delivers via IPC
- access device hardware (not a system call, happens directly)
- handle device interrupts
- kernel delivers via IPC
- Note L4 kernel is missing almost everything that Linux or even osv has
- file system, fork(), exec(), pipes, device drivers, network stack, &c
- If you want these, they have to be user-level code
- library or server process
- how does L4 thread switching work?
- current user-level thread can yield for 3 reasons:
- IPC system call waits
- timer interrupt
- yield() system call
- L4 kernel saves user thread registers,
- picks a RUNNABLE thread to run,
- restores user registers,
- switches page table,
- jumps to user space
- no surprises here
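The save/pick/restore/jump sequence can be mimicked with a toy cooperative scheduler (a sketch, not real L4 code): each "thread" is a Python generator, so its state is saved implicitly at every yield, and the "kernel" loop picks the next RUNNABLE one and resumes it.

```python
from collections import deque

def thread(name, steps):
    # Each yield models the thread entering the kernel
    # (IPC wait, timer interrupt, or yield() system call).
    for i in range(steps):
        yield f"{name}:{i}"

# "Kernel" scheduler: thread state is saved (implicitly, in the
# generator), a RUNNABLE thread is picked, restored, and resumed.
runnable = deque([thread("A", 2), thread("B", 2)])
trace = []
while runnable:
    t = runnable.popleft()
    try:
        trace.append(next(t))  # resume thread until its next yield
        runnable.append(t)     # still RUNNABLE: put it back in the queue
    except StopIteration:
        pass                   # thread finished; drop it

print(trace)  # round-robin interleaving of A and B
```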
- how do L4 external pagers work?
- every task has a pager task
- page fault
- kernel suspends thread
- kernel sends fault info in IPC to pager
- pager picks one of its own pages
- pager sends virtual page address in IPC reply to faulting thread
- kernel intercepts the IPC, maps the page into the target, resumes the target
- what can you use an L4 pager for?
- allocating memory – "sigma0" allocates on fault for early tasks
- copy-on-write fork
- coupled with a system call that revokes access
- mmap of file
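The external-pager protocol can be sketched as a toy model (the class and message names here are illustrative, not the real L4 API): the "kernel" turns a page fault into an IPC to the pager, and the pager replies with one of its own pages for the kernel to map.

```python
# Toy model of L4-style external paging (illustrative names only).

class Pager:
    """User-level pager: owns some physical frames, hands them out on fault."""
    def __init__(self, frames):
        self.free_frames = list(frames)

    def handle_fault(self, fault_msg):
        # Pager picks one of its own pages and replies via IPC.
        frame = self.free_frames.pop()
        return {"map_frame": frame, "at": fault_msg["vaddr"]}

class Kernel:
    def __init__(self, pager):
        self.pager = pager
        self.page_table = {}  # vaddr -> frame, for the faulting task

    def page_fault(self, task, vaddr):
        # 1. suspend the faulting thread; 2. send fault info via IPC to pager
        reply = self.pager.handle_fault({"task": task, "vaddr": vaddr})
        # 3. kernel intercepts the IPC reply, maps the page, resumes the task
        self.page_table[reply["at"]] = reply["map_frame"]
        return "resumed"

kernel = Kernel(Pager(frames=[7, 8, 9]))
status = kernel.page_fault("task0", vaddr=0x1000)
print(status, kernel.page_table)
```

The same reply-to-fault hook is what makes the uses above possible: a COW pager replies with a fresh copy on write faults, an mmap pager replies with a page filled from a file.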
- problem: IPC performance
- Microkernel programs do lots of IPC!
- Was expensive in early systems
- multiple kernel crossings, TLB misses, context switches, &c
- Cost of IPC caused many to dismiss microkernels
- L4 designers put huge effort into IPC performance
- Here's a slow IPC design
- patterned on UNIX pipes
- [diagram, message queue in kernel]
- send(id, msg)
- append msg to queue in kernel, return
- recv(&id, &data)
- if msg waiting in queue, remove, return
- otherwise sleep()
- called "asynchronous" and "buffered"
- now the usual request-response pattern (RPC) involves:
- [diagram: 2nd message queue for replies]
- 4 system calls (user->kernel->user)
- request: client send() -> server recv()
- reply: client recv() <- server send()
- each may disturb CPU's caches (TLB, data, instruction)
- four message copies (two for request, two for reply)
- two context switches, two general-purpose schedulings
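The buffered design can be sketched in a few lines (a counting model, not a real kernel): every send() and recv() is a separate kernel crossing and copies the message once, so one request-response round trip costs four crossings and four copies.

```python
from collections import deque

# Toy buffered ("asynchronous") IPC: message queues live in the kernel.
queues = {}       # destination id -> kernel message queue
crossings = 0     # count of user->kernel system calls
copies = 0        # count of message copies (user->kernel, kernel->user)

def send(dst, msg):
    global crossings, copies
    crossings += 1
    copies += 1                              # copy msg into the kernel queue
    queues.setdefault(dst, deque()).append(bytes(msg))

def recv(me):
    global crossings, copies
    crossings += 1
    copies += 1                              # copy msg out to the user buffer
    return queues[me].popleft()              # a real kernel would sleep() if empty

# Request-response (RPC): client sends, server recvs, replies, client recvs.
send("server", b"req")
req = recv("server")
send("client", b"resp:" + req)
resp = recv("client")

print(crossings, copies, resp)
```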
- L4's fast IPC
- "Improving IPC by Kernel Design," Jochen Liedtke, 1993
- synchronous
- [diagram]
- send() waits for target thread's recv()
- common case: target is already waiting in recv()
- send() jumps into target's user space, as if returning from recv()
- no real context switch, no scheduler loop
- unbuffered
- no queue in kernel
- since synchronous, kernel can copy directly between user buffers
- small messages in registers
- kernel send() path does not disturb many of the registers
- e.g., no context switch
- no copying required for small messages
- since send() jumps into target's user space, along with registers
- huge messages as virtual memory grants
- again, no copy required, though kernel send() code must change the page table
- combined call() and sendrecv() system calls
- [diagram]
- IPC almost always used as request-response RPC
- thus wasteful to use separate send() and recv() system calls
- client: call(): send a message, wait for response
- server: sendrecv(): reply to one request, wait for the next one
- 2x reduction in user/kernel crossings
- careful layout of kernel code to minimize cache footprint
- result: 20x reduction in IPC cost
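The combined-call idea can be sketched as a counting model (again a sketch, not L4's implementation): with synchronous rendezvous, call() and sendrecv() each enter the kernel once, so the same RPC costs two crossings instead of four, and the kernel can copy directly between user buffers with no kernel queue.

```python
# Toy synchronous ("rendezvous") IPC with combined system calls.
crossings = 0     # count of user->kernel system calls

def handle(req):
    # Server's request handler, run when the kernel delivers the request.
    return b"resp:" + req

def sendrecv(reply):
    # Server: ONE system call replies to this request and waits for the next.
    global crossings
    crossings += 1
    return reply

def call(req):
    # Client: ONE system call sends the request AND waits for the reply.
    global crossings
    crossings += 1
    # Rendezvous: the server is already waiting in recv(), so the kernel
    # copies the request straight into its buffer and jumps into its
    # user space; the reply comes back via the server's sendrecv().
    return sendrecv(handle(req))

resp = call(b"req")
print(crossings, resp)
```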
- How to build a full operating system on a microkernel?
- Remember the idea was to move most features into user-level servers.
- File system, device drivers, network stack, process control, &c
- For embedded systems this can be fairly simple.
- What about services for general-purpose use, e.g. workstations, web servers?
- Really need compatibility for existing applications.
- E.g. the system needs to mimic something like UNIX.
- Re-implement UNIX kernel services as lots of user-level services?
- Or: run existing Linux kernel as a process on top of the microkernel.
- An "O/S server".
- Perhaps not elegant, but pragmatic.
- Part of a path to adoption:
- Users might start by just running Linux apps.
- Then gradually exploit possibilities of the underlying microkernel.
- What's the current situation?
- Microkernels are sometimes used for embedded computing
- Microcontrollers, Apple "enclave" processor
- Running custom software
- Microkernels, as such, never caught on for general computing
- No compelling story for why one should switch from Linux &c
- Many ideas from microkernel research have been adopted into modern UNIXes
- Mach spurred adoption of sophisticated virtual memory support
- Virtual machines are partially a response to the O/S server idea
- Loadable kernel modules are a response to the need for extensibility
- Client/server e.g. DNS server, window server
- MacOS has microkernel-style IPC
- A more recent innovation in kernel design: the unikernel
- whereas microkernel expands what takes place in user space
- a unikernel design eliminates the idea of user space entirely
- everything is the kernel
- goal is to create a purpose-built kernel designed to run one or a small number of applications
- all code is trusted, no overhead from switching between user and kernel mode
- assemble kernel out of exactly the modules the application needs
- attractive for cloud computing services
References:
- Härtig, Hermann, et al. "The performance of μ-kernel-based systems." ACM SIGOPS Operating Systems Review 31.5 (1997): 66-77.
- The Fiasco.OC Microkernel – a current L4 descendent: https://l4re.org/doc/
- fast IPC in L4: https://cs.nyu.edu/~mwalfish/classes/15fa/ref/liedtke93improving.pdf
- later evolution of L4: https://ts.data61.csiro.au/publications/nicta_full_text/8988.pdf
- Unikernels: Rise of the Virtual Library Operating System (MirageOS paper, see https://mirage.io/)
- http://unikernel.org/