CS 332 w22 — Software Protection and Virtual Machines
1 Software Protection
1.1 Why Handle Memory Protection in Software
- Simplify hardware
- Application-level protection
- e.g., a web browser protects against untrusted pages
- Protection inside the kernel
- e.g., protect against untrusted device drivers
- Portable security
- Create a common runtime environment across many devices that isolates apps from underlying system
- Want to provide a software sandbox where untrusted code can run without posing a threat to the rest of the system
1.2 Single-Language System
- UNIX packet filters
- Users can insert code into kernel to customize network processing (e.g., copy arriving packet headers to a user-level debugger)
- Provide protection by restricting packet filter language to permit only safe filters
- Only branch on packet contents; no loops
- JavaScript
- Website code executed on user's local machine
- Typically run in an interpreter that can prevent invalid procedure calls and memory references
- Must trust the interpreter and runtime libraries (sources of many vulnerabilities)
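The "no loops" restriction on packet filters can be checked mechanically before a filter is loaded into the kernel. The following is a minimal sketch, assuming a hypothetical three-instruction filter language (this is not the real BPF instruction set, and the names verify and run are made up for illustration). Every jump must go strictly forward, so any filter that passes the check is guaranteed to terminate.

```python
# Hypothetical mini filter language (an illustration, not real BPF):
#   ("jeq", offset, value, target)  jump to target if packet[offset] == value
#   ("accept",)                     hand the packet to the user-level program
#   ("reject",)                     drop the packet

def verify(filter_prog):
    """Return True only if the filter is well formed and has no backward jumps."""
    # The last instruction must be a terminator so execution cannot run off the end.
    if not filter_prog or filter_prog[-1][0] not in ("accept", "reject"):
        return False
    for pc, instr in enumerate(filter_prog):
        op = instr[0]
        if op == "jeq":
            if len(instr) != 4 or instr[1] < 0:
                return False
            target = instr[3]
            # Strictly forward, in-range jumps only: this is what rules out loops.
            if not (pc < target < len(filter_prog)):
                return False
        elif op not in ("accept", "reject"):
            return False
    return True

def run(filter_prog, packet):
    """Run a verified filter; returns True to accept the packet, False to drop it."""
    if not verify(filter_prog):
        raise ValueError("unsafe filter rejected")
    pc = 0
    while True:  # terminates because pc strictly increases on every step
        instr = filter_prog[pc]
        if instr[0] == "accept":
            return True
        if instr[0] == "reject":
            return False
        _, offset, value, target = instr
        # An offset past the end of the packet simply fails the comparison.
        matches = offset < len(packet) and packet[offset] == value
        pc = target if matches else pc + 1
```

For example, the filter [("jeq", 9, 6, 2), ("reject",), ("accept",)] accepts packets whose byte 9 equals 6, while [("jeq", 0, 0, 0), ("accept",)] is rejected by verify because its jump targets itself.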
1.3 Most Systems Use Both Software and Hardware Protection
- Interpreted code is often slow, so many runtimes put most functionality into system-specific libraries that compile to machine code
- Increases potential vulnerabilities—any flawed library routine is an attack vector
- JavaScript is also vulnerable to cross-site scripting attacks, where a compromised website runs JavaScript to gather sensitive data (e.g., logins) stored in local cookies
- Windows runs web browsers as special processes with restricted permissions
- For the paranoid: run each web page in its own virtual machine
1.4 Language-Independent Software Fault Isolation
- Instead of trusting compilers, efficiently isolate application code
- Compile everything into JavaScript?
- We might be headed this way eventually (see The Birth & Death of JavaScript)
- Instead: a sandbox that can take machine instructions and modify them to guarantee safety (Google's Native Client, Microsoft's Application Domains)
- Check for self-modifying instructions and privileged instructions
- Insert instructions to check that each procedure call and memory access is valid
- Use control and data flow analysis to eliminate checks that can be proven unnecessary
- e.g., checks inserted before a store through r1:
    test r1, data.base
    if less-than, exception
    test r1, data.bound
    if greater-than, exception
    store data at r1
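The rewriting step described in this section can be sketched as follows. This is a minimal illustration, assuming a made-up two-instruction machine ("set" and "store") and hypothetical helper names sandbox and execute. It is not how Native Client actually works. It inserts a bounds check before each store and uses a very simple data-flow fact (the address register has not changed since it was last checked) to drop redundant checks.

```python
def sandbox(instrs):
    """Insert a bounds check before each store, skipping provably redundant ones."""
    out = []
    checked = set()  # registers known to hold an address that was already checked
    for instr in instrs:
        op = instr[0]
        if op == "store":            # ("store", addr_reg, value)
            reg = instr[1]
            if reg not in checked:
                out.append(("check", reg))
                checked.add(reg)
        elif op == "set":            # ("set", reg, constant) overwrites reg
            checked.discard(instr[1])
        else:
            # Anything else (including a "check" forged by the untrusted code)
            # is refused rather than passed through.
            raise ValueError(f"instruction not allowed in sandbox: {op}")
        out.append(instr)
    return out

def execute(instrs, base, bound):
    """Run rewritten code; a failed check raises instead of touching memory."""
    regs, memory = {}, {}
    for instr in instrs:
        op = instr[0]
        if op == "set":
            regs[instr[1]] = instr[2]
        elif op == "check":
            addr = regs.get(instr[1], 0)
            if addr < base or addr > bound:
                raise MemoryError(f"address {addr} outside [{base}, {bound}]")
        elif op == "store":
            memory[regs.get(instr[1], 0)] = instr[2]
    return memory
```

Two consecutive stores through the same unchanged register need only one check, whereas a store after the register is overwritten gets a fresh one.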
1.5 The Java Virtual Machine: Sandbox via Intermediate Code
- Instead of compiling to machine code, compile to an intermediate form like Java Virtual Machine (JVM) byte code
- Intermediate code can be deliberately structured to make sandboxing and analysis much easier
- Code can contain annotations about which pointers need to be checked
- Garbage collection also helps to restrict memory mischief
- Many languages can now compile to Java byte code, making the JVM a kind of language-independent sandbox
- Python, Ruby, JavaScript, Scala, Kotlin
- A lot harder to do for languages "closer to the hardware" like C or Fortran
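One reason intermediate code is easier to analyze is that simple safety properties can be checked once, before the code ever runs. The sketch below assumes a hypothetical three-opcode stack bytecode (not real JVM byte code) and a made-up function name verify_stack. It rejects any straight-line program that could underflow or overflow the operand stack, which is one of the kinds of properties a byte code verifier establishes.

```python
# Hypothetical stack bytecode: each opcode pops and pushes a fixed number of values.
# opcode -> (values popped, values pushed)
STACK_EFFECT = {"push": (0, 1), "add": (2, 1), "pop": (1, 0)}

def verify_stack(code, max_depth=8):
    """Statically check that straight-line code never underflows or overflows the stack."""
    depth = 0
    for instr in code:
        effect = STACK_EFFECT.get(instr[0])
        if effect is None:
            return False          # unknown opcode
        pops, pushes = effect
        if depth < pops:
            return False          # would pop from an empty (or too shallow) stack
        depth = depth - pops + pushes
        if depth > max_depth:
            return False          # would exceed the declared stack limit
    return True
```

Because the check is done ahead of time, the interpreter or compiled code does not need to test the stack depth on every instruction at run time.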
2 Virtualize Everything: the Virtual Machine
A software layer called a virtual machine manager (VMM) or hypervisor virtualizes the underlying host system, allowing multiple guest OSes to run simultaneously while isolated from each other and from the host
- Process VM: Translates a set of OS and user-level instructions to those of another platform
- wine, dosbox, JVM
- System VM: Translates the hardware instructions used by one platform to those of another
- Type 1: VMM runs directly on hardware
- Original hypervisors were type 1
- Microsoft Hyper-V, VMware ESXi, Xen
- Type 2: VMM is an application within a host OS
- qemu, VirtualBox
2.1 Example Use Cases of Virtual Machines
2.2 Benefits of VMs
- Ideal for operating system development (as you've experienced!)
- Kernel bug = system doesn’t boot or horrible corruption, want to isolate that
- Developer runs multiple VMs on their machine, can quickly and easily test app across multiple platforms
- Widely used in data centers (e.g., web hosting, cloud computing)
- Each website should be isolated, but doesn't need a dedicated server to itself
- Each physical server can host dozens of VMs
- Some VMMs provide live migration, allowing guest VMs to be moved to a different server if load gets too high or if the client requests more resources
2.3 Making a Virtual CPU
2.3.1 Trap-and-Emulate
- A virtual user mode and virtual kernel mode both running in user mode on the host system
- A privileged instruction in the guest causes a trap into the VMM (virtual kernel mode)
- Effect of this instruction emulated for the guest by the VMM
- Adds unpredictable overhead
- Better support for virtual machines in hardware has improved this
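The trap-and-emulate loop above can be sketched in a few lines. This is a toy model under stated assumptions: the instruction names, the PRIVILEGED set, and the VMM class are all hypothetical, and a Python exception stands in for the hardware trap. Ordinary instructions run directly, while privileged ones trap to the VMM, which applies their effect to the guest's virtual CPU state rather than the real one.

```python
class PrivilegedInstructionTrap(Exception):
    """Stands in for the hardware trap raised by a privileged instruction in user mode."""
    def __init__(self, instr):
        super().__init__(instr[0])
        self.instr = instr

PRIVILEGED = {"cli", "sti", "load_page_table"}  # hypothetical instruction names

def run_on_cpu(instr, regs):
    """Model of the real CPU in user mode: privileged instructions trap."""
    op = instr[0]
    if op in PRIVILEGED:
        raise PrivilegedInstructionTrap(instr)
    if op == "add":                      # ("add", reg, constant)
        regs[instr[1]] = regs.get(instr[1], 0) + instr[2]
    else:
        raise ValueError(f"unknown instruction: {op}")

class VMM:
    def __init__(self):
        self.regs = {}
        # Virtual copies of privileged state; the real hardware state is untouched.
        self.virtual_interrupts_enabled = True
        self.virtual_page_table = None
        self.traps = 0

    def emulate(self, instr):
        """Apply the effect of a privileged instruction to the virtual CPU."""
        op = instr[0]
        if op == "cli":
            self.virtual_interrupts_enabled = False
        elif op == "sti":
            self.virtual_interrupts_enabled = True
        elif op == "load_page_table":
            self.virtual_page_table = instr[1]

    def run_guest(self, program):
        for instr in program:
            try:
                run_on_cpu(instr, self.regs)   # fast path: runs directly
            except PrivilegedInstructionTrap as trap:
                self.traps += 1                # slow path: trap into the VMM
                self.emulate(trap.instr)
```

The traps counter makes the overhead visible: each privileged instruction costs a round trip through the VMM, while ordinary instructions do not.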
2.3.2 Binary Translation
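- Not always a clean separation between normal and privileged instructions
- e.g., on x86, popf can be executed in user or kernel mode with different behavior: in user mode it silently ignores changes to the interrupt flag instead of trapping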
- Trap-and-emulate can't work if no trap is generated
- Solution: translate special instructions like popf just-in-time (JIT) as the guest executes
- Binary translation is also how the Apple M1 runs x86 binaries (Rosetta 2)
- Mitigate overhead by caching translations
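A minimal sketch of translate-on-demand with a translation cache follows. It assumes guest code is given as a dictionary from block start address to a list of instruction tuples, and that sensitive instructions are replaced by a hypothetical ("call_vmm", name) instruction. The Translator class and its field names are made up for illustration.

```python
SENSITIVE = {"popf"}  # instructions that misbehave silently instead of trapping

class Translator:
    def __init__(self):
        self.cache = {}         # block start address -> translated block
        self.translations = 0   # how many times translation work was actually done

    def translate(self, block):
        """Rewrite one basic block, replacing sensitive instructions with VMM calls."""
        self.translations += 1
        out = []
        for instr in block:
            if instr[0] in SENSITIVE:
                out.append(("call_vmm", instr[0]))  # emulate explicitly in the VMM
            else:
                out.append(instr)                   # safe instructions pass through
        return out

    def fetch(self, guest_code, addr):
        """Return the translated block at addr, translating only on first use."""
        if addr not in self.cache:
            self.cache[addr] = self.translate(guest_code[addr])
        return self.cache[addr]
```

A block executed many times (a loop body, say) pays the translation cost once. Every later fetch returns the cached result.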
2.4 Modern CPUs Provide Extensive VM Support
- Intel and AMD added new host and guest modes (root and non-root on Intel)
- VMM runs in host mode, the guest OS runs in guest mode
- Eliminates the need for binary translation
- Control passed to VMM when guest tries to access virtualized resource
- Dedicated support for VMM to set up direct connections between guests and I/O devices
- Interrupt remapping allows guests to receive interrupts directly without involving the VMM
- Enables very lightweight, or thin, hypervisors
2.5 Paravirtualization: Guest and VMM Meet in the Middle
- We have been thinking of a VMM as providing the illusion that the guest is running on normal hardware
- Instead, require the guest to be ported to the virtual environment
- Replace hardware-specific code with code tailored to the virtual hardware layer provided by the VMM
- The Xen VMM presents clean, simple device abstractions for I/O; the guest must be modified to use these instead of "real" devices
- Efficient, with good communication between guest and VMM
- Without it: osv consumes an entire CPU spinning in a loop while sitting idle in qemu, because the underlying Linux host does not know the osv thread is idle
- With paravirtualization, hardware-dependent layer can trap into the host kernel immediately and yield the processor
2.6 Virtual Machine Page Tables
- VMs need to do two translations on each memory reference
- Guest virtual address (e.g., in a guest user process) to guest physical address
- Guest physical address to host physical address
- This prevents bugs in the guest from overwriting other memory in the host
- Transfer from host to guest kernel: everything works fine
- Guest kernel tries to transfer to a guest user process
- This is a privileged instruction, so it traps back to the host
- Which page table should be used? The guest table does not contain real (host) physical addresses, and the host table would give the guest user process control of the guest kernel
- Solution: shadow page tables that compose guest and host tables
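Composing the two tables can be sketched with dictionaries keyed by page number. This is a simplified model, assuming single-level tables, 4096-byte pages, and hypothetical function names build_shadow and translate. Real shadow page tables must also be kept in sync whenever the guest edits its own table.

```python
def build_shadow(guest_pt, host_pt):
    """Compose guest-virtual -> guest-physical with guest-physical -> host-physical."""
    shadow = {}
    for guest_virtual, guest_physical in guest_pt.items():
        # Only pages the host has actually given to this guest are mapped;
        # anything else stays unmapped and will fault.
        if guest_physical in host_pt:
            shadow[guest_virtual] = host_pt[guest_physical]
    return shadow

def translate(shadow, address, page_size=4096):
    """Translate a guest virtual address to a host physical address in one lookup."""
    page, offset = divmod(address, page_size)
    if page not in shadow:
        raise LookupError(f"page fault at guest virtual page {page}")
    return shadow[page] * page_size + offset
```

With guest_pt = {0: 5, 1: 6, 2: 99} and host_pt = {5: 40, 6: 41}, the shadow table is {0: 40, 1: 41}. Guest virtual page 2 points at a guest physical page the host never allocated, so it is left unmapped and the guest cannot reach memory outside its allotment.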
3 Reading: Xen and the Art of Virtualization
If you want to learn more about virtualization, read this foundational paper: Barham, Paul, et al. "Xen and the Art of Virtualization." ACM SIGOPS Operating Systems Review 37.5 (2003): 164-177.