CS 332 w22 — Software Protection and Virtual Machines

1 Software Protection

1.1 Why Handle Memory Protection in Software

  • Simplify hardware
  • Application-level protection
    • e.g., a web browser protects against untrusted pages
  • Protection inside the kernel
    • e.g., protect against untrusted device drivers
  • Portable security
    • Create a common runtime environment across many devices that isolates apps from the underlying system
  • Want to provide a software sandbox where untrusted code can run without posing a threat to the rest of the system

sandbox.png

1.2 Single-Language System

  • UNIX packet filters
    • Users can insert code into kernel to customize network processing (e.g., copy arriving packet headers to a user-level debugger)
    • Provide protection by restricting packet filter language to permit only safe filters
      • Only branch on packet contents; no loops

packetFilter.png
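
The "only branch on packet contents; no loops" restriction can be sketched as a tiny filter language whose jumps may only go forward, so every filter is guaranteed to terminate. This is an illustrative sketch, not the real BPF instruction set: the instruction names (load, jeq, ret) and the validate/run_filter helpers are assumptions made up for this example, and the sample filter assumes the packet starts at the IPv4 header (where byte 9 is the protocol field and 6 means TCP).

```python
# Toy packet-filter language (hypothetical; only loosely inspired by BPF-style filters).
# Instructions:
#   ("load", offset)       acc = packet[offset]
#   ("jeq", value, skip)   if acc == value, skip the next `skip` instructions
#   ("ret", verdict)       stop and accept (True) or reject (False) the packet


def validate(program):
    """Reject any filter that could loop or run off the end of the program."""
    if not program:
        raise ValueError("empty filter")
    for pc, instr in enumerate(program):
        op = instr[0]
        if op == "load":
            if instr[1] < 0:
                raise ValueError("negative packet offset")
        elif op == "jeq":
            skip = instr[2]
            # Forward-only, in-bounds jumps: no loops, so the filter terminates.
            if skip < 0 or pc + 1 + skip >= len(program):
                raise ValueError("jump must go forward and stay in bounds")
        elif op != "ret":
            raise ValueError(f"unknown instruction {op!r}")
    if program[-1][0] != "ret":
        raise ValueError("filter must end with ret")


def run_filter(program, packet):
    """Run a validated filter; reads past the end of the packet reject it."""
    validate(program)
    acc, pc = 0, 0
    while True:
        instr = program[pc]
        op = instr[0]
        if op == "load":
            if instr[1] >= len(packet):
                return False
            acc = packet[instr[1]]
            pc += 1
        elif op == "jeq":
            pc += 1 + (instr[2] if acc == instr[1] else 0)
        else:  # "ret"
            return instr[1]


# Accept only packets whose protocol byte says TCP.
tcp_only = [("load", 9), ("jeq", 6, 1), ("ret", False), ("ret", True)]
```

Because the program counter only ever increases and the last instruction must be ret, the kernel can bound a filter's running time just by looking at its length.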

  • JavaScript
    • Website code executed on user's local machine
    • Typically run in an interpreter that can prevent invalid procedure calls and memory references
    • Must trust the interpreter and runtime libraries (sources of many vulnerabilities)

javascript.png

1.3 Most Systems Use Both Software and Hardware Protection

  • Interpreted code is often slow, so many runtimes put most functionality into system-specific libraries that are compiled to machine code
    • Increases potential vulnerabilities: any flawed library routine is an attack vector
  • JavaScript is also vulnerable to cross-site scripting (XSS) attacks, where a compromised website runs JavaScript to gather sensitive data (e.g., logins) stored in local cookies
  • Windows runs web browsers as special processes with restricted permissions
  • For the paranoid: run each web page in its own virtual machine

1.4 Language-Independent Software Fault Isolation

  • Instead of trusting compilers, efficiently isolate application code
  • Compile everything into JavaScript?
  • Instead: a sandbox that can take machine instructions and modify them to guarantee safety (Google's Native Client, Microsoft's Application Domains)
    • Check for self-modifying instructions and privileged instructions
    • Insert instructions to check that each procedure call and memory access is valid
    • Use control and data flow to eliminate checks that can be proven unnecessary

      test r1, data.base
      if less-than, exception
      test r1, data.bound
      if greater-than, exception
      store data at r1
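
The check-insertion step above can be sketched as a rewriting pass over a toy instruction list: every store gets a bounds check placed in front of it, and anything outside the allowed instruction set is rejected outright. This is a minimal sketch under stated assumptions; the tuple-based instruction format, DATA_BASE/DATA_BOUND, and the sandbox/run helpers are hypothetical, and real systems such as Native Client work on actual machine code.

```python
# Toy software fault isolation (hypothetical instruction format, not real machine code).
DATA_BASE, DATA_BOUND = 100, 200   # assumed data region for the sandboxed code
ALLOWED = {"set", "store"}         # everything else is treated as unsafe


class SandboxViolation(Exception):
    """Raised when an inserted check catches an out-of-region store."""


def sandbox(program):
    """Rewrite untrusted code so each store is preceded by a bounds check."""
    rewritten = []
    for instr in program:
        if instr[0] not in ALLOWED:
            raise ValueError(f"instruction {instr[0]!r} not permitted in sandbox")
        if instr[0] == "store":
            rewritten.append(("check", instr[1]))   # check the address register
        rewritten.append(instr)
    return rewritten


def run(program, memory):
    """Interpret rewritten code; memory is a dict from address to value."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "set":        # ("set", reg, value)
            regs[instr[1]] = instr[2]
        elif op == "check":    # ("check", addr_reg), mirrors the two tests above
            addr = regs[instr[1]]
            if addr < DATA_BASE or addr > DATA_BOUND:
                raise SandboxViolation(f"store to {addr} outside data region")
        elif op == "store":    # ("store", addr_reg, value_reg)
            memory[regs[instr[1]]] = regs[instr[2]]
```

A real rewriter would also use control and data flow analysis to drop checks it can prove redundant; this sketch checks every store unconditionally.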
      

1.5 The Java Virtual Machine: Sandbox via Intermediate Code

  • Instead of compiling to machine code, compile to an intermediate form like Java Virtual Machine (JVM) byte code
  • Intermediate code can be deliberately structured to make sandboxing and analysis much easier
    • Code can contain annotations about which pointers need to be checked
  • Garbage collection also helps to restrict memory mischief
  • Many languages can now compile to Java byte code, making the JVM a kind of language-independent sandbox
    • Python, Ruby, JavaScript, Scala, Kotlin
    • A lot harder to do for languages "closer to the hardware" like C or Fortran
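
One reason intermediate code is easier to sandbox is that it can be checked before it runs. The sketch below illustrates that idea with a toy stack bytecode. It is not real JVM byte code: the opcodes and the verify/execute functions are assumptions for illustration, and the code is restricted to straight-line programs so a single pass can track the stack depth exactly.

```python
# Toy stack bytecode (hypothetical opcodes, not real JVM byte code).
STACK_EFFECT = {        # opcode -> (values popped, values pushed)
    "push": (0, 1),     # ("push", constant)
    "add": (2, 1),      # ("add",)
    "load": (0, 1),     # ("load", local_index)
    "store": (1, 0),    # ("store", local_index)
}


def verify(code, num_locals, max_stack):
    """Statically reject code that could underflow or overflow the stack,
    or touch a local variable slot outside [0, num_locals)."""
    depth = 0
    for instr in code:
        op = instr[0]
        if op not in STACK_EFFECT:
            raise ValueError(f"unknown opcode {op!r}")
        if op in ("load", "store") and not 0 <= instr[1] < num_locals:
            raise ValueError("local index out of range")
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            raise ValueError("stack underflow")
        depth += pushes - pops
        if depth > max_stack:
            raise ValueError("stack overflow")


def execute(code, num_locals, max_stack=16):
    """Verify first, then run with no per-instruction safety checks."""
    verify(code, num_locals, max_stack)
    stack, local_vars = [], [0] * num_locals
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "load":
            stack.append(local_vars[instr[1]])
        else:  # "store"
            local_vars[instr[1]] = stack.pop()
    return local_vars
```

The point of the design is that all the safety work happens once, up front, so the interpreter loop itself can stay simple and fast.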

2 Virtualize Everything: the Virtual Machine

A software layer called a virtual machine manager (VMM) or hypervisor virtualizes the underlying host system, allowing multiple guest OSes to run simultaneously (while isolated from each other and from the host).

vm-idea.png

  • Process VM: Translates a set of OS and user-level instructions to those of another platform
    • wine, dosbox, JVM
  • System VM: Translates the hardware instructions used by one platform to those of another

vm-puzzle-diagram.png

  • Type 1: VMM runs directly on hardware
    • Original hypervisors were type 1
    • Microsoft Hyper-V, VMware ESXi, Xen
  • Type 2: VMM is an application within a host OS
    • qemu, VirtualBox

2.1 Example Use Cases of Virtual Machines

vm-use-cases.png

2.2 Benefits of VMs

  • Ideal for operating system development (as you've experienced!)
    • A kernel bug can mean the system doesn't boot or suffers horrible corruption; running in a VM isolates the damage
  • A developer can run multiple VMs on one machine to quickly and easily test an app across multiple platforms
  • Widely used in data centers (e.g., web hosting, cloud computing)
    • Each website should be isolated, but doesn't need a dedicated server to itself
    • Each physical server can host dozens of VMs
    • Some VMMs provide live migration, allowing guest VMs to be moved to a different server if load gets too high or if the client requests more resources

2.3 Making a Virtual CPU

2.3.1 Trap-and-Emulate

  • The guest has a virtual user mode and a virtual kernel mode, both of which run in user mode on the host system
  • A privileged instruction executed by the guest in virtual kernel mode causes a trap into the VMM
    • Effect of this instruction emulated for the guest by the VMM
  • Adds unpredictable overhead
    • Better support for virtual machines in hardware has improved this

vm-emulation.png
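
The trap-and-emulate cycle can be sketched as a loop in which a stand-in for the hardware refuses privileged instructions and the VMM applies their effect to per-guest virtual CPU state instead. This is a toy model under stated assumptions: the instruction names, the Trap exception, and the VMM class are hypothetical and do not correspond to any real ISA or hypervisor API.

```python
# Toy trap-and-emulate loop (hypothetical instruction names; not a real ISA).
PRIVILEGED = {"disable_interrupts", "enable_interrupts", "set_page_table"}


class Trap(Exception):
    """Raised when guest code runs a privileged instruction in host user mode."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


def hardware_execute(instr, regs):
    """Stand-in for the real CPU, which is running the guest in user mode."""
    op = instr[0]
    if op in PRIVILEGED:
        raise Trap(instr)              # hardware refuses and traps to the VMM
    if op == "add":                    # ("add", reg, constant)
        regs[instr[1]] = regs.get(instr[1], 0) + instr[2]
    else:
        raise ValueError(f"unknown instruction {op!r}")


class VMM:
    def __init__(self):
        # Virtual CPU state that the guest believes is the real hardware state.
        self.vcpu = {"interrupts_enabled": True, "page_table": None}
        self.regs = {}
        self.traps = 0

    def emulate(self, instr):
        """Apply a privileged instruction's effect to the virtual CPU only."""
        op = instr[0]
        if op == "disable_interrupts":
            self.vcpu["interrupts_enabled"] = False
        elif op == "enable_interrupts":
            self.vcpu["interrupts_enabled"] = True
        elif op == "set_page_table":
            self.vcpu["page_table"] = instr[1]

    def run_guest(self, program):
        for instr in program:
            try:
                hardware_execute(instr, self.regs)   # fast path: runs directly
            except Trap as trap:
                self.traps += 1                      # slow path: VMM steps in
                self.emulate(trap.instr)
```

The traps counter is a reminder of where the overhead comes from: ordinary instructions run at full speed, while each privileged one costs a round trip through the VMM.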

2.3.2 Binary Translation

  • Not always a clean separation between normal and privileged instructions
    • popf can be executed in user or kernel mode with different behavior
  • Trap-and-emulate can't work if no trap is generated
  • Solution: translate special instructions like popf just-in-time (JIT) as the guest executes
    • Related example: Apple's Rosetta 2 uses binary translation so the M1 can run x86 binaries
    • Mitigate overhead by caching translations

vm-translation.png
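
A minimal sketch of translate-then-cache, assuming a toy instruction format: blocks of guest code are rewritten the first time they are fetched, with each special instruction replaced by a call into the VMM, and later fetches reuse the cached translation. The Translator class, the call_vmm pseudo-instruction, and the emulate_popf handler name are all hypothetical.

```python
# Toy binary translator (hypothetical instruction names; not real x86).
# "popf" stands in for an instruction that behaves differently in user mode
# without trapping, so the VMM must rewrite it before the guest executes it.
SENSITIVE = {"popf": "emulate_popf"}   # instruction -> assumed VMM handler name


class Translator:
    def __init__(self):
        self.cache = {}         # block address -> translated block
        self.translations = 0   # number of blocks actually translated

    def translate(self, block):
        """Replace sensitive instructions with calls into the VMM."""
        self.translations += 1
        return tuple(
            ("call_vmm", SENSITIVE[instr[0]]) if instr[0] in SENSITIVE else instr
            for instr in block
        )

    def fetch(self, address, guest_code):
        """Return the translated block at `address`, translating on first use."""
        if address not in self.cache:
            self.cache[address] = self.translate(guest_code[address])
        return self.cache[address]
```

Code in a hot loop is fetched many times but translated only once, which is how caching keeps the translation overhead manageable.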

2.4 Modern CPUs Provide Extensive VM Support

  • Intel and AMD added new host and guest modes (root and non-root on Intel)
    • VMM runs in host mode, the guest OS runs in guest mode
    • Eliminates the need for binary translation
  • Control passed to VMM when guest tries to access virtualized resource
  • Dedicated support for VMM to set up direct connections between guests and I/O devices
  • Interrupt remapping allows guests to receive interrupts directly without involving the VMM
  • Enables very lightweight, or thin, hypervisors

2.5 Paravirtualization: Guest and VMM Meet in the Middle

  • We have been thinking of a VMM as providing the illusion that the guest is running on normal hardware
  • Instead, require the guest to be ported to the virtual environment
    • Replace hardware-specific code with code tailored to the virtual hardware layer provided by the VMM
    • The Xen VMM presents clean, simple device abstractions for I/O; the guest must be modified to use these instead of "real" devices
      • Efficient, with good communication between guest and VMM
  • Example: osv consumes an entire CPU spinning in a loop while sitting idle in qemu, because the underlying Linux host does not know the osv thread is idle
    • With paravirtualization, the hardware-dependent layer can trap into the host kernel immediately and yield the processor

2.6 Virtual Machine Page Tables

  • VMs need to do two translations on each memory reference
    • Guest virtual address to guest physical address
    • Guest physical address to host physical address
  • This prevents bugs in the guest from overwriting other memory in the host

twovmm.png
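
The two translations can be sketched with single-level page tables represented as dictionaries. This is a simplified model: real page tables are multi-level and carry permission bits, and the function names and table contents here are made up for illustration.

```python
# Toy two-stage address translation (table contents are illustrative only).
PAGE_SIZE = 4096


def translate(page_table, address):
    """Map an address through a single-level page table {page: frame}."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page fault at {address:#x}")
    return page_table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(guest_table, host_table, guest_virtual):
    """Apply the guest OS's mapping, then the VMM's mapping."""
    guest_physical = translate(guest_table, guest_virtual)
    return translate(host_table, guest_physical)


guest_table = {0: 5, 1: 2}     # guest virtual page -> guest physical frame
host_table = {5: 40, 2: 17}    # guest physical frame -> host physical frame
```

Because every guest physical frame must pass through the host's table, a buggy guest can only reach host memory that the VMM has explicitly mapped for it.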

  • Transferring from the host to the guest kernel works fine
  • Problem: the guest kernel tries to transfer to a guest user process
    • This is a privileged instruction, so it traps back to the host
    • Which page table should the host use? The guest's table does not contain real physical addresses, and the host's table for the guest would give the user process control of the guest kernel
  • Solution: shadow page tables that compose guest and host tables

shadowvmm.png
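
Composing the guest and host tables can be sketched as a dictionary join: for each guest virtual page, look up its guest physical frame in the host table and record the resulting host frame directly. As before, this assumes single-level tables stored as dictionaries, and build_shadow_table is a hypothetical helper name.

```python
# Toy shadow page table construction (single-level tables as dictionaries).
PAGE_SIZE = 4096


def build_shadow_table(guest_table, host_table):
    """Compose guest (virtual page -> guest frame) with host
    (guest frame -> host frame) into virtual page -> host frame."""
    return {
        vpage: host_table[gframe]
        for vpage, gframe in guest_table.items()
        if gframe in host_table   # unmapped guest frames are left out, so they fault
    }


def translate(page_table, address):
    """Map an address through a single-level page table {page: frame}."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page fault at {address:#x}")
    return page_table[page] * PAGE_SIZE + offset
```

The hardware can then use the shadow table directly while the guest user process runs, doing one lookup per reference instead of two. The cost is that the VMM has to keep the shadow table in sync whenever the guest kernel changes its own page table.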

3 Reading: Xen and the Art of Virtualization

If you want to learn more about virtualization, read this foundational paper: Barham, Paul, et al. "Xen and the Art of Virtualization." ACM SIGOPS Operating Systems Review 37.5 (2003): 164-177.