CS 332 w22 — The Process Model

1. Background: Regions of Memory
2. Reading: The Process
- 2.1. QUICK CHECKS
3. Notes on Processes

1 Background: Regions of Memory

Though processes typically interact with memory as if it were one long array of bytes, memory is actually divided into several segments, or regions. Local variables, whose sizes are fixed at compile time, are placed on the stack, which resides at high addresses in memory. As more data is added to the stack, the region grows to include lower addresses, so we say the stack grows down.

Memory allocated dynamically using malloc is placed on the heap, which resides somewhere in the middle of memory. As more data is added to the heap, the region grows to include higher addresses, so we say the heap grows up.

The other three regions (static data, literals, and instructions) are fixed in size and get initialized when a program starts running.

This C Tutor example demonstrates how strings are allocated in various regions of memory. Note that C Tutor labels the heap-allocated string and the string literal as both being in the "Heap", but pay attention to the pointer values that get printed out. The stack-allocated string is located at a high address, the heap-allocated string at a middle address, and the static literal at a low address.

2 Reading: The Process

Read The Abstraction: The Process (chapter 4, p. 27–36) from the OSTEP book. Remember that the whole book is available as a single pdf here if you prefer that. Key questions to think about:

What kinds of data/resources are associated with a process?
What would figure 4.1 look like when running multiple instances of the Program (i.e., multiple processes)?

Important terms to understand from the reading:

process
address space
program counter (instruction pointer)
stack

Figure 4.5 mentions xv6, which is a teaching operating system like osv. Take a look at include/kernel/proc.h to see how osv organizes process data. In osv, each process has a single thread, which keeps track of the process state, context, and trapframe (see include/kernel/thread.h). Note that osv separates architecture-dependent and -independent code, so you'll find the definitions for the context and trapframe structs in arch/x86-64/include/arch/trap.h and arch/x86-64/include/arch/cpu.h.

2.1 QUICK CHECKS

Why does the OS need to keep track of a process's register context?¹
What does it mean for a Unix process to be in a zombie state? Why is it useful?²

3 Notes on Processes

3.1 What is a Process?

The process is the OS's abstraction for execution
- A process is a program in execution
Simplest (classic) case: a sequential process
- An address space (an abstraction of memory)
- A single thread of execution (an abstraction of the CPU)
A sequential process is:
- The unit of execution
- The unit of scheduling
- The dynamic (active) execution context
  - vs. the program: static, just a bunch of bytes

Figure 3: A diagram illustrating the high-level idea of a process. Each process has its own "memory" and "CPU", which in reality are virtualized shared resources. A program (like Chrome.exe) can have multiple running instances (processes). Note that the operating system is not a process; it's just a block of code.

3.2 What's in a process?

A process consists of (at least):
- An address space, containing
  - the code (instructions) for the running program
  - the data for the running program (static data, heap data, stack)
- CPU state, consisting of
  - The program counter (PC), indicating the next instruction
  - The stack pointer
  - Other general purpose register values
- A set of OS resources
  - open files, network connections, sound channels, …
In other words, it's all the stuff you need to run the program
- or to re-start it, if it's interrupted at some point

3.2.1 A process's address space (idealized)

SP is the stack pointer, PC is the program counter

3.3 OS Process Representation

(Like most things, the particulars depend on the specific OS, but the principles are general)
A process is named by its process ID (PID)
- An integer
The PID namespace is global to the system
- Only one process at a time has a particular PID
Operations that create processes return a PID
- E.g., fork()
Operations on processes take PIDs as an argument
- E.g., kill(), wait()
Much more on these calls in a later topic

The OS maintains a data structure to keep track of a process's state
- Called the process control block (PCB) or process descriptor
- Identified by the PID
OS keeps all of a process's execution state in (or linked from) the PCB when the process isn't running
- PC, SP, registers, etc.
- when a process is unscheduled, the execution state is transferred out of the hardware registers into the PCB
- (when a process is running, its state is spread between the PCB and the CPU)
Note: It's natural to think that there must be some esoteric techniques being used
- fancy data structures that you'd never think of yourself
- Wrong! It's pretty much just what you'd think of!

3.3.1 The PCB

The PCB is a data structure with many, many fields:
- process ID (PID)
- parent process ID (PPID)
- execution state
- program counter, stack pointer, registers
- address space info
- UNIX user id (uid), group id (gid)
- scheduling priority
- accounting info
- pointers for state queue
- …

Figure 5: This is (a simplification of) what each of those PCBs looks like inside!

PCBs and CPU state
- When a process is running, its CPU state is inside the CPU
  - PC, SP, registers
  - CPU contains current values
- When the OS gets control because of a system call or other source of control transfer, the OS saves the CPU state of the running process in that process's PCB
- When the OS returns the process to the running state, it loads the hardware registers with values from that process's PCB: general purpose registers, stack pointer, instruction pointer
- The act of switching the CPU from one process to another is called a context switch
  - systems may do 100s or 1000s of switches/sec.
  - takes a few microseconds on today's hardware
  - the diagram below shows a very simplified view of a context switch
    - the address space for each process lives in shared physical memory, isolated from each other by the OS
- Choosing which process to run next is called scheduling (more on this in a future topic)

3.3.2 Open File Management

OS needs to keep track of what files processes have opened, and provide a mechanism for processes to read and write them
A common way to implement this is to include an array of open files as part of the PCB
- Where each entry in the array is a pointer to a data structure describing an open file
  - struct file *files[MAX_FILES]
- (We'll ignore the contents of these open file structures until we discuss file systems towards the end of the term)
When a process opens a file, the kernel creates the file structure, adds an entry to the array, and returns the index of the new entry (i.e., returns an integer)
- This integer is the file descriptor
This has several nice properties
- The internal details of file management are hidden (and protected) from user code
- This enables the OS to perform optimizations such as having processes that open the same file refer to the same file structure

3.4 Process Execution States

Each process has an execution state, which indicates what it's currently doing
- ready: waiting to be assigned to a CPU
  - could run, but another process has the CPU
- running: executing on a CPU
  - it's the process that currently controls the CPU
- waiting (aka blocked): waiting for an event, e.g., I/O completion, or a message from (or the completion of) another process
  - cannot make progress until the event happens
As a process executes, it moves from state to state
- UNIX: run ps, STAT column shows current state
- which state is a process in most of the time?

Footnotes:

¹ Because a process can move out of the running state and be replaced with a different running process. When the swapped-out process is run again, its register context will need to be loaded back onto the CPU. Hence, the OS must save it at the time a process is descheduled or blocked.

² A zombie process is one that has terminated, but whose data has not been deallocated. This state is useful because it allows the parent process to examine the terminated process's return code (e.g., to check if it completed successfully).