CS 332 s20 — Unix Process API
1 osv Repo Correction
Please read the announcement about changes to the osv repository.
2 Starting a New Process
To start a new process the kernel must
- allocate and initialize the process control block (PCB)
- allocate memory for the process
- copy the program from disk into the newly allocated memory
- allocate a user-level stack for user-level execution
- allocate a kernel-level stack for handling system calls, interrupts, and processor exceptions
To actually start the process running, the kernel has to take care of two other tasks:
- copy arguments into user memory
  - for example, when you click on a file icon in macOS or Windows, the window manager (GUI component) asks the kernel to start the application associated with that file
  - the kernel then copies the file name into a special region of memory in the new process
  - by convention, arguments are copied to the base of the user-level stack
- transfer control to user mode
  - most operating systems don't have special-case code for transferring to a new user process
  - instead, the initial values for the processor state are pushed on the kernel stack
  - then, the OS "returns" from the interrupt handler to the start of the new process
2.1 Windows
- the Windows approach to process management is to add a system call to create a process
  - unsurprisingly called `CreateProcess`
  - turns out to be simple in theory and complex in practice, see the example call below
    // Start the child process
    if (!CreateProcess(NULL,     // No module name (use command line)
                       argv[1],  // Command line
                       NULL,     // Process handle not inheritable
                       NULL,     // Thread handle not inheritable
                       FALSE,    // Set handle inheritance to FALSE
                       0,        // No creation flags
                       NULL,     // Use parent's environment block
                       NULL,     // Use parent's starting directory
                       &si,      // Pointer to STARTUPINFO structure
                       &pi))     // Pointer to PROCESS_INFORMATION structure
    {
        // Creation failed; GetLastError() reports the reason
    }
2.2 Unix
- Unix takes a different approach, splitting `CreateProcess` into two steps, `fork` and `exec`
  - complex in theory, simple in practice
- `fork` creates a complete copy of the parent process, with one key exception to differentiate the parent and child
  - the child sets up privileges, priorities, and I/O for the new program
- `exec` brings a new executable into memory and starts it running
- with this design, `fork` takes no arguments and returns an integer, and `exec` takes two arguments (the name of the program to run and an array of arguments to pass to it)
  - in place of the ten parameters for `CreateProcess`
  - a testament to the strength of the design: it has remained largely unchanged since Unix was designed in the early 1970s
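The two-step design can be sketched in a few lines of C. This is a minimal illustration, not code from the course: `run_program` is a hypothetical helper name, and `execvp` is used as the two-argument member of the `exec` family (program name plus argument array). The exit code 127 for a failed exec is a common shell convention adopted here as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the NULL-terminated argument
 * array argv in a child process. Returns the child's exit status,
 * or -1 if the child could not be created or did not exit normally. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();          /* no arguments, returns an integer */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: privileges, priorities, and I/O would be set up here. */
        execvp(argv[0], argv);   /* returns only if the exec failed */
        perror("execvp");
        _exit(127);
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build the argument array itself, for example `char *args[] = {"ls", "-l", NULL};` followed by `run_program(args)`.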
2.3 Reading: Process API
Read Chapter 5 (p. 41–51) of the OSTEP book.
It goes over `fork`, `exec`, and the related system call `wait` in more depth.
Make sure you understand the example code—you can download it here if you want to play around with it yourself.
2.4 Making Process Creation Faster
- The semantics of `fork` say the child's address space is a copy of the parent's
- Implementing `fork` that way is slow
  - Have to allocate physical memory for the new address space
  - Have to set up child's page tables to map new address space
  - Have to copy parent's address space contents into child's address space
    - Which you are likely to immediately overwrite with an `exec`
2.4.1 Method 1: vfork
- `vfork` is the older (now uncommon) of the two approaches we'll cover
- Instead of "child's address space is a copy of the parent's", the semantics are "child's address space is the parent's"
  - With a promise that the child won't modify the address space before doing an `execve`
    - Unenforced! You use `vfork` at your own peril
- When `execve` is called, a new address space is created and it's loaded with the new executable
- Parent is blocked until `execve` is executed by child
- Saves wasted effort of duplicating parent's address space, just to overwrite it
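As a rough sketch of how the promise is kept in practice, the child below does nothing except call `execve` or, if that fails, `_exit`. The helper name `spawn_with_vfork` and the exit code 127 are assumptions made for this illustration, and the code assumes a system (such as Linux with glibc) where `vfork` is declared in `<unistd.h>`.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the parent's environment, passed on unchanged */

/* Hypothetical helper: start the executable at `path` with arguments
 * `argv` using vfork + execve. Returns the child's exit status, or -1
 * on error. */
int spawn_with_vfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it is running in the parent's address space, so it
         * touches no variables and calls only execve or _exit. */
        execve(path, argv, environ);
        _exit(127);              /* reached only if execve failed */
    }

    /* Parent: resumes only once the child has called execve or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the child calls `_exit` rather than `exit`: running `exit` handlers or flushing stdio buffers would modify memory that still belongs to the parent.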
2.4.2 Method 2: copy-on-write
- Retains the original semantics, but copies only what is necessary rather than the entire address space
- On `fork`:
  - Create a new address space
  - Initialize virtual memory with same mappings as the parent's (i.e., they both point to the same physical memory)
    - No copying of address space contents has occurred at this point, with the sole exception of the top region of the stack
  - Set both parent and child virtual memory to make all pages read-only
- If either parent or child writes to memory, an exception occurs
- When the exception occurs, the OS copies the page, adjusts the permissions, etc.
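A user program cannot see the page copying itself, only the semantics it preserves. The sketch below (with the hypothetical name `child_write_is_private`) checks that a write made by the child after `fork` is invisible to the parent. On a kernel that implements copy-on-write, that write is the moment the shared page gets copied; the program would behave the same on a kernel that copied everything eagerly.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a write made by the child after fork()
 * stays private to the child, 0 if the parent saw it, -1 on error. */
int child_write_is_private(void)
{
    int *value = malloc(sizeof *value);
    if (value == NULL)
        return -1;
    *value = 1;

    pid_t pid = fork();          /* both processes now map the same page */
    if (pid < 0) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        *value = 2;              /* first write: the child gets its own copy */
        _exit(*value);           /* report what the child sees */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(value);
        return -1;
    }
    int child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    int parent_sees = *value;    /* still the parent's original page */
    free(value);
    return child_saw == 2 && parent_sees == 1;
}
```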
3 Inter-process communication (IPC)
- Processes provide isolation (protection) — great!
- But sometimes you want processes to communicate / cooperate
- How can one process provide input to another?
  - command line arguments (argv values)
    - available only to the parent process, which supplies them when it launches the child
  - communicate through files
    - one writes and the other reads
  - optimize that: pipes
    - use memory buffers, not files
    - this works only if the processes are related (usually siblings)
    - the same code needs to set up this communication channel on each end, meaning the communicating processes typically have to be forked from the same parent process
  - named pipes
    - like pipes, except that unrelated processes can use them
    - need a namespace
      - use file system names
    - `man 3 mkfifo`
  - named shared memory regions
    - `shm_open()` followed by `mmap()`
    - cut out the middle man
  - sockets / Internet protocols
    - robust: prepared to communicate using a heavyweight middle man!
    - optimized when endpoints are on the same machine
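To make the pipe option concrete, here is a minimal sketch in which a child sends a short message to its parent. The helper name `receive_from_child` is made up for this example. The key point is that `pipe` is called before `fork`, so both processes inherit the two ends of the same in-memory buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `msg` into a pipe, and
 * read it back in the parent. Stores a NUL-terminated copy in `buf` and
 * returns the number of bytes received, or -1 on error. */
ssize_t receive_from_child(const char *msg, char *buf, size_t buflen)
{
    assert(msg != NULL && buf != NULL && buflen > 0);

    int fds[2];                  /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();          /* the child inherits both ends */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);           /* child only writes */
        ssize_t written = write(fds[1], msg, strlen(msg));
        _exit(written < 0 ? 1 : 0);
    }

    close(fds[1]);               /* parent only reads; needed to see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Each side closes the end it does not use. If the parent kept the write end open, its `read` would never report end-of-file.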
3.1 IPC: signals
- Processes can register event handlers
  - use `sigaction()` to do this in Linux
- When the event occurs, process jumps to event handler routine
- Used to catch exceptions
  - signal generated by the OS
  - gives the application a chance to do something other than the default response to the exception
- Also used for inter-process (process-to-process) communication (IPC)
  - signal is generated by another process
  - send signal using `kill` (`man 2 kill`)
  - only argument of the communication is a single int, the signal number
Signal | Value | Default Action | Comment |
---|---|---|---|
SIGHUP | 1 | Terminate | Hangup detected on controlling terminal or death of controlling process |
SIGINT | 2 | Terminate | Interrupt from keyboard |
SIGQUIT | 3 | Terminate (core dump) | Quit from keyboard |
SIGILL | 4 | Terminate (core dump) | Illegal Instruction |
SIGABRT | 6 | Terminate (core dump) | Abort signal from abort(3) |
SIGFPE | 8 | Terminate (core dump) | Floating point exception |
SIGKILL | 9 | Terminate | Kill signal |
SIGSEGV | 11 | Terminate (core dump) | Invalid memory reference |
SIGPIPE | 13 | Terminate | Broken pipe: write to pipe with no readers |
SIGALRM | 14 | Terminate | Timer signal from alarm(2) |
SIGTERM | 15 | Terminate | Termination signal |
SIGUSR1 | 30,10,16 | Terminate | User-defined signal 1 |
SIGUSR2 | 31,12,17 | Terminate | User-defined signal 2 |
SIGCHLD | 20,17,18 | Ignore | Child stopped or terminated |
SIGCONT | 19,18,25 | Continue if stopped | |
SIGSTOP | 17,19,23 | Stop | Stop process |
SIGTSTP | 18,20,24 | Stop | Stop typed at tty |
SIGTTIN | 21,21,26 | Stop | tty input for background process |
SIGTTOU | 22,22,27 | Stop | tty output for background process |

Where three values are listed, the signal number depends on the architecture: the first is for Alpha and SPARC, the middle one for x86, ARM, and most other architectures, and the last for MIPS (see `man 7 signal`).
3.1.1 Example use
- You're implementing Apache, a web server
- Apache reads a configuration file when it is launched
  - Controls things like what the root directory of the web files is, what permissions there are on pieces of it, etc.
- Suppose you want to change the configuration while Apache is running
  - If you restart the currently running Apache, you drop some unknown number of user connections
- Solution: send the running Apache process a signal
  - It has registered a signal handler that gracefully re-reads the configuration file
4 Unix Shells
- Shells are just user-level programs
- They're mainly oriented towards launching other programs
  - Using `fork()`/`exec()`
- They typically have few built-in commands
  - `ls`, `cat`, etc. are executables, opaque to the shell
    - (What must be built in?)
- Shells usually offer ways to build shell scripts
  - E.g., some looping construct
- You can view everything you type into a shell as a program that is being simultaneously created and executed
4.1 Basic Operation
    int main(int argc, char **argv) {
        while (1) {
            printf("$ ");
            char *cmd = get_next_command();
            int pid = fork();
            if (pid == 0) {
                exec(cmd);
                panic("exec failed!");
            } else {
                wait(pid);
            }
        }
    }
4.2 Jobs / Redirection
- Shells usually offer ways to make jobs — assemblages of executions
  - `ls | grep '\.c$' | less`
  - `pushd sub && make && popd`
  - `pushd sub; make; popd`
- One way the shell helps you compose jobs is by input-output redirection
  - You can make the output of one program the input of another, without ever writing to a file
4.2.1 Input/output redirection
$ ./myprog < input.txt > output.txt
- each process has an open file table
- by (universal) convention:
  - 0: stdin
  - 1: stdout
  - 2: stderr
- A child process inherits the parent's open file table
- Redirection: the shell…
  - copies its current stdin/stdout open file entries
  - opens `input.txt` as stdin and `output.txt` as stdout
  - forks
  - restores original stdin/stdout
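The four redirection steps map fairly directly onto `dup`, `open`, `dup2`, and `fork`. The sketch below follows the order given in the list. `run_redirected` is a hypothetical name, and the file mode `0644` and exit code 127 are assumptions. Many real shells instead do the redirection in the child after forking, which avoids the restore step.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv with stdin read from `infile` and stdout
 * written to `outfile`. Returns the child's exit status, or -1 on error. */
int run_redirected(char *const argv[], const char *infile, const char *outfile)
{
    assert(argv != NULL && argv[0] != NULL);
    int status = -1;
    fflush(stdout);              /* keep buffered shell output out of outfile */

    /* 1. Copy the shell's current stdin/stdout entries. */
    int saved_in = dup(STDIN_FILENO);
    int saved_out = dup(STDOUT_FILENO);

    /* 2. Open the files and install them as descriptors 0 and 1. */
    int in = open(infile, O_RDONLY);
    int out = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (saved_in >= 0 && saved_out >= 0 && in >= 0 && out >= 0 &&
        dup2(in, STDIN_FILENO) >= 0 && dup2(out, STDOUT_FILENO) >= 0) {
        /* 3. Fork: the child inherits the redirected open file table. */
        pid_t pid = fork();
        if (pid == 0) {
            close(saved_in);     /* the child needs only descriptors 0 and 1 */
            close(saved_out);
            close(in);
            close(out);
            execvp(argv[0], argv);
            _exit(127);
        }
        if (pid > 0) {
            int wstatus;
            if (waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus))
                status = WEXITSTATUS(wstatus);
        }
    }

    /* 4. Restore the shell's original stdin/stdout. */
    if (saved_in >= 0) {
        dup2(saved_in, STDIN_FILENO);
        close(saved_in);
    }
    if (saved_out >= 0) {
        dup2(saved_out, STDOUT_FILENO);
        close(saved_out);
    }
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return status;
}
```

With `char *cmd[] = {"./myprog", NULL};`, the call `run_redirected(cmd, "input.txt", "output.txt")` corresponds to the command line shown at the top of this section.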
5 Homework
- To practice using `fork` and related system calls, complete at least three exercises from the homework for OSTEP Chapter 5 (your choice) and post your code and observations in this forum.
- Reminder: lab 1 is due 9pm tonight (April 20). This means you must commit an `end_lab1` tag to your repo by that time. You can use late days; see the course web page for the late work policy.
- Lab 2 has been posted. It is due 9pm Friday, May 1.
Footnotes:
- `fork` returns twice, once in the parent process and once in the child process. It returns the child's PID to the parent and returns 0 to the child.
- `exec` replaces the instructions of the current process with another program, so it wouldn't make sense to return to where `exec` was called.
- After a `fork`, the OS can decide to run either the child or the parent first. Having the parent call `wait` blocks it until the child terminates, making the behavior deterministic.