January 23, 2022
A pipe is a sequential communication channel between two endpoints, supporting writes on one end, and reads on the other. Reads and writes are asynchronous and buffered, up to some internally defined size. The OS kernel allocates and manages this buffer. A user process calling read
on the read end of a pipe will block1 until there are bytes to read (read
is allowed to do a partial read, meaning it returns having read fewer bytes than the user requested); a write will block if there is no more room in the internal buffer. Pipes are a simple way to support interprocess communication, especially between processes started by fork
.
Pull the latest changes to the baseline code by running
$ git pull upstream master
and merge.
From the user’s perspective, pipes are just file descriptors that applications can read and write using the standard read
and write
system calls (sys_read
and sys_write
from lab 1). Thus, functions to read and write from pipes need to operate on file structs retrieved from a process’s open file table. But if sys_read
calls fs_read_file
, how can it handle the fact that reading from a pipe works differently underneath than reading from a file. To answer this, we can look at the implementation of fs_read_file
in kernel/fs/fs.c
:
ssize_t
struct file *file, void *buf, size_t count, offset_t *ofs)
fs_read_file(
{ssize_t rs = 0;
if (file->oflag != FS_WRONLY) {
rs = file->f_ops->read(file, buf, count, ofs);
}return rs;
}
This function does two things:
FS_WRONLY
)f_ops
field to call its read
operationThe f_ops
field is a pointer to a struct file_operations
. This struct contains four function pointers, and is how we can control the behavior of read
, write
, readdir
, and close
for a particular file struct. For example, stdin
is a file struct that has specialized behavior. In kernel/console.c
, we find
static struct file_operations stdin_ops = {
.read = stdin_read, };
In the console_init
function that initializes stdin
and stdout
, we find
stdin.oflag = FS_RDONLY;
stdout.oflag = FS_WRONLY;
stdin.f_ops = &stdin_ops; stdout.f_ops = &stdout_ops;
This is making it so that when fs_read_file
calls f_ops->read(file, buf, count, ofs)
with stdin
, stdin_read
gets called. We can do the same thing with pipe operations:
static ssize_t pipe_read(struct file *file, void *buf, size_t count, offset_t *ofs);
static ssize_t pipe_write(struct file *file, const void *buf, size_t count, offset_t *ofs);
static void pipe_close(struct file *p);
static struct file_operations pipe_ops = {
.read = pipe_read,
.write = pipe_write,
.close = pipe_close };
Like for stdin
and stdin_ops
, we would initialize a pipe’s f_ops
field to point to pipe_ops
.
Though a pipe appears as just another file to the user, internally, it has more to keep track of than just what’s in osv
’s current file struct. This pipe-specific data likely includes
char
array) to hold bytes written to the pipeAs discussed above, pipe operations will need to apply to file structs. One possibility would be to add all the information for pipes directly to the file struct. However, this is not a very flexible or modular solution. For one thing, it would involve allocating a pipe buffer for every file struct whether or not it was actually a pipe.
A better and more general solution is to add a generic void *info
field to the file struct that can be used to point to extra information that various kinds of files might need. When creating a pipe, then, you can use fs_alloc_file
to allocate a file struct, and then initialize the info
field. In pipe_read
, pipe_write
, and pipe_close
, we can retrieve this information with
struct pipe *p = (struct pipe*) file->info;
In terms of implementation, pipes can use a blocking bounded queue like the one included in the condition variable notes to store data written to the pipe. OSTEP Chapter 30 (section 30.2) walks through the implementation of this kind of buffer, including why two condition variable are needed. It’s up to you whether to integrate a blocking bounded queue as a separate kernel data structure that’s used in the implementation of pipes, or to adapt the code for the pipe implementation. Note that the code in the notes is for a queue of int
s, while a pipe should have a buffer of bytes (char
s).
The pipe
system call does not specify how large the bounded buffer needs to be, but you may use 512 bytes as the buffer size. You should be able to write any number of bytes to a pipe (though write
will eventually block if no other process is reading from the pipe). Concurrent operations can happen in any order, but each operation completes as an atomic unit.
Since pipes support concurrency, your pipe implementation will need to use locks and condition variables.
The pipe
system call creates a pipe (a holding area for written data) and opens two file descriptors, one for reading and one for writing. A write to a pipe with no open read descriptors will return an error. A read from a pipe with no open write descriptors will return any remaining buffered data, and 0 to indicate EOF (end-of-file) if no buffered data remains.
For memory management, you can use kmalloc
and kfree
or the kmem_cache
-based approach used by the bounded queue code.
The reference solution modified include/kernel/fs.h
and kernel/syscall.c
and added new files include/kernel/pipe.h
and kernel/pipe.c
.
/*
* Corresponds to int pipe(int* fds);
*
* Creates a pipe and two open file descriptors. The file descriptors
* are written to the array at fds, with fds[0] the read end of the
* pipe and fds[1] as the write end of the pipe.
*
* Return:
* ERR_OK on success
* ERR_FAULT if fds address is invalid
* ERR_NOMEM if 2 new file descriptors are not available
*/
sysret_tvoid *arg); sys_pipe(
After you implement the system calls described above, test your lab 3 code by either running individual tests in osv
shell (i.e., make qemu
and then enter the name of the test file in user/lab3
without the .c
), or run python3 test.py 3
to run all the tests.
Some of the tests depend on fork
, wait
, and exit
. If you were not able to get these working in lab 2, please see me.
When debugging, I would recommend running individual tests in osv
. test.py
depends on observing TEST-NAME passed
in the output from osv
, and the presence of debugging print statements can interfere with this (as output from various processes can get mixed together). It’s also important, therefore, that you disable any print statements before submission to Gradescope.
You will submit your work for this project via Gradescope.
Gradescope lets you submit via GitHub, which is probably the easiest method. All you’ll need to do is connect your GitHub account (the Gradescope submission page has a button for this) and select the repository and branch you wish to submit. Alternatively, you can create a zip file of the osv-w22
directory and upload that. The the arch
, include
and kernel
directories from your submission will be used.
When you submit, the autograder will compile your code and run the test cases. The Gradescope autograder will run each test multiple times.
Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Many people may submit their projects near the deadline, and thus it will Gradescope take longer to process the requests. You may not get feedback in a timely manner to help you debug problems.
This lab will be graded out of 90 points, as shown in the table below. Comments explaining your approach can help earn partial credit if there are tests that don’t pass. Poor coding style can lose points, so make sure to submit clean, well-organized code.
Test | Points |
---|---|
pipe-bad-args |
10 |
pipe-basic |
10 |
pipe-errors |
10 |
pipe-race |
20 |
pipe-robust |
20 |
pipe-test |
20 |
when a user process blocks in read
(or another system call), this means that system call does not immediately return (think condition variable)↩︎