CS 208 f21 — System I/O
- three sets of functions for input/output
  - low-level operating system functions (Unix I/O)
  - Standard C library functions (Standard I/O)
  - Robust I/O functions created by the authors of our textbook
    - take care of good coding practices: error checking, signal handling, short counts
- Both Standard and RIO functions are built on top of Unix I/O
1 Unix I/O
- A Linux file is a sequence of m bytes:
  - B_0, B_1, …, B_k, …, B_{m-1}
- Cool fact: All I/O devices are represented as files:
  - /dev/sda2 (/usr disk partition)
  - /dev/tty2 (terminal)
- Even the kernel is represented as a file:
  - /boot/vmlinuz-5.4.0-89-generic (kernel image)
  - /proc (kernel data structures)
- Elegant mapping of files to devices allows kernel to export simple interface called Unix I/O:
  - these are system calls
  - Opening and closing files: open() and close()
  - Reading and writing a file: read() and write()
  - Changing the current file position (seek): lseek()
    - indicates next offset into file to read or write
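The calls listed above can be combined in a few lines. The following is a minimal sketch, not code from the textbook: `byte_at` is a hypothetical helper name, and it assumes a regular file that supports seeking. It opens a file, moves the file position with `lseek`, reads a single byte from that position, and closes the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the byte at `offset` in the file at `path`,
 * or -1 on error or if `offset` is at or past the end of the file. */
int byte_at(const char *path, off_t offset)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);      /* open: obtain a descriptor */
    if (fd < 0)
        return -1;

    unsigned char b;
    int result = -1;

    /* lseek: set the current file position to `offset` */
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        /* read: copy one byte from the current position into b */
        if (read(fd, &b, 1) == 1)
            result = b;
    }

    if (close(fd) < 0)                  /* close: release the descriptor */
        return -1;
    return result;
}
```

For a file containing "abcde", `byte_at(path, 2)` would return `'c'`, and an offset of 5 would return -1 because the read hits end of file.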
1.1 File Types
- Each file has a type indicating its role in the system
- Regular file: Contains arbitrary data
- Directory: Index for a related group of files
- Socket: For communicating with a process on another machine
- Other file types beyond our scope
- Named pipes (FIFOs)
- Symbolic links
- Character and block devices
1.1.1 Regular Files
- A regular file contains arbitrary data
- Applications often distinguish between text files and binary files
- Text files are regular files with only ASCII or Unicode characters
- Binary files are everything else
- e.g., object files, JPEG images
- Kernel doesn’t know the difference!
- Text file is sequence of text lines
  - Text line is sequence of chars terminated by newline char ('\n')
  - Windows and Internet protocols: "\r\n"
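Because the kernel does not distinguish the two line-ending conventions, programs that read text lines often normalize them themselves. A minimal sketch, assuming a NUL-terminated line already held in memory (`strip_eol` is a hypothetical helper name, not a library function):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove one trailing "\n" or "\r\n" from `line`
 * in place and return the new length. */
size_t strip_eol(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';                 /* Unix-style terminator */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';             /* "\r\n" terminator */
    }
    return len;
}
```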
1.1.2 Directories
- Directory consists of an array of links
- Each link maps a filename to a file
- Each directory contains at least two entries
  - . (dot) is a link to itself
  - .. (dot dot) is a link to the parent directory in the directory hierarchy
- Commands for manipulating directories
  - mkdir: create empty directory
  - ls: view directory contents
  - rmdir: delete empty directory
- Locations of files in the hierarchy denoted by pathnames
  - Absolute pathname starts with '/' and denotes path from root (top of the hierarchy)
    - /home/awb/hello.c
  - Relative pathname denotes path from current working directory (cwd)
    - if cwd is /home/mtie, then relative path would be ../awb/hello.c
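A program can walk the links in a directory with the POSIX `opendir`/`readdir`/`closedir` functions. The sketch below is illustrative only (`dir_has_entry` is a hypothetical helper); on typical Linux filesystems it can be used to confirm that `.` and `..` appear as entries.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: return 1 if directory `path` has an entry called
 * `name`, 0 if it does not, and -1 if the directory cannot be opened. */
int dir_has_entry(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {    /* one link per iteration */
        if (strcmp(entry->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}
```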
1.2 Working with Files
Opening a file informs the kernel that you are getting ready to access that file
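    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        char buf[1024];
        int rval;

        /* open returns a file descriptor, or -1 on error */
        int fd = open("file2.txt", O_RDONLY);
        if (fd < 0) {
            write(2, "open error\n", 11);
            _exit(1);
        }
        /* read up to 1024 bytes from the file into buf */
        if ((rval = read(fd, buf, 1024)) < 0) {
            write(2, "read error\n", 11);
            _exit(1);
        }
        /* write the bytes that were read to stdout */
        if ((rval = write(1, buf, rval)) < 0) {
            write(2, "write error\n", 12);
            _exit(1);
        }
        if ((rval = close(fd)) < 0) {
            write(2, "close error\n", 12);
            _exit(1);
        }
        return 0;
    }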
- open returns a small identifying integer, the file descriptor
  - Lowest numbered file descriptor not currently open for the process
  - fd == -1 indicates that an error occurred
- Each process created by a Linux shell begins life with three open files associated with a terminal:
- 0: standard input (stdin)
- 1: standard output (stdout)
- 2: standard error (stderr)
Closing a file informs the kernel that you are finished accessing that file. Always check return values from system calls!
Reading a file copies bytes from the current file position to memory, and then updates file position.
- Returns number of bytes read from file fd into buf
  - Return type ssize_t is signed integer
  - rval < 0 indicates that an error occurred
  - Short counts (rval < sizeof(buf)) are possible and are not errors!
Writing a file copies bytes from memory to the current file position, and then updates current file position.
- Returns number of bytes written from buf to file fd
  - rval < 0 indicates that an error occurred
  - As with reads, short counts are possible and are not errors!
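Since a short count is not an error, a caller that needs every byte written has to loop. The following is a minimal sketch of that loop, assuming a blocking descriptor; `write_all` is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all n bytes are written.
 * Returns n on success, -1 if write() fails or makes no progress. */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    const char *p = buf;
    size_t nleft = n;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, p, nleft);
        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real error, or no progress */
        }
        p += nwritten;              /* short count: advance and go again */
        nleft -= (size_t)nwritten;
    }
    return (ssize_t)n;
}
```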
1.3 Representing Open Files
Two descriptors referencing two distinct open files. Descriptor 1 (stdout) points to terminal, and descriptor 4 points to open disk file:
Two distinct descriptors sharing the same disk file through two distinct open file table entries:
- E.g., Calling open twice with the same filename argument
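The second case can be observed directly: each call to open creates its own open file table entry with its own file position. In this sketch (hypothetical helper name, illustrative only), both reads return the first byte of the file because the two descriptors do not share a position.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: open `path` twice, read one byte through each
 * descriptor, and store the bytes in *first and *second.
 * Returns 0 on success, -1 on any error. */
int read_via_two_opens(const char *path, char *first, char *second)
{
    assert(path != NULL && first != NULL && second != NULL);

    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);     /* separate open file table entry */
    int ok = -1;

    if (fd1 >= 0 && fd2 >= 0 &&
        read(fd1, first, 1) == 1 &&     /* advances fd1's position only */
        read(fd2, second, 1) == 1)      /* fd2 still starts at offset 0 */
        ok = 0;

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return ok;
}
```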
1.4 I/O Redirection
Question: How does a shell implement I/O redirection?
ls > foo.txt
Answer: By calling the dup2(oldfd, newfd) function
- Copies (per-process) descriptor table entry oldfd to entry newfd
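A shell would typically call dup2 in the child process between fork and exec. The sketch below leaves out the fork and exec steps and shows only the descriptor manipulation within a single process; `write_stdout_to_file` is a hypothetical helper, and saving and restoring descriptor 1 with dup is an addition for the demo, not something a shell needs to do in the child.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` (len bytes) to descriptor 1 while
 * descriptor 1 temporarily refers to the file at `path`.
 * Returns 0 on success, -1 on error. */
int write_stdout_to_file(const char *path, const char *msg, size_t len)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int saved = dup(STDOUT_FILENO);         /* remember the real stdout */
    if (saved < 0) {
        close(fd);
        return -1;
    }

    int rc = -1;
    if (dup2(fd, STDOUT_FILENO) >= 0) {     /* entry 1 now refers to the file */
        if (write(STDOUT_FILENO, msg, len) == (ssize_t)len)
            rc = 0;
        if (dup2(saved, STDOUT_FILENO) < 0) /* restore the original stdout */
            rc = -1;
    }
    close(saved);
    close(fd);
    return rc;
}
```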
What would be printed for a file containing "abcde"? [1]

    #include "csapp.h"

    int main(int argc, char *argv[]) {
        char c1, c2, c3;
        int fd1 = Open(argv[1], O_RDONLY, 0);
        int fd2 = Open(argv[1], O_RDONLY, 0);
        int fd3 = Open(argv[1], O_RDONLY, 0);
        Dup2(fd2, fd3);
        Read(fd1, &c1, 1);
        Read(fd2, &c2, 1);
        Read(fd3, &c3, 1);
        printf("c1 = %c, c2 = %c, c3 = %c\n", c1, c2, c3);
        return 0;
    }
2 Standard I/O Functions
The C standard library (libc.so) contains a collection of higher-level standard I/O functions. Examples of standard I/O functions:
- Opening and closing files (fopen and fclose)
- Reading and writing bytes (fread and fwrite)
- Reading and writing text lines (fgets and fputs)
- Formatted reading and writing (fscanf and fprintf)
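As a small illustration of these functions working together, the sketch below copies a text file line by line with fgets and fputs. The helper name `copy_text_file` and the 256-byte line buffer are assumptions made for the example; lines longer than the buffer are simply copied in several pieces.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy the text file `src` to `dst` using standard I/O.
 * Returns 0 on success, -1 on error. */
int copy_text_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    FILE *in = fopen(src, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char line[256];
    int rc = 0;
    while (fgets(line, sizeof line, in) != NULL) {  /* read one chunk or line */
        if (fputs(line, out) == EOF) {              /* write it back out */
            rc = -1;
            break;
        }
    }
    if (ferror(in))             /* distinguish a read error from EOF */
        rc = -1;
    fclose(in);
    if (fclose(out) == EOF)     /* fclose flushes the output buffer */
        rc = -1;
    return rc;
}
```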
2.1 Buffered I/O
Standard I/O models open files as streams.
- Abstraction for a file descriptor and a buffer in memory
- C programs begin life with three open streams (defined in stdio.h)
  - stdin (standard input)
  - stdout (standard output)
  - stderr (standard error)
- Applications often read/write one character at a time
  - getc, putc, ungetc
  - gets, fgets
    - Read line of text one character at a time, stopping at newline
- Implementing each of these as a separate Unix I/O call is expensive
  - read and write require Unix kernel calls
  - >10,000 clock cycles
- Solution: Buffered read
  - Use Unix read to grab block of bytes
  - User input functions take one byte at a time from buffer
  - Refill buffer when empty
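The buffered-read idea above can be sketched in a few lines. This is not the standard library's implementation; the type `bufreader_t`, its field names, and the 4096-byte buffer size are all hypothetical. The `refills` counter is included only to make the number of read system calls visible.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define BUF_SIZE 4096       /* hypothetical buffer size */

/* Hypothetical buffered reader; names are illustrative only. */
typedef struct {
    int fd;                 /* descriptor to read from */
    ssize_t cnt;            /* unread bytes left in buf */
    char *next;             /* next unread byte in buf */
    long refills;           /* number of read() calls made so far */
    char buf[BUF_SIZE];     /* the buffer itself */
} bufreader_t;

void bufreader_init(bufreader_t *br, int fd)
{
    assert(br != NULL);
    br->fd = fd;
    br->cnt = 0;
    br->next = br->buf;
    br->refills = 0;
}

/* Return the next byte (0..255), -1 on EOF, or -2 on error. */
int bufreader_getc(bufreader_t *br)
{
    while (br->cnt <= 0) {                  /* buffer empty: refill it */
        br->cnt = read(br->fd, br->buf, sizeof br->buf);
        br->refills++;
        if (br->cnt < 0) {
            if (errno != EINTR)
                return -2;                  /* real error */
            /* interrupted by a signal: loop and retry */
        } else if (br->cnt == 0) {
            return -1;                      /* EOF */
        } else {
            br->next = br->buf;             /* fresh block: start at front */
        }
    }
    br->cnt--;
    return (unsigned char)*br->next++;      /* hand out one byte, no syscall */
}
```

Reading a three-byte input one character at a time this way costs one read call to fetch the block and one more to detect EOF, rather than one system call per character.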
3 Robust I/O (RIO)
RIO is a set of wrappers that provide efficient and robust I/O in apps, such as network programs that are subject to short counts.
- RIO provides two different kinds of functions
  - Unbuffered input and output of binary data: rio_readn and rio_writen

        #include "csapp.h"

        ssize_t rio_readn(int fd, void *usrbuf, size_t n);
        ssize_t rio_writen(int fd, void *usrbuf, size_t n);
        // Return: num. bytes transferred if OK, 0 on EOF (rio_readn only), -1 on error

  - Buffered input of text lines and binary data: rio_readlineb and rio_readnb

        #include "csapp.h"

        void rio_readinitb(rio_t *rp, int fd);
        ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen);
        ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n);
        // Return: num. bytes read if OK, 0 on EOF, -1 on error
- Download from http://csapp.cs.cmu.edu/3e/code.html: src/csapp.c and include/csapp.h
- Same interface as Unix read and write
- Especially useful for transferring data on network sockets
    /*
     * rio_readn - Robustly read n bytes (unbuffered)
     */
    ssize_t rio_readn(int fd, void *usrbuf, size_t n)
    {
        size_t nleft = n;
        ssize_t nread;
        char *bufp = usrbuf;

        while (nleft > 0) {
            if ((nread = read(fd, bufp, nleft)) < 0) {
                if (errno == EINTR)  /* Interrupted by sig handler return */
                    nread = 0;       /* and call read() again */
                else
                    return -1;       /* errno set by read() */
            }
            else if (nread == 0)
                break;               /* EOF */
            nleft -= nread;
            bufp += nread;
        }
        return (n - nleft);          /* Return >= 0 */
    }
Buffered RIO input functions efficiently read text lines and binary data from a file partially cached in an internal memory buffer.
rio_readlineb reads a text line of up to maxlen bytes from file fd and stores the line in usrbuf
- Especially useful for reading text lines from network sockets
- Stopping conditions
- maxlen bytes read
- EOF encountered
- Newline ('\n') encountered
    typedef struct {
        int rio_fd;                /* descriptor for this internal buf */
        int rio_cnt;               /* unread bytes in internal buf */
        char *rio_bufptr;          /* next unread byte in internal buf */
        char rio_buf[RIO_BUFSIZE]; /* internal buffer */
    } rio_t;
The rio_t struct:
The Unix file underneath:
3.1 Quick Check
Why is buffering I/O often a more efficient solution than non-buffered I/O? [2]
3.2 Example
Copying a file to stdout, line-by-line with RIO:
    #include "csapp.h"
    #define MLINE 1024

    int main(int argc, char *argv[]) {
        rio_t rio;
        char buf[MLINE];
        int infd = STDIN_FILENO;
        ssize_t nread = 0;

        /* read from the named file if given, otherwise from stdin */
        if (argc == 2) {
            infd = Open(argv[1], O_RDONLY, 0);
        }
        Rio_readinitb(&rio, infd);
        /* copy one line at a time until EOF (return value 0) */
        while ((nread = Rio_readlineb(&rio, buf, MLINE)) != 0)
            Rio_writen(STDOUT_FILENO, buf, nread);
        exit(0);
    }
4 Which I/O Functions Should You Use?
- Unix I/O
- Pros
- Unix I/O is the most general and lowest overhead form of I/O
- All other I/O packages are implemented using Unix I/O functions
- Unix I/O functions are async-signal-safe and can be used safely in signal handlers
- Cons
- Dealing with short counts is tricky and error prone
- Efficient reading of text lines requires some form of buffering, also tricky and error prone
- Both of these issues are addressed by the standard I/O and RIO packages
- Standard I/O
- Pros:
- Buffering increases efficiency by decreasing the number of read and write system calls
- Short counts are handled automatically
- Cons:
- Standard I/O functions are not async-signal-safe, and not appropriate for signal handlers
- Standard I/O is not appropriate for input and output on network sockets
- There are poorly documented restrictions on streams that interact badly with restrictions on sockets (CS:APP3e, Sec 10.11)
- General rule: use the highest-level I/O functions you can
- Many C programmers are able to do all of their work using the standard I/O functions
- But, be sure to understand the functions you use!
- When to use standard I/O
- When working with disk or terminal files
- When to use raw Unix I/O
- Inside signal handlers, because Unix I/O is async-signal-safe
- In rare cases when you need absolute highest performance
- When to use RIO
- When you are reading and writing network sockets
- Avoid using standard I/O on sockets
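To make the signal-handler recommendation concrete, here is a minimal sketch of a handler that reports through write rather than printf. The function names and the choice of SIGUSR1 are assumptions for the example; the handler saves and restores errno because write may change it.

```c
#define _POSIX_C_SOURCE 200809L     /* for sigaction under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;    /* set by the handler */

/* Handler restricted to async-signal-safe operations: it sets a flag and
 * calls write(); it does not call printf(). */
static void on_sigusr1(int sig)
{
    static const char msg[] = "caught SIGUSR1\n";
    int saved_errno = errno;        /* write() may overwrite errno */
    (void)sig;
    got_signal = 1;
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;                       /* nothing safe to do on failure here */
    errno = saved_errno;
}

/* Hypothetical helper: install the handler. Returns 0 on success, -1 on error. */
int install_sigusr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical demo: send SIGUSR1 to this process and let the handler run.
 * Returns 0 on success, -1 on error. */
int demo_signal_safe_write(void)
{
    if (install_sigusr1_handler() < 0)
        return -1;
    got_signal = 0;
    if (raise(SIGUSR1) != 0)
        return -1;
    /* raise() returns only after the handler has finished */
    assert(got_signal == 1);
    return 0;
}
```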