CS 208 s22 — System I/O

io-functions.png

1 Unix I/O

  • A Linux file is a sequence of m bytes:
    • B_0, B_1, …, B_k, …, B_{m-1}
  • Cool fact: All I/O devices are represented as files:
    • /dev/sda2 (/usr disk partition)
    • /dev/tty2 (terminal)
  • Even the kernel is represented as a file:
    • /boot/vmlinuz-5.4.0-89-generic (kernel image)
    • /proc (kernel data structures)
  • This elegant mapping of files to devices allows the kernel to export a simple interface called Unix I/O:
    • Each operation below is a system call
    • Opening and closing files
      • open() and close()
    • Reading and writing a file
      • read() and write()
    • Changing the current file position (seek)
      • indicates next offset into file to read or write
      • lseek()

        open-file.png

1.1 File Types

  • Each file has a type indicating its role in the system
    • Regular file: Contains arbitrary data
    • Directory: Index for a related group of files
    • Socket: For communicating with a process on another machine
  • Other file types beyond our scope
    • Named pipes (FIFOs)
    • Symbolic links
    • Character and block devices

1.1.1 Regular Files

  • A regular file contains arbitrary data
  • Applications often distinguish between text files and binary files
    • Text files are regular files with only ASCII or Unicode characters
    • Binary files are everything else
      • e.g., object files, JPEG images
    • Kernel doesn’t know the difference!
  • Text file is sequence of text lines
    • Text line is sequence of chars terminated by newline char ('\n')
      • Windows and Internet protocols: "\r\n"

1.1.2 Directories

  • Directory consists of an array of links
    • Each link maps a filename to a file
  • Each directory contains at least two entries
    • . (dot) is a link to itself
    • .. (dot dot) is a link to the parent directory in the directory hierarchy
  • Commands for manipulating directories
    • mkdir: create empty directory
    • ls: view directory contents
    • rmdir: delete empty directory

directories.png

  • Locations of files in the hierarchy denoted by pathnames
    • Absolute pathname starts with '/' and denotes path from root (top of the hierarchy)
      • /home/awb/hello.c
    • Relative pathname denotes path from current working directory (cwd)
      • if cwd is /home/mtie, then relative path would be ../awb/hello.c

1.2 Working with Files

#include <stdlib.h>
#include <unistd.h>

int main(void)
{
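    char c;

    /* Copy stdin (fd 0) to stdout (fd 1) one byte at a time.
     * read returns 0 at EOF and -1 on error; stop in either case. */
    while (read(0, &c, 1) > 0)
        write(1, &c, 1);
    exit(0);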
}

Opening a file (via open()) informs the kernel that you are getting ready to access that file

  • Returns a small identifying integer file descriptor
    • Lowest numbered file descriptor not currently open for the process
    • fd == -1 indicates that an error occurred
  • Each process created by a Linux shell begins life with three open files associated with a terminal:
    • 0: standard input (stdin)
    • 1: standard output (stdout)
    • 2: standard error (stderr)
  • So int fd = open("my_file.txt", O_RDONLY); would open the file my_file.txt as read-only
    • If this were the first file the process opened (with descriptors 0, 1, and 2 already in use), open would return 3.

Closing a file (via close()) informs the kernel that you are finished accessing that file. Always check return values from system calls!

Reading a file (via read()) copies bytes from the current file position to memory, and then updates file position.

  • Returns number of bytes read from file fd into buf
    • Return type ssize_t is signed integer
    • rval < 0 indicates that an error occurred
    • Short counts (rval < sizeof(buf)) are possible and are not errors!

Writing a file (via write()) copies bytes from memory to the current file position, and then updates current file position.

  • Returns number of bytes written from buf to file fd
    • rval < 0 indicates that an error occurred
    • As with reads, short counts are possible and are not errors!

1.3 Representing Open Files

Two descriptors referencing two distinct open files. Descriptor 1 (stdout) points to terminal, and descriptor 4 points to open disk file:

open-files-1.png

Two distinct descriptors sharing the same disk file through two distinct open file table entries:

  • E.g., Calling open twice with the same filename argument

open-files-2.png

1.4 I/O Redirection

Question: How does a shell implement I/O redirection?

ls > foo.txt

Answer: By calling the dup2(oldfd, newfd) function

  • Copies (per-process) descriptor table entry oldfd to entry newfd

dup2.png

What would be printed for a file foo.txt that contains "abcde"? (The answer is in footnote 1.)

#include "csapp.h"
int main() {
    char c1, c2, c3;
    int fd1 = Open("foo.txt", O_RDONLY, 0);
    int fd2 = Open("foo.txt", O_RDONLY, 0);
    int fd3 = Open("foo.txt", O_RDONLY, 0);
    Dup2(fd2, fd3);
    Read(fd1, &c1, 1);
    Read(fd2, &c2, 1);
    Read(fd3, &c3, 1);
    printf("c1 = %c, c2 = %c, c3 = %c\n", c1, c2, c3);
    return 0;
}

2 Standard I/O Functions

The C standard library (libc.so) contains a collection of higher-level standard I/O functions. Examples of standard I/O functions:

  • Opening and closing files (fopen and fclose)
  • Reading and writing bytes (fread and fwrite)
  • Reading and writing text lines (fgets and fputs)
  • Formatted reading and writing (fscanf and fprintf)

2.1 Buffered I/O

Standard I/O models open files as streams.

  • Abstraction for a file descriptor and a buffer in memory
  • C programs begin life with three open streams (defined in stdio.h)
    • stdin (standard input)
    • stdout (standard output)
    • stderr (standard error)
  • Applications often read/write one character at a time
    • getc, putc, ungetc
    • gets, fgets
      • Read line of text one character at a time, stopping at newline
      • gets cannot bound the length of its input and was removed in C11; prefer fgets
  • Implementing each of these as a Unix I/O call is expensive
    • read and write require Unix kernel calls
    • >10,000 clock cycles
  • Solution: Buffered read
    • Use Unix read to grab block of bytes
    • User input functions take one byte at a time from buffer
    • Refill buffer when empty

io-buffer.png

3 Robust I/O (RIO)

RIO is a set of wrappers that provide efficient and robust I/O in apps, such as network programs that are subject to short counts.

  • RIO provides two different kinds of functions
    • Unbuffered input and output of binary data
      • rio_readn and rio_writen

        #include "csapp.h"
        
        ssize_t rio_readn(int fd, void *usrbuf, size_t n);
        ssize_t rio_writen(int fd, void *usrbuf, size_t n);
        
        //     Return: num. bytes transferred if OK,  0 on EOF (rio_readn only), -1 on error
        
    • Buffered input of text lines and binary data
      • rio_readlineb and rio_readnb

        #include "csapp.h"
        
        void rio_readinitb(rio_t *rp, int fd);
        
        ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen);
        ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n);
        
        //       Return: num. bytes read if OK, 0 on EOF, -1 on error
        
  • Download from http://csapp.cs.cmu.edu/3e/code.html
    • src/csapp.c and include/csapp.h
  • Same interface as Unix read and write
  • Especially useful for transferring data on network sockets

/*
 * rio_readn - Robustly read n bytes (unbuffered)
 */
ssize_t rio_readn(int fd, void *usrbuf, size_t n) {
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = read(fd, bufp, nleft)) < 0) {
            if (errno == EINTR) /* Interrupted by sig handler return */
                nread = 0;       /* and call read() again */
            else
                return -1;       /* errno set by read() */
        }
        else if (nread == 0)
            break;              /* EOF */
        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);         /* Return >= 0 */
}

Buffered RIO input functions efficiently read text lines and binary data from a file partially cached in an internal memory buffer.

  • rio_readlineb reads a text line from the file associated with rp and stores it, null-terminated, in usrbuf
    • Reads at most maxlen-1 bytes, leaving room for the terminating null character
    • Especially useful for reading text lines from network sockets
  • Stopping conditions
    • maxlen-1 bytes read
    • EOF encountered
    • Newline ('\n') encountered

typedef struct {
    int rio_fd;                /* descriptor for this internal buf */
    int rio_cnt;               /* unread bytes in internal buf */
    char *rio_bufptr;          /* next unread byte in internal buf */
    char rio_buf[RIO_BUFSIZE]; /* internal buffer */
} rio_t;

The rio_t struct:

rio_t.png

The Unix file underneath:

rio_t-unix.png

3.1 Quick Check

Why is buffering I/O often a more efficient solution than non-buffered I/O? (The answer is in footnote 2.)

3.2 Example

Copying a file to stdout, line-by-line with RIO:

#include "csapp.h"
#define MLINE 1024

int main(int argc, char *argv[])
{
    rio_t rio;
    char buf[MLINE];
    int infd = STDIN_FILENO;
    ssize_t nread = 0;
    if (argc == 2) {
        infd = Open(argv[1], O_RDONLY, 0);
    }
    Rio_readinitb(&rio, infd);
    while((nread = Rio_readlineb(&rio, buf, MLINE)) != 0)
        Rio_writen(STDOUT_FILENO, buf, nread);
    exit(0);
}

4 Which I/O Functions Should You Use?

  • Unix I/O
    • Pros
      • Unix I/O is the most general and lowest overhead form of I/O
      • All other I/O packages are implemented using Unix I/O functions
      • Unix I/O functions are async-signal-safe and can be used safely in signal handlers
    • Cons
      • Dealing with short counts is tricky and error prone
      • Efficient reading of text lines requires some form of buffering, also tricky and error prone
      • Both of these issues are addressed by the standard I/O and RIO packages
  • Standard I/O
    • Pros:
      • Buffering increases efficiency by decreasing the number of read and write system calls
      • Short counts are handled automatically
    • Cons:
      • Standard I/O functions are not async-signal-safe, and not appropriate for signal handlers
      • Standard I/O is not appropriate for input and output on network sockets
      • There are poorly documented restrictions on streams that interact badly with restrictions on sockets (CS:APP3e, Sec 10.11)
  • General rule: use the highest-level I/O functions you can
    • Many C programmers are able to do all of their work using the standard I/O functions
    • But, be sure to understand the functions you use!
  • When to use standard I/O
    • When working with disk or terminal files
  • When to use raw Unix I/O
    • Inside signal handlers, because Unix I/O is async-signal-safe
    • In rare cases when you need absolute highest performance
  • When to use RIO
    • When you are reading and writing network sockets
    • Avoid using standard I/O on sockets

Footnotes:

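1. c1 = a, c2 = a, c3 = b. After Dup2, fd3 shares fd2's open file table entry and therefore its file position, so the read on fd3 continues where the read on fd2 left off.

2. Buffering reduces the number of read and write system calls.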