CS 332 s20 — File System Implementation

Table of Contents

1 Introduction

Having covered the general file system abstraction and played around with the file system API, now it's time to take a look at what's going on underneath. We know that persistent storage disks are block devices, meaning we read from and write to them in fixed-size blocks. Meanwhile, the operating system lets users jump to arbitrary points within a file and start reading and writing away. Clearly there must be some machinery to bridge the gap between device interface and system call. This material should emphasize that careful data structure design plays a significant part in building an operating system.

2 Reading: File System Implementation

Read this part of the OSPP chapter on file system implementation. Don't worry about the details of directory internals on p. 6, the takeaway is the choice of data structures is a big part of file system design. Important terms to make sure you understand after you're finished:

  • inode
  • data block
  • direct pointer
  • indirect pointer

As you read, follow along with this comprehension quiz. Attempt the first question after reading through page 9, the second once you get to page 14, and the third after you're finished.

3 Supplementary Notes

I've included these notes to flesh out and complement today's reading.

  • Primary Roles of the OS for the file system
    1. Hide hardware specific interface
    2. Allocate disk blocks
    3. Check permissions
    4. Understand directory file structure
    5. Maintain metadata
    6. Performance
    7. Flexibility
  • The concept of a file system is simple
    • the implementation of the abstraction for secondary storage
      • abstraction = files
    • logical organization of files into directories
      • the directory hierarchy
    • sharing of data among processes, people and machines
      • access control, consistency, …

3.1 File Access

  • Some file systems provide different access methods that specify ways the application will access data
    • sequential access
      • read bytes in order
    • direct access
      • random access given a block/byte number
    • record access
      • file is array of fixed- or variable-sized records
    • indexed access
      • file system contains an index to a particular field of each record in a file
      • apps can find a file based on value in that record (similar to a database)
  • Why do we care about distinguishing sequential from direct access?
    • what might the file system do differently in these cases?1

3.2 What is a Directory?

  • A directory is typically just a file that happens to contain special metadata
    • directory = list of (name of file, file attributes)
    • attributes include such things as:
      • size, protection, location on disk, creation time, access time, …
    • the directory list is usually unordered (effectively random)
      • when you type ls, the ls command sorts the results for you

3.3 Path Translation

  • Let's say you want to open "/one/two/three"
fd = open("/one/two/three", O_RDWR);
  • What goes on inside the file system?
    • open directory "/" (well known, can always find)
    • search the directory for "one", get location of "one"
    • open directory "one", search for "two", get location of "two"
    • open directory "two", search for "three", get location of "three"
    • open file "three"
    • (of course, permissions are checked at each step)
  • File system spends lots of time walking down directory paths
    • this is why open is separate from read/write
      • to preserve the location, etc. of a file between individual reads and writes (session state)
    • OS will cache prefix lookups to enhance performance
      • /a/b, /a/bb, /a/bbb all share the "/a" prefix

3.4 File protection

  • File system must implement some kind of protection system
    • to control who can access a file (user)
    • to control how they can access it (e.g., read, write, or exec)
  • More generally:
    • generalize files to objects (the "what")
    • generalize users to principals (the "who", user or program)
    • generalize read/write to actions (the "how", or operations)
  • A protection system dictates whether a given action performed by a given principal on a given object should be allowed
    • e.g., you can read or write your files, but others cannot
    • e.g., your can read /etc/motd but you cannot write to it

3.4.1 Models for Representing Protection

  • Two different ways of representing protection:
    • access control lists (ACLs)
      • for each object, keep list of principals and principals' allowed actions
    • capabilities
      • for each principal, keep list of objects and principal's allowed actions
    • Both can be represented with the following matrix:

protection-matrix.png

3.4.2 ACLs vs. Capabilities

  • Capabilities are easy to transfer
    • they are like keys: can hand them off
    • they make sharing easy
  • ACLs are easier to manage
    • object-centric, easy to grant and revoke
      • to revoke capability, need to keep track of principals that have it
      • hard to do, given that principals can hand off capabilities
  • ACLs grow large when object is heavily shared
    • can simplify by using "groups"
      • put users in groups, put groups in ACLs
    • additional benefit
      • change group membership, affects ALL objects that have this group in its ACL

3.5 The original Unix file system

  • Dennis Ritchie and Ken Thompson, Bell Labs, 1969

"UNIX rose from the ashes of a multi-organizational effort in the early 1960s to develop a dependable timesharing operating system" – Multics

  • Designed for a "workgroup" sharing a single system
  • Did its job exceedingly well
    • Although it has been stretched in many directions and made ugly in the process
  • A wonderful study in engineering tradeoffs

(Old) Unix disks are divided into five parts:

  • Boot block
    • can boot the system by loading from this block
  • Superblock
    • specifies boundaries of next 3 areas, and contains head of freelists of inodes and file blocks
  • inode area
    • contains descriptors (inodes) for each file on the disk; all inodes are the same size; head of freelist is in the superblock
  • File contents area
    • fixed-size blocks; head of freelist is in the superblock
  • Swap area
    • holds processes that have been swapped out of memory

3.6 inodes

A per-file on-disk data structure that contains the file's metadata

3.6.1 inode format

  • User number
  • Group number
  • Protection bits
  • Times (file last read, file last written, inode last written)
  • File code: specifies if the i-node represents a directory, an ordinary user file, or a "special file" (typically an I/O device)
  • Size: length of file in bytes
  • Block list: locates contents of file (in the file contents area)
  • Link count: number of directories referencing this inode

3.6.2 The flat (inode) file system

  • Each file is known by a number, which is the number of the i-node
    • seriously — 1, 2, 3, etc.!
  • why is it called "flat"?2
  • Files are created empty, and grow when extended through writes

3.6.3 The tree (directory, hierarchical) file system

  • A directory is a flat file of fixed-size entries
  • Each entry consists of an i-node number and a file
inode number file name
152 .
18 ..
216 my_file
4 another_file
93 louie_louie
144 a_directory
  • It's as simple as that!

3.6.4 The "block list" portion of the inode (Unix Version 7)

  • Points to blocks in the file contents area
  • Must be able to represent very small and very large files.
  • Each inode contains 13 block pointers
    • first 10 are "direct pointers" (pointers to 512B blocks of file data)
    • then, single, double, and triple indirect pointers

inode-block-list.png

This means…

  • The block list only occupies 13 x 4B in the inode
  • Can get to 10 x 512B = a 5120B file directly
    • (10 direct pointers, blocks in the file contents area are 512B)
  • Can get to 128 x 512B = an additional 65KB with a single indirect reference
    • (the 11th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to blocks holding file data)
  • Can get to 128 x 128 x 512B = an additional 8MB with a double indirect reference
    • (the 12th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks holding file data)
  • Can get to 128 x 128 x 128 x 512B = an additional 1GB with a triple indirect reference
    • (the 13th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks holding file data)
  • Maximum file size is 1GB + a smidge
  • A later version of Bell Labs Unix utilized 12 direct pointers rather than 10
  • Berkeley Unix went to 1KB block sizes
    • What's the effect on the maximum file size?
      • 256x256x256x1K = 17 GB + a smidge
    • What's the price?3
  • Subsequently went to 4KB blocks
    • 1Kx1Kx1Kx4K = 4TB + a smidge

3.7 Putting it all together

The file system is just a huge data structure

file-system.png

4 Homework

  1. You should have already done the reading comprehension quiz as you did the assigned reading.
  2. Read through lab 1 and post any questions you have in Slack. Lab 1 is due Monday, April 20 at 9pm Central time.
  3. Take the Week 1 quiz on Moodle. You must complete it by 9pm Wednesday, April 15.

Footnotes:

1

Caching and prefetching would certainly differ between sequential and direct access. The OS can make much stronger assumptions about the pattern of access in the sequential setting.

2

There is no hierarchical structure, just a mapping of files to inodes.

3

Disks with many small files will waste space, since each file must occupy at least one block.