April 7, 2021
In this programming project, you will implement a buffer pool for the BusTub DBMS. The buffer pool is responsible for moving physical pages back and forth from main memory to disk. It allows a DBMS to support databases that are larger than the amount of memory that is available to the system. The buffer pool’s operations are transparent to other parts in the system. For example, the system asks the buffer pool for a page using its unique identifier (page_id_t
) and it does not know whether that page is already in memory or whether the system has to go retrieve it from disk.
Your implementation will need to be thread-safe. Multiple threads will be accessing the internal data structures at the same and thus you need to make sure that their critical sections are protected with latches (these are called “locks” in operating systems).
You will need to implement the following two components in your storage manager:
The Moodle forums are a great place to post questions!
We will go over how to get started on this project in lab on April 9. Before lab, you are expected to
Pull the latest changes from the public repo (the first command will say the remote already exists if you ran it previously, this is fine)
Review material on buffer pools, either from the textbook or Andy Pavlo
std::
means it’s part of the C++ standard library, which means you can look up good documentation about it.LRUReplacer
and BufferPoolManager
src/include/buffer/replacer.h
, src/include/buffer/lru_replacer.h
, src/buffer/lru_replacer.cpp
src/include/buffer/buffer_pool_manager.h
, src/buffer/buffer_pool_manager.cpp
test/buffer/lru_replacer_test.cpp
, test/buffer/buffer_pool_manger_test.cpp
.h
) filesframe_id_t
and page_id_t
are simply aliases for 32-bit int
s
LRUReplacer::Victim
returns a bool
indicating whether a frame to evict was foundframe_id
of the victim frameSo in BufferPoolManager
we might have something like
and LRUReplacer::Victim
would look like
BufferPoolManager
are already implemented, you should not need to change them (constructor, destructor, GetPages
and GetSize
)Page
object, which contains a page’s data and metadata
Page
declares BufferPoolManager
a friend
class, meaning BufferPoolManager
can access the private
data/methods of Page
data_
), page id, pin count, and dirty bitReaderWriterLatch
or the log sequence number LSN
methods for this projectstd::unordered_map
) to map page ids to frame idsstd::list<frame_id_t>
) of currently unused frames
LRUReplacer
implementationPin
and Unpin
methods are used to tell the replacer when frames are in use/become available for replacementstd::list
is a good choice)contains
method
Cannonical way to do this is via std::find
and iterators:
#include <algorithm> // for std::find
#include <list> // for std::list
std::list my_list;
// does the list contain the value 3?
bool list_contains_3 = std::find(my_list.begin(), my_list.end(), 3) != my_list.end();
// ^starting place ^ending place ^end() is past the last element
// so if find returned end(), the
// element was not found
See the Project 0 for instructions on how to create your private repository and setup your development environment.
If you are working with a partner, you can add them as a collaborator to your private GitHub repo. VS Code Live Share is a useful extension for remote pair programming in VS Code.
For each of the following components, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you modify the signatures, the test code that we use for grading will break and you will get only partial credit for the project. You also should not add additional classes in the source code for these components. These components should be entirely self-contained.
If a class already contains data members, you should not remove them. For example, the BufferPoolManager
contains DiskManager
and Replacer
objects. These are required to implement the functionality that is needed by the rest of the system. On the other hand, you may need to add data members to these classes in order to correctly implement the required functionality. You can also add additional helper functions to these classes. The choice is yours.
You are allowed to use any built-in C++17 containers in your project unless specified otherwise. It is up to you to decide which ones you want to use. Note that these containers are not thread-safe and that you will need to include latches in your implementation to protect them. You may not bring in additional third-party dependencies (e.g. boost).
This component is responsible for tracking page usage in the buffer pool. You will implement a new sub-class called LRUReplacer
in src/include/buffer/lru_replacer.h
and its corresponding implementation file in src/buffer/lru_replacer.cpp
. LRUReplacer
extends the abstract Replacer
class (src/include/buffer/replacer.h
), which contains the function specifications.
The LRUReplacer
is initialized to have no frame in it. Newly unpinned frames will be considered in the LRUReplacer
(i.e., eligible for replacement).
You will need to implement the LRU policy discussed in lab. You will need to implement the following methods:
Victim(T*)
: Remove the object that was accessed the least recently compared to all the elements being tracked by the Replacer
, store its contents in the output parameter and return True
. If the Replacer
is empty return False
.Pin(T)
: This method should be called after a page is pinned to a frame in the BufferPoolManager
. It should remove the frame containing the pinned page from the LRUReplacer
.Unpin(T)
: This method should be called when the pin_count
of a page becomes 0. This method should add the frame containing the unpinned page to the LRUReplacer
.Size()
: This method returns the number of frames that are currently in the LRUReplacer
.The implementation details are up to you. You are allowed to use built-in C++ containers. You can assume that you will not run out of memory, but you must make sure that the operations are thread-safe.
Next, you need to implement the buffer pool manager in your system (BufferPoolManager
). The BufferPoolManager
is responsible for fetching database pages from the DiskManager
and storing them in memory. The BufferPoolManager
can also write dirty pages out to disk when it is either explicitly instructed to do so or when it needs to evict a page to make space for a new page.
To make sure that your implementation works correctly with the rest of the system, we will provide you with some of the functions already filled in. You will also not need to implement the code that actually reads and writes data to disk (this is called the DiskManager
in our implementation). We will provide that functionality for you.
All in-memory pages in the system are represented by Page
objects. The BufferPoolManager
does not need to understand the contents of these pages. But it is important for you as the system developer to understand that Page
objects are just containers for memory in the buffer pool and thus are not specific to a unique page. That is, each Page
object contains a block of memory that the DiskManager
will use as a location to copy the contents of a physical page that it reads from disk. The BufferPoolManager
will reuse the same Page
object to store data as it moves back and forth to disk. This means that the same Page
object may contain a different physical page throughout the life of the system. The Page
object’s identifer (page_id
) keeps track of what physical page it contains; if a Page
object does not contain a physical page, then its page_id
must be set to INVALID_PAGE_ID
.
Each Page
object also maintains a counter for the number of threads that have “pinned” that page. Your BufferPoolManager
is not allowed to free a Page
that is pinned. Each Page
object also keeps track of whether it is dirty or not. It is your job to record whether a page was modified before it is unpinned. Your BufferPoolManager
must write the contents of a dirty Page
back to disk before that object can be reused.
Your BufferPoolManager
implementation will use the LRUReplacer
class that you created in the previous task of this project. It will use the LRUReplacer
to keep track of when Page
objects are accessed so that it can decide which one to evict when it must free a frame to make room for copying a new physical page from disk.
You will need to implement the following functions defined in the header file (src/include/buffer/buffer_pool_manager.h
) in the source file (src/buffer/buffer_pool_manager.cpp
):
FetchPage(page_id)
NewPage(page_id)
UnpinPage(page_id, is_dirty)
FlushPage(page_id)
DeletePage(page_id)
FlushAllPages()
For FetchPage
, you should return nullptr
if no page is available in the free list and all other pages are currently pinned. FlushPage
should flush a page regardless of its pin status.
Refer to the function documentation for details on how to implement these functions.
You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. There are two separate files that contain tests for each component:
LRUReplacer
: test/buffer/lru_replacer_test.cpp
BufferPoolManager
: test/buffer/buffer_pool_manager_test.cpp
You can compile and run each test individually from the command-line:
You can also run make check-tests
to run ALL of the test cases. Note that some tests are disabled as you have not implemented future projects. You can disable tests in GTest by adding a DISABLED_
prefix to the test name.
Important: These tests are only a subset of the all the tests that the autograder will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation.
We encourage you to use gdb
, lldb
, or the VS Code debugger to debug your project if you are having problems.
Your code will also be evaluated for memory errors. These include problems such as array overflows and failing to deallocate memory. We will use the command
to check for these errors (run from the build
directory). See this guide for information about interpreting valgrind
output.
valgrind
and Debug mode
Configuring the project in Debug mode (either with cmake -DCMAKE_BUILD_TYPE=Debug ..
or in VS Code) will interfere with valgrind
. So when you go to check your code for memory errors, you will need to reconfigure and rebuild the project without Debug mode. In VS Code this should only require changing build variants and rebuilding. Via command line, you may need to remove the build
directory and rebuild from scratch.
Instead of using printf
statements for debugging, use the LOG_*
macros for logging information like this:
To enable logging in your project, you will need to reconfigure it like this (or select a build variant with debug info in VS Code):
The different logging levels are defined in src/include/common/logger.h
. After enabling logging, the logging level defaults to LOG_LEVEL_INFO
. Any logging method with a level that is equal to or higher than LOG_LEVEL_INFO
(e.g., LOG_INFO
, LOG_WARN
, LOG_ERROR
) will emit logging information. Note that you will need to add #include "common/logger.h"
to any file that you want to use the logging infrastructure.
Your code must follow the Google C++ Style Guide. We use clang
to automatically check the quality of your source code. The style part of project grade will be zero if your submission fails any of these checks.
Execute the following commands to check your syntax. The format
target will automatically correct the formatting of your code. The check-lint
and check-clang-tidy
targets will print errors and instruct you how to fix it to conform to our style guide.
You will submit your work for this project via Gradescope.
Gradescope lets you submit via GitHub, which is probably the easiest method. All you’ll need to do is connect your GitHub account (the Gradescope submission page has a button for this) and select the repository and branch you wish to submit. Alternatively, you can create a zip file of the src
directory and upload that.
When you submit, the autograder will compile your code, run the test cases, check your code for style, and use valgrind to check for memory errors. It will use the following files (all other submitted files will be discarded):
src/include/buffer/lru_replacer.h
src/buffer/lru_replacer.cpp
src/include/buffer/buffer_pool_manager.h
src/buffer/buffer_pool_manager.cpp
These files must be at the given locations in your submission.
Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Many people may submit their projects near the deadline, and thus it will Gradescope take longer to process the requests. You may not get feedback in a timely manner to help you debug problems.
This project will be graded out of 150 points. No tests will be run for a submission that does not compile. While the autograder will assign 0 points for a test that doesn’t pass, partial credit can be earned for evidence of a good-faith effort. Comments explaining your approach can help earn partial credit.