May 4, 2021
The second programming project is to implement an index in your database system. The index is responsible for fast data retrieval without having to search through every row in a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
You will need to implement B+Tree dynamic index structure. It is a balanced tree in which the internal pages direct the search and leaf pages contains actual data entries. Since the tree structure grows dynamically, you will need handle splitting nodes.
You will need to implement the following components of your index:
The Moodle forums are a great place to post questions!
We will go over how to get started on this project in lab on May 7. Before lab, you are expected to
Pull the latest changes from the public repo (the first command will say the remote already exists if you ran it previously, this is fine)
$ git remote add public https://github.com/awb-carleton/bustub.git
$ git pull public masterRead this entire project writeup
Review material on B+ Trees, either from the textbook, notes, or video

In lab, I said the header page would start with an entry of "foo":-1, but it would actually be empty until the first insert to the tree.

I have laid out a progression of tasks in the Project Specification section. You don’t have to do things in this order, but it is what I would recommend. Here’s a summary of the tasks and how they line up with the tests:
b_plus_tree_leaf_page.cpp and b_plus_tree_internal_page.cppBPlusTreeLeafPage and BPlusTreeInternalPage except for those related to deletion/redistribution/coalescingLeafPageTest and InternalPageTest on the autograderb_plus_tree.cpp (and potentially b_plus_tree.h if you decide to use different helper methods than those in the starter code)isEmpty, StartNewTree, Insert, InsertIntoLeaf, and GetValueindex_iterator.h and index_iterator.cppb_plus_tree_insert_test.cpp (these are also included in the autograder)b_plus_tree.cpp (and potentially b_plus_tree.h if you decide to use different helper methods than those in the starter code)root_page_id_ and call UpdateRootPageIdb_plus_tree.cpp (and potentially b_plus_tree.h if you decide to use different helper methods than those in the starter code)Page object provides to latch a node)root_page_id_ or start a new tree at a time (use a mutex)b_plus_tree_concurrent_test.cpp (these are also on the autograder)b_plus_tree_leaf_page.cpp, b_plus_tree_internal_page.cpp, b_plus_tree.cpp (and potentially b_plus_tree.h if you decide to use different helper methods than those in the starter code)b_plus_tree_delete_test.cpp (these are also on the autograder)CoalesceRedistributeTest on the autograder)In general, the tests 1. Insert some things into the tree 2. Use GetValue to verify that all keys are present in the tree 3. Use an index iterator to verify that all keys are in the correct order in the leaves
ToString in b_plus_tree.cpp#define LOGGING_ON
...
INDEX_TEMPLATE_ARGUMENTS
bool BPLUSTREE_TYPE::Insert(const KeyType &key, const ValueType &value, Transaction *transaction) {
#ifdef LOGGING_ON
LOG_INFO("Inserting %lld", key.ToInt64());
#endif$ ./test/b_plus_tree_insert_test
Running main() from gmock_main.cc
Note: Google Test filter = *InsertTest*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from BPlusTreeTests
[ RUN ] BPlusTreeTests.InsertTest1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:126:StartNewTree] INFO - Starting a new tree with 1, page id 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 2 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 3 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 2 off from 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 2 into page -1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:319:CreateNewRoot] INFO - Creating root page 3 for 1 and 2, inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 2, holding latches on 3 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 4 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 4 off from 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 3 into page 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 4, holding latches on 3 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 5 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 5 off from 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 4 into page 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 6 off from 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 3 into page -1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:319:CreateNewRoot] INFO - Creating root page 7 for 3 and 6, inserting 3
[ OK ] BPlusTreeTests.InsertTest1 (5 ms)
[ RUN ] BPlusTreeTests.InsertTest2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:126:StartNewTree] INFO - Starting a new tree with 5, page id 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 4 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 3 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 2 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 1 into a leaf
[ OK ] BPlusTreeTests.InsertTest2 (2 ms)
[----------] 2 tests from BPlusTreeTests (8 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (8 ms total)
[ PASSED ] 2 tests.Like the Project 1, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the public methods in these classes. If you do this, then it will break the test code that we will use to grade your assignment. If a class already contains certain member variables, you should not remove them. You may add private helper functions/member variables to these classes in order to correctly realize the functionality. Furthermore, you are free to make changes, to ignore, or to use the private methods defined in the starter code. You can think of these as a suggested design rather than a requirement. The one exception to this is the private BPlusTree::ToString method—several tests depend on this method and you should not modify it.
The correctness of B+ Tree index depends on the correctness of your implementation of buffer pool, we will not provide solutions for the previous programming projects. Make sure your buffer pool passes all the Project 1 tests before starting on this project. Those tests are part of the project 2 autograder, so you can use that to check.
As in previous projects, comments in the source contain additional information beyond what is presented below.
A key feature of B+ Trees in database systems is that each node of the tree (internal and leaf) is its own page. You will implement two Page classes to store the data of your B+ Tree:
BPlusTreeInternalPage (src/include/storage/page/b_plus_tree_internal_page.h and src/page/b_plus_tree_internal_page.cpp)BPlusTreeLeafPage (src/include/storage/page/b_plus_tree_leaf_page.h and src/page/b_plus_tree_leaf_page.cpp)Both of these classes extend the BPlusTreePage class, which implements a number of simple methods common between the two types of pages (src/include/storage/page/b_plus_tree_page.h and src/page/b_plus_tree_page.cpp). This class also defines several fields (metadata) used by B+ Tree pages:
| Field Name | Description |
|---|---|
page_type_ |
The type of page (internal or leaf) |
lsn_ |
Log sequence number (not used in this project) |
size_ |
The number of entries in the page (number of pointers for internal, number of keys for leaf) |
max_size_ |
Maximum number of entries |
parent_page_id_ |
The id of the parent node’s page |
page_id_ |
The id of this node’s page |
An internal page does not store any real data, but instead stores an sorted array of size_ - 1 key entries and size_ child pointers (i.e., the page ids of child nodes) in a field called array. This ordered array stores objects of type std::pair<KeyType, ValueType>, aliased as MappingType for readability. Since there are one fewer keys than points, the key of the first array entry (array[0].first) is ignored. This means that the Lookup method should start with the second key.
The value (page id) stored at index i points to a subtree in which all keys K satisfy: Ki < K ≤ Ki + 1. Put into C++, this means that array[i].second is the page id of a node where all the keys are greater than array[i].first and less than or equal to array[i + 1].first.
One wrinkle is that the keys stored in our B+ Tree can’t necessarily be compared with <, >, etc. Thus, methods where keys need to compared take a parameter const KeyComparator &comparator. You can call this on two keys like a function and it will return 1, 0, or -1 to indicate the relationship between the keys. Here’s an example usage:
if (comparator(key1, key2) < 0) {
// key1 < key2
} else if (comparator(key1, key2) == 0) {
// key1 == key2
} else {
// key1 > key2
}At this point you should implement the methods needed to support insertion:
Init: initialize the internal node’s metadataKeyAt: return the key stored at a given indexSetKeyAt: set the key stored at a given indexValueIndex: return the index where a given value is storedValueAt: return the value stored at a given indexLookup: return the value which points to the child page that would contain the given keyPopulateNewRoot: insert initial values into a newly created root nodeInsertNodeAfter: insert a new key-value pair at the position following an existing valueMoveHalfTo: move half the key-value pairs from this node to another nodeCopyNFrom: private helper method for MoveHalfToBPlusTreeTests.InternalPageTest on the autograder tests these methods.
A leaf page stores an sorted array of size_ keys and size_ record ids. Like the internal page, this array is made up of MappingType (std::pair) objects. The leaf page class also adds an additional field of metadata, next_page_id_, that stores the page id of the leaf node’s sibling.
At this point you should implement the methods needed to support insertion:
Init: initialize the internal node’s metadataGetNextPageId/SetNextPageId: get/set the next_page_id_ fieldKeyAt: return the key stored at a given indexGetItem: return the key-value pair (MappingType) stored at a given indexLookup: search for a given key, and set the output parameter (value) to the corresponding valueInsert: insert a new key-value pair at the appropriate sorted positionMoveHalfTo: move half the key-value pairs from this node to another nodeCopyNFrom: private helper method for MoveHalfToBPlusTreeTests.InternalPageTest on the autograder tests these methods.
These B+ Tree pages will exist within the Page objects managed by the buffer pool. In particular, they will be the data part of these pages. So given a page id page_id of a B+ Tree page and a pointer to a buffer pool buffer_pool_manager_, you would access the page as follows:
Page *page = buffer_pool_manager_->FetchPage(page_id);
BPlusTreePage *bppage = reinterpret_cast<BPlusTreePage *>(page->GetData());
if (bppage->IsLeafPage()) {
LeafPage *leaf = reinterpret_cast<LeafPage *>(bppage);
// do things with a leaf page
} else {
InternalPage *internal = reinterpret_cast<InternalPage *>(bppage);
// do things with an internal page
}
buffer_pool_manager_->UnpinPage(page_id, true); // or false if the page was not modifiedLeafPage and InternalPage are convenient aliases defined by the BPlusTree class (lines 39-40 of src/include/storage/index/b_plus_tree.h), so the above is a example of the kind ofcode you might write in your implementation.
The next task is to implement basic insertion into the B+ Tree that does not take into account node sizes or splitting. Your B+ Tree implementation will go in the BPlusTree class defined in src/include/storate/index/b_plus_tree.h and src/storage/index/b_plus_tree.h. The BPlusTree class has the following fields:
index_name_: the string name of the indexroot_page_id_: the page id of the root nodebuffer_pool_manager_: the buffer pool manager the index will use to manage its pagescomparator_: the comparator for the keys stored in the indexleaf_max_size_ and internal_max_size_Initially there are no nodes in the tree, and root_page_id_ is INVALID_PAGE_ID. Implement the following methods to support simple insertion:
IsEmpty: return true if no root node exists or if there are no keys stored in the root nodeInsert: if the tree is empty, call StartNewTree, otherwise call InsertIntoLeafStartNewTree:
Init method), and insert the key-value pair (call its Insert method)root_page_id_ and call UpdateRootPageId (this BPlusTree method is already written in the starter code—it updates the root page id in the header page, a special page that keeps track of the root pages of all B+ Tree indexes in the database)InsertIntoLeaf:
Lookup method to get the page id of the appropriate child nodeInsert method)GetValue: (used by most of the tests)
InsertIntoLeaf, this will always be the root until you implement splitting)Lookup method to retrieve the value/determine that the key doesn’t existresult (if key exists)Some of these methods take a Transaction *transaction parameter. This will be used for keeping track of which pages you have latches on (Concurrent Index) and which pages should be deleted (Deletion), so you should ignore it for now.
Before you can run the insert tests, you need to implement a way for a client to scan the data stored in the B+ Tree (i.e., this is how the tests check the contents of your tree). This is where the IndexIterator comes in.
You will build a general purpose index iterator to retrieve all the leaf pages efficiently. The basic idea is to organize them into a singly-linked list, and then traverse every key/value pair stored within the B+Tree leaf pages in ascending order. The leaf pages already form a singly-linked list by keeping track of the page id of the next leaf. So your task is to implement the IndexIterator class. You will modify src/include/storage/index/index_interator.h and src/storage/index/index_iterator.cpp to do this.
First, you will need to decide what private fields the IndexIterator will have. You can think of it as a cursor that will keep track of which key-value pair it is on, and be able to advance to the next key-value pair. Since these key-value pairs are stored in the B+ Tree leaf nodes, your iterator will likely need to know (1) which node it is on and (2) where within that node it is.
You will implement the following IndexIterator methods:
isEnd: return true if the iterator points past the end of B+ Tree data (i.e., it has advanced past the last key-value pair)operator==: return true if this iterator is equal to the given iterator (this is like Java’s equals method, except is lets us use == with IndexIterator objects)operator!=: return true if this iterator is not equal to the given iteratoroperator*: return the key-value pair (MappingType) the iterator is currently on (this is overloading the dereference operator, *)operator++: advance the iterator to the next key-value pair and return a reference to itself
return *this;), so your task is to implement advancing to the next pair++iteratorYou will also modify the Begin method in the BPlusTree class to construct and return the appropriate iterator. Note that in order to support for-each loop function for your index, your BPlusTree should implement begin and end (end is used by the tests, begin is not).
When an iterator is in the “end” state, it should be pointing past the end of the data it’s iterating over. That is, end should return the same iterator you would get if you called begin and then did ++iterator until it advanced past the last element. This is useful because it gives us a way to check when an iterator is “done” (i.e., has iterated over all available data). To implement end, think about what your iterator will do when it’s on the last element of the rightmost leaf and ++ is called. end should return an iterator in the same state as that one would be.
The LeafPage alias is not defined in the IndexIterator class like it is in BPlusTree, so use B_PLUS_TREE_LEAF_PAGE_TYPE * as the type for pointers to leaf pages. As in:
B_PLUS_TREE_LEAF_PAGE_TYPE *leaf = reinterpret_cast<B_PLUS_TREE_LEAF_PAGE_TYPE *>(page->GetData());Once you finish this task, your code should pass the two provided tests in test/storage/b_plus_tree_insert_test.cpp.
For this task you will return to the BPlusTree class and split nodes that exceed their maximum size. We will make the simplifying assumption that you can always safely insert a key, and then split the node afterwards if its size exceeds its maximum size.1 All of the project tests will conform to this assumption.
Here is one approach to implementing splitting:
InsertIntoLeaf to check the leaf’s size against the max size after insertion. If splitting is necessary, call Split and then InsertIntoParent.Split: allocate a new page, initialize its metadata, and move half of the entries in node to the new node. If node is a leaf page, then treat the new node as a leaf page. Otherwise the new node is an internal page. Remember to update the next page ids if a leaf is split. Return the new node.
Split can be called with either a leaf page or an internal pageIsLeafPage() to determine the type of node, and reinterpret_cast to generate pointers of the necessary typesInsertIntoParent: a recursive method to insert an entry into the parent node following a split
old_node points to the current root), a new root should be createdInsertIntoParentAt the end of an insert operation (or any operation), make sure to unpin any pages you fetched using the buffer pool.
The provided b_plus_tree_print_test.cpp will likely be a useful debugging tool. See Testing for more information. After this task, the autograder SplitTest and ScaleTest should pass.
Bustub is a multi-threaded database system, so our B+ Tree index needs to be thread safe. You will need to update your original single-threaded B+ Tree index so that it can support concurrent operations. You should use the latch crabbing technique described in class. The thread traversing the index will acquire then release latches on B+ Tree pages. A thread can only release latch on a parent page if its child page considered “safe”. Note that the definition of “safe” can vary based on what kind of operation the thread is executing:
The Page object provides a reader-writer latch for this purpose. RLatch()/RUnlatch() for the read mode, WLatch() and WUnlatch() for the write mode.
You can use the transaction to keep track of the Page objects you have latched.
AddIntoPageSet(Page *page): add a page pointer to the set of pagesGetPageSet(): returns a pointer to a std::deque containing the added page pointersAt the end of any operation, iterate through the page set and release the held latches. You have to release the latch on that page BEFORE you unpin the same page from the buffer pool. If you are implementing concurrent B+ Tree index correctly, then every thread will ALWAYS acquire latch from root to bottom. When you release the latch, make sure you follow the same order (a.k.a from root to bottom) to avoid possible deadlock situation.
You are not allowed to use a global scope latch to protect your data structure. In other words, you may not lock the whole index and only unlock the latch when operations are done. We will check to make sure you are doing the latch crabbing in the right way.
After this task the provided tests in test/storage/b_plus_tree_concurrent_test.cpp should pass.
Besides latch crabbing, there is one other component of the B+ Tree that needs to be made thread safe. When inserting (or deleting), the root_page_id_ may be updated. This creates a situation where two threads could be trying to modify this page id simultaneously. You can use a std::mutex to protect this variable (i.e., allow only one thread to modify it at a time).
This concurrent modification issue extends to creating a new tree as well. Two threads could try and insert into an empty tree at the same time, both determine that it is empty, and both try and start a new tree. So this same mutex can be used to ensure only one thread enters the “potentially create a new tree” region of code at a time.
For all but 5 points of the lab grade, the only kind of deletion you need to support is removing the key and associated value from a leaf. No need to redistribute or coalesce. For this you will implement:
Remove in BPlusTreeLeafPage: delete the given key, if it exists, and shift array elements overRemove in BPlusTree: locate the leaf page that would contain the given key and call its Remove methodOnce these two methods are implemented, the two provided tests in test/storage/b_plus_tree_delete_test.cpp should pass.
If you’re looking to go above and beyond (or earn that last ~3% of the project grade), you can implement deletion with redistribution and coalescing. When a node falls below its minimum size, you should coalesce with a sibling unless the combined node would exceed the maximum size. In that case, redistribute one key-value pair from the sibling to the below-minimum-size node. The methods and comments in the starter code give a suggested breakdown for how to approach this. You’ll need to delete the pages for any nodes that get removed from the tree.
You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. A subset of the tests the autograder will run on your submission can be found in the test/storage directory. The ones beginning with b_plus_tree are relevant to this project.
Using test/index/b_plus_tree_print_test, you can print out the internal data structure of your B+ Tree index, it’s an intuitive way to track and find mistakes.
cd build
make b_plus_tree_print_test
./test/b_plus_tree_print_testThe print test will display help information about commands you can enter to insert keys, delete keys, and print out your tree. You can also use b_plus_tree_print_test to generate a dot file after constructing a tree:
$ ./test/b_plus_tree_print_test
> ... USAGE ...
> 5 5 // set leaf node and internal node max size to be 5
> f input.txt // Insert into the tree with some inserts
> g my-tree.dot // output the tree to dot format
> q // Quit the test (Or use another terminal) You should now have a my-tree.dot file with the DOT file format in the same directory as your test binary, you can then visualize it through either a command line visualizer or an online visualizer:
dot -Tpng -O my-tree.dot, a PNG file will be generated.Consider the following example generated with GraphvizOnline:

P=? tells you the page id of the tree page.Your code will also be evaluated for memory errors. These include problems such as array overflows and failing to deallocate memory. We will use the command
$ valgrind --leak-check=full --suppressions=../build_support/valgrind.supp ./test/b_plus_tree_insert_test --gtest_filter=BPlusTreeTests.InsertTest*to check for these errors (run from the build directory). See this guide for information about interpreting valgrind output.
valgrind and Debug mode
Configuring the project in Debug mode (either with cmake -DCMAKE_BUILD_TYPE=Debug .. or in VS Code) will interfere with valgrind. So when you go to check your code for memory errors, you will need to reconfigure and rebuild the project without Debug mode. In VS Code this should only require changing build variants and rebuilding. Via command line, you may need to remove the build directory and rebuild from scratch.
Instead of using printf statements for debugging, use the LOG_* macros for logging information like this:
LOG_INFO("# Pages: %d", num_pages);
LOG_DEBUG("Fetching page %d", page_id);To enable logging in your project, you will need to reconfigure it like this (or select a build variant with debug info in VS Code):
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=DEBUG ..
$ makeThe different logging levels are defined in src/include/common/logger.h. After enabling logging, the logging level defaults to LOG_LEVEL_INFO. Any logging method with a level that is equal to or higher than LOG_LEVEL_INFO (e.g., LOG_INFO, LOG_WARN, LOG_ERROR) will emit logging information. Note that you will need to add #include "common/logger.h" to any file that you want to use the logging infrastructure.
Your code must follow the Google C++ Style Guide. We use clang to automatically check the quality of your source code. The style part of project grade will be zero if your submission fails any of these checks.
Execute the following commands to check your syntax. The format target will automatically correct the formatting of your code. The check-lint and check-clang-tidy targets will print errors and instruct you how to fix it to conform to our style guide.
$ make format
$ make check-lint
$ make check-clang-tidyYou will submit your work for this project via Gradescope.
Gradescope lets you submit via GitHub, which is probably the easiest method. All you’ll need to do is connect your GitHub account (the Gradescope submission page has a button for this) and select the repository and branch you wish to submit. Alternatively, you can create a zip file of the src directory and upload that.
When you submit, the autograder will compile your code, run the test cases, check your code for style, and use valgrind to check for memory errors. It will use the following files (all other submitted files will be discarded):
Your Project 1 files:
src/include/buffer/lru_replacer.hsrc/buffer/lru_replacer.cppsrc/include/buffer/buffer_pool_manager.hsrc/buffer/buffer_pool_manager.cppas well as
src/include/storage/page/b_plus_tree_internal_page.hsrc/storage/page/b_plus_tree_internal_page.cppsrc/include/storage/page/b_plus_tree_leaf_page.hsrc/storage/page/b_plus_tree_leaf_page.cppsrc/include/storage/index/b_plus_tree.hsrc/storage/index/b_plus_tree.cppsrc/include/storage/index/index_iterator.hsrc/storage/index/index_iterator.cppThese files must be at the given locations in your submission.
Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Many people may submit their projects near the deadline, and thus it will Gradescope take longer to process the requests. You may not get feedback in a timely manner to help you debug problems.
This project will be graded out of 150 points, as shown in the table below. Where multiple tests are listed, the points are split evenly between the tests. No tests will be run for a submission that does not compile. While the autograder will assign 0 points for a test that doesn’t pass, partial credit can be earned for evidence of a good-faith effort. Comments explaining your approach can help earn partial credit.
| Test | Points |
|---|---|
LeafPageTest and InternalPageTest |
30 points |
InsertTest1 and InsertTest2 |
30 points |
SplitTest |
20 points |
ScaleTest (inserting lots of keys2) |
8 points |
ScaleLowMemTest (like ScaleTest, but buffer pool has only 10 pages) |
2 points |
| Concurrency tests (the five provided in the starter code) | 15 points |
DeleteTest1 and DeleteTest2 |
10 points |
CoalesceRedistributeTest |
5 points |
| No style problems | 15 points |
No valgrind errors |
15 points |
You can attempt an implementation without making this assumption if you want an extra challenge. If such an implementation passes all the insertion-related tests, it will earn extra credit. Let me know by sending me an email describing your approach.↩︎
The two other differences from the insert tests are that the keys are not inserted in sorted order, and every key is inserted twice (the second time should have no effect)↩︎