May 4, 2021
The second programming project is to implement an index in your database system. The index is responsible for fast data retrieval without having to search through every row in a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
You will need to implement B+Tree dynamic index structure. It is a balanced tree in which the internal pages direct the search and leaf pages contains actual data entries. Since the tree structure grows dynamically, you will need handle splitting nodes.
You will need to implement the following components of your index:
The Moodle forums are a great place to post questions!
We will go over how to get started on this project in lab on May 7. Before lab, you are expected to
Pull the latest changes from the public repo (the first command will say the remote already exists if you ran it previously, this is fine)
$ git remote add public https://github.com/awb-carleton/bustub.git
$ git pull public master
Read this entire project writeup
Review material on B+ Trees, either from the textbook, notes, or video
In lab, I said the header page would start with an entry of "foo":-1
, but it would actually be empty until the first insert to the tree.
I have laid out a progression of tasks in the Project Specification section. You don’t have to do things in this order, but it is what I would recommend. Here’s a summary of the tasks and how they line up with the tests:
b_plus_tree_leaf_page.cpp
and b_plus_tree_internal_page.cpp
BPlusTreeLeafPage
and BPlusTreeInternalPage
except for those related to deletion/redistribution/coalescingLeafPageTest
and InternalPageTest
on the autograderb_plus_tree.cpp
(and potentially b_plus_tree.h
if you decide to use different helper methods than those in the starter code)isEmpty
, StartNewTree
, Insert
, InsertIntoLeaf
, and GetValue
index_iterator.h
and index_iterator.cpp
b_plus_tree_insert_test.cpp
(these are also included in the autograder)b_plus_tree.cpp
(and potentially b_plus_tree.h
if you decide to use different helper methods than those in the starter code)root_page_id_
and call UpdateRootPageId
b_plus_tree.cpp
(and potentially b_plus_tree.h
if you decide to use different helper methods than those in the starter code)Page
object provides to latch a node)root_page_id_
or start a new tree at a time (use a mutex)b_plus_tree_concurrent_test.cpp
(these are also on the autograder)b_plus_tree_leaf_page.cpp
, b_plus_tree_internal_page.cpp
, b_plus_tree.cpp
(and potentially b_plus_tree.h
if you decide to use different helper methods than those in the starter code)b_plus_tree_delete_test.cpp
(these are also on the autograder)CoalesceRedistributeTest
on the autograder)In general, the tests 1. Insert some things into the tree 2. Use GetValue
to verify that all keys are present in the tree 3. Use an index iterator to verify that all keys are in the correct order in the leaves
ToString
in b_plus_tree.cpp
#define LOGGING_ON
...
INDEX_TEMPLATE_ARGUMENTSbool BPLUSTREE_TYPE::Insert(const KeyType &key, const ValueType &value, Transaction *transaction) {
#ifdef LOGGING_ON
"Inserting %lld", key.ToInt64());
LOG_INFO(#endif
$ ./test/b_plus_tree_insert_test
Running main() from gmock_main.cc
Note: Google Test filter = *InsertTest*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from BPlusTreeTests
[ RUN ] BPlusTreeTests.InsertTest1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:126:StartNewTree] INFO - Starting a new tree with 1, page id 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 2 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 3 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 2 off from 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 2 into page -1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:319:CreateNewRoot] INFO - Creating root page 3 for 1 and 2, inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 2, holding latches on 3 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 4 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 4 off from 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 3 into page 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 4, holding latches on 3 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 5 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 5 off from 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 4 into page 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:256:Split] INFO - Splitting page 6 off from 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:289:InsertIntoParent] INFO - Inserting 3 into page -1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:319:CreateNewRoot] INFO - Creating root page 7 for 3 and 6, inserting 3
[ OK ] BPlusTreeTests.InsertTest1 (5 ms)
[ RUN ] BPlusTreeTests.InsertTest2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 5
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:126:StartNewTree] INFO - Starting a new tree with 5, page id 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 4
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 4 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 3
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 3 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 2
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 2 into a leaf
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:100:Insert] INFO - Inserting 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:190:InsertIntoLeaf] INFO - Finding leaf with 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:199:InsertIntoLeaf] INFO - Found leaf 1, holding latches on 1
2021-05-07 14:53:42 [Users/awb/Documents/teaching/cs334/bustub-cs334/src/storage/index/b_plus_tree.cpp:217:InsertIntoLeaf] INFO - Inserting 1 into a leaf
[ OK ] BPlusTreeTests.InsertTest2 (2 ms)
[----------] 2 tests from BPlusTreeTests (8 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (8 ms total)
[ PASSED ] 2 tests.
Like the Project 1, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the public methods in these classes. If you do this, then it will break the test code that we will use to grade your assignment. If a class already contains certain member variables, you should not remove them. You may add private helper functions/member variables to these classes in order to correctly realize the functionality. Furthermore, you are free to make changes, to ignore, or to use the private methods defined in the starter code. You can think of these as a suggested design rather than a requirement. The one exception to this is the private BPlusTree::ToString
method—several tests depend on this method and you should not modify it.
The correctness of B+ Tree index depends on the correctness of your implementation of buffer pool, we will not provide solutions for the previous programming projects. Make sure your buffer pool passes all the Project 1 tests before starting on this project. Those tests are part of the project 2 autograder, so you can use that to check.
As in previous projects, comments in the source contain additional information beyond what is presented below.
A key feature of B+ Trees in database systems is that each node of the tree (internal and leaf) is its own page. You will implement two Page classes to store the data of your B+ Tree:
BPlusTreeInternalPage
(src/include/storage/page/b_plus_tree_internal_page.h
and src/page/b_plus_tree_internal_page.cpp
)BPlusTreeLeafPage
(src/include/storage/page/b_plus_tree_leaf_page.h
and src/page/b_plus_tree_leaf_page.cpp
)Both of these classes extend the BPlusTreePage
class, which implements a number of simple methods common between the two types of pages (src/include/storage/page/b_plus_tree_page.h
and src/page/b_plus_tree_page.cpp
). This class also defines several fields (metadata) used by B+ Tree pages:
Field Name | Description |
---|---|
page_type_ |
The type of page (internal or leaf) |
lsn_ |
Log sequence number (not used in this project) |
size_ |
The number of entries in the page (number of pointers for internal, number of keys for leaf) |
max_size_ |
Maximum number of entries |
parent_page_id_ |
The id of the parent node’s page |
page_id_ |
The id of this node’s page |
An internal page does not store any real data, but instead stores an sorted array of size_ - 1
key entries and size_
child pointers (i.e., the page ids of child nodes) in a field called array
. This ordered array stores objects of type std::pair<KeyType, ValueType>
, aliased as MappingType
for readability. Since there are one fewer keys than points, the key of the first array entry (array[0].first
) is ignored. This means that the Lookup
method should start with the second key.
The value (page id) stored at index i points to a subtree in which all keys K satisfy: Ki < K ≤ Ki + 1. Put into C++, this means that array[i].second
is the page id of a node where all the keys are greater than array[i].first
and less than or equal to array[i + 1].first
.
One wrinkle is that the keys stored in our B+ Tree can’t necessarily be compared with <
, >
, etc. Thus, methods where keys need to compared take a parameter const KeyComparator &comparator
. You can call this on two keys like a function and it will return 1, 0, or -1 to indicate the relationship between the keys. Here’s an example usage:
if (comparator(key1, key2) < 0) {
// key1 < key2
else if (comparator(key1, key2) == 0) {
} // key1 == key2
else {
} // key1 > key2
}
At this point you should implement the methods needed to support insertion:
Init
: initialize the internal node’s metadataKeyAt
: return the key stored at a given indexSetKeyAt
: set the key stored at a given indexValueIndex
: return the index where a given value is storedValueAt
: return the value stored at a given indexLookup
: return the value which points to the child page that would contain the given keyPopulateNewRoot
: insert initial values into a newly created root nodeInsertNodeAfter
: insert a new key-value pair at the position following an existing valueMoveHalfTo
: move half the key-value pairs from this node to another nodeCopyNFrom
: private helper method for MoveHalfTo
BPlusTreeTests.InternalPageTest
on the autograder tests these methods.
A leaf page stores an sorted array of size_
keys and size_
record ids. Like the internal page, this array is made up of MappingType
(std::pair
) objects. The leaf page class also adds an additional field of metadata, next_page_id_
, that stores the page id of the leaf node’s sibling.
At this point you should implement the methods needed to support insertion:
Init
: initialize the internal node’s metadataGetNextPageId
/SetNextPageId
: get/set the next_page_id_
fieldKeyAt
: return the key stored at a given indexGetItem
: return the key-value pair (MappingType
) stored at a given indexLookup
: search for a given key, and set the output parameter (value
) to the corresponding valueInsert
: insert a new key-value pair at the appropriate sorted positionMoveHalfTo
: move half the key-value pairs from this node to another nodeCopyNFrom
: private helper method for MoveHalfTo
BPlusTreeTests.InternalPageTest
on the autograder tests these methods.
These B+ Tree pages will exist within the Page
objects managed by the buffer pool. In particular, they will be the data part of these pages. So given a page id page_id
of a B+ Tree page and a pointer to a buffer pool buffer_pool_manager_
, you would access the page as follows:
buffer_pool_manager_->FetchPage(page_id);
Page *page = reinterpret_cast<BPlusTreePage *>(page->GetData());
BPlusTreePage *bppage = if (bppage->IsLeafPage()) {
reinterpret_cast<LeafPage *>(bppage);
LeafPage *leaf = // do things with a leaf page
else {
} reinterpret_cast<InternalPage *>(bppage);
InternalPage *internal = // do things with an internal page
}buffer_pool_manager_->UnpinPage(page_id, true); // or false if the page was not modified
LeafPage
and InternalPage
are convenient aliases defined by the BPlusTree
class (lines 39-40 of src/include/storage/index/b_plus_tree.h
), so the above is a example of the kind ofcode you might write in your implementation.
The next task is to implement basic insertion into the B+ Tree that does not take into account node sizes or splitting. Your B+ Tree implementation will go in the BPlusTree
class defined in src/include/storate/index/b_plus_tree.h
and src/storage/index/b_plus_tree.h
. The BPlusTree
class has the following fields:
index_name_
: the string name of the indexroot_page_id_
: the page id of the root nodebuffer_pool_manager_
: the buffer pool manager the index will use to manage its pagescomparator_
: the comparator for the keys stored in the indexleaf_max_size_
and internal_max_size_
Initially there are no nodes in the tree, and root_page_id_
is INVALID_PAGE_ID
. Implement the following methods to support simple insertion:
IsEmpty
: return true
if no root node exists or if there are no keys stored in the root nodeInsert
: if the tree is empty, call StartNewTree
, otherwise call InsertIntoLeaf
StartNewTree
:
Init
method), and insert the key-value pair (call its Insert
method)root_page_id_
and call UpdateRootPageId
(this BPlusTree
method is already written in the starter code—it updates the root page id in the header page, a special page that keeps track of the root pages of all B+ Tree indexes in the database)InsertIntoLeaf
:
Lookup
method to get the page id of the appropriate child nodeInsert
method)GetValue
: (used by most of the tests)
InsertIntoLeaf
, this will always be the root until you implement splitting)Lookup
method to retrieve the value/determine that the key doesn’t existresult
(if key exists)Some of these methods take a Transaction *transaction
parameter. This will be used for keeping track of which pages you have latches on (Concurrent Index) and which pages should be deleted (Deletion), so you should ignore it for now.
Before you can run the insert tests, you need to implement a way for a client to scan the data stored in the B+ Tree (i.e., this is how the tests check the contents of your tree). This is where the IndexIterator
comes in.
You will build a general purpose index iterator to retrieve all the leaf pages efficiently. The basic idea is to organize them into a singly-linked list, and then traverse every key/value pair stored within the B+Tree leaf pages in ascending order. The leaf pages already form a singly-linked list by keeping track of the page id of the next leaf. So your task is to implement the IndexIterator
class. You will modify src/include/storage/index/index_interator.h
and src/storage/index/index_iterator.cpp
to do this.
First, you will need to decide what private fields the IndexIterator
will have. You can think of it as a cursor that will keep track of which key-value pair it is on, and be able to advance to the next key-value pair. Since these key-value pairs are stored in the B+ Tree leaf nodes, your iterator will likely need to know (1) which node it is on and (2) where within that node it is.
You will implement the following IndexIterator
methods:
isEnd
: return true
if the iterator points past the end of B+ Tree data (i.e., it has advanced past the last key-value pair)operator==
: return true
if this iterator is equal to the given iterator (this is like Java’s equals
method, except is lets us use ==
with IndexIterator
objects)operator!=
: return true
if this iterator is not equal to the given iteratoroperator*
: return the key-value pair (MappingType
) the iterator is currently on (this is overloading the dereference operator, *
)operator++
: advance the iterator to the next key-value pair and return a reference to itself
return *this;
), so your task is to implement advancing to the next pair++iterator
You will also modify the Begin
method in the BPlusTree
class to construct and return the appropriate iterator. Note that in order to support for-each loop function for your index, your BPlusTree
should implement begin
and end
(end
is used by the tests, begin
is not).
When an iterator is in the “end” state, it should be pointing past the end of the data it’s iterating over. That is, end
should return the same iterator you would get if you called begin
and then did ++iterator
until it advanced past the last element. This is useful because it gives us a way to check when an iterator is “done” (i.e., has iterated over all available data). To implement end
, think about what your iterator will do when it’s on the last element of the rightmost leaf and ++
is called. end
should return an iterator in the same state as that one would be.
The LeafPage
alias is not defined in the IndexIterator
class like it is in BPlusTree
, so use B_PLUS_TREE_LEAF_PAGE_TYPE *
as the type for pointers to leaf pages. As in:
reinterpret_cast<B_PLUS_TREE_LEAF_PAGE_TYPE *>(page->GetData()); B_PLUS_TREE_LEAF_PAGE_TYPE *leaf =
Once you finish this task, your code should pass the two provided tests in test/storage/b_plus_tree_insert_test.cpp
.
For this task you will return to the BPlusTree
class and split nodes that exceed their maximum size. We will make the simplifying assumption that you can always safely insert a key, and then split the node afterwards if its size exceeds its maximum size.1 All of the project tests will conform to this assumption.
Here is one approach to implementing splitting:
InsertIntoLeaf
to check the leaf’s size against the max size after insertion. If splitting is necessary, call Split
and then InsertIntoParent
.Split
: allocate a new page, initialize its metadata, and move half of the entries in node
to the new node. If node
is a leaf page, then treat the new node as a leaf page. Otherwise the new node is an internal page. Remember to update the next page ids if a leaf is split. Return the new node.
Split
can be called with either a leaf page or an internal pageIsLeafPage()
to determine the type of node
, and reinterpret_cast
to generate pointers of the necessary typesInsertIntoParent
: a recursive method to insert an entry into the parent node following a split
old_node
points to the current root), a new root should be createdInsertIntoParent
At the end of an insert operation (or any operation), make sure to unpin any pages you fetched using the buffer pool.
The provided b_plus_tree_print_test.cpp
will likely be a useful debugging tool. See Testing for more information. After this task, the autograder SplitTest
and ScaleTest
should pass.
Bustub is a multi-threaded database system, so our B+ Tree index needs to be thread safe. You will need to update your original single-threaded B+ Tree index so that it can support concurrent operations. You should use the latch crabbing technique described in class. The thread traversing the index will acquire then release latches on B+ Tree pages. A thread can only release latch on a parent page if its child page considered “safe”. Note that the definition of “safe” can vary based on what kind of operation the thread is executing:
The Page
object provides a reader-writer latch for this purpose. RLatch()
/RUnlatch()
for the read mode, WLatch()
and WUnlatch()
for the write mode.
You can use the transaction
to keep track of the Page
objects you have latched.
AddIntoPageSet(Page *page)
: add a page pointer to the set of pagesGetPageSet()
: returns a pointer to a std::deque
containing the added page pointersAt the end of any operation, iterate through the page set and release the held latches. You have to release the latch on that page BEFORE you unpin the same page from the buffer pool. If you are implementing concurrent B+ Tree index correctly, then every thread will ALWAYS acquire latch from root to bottom. When you release the latch, make sure you follow the same order (a.k.a from root to bottom) to avoid possible deadlock situation.
You are not allowed to use a global scope latch to protect your data structure. In other words, you may not lock the whole index and only unlock the latch when operations are done. We will check to make sure you are doing the latch crabbing in the right way.
After this task the provided tests in test/storage/b_plus_tree_concurrent_test.cpp
should pass.
Besides latch crabbing, there is one other component of the B+ Tree that needs to be made thread safe. When inserting (or deleting), the root_page_id_
may be updated. This creates a situation where two threads could be trying to modify this page id simultaneously. You can use a std::mutex
to protect this variable (i.e., allow only one thread to modify it at a time).
This concurrent modification issue extends to creating a new tree as well. Two threads could try and insert into an empty tree at the same time, both determine that it is empty, and both try and start a new tree. So this same mutex can be used to ensure only one thread enters the “potentially create a new tree” region of code at a time.
For all but 5 points of the lab grade, the only kind of deletion you need to support is removing the key and associated value from a leaf. No need to redistribute or coalesce. For this you will implement:
Remove
in BPlusTreeLeafPage
: delete the given key, if it exists, and shift array elements overRemove
in BPlusTree
: locate the leaf page that would contain the given key and call its Remove
methodOnce these two methods are implemented, the two provided tests in test/storage/b_plus_tree_delete_test.cpp
should pass.
If you’re looking to go above and beyond (or earn that last ~3% of the project grade), you can implement deletion with redistribution and coalescing. When a node falls below its minimum size, you should coalesce with a sibling unless the combined node would exceed the maximum size. In that case, redistribute one key-value pair from the sibling to the below-minimum-size node. The methods and comments in the starter code give a suggested breakdown for how to approach this. You’ll need to delete the pages for any nodes that get removed from the tree.
You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. A subset of the tests the autograder will run on your submission can be found in the test/storage
directory. The ones beginning with b_plus_tree
are relevant to this project.
Using test/index/b_plus_tree_print_test
, you can print out the internal data structure of your B+ Tree index, it’s an intuitive way to track and find mistakes.
cd build
make b_plus_tree_print_test
./test/b_plus_tree_print_test
The print test will display help information about commands you can enter to insert keys, delete keys, and print out your tree. You can also use b_plus_tree_print_test
to generate a dot file after constructing a tree:
$ ./test/b_plus_tree_print_test
> ... USAGE ...
> 5 5 // set leaf node and internal node max size to be 5
> f input.txt // Insert into the tree with some inserts
> g my-tree.dot // output the tree to dot format
> q // Quit the test (Or use another terminal)
You should now have a my-tree.dot
file with the DOT file format in the same directory as your test binary, you can then visualize it through either a command line visualizer or an online visualizer:
dot -Tpng -O my-tree.dot
, a PNG file will be generated.Consider the following example generated with GraphvizOnline:
P=?
tells you the page id of the tree page.Your code will also be evaluated for memory errors. These include problems such as array overflows and failing to deallocate memory. We will use the command
$ valgrind --leak-check=full --suppressions=../build_support/valgrind.supp ./test/b_plus_tree_insert_test --gtest_filter=BPlusTreeTests.InsertTest*
to check for these errors (run from the build
directory). See this guide for information about interpreting valgrind
output.
valgrind
and Debug mode
Configuring the project in Debug mode (either with cmake -DCMAKE_BUILD_TYPE=Debug ..
or in VS Code) will interfere with valgrind
. So when you go to check your code for memory errors, you will need to reconfigure and rebuild the project without Debug mode. In VS Code this should only require changing build variants and rebuilding. Via command line, you may need to remove the build
directory and rebuild from scratch.
Instead of using printf
statements for debugging, use the LOG_*
macros for logging information like this:
"# Pages: %d", num_pages);
LOG_INFO("Fetching page %d", page_id); LOG_DEBUG(
To enable logging in your project, you will need to reconfigure it like this (or select a build variant with debug info in VS Code):
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=DEBUG ..
$ make
The different logging levels are defined in src/include/common/logger.h
. After enabling logging, the logging level defaults to LOG_LEVEL_INFO
. Any logging method with a level that is equal to or higher than LOG_LEVEL_INFO
(e.g., LOG_INFO
, LOG_WARN
, LOG_ERROR
) will emit logging information. Note that you will need to add #include "common/logger.h"
to any file that you want to use the logging infrastructure.
Your code must follow the Google C++ Style Guide. We use clang
to automatically check the quality of your source code. The style part of project grade will be zero if your submission fails any of these checks.
Execute the following commands to check your syntax. The format
target will automatically correct the formatting of your code. The check-lint
and check-clang-tidy
targets will print errors and instruct you how to fix it to conform to our style guide.
$ make format
$ make check-lint
$ make check-clang-tidy
You will submit your work for this project via Gradescope.
Gradescope lets you submit via GitHub, which is probably the easiest method. All you’ll need to do is connect your GitHub account (the Gradescope submission page has a button for this) and select the repository and branch you wish to submit. Alternatively, you can create a zip file of the src
directory and upload that.
When you submit, the autograder will compile your code, run the test cases, check your code for style, and use valgrind to check for memory errors. It will use the following files (all other submitted files will be discarded):
Your Project 1 files:
src/include/buffer/lru_replacer.h
src/buffer/lru_replacer.cpp
src/include/buffer/buffer_pool_manager.h
src/buffer/buffer_pool_manager.cpp
as well as
src/include/storage/page/b_plus_tree_internal_page.h
src/storage/page/b_plus_tree_internal_page.cpp
src/include/storage/page/b_plus_tree_leaf_page.h
src/storage/page/b_plus_tree_leaf_page.cpp
src/include/storage/index/b_plus_tree.h
src/storage/index/b_plus_tree.cpp
src/include/storage/index/index_iterator.h
src/storage/index/index_iterator.cpp
These files must be at the given locations in your submission.
Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Many people may submit their projects near the deadline, and thus it will Gradescope take longer to process the requests. You may not get feedback in a timely manner to help you debug problems.
This project will be graded out of 150 points, as shown in the table below. Where multiple tests are listed, the points are split evenly between the tests. No tests will be run for a submission that does not compile. While the autograder will assign 0 points for a test that doesn’t pass, partial credit can be earned for evidence of a good-faith effort. Comments explaining your approach can help earn partial credit.
Test | Points |
---|---|
LeafPageTest and InternalPageTest |
30 points |
InsertTest1 and InsertTest2 |
30 points |
SplitTest |
20 points |
ScaleTest (inserting lots of keys2) |
8 points |
ScaleLowMemTest (like ScaleTest , but buffer pool has only 10 pages) |
2 points |
Concurrency tests (the five provided in the starter code) | 15 points |
DeleteTest1 and DeleteTest2 |
10 points |
CoalesceRedistributeTest |
5 points |
No style problems | 15 points |
No valgrind errors |
15 points |
You can attempt an implementation without making this assumption if you want an extra challenge. If such an implementation passes all the insertion-related tests, it will earn extra credit. Let me know by sending me an email describing your approach.↩︎
The two other differences from the insert tests are that the keys are not inserted in sorted order, and every key is inserted twice (the second time should have no effect)↩︎