February 18, 2021
KdTree.java
to Lab 6. If you attempt any Challenges upload a text file named challenges.txt
describing what you did.In this lab you will implement a binary search tree for two-dimensional data. In particular, you will implement a k-d tree that uses 2D points as the keys (i.e., k = 2). This 2d-tree will support efficient range search (find all of the points contained in a query rectangle) and nearest-neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects and computer animation to speeding up neural networks and data mining.
The goals for this lab are
contains
, get
, and an initial version of put
by Tuesday, 2/23. Test by by writing your own main
method in KdTree.java
.points
method by Thursday, 2/25. Test using KdTreeVisualizer.java
.range
method by Saturday, 2/27. Test using RangeSearchVisualizer.java
.nearest
method by Monday, 3/1. Test using NearestNeighborVisualizer.java
.You will use the Point2D
and RectHV
classes from algs4 to represent two-dimensional points and rectangles for this lab.
The Point2D class (part of algs4) represents points in the plane. Here is the subset of its API that you may use:
public class Point2D implements Comparable<Point2D> {
// construct the point (x, y)
public Point2D(double x, double y)
// x-coordinate
public double x()
// y-coordinate
public double y()
// square of Euclidean distance between two points
public double distanceSquaredTo(Point2D that)
// does this point equal that object?
public boolean equals(Object that)
// string representation with format (x, y)
public String toString()
}
The class RectHV (part of algs4) represents axis-aligned rectanges. Here is the subset of its API that you may use:
public class RectHV {
// construct the rectangle [xmin, xmax] x [ymin, ymax]
public RectHV(double xmin, double ymin, double xmax, double ymax)
// minimum x-coordinate of rectangle
public double xmin()
// minimum y-coordinate of rectangle
public double ymin()
// maximum x-coordinate of rectangle
public double xmax()
// maximum y-coordinate of rectangle
public double ymax()
// does this rectangle contain the point p (either inside or on boundary)?
public boolean contains(Point2D p)
// does this rectangle intersect that rectangle (at one or more points)?
public boolean intersects(RectHV that)
// square of Euclidean distance from point p to closest point in rectangle
public double distanceSquaredTo(Point2D p)
// does this rectangle equal that object?
public boolean equals(Object that)
// string representation with format [xmin, xmax] x [ymin, ymax]
public String toString()
}
Here is a diagram of some of the geometric operations RectHV
provides:
Modify the file KdTree.java
that uses a 2d-tree to implement the API below (these methods are also included in the starter file). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates.
public class KdTree<Value> {
// construct an empty 2d-tree of points
public KdTree()
// is the tree empty?
public boolean isEmpty()
// number of points
public int size()
// associate the value val with point p
public void put(Point2D p, Value val)
// value associated with point p
public Value get(Point2D p)
// does the tree contain point p?
public boolean contains(Point2D p)
// all points in the tree
public Iterable<Point2D> points()
// all points that are inside the rectangle (or on the boundary)
public Iterable<Point2D> range(RectHV rect)
// a nearest neighbor of point p; null if the tree is empty
public Point2D nearest(Point2D p)
}
points()
method must return the points in level order: first the root, then all children of the root (from left/bottom to right/top), then all grandchildren of the root (from left to right), and so forth. The level-order traversal of the above 2d-tree is (0.7, 0.2), (0.5, 0.4), (0.9, 0.6), (0.2, 0.3), (0.4, 0.7). This method is useful to assist you (when debugging) and us (when grading).The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest-neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a subtree only if it might contain a point contained in the query rectangle.
Nearest-neighbor search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far.
Node data type. There are several reasonable ways to represent a node in a 2d-tree. One approach is to include the point, a link to the left/bottom subtree, a link to the right/top subtree, and an axis-aligned rectangle corresponding to the node. (Note the code below does not include a constructor.)
private class Node {
private Point2D p; // the point
private Value val; // the tree maps the point to this value
private RectHV rect; // the axis-aligned rectangle corresponding to this node
private Node lb; // the left/bottom subtree
private Node rt; // the right/top subtree
}
Here are some suggestions for how you might make progress on KdTree.java
:
Write isEmpty()
and size()
. These should be very easy.
Write a simplified version of put()
which does everything except set up the RectHV
for each node. Write the get()
and contains()
methods, and use these to test that put()
was implemented properly. Note that put()
and get()
are best implemented by using private helper methods analogous to those found in BST.java. I recommend using the orientation (vertical or horizontal) as an argument to these helper methods (a boolean
like isVertical
works nicely).
A common error is to not rely on your base case (or cases). For example, compare the following two get()
methods for searching in a BST:
private Value get(Node root, Key key) {
if (root == null) {
return null;
}
int cmp = key.compareTo(root.key);
if (cmp < 0) {
return get(root.left, key);
} else if (cmp > 0) {
return get(root.right, key);
} else {
return root.value;
}
}
private Value overlyComplicatedGet(Node root, Key key) {
if (root == null) {
return null;
}
int cmp = key.compareTo(root.key);
if (cmp < 0) {
if (root.left == null) {
return null;
} else {
return overlyComplicatedGet(root.left, key);
}
} else if (cmp > 0) {
if (root.right == null) {
return null;
} else {
return overlyComplicatedGet(root.right, key);
}
} else {
return root.value;
}
}
In the latter method, extraneous null
checks are made that would otherwise be caught by the base case. Trust in the base case. Your method may have additional base cases, and code like this becomes harder and harder to read and debug.
Implement the points()
method. Use this to check the structure of your k-d tree.
Add code to put()
which sets up the RectHV
for each Node.
Write the range()
method. Test your implementation using RangeSearchVisualizer.java
, which is described in the Testing section.
Write the nearest()
method. This is the hardest method. I recommend doing it in two stages, testing after each.
Test your implementation using NearestNeighborVisualizer.java
, which is described in the Testing section.
Is a point on the boundary of a rectangle considered inside it? Do two rectangle intersect if they have just one point in common? Yes and yes, consistent with the implementation of RectHV.java.
Can I use the distanceTo()
method in Point2D
and RectHV
? No, you may use only the subset of the methods listed. You should be able to accomplish the same result (more efficiently) with distanceSquaredTo()
.
What should I do if a point is inserted twice in the data structure? The data structure represents a map, so you should replace the old value with the new value.
What should points()
return if there are no points in the data structure? What should range()
return if there are no points in the range? The API says to return an Iterable<Point2D>
, so you should return an iterable with zero points.
What should nearest()
return if there are two (or more) nearest points? The API does not specify, so you may return any nearest point.
Testing put()
and points()
in KdTree
. The file KdTreeVisualizer.java
reads a sequence of points from a file (given as a command-line argument) and draws the corresponding k-d tree. It does so by reconstructing the k-d tree from the level-order traversal returned by points()
. Note that it assumes all points are inside the unit square (coordinates between 0 and 1). Included in the starter project are a number of sample input files and corresponding png files showing the expected visualization. The starter project is configure to prompt you for a command line arugment when running any of the testing files. You should see a box pop up at the top of the window prompting you to enter program arguments. Enter the name of one of the .txt
files included in the starter project (e.g., input10.txt). For a number of them, the expected output is included as a .png
of the same name.
Testing range()
and nearest()
in KdTree. Often a good way to test code is to perform the same sequence of operations on a new implementation (your KdTree
) and an existing, trusted implementedion (the provided PointMap
), and identify any discrepancies. PointMap
is a simple but inefficient map for 2d points that uses Java’s TreeMap
to implement all the same methods you are implementing for KdTree
. The key idea is to implement a reference solution in which you have confidence. The simplicty of PointMap
allows it to serve this purpose in your testing.
RangeSearchVisualizer.java
reads a sequence of points from a file (given as a command-line argument) and draws them to standard drawing. It also highlights the points inside the rectangle that the user selects by dragging the mouse. Specifically, it colors red the points returned by the method range()
in PointMap
and blue the points returned by the method range()
in KdTree
.NearestNeighborVisualizer.java
reads a sequence of points from a file (given as a command-line argument) and draws them to standard drawing. It also highlights the point closest to the mouse. Specifically, it colors red the point returned by the method nearest()
in PointMap
and blue the point returned by the method nearest()
in KdTree
.Warning: both of these clients will be slow for large inputs because (1) the methods in the PointMap
implementation are slow and (2) drawing the points is slow.
To test whether you are pruning correctly in range
and nearest
, you can write code in your KdTree
main
method to compare the speed of KdTree
and PointMap
. Correct pruning will cause KdTree
to be significantly faster. In my own tests, KdTree
’s range
is twice as fast without pruning, and more than 10 times as fast with pruning. nearest
is more dramatic, outperforming PointMap
by a factor of more than 1000 with the pruning rule described in Specification. The first optimization described in the Challenges section should make nearest
a further 20 times faster.
How do should you measure the performance of a method? Here is one reasonable approach.
input1M.txt
and insert those 1 million points into the k-d tree (and/or PointMap
), using integers as the values.nearest
(or whatever method you’re testing) with random points in the unit square that occur in t seconds (i.e., while(timer.elapsedTime() < t)
). You can use StopwatchCPU.To get a reliable estimate, choose a t that is neither negligible (e.g., less than 1 second) or astronomical (e.g., more than 1 hour). When measuring the CPU time, Do not include the time to read in the 1 million points or construct the k-d tree.
How do you generate a uniformly random point in the unit square? Make two calls to StdRandom.uniform(0.0, 1.0)
—one for the x-coordinate and one for the y-coordinate.
You are expected to submit files that contains no checkstyle
errors or warnings. You will lose 0.5 points per error (up to a maximum of 3) and per warning (up to a maximum of 2), as indicated in the Grading breakdown.
Consider attempting any or all of these additional challenges if you have time:
nearest
depends on quickly finding a nearby point. To do this, organize the recursive method so that when there are two possible subtrees to go down, you choose first the subtree that is on the same side of the splitting line as the query point; the closest point found while exploring the first subtree may enable pruning of the second subtree.range
, instead of checking whether the query rectangle intersects the rectangle corresponding to a node, it suffices to check only whether the query rectangle intersects the splitting line segment: if it does, then recursively search both subtrees; otherwise, recursively search the one subtree where points intersecting the query rectangle could be.RectHV
in each 2d-tree node (though it is probably wise in your first version). Save memory by only storing the splitting line segment (i.e., a min and a max coordinate, where the axis of those coordinates is implicitely determined by the level of the node in the tree).KdTree.java
: public Iterable<Point2D> nearest(Point2D p, int k)
. This method returns the k points that are closest to the query point (in any order); return all n points in the tree if n ≤ k. It must do this in an efficient manner, (i.e., using the technique from k-d tree nearest neighbor search, not from brute force). Once you’ve completed this method (and the first of the optimizations above), you’ll be able to run BoidSimulator.java
(which depends upon both Boid.java
and Hawk.java
). Behold their flocking majesty.public static KdTree buildTree(List<Point2D> points)
that takes in a list of points and constructs and returns a new balanced KdTree
. An approach for constructing a balanced tree is described here. Include in your submission evidence that your buildTree
constructs a correct and balanced k-d tree (e.g., write a main
method that uses it, prints out the height, and shows a correct level-order traversal).This lab will be graded out of 40 points as shown below. See the Testing section for advice on how to evaluate whether your implementation is correct. Partial credit will be awarded based on evidence of a good-faith effort to implement the related features. Comments explaining your approach can help earn partial credit.
Requirement | Points |
---|---|
points method returns correct level-order traversal |
10 points |
range method returns correct points |
5 points |
nearest method returns correct point |
5 points |
Correct pruning in range method |
5 points |
Correct pruning in nearest method |
5 points |
Submitted files do not have any checkstyle errors |
3 points |
Sbumitted files do not have any checkstyle warnings |
2 points |
Check-in post | 2 points |
KdTree.java compiles without warnings |
0.5 points |
Correct submission format (KdTree.java ) |
0.5 points |
Challenges | up to 4 points |
Adapted from Kevin Wayne and Josh Hug: https://www.cs.princeton.edu/courses/archive/fall20/cos226/assignments/kdtree/specification.php↩︎