February 18, 2022
KdTree.java
to Lab
6. If you attempt any Challenges upload a
text file named challenges.txt
describing what you
did.In this lab you will implement a binary search tree for two-dimensional data. In particular, you will implement a k-d tree that uses 2D points as the keys (i.e., k = 2). This 2d-tree will support efficient range search (find all of the points contained in a query rectangle) and nearest-neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects and computer animation to speeding up neural networks and data mining.
The goals for this lab are
contains
, get
, and an initial
version of put
by Wednesday, 2/23. Test by by writing your
own main
method in KdTree.java
.points
method by Friday, 2/25. Test using
KdTreeVisualizer.java
.range
method by Monday, 2/28. Test using
RangeSearchVisualizer.java
.nearest
method by Wednesday, 3/2. Test
using NearestNeighborVisualizer.java
.You will use the Point2D
and RectHV
classes
from algs4
to represent two-dimensional points and
rectangles for this lab.
The Point2D class (part of algs4) represents points in the plane. Here is the subset of its API that you may use:
public class Point2D implements Comparable<Point2D> {
// construct the point (x, y)
public Point2D(double x, double y)
// x-coordinate
public double x()
// y-coordinate
public double y()
// square of Euclidean distance between two points
public double distanceSquaredTo(Point2D that)
// does this point equal that object?
public boolean equals(Object that)
// string representation with format (x, y)
public String toString()
}
The class RectHV (part of algs4) represents axis-aligned rectangles. Here is the subset of its API that you may use:
public class RectHV {
// construct the rectangle [xmin, xmax] x [ymin, ymax]
public RectHV(double xmin, double ymin, double xmax, double ymax)
// minimum x-coordinate of rectangle
public double xmin()
// minimum y-coordinate of rectangle
public double ymin()
// maximum x-coordinate of rectangle
public double xmax()
// maximum y-coordinate of rectangle
public double ymax()
// does this rectangle contain the point p (either inside or on boundary)?
public boolean contains(Point2D p)
// does this rectangle intersect that rectangle (at one or more points)?
public boolean intersects(RectHV that)
// square of Euclidean distance from point p to closest point in rectangle
public double distanceSquaredTo(Point2D p)
// does this rectangle equal that object?
public boolean equals(Object that)
// string representation with format [xmin, xmax] x [ymin, ymax]
public String toString()
}
Here is a diagram of some of the geometric operations
RectHV
provides:
Modify the file KdTree.java
that uses a 2d-tree to
implement the API below (these methods are also included in the starter
file). A 2d-tree is a generalization of a BST to two-dimensional keys.
The idea is to build a BST with points in the nodes, using the x- and
y-coordinates of the points as keys in strictly alternating sequence,
starting with the x-coordinates.
public class KdTree<Value> {
// construct an empty 2d-tree of points
public KdTree()
// is the tree empty?
public boolean isEmpty()
// number of points
public int size()
// associate the value val with point p
public void put(Point2D p, Value val)
// value associated with point p
public Value get(Point2D p)
// does the tree contain point p?
public boolean contains(Point2D p)
// all points in the tree
public Iterable<Point2D> points()
// all points that are inside the rectangle (or on the boundary)
public Iterable<Point2D> range(RectHV rect)
// a nearest neighbor of point p; null if the tree is empty
public Point2D nearest(Point2D p)
}
points()
method
must return the points in level order: first the root, then all children
of the root (from left/bottom to right/top), then all grandchildren of
the root (from left to right), and so forth. The level-order traversal
of the above 2d-tree is (0.7, 0.2), (0.5, 0.4), (0.9, 0.6), (0.2, 0.3),
(0.4, 0.7). This method is useful to assist you (when debugging) and us
(when grading).The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest-neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a subtree only if it might contain a point contained in the query rectangle.
Nearest-neighbor search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far.
Node data type. There are several reasonable ways to represent a node in a 2d-tree. One approach is to include the point, a link to the left/bottom subtree, a link to the right/top subtree, and an axis-aligned rectangle corresponding to the node. (Note the code below does not include a constructor.)
private class Node {
private Point2D p; // the point
private Value val; // the tree maps the point to this value
private RectHV rect; // the axis-aligned rectangle corresponding to this node
private Node lb; // the left/bottom subtree
private Node rt; // the right/top subtree
}
Here are some suggestions for how you might make progress on
KdTree.java
:
Write isEmpty()
and size()
. These
should be very easy.
Write a simplified version of put()
which does
everything except set up the RectHV
for each node. Write
the get()
and contains()
methods, and use
these to test that put()
was implemented properly. Note
that put()
and get()
are best implemented by
using private helper methods analogous to those found in BST.java.
I recommend using the orientation (vertical or horizontal) as an
argument to these helper methods (a boolean
like
isVertical
works nicely).
A common error is to not rely on your base case (or cases). For
example, compare the following two get()
methods for
searching in a BST:
private Value get(Node root, Key key) {
if (root == null) {
return null;
}
int cmp = key.compareTo(root.key);
if (cmp < 0) {
return get(root.left, key);
} else if (cmp > 0) {
return get(root.right, key);
} else {
return root.value;
}
}
private Value overlyComplicatedGet(Node root, Key key) {
if (root == null) {
return null;
}
int cmp = key.compareTo(root.key);
if (cmp < 0) {
if (root.left == null) {
return null;
} else {
return overlyComplicatedGet(root.left, key);
}
} else if (cmp > 0) {
if (root.right == null) {
return null;
} else {
return overlyComplicatedGet(root.right, key);
}
} else {
return root.value;
}
}
In the latter method, extraneous null
checks are made
that would otherwise be caught by the base case. Trust in the base case.
Your method may have additional base cases, and code like this becomes
harder and harder to read and debug.
Implement the points()
method. Use this to check the
structure of your k-d tree.
Add code to put()
which sets up the
RectHV
for each Node.
Write the range()
method. Test your implementation
using RangeSearchVisualizer.java
, which is described in the
Testing section.
Write the nearest()
method. This is the hardest
method. I recommend doing it in two stages, testing after each.
Test your implementation using
NearestNeighborVisualizer.java
, which is described in the
Testing section.
Is a point on the boundary of a rectangle considered inside it? Do two rectangle intersect if they have just one point in common? Yes and yes, consistent with the implementation of RectHV.java.
Can I use the distanceTo()
method in
Point2D
and RectHV
? No, you may use
only the subset of the methods listed. You should be able to accomplish
the same result (more efficiently) with
distanceSquaredTo()
.
What should I do if a point is inserted twice in the data structure? The data structure represents a map, so you should replace the old value with the new value.
What should points()
return if there are no
points in the data structure? What should range()
return if
there are no points in the range? The API says to return an
Iterable<Point2D>
, so you should return an iterable
with zero points.
What should nearest()
return if there are two
(or more) nearest points? The API does not specify, so you may
return any nearest point.
Testing put()
and points()
in
KdTree
. The file
KdTreeVisualizer.java
reads a sequence of points from a
file (given as a command-line argument) and draws the corresponding k-d
tree. It does so by reconstructing the k-d tree from the level-order
traversal returned by points()
. Note that it assumes all
points are inside the unit square (coordinates between 0 and 1).
Included in the starter project are a number of sample input files and
corresponding png files showing the expected visualization. The starter
project is configure to prompt you for a command line argument when
running any of the testing files. You should see a box pop up at the top
of the window prompting you to enter program arguments. Enter the name
of one of the .txt
files included in the starter project
(e.g., input10.txt). For a number of them, the expected output is
included as a .png
of the same name.
Testing range()
and nearest()
in
KdTree. Often a good way to test code is to perform the same
sequence of operations on a new implementation (your
KdTree
) and an existing, trusted implementation (the
provided PointMap
), and identify any discrepancies.
PointMap
is a simple but inefficient map for 2d points that
uses Java’s TreeMap
to implement all the same methods you
are implementing for KdTree
. The key idea is to implement a
reference solution in which you have confidence. The simplicity of
PointMap
allows it to serve this purpose in your
testing.
RangeSearchVisualizer.java
reads a sequence of
points from a file (given as a command-line argument) and draws them to
standard drawing. It also highlights the points inside the rectangle
that the user selects by dragging the mouse. Specifically, it colors red
the points returned by the method range()
in
PointMap
and blue the points returned by the method
range()
in KdTree
.NearestNeighborVisualizer.java
reads a
sequence of points from a file (given as a command-line argument) and
draws them to standard drawing. It also highlights the point closest to
the mouse. Specifically, it colors red the point returned by the method
nearest()
in PointMap
and blue the point
returned by the method nearest()
in
KdTree
.Warning: both of these clients will be slow for large inputs because
(1) the methods in the PointMap
implementation are slow and
(2) drawing the points is slow.
To test whether you are pruning correctly in range
and
nearest
, you can write code in your KdTree
main
method to compare the speed of KdTree
and
PointMap
. Correct pruning will cause KdTree
to
be significantly faster. In my own tests, KdTree
’s
range
is twice as fast without pruning, and more than 10
times as fast with pruning. nearest
is more dramatic,
outperforming PointMap
by a factor of more than 1000 with
the pruning rule described in Specification. The first optimization
described in the Challenges section should
make nearest
a further 20 times faster.
How do should you measure the performance of a method? Here is one reasonable approach.
input1M.txt
and insert those 1 million
points into the k-d tree (and/or PointMap
), using integers
as the values.nearest
(or
whatever method you’re testing) with random points in the unit square
that occur in t seconds (i.e.,
while(timer.elapsedTime() < t)
). You can use StopwatchCPU.To get a reliable estimate, choose a t that is neither negligible (e.g., less than 1 second) or astronomical (e.g., more than 1 hour). When measuring the CPU time, Do not include the time to read in the 1 million points or construct the k-d tree.
How do you generate a uniformly random point in the unit square? Make
two calls to StdRandom.uniform(0.0, 1.0)
—one for the
x-coordinate and one for the y-coordinate.
You are expected to submit files that contains no
checkstyle
errors or warnings. You will lose 0.5
points per error (up to a maximum of 3) and per warning (up to a maximum
of 2), as indicated in the Grading breakdown.
Consider attempting any or all of these additional challenges if you have time:
nearest
depends on quickly finding a nearby point. To do
this, organize the recursive method so that when there are two possible
subtrees to go down, you choose first the subtree that is on the same
side of the splitting line as the query point; the closest point found
while exploring the first subtree may enable pruning of the second
subtree.range
, instead of checking whether the
query rectangle intersects the rectangle corresponding to a node, it
suffices to check only whether the query rectangle intersects the
splitting line segment: if it does, then recursively search both
subtrees; otherwise, recursively search the one subtree where points
intersecting the query rectangle could be.RectHV
in each 2d-tree node (though it is probably wise in
your first version). Save memory by only storing the splitting line
segment (i.e., a min and a max coordinate, where the axis of those
coordinates is implicitly determined by the level of the node in the
tree).KdTree.java
:
public Iterable<Point2D> nearest(Point2D p, int k)
.
This method returns the k points that are closest to the query
point (in any order); return all n points in the tree if n ≤ k. It must do this in
an efficient manner, (i.e., using the technique from k-d tree nearest
neighbor search, not from brute force). Once you’ve completed this
method (and the first of the optimizations above), you’ll be able to run
BoidSimulator.java
(which depends upon both
Boid.java
and Hawk.java
). Behold their
flocking majesty.public static KdTree buildTree(List<Point2D> points)
that takes in a list of points and constructs and returns a new
balanced KdTree
. An approach for constructing a
balanced tree is described here.
Include in your submission evidence that your buildTree
constructs a correct and balanced k-d tree (e.g., write a
main
method that uses it, prints out the height, and shows
a correct level-order traversal).This lab will be graded out of 40 points as shown below. See the Testing section for advice on how to evaluate whether your implementation is correct. Partial credit will be awarded based on evidence of a good-faith effort to implement the related features. Comments explaining your approach can help earn partial credit.
Requirement | Points |
---|---|
points method returns correct level-order
traversal |
10 points |
range method returns correct points |
5 points |
nearest method returns correct point |
5 points |
Correct pruning in range method |
5 points |
Correct pruning in nearest method |
5 points |
Submitted files do not have any checkstyle errors |
3 points |
Submitted files do not have any checkstyle
warnings |
2 points |
Check-in post | 2 points |
KdTree.java compiles without warnings |
0.5 points |
Correct submission format (KdTree.java ) |
0.5 points |
Challenges | up to 4 points |
Adapted from Kevin Wayne and Josh Hug: https://www.cs.princeton.edu/courses/archive/fall20/cos226/assignments/kdtree/specification.php↩︎