Lab 6: k-d Tree

Aaron Bauer

February 18, 2022

Lab 6: k-d Tree1

Overview

In this lab you will implement a binary search tree for two-dimensional data. In particular, you will implement a k-d tree that uses 2D points as the keys (i.e., k = 2). This 2d-tree will support efficient range search (find all of the points contained in a query rectangle) and nearest-neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects and computer animation to speeding up neural networks and data mining.

The goals for this lab are

Preparatory Exercises

  1. Work through the practice problems from Trees for Multidimensional Data. There is a nice video walkthrough you can watch.
  2. Read this entire writeup
  3. Download the starter project, unzip it into its own folder, and open that folder in VS Code.

Suggested Timeline

Geometric Objects

You will use the Point2D and RectHV classes from algs4 to represent two-dimensional points and rectangles for this lab.

The Point2D class (part of algs4) represents points in the plane. Here is the subset of its API that you may use:

public class Point2D implements Comparable<Point2D> {

    // construct the point (x, y)
    public Point2D(double x, double y)

    // x-coordinate 
    public double x()

    // y-coordinate 
    public double y()

    // square of Euclidean distance between two points 
    public double distanceSquaredTo(Point2D that)

    // does this point equal that object? 
    public boolean equals(Object that)

    // string representation with format (x, y) 
    public String toString()

}

The class RectHV (part of algs4) represents axis-aligned rectangles. Here is the subset of its API that you may use:

public class RectHV {
    // construct the rectangle [xmin, xmax] x [ymin, ymax] 
    public RectHV(double xmin, double ymin, double xmax, double ymax)

    // minimum x-coordinate of rectangle 
    public double xmin()

    // minimum y-coordinate of rectangle 
    public double ymin()

    // maximum x-coordinate of rectangle 
    public double xmax()

    // maximum y-coordinate of rectangle 
    public double ymax()

    // does this rectangle contain the point p (either inside or on boundary)? 
    public boolean contains(Point2D p)

    // does this rectangle intersect that rectangle (at one or more points)? 
    public boolean intersects(RectHV that)

    // square of Euclidean distance from point p to closest point in rectangle 
    public double distanceSquaredTo(Point2D p)

    // does this rectangle equal that object? 
    public boolean equals(Object that)

    // string representation with format [xmin, xmax] x [ymin, ymax] 
    public String toString()

}

Here is a diagram of some of the geometric operations RectHV provides:

Specification

Modify the file KdTree.java that uses a 2d-tree to implement the API below (these methods are also included in the starter file). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates.

public class KdTree<Value> {

    // construct an empty 2d-tree of points 
    public KdTree()

    // is the tree empty? 
    public boolean isEmpty()

    // number of points
    public int size()

    // associate the value val with point p
    public void put(Point2D p, Value val)

    // value associated with point p 
    public Value get(Point2D p)

    // does the tree contain point p? 
    public boolean contains(Point2D p)

    // all points in the tree 
    public Iterable<Point2D> points()

    // all points that are inside the rectangle (or on the boundary) 
    public Iterable<Point2D> range(RectHV rect)

    // a nearest neighbor of point p; null if the tree is empty 
    public Point2D nearest(Point2D p)
}

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest-neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.

Implementation Advice

Node data type. There are several reasonable ways to represent a node in a 2d-tree. One approach is to include the point, a link to the left/bottom subtree, a link to the right/top subtree, and an axis-aligned rectangle corresponding to the node. (Note the code below does not include a constructor.)

private class Node {
   private Point2D p;     // the point
   private Value val;     // the tree maps the point to this value
   private RectHV rect;   // the axis-aligned rectangle corresponding to this node
   private Node lb;       // the left/bottom subtree
   private Node rt;       // the right/top subtree
}

Here are some suggestions for how you might make progress on KdTree.java:

  1. Write isEmpty() and size(). These should be very easy.

  2. Write a simplified version of put() which does everything except set up the RectHV for each node. Write the get() and contains() methods, and use these to test that put() was implemented properly. Note that put() and get() are best implemented by using private helper methods analogous to those found in BST.java. I recommend using the orientation (vertical or horizontal) as an argument to these helper methods (a boolean like isVertical works nicely).

    A common error is to not rely on your base case (or cases). For example, compare the following two get() methods for searching in a BST:

    private Value get(Node root, Key key) {
        if (root == null) {
            return null;
        }
        int cmp = key.compareTo(root.key);
        if (cmp < 0) {
            return get(root.left, key);
        } else if (cmp > 0) {
            return get(root.right, key);
        } else {
            return root.value;
        }
    }
    private Value overlyComplicatedGet(Node root, Key key) {
        if (root == null) {
            return null;
        }
        int cmp = key.compareTo(root.key);
        if (cmp < 0) {
            if (root.left == null) {
                return null;
            } else {
                return overlyComplicatedGet(root.left, key);
            }
        } else if (cmp > 0) {
            if (root.right == null) {
                return null;
            } else {
                return overlyComplicatedGet(root.right, key);
            }
        } else {
            return root.value;
        }
    }

    In the latter method, extraneous null checks are made that would otherwise be caught by the base case. Trust in the base case. Your method may have additional base cases, and code like this becomes harder and harder to read and debug.

  3. Implement the points() method. Use this to check the structure of your k-d tree.

  4. Add code to put() which sets up the RectHV for each Node.

  5. Write the range() method. Test your implementation using RangeSearchVisualizer.java, which is described in the Testing section.

  6. Write the nearest() method. This is the hardest method. I recommend doing it in two stages, testing after each.

    • First, find the nearest neighbor without pruning. That is, always explore both subtrees. Moreover, always explore the left subtree before the right subtree.
    • Next, implementing the pruining rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees).

    Test your implementation using NearestNeighborVisualizer.java, which is described in the Testing section.

Q&A

Is a point on the boundary of a rectangle considered inside it? Do two rectangle intersect if they have just one point in common? Yes and yes, consistent with the implementation of RectHV.java.

Can I use the distanceTo() method in Point2D and RectHV? No, you may use only the subset of the methods listed. You should be able to accomplish the same result (more efficiently) with distanceSquaredTo().

What should I do if a point is inserted twice in the data structure? The data structure represents a map, so you should replace the old value with the new value.

What should points() return if there are no points in the data structure? What should range() return if there are no points in the range? The API says to return an Iterable<Point2D>, so you should return an iterable with zero points.

What should nearest() return if there are two (or more) nearest points? The API does not specify, so you may return any nearest point.

Testing

Testing put() and points() in KdTree. The file KdTreeVisualizer.java reads a sequence of points from a file (given as a command-line argument) and draws the corresponding k-d tree. It does so by reconstructing the k-d tree from the level-order traversal returned by points(). Note that it assumes all points are inside the unit square (coordinates between 0 and 1). Included in the starter project are a number of sample input files and corresponding png files showing the expected visualization. The starter project is configure to prompt you for a command line argument when running any of the testing files. You should see a box pop up at the top of the window prompting you to enter program arguments. Enter the name of one of the .txt files included in the starter project (e.g., input10.txt). For a number of them, the expected output is included as a .png of the same name.

Testing range() and nearest() in KdTree. Often a good way to test code is to perform the same sequence of operations on a new implementation (your KdTree) and an existing, trusted implementation (the provided PointMap), and identify any discrepancies. PointMap is a simple but inefficient map for 2d points that uses Java’s TreeMap to implement all the same methods you are implementing for KdTree. The key idea is to implement a reference solution in which you have confidence. The simplicity of PointMap allows it to serve this purpose in your testing.

Warning: both of these clients will be slow for large inputs because (1) the methods in the PointMap implementation are slow and (2) drawing the points is slow.

To test whether you are pruning correctly in range and nearest, you can write code in your KdTree main method to compare the speed of KdTree and PointMap. Correct pruning will cause KdTree to be significantly faster. In my own tests, KdTree’s range is twice as fast without pruning, and more than 10 times as fast with pruning. nearest is more dramatic, outperforming PointMap by a factor of more than 1000 with the pruning rule described in Specification. The first optimization described in the Challenges section should make nearest a further 20 times faster.

How do should you measure the performance of a method? Here is one reasonable approach.

To get a reliable estimate, choose a t that is neither negligible (e.g., less than 1 second) or astronomical (e.g., more than 1 hour). When measuring the CPU time, Do not include the time to read in the 1 million points or construct the k-d tree.

How do you generate a uniformly random point in the unit square? Make two calls to StdRandom.uniform(0.0, 1.0)—one for the x-coordinate and one for the y-coordinate.

Style

You are expected to submit files that contains no checkstyle errors or warnings. You will lose 0.5 points per error (up to a maximum of 3) and per warning (up to a maximum of 2), as indicated in the Grading breakdown.

Challenges

Consider attempting any or all of these additional challenges if you have time:

Grading

This lab will be graded out of 40 points as shown below. See the Testing section for advice on how to evaluate whether your implementation is correct. Partial credit will be awarded based on evidence of a good-faith effort to implement the related features. Comments explaining your approach can help earn partial credit.

Requirement Points
points method returns correct level-order traversal 10 points
range method returns correct points 5 points
nearest method returns correct point 5 points
Correct pruning in range method 5 points
Correct pruning in nearest method 5 points
Submitted files do not have any checkstyle errors 3 points
Submitted files do not have any checkstyle warnings 2 points
Check-in post 2 points
KdTree.java compiles without warnings 0.5 points
Correct submission format (KdTree.java) 0.5 points
Challenges up to 4 points

  1. Adapted from Kevin Wayne and Josh Hug: https://www.cs.princeton.edu/courses/archive/fall20/cos226/assignments/kdtree/specification.php↩︎