Advanced Search Trees

Table of Contents

1 Reading

After watching the video, read Bailey 14.5 for an overview of one advanced variation of a binary search tree: the splay tree. Optionally read 14.6 for splay-tree implementation details and 14.7 for a description of a type of self-balancing tree: the red-black tree.

2 Video

Here is a video lecture for today's lesson.

The video contains sections on

  1. Learning goals (0:36)
  2. Analyzing BST efficiency (0:55)
  3. AVL trees (5:50)
  4. Balanced tree analysis (17:55)

The Panopto viewer has table of contents entries for these sections: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1c6475ea-9a42-4096-86a0-acd10028270c

3 Learning Goals

After this lesson you will be able to

  1. Analyze the efficiency of tree operations in terms of tree size and structure
  2. Describe, at a high level, why and how self-balancing trees provide better performance

4 Analyzing BST efficiency

Recall that the height of a node is the distance from that node to its farthest leaf descendant, and that the height of a tree is the height of the root node.

  • All BST operations, whether searching for a node, adding a node, or removing a node, involve starting at the root and proceeding down the left or right subtree.
  • In the worst case, this will continue until a leaf node is reached.
  • The largest number of nodes that might be encountered along this path is equal to the height of the tree, \(h\) (since height is defined as the length of the longest path to a leaf).
    • So all our BST operations are \(O(h)\)
  • What will \(h\) be for a BST with \(n\) nodes? Let's look at a few examples:
    • \(n = 7\), \(h = 2\)

      bst-height1.png

    • \(n = 15\), \(h = 3\)

      bst-height2.png

    • In these full trees (i.e., maximum nodes without increasing the height), doubling \(n\) increased height by 1
    • Thus, for these kind of balanced trees (left and right subtrees have similar number of nodes), \(h\) is \(\log n\) for a tree of \(n\) nodes
    • \(n = 7, h = 6\)

      bst-height3.png

    • But the same values can also form a valid tree where \(h\) is \(n - 1\)

Since BST operations are \(O(h)\), it's unfortunate that \(h\) will be \(n - 1\) in the worst case. But what if there were some way to efficiently ensure that our tree was balanced (i.e., \(h = \log n\))?

5 AVL Trees

An AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree.

  • Structural properties
    • Binary search tree property (nodes to the left are smaller, nodes to the right are larger)
    • NEW: balance property
      • The difference in heights between the left subtree and the right subtree must be at most 1
  • Result:
    • Worst-case height is \(O(\log n)\)

A valid AVl tree (blue numbers are the height at each node):

avl1.png

Not a valid AVL tree (red height numbers indicate nodes with children out of balance)

avl2.png

How do we enforce the balance property?

  • As we insert and delete elements, we need to:
    • Track balance
    • Detect imbalance
    • Restore balance

We will need to store height at each node:

avl-node.png

Searching the tree works exactly the same as it does for a BST For adding

  • we first add as normal for a BST
  • then fix any resulting imbalance

avl-insert.png

More specifically,

  • Insert the new node as in a BST (a new leaf)
  • For each node on the path from the root to the new leaf, the insertion may (or may not) have changed the node's height
  • So after recursive insertion in a subtree, detect height imbalance and perform a rotation to restore balance at that node

All the action is in defining the correct rotations to restore balance!

  • Single rotation: The basic operation we'll use to rebalance
    • Move child of unbalanced node into parent position
    • Parent becomes the "other" child (always okay in a BST!)
    • Other subtrees move in only way BST allows.

avl-rotation.png

A rotation is a constant time (\(O(1)\)) operation! Like removing from a BST, there are several different cases to handle when restoring balance to an AVL Tree. We won't go through them in detail, but we'll look at some animations to get a sense of what's going on.

There are also other kinds of balanced binary search trees. Two common ones, splay trees and red-black trees, are discussed in Bailey. AVL Trees provide the fastest searches because they are strictly balanced. Red-black trees and splay trees have more relaxed invariants, and so are faster at adding and removing nodes with fewer rotations. Java's TreeMap class uses a red-black tree.

5.1 Animations

https://visualgo.net/en/bst (select AVL Tree in the bar at the top of the page)

5.2 Analysis

Now that we have guaranteed that a tree with \(n\) nodes will have height \(\log n\), our search tree performance is looking pretty good! Not as efficient as a hash table for some operations, but able to efficiently provide operations related to the sorted order of the keys that a hash table is unsuited for. Here's a chart of the big-O running time for various structures on different Map operations:

Operation Unsorted array Sorted array Hash table Balanced BST (e.g., AVL Tree)
contains(Key key) \(O(n)\) \(O(\log n)\) \(O(1)\) \(O(\log n)\)
get(Key key) \(O(n)\) \(O(\log n)\) \(O(1)\) \(O(\log n)\)
put(Key key, Value val) \(O(1)\) \(O(n)\) \(O(1)\) \(O(\log n)\)
minKey() \(O(n)\) \(O(1)\) \(O(n)\) \(O(\log n)\)
maxKey() \(O(n)\) \(O(1)\) \(O(n)\) \(O(\log n)\)
iterate over the keys in order \(O(n\log n)\) \(O(n)\) \(O(n\log n)\) \(O(n)\)

6 Practice Problems1

  1. Give five orderings of the keys A X C S E R H that, when inserted into an initially empty BST, produce the best-case (i.e., balanced) tree.
  2. Give nonrecursive implementations of get and put for a BST.
  3. Write a method isBST(Node node) that takes a Node as argument and returns true if the argument node is the root of a binary search tree, false otherwise.
  4. How many nodes are in a full binary tree of height \(x\)?
  5. Which of the following trees are valid AVL trees?

    (A)

    avl-practice1.png

    (B)

    avl-practice2.png

    (C)

    avl-practice3.png

    (D)

    avl-practice4.png

    (E)

    avl-practice5.png

7 Learning Block

  • Summary
    • All BST operations involve traversing the tree from the root to a leaf in the worst case
    • The number of nodes visited in this traversal is the height of the tree
    • So BST operations are \(O(h)\) where \(h\) is the height
    • This is bad because \(h\) could be \(n - 1\) in the worst case, where \(n\) is the number of nodes in the tree
    • Enter the AVL tree, a self-balancing BST
      • Enforces a balance property: heights of the children can differ by at most 1
      • Performs BST operations as normal
      • But after each one, fixes any imbalance using rotations
      • This guarantees \(\log n\) height
    • A balanced BST can perform both Map operations and operations related to the sorted order of the keys in logarithmic time
  • Questions?
    • Quiz deadline moved, see Moodle announcement
  • Form questions
  • In pods: using BST.java as a starting point, write a method public Iterable<Key> keys() that returns an iterable structure (like a list) containing the keys stored in the tree in sorted order
    • Hint: you will probably want a recursive helper method
    • Hint: you are building up a list of things in a recursive method, so you could use a similar approach to this aspect as you did for the recursive helper in lab 4
    • Test by replacing the main method with this one. You should see numbers printed out in order if your keys method is correct. (You will need to add import java.util.Random to the top of the file.)

      public static void main(String[] args) {
          BST<Integer, String> bst = new BST<>();
          Random rng = new Random();
          for(int i = 0; i < 20; i++) {
              bst.put(rng.nextInt(100), "hello I am a cool value");
          }
          for (int k : bst.keys()) {
              System.out.print(k + " ");
          }
      }
      
  • Heads up: lab 6 may be posted Saturday instead of Friday, depending on how soon I can get it ready

Footnotes:

1

Solutions:

  1. Any sequence that inserts H first; C before A and E; S before R and X.
  2. // Insert key-value pair into symbol table (nonrecursive version).
    public void put(Key key, Value val) {
        Node z = new Node(key, val);
        if (root == null) {
            root = z;
            return;
        }
    
        Node parent = null;
        Node x = root;
        while (x != null) {
            parent = x;
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else {
                x.value = val;
                return; 
            }
        }
        int cmp = key.compareTo(parent.key);
        if (cmp < 0) parent.left  = z;
        else         parent.right = z;
    }
    
    // Search BST for given key, nonrecursive version.
    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.value;
        }
        return null;
    }   
    
  3. public boolean isBST(Node node) {
        if (node == null) return true; // an empty tree is a valid BST
        // each child must be empty or have an appropriate key and itself be a valid BST
        boolean leftCheck = node.left == null || (node.left.key < node.key && isBST(node.left));
        boolean rightCheck = node.right == null || (node.right.key > node.key && isBST(node.right));
        return leftCheck && rightCheck;
    }
    
  4. A full tree of height \(x\) will have \(2^{x+1} - 1\) nodes
  5. (B) and (E) are AVL trees, as they follow the balance property (all five follow the BST ordering property)