Iterators

Table of Contents

Adpated from Lenhart and Jannen: https://williams-cs.github.io/cs136-f20-www/slides/Iterators.pdf

1 Video

Here is a video lecture for the material outlined below. If you prefer (or want supplemental) textbook readings, Bailey Chapter 8 covers this topic.

The video contains sections on

  1. Motivation (0:01)
  2. Analyzing a quadratic time operation (4:38)
  3. Iterator interface (8:50)
  4. Fibonacci Iterator (10:21)
  5. Iterating over structures (16:02)
  6. Inner classes (19:08)
  7. SinglyLinkedListIterator (22:06)
  8. SkipIterator (26:13)
  9. ReverseIterator (28:25)
  10. Iterators and for-each loops (29:46)

The Panopto viewer has table of contents entries for these sections: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=7bb540f7-0278-4c5b-b811-acb60122bfd2

2 Motivation

  • The problem: Efficient and uniform dispensing of values from data structures
  • The solution: The Iterator interface
    • Iterators as dispensers
    • Iterators as generators
    • Iterators as filters
  • Iterators that iterate over other Iterators?!
    • Yep, it's a thing
  • Iterators and for loops: The Iterable interface
    • Allows use of iterators with for-each

Here's a method (count) that counts the number of times a particular Object appears in a List. List is an interface provided by Java that specifies methods for a list-like structure. Both ArrayList and LinkedList implement this interface.

public static int count(List<Object> data, Object elem) { 
    int count = 0;
    for (int i = 0; i < data.size(); i++) { 
        Object obj = data.get(i);
        if (obj.equals(elem))  // <-- Curly brackets can be omitted from an if, for, or while
            count++;           // <-- when the body consists of a single line
    }
    return count;
}

(All Objects in Java have a default equals method (or define their own).) Does this count on all structures (that we have studied so far)?

2.1 Problems with count

  • get(i) not defined on stacks and queues (so count will not work at all)
  • get(i) is slow on some structures
    • Takes linear time on a singly- or doubly-linked list, since we have to loop through the nodes starting at the head (or tail)
    • What does this mean for the performance of count on linked lists?
      • Let's count up the steps:
      • get(0) visits 1 node
      • get(1) visits 2 nodes
      • get(2) visits 3 nodes
      • get(data.size() - 1) visits \(n\) nodes
      • \(1 + 2 + 3 + \cdots + n = (n + 1)\frac{n}{2} = \frac{n^2 + n}{2}\)
      • Since it takes a number of steps proportional to \(n^2\), we say it take quadratic or polynomial time
      • This is significantly worse than linear time as \(n\) gets really big
      • Intuitively, we have a loop over each element (a linear time operation) that contains within it a linear time operation (get)
        • linear \(\times\) linear = quadratic
  • How do we traverse data in structures in a general, efficient way?
    • Goal: data structure-specific for efficiency
    • Goal: use same interface to make general

3 Iterator Interface

Every data structure we've studies has provided some form of these common operations

  • size()
  • isEmpty()
  • add()
  • remove()

They also all provide a method for efficient data traversal: public Iterator<E> iterator(). Iterator<E> is an interface for classes that provide support for efficiently visiting all elements of a data structure. The common methods it defines can serve to dispense values for

  • Traversal of elements: Iteration
  • Production of values: Generation
  • Selection of values: Filtering

By providing a simple, minimal interface, the Iterator interface abstracts away details of how to access elements. This must be part of the customized implementation for each data structure.

public interface Iterator<E> {
    boolean hasNext(); // are there more elements in iteration?
    E next();          // return next element
}

4 Fibonacci Iterator

Here's a simple example of an iterator that generates values belonging to the sequence of numbers known as the Fibonacci Sequence. This sequence begins with two 1s, and then each subsequent number is the sum of the previous two numbers in the sequence. Thus, we have 1, 1, 2, 3, 5, …

Here's an iterator that effcienctly generates this sequence

import java.util.Iterator;
public class FibonacciNumbers implements Iterator<Integer> { 
    private int next;
    private int current;
    private int length;

    public FibonacciNumbers(int n) {
        length = n;
        next = 1;
        current = 1;
    } 

    public boolean hasNext() { 
        return length >= 0;
    }

    public Integer next() {
        length--;
        int temp = current; 
        current = next;
        next = temp + current; 
        return temp;
    }
}

Why Is This Cool? (it is)

  • We could calculate the \(i\)th Fibonacci number each time, but that would be slow
  • Observation: to find the \(n\)th Fibonacci number, we need to calculate the previous \(n-1\) Fib numbers…
  • But by storing some state, we can easily generate the next Fibonacci number in constant time
  • Knowledge about the structure of the problem helps us traverse the Fibonacci space efficiently one element at a time
  • We can do the same for data structures

5 Iterating Over Structures

Goal: Have a data structure produce an iterator that return the values of the structure in some order. How can we accomplish this?

  • Define an iterator class for the structure, for example
    class ArrayListIterator<E> implements Iterator<E> { ... }
    class SinglyLinkedListIterator<E> implements Iterator<E> { ... }
    
  • Provide a method in the data structure that returns an iterator
    public Iterator<E> iterator() { ... }
    

The details of hasNext() and next() often depend on the specific data structure. For example, our SinglyLinkedListIterator will hold a reference to the next node whose value it will return. In contrast our ArrayListIterator will hold the index of next element to return.

Given these two pieces, we rewrite count to use an iterator instead of repeated calls to get:

public int count (List<Object> data, Object elem) { 
    int count = 0;
    Iterator<Object> iter = data.iterator(); 
    while (iter.hasNext()) {
        if(elem.equals(iter.next())) 
            count++; 
    }
    return count;
}

Why do we need an iterator method in the data structure that returns an iterator? Why not just pass the data structure to the constructor for the iterator like this?

public SinglyLinkedListIterator<E>(SinglyLinkedList<E> list) {
      // code to construct the iterator
}

In this case we would only have access to list's public methods. We would be prevented from using the private instance variables of the data structure to efficiently implement our iterator. From within the data structure, however, we can access the instance variables of the structure and use them in the implementation of the iterator.

6 Inner Classes

One way we can give iterators the privileged access to a data structure's internal data we need is to implement them as an inner class of the data structure. In Java, we can nest one class definition within another. We have used this already in our linked list implementations—our ListNode has been an inner class of the SinglyLinkedList class:

public class SinglyLinkedList<E> {
    private class ListNode {
        E data;
        ListNode next;
    }
    ...
}

We could make ListNode an entirely separate class implemented in its own .java file, but there are some good reason to make it an inner class:

  • It simplifies the SinglyLinkedList implementation somewhat if it can directly manipulate the fields of the nodes. If ListNode was a separate class with private fields, we'd need to define getters and setters for SinglyLinkedList to use.
  • Logically, a ListNode isn't an object that should exist on its own. A ListNode should only exist as part of a SinglyLinkedList. By making it a private inner class, we ensure it's something that can only be used internally by SinglyLinkedList.
  • In general, anytime it would be useful to have a new type of object as part of a data structure's implementation, we'll create an inner class for that purpose.

This same idea applies to iterators! If we define our SinglyLinkedListIterator as an inner class within the SinglyLinkedList class, we can use the head field and directly access the next and data fields of the ListNode inner class:

import java.util.Iterator;
public class SinglyLinkedList<E> {
    private class ListNode {
        E data;
        ListNode next;
    }

    private ListNode head;
    private int count;

    ...

    private class SinglyLinkedListIterator<E> implements Iterator<E> {
        private ListNode current;

        public SinglyLinkedListIterator() {
            current = head;
        }

        public boolean hasNext() {
            return current != null;
        }

        public E next() {
            E value = current.data;
            current = current.next;
            return value;
        }
    }

    public Iterator<E> iterator() {
        return new SingleLinkedListIterator();
    }
}

Because this iterator keeps track of the current node it's on, it can return the value at the next node in constant time. This is a big improvement over repeated calls to the linear time get method (where we have to re-find the current node each time).

7 More Examples

We can also make iterators that filter the output of other iterators.

  • SkipIterator: Skips over a given value
  • ReverseIterator: Dispenses elements in the reverse order given by another iterator

7.1 SkipIterator

  • Problem: How can we filter out unwanted elements from an iterator iter?
  • Solution: Create another iterator that takes iter as a parameter to its constructor and uses that the methods of iter (with some extra steps)
    • The SkipIterator will ensure that the next element that iter would dispense is not the one we want to skip over!
    • Since SkipIterator only needs access to the public methods of another iterator (and not any class' internal data), we can implement it as its own separate class
/**
 * An iterator that returns the values from another iterator, skipping over a specific value
 * Note: this implementation doesn't handle the edge case where a 
 * skipped value is the last value in the original iterator
 */
import java.util.Iterator;
public class SkipIterator<E> implements Iterator<E> {
    private Iterator<E> elems;
    private E skipValue;

    public SkipIterator(Iterator<E> iter, E skipMe) {
        elems = iter;
        skipValue = skipMe;
    }

    public boolean hasNext() {
        return elems.hasNext()
    }

    public E next() {
        E returnVal = elems.next();
        while (returnVal.equals(skipValue) && elems.hasNext()) {
            returnVal = elems.next();
        }
        return returnVal;
    }
}

7.2 ReverseIterator

  • Problem: How can dispense the elements from an iterator iter in the opposite order from which iter would dispense them?
  • Solution: Create another iterator that
    • Creates a LinkedList called secretList
    • Fills it with the elements dispensed by iter
    • But stores them in reverse order
    • Then asks secretList for an iterator to itself to use for dispensing values
/**
 * An iterator that reverses the order of elements returned from another iterator.
 */
import java.util.LinkedList;
import java.util.Iterator;
public class ReverseIterator<E> implements Iterator<E> { 
    private Iterator<E> elems;
    public ReverseIterator(Iterator<E> iter) { 
        LinkedList<E> list = new LinkedList<>(); 
        while (iter.hasNext()) {
            // adding each element to front serves to reverse them---like a stack!
            list.addFirst(iter.next()); 
        }
        elems = list.iterator(); 
    }

    // All other methods dispatch to the underlying iterator. 
    public boolean hasNext() { 
        return elems.hasNext(); 
    } 
    public E next() { 
        return elems.next(); 
    }
}

8 Iterators and for-each

Recall that with arrays, we saw we can use a simplified form of the for loop (called a for-each loop):

String[] arr = new String[] {"All", "the", "world's", "a", "stage"};
for( String s : arr) {
    System.out.println(s);
}

We could also rewrite out count method from before to use a for-each loop:

public int count (List<Object> data, Object elem) { 
    int count = 0;
    for (Object obj : data) {
        if(elem.equals(obj)) 
            count++; 
    }
    return count;
}

Why does this work?? Because the List interface provides an iterator() method and we can use the for-each construct…

for( E elem : boxOfStuff ) { ... }

…as long as boxOfStuff implements the Iterable interface:

public interface Iterable<E> {
      public Iterator<E> iterator();
}

All of the Java structures we've studied so far (ArrayList, LinkedList, Stack, Queue, and Deque) implement Iterable, and so can all be used with a for-each loop.

9 Practice Problems1

  1. Suppose iter is an Iterator over some data structure. Write a loop using iter to print all the values of the data structure.
  2. Suppose that a is an ArrayList of Integer values (ArrayList<Integer>). Write a loop that will use an Iterator to print those Integer values that are even. Try to see if you can write two ways: using a while loop and using a for-each loop.
  3. This code attemps to use an iterator to find and return the longest string in a linked list, but is has a bug. What is the problem and how would you fix it?
    // returns the longest string in the list (does not work!)
    public static String longest(LinkedList<String> list) {
        Iterator<String> itr = list.iterator();
        String longest = itr.next(); // initialize to first element
        while (itr.hasNext()) {
            if (itr.next().length() > longest.length()) {
                longest = itr.next();
            }
        }
        return longest;
    }
    
  4. What is one benefit of making the list iterator into an inner class?
  5. Write a CharacterIterator class that iterates over the characters in a String. It should implement the Iterator<Character> interface, and next should return type Character. The constructor should take the String that the instance will iterate over. It may be useful to refer to the Java String documentation.

10 Learning Block

  • Lab 2
    • Lots of great questions on the check-in forum, make sure to check it out!
    • Adding dummy nodes does not change the efficiency of the list, but serves to greatly simplify some of the code
      • By eliminating edge cases for many of the add and remove methods (see preparatory exercises for examples of these edge cases)
      • Simpler code is easier to write correctly, easier to debug, easier to maintain/modify
    • In what sense are the dummy nodes transparent?
      • internally the list accounts for the dummy nodes, interacts with them, etc.
        • the head and tail fields always refer to the dummy nodes, and they are always the first and last nodes in the list
      • the effects of and values returned by public methods (i.e., the externally observable behavior of the list) are as if the dummy nodes don't exist
    • Method overloading
      • Joe asked a great question on the forum: why are there three removes?
        • private E remove(ListNode node)
        • public E remove(int index)
        • public E remove(E value)
        • We want all three because each does a different thing
        • But how does Java tell them apart?
          • The parameter types are part of a method's identity—two methods with the same name, but different parameter types are treated as entirely separate and don't conflict
          • When you call remove, you pass it a value. The type of this value tells Java which remove to use (the one that takes a value of that type)
      • This is called method overloading.
  • Remember to fill out the partner survey for lab 3!
  • Questions?
  • Form question
  • In pods: implement WordIterator
    • constructor takes a string
    • next returns the next word in the string
    • words are separated by spaces
    • s.split(" ") returns a String[] of parts of s separated by spaces
  • In pods: share lab progress

Footnotes:

1

Solutions:

  1. while (iter.hasNext()) {
        System.out.println(iter.next());
    }
    
  2. Iterator<Integer> iter = a.iterator();
    while (iter.hasNext()) {
        int val = iter.next();
        if (val % 2 == 0) { // check is the value is even
            System.out.println(val);
        }
    }
    

    or

    for (int val : a) {
        if (val % 2 == 0) { // check is the value is even
            System.out.println(val);
        }
    }
    
  3. The problem with the code is that its loop calls the next method on the iterator in two places: once when it tests the length of the string, and again when it tries to store the string as the longest. Each time you call next, the iterator advances by one position, so if it’s called twice in the loop, you’ll skip an element when you find a match. For example, if the list contains the elements ("oh", "hello", "how", "are", "you"), the program might see the "hello" and intend to store it, but the second call to next would actually cause it to store the element that follows "hello", namely, "how".

    The solution is to save the result of the itr.next() call into a variable. The following code would replace the while loop in the previous code:

    // this version of the code is correct
    while (itr.hasNext()) {
        String current = itr.next();
        if (current.length() > longest.length()) {
            longest = current;
        }
    }
    
  4. When the iterator is an inner class, it can directly access the fields of the enclosing list object.
  5. public class CharacterIterator implements Iterator<Character> {
        private String str; // String we are iterating over
        private int index;  // index of the next char to return
    
        public CharacterIterator(String s) {
            str = s;
            index = 0;
        }
    
        public boolean hasNext() {
            return index < str.length();
        }
    
        public Character next() {
            Character c = str.charAt(index);
            index++;
            return c;
        }
    }