Linked Lists II

1. Reading
2. List Interface
3. Singly-Linked List
- 3.1. Operations at the head
- 3.2. Operations at the tail
4. Upgrade: Tail Reference
5. Upgrade: Doubly-Linked List
6. ArrayList vs Singly-Linked List vs Doubly-Linked List
7. Practice Problems

1 Reading

After reviewing the material below, play around with the singly-linked list and doubly-linked list visualizations at https://visualgo.net/en/list. (Stacks, Queues, and Deques coming next week!)

2 `List` Interface

We've done a lot of directly manipulating linked list nodes to get our heads around a structure consisting of a chain of references. But to be a useful general-purpose data structure, our linked list is going to need better encapsulation: ideally a single object with a bunch of useful methods, like ArrayList. Let's start out by looking at the method signatures for some of the operations we'll want our linked list to support.

// post: returns number of elements in list
public int size();

// post: returns true iff list has no elements
public boolean isEmpty();

// post: value is added to beginning of list
public void addFirst(E value);

// post: value is added to end of list
public void addLast(E value);

// pre: list is not empty
// post: returns first value in list
public E getFirst();

// pre: list is not empty
// post: returns last value in list
public E getLast();

// pre: list is not empty
// post: removes and returns first value in list
public E removeFirst();

// pre: list is not empty
// post: removes and returns last value in list
public E removeLast();

// pre: 0 <= i < size()
// post: returns object found at that location
public E get(int i);

// pre: 0 <= i <= size()
// post: inserts value o as the ith entry of list
public void add(int i, E o);

// pre: 0 <= i < size()
// post: removes and returns object found at that location
public E remove(int i);

You may notice many of these methods are similar (or identical) to ArrayList methods. This is no accident! Extensible arrays and linked lists are two approaches to providing a list-like data structure. They have different properties, and are thus appropriate for different situations.

One important difference in the methods is the presence of methods explicitly for accessing/modifying the beginning and end (head and tail) of the list (e.g., getFirst, removeFirst, addLast). Such operations are so commonly used on linked lists that it makes sense to provide these versions (as opposed to doing everything with plain add and remove).
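As a point of comparison, Java's standard library class java.util.LinkedList offers head and tail methods with these same names. The sketch below uses that library class (not the LinkedList we are about to build) just to show how the calls read in practice; the class name HeadTailDemo and the helper build are made up for this example.

```java
import java.util.LinkedList;

class HeadTailDemo {
    // Builds a small list using only the head/tail methods.
    static LinkedList<String> build() {
        LinkedList<String> list = new LinkedList<>();
        list.addFirst("b");   // list is now [b]
        list.addFirst("a");   // list is now [a, b]
        list.addLast("c");    // list is now [a, b, c]
        return list;
    }

    public static void main(String[] args) {
        LinkedList<String> list = build();
        System.out.println(list.getFirst());     // a
        System.out.println(list.getLast());      // c
        System.out.println(list.removeFirst());  // a
        System.out.println(list.removeLast());   // c
        System.out.println(list.size());         // 1
    }
}
```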

3 Singly-Linked List

We'll begin our LinkedList implementation by deciding what fields it should have. As we saw last time, with a reference to the first node in a chain, we can use the next fields to work our way to any subsequent node. So our data structure will keep track of the front (often called the head) of the list:

public class LinkedList<E> {
    private ListNode<E> head;
}

Note that I'm using Java generics to allow our linked list to be declared to store different types of data.

The ListNode class is familiar to us from the last video, but we need to define it somewhere our linked list can access it. Since a ListNode is an internal part of a LinkedList object, and not something intended to be used on its own, I'm going to make it an inner class. Java actually lets us define classes inside of other classes for exactly this purpose of defining object types we need for a particular implementation.

public class LinkedList<E> {
    private ListNode<E> head;

    private class ListNode<E> {
        private E data;
        private ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }
}

I declare the inner ListNode class to be private because I want it to only be available inside the definition of LinkedList.

Now on to the constructor, which should initialize an empty list. An empty list is one with no nodes in it, and Java uses the special value null to indicate when an object variable doesn't refer to any actual object. Thus, our constructor would look like

public LinkedList() {
    head = null;
}

This naturally suggests an implementation for the isEmpty method:

public boolean isEmpty() {
    return head == null;
}

Before getting to methods that deal with adding and removing nodes, let's look at our size method. Its task is to count the number of nodes in the list and return that number. Last time we looked at how we could use a loop to traverse our chain of nodes, and we can use this same technique to count them.

public int size() {
    ListNode<E> current = head;
    int count = 0;
    while (current != null) {
        count++;
        current = current.next;
    }
    return count;
}

While this will work, from a performance perspective it's unsatisfying. Every time our list is asked for its size, it will loop through all the nodes, an operation that is linear in the size of the list (if there are \(n\) nodes, our loop will take a number of operations proportional to \(n\)). A better design would be to keep track of the number of nodes as we add and remove them, so we don't have to re-count every time.

public class LinkedList<E> {
    private ListNode<E> head;
    private int count;

    private class ListNode<E> {
        private E data;
        private ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    public LinkedList() {
        head = null;
        count = 0;
    }

    public boolean isEmpty() {
        return head == null;
    }

    public int size() {
        return count;
    }
}

This is an example of trading off space for time. By adding a count field, our LinkedList object takes up slightly more memory. In return, our size method goes from linear time to constant time, which is a big improvement. These kinds of tradeoffs are very common in data structure design (and software design in general).

3.1 Operations at the head

Alright, now we're ready to tackle addFirst and removeFirst.

public void addFirst(E data) {
    // construct a new node to be the head of the list
    // this constructor will set the next field of the new node
    // to refer to the current head
    ListNode<E> newHead = new ListNode<>(data, head);
    // update the list
    head = newHead;
    count++;
}

public E removeFirst() {
    if (isEmpty()) {
        // if the user attempts to remove from an empty list, throw an exception
        // with an appropriate error message
        // (requires: import java.util.NoSuchElementException; at the top of the file)
        throw new NoSuchElementException("Cannot remove from an empty list");
    }
    // save the head's data in a temporary variable, so we can return it later
    E data = head.data;
    // update the list
    head = head.next;
    count--;
    // return the data from the removed node
    return data;
}

To help visualize what these methods are doing, let's return to our trusty boxes and arrows. If we have an empty list and add two nodes via addFirst, what will it look like?

LinkedList<String> list = new LinkedList<>();
list.addFirst("hi");
list.addFirst("go");

         +-------+------+
         | count | head |
    list |   0   | null |
         +-------+------+


list.addFirst("hi")
                               newHead
         +-------+------+      +------+------+
         | count | head |      | data | next |
    list |   0   | null |      | "hi" | null |
         +-------+------+      +------+------+

         +-------+------+      +------+------+
         | count | head |      | data | next |
    list |   1   |   +--+--->  | "hi" | null |
         +-------+------+      +------+------+


list.addFirst("go");
                           +-------------------------------+
                           |                               |
                           |                               |
                           |   newHead                     V       
         +-------+------+  |   +------+------+      +------+------+      
         | count | head |  |   | data | next |      | data | next |      
    list |   1   |   +--+--+   | "go" |   +--+--->  | "hi" | null |      
         +-------+------+      +------+------+      +------+------+          

                               newHead                            
         +-------+------+      +------+------+      +------+------+      
         | count | head |      | data | next |      | data | next |      
    list |   2   |   +--+--->  | "go" |   +--+--->  | "hi" | null |      
         +-------+------+      +------+------+      +------+------+

How about then calling removeFirst on this list?

String val = list.removeFirst();

     +-------+------+      +------+------+      +------+------+      
     | count | head |      | data | next |      | data | next |      
list |   2   |   +--+--->  | "go" |   +--+--->  | "hi" | null |      
     +-------+------+      +------+------+      +------+------+          

     +-------+------+      +------+------+      +------+------+      
     | count | head |      | data | next |      | data | next |      
list |   2   |   +--+--->  | "go" |   +--+--->  | "hi" | null |      
     +-------+------+      +------+------+      +------+------+          

data "go"

                       +-------------------------------+
                       |                               |
                       |                               V       
     +-------+------+  |   +------+------+      +------+------+      
     | count | head |  |   | data | next |      | data | next |      
list |   1   |   +--+--+   | "go" |   +--+--->  | "hi" | null |      
     +-------+------+      +------+------+      +------+------+          

data "go"

The String stored in data will be returned. Java's garbage collector will automatically clean up the removed node now that nothing refers to it.

It's possible to tell just by looking at the code for addFirst and removeFirst that these are constant time operations. The same number of steps are involved regardless of how many nodes are currently in the list.
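The pieces so far can be gathered into one small runnable class. This is a minimal sketch rather than the finished LinkedList: the class name SinglyLinkedSketch is made up for this example, and ListNode is declared as a static nested class (an assumption that differs slightly from the inner class above) so the file compiles on its own without warnings.

```java
import java.util.NoSuchElementException;

class SinglyLinkedSketch<E> {
    private ListNode<E> head;
    private int count;

    // static nested class: a node needs no reference to the enclosing list
    private static class ListNode<E> {
        E data;
        ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    public boolean isEmpty() { return head == null; }

    public int size() { return count; }

    // constant time: one new node, one reference update
    public void addFirst(E data) {
        head = new ListNode<>(data, head);
        count++;
    }

    // constant time: head simply moves to the second node
    public E removeFirst() {
        if (isEmpty()) {
            throw new NoSuchElementException("Cannot remove from an empty list");
        }
        E data = head.data;
        head = head.next;
        count--;
        return data;
    }

    public static void main(String[] args) {
        SinglyLinkedSketch<String> list = new SinglyLinkedSketch<>();
        list.addFirst("hi");
        list.addFirst("go");
        System.out.println(list.size());         // 2
        System.out.println(list.removeFirst());  // go
        System.out.println(list.removeFirst());  // hi
        System.out.println(list.isEmpty());      // true
    }
}
```

Neither addFirst nor removeFirst contains a loop, which is what makes them constant time.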

3.2 Operations at the tail

What about adding and removing from the other end of the list?

public void addLast(E data) {
    // if the list is empty, addLast is equivalent to addFirst
    // since the loop after this if-statement assumes the list is not empty, 
    // we'll use addFirst which can handle an empty list
    if (isEmpty()) { 
        addFirst(data);
        return;
    }
    // construct the new node
    ListNode<E> newTail = new ListNode<>(data, null);
    // find the current tail node
    ListNode<E> current = head;
    while (current.next != null) {
        current = current.next;
    }
    // update the list
    current.next = newTail;
    count++;
}

public E removeLast() {
    // if the list is empty or has only one element, removeLast is equivalent to removeFirst
    if (size() <= 1) {
        return removeFirst();
    }

    // find the node right before the current tail node
    ListNode<E> current = head;
    while (current.next.next != null) {
        current = current.next;
    }
    // save the tail's data so we can return it after unlinking the node
    E data = current.next.data;
    // update the list
    current.next = null;
    count--;
    // return the data from the removed node
    return data;
}

Unfortunately, manipulating the end of the list (called the tail) isn't as easy or efficient as manipulating the head. Since we only have a reference to the head, each of these methods has to use a loop to traverse the list until it gets to the current tail node (we can distinguish the tail node because the last node in the list always has a next value of null). Traversing the list is a linear time operation because we have to go around the loop once for each node currently in the list. That is, if there are \(n\) nodes in our list, we have to move current to the next node \(n-1\) times in order to reach the tail node.
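To make the \(n-1\) figure concrete, here is a small sketch that counts how many times current advances before reaching the tail. The class TailWalkSketch, its bare-bones Node type, and the helpers chain and hopsToTail are all made up for this illustration; they are not part of our LinkedList.

```java
class TailWalkSketch {
    static class Node {
        int data;
        Node next;

        Node(int data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    // Builds a chain of n nodes (n >= 1) holding 1..n and returns its head.
    static Node chain(int n) {
        Node head = null;
        for (int i = n; i >= 1; i--) {
            head = new Node(i, head);
        }
        return head;
    }

    // Counts how many times current moves forward before it reaches the tail.
    static int hopsToTail(Node head) {
        int hops = 0;
        Node current = head;
        while (current.next != null) {
            current = current.next;
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 1000}) {
            // prints 0, 9, and 999 hops respectively
            System.out.println(n + " nodes: " + hopsToTail(chain(n)) + " hops");
        }
    }
}
```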

Ideally, we want our tail-related methods to be constant time like our head-related methods. We'll use the same strategy we used to improve the size method from linear time to constant time—trading off space for time.

4 Upgrade: Tail Reference

The slow part of the addLast and removeLast implementations above is the loop to find the tail node. What if we could just always keep track of which node is the tail node like we do with the head node? Then we could do away with the loop and achieve a constant time addLast and removeLast.

First, let's augment our implementation with a field tail that keeps track of the tail node:

public class LinkedList<E> {
    private ListNode<E> head;
    private ListNode<E> tail;
    private int count;

    private class ListNode<E> {
        private E data;
        private ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    public LinkedList() {
        head = null;
        tail = null;
        count = 0;
    }

    // ...
}

Seems simple enough, but how do we use it? Here's our addLast now that we have a tail field:

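public void addLast(E data) {
    ListNode<E> newTail = new ListNode<>(data, null);
    // update the list
    if (isEmpty()) {
        // empty list: there is no old tail to link from,
        // and the new node is also the head
        head = newTail;
    } else {
        tail.next = newTail;
    }
    tail = newTail;
    count++;
}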

This is definitely an upgrade from the old addLast! It's much simpler and now a constant time operation.

What about removeLast? Here we run into a problem. Removing the tail node requires setting the next field of the previous node to null. But we have no way to get a reference to that previous node other than traversing the list from the head. That is, the tail node doesn't keep track of the previous node in the list. In order to remove from the end of the list efficiently, we will need to modify our list design.
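Once a tail field exists, every method that changes the list has to keep it accurate, including the head operations. The following is a minimal sketch of how that might look; the class name TailTrackingSketch and the static nested ListNode are assumptions made so the example compiles on its own. Note that removeLast still has to walk from the head, which is exactly the problem described above.

```java
import java.util.NoSuchElementException;

class TailTrackingSketch<E> {
    private static class ListNode<E> {
        E data;
        ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    private ListNode<E> head;
    private ListNode<E> tail;
    private int count;

    public int size() { return count; }

    public boolean isEmpty() { return head == null; }

    public void addFirst(E data) {
        head = new ListNode<>(data, head);
        if (tail == null) {      // list was empty: new node is also the tail
            tail = head;
        }
        count++;
    }

    public void addLast(E data) {
        ListNode<E> newTail = new ListNode<>(data, null);
        if (isEmpty()) {         // list was empty: new node is also the head
            head = newTail;
        } else {
            tail.next = newTail;
        }
        tail = newTail;
        count++;
    }

    public E removeFirst() {
        if (isEmpty()) {
            throw new NoSuchElementException("Cannot remove from an empty list");
        }
        E data = head.data;
        head = head.next;
        if (head == null) {      // removed the only node: no tail either
            tail = null;
        }
        count--;
        return data;
    }

    // Still linear: the node before the tail can only be found by walking from head.
    public E removeLast() {
        if (size() <= 1) {
            return removeFirst();
        }
        ListNode<E> current = head;
        while (current.next != tail) {
            current = current.next;
        }
        E data = tail.data;
        current.next = null;
        tail = current;
        count--;
        return data;
    }
}
```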

5 Upgrade: Doubly-Linked List

Here we continue the tradeoff of space and time. To make removing from the tail efficient in terms of time, we will have each node maintain a reference to the previous node in the list.

private class ListNode<E> {
    private ListNode<E> prev;
    private E data;
    private ListNode<E> next;

    ListNode(E data, ListNode<E> next, ListNode<E> prev) {
        this.prev = prev;
        this.data = data;
        this.next = next;
    }
}

This will make each node take up more space, but allows us to implement an efficient removeLast method:

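public E removeLast() {
    // empty or one-element list: defer to removeFirst
    // (which must also set tail to null when the list becomes empty)
    if (size() <= 1) {
        return removeFirst();
    }
    // save the tail's data so we can return it after unlinking the node
    E data = tail.data;
    // update the list: the previous node becomes the new tail
    tail = tail.prev;
    tail.next = null;
    count--;
    return data;
}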

We call this form of linked list a doubly-linked list due to the fact that each node has two links: one to the next node and one to the previous node. When our nodes only have a link to the next node, we say we have a singly-linked list.
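Putting the prev links to work, all four head and tail operations can run without a loop. The sketch below is one way this might be written, not a definitive implementation; the class name DoublyLinkedSketch and the static nested ListNode are assumptions made so the example is self-contained. Each method handles the empty and one-element cases by updating both head and tail.

```java
import java.util.NoSuchElementException;

class DoublyLinkedSketch<E> {
    private static class ListNode<E> {
        ListNode<E> prev;
        E data;
        ListNode<E> next;

        ListNode(E data, ListNode<E> next, ListNode<E> prev) {
            this.prev = prev;
            this.data = data;
            this.next = next;
        }
    }

    private ListNode<E> head;
    private ListNode<E> tail;
    private int count;

    public int size() { return count; }

    public boolean isEmpty() { return count == 0; }

    public void addFirst(E data) {
        ListNode<E> node = new ListNode<>(data, head, null);
        if (head == null) {
            tail = node;          // list was empty: node is also the tail
        } else {
            head.prev = node;     // old head now has a predecessor
        }
        head = node;
        count++;
    }

    public void addLast(E data) {
        ListNode<E> node = new ListNode<>(data, null, tail);
        if (tail == null) {
            head = node;          // list was empty: node is also the head
        } else {
            tail.next = node;     // old tail now has a successor
        }
        tail = node;
        count++;
    }

    public E removeFirst() {
        if (isEmpty()) {
            throw new NoSuchElementException("Cannot remove from an empty list");
        }
        E data = head.data;
        head = head.next;
        if (head == null) {
            tail = null;          // removed the only node
        } else {
            head.prev = null;     // new head has no predecessor
        }
        count--;
        return data;
    }

    public E removeLast() {
        if (isEmpty()) {
            throw new NoSuchElementException("Cannot remove from an empty list");
        }
        E data = tail.data;
        tail = tail.prev;         // constant time thanks to the prev link
        if (tail == null) {
            head = null;          // removed the only node
        } else {
            tail.next = null;     // new tail has no successor
        }
        count--;
        return data;
    }
}
```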

6 `ArrayList` vs Singly-Linked List vs Doubly-Linked List

Now that we have these three different structures for representing a list, how would we go about choosing which one to use? It will depend on which list operations you need to be efficient; there's no one choice that's always best. Need to frequently remove from the front of the list? Either type of linked list can do this in constant time, so you would use one of those. Also removing from the end of the list? That narrows it down to a doubly-linked list. On the other hand, if you need to frequently access elements in the middle of the list, an ArrayList will do that far more efficiently (i.e., in constant time).

Here's a chart comparing these three structures across a variety of operations (the singly-linked column assumes the head-only list from section 3, with no tail reference):

Operation	`ArrayList`	Singly-Linked List	Doubly-Linked List
`size`	constant time	constant time	constant time
`addLast`	constant time or linear time (if resize)	linear time	constant time
`removeLast`	constant time	linear time	constant time
`getLast`	constant time	linear time	constant time
`addFirst`	linear time	constant time	constant time
`removeFirst`	linear time	constant time	constant time
`getFirst`	constant time	constant time	constant time
`get(index)`	constant time	linear time	linear time
`set(index)`	constant time	linear time	linear time
`remove(index)`	linear time	linear time	linear time
`contains`	linear time	linear time	linear time
`remove(element)`	linear time	linear time	linear time

7 Practice Problems¹

1. Intuitively, we might think that addFirst and removeFirst only modify the head field of the list, while addLast and removeLast only modify the tail field. However, there are situations where each of these operations must modify both head and tail. (Such special cases are often called edge cases or boundary cases.) When would each of these operations need to modify both?
2. The chart above shows that both the ArrayList and linked lists take linear time to remove an element from a specific position in the list (remove(index)). Why is this? Is it the same reason for both data structures or different reasons?
3. When you add or remove the element found at index i of a singly-linked list, you must create a temporary current node reference and advance it through the list. At which index's node should the loop stop, relative to i?
4. In an ArrayList, it is possible to overrun the capacity of the array, at which point the list must be resized to fit. Is resizing necessary on a linked list? What limits the number of elements that a linked list can have?
5. Write an implementation of the public E remove(int index) method for a doubly-linked list.
6. Write an implementation of the public boolean contains(E element) method for a doubly-linked list.

Footnotes:

Solutions:

1. When addFirst or addLast is called on an empty list, both head and tail need to be updated to refer to the new node (otherwise one would still be null). When removeFirst or removeLast is called on a list with one node, both head and tail need to be set to null (otherwise one of them would still refer to the removed node).
2. For ArrayList, remove takes linear time due to shifting elements over to fill in the slot in the array used by the removed element. For a linked list, unlinking a node is quick, but it takes linear time to find the node at a specific index.
3. The loop should stop at index i - 1, the index before the one to add or remove. This is so that you can adjust the preceding node's next reference.
4. Resizing is not necessary for a linked list, since more nodes can be dynamically allocated. The only limit on the number of elements is the amount of memory available to the Java virtual machine.

public E remove(int index) {
    if (index < 0 || index >= size()) {  // generate an error if there is no element at index
        throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size());
    }
    // find the node at index
    ListNode<E> current = head;
    int currentIndex = 0;
    while (currentIndex != index) {
        current = current.next;
        currentIndex++;
    }
    // remove the node, accounting for edge cases where node is at the beginning or end of the list
    if (current.prev != null) { // update previous node's next to skip over this node
        current.prev.next = current.next;
    } else {                    // removing the first node, so head must move forward
        head = current.next;
    }
    if (current.next != null) { // update next node's prev to skip over this node
        current.next.prev = current.prev;
    } else {                    // removing the last node, so tail must move back
        tail = current.prev;
    }
    count--;
    return current.data;
}

public boolean contains(E element) {
    // traverse the list, comparing element with the data at each node
    ListNode<E> current = head;
    // head will be null for an empty list, so we don't need a special case
    while (current != null) { 
        // Objects.equals is null-safe, so a list holding null data won't throw here
        if (java.util.Objects.equals(current.data, element)) {
            return true;
        }
        current = current.next;
    }
    // if we exit the loop, we've compared to each node and none were equal
    // so element is not in the list
    return false;
}
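
The answer about stopping at index i - 1 can be sketched with an add(int i, E o) method for a singly-linked list. This is a minimal sketch under stated assumptions: the class name IndexedInsertSketch is made up for this example, ListNode is a static nested class so the file compiles on its own, and a simple get is included only so the result can be checked.

```java
class IndexedInsertSketch<E> {
    private static class ListNode<E> {
        E data;
        ListNode<E> next;

        ListNode(E data, ListNode<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    private ListNode<E> head;
    private int count;

    public int size() { return count; }

    // pre: 0 <= i <= size()
    public void add(int i, E o) {
        if (i < 0 || i > count) {
            throw new IndexOutOfBoundsException("Index: " + i + ", size: " + count);
        }
        if (i == 0) {
            // no preceding node exists, so the head reference itself changes
            head = new ListNode<>(o, head);
        } else {
            // stop at index i - 1: the node whose next reference must change
            ListNode<E> current = head;
            for (int k = 0; k < i - 1; k++) {
                current = current.next;
            }
            current.next = new ListNode<>(o, current.next);
        }
        count++;
    }

    // pre: 0 <= i < size()
    public E get(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("Index: " + i + ", size: " + count);
        }
        ListNode<E> current = head;
        for (int k = 0; k < i; k++) {
            current = current.next;
        }
        return current.data;
    }
}
```

Both methods are linear time in the worst case, since each walks the chain one node at a time to reach its target index.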