Extensible Array

1 Reading

No reading assigned for this topic. If you prefer (or want supplemental) textbook readings, this topic is based on sections 3.1–3.5 from Bailey. Sections 3.6–3.8 of Bailey explore applications of extensible arrays. Note: the Bailey book refers to the extensible array object as a Vector while I use ArrayList, but they are otherwise the same thing.

2 Return to Mean class

Let's say we wanted to (a) read the input values from a file and (b) store the values we read so that we could compute additional statistics.

2.1 Reading from a file

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
    }
}
  • Did I make data.txt? Yes, yes I did.
  • Do I remember how many values are in it? Let's say I do not
  • The file has one number per line, and there could be a lot of lines

2.1.1 Separate variable for each?

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file

        int v1 = input.readInt();
        int v2 = input.readInt();
        int v3 = input.readInt();
        int v4 = input.readInt();

        System.out.println("the mean is " + (v1 + v2 + v3 + v4) / 4.0);
    }
}
  • We need to know exactly how many values there are…
  • And it is not remotely scalable (i.e., it will not work as the input size grows)

2.1.2 Array with hard-coded size?

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int data[];
        int n = 10;
        data = new int[n];
        int valuesRead = 0;  // need to keep track of how many values we actually
                             // read in order to compute the mean
        for(int i = 0; i < n && !input.isEmpty(); i++) {  // isEmpty() is false while any value remains
            data[i] = input.readInt();
            valuesRead++;
        }

        double sum = 0;
        for(int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }

        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
  • What if there are more than 10 data points?

2.1.3 Huge array?

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int data[];
        int n = 100000000;
        data = new int[n];
        int valuesRead = 0;  // need to keep track of how many values we actually
                             // read in order to compute the mean
        for(int i = 0; i < n && !input.isEmpty(); i++) {  // isEmpty() is false while any value remains
            data[i] = input.readInt();
            valuesRead++;
        }

        double sum = 0;
        for(int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }

        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
  • Potentially wasting huge amounts of memory

2.1.4 Get it from the user?

What if we get the number of data points from the user on the command line?

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int data[];
        int n = Integer.parseInt(args[0]);
        data = new int[n];
        int valuesRead = 0;  // need to keep track of how many values we actually
                             // read in order to compute the mean
        for(int i = 0; i < n && !input.isEmpty(); i++) {  // isEmpty() is false while any value remains
            data[i] = input.readInt();
            valuesRead++;
        }

        double sum = 0;
        for(int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }

        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
  • All we've done is force the user to pre-commit to a number of data points instead of us!

3 Solution: ArrayIntList

An extensible array that can grow as needed

  • Provides the abstraction of an array that can hold as many elements as needed
  • Will be implemented with a fixed-size array underneath
  • API: note how pre- and postconditions are specified in comments. Preconditions document the assumptions and requirements that must hold when the method is called; postconditions document what will be true when the method returns.

    public class ArrayIntList {
    
        // post: constructs an array with capacity for 10 elements
        public ArrayIntList()
    
        // pre: 0 <= index && index < size()
        // post: returns the element stored in location index
        public int get(int index)
    
        // pre: 0 <= index && index < size()
        // post: existing value at index is changed to element; old value is returned
        public int set(int index, int element)
    
        // post: adds new element to end of possibly extended array
        public void add(int element)
    
        // pre: 0 <= index <= size()
        // post: inserts new value in the array with desired index,
        //       moving elements from index to size()-1 to the right
        public void add(int index, int element)
    
        // pre: 0 <= where && where < size()
        // post: indicated element is removed, size decreases by 1
        public int remove(int where)
    
        // post: returns true if there are no elements in the array
        public boolean isEmpty()
    
        // post: returns the number of elements in the list
        public int size()
    }
    

3.1 Use in Mean

import edu.princeton.cs.algs4.In;

public class Mean {

    public static void main(String[] args) {
        In input = new In("data.txt");

        // compute the mean of the values in the file
        ArrayIntList data = new ArrayIntList();
        while(!input.isEmpty()) {  // isEmpty() is false while any value remains
            data.add(input.readInt());
        }

        double sum = 0;
        for(int i = 0; i < data.size(); i++) {
            sum += data.get(i);
        }

        System.out.println("the mean is " + sum / data.size());
    }
}
  • So how is ArrayIntList data magically exactly the size we need it to be?
  • The secret is it's not! It just provides the illusion/abstraction that it is

3.2 Constructor

private int[] elementData;     // the data
private int elementCount;      // number of elements in the array

// post: constructs an ArrayIntList with capacity for 10 elements
public ArrayIntList() {
    elementData = new int[10];
    elementCount = 0;
}

3.3 get, set

// pre: 0 <= index && index < size()
// post: returns the element stored in location index
public int get(int index) {
    return elementData[index];
}

// pre: 0 <= index && index < size()
// post: existing value at index is changed to element; old value is returned
public int set(int index, int element) {
    int previous = elementData[index];
    elementData[index] = element;
    return previous;
}
  • Like an array, we can read and write elements at any index (a property called random access, meaning we can access elements in any order)
  • Also like an array, these methods take a constant number of operations
    • In analyzing them, first ask if the number of operations they take depends on the data we are working with (i.e., do they take more operations if there are more elements in the array?)
    • Since they always take the same number of operations regardless, we say they are constant time
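The preconditions on get and set are only documented, not enforced. Because the backing array usually has spare capacity, calling get(5) on a list that holds two elements would quietly return 0 instead of failing. The following is a minimal sketch of one way to enforce the precondition while keeping the method constant time. The class name CheckedIntList, the helper checkIndex, and the choice of IndexOutOfBoundsException are assumptions made for illustration; they are not part of the API above.

```java
// Sketch only: CheckedIntList and checkIndex are hypothetical names, not part of the notes' API.
class CheckedIntList {
    private int[] elementData = new int[10];  // backing array with spare capacity
    private int elementCount = 0;             // number of slots actually in use

    // pre: fewer than 10 elements stored (this sketch omits ensureCapacity)
    public void add(int element) {
        elementData[elementCount] = element;
        elementCount++;
    }

    // pre: 0 <= index && index < size(); a violation throws instead of returning a stale value
    public int get(int index) {
        checkIndex(index);
        return elementData[index];
    }

    public int size() {
        return elementCount;
    }

    // enforces the documented precondition using a constant number of operations
    private void checkIndex(int index) {
        if (index < 0 || index >= elementCount) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + elementCount);
        }
    }

    public static void main(String[] args) {
        CheckedIntList list = new CheckedIntList();
        list.add(7);
        list.add(9);
        System.out.println(list.get(1));  // prints 9
        try {
            list.get(5);                  // inside the capacity, but beyond size()
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check adds two comparisons per call, so get remains constant time.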

3.4 add, remove

// post: adds new element to end of possibly extended array
public void add(int element) {
    // ensureCapacity wasn't in our API---must be a private helper method
    ensureCapacity(elementCount + 1);
    elementData[elementCount] = element;
    elementCount++;
}

// pre: 0 <= index <= size()
// post: inserts new value in the array with desired index,
//        moving elements from index to size()-1 to right
public void add(int index, int element) {
    ensureCapacity(elementCount + 1);

    // must copy from right to left to avoid destroying data
    for (int i = elementCount; i > index; i--) {
        elementData[i] = elementData[i - 1];
    }

    elementData[index] = element;
    elementCount++;
}
  • The work of making sure elementData has enough room for another element is offloaded to the private helper method ensureCapacity
    • This is good style—we can call it from both versions of add without needing to copy-paste the code in two places.

Figure 1: The incorrect (a) and correct (b) way of moving values in an array to make room for an inserted value.
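The difference between the two copy directions can be seen by running both on the same small array. This is a sketch for illustration only; the class ShiftDemo and its method names are hypothetical, and each method works on a copy so the input is left untouched.

```java
import java.util.Arrays;

// Sketch only: ShiftDemo and its method names are hypothetical, chosen for illustration.
class ShiftDemo {

    // Incorrect: copying left to right overwrites each value before it has been moved.
    static int[] shiftLeftToRight(int[] data, int count) {
        int[] a = Arrays.copyOf(data, data.length);
        for (int i = 1; i <= count; i++) {
            a[i] = a[i - 1];
        }
        return a;
    }

    // Correct: copying right to left moves each value before its slot is overwritten.
    static int[] shiftRightToLeft(int[] data, int count) {
        int[] a = Arrays.copyOf(data, data.length);
        for (int i = count; i > 0; i--) {
            a[i] = a[i - 1];
        }
        return a;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 0};  // four elements in use, one spare slot
        System.out.println(Arrays.toString(shiftLeftToRight(data, 4)));  // [1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(shiftRightToLeft(data, 4)));  // [1, 1, 2, 3, 4]
    }
}
```

In the correct result, slot 0 still holds its old value and is ready to be overwritten by the inserted element.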

// pre: 0 <= where && where < size()
// post: indicated element is removed, size decreases by 1
public int remove(int where) {
    int result = get(where);
    elementCount--;

    // shift data to the left to fill in removed element
    while (where < elementCount) {
        elementData[where] = elementData[where + 1];
        where++;
    }

    return result;
}    
  • How could we characterize the performance (i.e., number of operations) when elements of the array are shifted to the left or right?
    • Depending on the index, it could take just 1 operation, or a lot more…
    • In this kind of analysis, we're typically interested in the worst case (i.e., the most operations it could take)
    • The worst case would be when we have to shift the entire array
      • So if the array has \(n\) elements, it will take \(n\) shifts in the worst case
    • Now, how many operations does each shift take?
      • 4: a comparison (i > index), arithmetic (i - 1), assignment (elementData[i] = elementData[i - 1]), and a decrement (i--)
      • That is, each shift takes a constant number of operations
    • Thus, we say the overall array shift takes linear time, since the expression \(\mathrm{time} = 4n\) is a linear function of the size of our data, \(n\)
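To make the dependence on the index concrete, the sketch below runs the same loop used by add(int index, int element) and counts how many shifts it performs. ShiftCount and shiftsForInsert are hypothetical names introduced for this illustration only.

```java
// Sketch only: ShiftCount is a hypothetical helper, not part of ArrayIntList.
class ShiftCount {

    // Runs the same loop as add(int index, int element) and counts the shifts it performs.
    // pre: elementData.length > elementCount && 0 <= index <= elementCount
    static int shiftsForInsert(int[] elementData, int elementCount, int index) {
        int shifts = 0;
        for (int i = elementCount; i > index; i--) {
            elementData[i] = elementData[i - 1];
            shifts++;
        }
        return shifts;
    }

    public static void main(String[] args) {
        int n = 8;
        for (int index = 0; index <= n; index += 4) {
            int[] data = new int[n + 1];  // n elements in use plus one spare slot
            System.out.println("insert at " + index + ": "
                    + shiftsForInsert(data, n, index) + " shifts");
        }
    }
}
```

With n = 8, inserting at index 0 takes 8 shifts, at index 4 takes 4, and at index 8 (the end) takes none, so the worst case grows in step with n.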

3.5 isEmpty, size

// post: returns true iff there are no elements in the array
public boolean isEmpty() {
    return size() == 0;
}

// post: returns the number of elements in the list
public int size() {
    return elementCount;
}

3.6 ensureCapacity

// post: the capacity of this array is at least minCapacity
private void ensureCapacity(int minCapacity) {  // private helper: not part of the public API
    // only need to do something if we don't already have the capacity
    if (elementData.length < minCapacity) {
        int newLength = elementData.length;  // initial guess
        // double the size of our expanded array until it's big enough for minCapacity
        while (newLength < minCapacity) {
            newLength *= 2;
        }

        // guaranteed: newLength > elementData.length.
        int newElementData[] = new int[newLength];

        // copy old data to new array
        for (int i = 0; i < elementCount; i++) {
            newElementData[i] = elementData[i];
        }
        elementData = newElementData;  // reassign elementData to refer to the new, larger array
    }
}
  • Why double the size of the array?
    • It has the nice property of making these capacity increases average out to constant time over \(n\) elements added to the array
    • Suppose, for neatness only, that \(n\) is a power of 2, and that the array started with a capacity of 1.
      • What do we know? When the array was extended from capacity 1 to capacity 2, one element was copied.
      • When the array was extended from capacity 2 to capacity 4, two elements were copied.
      • When the array was extended from capacity 4 to capacity 8, four elements were copied.
      • This continues until the last extension, when the array had its capacity extended from \(\frac{n}{2}\) to \(n\). Then \(\frac{n}{2}\) elements had to be copied.
      • The total number of times elements were copied is \(1 + 2 + 4 + \cdots + \frac{n}{2} = n - 1\)
      • \(n - 1\) copies to support an array of size \(n\) works out to an average of approximately 1 copy per element
      • Thus, there is a constant overhead in supporting each element of an array extended in this way.
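The counting argument above can be checked with a small simulation that tracks only the size and capacity, not the elements themselves. This is a sketch under the same assumptions as the argument (initial capacity 1, \(n\) a power of 2). GrowthCount and its method names are hypothetical, and the grow-by-one strategy is added here only as a point of comparison; it does not appear in the notes.

```java
// Sketch only: GrowthCount is a hypothetical helper that simulates capacity growth
// without storing any elements.
class GrowthCount {

    // Counts element copies made while adding n elements one at a time,
    // starting from capacity 1 and doubling whenever the array is full.
    static long copiesWithDoubling(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {   // array is full: copy everything into one twice as big
                copies += size;
                capacity *= 2;
            }
        }
        return copies;
    }

    // Same count if the array instead grows by a single slot each time it is full.
    static long copiesWithGrowByOne(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += 1;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1024;
        System.out.println(copiesWithDoubling(n));   // 1023, i.e. n - 1
        System.out.println(copiesWithGrowByOne(n));  // 523776, i.e. n(n - 1)/2
    }
}
```

Doubling costs about one copy per element, while growing by one slot costs about \(n/2\) copies per element, which is why the doubling strategy is the one that averages out to constant time.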

3.7 Do we need ArrayDoubleList, ArrayStringList, etc?

  • ArrayObjectList? Would need casts on every access to an array element

    import edu.princeton.cs.algs4.In;
    
    public class Mean {
    
        public static void main(String[] args) {
            In input = new In("data.txt");
    
            // compute the mean of the values in the file
            ArrayObjectList data = new ArrayObjectList();
            while(!input.isEmpty()) {  // isEmpty() is false while any value remains
                data.add(input.readInt());  // int automatically turned into an object
                                            // in Java, this automatic behavior is called "autoboxing"
            }
    
            double sum = 0;
            for(int i = 0; i < data.size(); i++) {
                // we would have to cast to int or Integer every time we retrieve an element
                sum += (int) data.get(i);  
            }
    
            System.out.println("the mean is " + sum / data.size());
        }
    }
    
  • Instead, create a generic ArrayList<E> that will do all the necessary casting for the user

    public class ArrayList<E> {
        private E[] elementData;
        private int elementCount;
    
        // post: constructs an array with capacity for 10 elements
        public ArrayList() {
            elementData = (E[]) new Object[10];
            elementCount = 0;
        }
    
        // pre: 0 <= index && index < size()
        // post: returns the element stored in location index
        public E get(int index) {
            return elementData[index];
        }
    
        // pre: 0 <= index && index < size()
        // post: existing value at index is changed to element; old value is returned
        public E set(int index, E element) {
            E previous = elementData[index];
            elementData[index] = element;
            return previous;
        }
    
        // post: adds new element to end of possibly extended array
        public void add(E element) {
            ensureCapacity(elementCount + 1);
            elementData[elementCount] = element;
            elementCount++;
        }
    
        // pre: 0 <= index <= size()
        // post: inserts new value in the array with desired index,
        //        moving elements from index to size()-1 to right
        public void add(int index, E element) {
            ensureCapacity(elementCount + 1);
    
            // must copy from right to left to avoid destroying data
            for (int i = elementCount; i > index; i--) {
                elementData[i] = elementData[i - 1];
            }
    
            elementData[index] = element;
            elementCount++;
        }
    
        // pre: 0 <= where && where < size()
        // post: indicated element is removed, size decreases by 1
        public E remove(int where) {
            E result = get(where);
            elementCount--;
    
            // shift data to the left to fill in removed element
            while (where < elementCount) {
                elementData[where] = elementData[where + 1];
                where++;
            }
    
            return result;
        }
    
        // post: returns true iff there are no elements in the array
        public boolean isEmpty() {
            return size() == 0;
        }
    
        // post: returns the number of elements in the list
        public int size() {
            return elementCount;
        }
    
        // post: the capacity of this array is at least minCapacity
        private void ensureCapacity(int minCapacity) {  // private helper: not part of the public API
            // only need to do something if we don't already have the capacity
            if (elementData.length < minCapacity) {
                int newLength = elementData.length; // initial guess
                // double the size of our expanded array until it's big enough for minCapacity
                while (newLength < minCapacity) {
                    newLength *= 2;
                }
    
                // guaranteed: newLength > elementData.length.
                E[] newElementData = (E[]) new Object[newLength];
    
                // copy old data to new array
                for (int i = 0; i < elementCount; i++) {
                    newElementData[i] = elementData[i];
                }
                elementData = newElementData; // reassign elementData to refer to the new, larger array
            }
        }
    }
    
    import edu.princeton.cs.algs4.In;
    
    public class Mean {
    
        public static void main(String[] args) {
            In input = new In("data.txt");
    
            // compute the mean of the values in the file
            // when we specify a type for a generic structure like this, it must be an object
            // so we have to use Integer instead of int
            // fortunately, Java will automatically convert to and from int and Integer as needed
            ArrayList<Integer> data = new ArrayList<Integer>();
            while(!input.isEmpty()) {  // isEmpty() is false while any value remains
                data.add(input.readInt());
            }
    
            double sum = 0;
            for(int i = 0; i < data.size(); i++) {
                sum += data.get(i);
            }
    
            System.out.println("the mean is " + sum / data.size());
        }
    }
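The benefit of the generic version goes beyond saving casts: the compiler can reject elements of the wrong type. The sketch below contrasts the two approaches. To stay self-contained it uses java.util.ArrayList rather than the ArrayList<E> written above, with ArrayList<Object> standing in for the hypothetical ArrayObjectList; CastDemo and its method names are likewise assumptions for illustration.

```java
import java.util.ArrayList;

// Sketch only: uses java.util.ArrayList (not the ArrayList<E> written above) so that
// the example is self-contained; CastDemo is a hypothetical name.
class CastDemo {

    // With Object elements, nothing stops a String from being stored among the ints,
    // and the mistake only shows up as a ClassCastException when the value is cast.
    static boolean sumFailsAtRuntime() {
        ArrayList<Object> data = new ArrayList<Object>();
        data.add(3);        // autoboxed to Integer
        data.add("four");   // compiles, because a String is also an Object
        try {
            int sum = 0;
            for (int i = 0; i < data.size(); i++) {
                sum += (int) data.get(i);  // fails on "four"
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // With ArrayList<Integer>, data.add("four") would be rejected by the compiler,
    // and no cast is needed when reading.
    static int sumTyped() {
        ArrayList<Integer> data = new ArrayList<Integer>();
        data.add(3);
        data.add(4);
        int sum = 0;
        for (int i = 0; i < data.size(); i++) {
            sum += data.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFailsAtRuntime());  // true
        System.out.println(sumTyped());           // 7
    }
}
```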
    

4 Practice Problems [1]

  1. What is the distinction between the capacity and size of an ArrayList?
  2. When inserting a value into an ArrayList, why is it necessary to shift elements to the right starting at the high end of the ArrayList? (See Figure 1.)
  3. Write the code to declare an ArrayList containing these five strings: "It", "was", "a", "stormy", "night". What is the size of the list? What is its type?
  4. Write code to insert two additional elements, "dark" and "and", at the proper places in the list to produce the following ArrayList as the result: ["It", "was", "a", "dark", "and", "stormy", "night"]
  5. Write code to change the second element's value to "IS", producing the following ArrayList as the result: ["It", "IS", "a", "dark", "and", "stormy", "night"]
  6. Write code to print out all Strings from the list that don't contain the letter "a". Your code should print:

    It
    IS
    stormy
    night
    
  7. The implementation of java.util.ArrayList makes the ensureCapacity method public. Why is this useful?
  8. Write an ArrayIntList method, indexOf, that returns the index of an int in the ArrayIntList. What should the method return if no matching int can be found? What does java.util.ArrayList do in this case? How long does this operation take to perform, in the worst case?

Footnotes:

[1] Solutions:

  1. The size is the number of elements logically stored in the ArrayList, i.e., the value returned by size(). The capacity is the length of the underlying array, which determines how large the size can get before the underlying array must be reallocated and copied over. Ideally, the capacity provides enough room so that the size can increase without a significant risk of reallocation.
  2. This avoids destroying data. Starting at the low end of the ArrayList causes each moved value to overwrite an unmoved value.
  3. The list's type is ArrayList<String> and its size is 5.

    ArrayList<String> list = new ArrayList<String>();
    list.add("It");
    list.add("was");
    list.add("a");
    list.add("stormy");
    list.add("night");
    
  4. list.add(3, "dark");
    list.add(4, "and");
    
  5. list.set(1, "IS");
    
  6. for (int i = 0; i < list.size(); i++) {
        if (!list.get(i).contains("a")) {
            System.out.println(list.get(i));
        }
    }
    
  7. Making this method public allows the user to increase the capacity of the ArrayList ahead of actually inserting elements. This decreases (or eliminates) the amount of copying that occurs.
  8. Like Java's ArrayList, this method returns -1 if the element cannot be found. In the worst case, this operation takes linear time, since it has to check each element once to verify that a matching value isn't in the array.

    public int indexOf(int elem) {
        for(int i = 0; i < elementCount; i++) {
            if (elem == elementData[i]) {
                return i;
            }
        }
        return -1;
    }
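As an illustration of solution 7, the sketch below reserves capacity in a java.util.ArrayList before filling it. The class PresizeDemo and the method fill are hypothetical names. The capacity of a java.util.ArrayList cannot be observed directly, so the example only shows the calling pattern, not a measured saving.

```java
import java.util.ArrayList;

// Sketch only: PresizeDemo and fill are hypothetical names; the list is java.util.ArrayList.
class PresizeDemo {

    static ArrayList<Integer> fill(int n) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.ensureCapacity(n);   // reserve room for n elements up front
        for (int i = 0; i < n; i++) {
            list.add(i);          // the backing array already has room for these
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = fill(1000);
        System.out.println(list.size());  // 1000
    }
}
```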