Extensible Array
1 Reading
No reading is assigned for this topic. If you prefer (or want supplemental) textbook readings, this topic is based on sections 3.1–3.5 of Bailey. Sections 3.6–3.8 of Bailey explore applications of extensible arrays. Note: the Bailey book calls the extensible array object a Vector, while I use ArrayList, but they are otherwise the same thing.
2 Return to Mean class
Let's say we wanted to (a) read the input values from a file and (b) store the values we read so that we could compute additional statistics.
2.1 Reading from a file
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
    }
}
```
- Did I make data.txt? Yes, yes I did.
- Do I remember how many values are in it? Let's say I do not.
- The file has one number per line, and there could be a lot of lines.
2.1.1 Separate variable for each?
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int v1 = input.readInt();
        int v2 = input.readInt();
        int v3 = input.readInt();
        int v4 = input.readInt();
        System.out.println("the mean is " + (v1 + v2 + v3 + v4) / 4.0);
    }
}
```
- We need to know exactly how many values there are…
- And it is not remotely scalable (i.e., it will not work as the input size grows)
2.1.2 Array with hard-coded size?
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int n = 10;
        int[] data = new int[n];
        // need to keep track of how many values we actually
        // read in order to compute the mean
        int valuesRead = 0;
        // isEmpty() skips whitespace, so a trailing newline at the end
        // of the file does not trigger one readInt() too many
        for (int i = 0; i < n && !input.isEmpty(); i++) {
            data[i] = input.readInt();
            valuesRead++;
        }
        double sum = 0;
        for (int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }
        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
```
- What if there are more than 10 data points?
2.1.3 Huge array?
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        int n = 100000000;
        int[] data = new int[n];
        // need to keep track of how many values we actually
        // read in order to compute the mean
        int valuesRead = 0;
        for (int i = 0; i < n && !input.isEmpty(); i++) {
            data[i] = input.readInt();
            valuesRead++;
        }
        double sum = 0;
        for (int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }
        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
```
- Potentially wasting huge amounts of memory: 100,000,000 ints occupy about 400 MB, even if the file holds only a handful of values
2.1.4 Get it from the user?
What if we get the number of data points from the user on the command line?
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        // the user supplies the capacity as the first command-line argument
        int n = Integer.parseInt(args[0]);
        int[] data = new int[n];
        // need to keep track of how many values we actually
        // read in order to compute the mean
        int valuesRead = 0;
        for (int i = 0; i < n && !input.isEmpty(); i++) {
            data[i] = input.readInt();
            valuesRead++;
        }
        double sum = 0;
        for (int i = 0; i < valuesRead; i++) {
            sum += data[i];
        }
        // why don't we need to cast to double for the division to do what we want?
        System.out.println("the mean is " + sum / valuesRead);
    }
}
```
- All we've done is force the user to pre-commit to a number of data points instead of us!
3 Solution: ArrayIntList
An extensible array that can grow as needed
- Provides the abstraction of an array that can hold as many elements as needed
- Will be implemented with a fixed-size array underneath
API: note how pre- and postconditions are specified in comments (preconditions document assumptions/requirements at the time the method is called; postconditions document what will be true when the method returns):

```java
public class ArrayIntList {
    // post: constructs an array with capacity for 10 elements
    public ArrayIntList()

    // pre: 0 <= index && index < size()
    // post: returns the element stored in location index
    public int get(int index)

    // pre: 0 <= index && index < size()
    // post: existing value at index is changed to element; old value is returned
    public int set(int index, int element)

    // post: adds new element to end of possibly extended array
    public void add(int element)

    // pre: 0 <= index <= size()
    // post: inserts new value in the array with desired index,
    //       moving elements from index to size()-1 to the right
    public void add(int index, int element)

    // pre: 0 <= where && where < size()
    // post: indicated element is removed and returned, size decreases by 1
    public int remove(int where)

    // post: returns true if there are no elements in the array
    public boolean isEmpty()

    // post: returns the number of elements in the array
    public int size()
}
```
3.1 Use in Mean
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        ArrayIntList data = new ArrayIntList();
        while (!input.isEmpty()) {
            data.add(input.readInt());
        }
        double sum = 0;
        for (int i = 0; i < data.size(); i++) {
            sum += data.get(i);
        }
        System.out.println("the mean is " + sum / data.size());
    }
}
```
- So how is ArrayIntList data magically exactly the size we need it to be?
- The secret is that it's not! It just provides the illusion (abstraction) that it is.
3.2 Constructor
```java
private int[] elementData; // the data
private int elementCount;  // number of elements in the array

// post: constructs an ArrayIntList with capacity for 10 elements
public ArrayIntList() {
    elementData = new int[10];
    elementCount = 0;
}
```
3.3 get, set
```java
// pre: 0 <= index && index < size()
// post: returns the element stored in location index
public int get(int index) {
    return elementData[index];
}

// pre: 0 <= index && index < size()
// post: existing value at index is changed to element; old value is returned
public int set(int index, int element) {
    int previous = elementData[index];
    elementData[index] = element;
    return previous;
}
```
- Like an array, we can read and write elements at any index (a property called random access, meaning we can access elements in any order)
- Also like an array, these methods take a constant number of operations
- In analyzing them, first ask if the number of operations they take depends on the data we are working with (i.e., do they take more operations if there are more elements in the array?)
- Since they always take the same number of operations regardless, we say they are constant time
3.4 add, remove
```java
// post: adds new element to end of possibly extended array
public void add(int element) {
    // ensureCapacity isn't in our API, so it must be a private helper method
    ensureCapacity(elementCount + 1);
    elementData[elementCount] = element;
    elementCount++;
}

// pre: 0 <= index <= size()
// post: inserts new value in the array with desired index,
//       moving elements from index to size()-1 to the right
public void add(int index, int element) {
    ensureCapacity(elementCount + 1);
    // must copy from right to left to avoid destroying data
    for (int i = elementCount; i > index; i--) {
        elementData[i] = elementData[i - 1];
    }
    elementData[index] = element;
    elementCount++;
}
```
- The work of making sure elementData has enough room for another element is offloaded to the private helper method ensureCapacity.
  - This is good style: we can call it from both versions of add without copy-pasting the same code in two places.
Figure 1: The incorrect (a) and correct (b) way of moving values in an array to make room for an inserted value.
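To see concretely why the loop in add(int index, int element) runs from right to left, here is a small standalone sketch. The class ShiftDemo and its two helper methods are hypothetical, written only for this illustration: each applies one copying order to the same partially filled array (4 elements, capacity 5), and then the new value 15 is written into index 1.

```java
import java.util.Arrays;

// Hypothetical demo: the two possible copying orders for opening a gap at
// `index` in an array that currently stores `count` elements.
class ShiftDemo {
    // Incorrect: copies left to right, so each write overwrites a value
    // that has not been moved yet.
    static void shiftLeftToRight(int[] data, int count, int index) {
        for (int i = index; i < count; i++) {
            data[i + 1] = data[i];
        }
    }

    // Correct: copies right to left, exactly like the loop in add(index, element).
    static void shiftRightToLeft(int[] data, int count, int index) {
        for (int i = count; i > index; i--) {
            data[i] = data[i - 1];
        }
    }

    public static void main(String[] args) {
        int[] wrong = {10, 20, 30, 40, 0}; // 4 elements stored, capacity 5
        int[] right = {10, 20, 30, 40, 0};
        shiftLeftToRight(wrong, 4, 1);
        shiftRightToLeft(right, 4, 1);
        wrong[1] = 15; // insert 15 at index 1
        right[1] = 15;
        System.out.println("left to right: " + Arrays.toString(wrong)); // [10, 15, 20, 20, 20]
        System.out.println("right to left: " + Arrays.toString(right)); // [10, 15, 20, 30, 40]
    }
}
```

The left-to-right version copies 20 into every later slot, so 30 and 40 are lost; the right-to-left version moves each value before its old slot is overwritten.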
```java
// pre: 0 <= where && where < size()
// post: indicated element is removed, size decreases by 1
public int remove(int where) {
    int result = get(where);
    elementCount--;
    // shift data to the left to fill in removed element
    while (where < elementCount) {
        elementData[where] = elementData[where + 1];
        where++;
    }
    return result;
}
```
- How could we characterize the performance (i.e., number of operations) when elements of the array are shifted to the left or right?
- Depending on the index, it could take just 1 operation, or a lot more…
- In this kind of analysis, we're typically interested in the worst case (i.e., the most operations it could take)
- The worst case would be when we have to shift the entire array
- So if the array has \(n\) elements, it will take \(n\) shifts in the worst case
- Now, how many operations does each shift take?
- 4: a comparison (i > index), arithmetic (i - 1), assignment (elementData[i] = elementData[i - 1]), and a decrement (i--)
- That is, each shift takes a constant number of operations
- Thus, we say the overall array shift takes linear time, since the expression \(\mathrm{time} = 4n\) is a linear function of the size of our data, \(n\)
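One way to check the linear-time claim is to count the moves directly. The sketch below uses hypothetical helpers, insertAt and removeAt, that copy the shifting loops from add(int index, int element) and remove(int where) onto a plain int array and return how many elements they moved; it is an illustration, not part of the ArrayIntList API.

```java
// Hypothetical instrumented copies of the shifting loops in
// add(int index, int element) and remove(int where).
class ShiftCounter {
    // Same loop as add(index, element). `count` is the number of stored
    // elements; data must have room for count + 1 values. Returns the moves made.
    static int insertAt(int[] data, int count, int index, int element) {
        int shifts = 0;
        for (int i = count; i > index; i--) {
            data[i] = data[i - 1];
            shifts++;
        }
        data[index] = element;
        return shifts;
    }

    // Same loop as remove(where). Returns the moves made.
    static int removeAt(int[] data, int count, int where) {
        int shifts = 0;
        count--; // mirrors elementCount-- before the loop
        while (where < count) {
            data[where] = data[where + 1];
            where++;
            shifts++;
        }
        return shifts;
    }

    public static void main(String[] args) {
        // Worst case (index 0) versus best case (end of the array) as n doubles
        for (int n = 1000; n <= 8000; n *= 2) {
            int front = insertAt(new int[n + 1], n, 0, 42);
            int back = insertAt(new int[n + 1], n, n, 42);
            int removed = removeAt(new int[n], n, 0);
            System.out.println("n = " + n + ": insert at front " + front
                    + ", insert at end " + back + ", remove from front " + removed);
        }
    }
}
```

Doubling n doubles the worst-case count (n moves to insert at index 0, n - 1 to remove from index 0), while inserting at the end moves nothing.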
3.5 isEmpty, size
```java
// post: returns true iff there are no elements in the array
public boolean isEmpty() {
    return size() == 0;
}

// post: returns the number of elements in the array
public int size() {
    return elementCount;
}
```
3.6 ensureCapacity
```java
// post: the capacity of this array is at least minCapacity
// (private: this is a helper for add, not part of our API)
private void ensureCapacity(int minCapacity) {
    // only need to do something if we don't already have the capacity
    if (elementData.length < minCapacity) {
        // initial guess; relies on elementData.length > 0 (the constructor
        // starts at 10), since doubling 0 would loop forever
        int newLength = elementData.length;
        // double the size of our expanded array until it's big enough for minCapacity
        while (newLength < minCapacity) {
            newLength *= 2;
        }
        // guaranteed: newLength > elementData.length
        int[] newElementData = new int[newLength];
        // copy old data to new array
        for (int i = 0; i < elementCount; i++) {
            newElementData[i] = elementData[i];
        }
        elementData = newElementData; // reassign elementData to refer to the new, larger array
    }
}
```
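To watch the doubling rule in action, this hypothetical sketch replays the logic of ensureCapacity for a sequence of add(element) calls, starting from the constructor's capacity of 10. It tracks only the capacity, not the stored data, and the class and method names are made up for the illustration.

```java
// Hypothetical replay of the doubling rule in ensureCapacity.
class CapacityTrace {
    // Capacity after `adds` calls to add(element), starting from initialCapacity
    // (which must be positive, or the doubling loop would never end).
    static int capacityAfterAdds(int initialCapacity, int adds) {
        int capacity = initialCapacity;
        for (int size = 0; size < adds; size++) {
            int minCapacity = size + 1; // what add(element) passes to ensureCapacity
            if (capacity < minCapacity) {
                int newLength = capacity;
                while (newLength < minCapacity) {
                    newLength *= 2;
                }
                capacity = newLength;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        int previous = 10; // the constructor's starting capacity
        for (int adds = 1; adds <= 100; adds++) {
            int capacity = capacityAfterAdds(10, adds);
            if (capacity != previous) {
                System.out.println("add #" + adds + ": capacity " + previous + " -> " + capacity);
                previous = capacity;
            }
        }
    }
}
```

Only the 11th, 21st, 41st, and 81st additions trigger a resize (to 20, 40, 80, and 160); every other add just writes into a free slot.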
- Why double the size of the array?
- It has the nice property of making these capacity increases average out to constant time per element over \(n\) additions to the array
- Suppose, for neatness only, that \(n\) is a power of 2, and that the array started with a capacity of 1.
- What do we know? When the array was extended from capacity 1 to capacity 2, one element was copied.
- When the array was extended from capacity 2 to capacity 4, two elements were copied.
- When the array was extended from capacity 4 to capacity 8, four elements were copied.
- This continues until the last extension, when the array had its capacity extended from \(\frac{n}{2}\) to \(n\). Then \(\frac{n}{2}\) elements had to be preserved.
- The total number of times elements were copied is \(1 + 2 + 4 + \cdots + \frac{n}{2} = n - 1\)
- \(n - 1\) copies to support an array of size \(n\) works out to an average of approximately 1 copy per element
- Thus, there is a constant overhead in supporting each element of an array extended in this way.
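The counting argument can also be checked by simulation. This hypothetical sketch follows the same assumptions as above (capacity starts at 1 and doubles whenever an add finds the array full) and tallies how many elements get copied in total.

```java
// Hypothetical simulation of the copy count for a doubling array.
class CopyCounter {
    // Total element copies made while adding n elements one at a time,
    // starting at capacity 1 and doubling whenever the array is full.
    static long copiesForAdds(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) { // full: this add needs size + 1 slots
                copies += size;     // every existing element moves to the new array
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 16; n <= (1 << 20); n <<= 4) {
            long copies = copiesForAdds(n);
            System.out.printf("n = %d: %d copies, %.4f per element%n",
                    n, copies, (double) copies / n);
        }
    }
}
```

For each power of 2 the total is exactly \(n - 1\), so the copies per element stay just under 1 no matter how large \(n\) gets.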
3.7 Do we need ArrayDoubleList, ArrayStringList, etc.?
What about an ArrayObjectList? It would need casts on every access to an array element:

```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        ArrayObjectList data = new ArrayObjectList();
        while (!input.isEmpty()) {
            // int automatically turned into an object (Integer);
            // in Java, this automatic behavior is called "autoboxing"
            data.add(input.readInt());
        }
        double sum = 0;
        for (int i = 0; i < data.size(); i++) {
            // we would have to cast to int or Integer every time we retrieve an element
            sum += (int) data.get(i);
        }
        System.out.println("the mean is " + sum / data.size());
    }
}
```
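Besides the clutter, a cast like this is only checked when the program runs. In the hypothetical sketch below, a plain Object[] stands in for an ArrayObjectList's storage: the compiler accepts a String among the numbers, and the mistake surfaces only as a ClassCastException at run time. A generic ArrayList<Integer> would reject the String at compile time instead.

```java
// Hypothetical helper: values are stored as plain Objects, the way an
// ArrayObjectList would store them, and cast back when they are read.
class CastDemo {
    static int sumAsInts(Object[] items) {
        int sum = 0;
        for (Object item : items) {
            sum += (Integer) item; // the cast is only checked when this line runs
        }
        return sum;
    }

    public static void main(String[] args) {
        Object[] good = {1, 2, 3};  // autoboxed to Integer
        Object[] bad = {1, 2, "3"}; // the compiler accepts a String here
        System.out.println(sumAsInts(good)); // 6
        try {
            System.out.println(sumAsInts(bad));
        } catch (ClassCastException e) {
            System.out.println("caught at run time: " + e.getClass().getSimpleName());
        }
    }
}
```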
Instead, create a generic ArrayList<E> that will do all the necessary casting for the user:

```java
public class ArrayList<E> {
    private E[] elementData;
    private int elementCount;

    // post: constructs an array with capacity for 10 elements
    @SuppressWarnings("unchecked") // Java cannot create an E[] directly
    public ArrayList() {
        elementData = (E[]) new Object[10];
        elementCount = 0;
    }

    // pre: 0 <= index && index < size()
    // post: returns the element stored in location index
    public E get(int index) {
        return elementData[index];
    }

    // pre: 0 <= index && index < size()
    // post: existing value at index is changed to element; old value is returned
    public E set(int index, E element) {
        E previous = elementData[index];
        elementData[index] = element;
        return previous;
    }

    // post: adds new element to end of possibly extended array
    public void add(E element) {
        ensureCapacity(elementCount + 1);
        elementData[elementCount] = element;
        elementCount++;
    }

    // pre: 0 <= index <= size()
    // post: inserts new value in the array with desired index,
    //       moving elements from index to size()-1 to the right
    public void add(int index, E element) {
        ensureCapacity(elementCount + 1);
        // must copy from right to left to avoid destroying data
        for (int i = elementCount; i > index; i--) {
            elementData[i] = elementData[i - 1];
        }
        elementData[index] = element;
        elementCount++;
    }

    // pre: 0 <= where && where < size()
    // post: indicated element is removed and returned, size decreases by 1
    public E remove(int where) {
        E result = get(where);
        elementCount--;
        // shift data to the left to fill in removed element
        while (where < elementCount) {
            elementData[where] = elementData[where + 1];
            where++;
        }
        elementData[elementCount] = null; // drop the stale reference so it can be garbage collected
        return result;
    }

    // post: returns true iff there are no elements in the array
    public boolean isEmpty() {
        return size() == 0;
    }

    // post: returns the number of elements in the array
    public int size() {
        return elementCount;
    }

    // post: the capacity of this array is at least minCapacity
    @SuppressWarnings("unchecked")
    private void ensureCapacity(int minCapacity) {
        // only need to do something if we don't already have the capacity
        if (elementData.length < minCapacity) {
            int newLength = elementData.length; // initial guess
            // double the size of our expanded array until it's big enough for minCapacity
            while (newLength < minCapacity) {
                newLength *= 2;
            }
            // guaranteed: newLength > elementData.length
            E[] newElementData = (E[]) new Object[newLength];
            // copy old data to new array
            for (int i = 0; i < elementCount; i++) {
                newElementData[i] = elementData[i];
            }
            elementData = newElementData; // reassign elementData to refer to the new, larger array
        }
    }
}
```
```java
import edu.princeton.cs.algs4.In;

public class Mean {
    public static void main(String[] args) {
        In input = new In("data.txt");
        // compute the mean of the values in the file
        // when we specify a type for a generic structure like this, it must be an object,
        // so we have to use Integer instead of int;
        // fortunately, Java will automatically convert between int and Integer as needed
        ArrayList<Integer> data = new ArrayList<Integer>();
        while (!input.isEmpty()) {
            data.add(input.readInt());
        }
        double sum = 0;
        for (int i = 0; i < data.size(); i++) {
            sum += data.get(i);
        }
        System.out.println("the mean is " + sum / data.size());
    }
}
```
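For comparison, the standard library's java.util.ArrayList supports the same add, get, and size operations. The sketch below is a hypothetical variant of Mean that avoids the algs4 In class by reading whitespace-separated ints with java.util.Scanner. It reads from a String only so that the example is self-contained; new Scanner(new File("data.txt")) would read the file instead, at the cost of handling FileNotFoundException.

```java
import java.util.Scanner;

// Hypothetical variant of Mean built on java.util.ArrayList and java.util.Scanner.
class StdlibMean {
    static double mean(Scanner input) {
        // fully qualified to avoid confusion with the generic ArrayList<E> written above
        java.util.List<Integer> data = new java.util.ArrayList<>();
        while (input.hasNextInt()) {
            data.add(input.nextInt()); // int autoboxed to Integer
        }
        double sum = 0;
        for (int i = 0; i < data.size(); i++) {
            sum += data.get(i); // Integer auto-unboxed to int
        }
        return sum / data.size(); // NaN if no values were read (0.0 / 0)
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("4\n8\n15\n16\n23\n42\n");
        System.out.println("the mean is " + mean(input)); // the mean is 18.0
    }
}
```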
4 Practice Problems
- What is the distinction between the capacity and size of an ArrayList?
- When inserting a value into an ArrayList, why is it necessary to shift elements to the right starting at the high end of the ArrayList? (See Figure 1.)
- Write the code to declare an ArrayList containing these five strings: "It", "was", "a", "stormy", "night". What is the size of the list? What is its type?
- Write code to insert two additional elements, "dark" and "and", at the proper places in the list to produce the following ArrayList as the result: ["It", "was", "a", "dark", "and", "stormy", "night"]
- Write code to change the second element's value to "IS", producing the following ArrayList as the result: ["It", "IS", "a", "dark", "and", "stormy", "night"]
- Write code to print out all Strings from the list that don't contain the letter "a". Your code should print: It IS stormy night
- The implementation of java.util.ArrayList makes the ensureCapacity method public. Why is this useful?
- Write an ArrayIntList method, indexOf, that returns the index of an int in the ArrayIntList. What should the method return if no matching int can be found? What does java.util.ArrayList do in this case? How long does this operation take to perform, in the worst case?
Solutions:
- The size is the number of storage locations logically available in the ArrayList. Usually, the size corresponds to the number of items stored within the ArrayList. The capacity is the number of memory locations currently allocated to the ArrayList. The capacity indicates how large the ArrayList's size can get before the underlying array must be reallocated and copied over. Ideally, the capacity provides enough room so that the size can increase without a significant risk of reallocation.
- This avoids destroying data. Starting at the low end of the ArrayList causes each moved value to overwrite an unmoved value.
- The list's type is ArrayList<String> and its size is 5.
```java
ArrayList<String> list = new ArrayList<String>();
list.add("It");
list.add("was");
list.add("a");
list.add("stormy");
list.add("night");
```
- Inserting "dark" at index 3 shifts "stormy" and "night" to the right; "and" then goes at index 4:
```java
list.add(3, "dark");
list.add(4, "and");
```
- The second element is at index 1:
```java
list.set(1, "IS");
```
- Check each element with String.contains:
```java
for (int i = 0; i < list.size(); i++) {
    if (!list.get(i).contains("a")) {
        System.out.println(list.get(i));
    }
}
```
- Making this method public allows the user to increase the capacity of the ArrayList ahead of actually inserting elements. This decreases (or eliminates) the amount of copying that occurs.
- Like Java's ArrayList, this method returns -1 if the element cannot be found. In the worst case, this operation takes linear time, since it has to check each element once to verify that a matching value isn't in the array.
```java
public int indexOf(int elem) {
    for (int i = 0; i < elementCount; i++) {
        if (elem == elementData[i]) {
            return i;
        }
    }
    return -1;
}
```