Maps

1 Reading

Read Bailey Sections 15.2 and 15.3.

2 Motivation

Looking up associated data is something we ask computers to do all the time. You click on a book on the library catalog web page and it tells you which shelf to find it on. You scan your One Card at Sayles and a computer accesses your Schillers account. Any kind of directory or contact list associates names with contact information.

One such use for computers that I'm proud to announce today is my campaign to be the next governor of Minnesota! I plan to barnstorm the state on a platform of efficient data structures for all. As part of this campaign I will need a list of the 1000s (or 10,000s or millions, mwhahaha) of volunteers to spread my big-O message. Perhaps this list will associate names with objects of a Volunteer class that contain a bunch of data about that volunteer (phone number, email, town they live in, etc.). This week we will learn how to implement an efficient data structure for this situation. Today we'll cover the Map abstract data type and make a first attempt at implementing it using the techniques we've studied so far.

3 Learning Goals

After this lesson you will be able to

  • Use a map-like data structure to store key-value pairs
  • Explain why the data structures we've studied so far can't provide an efficient map implementation

4 Map ADT

When choosing a data structure, an important first step is thinking about what operations you will need. For my campaign volunteer list, I will need

  • a way to add a volunteer's name and email to the list
  • a way to check if someone is already on the list
  • a way to look up a volunteer's email given their name
  • and a way to remove a volunteer from the list

These turn out to be very common operations when dealing with associated data like this. In computer science, we often call the associated data (volunteer name and volunteer contact info in this example) key-value pairs. The key (e.g., name) is like the index of an array: it's how we look up a value we've stored. Also like array indexes, keys must be unique. It would be a problem if an array had two different locations that were both index 0, because then you wouldn't know which value to retrieve when accessing that index. Likewise, each key must be unique because otherwise we wouldn't know which associated value to retrieve.

So let's formally define the operations I listed in terms of an abstract data type: the Map. Our Map will hold keys of type K associated with values of type V. K and V are generic placeholders that will get filled in with the types stored in a particular Map.

// add an association between key and value to the map
// if this key is already present in the map, replace the existing value
// and return the old one (return null if the key wasn't already present)
public V put(K key, V value)

// returns true if key was previously put into the map
public boolean contains(K key)

// returns the value associated with key
public V get(K key)

// removes key from the map and returns the value that was associated with it
public V remove(K key)

Like the Stack and Queue ADTs, the Map ADT specifies the external behavior of these methods, but says nothing about how they are implemented.
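
For comparison, Java's standard library interface java.util.Map documents much the same external behavior under slightly different names (containsKey rather than contains). The sketch below only illustrates the contract described above. It uses the library's HashMap, not anything we build in this lesson, and the class name MapContractDemo is made up for the example. One difference worth noting: java.util.Map returns null from get and remove when the key is absent.

```java
import java.util.HashMap;
import java.util.Map;

class MapContractDemo {
    // Exercises the four operations on a library map and returns the
    // result of each call, in order, separated by spaces.
    static String run() {
        Map<String, String> emails = new HashMap<>();
        StringBuilder out = new StringBuilder();
        // first put for this key: there is no old value, so put returns null
        out.append(emails.put("Abe Lincoln", "abe@carleton.edu")).append(' ');
        // second put for the same key: replaces the value, returns the old one
        out.append(emails.put("Abe Lincoln", "lincoln@carleton.edu")).append(' ');
        out.append(emails.containsKey("Abe Lincoln")).append(' ');
        out.append(emails.get("Abe Lincoln")).append(' ');
        out.append(emails.remove("Abe Lincoln")).append(' ');
        out.append(emails.containsKey("Abe Lincoln"));
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: null abe@carleton.edu true lincoln@carleton.edu lincoln@carleton.edu false
        System.out.println(run());
    }
}
```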

5 Using a Map

Before we get to implementing the Map ADT, let's look at how it might be used. I've written the following main method to read in a file of volunteer information and store it in a map.

public static void main(String[] args) {
    ListMap<String, Volunteer> volunteers = new ListMap<>(); // fills in K with String and V with Volunteer
    In csv = new In("volunteers.csv");
    while (csv.hasNextLine()) {
        String line = csv.readLine();
        String[] fields = line.split(",");
        volunteers.put(fields[0], new Volunteer(fields[0], fields[1], fields[2], fields[3]));
    }
    System.out.println(volunteers);
}

The file I'm reading in, volunteers.csv, is a comma-separated values (CSV) file:

Abe Lincoln,1111111111,abe@carleton.edu,Northfield
Ilhan Omar,2222222222,ilhan@carleton.edu,Minneapolis
Cleopatra,3333333333,cleo@carleton.edu,St. Cloud
Julius Caesar,4444444444,jc2@carleton.edu,Duluth
Stevie P,5555555555,sposkanzer@carleton.edu,Northfield

Each line of the file is information about one volunteer. The different pieces of information (name, phone number, email, and town of residence) are separated by commas, hence comma-separated values. I also created a Volunteer class to hold this information about each volunteer:

public class Volunteer {
    private String name;
    private String phone;
    private String email;
    private String town;

    public Volunteer(String name, String phone, String email, String town) {
        this.setName(name);
        this.setPhone(phone);
        this.setEmail(email);
        this.setTown(town);
    }

    public String getTown() {
        return town;
    }

    public void setTown(String town) {
        this.town = town;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    public String getPhone() {
        return phone;
    }

    public void setPhone(String phone) {
        this.phone = phone;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String toString() {
        return name + ": " + phone + " (phone) " + email + " (email), lives in " + town; 
    }
}

I declared my ListMap (we'll get to its implementation shortly) as

ListMap<String, Volunteer> volunteers = new ListMap<>();

to make the keys have type String and values have type Volunteer. Inside the loop over the lines of the input file, I use

volunteers.put(fields[0], new Volunteer(fields[0], fields[1], fields[2], fields[3]));

to add a new key-value pair to the map with the first column of the file (fields[0]) associated with a new Volunteer object constructed with the data from that line of the file.
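
To make the indexing concrete, here is a small sketch of what split produces for one line of volunteers.csv. It assumes, as the sample file does, that no field contains a comma. The class name SplitDemo and the helper fieldsOf are hypothetical.

```java
class SplitDemo {
    // Splits one CSV line into its fields.
    // Assumption: no field contains a comma of its own.
    static String[] fieldsOf(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] fields = fieldsOf("Abe Lincoln,1111111111,abe@carleton.edu,Northfield");
        // fields[0] is the name (our key); fields[1], fields[2], fields[3]
        // are the phone, email, and town passed to the Volunteer constructor
        for (int i = 0; i < fields.length; i++) {
            System.out.println("fields[" + i + "] = " + fields[i]);
        }
    }
}
```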

6 Association

Before we can implement the Map ADT in our ListMap class, we need some way of representing these key-value pairs of associated data. Unlike Python, Java does not provide a convenient built-in tuple type for this purpose. So we'll need to create our own! Like the ListMap, it will also need to be parameterized with K and V, so any types can be filled in for the keys and values.

public class Association<K,V> {
    private K key;
    private V value;

    public Association(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K getKey() {
        return key;
    }

    public V getValue() {
        return value;
    }
}
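
As a quick illustration of how K and V get filled in, the sketch below builds one Association and reads it back. The Association class is repeated (without the public modifier) only so the example compiles on its own; AssociationDemo is a made-up name.

```java
class Association<K, V> {
    private K key;
    private V value;

    Association(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() { return key; }
    V getValue() { return value; }
}

class AssociationDemo {
    static String describe() {
        // K is filled in with String and V with Integer
        Association<String, Integer> pair = new Association<>("foo", 5);
        String key = pair.getKey();   // no cast needed: getKey returns a String
        int value = pair.getValue();  // the Integer is unboxed to an int
        return key + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints foo=5
    }
}
```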

7 ListMap

Now for the main event! Based on data structures we've studied so far, a linked list seems like a good choice for storing a collection of Associations where we might remove from other places besides the end of the list. So we'll begin our implementation declaring a linked list of associations as a private field and having our constructor initialize it:

import java.util.LinkedList;

public class ListMap<K, V> {
    private LinkedList<Association<K, V>> items;

    public ListMap() {
        items = new LinkedList<>();
    }
}

It might look a little convoluted, but LinkedList<Association<K, V>> accurately describes what we want our list of key-value pairs to be. Namely, a list of Association objects, which themselves represent keys of type K associated with values of type V. When we then declare ListMap<String, Volunteer> in our main method, this will fill in String as the key type and Volunteer as the value type for our associations.

The first Map ADT operation we'll implement is put, which adds an entry to our map. We might think all we need to do is add a new association to our list:

public V put(K key, V value) {
    items.add(new Association<K, V>(key, value));
}

There are two problems with this. First, put needs to return a value (the old value associated with key or null), so this code won't compile without a return statement. Second, keys are supposed to be unique, and this implementation does nothing to prevent multiple associations with the same key being in the list at the same time. The solution is to search the list for key and remove a matching association if we find one. Since items isn't sorted, we'll use a for-each loop to perform a linear search:

public V put(K key, V value) {
    // search for an existing entry for key
    for (Association<K, V> a : items) {
        if (a.getKey().equals(key)) {
            // found an existing entry, remove it and return the old value
            // (we return right away, so the loop never continues over the modified list)
            items.remove(a);
            items.add(new Association<K, V>(key, value));
            return a.getValue();
        }
    }
    // no existing key, add new one and return null
    items.add(new Association<K, V>(key, value));
    return null;
}
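
To see why the search matters, here is a rough sketch of what the one-line put would leave behind if the same key were added twice. Pair is a simplified stand-in for Association, and NaivePutDemo and countEntriesFor are hypothetical names used only for this illustration.

```java
import java.util.LinkedList;

class NaivePutDemo {
    // Simplified stand-in for Association<String, Integer>.
    static class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    // Counts how many entries in the list have the given key.
    static int countEntriesFor(LinkedList<Pair> items, String key) {
        int count = 0;
        for (Pair p : items) {
            if (p.key.equals(key)) {
                count++;
            }
        }
        return count;
    }

    static int run() {
        LinkedList<Pair> items = new LinkedList<>();
        items.add(new Pair("foo", 5)); // naive put: add without searching first
        items.add(new Pair("foo", 7)); // same key again, old entry still there
        return countEntriesFor(items, "foo");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2: the key is no longer unique
    }
}
```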

With put squared away, we can move on to contains. To find whether a key currently exists in the map, we'll again perform a linear search for a match:

public boolean contains(K key) {
    // search for an existing entry for key
    for (Association<K, V> a : items) {
        if (a.getKey().equals(key)) {
...
ALARM BELLS RINGING

As I typed that out, alarm bells started going off in my head. I was typing the exact same code I'd just typed in another method. Duplicating code like this is often poor design. When you find yourself duplicating a chunk of code between methods, your first thought should be to see whether you can create a helper method to handle that part of the method. In this case, we can avoid having the same loop in both contains and put (and, as we'll see, get and remove as well) by creating a findKey helper method that searches the list for a particular key:

private Association<K, V> findKey(K key) {
    for (Association<K, V> a : items) {
        if (a.getKey().equals(key)) {
            return a;
        }
    }
    return null;
}

This method either returns a matching Association or null if no matches were found. We can now rewrite put to use findKey:

public V put(K key, V value) {
    // search for an existing entry for key
    Association<K, V> a = findKey(key);
    if (a != null) {
        // found an existing entry, remove it and return the old value
        items.remove(a);
        items.add(new Association<K, V>(key, value));
        return a.getValue();
    }
    // no existing key, add new one and return null
    items.add(new Association<K, V>(key, value));
    return null;
}

contains turns out to be very short once we have findKey. If findKey returns a non-null value, this means key exists in the map. Since this is exactly what contains is supposed to determine, we can write:

public boolean contains(K key) {
    return findKey(key) != null; // findKey returns null if key isn't there
}

Our two remaining Map ADT operations, get and remove, are almost identical. They both find an existing key and return the associated value. The only difference is that remove also removes the key-value pair from the map. For my implementations, I chose to have them throw a NoSuchElementException (imported from java.util) if the key isn't in the map.

public V get(K key) {
    Association<K, V> a = findKey(key);
    if (a == null) {
        throw new NoSuchElementException();
    }
    return a.getValue();
}

public V remove(K key) {
    Association<K, V> a = findKey(key);
    if (a == null) {
        throw new NoSuchElementException();
    }
    items.remove(a);
    return a.getValue();
}
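
Putting the pieces together, the whole class might look roughly like the sketch below. It follows the methods above, with put condensed slightly but behaving the same way. Two things are assumptions added for the example rather than part of the lesson's code: the toString method, which formats the map like {bar=2, foo=5}, and the ListMapDemo class. Association is repeated so the block compiles on its own.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

class Association<K, V> {
    private final K key;
    private final V value;
    Association(K key, V value) { this.key = key; this.value = value; }
    K getKey() { return key; }
    V getValue() { return value; }
}

class ListMap<K, V> {
    private final LinkedList<Association<K, V>> items = new LinkedList<>();

    // Linear search: returns the matching association, or null if none.
    private Association<K, V> findKey(K key) {
        for (Association<K, V> a : items) {
            if (a.getKey().equals(key)) {
                return a;
            }
        }
        return null;
    }

    public V put(K key, V value) {
        Association<K, V> old = findKey(key);
        if (old != null) {
            items.remove(old); // keep keys unique
        }
        items.add(new Association<>(key, value));
        return old == null ? null : old.getValue();
    }

    public boolean contains(K key) {
        return findKey(key) != null;
    }

    public V get(K key) {
        Association<K, V> a = findKey(key);
        if (a == null) {
            throw new NoSuchElementException();
        }
        return a.getValue();
    }

    public V remove(K key) {
        Association<K, V> a = findKey(key);
        if (a == null) {
            throw new NoSuchElementException();
        }
        items.remove(a);
        return a.getValue();
    }

    // Hypothetical addition: prints entries in list order, e.g. {bar=2, foo=5}
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{");
        for (Association<K, V> a : items) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(a.getKey()).append("=").append(a.getValue());
        }
        return sb.append("}").toString();
    }
}

class ListMapDemo {
    public static void main(String[] args) {
        ListMap<String, Integer> m = new ListMap<>();
        m.put("bar", 2);
        m.put("foo", 4);
        System.out.println(m.put("foo", 5)); // prints 4, the replaced value
        System.out.println(m);               // prints {bar=2, foo=5}
    }
}
```

Without a toString like this one, printing a ListMap falls back to the default inherited from Object, which is not a readable listing of the entries.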

7.1 Analysis

As usual, a new data structure implementation needs to be analyzed for big-O running time. A key observation is that all four of our Map operations call findKey, so analyzing this helper method is going to be important. Recall that we model a method call as however much work the code within that method does. Conveniently, findKey performs an operation we've previously analyzed: linear search. This means that each of our operations is at least \(O(n)\) due to the findKey call.

What other non-constant operations are involved in our methods? The only one is items.remove (since we know adding to the end of a linked list with items.add is \(O(1)\)). We know from Lab 2 that the linked list method that takes an object in the list to remove involves a linear search through the list. So, put and remove contain this additional \(O(n)\) method call. This second linear time call doesn't change the overall efficiency of \(O(n)\) (i.e., \(O(n) + O(n)\) is still \(O(n)\) since we ignore a constant factor of 2).
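
One informal way to see the linear growth is to count comparisons. The sketch below imitates the loop in findKey on a list of integer keys and counts the equals calls. LinearSearchCost and comparisons are hypothetical names, and the counts illustrate the worst case (key not present) rather than measuring actual running time.

```java
import java.util.LinkedList;

class LinearSearchCost {
    // Counts how many equals() comparisons a findKey-style linear search
    // makes when looking for target in a list holding the keys 0..n-1.
    static int comparisons(int n, int target) {
        LinkedList<Integer> keys = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            keys.add(i);
        }
        int count = 0;
        for (Integer k : keys) {
            count++;
            if (k.equals(target)) {
                break; // found it, stop searching
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Worst case: -1 is never a key, so every entry gets compared.
        // Doubling n doubles the number of comparisons.
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println(n + " entries -> " + comparisons(n, -1) + " comparisons");
        }
    }
}
```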

When we analyzed ArrayList and LinkedList, some operations were \(O(n)\), but others were constant time. ListMap is significantly worse, with all four operations at \(O(n)\). We'll need a new technique for organizing data if we want a Map implementation with better performance. This new technique is called hashing and will be the subject of the next two lessons.

8 Practice Problems (solutions in footnote 1)

For these practice problems, {bar=2, foo=5} indicates a map with string keys "bar" and "foo" associated with integer values 2 and 5.

  1. Consider the method below:

    public static ListMap<String, String> mystery1(ListMap<String, Integer> map1, ListMap<Integer, String> map2) {
        ListMap<String, String> result = new ListMap<String, String>();
        for (String k1 : map1.keyList()) {
            if (map2.contains(map1.get(k1))) {
                result.put(k1, map2.get(map1.get(k1)));
            }
        }
        return result;
    }
    

    It uses a ListMap method keyList that returns a list of the keys in the map. It has the following implementation:

    public List<K> keyList() {
        List<K> list = new ArrayList<>();
        for (Association<K, V> a : items) {
            list.add(a.getKey());
        }
        return list;
    }
    

    What does mystery1 return when called with the following arguments?

    1. map1={bar=1, baz=2, foo=3, mumble=4}, map2={1=earth, 2=wind, 3=air, 4=fire}
    2. map1={five=105, four=104, one=101, six=106, three=103, two=102}, map2={99=uno, 101=dos, 103=tres, 105=cuatro}
    3. map1={a=42, b=9, c=7, d=15, e=11, f=24, g=7}, map2={1=four, 3=score, 5=and, 7=seven, 9=years, 11=ago}
  2. Consider the method below:

    public static void mystery2(ListMap<String, String> map) {
        ArrayList<String> list = new ArrayList<String>();
        for (String key : map.keyList()) {
            if (map.get(key).length() > key.length()) {
                list.add(map.get(key));
            } else {
                list.add(0, key);
                list.remove(map.get(key));
            }
        }
        System.out.println(list);
    }
    

    What is printed when mystery2 is run with the following arguments? Assume that keyList() returns the keys in the order they are shown below.

    1. {horse=cow, cow=horse, dog=cat, ok=yo}
    2. {bye=hello, bird=dog, hi=hello, hyena=apple, fruit=meat}
    3. {a=b, c=d, e=a, ff=a, gg=c, hhh=ff}

Footnotes:

1 Solutions:

  1. mystery1 returns:
    1. {bar=earth, baz=wind, foo=air, mumble=fire}
    2. {five=cuatro, one=dos, three=tres}
    3. {b=years, c=seven, e=ago, g=seven}
  2. mystery2 prints:
    1. [ok, dog, horse, horse]
    2. [fruit, hyena, bird, hello, hello]
    3. [hhh, gg, e]