Maps
Table of Contents
1 Reading
Read Bailey Sections 15.2 and 15.3.
2 Motivation
Looking up associated data is something we ask computers to do all the time. You click on a book on the library catalog web page and it tells you which shelf to find it on. You scan your One Card at Sayles and a computer accesses your Schillers account. Any kind of directory or contact list associates names with contact information.
One such use for computers that I'm proud to annouce today is my campaign to be the next governor of Minnesota!
I plan to barnstorm the state on a platform of efficient data structures for all.
As part of this campaign I will need a list of the 1000s (or 10,000s or millions mwhahaha) of volunteers to spread my big-O message.
Perhaps this list will associate names with objects of a Volunteer
class that contain a bunch of data about that volunteer (phone number, email, town they live in, etc.).
This week we will learn about how to implement an efficient data structure for this situation.
Today we'll cover the Map abstract data type and make a first attempt at implementing it using the techniques we've studied so far.
3 Learning Goals
After this lesson you will be able to
- Use a map-like data structure to store key-value pairs
- Explain why the data structures we've studied so far can't provide an efficient map implementation
4 Map ADT
When choosing a data structure, an important first step is thinking about what operations you will need. For my campaign volunteer list, I will need
- a way to add a volunteer's name and email to the list
- a way to check if someone is already on the list
- a way to look up a volunteer's email given their name
- and a way to remove a volunteer from the list
These turn out to be very common operations when dealing with associated data like this. In computer science, we often call the associated data (volunteer name and volunteer contact info in this example) key-value pairs. The key (e.g., name) is like the index of an array—it's how we lookup a value we've stored. Also like array indexes, keys must be unique. It would be a problem if an array has two different locations that were both index 0 because then you wouldn't know which value to retrieve when accessing that index. Likewise, each key must be unique because otherwise we wouldn't known which associated value to retrieve.
So let's formally define the operations I listed in terms of an abstract data type: the Map.
Our Map will hold keys of type K
associated with values of type V
.
K
and V
are generic placeholders that will get filled in with the types stored in a particular Map.
// add an association between key and value to the map // if this key is already present in the map, replace the existing value // and return the old one (return null if the key wasn't already present) public V put(K key, V value) // returns true if key was previously put into the map public boolean contains(K key) // returns the value associated with key public V get(K key) // removes key from the map and returns the value that was associated with it public V remove(K key)
Like the Stack and Queue ADTs, the Map ADT specifies the external behavior of these methods, but says nothing about how they are implemented.
5 Using a Map
Before we get to implementing the Map ADT, let's look at how it might be used.
I've written the following main
method to read in a file of volunteer information and store it in a map.
public static void main(String[] args) { ListMap<String, Volunteer> volunteers = new ListMap<>(); // fills in K with String and V with Volunteer In csv = new In("volunteers.csv"); while (csv.hasNextLine()) { String line = csv.readLine(); String[] fields = line.split(","); volunteers.put(fields[0], new Volunteer(fields[0], fields[1], fields[2], fields[3])); } System.out.println(volunteers); }
The file I'm reading in, volunteers.csv
, is a comma-separated values (CSV) file:
Abe Lincoln,1111111111,abe@carleton.edu,Northfield Ilhan Omar,2222222222,ilhan@carleton.edu,Minneapolis Cleopatra,3333333333,cleo@carleton.edu,St. Cloud Julius Caesar,4444444444,jc2@carleton.edu,Duluth Stevie P,5555555555,sposkanzer@carleton.edu,Northfield
Each line of the file is information about one volunteer.
The different pieces of information (name, phone number, email, and town of residence) are separated by commas—hence comma-serparated values.
I also created a Volunteer
class to hold this information about each volunteer:
public class Volunteer { private String name; private String phone; private String email; private String town; public Volunteer(String name, String phone, String email, String town) { this.setName(name); this.setPhone(phone); this.setEmail(email); this.setTown(town); } public String getTown() { return town; } public void setTown(String town) { this.town = town; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } public String getPhone() { return phone; } public void setPhone(String phone) { this.phone = phone; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String toString() { return name + ": " + phone + " (phone) " + email + " (email), lives in " + town; } }
I declared my ListMap
(we'll get to its implementation shortly) as
ListMap<String, Volunteer> volunteers = new ListMap<>();
to make the keys have type String
and values have type Volunteer
.
Inside the loop over the lines of the input file, I use
volunteers.put(fields[0], new Volunteer(fields[0], fields[1], fields[2], fields[3]));
to add a new key-value pair to the map with the first column of the file (fields[0]
) associated with a new Volunteer
object constructed with the data from that line of the file.
6 Association
Before we can implement the Map ADT in our ListMap
class, we need some way of representing these key-value pairs of associated data.
Unlike Python, Java does not provide a convenient built-in tuple type for this purpose.
So we'll need to create our own!
Like the ListMap
, it will also need to be parameterized with K
and V
, so any types can be filled in for the keys and values.
public class Association<K,V> { private K key; private V value; public Association(K key, V value) { this.key = key; this.value = value; } public K getKey() { return key; } public V getValue() { return value; } }
7 ListMap
Now for the main event!
Based on data structures we've studied so far, a linked list seems like a good choice for storing a collection of Associations
where we might remove from other places besides the end of the list.
So we'll begin our implementation declaring a linked list of associations as a private field and having our constructor initialize it:
import java.util.LinkedList; public class ListMap<K, V> { private LinkedList<Association<K, V>> items; public ListMap() { items = new LinkedList<>(); } }
It might look a little convoluted, but LinkedList<Association<K, V>>
accurately describes what we want our list of key-value pairs to be.
Namely, a list of Association
objects, which themselves represent keys of type K
associated with values of type V
.
When we then declare ListMap<String, Volunteer>
in our main
method, this will fill in String
as the key type and Volunteer
as the value type for our associations.
The first Map ADT operation we'll implement is put
that adds and entry to our map.
We might think all we need to do is add a new association to our list:
public V put(K key, V value) { items.add(new Association<K, V>(key, value)); }
There are two problems with this.
First, put
needs to return a value (the old value associated with key
or null
), so this code won't compile without a return
statement.
Second, keys are supposed to be unique, and this implemention does nothing to prevent multiple associations with same key being in the list at the same time.
The solution is to search the list for key
and remove a matching association if we find one.
Since items
isn't sorted, we'll use a for-each loop to perform a linear search:
public V put(K key, V value) { // search for an existing entry for key for (Association<K, V> a : items) { if (a.getKey().equals(key)) { // found an existing entry, remove it and return the old value items.remove(a); items.add(new Association<K, V>(key, value)); return a.getValue(); } } // no existing key, add new one and return null items.add(new Association<K, V>(key, value)); return null }
With put
squared away, we can move on to contains
.
To find whether a key currently exists in the map, we'll again perform a linear search for a match
public boolean contains(K key) { // search for an existing entry for key for (Association<K, V> a : items) { if (a.getKey().equals(key)) { ... ALARM BELLS RINGING
As I typed that out, alarm bells started going off in my head.
I was typing the exact same code I'd just typed in another method.
Duplicating code like this is often poor design.
When you find yourself duplicating a chunk of code between methods, your first thought should be to see whether you can create a helper method to handle that part of the method.
In this case, we can avoid having the same loop in both contains
and put
(and, as we'll see, get
and remove
as well) by creating a findKey
helper method that searches the list for a particular key:
private Association<K, V> findKey(K key) { for (Association<K, V> a : items) { if (a.getKey().equals(key)) { return a; } } return null; }
This method either returns a matching Association
or null
if no matches were found.
We can now rewrite put
to use findKey
:
public V put(K key, V value) { // search for an existing entry for key Association<K, V> a = findKey(key); if (a != null) { // found an existing entry, remove it and return the old value items.remove(a); items.add(new Association<K, V>(key, value)); return a.getValue(); } // no existing key, add new one and return null items.add(new Association<K, V>(key, value)); return null; }
contains
turns out to be very short once we have findKey
.
If findKey
returns a non-null value, this means key
exists in the map.
Since this exactly is what contains
is supposed to determine, we can write:
public boolean contains(K key) { return findKey(key) != null; // findKey returns null if key isn't there }
Our two remaining Map ADT operations, get
and remove
, are almost identical.
They both find an existing key and return the associated value.
The only difference is that remove
also removes the key-value pair from the map.
For my implementations, I choose to have them throw a NoSuchElementException
if the key isn't in the map.
public V get(K key) { Association<K, V> a = findKey(key); if (a == null) { throw new NoSuchElementException(); } return a.getValue(); } public V remove(K key) { Association<K, V> a = findKey(key); if (a == null) { throw new NoSuchElementException(); } items.remove(a); return a.getValue(); }
7.1 Analysis
As usual, a new data structure implementation needs to be analyzed for big-O running time.
A key observation is that all four of our Map operations call findKey
, so analyzing this helper method is going to be important.
Recall that we model a method call as however much work the code within that method does.
Conveniently, findKey
performs an operation we've previous analyzed: linear search.
This means that each of our operations is at least \(O(n)\) due to the findKey
call.
What other non-constant operations are involved in our methods?
The only one is items.remove
(since we know adding to the end of a linked list with items.add
is \(O(1)\)).
We know from Lab 2 that the linked list method that takes an object in the list to remove involves a linear search through the list.
So, put
and remove
contain this additional \(O(n)\) method call.
This second linear time call doesn't change the overall efficiency of \(O(n)\) (i.e., \(O(n) + O(n)\) is still \(O(n)\) since we ignore a constant factor of 2).
When we analyzed ArrayList
and LinkedList
, yes some operations were \(O(n)\), but others were constant time.
ListMap
is significantly worse with all operations at \(O(n)\).
We'll need a new technique for organizing data if we want a Map implementation with better performance.
This new technique is call hashing and will be the subject of the next two lessons.
8 Practice Problems1
For these practice problems, {bar=2, foo=5}
indicates a map with string keys "bar" and "foo" associated with integer values 2 and 5.
Consider the method below:
public static void mystery1(ListMap<String, Integer> map1, ListMap<Integer, String> map2) { ListMap<String, String> result = new ListMap<String, String>(); for (String k1 : map1.keyList()) { if (map2.contains(map1.get(k1))) { result.put(k1, map2.get(map1.get(k1))); } } return result; }
It uses a
ListMap
methodkeyList
that returns a list of the keys in the map. It has the following implementation:public List<K> keyList() { List<K> list = new ArrayList<>(); for (Association<K, V> a : items) { list.add(a.getKey()); } return list; }
What does
myster1
return when called with the following arguments:map1={bar=1, baz=2, foo=3, mumble=4}, map2={1=earth, 2=wind, 3=air, 4=fire}
map1={five=105, four=104, one=101, six=106, three=103, two=102}, map2={99=uno, 101=dos, 103=tres, 105=cuatro}
map1={a=42, b=9, c=7, d=15, e=11, f=24, g=7}, map2={1=four, 3=score, 5=and, 7=seven, 9=years, 11=ago}
public static void mystery2(ListMap<String, String> map) { ArrayList<String> list = new ArrayList<String>(); for (String key : map.keyList()) { if (map.get(key).length() > key.length()) { list.add(map.get(key)); } else { list.add(0, key); list.remove(map.get(key)); } } System.out.println(list); }
What is printed when
mystery2
is run with the following arguments? Assume thatkeyList()
returns the keys in the order they are shown below.{horse=cow, cow=horse, dog=cat, ok=yo}
{bye=hello, bird=dog, hi=hello, hyena=apple, fruit=meat}
{a=b, c=d, e=a, ff=a, gg=c, hhh=ff}
Footnotes:
Solutions:
{bar=earth, baz=wind, foo=air, mumble=fire}
{five=cuatro, one=dos, three=tres}
{b=years, c=seven, e=ago, g=seven}
[ok, dog, horse, horse]
[fruit, hyena, bird, hello, hello]
[hhh, gg, e]