Final Exam Study Guide
1 Overview
Our final exam will be open-note, open-video, open-computer, but closed-internet. Any resources linked from the course website are allowed. You are to work on the exam independently and not consult anyone except me. The exam will be completed online. It will be made available March 11 and must be submitted no later than 9pm March 16.
The final exam will focus on the primary learning goal for the course:
Choose an appropriate data structure for a given computational task. This will require being able to:
- Identify the operations required by a task
- Analyze the structure and performance of arrays, extensible arrays, linked lists, stacks, queues, trees, graphs, hash tables, and heaps
- Perform theoretical and empirical comparisons between data structures
- Locate and use the Java documentation and implementation for a chosen data structure
You will be presented with various scenarios and be tasked with determining the required operations and/or proposing appropriate data structures to support them. You will be expected to justify your choices and state any assumptions you make. In many cases there may be more than one reasonable answer, and you will be evaluated on your justification. You may be asked how your solution would change under different criteria, such as a particular operation being more common/important, or space being more important than running time. The exam will also have true/false questions on material we've covered.
2 Practice Problems
Below are practice exercises, many of which come from practice problems or Plicker questions. Sample solutions are at the end of this document.
- A weather station records simultaneous temperature measurements that need to be averaged together. You need to write a program to read these measurements and display the results. You can assume the number of measurements is fixed and known ahead of time.
- You're implementing a program to let two people play the game Tic-Tac-Toe. Your program should allow people to play multiple games in a row and keep track of how many each player has won. Describe the operations your program will need to perform and the data it will need to keep track of.
- You want to display a list of users who have joined a chat room. What should you use to store the names?
- You're writing a class to represent an 8-sided die that is rolled to choose a random color (i.e., each face of the die shows a different color). What should you use to store the colors? What if the die can have a variable number of sides?
- You need to write a program to manage an online customer service site. In particular, your program is responsible for controlling the line of customers waiting to be served. Describe the operations you will implement and the data structure(s) you will use. How would those change if sometimes we want the last person in line to split off and form a new line? If you wanted to display to each customer their place in line, how would you implement that?
- Let's say you were implementing an interpreter for a very spooky programming language called GhostCrypt. This language lets programmers define horrifying variables to have certain otherworldly values. What data structure would you use to represent the defined variables? Why?
- You are writing a program to facilitate online voting for president of the cool club, the Carleton Abstract Types (cool CATs). Assume you have a way to authenticate users and prevent people from voting more than once (i.e., your answer does not need to deal with this aspect of the program). Describe the design of your program in terms of operations and data structures.
- We want to make a database of student names and GPAs. Discuss at least three operations we might want to perform on such a database and the data structure that would most efficiently support each operation.
- Say we wanted to write a program to form team-ups among a set of superheroes. We have information on which superheroes shouldn't work together—bitter rivalries, frenemies, incompatible super powers, that sort of thing. We don't want to form a team of superheroes where two or more team members should not work together. How might we represent this superhero data and use it to efficiently form effective teams?
- You want to write a program to compute the fastest way to get from your dorm room (or wherever you are living) to every building on campus for when it's painfully cold outside. What data would you need and how would you use it to solve this problem?
3 Solutions
Since these are design questions, there are many reasonable answers. The solutions below are intended to demonstrate the level of detail and type of justifications expected on the exam. They do not represent the one true answer to any of these problems.
Operations:
- Read a temperature measurement. Since there are different measurements to read, this could be a method that takes an argument specifying which measurement to read. The method would return the temperature. This design would make it easy to modify the system to add additional measurements compared to a method that assumed a specific set of them.
- Compute the average. This method would compute and return the average of the current measurements.
- Display the result. This method would use the previous two to compute the average and then print or otherwise display the result.
Since the number of measurements is known ahead of time, I would use a fixed-size array to store the current temperatures. Since all we need to do is update specific entries and sum up the entries for the average, an array is a time- and space-efficient choice. Each time the program reads a measurement, it would update an entry in the array. The method to compute the average would use the values currently in the array.
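The array-based design above could be sketched as follows. This is a minimal illustration, not a required design: the class name `TemperatureStation` and the methods `record` and `average` are hypothetical, and actually reading values from sensors is left out.

```java
// Hypothetical names throughout; a sketch of the fixed-size array design.
class TemperatureStation {
    // One slot per measurement; the size is fixed when the station is created.
    private final double[] readings;

    TemperatureStation(int numMeasurements) {
        readings = new double[numMeasurements];
    }

    // Store the latest value for one measurement (constant time).
    void record(int index, double temperature) {
        readings[index] = temperature;
    }

    // Average the values currently in the array (linear in the array length).
    double average() {
        double sum = 0.0;
        for (double r : readings) {
            sum += r;
        }
        return sum / readings.length;
    }

    public static void main(String[] args) {
        TemperatureStation station = new TemperatureStation(3);
        station.record(0, 10.0);
        station.record(1, 20.0);
        station.record(2, 30.0);
        System.out.println("Average: " + station.average());
    }
}
```

Because the array never grows, the only costs are a constant-time write per reading and one pass over the array per average.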
Operations:
- Display the board. Either by printing it out or drawing it on the screen, the program would need to display the grid and current positions of Xs and Os. This should also display the number of games each player has won.
- Ask a player for their move. Prompt the player whose turn it is to enter a move, either by clicking on a location or typing in a location in the terminal. Use the check move operation below to check if the move is legal. If it is, use the update operation below to make it. If it's not, indicate that to the user and ask them to input a legal move.
- Check if a move is legal. Determine if a given move is allowed on the current board. This would be a method that takes a move as an argument and returns true or false.
- Update the board with a move. Modify the current board to make a particular move. This would be a method that takes a move as an argument. It would assume the move is legal.
- Check if the game has a winner. Determine if the current board has a winner (a three-in-a-row). This could either be a single method that returns the number of the winning player (or -1 if no player has won) or two separate methods, one checking for X winning and one for O winning, each of which would return true or false.
- Check if the game is a draw. Determine if all the spaces are filled resulting in a draw, or if the game can continue. This would be a method that returns true or false.
- End/reset the game. After each move, use the above two methods to determine if the game is over. If there is a winner, add to that player's score. If there is a winner or a draw, ask if the players want to play another game. If they do, reset the game to an empty board and ask for the first move. If there is no winner and no draw, go to the next player's turn.
The program would need to keep track of:
- The game board. It would use a 2D array (3x3) to store the current board. The size of a Tic-Tac-Toe board is fixed and the spaces don't move around, so an array is appropriate.
- The current score. A separate variable for each player to keep track of how many games they have won.
- The current turn. A variable indicating which player's turn it is.
- Which player is which symbol. To make it so that the same player isn't X every time, the program will keep track of which player is playing X or O for the current game. That way they can switch for the next game.
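A possible sketch of the board portion of this design is below. The class name `TicTacToeBoard` and its method names are hypothetical, and the sketch takes the "separate check per symbol" option for detecting a winner. Scores, turn tracking, and user input are omitted.

```java
import java.util.Arrays;

// Hypothetical names; a sketch of the 3x3 array board, not a full game.
class TicTacToeBoard {
    private static final char EMPTY = ' ';
    private final char[][] grid = new char[3][3];

    TicTacToeBoard() {
        reset();
    }

    // Clear every square so a new game can start.
    void reset() {
        for (char[] row : grid) {
            Arrays.fill(row, EMPTY);
        }
    }

    // A move is legal if it is on the board and the square is empty.
    boolean isLegal(int row, int col) {
        return row >= 0 && row < 3 && col >= 0 && col < 3 && grid[row][col] == EMPTY;
    }

    // Assumes the move has already been checked with isLegal.
    void place(int row, int col, char symbol) {
        grid[row][col] = symbol;
    }

    // True if the given symbol ('X' or 'O') has three in a row.
    boolean hasWon(char s) {
        for (int i = 0; i < 3; i++) {
            boolean rowWin = grid[i][0] == s && grid[i][1] == s && grid[i][2] == s;
            boolean colWin = grid[0][i] == s && grid[1][i] == s && grid[2][i] == s;
            if (rowWin || colWin) {
                return true;
            }
        }
        boolean diagonal = grid[0][0] == s && grid[1][1] == s && grid[2][2] == s;
        boolean antiDiagonal = grid[0][2] == s && grid[1][1] == s && grid[2][0] == s;
        return diagonal || antiDiagonal;
    }

    // Draw: every square is filled and neither symbol has three in a row.
    boolean isDraw() {
        for (char[] row : grid) {
            for (char c : row) {
                if (c == EMPTY) {
                    return false;
                }
            }
        }
        return !hasWon('X') && !hasWon('O');
    }

    public static void main(String[] args) {
        TicTacToeBoard board = new TicTacToeBoard();
        board.place(0, 0, 'X');
        board.place(1, 1, 'X');
        board.place(2, 2, 'X');
        System.out.println("X has won: " + board.hasWon('X'));
    }
}
```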
- A hash table would be a good choice for the list of names. The needed operations would be to add a name, remove a name, and display the current names. A hash table would provide constant time add and remove operations assuming a good hash function. This works best if the chat room requires unique names, or if each user has some unique id that could be used as the hash key. Assuming we don't need to display the names in any particular order, we can iterate over the hash table to get the current names.
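As one way to picture the chat room answer, the sketch below uses Java's `HashSet`, which is backed by a hash table. It assumes names are unique; the class name `ChatRoomRoster` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names; assumes each user name is unique within the room.
class ChatRoomRoster {
    private final Set<String> names = new HashSet<>();

    // Returns false if the name was already present.
    boolean join(String name) {
        return names.add(name);
    }

    // Returns false if the name was not in the room.
    boolean leave(String name) {
        return names.remove(name);
    }

    int size() {
        return names.size();
    }

    // Iteration order of a HashSet is unspecified, which is fine if
    // the names do not need to appear in any particular order.
    void display() {
        for (String name : names) {
            System.out.println(name);
        }
    }

    public static void main(String[] args) {
        ChatRoomRoster room = new ChatRoomRoster();
        room.join("ada");
        room.join("grace");
        room.leave("ada");
        room.display();
    }
}
```

If the names had to be shown in sorted or join order, a different structure (or an extra sort before display) would be needed.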
- Given that the number of sides is fixed, and that we want constant-time random access to the 8 different colors, an array would be a good choice. That way a random index between 0 and 7 could be generated to select a random color. With a variable number of sides, we would want a data structure with flexible size that still provides constant-time random access. An extensible array, such as ArrayList, would provide this.
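Both versions of the die might look like the sketch below. The class names `ColorDie` and `FlexibleColorDie` and the method `addSide` are hypothetical; the random index comes from `java.util.Random.nextInt(bound)`, which returns a value from 0 up to but not including `bound`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Fixed number of sides: a plain array gives constant-time random access.
class ColorDie {
    private final String[] colors;
    private final Random rng = new Random();

    ColorDie(String... colors) {
        this.colors = colors.clone();
    }

    String roll() {
        return colors[rng.nextInt(colors.length)];
    }

    public static void main(String[] args) {
        ColorDie die = new ColorDie("red", "orange", "yellow", "green",
                                    "blue", "purple", "black", "white");
        System.out.println("Rolled: " + die.roll());
    }
}

// Variable number of sides: an ArrayList can grow and still indexes in constant time.
class FlexibleColorDie {
    private final List<String> colors = new ArrayList<>();
    private final Random rng = new Random();

    void addSide(String color) {
        colors.add(color);
    }

    // Assumes at least one side has been added.
    String roll() {
        return colors.get(rng.nextInt(colors.size()));
    }
}
```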
Operations:
- Add a customer to the end of the line
- Remove a customer from the beginning of the line and connect them with service
Data structures:
- A singly-linked list with a tail reference would be a good choice for a line of customers because it supports adding to the end and removing from the beginning in constant time.
If we want the last person to be able to split off and form a new line, a doubly-linked list would be better because it supports removing the last element in constant time.
To keep track of customers' place in line, we could use the current size of the list as the place in line when a customer is added. Each element in the list could be an object that keeps track of the customer information and their place, or we could maintain a separate hash table to map each customer to their place in line. Either way, after every time we remove a customer, we would need to iterate over customers currently in line and update their place. If this linear operation is too expensive, these updates could be done less often, perhaps after every 10 removals.
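The line operations could be sketched with `java.util.LinkedList`, which is a doubly-linked list in the Java library, so it covers both the basic queue and the split-off variant. The class name `CustomerLine` and its methods are hypothetical, and customers are represented by plain strings for brevity.

```java
import java.util.Deque;
import java.util.LinkedList;

// Hypothetical names; customers are just strings in this sketch.
class CustomerLine {
    // LinkedList is doubly linked, so both ends support constant-time add/remove.
    private final Deque<String> line = new LinkedList<>();

    // Add to the end of the line; the returned place is the list size
    // at the time of joining (1 means next to be served).
    int join(String customer) {
        line.addLast(customer);
        return line.size();
    }

    // Remove and return the customer at the front, or null if the line is empty.
    String serveNext() {
        return line.pollFirst();
    }

    // Remove and return the last customer so they can start a new line.
    String splitOffLast() {
        return line.pollLast();
    }

    int size() {
        return line.size();
    }

    public static void main(String[] args) {
        CustomerLine line = new CustomerLine();
        line.join("Ana");
        line.join("Ben");
        System.out.println("Now serving: " + line.serveNext());
    }
}
```

The place returned by `join` goes stale as soon as someone ahead is served, which is why the answer above discusses updating stored places after removals.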
- A hash table would be the ideal data structure for representing variable names and associated values. The three operations we need to support when it comes to variables are (1) adding a new variable, (2) retrieving the value of a variable, and (3) updating the value of an existing variable. Each of these can be accomplished by adding or accessing an entry in a hash table. Since variable names must be unique, each name maps to exactly one entry. Two different names can still collide in the table, but assuming a good hash function, all three operations take constant time on average.
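A minimal sketch of such a variable table, using Java's `HashMap`, is below. The class name `VariableTable` and its methods are hypothetical, and values are assumed to be integers purely to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; assumes integer-valued variables for simplicity.
class VariableTable {
    private final Map<String, Integer> values = new HashMap<>();

    boolean isDefined(String name) {
        return values.containsKey(name);
    }

    // Covers both adding a new variable and updating an existing one:
    // put overwrites any previous value stored under the same name.
    void set(String name, int value) {
        values.put(name, value);
    }

    // Retrieve the value, failing loudly if the variable was never defined.
    int get(String name) {
        Integer value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Undefined variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        VariableTable vars = new VariableTable();
        vars.set("ghoulCount", 13);
        vars.set("ghoulCount", vars.get("ghoulCount") + 1);
        System.out.println("ghoulCount = " + vars.get("ghoulCount"));
    }
}
```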
Operations:
- Display a ballot: a user on the website needs to see a list of the candidates
- Record a vote: a user selects a candidate and their vote is counted
- Display results/determine a winner: after the election is over, the winner and the vote totals for each candidate are displayed
Data structures:
- A fixed number of candidates will be input ahead of when the election begins, so we could use an array to store their names for display. If we wanted to display other information alongside, like class year, a personal statement, or other biographical information, we could create a Candidate class with fields for all the relevant information. Candidate objects would be stored in the array in this case.
- A hash table would be a good way to store the current vote totals. Candidate names/objects would be the keys and the integer number of votes the values. A hash table provides the association between any number of candidates and their vote totals, and also allows us to increment a vote total in constant time.
- Displaying the results is equivalent to displaying the hash table of vote totals, possibly sorted according to vote total. To determine a winner we would iterate over the table to find the candidate with the most votes.
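The vote-counting part of this design could look like the sketch below. The class name `VoteTally` and its methods are hypothetical, candidates are identified by name only, and ties are broken arbitrarily, which a real election would need to handle explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; candidates are plain strings in this sketch.
class VoteTally {
    private final Map<String, Integer> totals = new HashMap<>();

    // Candidates are fixed before the election begins; each starts at zero votes.
    VoteTally(String... candidates) {
        for (String candidate : candidates) {
            totals.put(candidate, 0);
        }
    }

    // Increment one candidate's total (constant time on average).
    void recordVote(String candidate) {
        if (!totals.containsKey(candidate)) {
            throw new IllegalArgumentException("Unknown candidate: " + candidate);
        }
        totals.merge(candidate, 1, Integer::sum);
    }

    int votesFor(String candidate) {
        return totals.getOrDefault(candidate, 0);
    }

    // Linear scan over the table. Ties are resolved arbitrarily here;
    // returns null if there are no candidates.
    String winner() {
        String best = null;
        int bestVotes = -1;
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            if (entry.getValue() > bestVotes) {
                best = entry.getKey();
                bestVotes = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        VoteTally tally = new VoteTally("Whiskers", "Mittens");
        tally.recordVote("Mittens");
        tally.recordVote("Whiskers");
        tally.recordVote("Mittens");
        System.out.println("Winner: " + tally.winner());
    }
}
```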
Operations:
- Look up a student's GPA. A hash table would provide constant time performance for this operation. The keys would be student IDs or emails and the values would be GPAs.
- Rank students according to GPA. A balanced binary search tree would provide this operation in \(O(n)\) time. The keys at the nodes would be GPAs and the values would be student names or other information. The tree would need to allow duplicate keys (GPAs are not unique), essentially maintaining the set of students in sorted order by GPA. The performance is \(O(n)\) because a linear-time in-order traversal outputs the nodes of a binary search tree in sorted order. A sorted array would provide a constant time ranking operation (it already has students in sorted order by GPA, and so we could just return the array without any extra work), but any updates to the array would require linear time, compared to logarithmic time for updating the tree.
- Get all students with a GPA above some cutoff. Like the previous operation, this one would be best supported by a balanced binary search tree using GPAs as the keys, or by a sorted array. Using a tree, we can traverse it while only exploring subtrees that could contain GPAs above the cutoff. This would be similar to the "pruned" tree search operations on a k-d tree and would likely be efficient in practice. Using an array, we could perform binary search to find the first GPA above the cutoff and return the elements in the array from that point on.
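The ranking and cutoff operations could be sketched with Java's `TreeMap`, which is implemented as a red-black tree (a balanced binary search tree). The class name `GpaIndex` and its methods are hypothetical. Duplicate GPAs are handled here by storing a list of names under each GPA key, which is one of several ways to allow duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical names; each GPA key maps to the students who share that GPA.
class GpaIndex {
    private final TreeMap<Double, List<String>> byGpa = new TreeMap<>();

    // Logarithmic-time insert into the balanced tree.
    void add(String name, double gpa) {
        byGpa.computeIfAbsent(gpa, key -> new ArrayList<>()).add(name);
    }

    // All students from highest GPA to lowest: one linear traversal.
    List<String> ranked() {
        List<String> result = new ArrayList<>();
        for (List<String> group : byGpa.descendingMap().values()) {
            result.addAll(group);
        }
        return result;
    }

    // Students with GPA strictly above the cutoff, lowest qualifying GPA first.
    // tailMap only visits the part of the tree above the cutoff.
    List<String> above(double cutoff) {
        List<String> result = new ArrayList<>();
        for (List<String> group : byGpa.tailMap(cutoff, false).values()) {
            result.addAll(group);
        }
        return result;
    }

    public static void main(String[] args) {
        GpaIndex index = new GpaIndex();
        index.add("Ann", 3.9);
        index.add("Bo", 3.2);
        index.add("Di", 3.6);
        System.out.println("Above 3.5: " + index.above(3.5));
    }
}
```

Looking up one student's GPA by name would still be linear in this structure, which is why the first operation above uses a separate hash table keyed by student ID.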
- We could represent the superhero information using an undirected, unweighted graph. Each hero would be a vertex, and each edge would indicate that the two connected heroes should not team up. Assuming there aren't a lot of heroes who hate everyone (i.e., have an edge to everyone), the graph will be sparse and an adjacency list would be a good choice of data structure. We could find independent sets (vertices sharing no edges) within the graph to form the team-ups. A greedy algorithm such as the one used to make schedules in lab 7 would be one possible approach. We might need to modify it if we didn't want huge teams (since that algorithm finds maximal independent sets).
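One simple greedy pass over an adjacency-list graph is sketched below. It is not necessarily the lab 7 algorithm: it just considers heroes in the order they were added and keeps each one that has no rivalry with anyone already on the team. The class name `TeamBuilder` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names; an adjacency list mapping each hero to their rivals.
class TeamBuilder {
    // LinkedHashMap keeps heroes in insertion order so results are repeatable.
    private final Map<String, Set<String>> rivals = new LinkedHashMap<>();

    void addHero(String hero) {
        rivals.putIfAbsent(hero, new HashSet<>());
    }

    // Undirected edge: each hero is recorded as the other's rival.
    void addRivalry(String a, String b) {
        addHero(a);
        addHero(b);
        rivals.get(a).add(b);
        rivals.get(b).add(a);
    }

    // Greedily build one team (a maximal independent set): take each hero
    // in turn unless they have a rivalry with someone already chosen.
    List<String> greedyTeam() {
        List<String> team = new ArrayList<>();
        for (String hero : rivals.keySet()) {
            boolean compatible = true;
            for (String member : team) {
                if (rivals.get(hero).contains(member)) {
                    compatible = false;
                    break;
                }
            }
            if (compatible) {
                team.add(hero);
            }
        }
        return team;
    }

    public static void main(String[] args) {
        TeamBuilder builder = new TeamBuilder();
        builder.addRivalry("Aqua", "Blaze");
        builder.addHero("Comet");
        System.out.println("Team: " + builder.greedyTeam());
    }
}
```

To cap team size, the loop could stop once the team reaches a chosen limit and start a new team from the remaining heroes.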
- If we knew the travel time between every pair of nearby buildings on campus, we could represent this information as a weighted undirected graph. Buildings would be the vertices, and weighted edges would indicate buildings with a direct path between them and the travel time for that path. Since each building will be directly connected to the few buildings nearby, the graph will be sparse and an adjacency list would be a good choice of data structure. Applying Dijkstra's algorithm to the graph will compute the fastest paths we are looking for.
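For reference, a compact sketch of Dijkstra's algorithm on an adjacency-list graph is below. It uses `java.util.PriorityQueue` and skips stale queue entries instead of decreasing keys, which is a common simplification. The class name `CampusPaths`, its methods, and the building names are hypothetical, and travel times are assumed to be non-negative whole minutes. It requires Java 9 or later for `Map.entry` and `Map.of`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical names; edge weights are travel times in minutes (non-negative).
class CampusPaths {
    // Adjacency list: building -> (neighboring building -> travel time).
    private final Map<String, Map<String, Integer>> adjacent = new HashMap<>();

    // Undirected edge between two buildings.
    void addPath(String a, String b, int minutes) {
        adjacent.computeIfAbsent(a, key -> new HashMap<>()).put(b, minutes);
        adjacent.computeIfAbsent(b, key -> new HashMap<>()).put(a, minutes);
    }

    // Returns the fastest travel time from start to every reachable building.
    Map<String, Integer> fastestFrom(String start) {
        Map<String, Integer> best = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> frontier =
            new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        frontier.add(Map.entry(start, 0));

        while (!frontier.isEmpty()) {
            Map.Entry<String, Integer> current = frontier.poll();
            String building = current.getKey();
            if (best.containsKey(building)) {
                continue; // stale entry: a faster route was already finalized
            }
            best.put(building, current.getValue());

            Map<String, Integer> neighbors = adjacent.getOrDefault(building, Map.of());
            for (Map.Entry<String, Integer> edge : neighbors.entrySet()) {
                if (!best.containsKey(edge.getKey())) {
                    int time = current.getValue() + edge.getValue();
                    frontier.add(Map.entry(edge.getKey(), time));
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CampusPaths campus = new CampusPaths();
        campus.addPath("Dorm", "Library", 5);
        campus.addPath("Library", "Gym", 3);
        campus.addPath("Dorm", "Gym", 10);
        System.out.println(campus.fastestFrom("Dorm"));
    }
}
```

This version returns only the travel times; recovering the actual routes would require also recording each building's predecessor.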