Hashing I

Table of Contents

1 Reading

Read Sections 15.4.3 and 15.4.4 of Bailey. Ignore figures 15.11 and 15.12, those relate to the the performance analysis section that will be reading for Friday's lesson.

2 Video

Here is a video lecture for the slides below.

The video contains sections on

  1. Motivating Hash Tables (0:05)
  2. Hash Tables: Turn Key Into Array Index (2:51)
  3. Hash Tables: More Keys Than Spots (4:18)
  4. Hash functions (5:59)
  5. Hashing integers (7:07)
  6. Who hashes what? (11:04)
  7. Okay, what about keys that aren't ints? (14:28)
  8. Specializing hash functions (18:19)
  9. Verifying String hashCode (19:17)
  10. Hashing and comparing (22:04)
  11. Equal Objects Must Hash the Same (23:13)
  12. Example: CalendarDate (25:54)
  13. Conclusions and notes on hashing (36:05)

The Panopto viewer has table of contents entries for these sections: https://carleton.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=f9b503fc-1d68-48b8-9449-acc201760ef6

3 Slides

You can download the slides in pdf or PowerPoint.

4 Practice Problems1

  1. What is hashing, and why is hashing a good way of implementing a map?
  2. Is it possible for a hash table to have two entries with equal keys?
  3. Is it possible for a hash table to have two entries with equal values?
  4. For Java strings consisting of only one character, hashCode simply returns the value of that char. Suppose we have a hash table with an array of length 20. What will the array look like after the following keys have been put: "s", "a", "n", "d", "w", "o", "r", "m"? For reference here is a table of char values in Java:
    Dec  = Decimal Value
    Char = Character
    
    '5' has the int value 53
    if we write '5'-'0' it evaluates to 53-48, or the int 5
    if we write char c = 'B'+32; then c stores 'b'
    
    
    Dec  Char                           Dec  Char     Dec  Char     Dec  Char
    ---------                           ---------     ---------     ----------
      0  NUL (null)                      32  SPACE     64  @         96  `
      1  SOH (start of heading)          33  !         65  A         97  a
      2  STX (start of text)             34  "         66  B         98  b
      3  ETX (end of text)               35  #         67  C         99  c
      4  EOT (end of transmission)       36  $         68  D        100  d
      5  ENQ (enquiry)                   37  %         69  E        101  e
      6  ACK (acknowledge)               38  &         70  F        102  f
      7  BEL (bell)                      39  '         71  G        103  g
      8  BS  (backspace)                 40  (         72  H        104  h
      9  TAB (horizontal tab)            41  )         73  I        105  i
     10  LF  (NL line feed, new line)    42  *         74  J        106  j
     11  VT  (vertical tab)              43  +         75  K        107  k
     12  FF  (NP form feed, new page)    44  ,         76  L        108  l
     13  CR  (carriage return)           45  -         77  M        109  m
     14  SO  (shift out)                 46  .         78  N        110  n
     15  SI  (shift in)                  47  /         79  O        111  o
     16  DLE (data link escape)          48  0         80  P        112  p
     17  DC1 (device control 1)          49  1         81  Q        113  q
     18  DC2 (device control 2)          50  2         82  R        114  r
     19  DC3 (device control 3)          51  3         83  S        115  s
     20  DC4 (device control 4)          52  4         84  T        116  t
     21  NAK (negative acknowledge)      53  5         85  U        117  u
     22  SYN (synchronous idle)          54  6         86  V        118  v
     23  ETB (end of trans. block)       55  7         87  W        119  w
     24  CAN (cancel)                    56  8         88  X        120  x
     25  EM  (end of medium)             57  9         89  Y        121  y
     26  SUB (substitute)                58  :         90  Z        122  z
     27  ESC (escape)                    59  ;         91  [        123  {
     28  FS  (file separator)            60  <         92  \        124  |
     29  GS  (group separator)           61  =         93  ]        125  }
     30  RS  (record separator)          62  >         94  ^        126  ~
     31  US  (unit separator)            63  ?         95  _        127  DEL
    
  5. For each of the following possible hashCode implementations: Is the following a legal hashCode method for a Point class, according to the requirement that equivalent points should have the same hash code? Does it distribute the hash codes well between objects? Why or why not?
    1. public int hashCode() {
          return x * y;
      }
      
    2. public int hashCode() {
          return 42;
      }
      
    3. public int hashCode() {
          Random rng = new Random();
          return rng.nextInt();
      }
      
  6. Write a hashCode method for a Date class, whose fields are a year, month, and day, as integers. Follow the general requirement that equivalent objects should have the same hash code.
  7. Write a hashCode method for a Student class, whose fields are a name (a string), age (an integer), student ID number (an integer), and favorite data structure (a string). Follow the general requirement that equivalent objects should have the same hash code.

5 Learning Block

  • Discuss in learning pods (share confusions, practice problems 1, 2, and 3)
  • Questions?
  • Form questions
  • Text generation demo
  • Practice problems 5, 6, and 7 in pods

Footnotes:

1

Solutions:

  1. Hashing is a process of mapping element values to integer indexes and storing the elements at those indexes in an array. Hashing is a good way of implementing a map because it provides theoretically O(1) runtime for adding, removing, and searching a set.
  2. No. If the keys were equal, the second key inserted would have found the first (i.e., been hashed to the same index), and thus not have been inserted.
  3. Yes, there is no restriction that values must be unique. Many keys can be associated with the same values.
  4. internal array:
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |"d"|   |   |   |   |   |   |   |   |"m"|"n"|"o"|   |   |"r"|"s"|   |"a"|   |"w"|
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  <--- index
    
    1. This is a legal hashCode implementation, according to the requirement. It distributes hash codes somewhat well, but it has certain groups of points that will collide with each other unnecessarily (for example, every point with an x-coordinate or y-coordinate of 0 will have the same hash code).
    2. This is a legal hashCode implementation, according to the requirement. But it distributes hash codes extremely poorly; literally it could not be worse in that respect, because every point has the same hash code. It's important to note that it is still a technically correct implementation, though it works very poorly in terms of hash table performance.
    3. This is not a legal hashCode implementation, according to the contract. It does not consistently return the same hash code for the same point object state.
  5. hashCode method for a Date class (the constant multipliers for each component are somewhat arbitrary):
    public int hashCode() {
        return 31 * year + 31 * 31 * month + day;
    }
    
  6. hashCode method for a Student class (the constant multipliers for each component are somewhat arbitrary):
    public int hashCode() {
        int result = 17;
        result = result * 31 + name.hashCode();
        result = result * 31 + age;
        result = result * 31 + studentID;
        result = result * 31 + favorite.hashCode();
        return result;
    }