Hash Tables

View Code

A Hash Table is a table that uses a hash method to get the elements. A hash function will take an element and given a nearly unique value for it. Therefore it can be found in the array at or near a secific location rather than having linear searches for it.

A basic hash function is to take a numeric value and divide it modulary (return the remainder not the quotient). Now if you had a table that can hold ten items, then modular division by ten will set elements into their positions quickly. However if by chance two elements share a hash result then a colision results. The main study in hash tables is how to deal with these colisions.

Hash Table
A table that uses a hash function to position its elements
Colision
When two elements added to a hash table have the same position
Rehashing
recalculating the keys for resizing a hash table
Load Factor
number of elements in a table divided by the size of the table's array
Clustering
the tendancy for new elements to be hashed in the same location
Relatively Prime
the property that two number share no common factors except the number one
Twin Primes
two numbers that are both relatively prime and differ by two
Linear Hashing
A hashing algorithm that uses linear searches for colisions
Open Addressing
hashing that uses direct location in the table's array
Double Hashing
A hashing algorithm that uses two modular division to deal with colisions
Chain Hashing
A hash table that utilizes a linked list to deal with colisions

Linear Hashing

This is a simple algorithm for hashing and is best used with an array based table. The hash code is modular division of the size of the table. When adding an element just get the hash code. If the spot if empty put it in, otherwise just place is in the next available spot. If the end is reached then loop to the begining like with the circular queue. Removing an element is a linear search, but since it starts from where the element should be it is much quicker.

The problem with linear hashing is adding when removals a have been done. This can create situations in which the element may already be in the table. In that case the table must be seached from the hash position to the first empty space. If it already is in there then that it. If a space is reached then it is not in the table and it can then be added. Here is the general adding algorithm, followed by some example code.

Linear Hashing Adding Algorithm

  1. Get the hash code.
  2. Check the index of the hash value.
    1. If it is empty, search the table until a empty space is reached.
      1. If the element is found return.
      2. Else add the element at the hash value
    2. Else search the table for an empty space and place the value there.
      1. If the element was found before empty space return.
      2. Else add the element at the empty.
Linear Hashing Example
Value Call Comments
  add Dick==5 Check 5, its empty. Check until Dick or empty. The very next is empty so add Dick to 5.
 
 
 
 
Dick
 
  add Jane==1 Check 1, its empty. Check until jane or empty. The very next is empty so add Jane to 1.
Jane
 
 
 
Dick
 
  add Spot==5 Check 5, colision. Now check until the next open spot. If Spot was found retun, otherwise and place Spot in the empty space.
Jane
 
 
 
Dick
Spot
Nan add Nan==6 Check 6, colision. Now check until the next open spot and place Nan there if she wasn't foudn first. Since searching reaches the end of the table, loop back to zero.
Jane
 
 
 
Dick
Spot
Nan add Tim==4 Check 4, empty. Check until another empty space or Dick. Add appropriatly.
Jane
 
 
Tim
Dick
Spot
Nan remove Nan==6 Check 6, not Nan. Search after 6 till Nan then mark empty.
Jane
 
 
Tim
Dick
Spot
Nan remove Spot==5 Check 5, not Spot. Search after 5 till Spot then mark empty.
Jane
 
 
Tim
Dick
Spot
Nan add Rex==6 Check 5, used. Search for empty space. If Rex was found before return, otherwise add to table.
Jane
 
 
Tim
Dick
SpotRex
Nan add Rex==5 Check 5, used. Search for empty space. Rex was found before an empty space so do nothing.
Jane
 
 
Tim
Dick
SpotRex

Rehashing

With various elements returing the same hash codes they can be mixed up in the table, this lessens the usefulness of the table and creates a new issue, resizing. When the table resizing all the keys have to recalculated and added to the table. This is relativly expensive given the tables quick performance, O(n). Since colisions occur more often as the table gets full a rehash is done when it is filled at a certain point. The value of how much is used by a table is called the load factor. Rehashing usually occurs with the load factor is 75%.

Double Hashing

Linear hashing is useful but it can often creates clustering. Clustering is when the hash fuction tends to send the elements in the same area of the table. To prevent this double hashing gets two hash codes. The first is the index of where the value show be. The second is a skip value. Instead of just searching till the next available spot or the element itself, search every hash2-th spot after it.

This skipping however can create problesm. What if there are not available spaces in the table on the skips? This is avoided by making sure that the two numbers for modular divsion are relatively prime. Two relativly prime numbers only have one for a common factor. This does not limit the choices to prime numbers alone. Take fourteen and fifteen. Fourteen's factors are one, two, seven, and fourteen. Fifteen's are one, three, five, and fifteen. The only common factor is one, so they are relativly prime.

An optoinal performance booster caculated by computer sciencetis Donald Knuth is to have the relavtively prime numbers be twin primes. Twin primes are relatively prime numbers whose difference are two. For example fivteen and seventeen. The factor's of fifteen are one, three, five, and fifteen. Seventeen's factors are one and seventeen. The greatest common factor is one, so they are relativly prime. Seventeen minus fifteen is two, so they are also twin primes.

Here's the tracing of the same hash calls with double hashing. Notice that the size of the table, seven, allows for the use of the twin prime five and seven.


int hash1(const string& s)         //notice constance pass by reference
{                                  //& returns the memory location
   return int(&s) % theTableSize;  //mod by table size gets position
}

int hash2(const string& s)         //only used in double hashing
{                                  //one is added so there is no skip of zero
   return 1 + (int(&2) % (theTableSize-2)) 
}
Double Hashing Example
Value Call Comments
  add Dick==5/3 Check 5, its empty so add.
 
 
 
 
Dick
 
  add Jane==1/2 Check 1, its empty so add.
Jane
 
 
 
Dick
 
  add Spot==5/1 Check 5. It's used so skip by ones until an empty spot and add it there.
Jane
 
 
 
Dick
Spot
  add Nan==6/2 Check 6, used, so skip by two's. (6+2)-7=1, taken by Jane so skip again. 2+2=4, empty so use it.
Jane
 
Nan
 
Dick
Spot
  add Tim==4/1 4 is taken by Nan, so skip by one's until empty.
Jane
 
Nan
Tim
Dick
Spot
  remove Nan==6/2 Look at 6. Nan is not there so search until blank or Nan by two.
Jane
 
Nan
Tim
Dick
Spot
  remove Spot==5/1 Spot's not at six so keep searching until blank or Spot.
Jane
 
Nan
Tim
Dick
Spot
Rex add Rex==6/2 Six is taken so search by two's until an empty space or Rex. Taken by 1, 3, 5. Finaly a spot open at 0.
Jane
 
Nan
Tim
Dick
Spot
Rex add Rex==6/2 Six is taken so search by two's until an empty space or Rex. Rex is found at 0.
Jane
 
Nan
Tim
Dick
Spot

Chain Hashing

This type of hashing takes advantage of the objects it uses rather than an algorithm. The hash table now holds linked lists instead of single values. In order to add or remove an element just get the hash value like a linear table. Then add or remove a node with its value appropriatly at the list at that index.
Chain Hashing Example
Value Call Comments
NULL add Dick==5 Add Dick at 5.
NULL
NULL
NULL
NULL
->Dick
NULL
NULL add Jane==1 Add Jane at 1.
Jane
NULL
NULL
NULL
->Dick
NULL
NULL add Spot==1 Add Spot at 5 after Dick.
Jane
NULL
NULL
NULL
->Dick-->Spot
NULL
NULL add Nan==6 Add Nan at 6.
Jane
NULL
NULL
NULL
->Dick-->Spot
-->Nan
NULL add Tim==4 Add Tim at 4.
Jane
NULL
NULL
-->Tim
->Dick-->Spot
-->Nan
NULL remove Nan==6 Remove Nan at 6.
Jane
NULL
NULL
-->Tim
->Dick-->Spot
NULL
NULL remove Spot==5 Remove Spot at 5. Have to search for him though.
Jane
NULL
NULL
-->Tim
->Dick
NULL
NULL add Rex==2 Add Rex at 2.
Jane
-->Rex
NULL
-->Tim
->Dick-->Spot
NULL
NULL add Rex==2 Rex is already at 2.
Jane
-->Rex
NULL
-->Tim
->Dick-->Spot
NULL

Performance

Each implementation of the hash table creates different performances. Donald Knuth is the computer scientist who figured all this stuff out. The average performance of a hash table is based on its implementation and is load factor, α These value are how fast a serach is for an element in a tabel that is not full and has had no removals.

α = number of elements in a table / size of the table's array
Average Hash Table Search Performance Rates
Open Addressing with Linear Probing .5(1+(1-α)-1)
Open Addressiong with Double Hashing α-1(-ln(1-α))
Chained Hashing 1+.5α

Prev -- Back to Portal -- Next