A Hash Table is a table that uses a hash method to get the elements. A hash function will take an element and given a nearly unique value for it. Therefore it can be found in the array at or near a secific location rather than having linear searches for it.
A basic hash function is to take a numeric value and divide it modulary (return the remainder not the quotient). Now if you had a table that can hold ten items, then modular division by ten will set elements into their positions quickly. However if by chance two elements share a hash result then a colision results. The main study in hash tables is how to deal with these colisions.
This is a simple algorithm for hashing and is best used with an array based table. The hash code is modular division of the size of the table. When adding an element just get the hash code. If the spot if empty put it in, otherwise just place is in the next available spot. If the end is reached then loop to the begining like with the circular queue. Removing an element is a linear search, but since it starts from where the element should be it is much quicker.
The problem with linear hashing is adding when removals a have been done. This can create situations in which the element may already be in the table. In that case the table must be seached from the hash position to the first empty space. If it already is in there then that it. If a space is reached then it is not in the table and it can then be added. Here is the general adding algorithm, followed by some example code.
Value | Call | Comments | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
add Dick==5 | Check 5, its empty. Check until Dick or empty. The very next is empty so add Dick to 5. | |||||||||||||||||
Dick | ||||||||||||||||||
add Jane==1 | Check 1, its empty. Check until jane or empty. The very next is empty so add Jane to 1. | |||||||||||||||||
Jane | ||||||||||||||||||
Dick | ||||||||||||||||||
add Spot==5 | Check 5, colision. Now check until the next open spot. If Spot was found retun, otherwise and place Spot in the empty space. | |||||||||||||||||
Jane | ||||||||||||||||||
Dick | ||||||||||||||||||
Spot | ||||||||||||||||||
Nan | add Nan==6 | Check 6, colision. Now check until the next open spot and place Nan there if she wasn't foudn first. Since searching reaches the end of the table, loop back to zero. | ||||||||||||||||
Jane | ||||||||||||||||||
Dick | ||||||||||||||||||
Spot | ||||||||||||||||||
Nan | add Tim==4 | Check 4, empty. Check until another empty space or Dick. Add appropriatly. | ||||||||||||||||
Jane | ||||||||||||||||||
Tim | ||||||||||||||||||
Dick | ||||||||||||||||||
Spot | ||||||||||||||||||
remove Nan==6 | Check 6, not Nan. Search after 6 till Nan then mark empty. | |||||||||||||||||
Jane | ||||||||||||||||||
Tim | ||||||||||||||||||
Dick | ||||||||||||||||||
Spot | ||||||||||||||||||
remove Spot==5 | Check 5, not Spot. Search after 5 till Spot then mark empty. | |||||||||||||||||
Jane | ||||||||||||||||||
Tim | ||||||||||||||||||
Dick | ||||||||||||||||||
add Rex==6 | Check 5, used. Search for empty space. If Rex was found before return, otherwise add to table.
| Jane
|
|
| Tim
| Dick
| add Rex==5 | Check 5, used. Search for empty space. Rex was found before an empty space so do nothing.
| Jane
|
|
| Tim
| Dick
| |
With various elements returing the same hash codes they can be mixed up in the table, this lessens the usefulness of the table and creates a new issue, resizing. When the table resizing all the keys have to recalculated and added to the table. This is relativly expensive given the tables quick performance, O(n). Since colisions occur more often as the table gets full a rehash is done when it is filled at a certain point. The value of how much is used by a table is called the load factor. Rehashing usually occurs with the load factor is 75%.
Linear hashing is useful but it can often creates clustering. Clustering is when the hash fuction tends to send the elements in the same area of the table. To prevent this double hashing gets two hash codes. The first is the index of where the value show be. The second is a skip value. Instead of just searching till the next available spot or the element itself, search every hash2-th spot after it.
This skipping however can create problesm. What if there are not available spaces in the table on the skips? This is avoided by making sure that the two numbers for modular divsion are relatively prime. Two relativly prime numbers only have one for a common factor. This does not limit the choices to prime numbers alone. Take fourteen and fifteen. Fourteen's factors are one, two, seven, and fourteen. Fifteen's are one, three, five, and fifteen. The only common factor is one, so they are relativly prime.
An optoinal performance booster caculated by computer sciencetis Donald Knuth is to have the relavtively prime numbers be twin primes. Twin primes are relatively prime numbers whose difference are two. For example fivteen and seventeen. The factor's of fifteen are one, three, five, and fifteen. Seventeen's factors are one and seventeen. The greatest common factor is one, so they are relativly prime. Seventeen minus fifteen is two, so they are also twin primes.
Here's the tracing of the same hash calls with double hashing. Notice that the size of the table, seven, allows for the use of the twin prime five and seven.
int hash1(const string& s) //notice constance pass by reference
{ //& returns the memory location
return int(&s) % theTableSize; //mod by table size gets position
}
int hash2(const string& s) //only used in double hashing
{ //one is added so there is no skip of zero
return 1 + (int(&2) % (theTableSize-2))
}
Value | Call | Comments |
---|---|---|
add Dick==5/3 | Check 5, its empty so add. | |
Dick | ||
add Jane==1/2 | Check 1, its empty so add. | |
Jane | ||
Dick | ||
add Spot==5/1 | Check 5. It's used so skip by ones until an empty spot and add it there. | |
Jane | ||
Dick | ||
Spot | ||
add Nan==6/2 | Check 6, used, so skip by two's. (6+2)-7=1, taken by Jane so skip again. 2+2=4, empty so use it. | |
Jane | ||
Nan | ||
Dick | ||
Spot | ||
add Tim==4/1 | 4 is taken by Nan, so skip by one's until empty. | |
Jane | ||
Nan | ||
Tim | ||
Dick | ||
Spot | ||
remove Nan==6/2 | Look at 6. Nan is not there so search until blank or Nan by two. | |
Jane | ||
Tim | ||
Dick | ||
Spot | ||
remove Spot==5/1 | Spot's not at six so keep searching until blank or Spot. | |
Jane | ||
Tim | ||
Dick | ||
Rex | add Rex==6/2 | Six is taken so search by two's until an empty space or Rex. Taken by 1, 3, 5. Finaly a spot open at 0. |
Jane | ||
Tim | ||
Dick | ||
Rex | add Rex==6/2 | Six is taken so search by two's until an empty space or Rex. Rex is found at 0. |
Jane | ||
Tim | ||
Dick | ||
Value | Call | Comments |
---|---|---|
NULL | add Dick==5 | Add Dick at 5. |
NULL | ||
NULL | ||
NULL | ||
NULL | ||
->Dick | ||
NULL | ||
NULL | add Jane==1 | Add Jane at 1. |
Jane | ||
NULL | ||
NULL | ||
NULL | ||
->Dick | ||
NULL | ||
NULL | add Spot==1 | Add Spot at 5 after Dick. |
Jane | ||
NULL | ||
NULL | ||
NULL | ||
->Dick-->Spot | ||
NULL | ||
NULL | add Nan==6 | Add Nan at 6. |
Jane | ||
NULL | ||
NULL | ||
NULL | ||
->Dick-->Spot | ||
-->Nan | ||
NULL | add Tim==4 | Add Tim at 4. |
Jane | ||
NULL | ||
NULL | ||
-->Tim | ||
->Dick-->Spot | ||
-->Nan | ||
NULL | remove Nan==6 | Remove Nan at 6. |
Jane | ||
NULL | ||
NULL | ||
-->Tim | ||
->Dick-->Spot | ||
NULL | ||
NULL | remove Spot==5 | Remove Spot at 5. Have to search for him though. |
Jane | ||
NULL | ||
NULL | ||
-->Tim | ||
->Dick | ||
NULL | ||
NULL | add Rex==2 | Add Rex at 2. |
Jane | ||
-->Rex | ||
NULL | ||
-->Tim | ||
->Dick-->Spot | ||
NULL | ||
NULL | add Rex==2 | Rex is already at 2. |
Jane | ||
-->Rex | ||
NULL | ||
-->Tim | ||
->Dick-->Spot | ||
NULL |
Each implementation of the hash table creates different performances. Donald Knuth is the computer scientist who figured all this stuff out. The average performance of a hash table is based on its implementation and is load factor, α These value are how fast a serach is for an element in a tabel that is not full and has had no removals.
α = number of elements in a table / size of the table's arrayOpen Addressing with Linear Probing | .5(1+(1-α)-1) |
Open Addressiong with Double Hashing | α-1(-ln(1-α)) |
Chained Hashing | 1+.5α |
Prev -- Back to Portal -- Next