Data Structures in Go: Hash Table
he time has come to implement one of the most commonly used data structures in software development - the Hash Table!
A hash table is a data structure that implements an associative array, allowing us to store pairs of keys and values that can be accessed at an expected O(1) time (O(n) - worst case for a bad implementation).
Personally I love this data structure. It works a bit like magic! How could you access a random key’s value at constant time?
Here is how it works:
Create an array of a length equal to the expected number of key/value pairs in your hash table. The bigger the array, the lower the chance of collisions (explained further below). Create a hashing function which will take the value of the key you want add and transform it into a number. The better this function is, the lower the chance of collisions. Take the number generated by the hashing function and calculate the modulo with the array length. (e.g. if the hash is 1234 and the array length is 100, then calculate 1234 % 100). This is going to be the index in the array where you are going to store the value. That is pretty much how it works in a nutshell. One of the most important things in a hash map is probably the hashing function. It can make the difference between a great and a very poor hash map.
In the implementation that follows, for a hash table that allows only string keys, I chose to use the following hash function: char[0] * 31^n-1 + char[1] * 31^n-2 + ... + char[n-1]
Also, I have mentioned above collisions a couple of times. A collision happens when the array indices for two different keys happens to be the same. This could be because of a bad hashing function or a too small of an underlying array. Both of these can be fixed though. You can use a better hashing function and you can also dynamically grow the array when needed.
But collisions will happen, so then what do we do about it? Instead of storing our value as is in the array, we are going to store it in a linked list (or array) together with the un-hashed key. So, whenever we try to retrieve or modify a key value, we walk through that linked list and make sure that we are manipulating the correct key (if there are multiple at that index).