Hash Tables in Python: A Deep Dive

Hash Tables in Python: A Deep Dive

Hash Tables, sometimes referred to as Hash Maps or Dictionaries in other languages, are an essential data structure that allows us to store and retrieve data efficiently. They achieve this through a process of mapping keys to specific locations in an array, known as hashing.

In today's post, we'll embark on a detailed exploration of Hash Tables, using a custom-built Python version. The end goal? To grasp the intricacies of hashing, handle collisions, and employ this mighty data structure in real-world scenarios.

Structure of a Hash Table

At its core, a Hash Table consists of an array and a hashing function. This function takes a key as input and returns an index where the associated value should be stored or retrieved.

Here's a breakdown of our custom HashTable class:

class HashTable:
    def __init__(self, size = 7):
        # Initializes an array with a default size of 7
        self.data_map = [None] * size

    def __hash(self, key):
        # Computes an index for the given key
        my_hash = 0
        for letter in key:
            # This formula ensures that the hash values are spread out over our array
            my_hash =(my_hash + ord(letter) * 23) % len(self.data_map)
        return my_hash

    def print_table(self):
        # Prints the contents of the hash table for debugging purposes
        for i, val in enumerate(self.data_map):
            print(i, ": ", val)

    def set_item(self, key, value):
        # Sets a new key-value pair into the hash table
        index = self.__hash(key)
        # If the index is empty (None), initialize it as an empty list
        if self.data_map[index] == None:
            self.data_map[index] = []
        # Appends the key-value pair to the list at the computed index
        self.data_map[index].append([key, value])

    def get_item(self, key):
        # Retrieves a value based on its key
        index = self.__hash(key)
        if self.data_map[index] is not None:
            for i in range(len(self.data_map[index])):
                # Searches for the key in the index's list
                if self.data_map[index][i][0] == key:
                    return self.data_map[index][i][1]
        # Returns None if the key wasn't found
        return None

    def keys(self):
        # Returns all the keys present in the hash table
        all_keys = []
        for i in range(len(self.data_map)):
            if self.data_map[i] is not None:
                for j in range(len(self.data_map[i])):
                    all_keys.append(self.data_map[i][j][0])
        return all_keys

Exploring the Intricacies

  1. Hashing: Our hashing function __hash() transforms a key into an array index. By using the ord() function (which returns an integer representing the Unicode character), we ensure a good spread of values across the hash table. The multiplication by 23, a prime number, further randomizes this spread, reducing the chances of collisions. Finally, the modulo operation ensures that the resulting index remains within the bounds of the array.

  2. Collisions: Notice that our set_item() method can handle situations where two keys hash to the same index (a collision). This is achieved by storing key-value pairs in a list at the indexed location. If a collision occurs, the new key-value pair is simply appended to this list.

  3. Retrieval: The get_item() method calculates the key's hash, then checks if that index contains the required key-value pair. If it does, the associated value is returned. If not, the method returns None.

  4. Key Enumeration: Our keys() method returns a list of all keys currently in the hash table. This is useful for various applications, from debugging to operations that require knowledge of all current keys.

Why Hash Tables?

Hash tables offer O(1) average time complexity for both insertions and lookups, making them incredibly efficient for tasks involving frequent data retrieval or updates. However, they're not without their limitations and challenges, such as handling collisions and ensuring a good distribution of values to maintain performance.

In our Python example, we've seen a rudimentary hashing function and collision resolution using chaining (storing a list of key-value pairs at each index). This is just one approach, and as you delve deeper into the world of data structures, you'll encounter more sophisticated methods.

As always, understanding the inner workings of such structures provides the foundation for leveraging them most effectively in real-world scenarios, be it in Python or any other language. As you move forward, experimenting with different hashing functions and collision resolution strategies will deepen your appreciation and mastery of this versatile data structure.