O(1) Hash Tables in C

C - Hash tables

Jul 06, 2023

Intro: Why ?

In many applications, data needs to be stored and retrieved quickly, and traditional data structures like arrays or linked lists can be inefficient for these tasks, particularly when dealing with large amounts of data.

Hash tables solve this problem by providing constant-time average operations for insertion, deletion, and retrieval of data. This is achieved by using a hash function to map each key to a unique index in the array, allowing for quick access to the data.

Hash tables are widely used in a variety of applications, including databases, caches, compilers, and search engines, where fast data retrieval and storage are essential. They are particularly useful for tasks such as dictionary implementations, where a large number of key-value pairs need to be stored and accessed efficiently.

============================================================================

Practice Objectives will be:

A hash function to map keys to values.
A hash table data structure that supports insert, search, and delete operations.
A data structure to account for a collision of keys.
========================================================================

Hash Function :
To ensure the proper functioning of a hash table, the hash function must meet three key requirements:
Firstly, it should produce the same output when given the same input. Secondly, it should be fast to compute to enable quick lookups. Finally, the hash function should generate outputs that are reasonably random.

A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored — Resource : https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg/800px-Hash_table_3_1_1_0_1_0_0_SP.svg.png

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. — Resource: https://en.wikipedia.org/wiki/Hash_table#

What is a collision ?

A collision occurs when two or more keys have the same hash value and therefore map to the same index in the hash table.

How to avoid collisions ?

a linked list corresponding to the particular bucket of the Hash Table, to accommodate all the values corresponding to different keys that map to the same bucket.

PRACTICE:

CREATE TABLE AND ITEMS USING STRUCTS:

typedef struct Hash_item{

char *key;

char *value

} H_item;

==================================

typedef struct HashTable{

// Contains an array of pointers to items.

item **items;

int size; /* no of slots in hash table*/

int count; /* no of key-value pairs in hash table*/

} HashTable;

Allocate memory for a new HashTable struct using malloc().
Initialize the size and count fields of the HashTable struct.
Allocate memory for an array of size pointers to Ht_item structs using calloc().
Set all pointers in the items array to NULL.
Return a pointer to the new HashTable struct.
create_hash_table() function allocates memory for a new hash table with a specified size, initializes its fields, and returns a pointer to the new hash table.
To print the hash table content:
1. Iterate over the items array of the hash table using a for loop.
2. For each element in the items array, assign the pointer to the Ht_item struct at the current index to the item variable.
3. Check if item is NULL, which indicates that there are no key-value pairs stored in the current slot of the hash table.
4. If item is not NULL, print the key and value of the Ht_item struct using printf().

Iterate over the items array of the hash table using a for loop. — Iterate over the `items` array of the hash table using a `for` loop.

hint:
malloc() is used to allocate an uninitialized block of memory of a specified size. The values in the memory block are undefined and may contain garbage values.
calloc() is used to allocate a block of memory of a specified size and initialize all bytes in the block to zero. This can be useful when working with structures or arrays, as it ensures that all elements are initialized to zero before use.
Free up memory that you’ve allocated on the heap with malloc() and calloc().

Benefits of Using Pointers in Hash Table Implementation:

This approach proves especially advantageous when dealing with large structures. By saving memory in this way, we can optimize our program's performance and accommodate more data without exhausting system resources.

The second benefit of using pointers in hash table implementation lies in their ability to easily identify empty locations within the table. By assigning a null value to a pointer, we can indicate that a specific slot in the table is unoccupied.

What is djb2 algorithm ?

The djb2 algorithm, also known as the DJB2 hash function, is a widely used non-cryptographic hash function created by Daniel J. Bernstein. It is a simple and efficient algorithm designed to generate hash values for various data types, such as strings.

The djb2 algorithm operates by iteratively hashing each character of the input data and combining the hash value with a constant value using bitwise operations.

djb2 make use of:

Bit-shifting left by n positions is equivalent to multiplying the original
value by 2^n.
In the case of hash << 5, the left bit-shift by 5 positions is equivalent to multiplying the hash value by 2^5, which is equal to 32.
- hash << 5 multiplies the hash value by 32. This operation is often used in hash table implementations as a simple and fast way to generate a new hash value that is well-distributed across the array of buckets. By multiplying the hash value by a large power of 2, the resulting hash value is likely to have more unique bits, which can reduce the likelihood of collisions and improve the overall performance of the hash table.
  The resulting hash value is an unsigned long integer, which serves as a unique identifier for the input data. The djb2 algorithm is known for its simplicity, good distribution properties, and fast computation speed, making it suitable for various applications such as hash tables, data indexing, and checksumming.

What is index in a hash table ?

An index in a hash table is the position in the underlying array where a key-value pair is stored, and it is determined by applying a hashing function to the key.

The main idea behind the hashing is to create the (key/value) pairs. If the key is given, then the algorithm computes the index at which the value would be stored. — Resource : https://www.javatpoint.com/hash-table

To return the index that has been hashed in a hash table, you typically need to implement a hash function and apply it to the key you want to hash. The resulting hash value is then mapped to an index within the hash table's underlying array.

Here's an example usage:

key = "example" 
table_size = 10 
index = hash_function(key, table_size)

The output will be the calculated index value for the given key based on the hash function and table size.

hint:
The purpose of using % with the table size is to "wrap around" the hash value and constrain it within the valid index range. By taking the modulo of the hash value with the table size, you effectively ensure that the resulting index falls within the bounds of the hash table's array.

By taking the modulo of the hash value (27) with the table size (10), we get an index of 7. This index falls within the valid range of the hash table's array (0 to 9), so it can be used to determine the appropriate bucket to store or retrieve the key-value pair.

see :

Handling Collisions in Hash Tables

Shaza Ali

July 6, 2023

Read full story

Resources :

Hash table in C

Implementation in C " Jacob Sorber" Video 25 min.

Examples

Shaza’s Substack

Handling Collisions in Hash Tables

Discussion about this post