Intro: Why ?
In many applications, data needs to be stored and retrieved quickly, and traditional data structures like arrays or linked lists can be inefficient for these tasks, particularly when dealing with large amounts of data.
Hash tables solve this problem by providing constant-time average operations for insertion, deletion, and retrieval of data. This is achieved by using a hash function to map each key to a unique index in the array, allowing for quick access to the data.
Hash tables are widely used in a variety of applications, including databases, caches, compilers, and search engines, where fast data retrieval and storage are essential. They are particularly useful for tasks such as dictionary implementations, where a large number of key-value pairs need to be stored and accessed efficiently.
============================================================================
Practice Objectives will be:
A hash function to map keys to values.
A hash table data structure that supports
insert
,search
, anddelete
operations.A data structure to account for a collision of keys.
========================================================================
Hash Function :
To ensure the proper functioning of a hash table, the hash function must meet three key requirements:
Firstly, it should produce the same output when given the same input. Secondly, it should be fast to compute to enable quick lookups. Finally, the hash function should generate outputs that are reasonably random.
What is a collision ?
A collision occurs when two or more keys have the same hash value and therefore map to the same index in the hash table.
How to avoid collisions ?
a linked list corresponding to the particular bucket of the Hash Table, to accommodate all the values corresponding to different keys that map to the same bucket.
PRACTICE:
CREATE TABLE AND ITEMS USING STRUCTS:
typedef struct Hash_item{
char *key;
char *value
} H_item;
==================================
typedef struct HashTable{
// Contains an array of pointers to items.
item **items;
int size; /* no of slots in hash table*/
int count; /* no of key-value pairs in hash table*/
} HashTable;
Allocate memory for a new
HashTable
struct usingmalloc()
.Initialize the
size
andcount
fields of theHashTable
struct.Allocate memory for an array of
size
pointers toHt_item
structs usingcalloc()
.Set all pointers in the
items
array toNULL
.Return a pointer to the new
HashTable
struct.To print the hash table content:
Iterate over the
items
array of the hash table using afor
loop.For each element in the
items
array, assign the pointer to theHt_item
struct at the current index to theitem
variable.Check if
item
isNULL
, which indicates that there are no key-value pairs stored in the current slot of the hash table.If
item
is notNULL
, print the key and value of theHt_item
struct usingprintf()
.
hint:
malloc()
is used to allocate an uninitialized block of memory of a specified size. The values in the memory block are undefined and may contain garbage values.
calloc()
is used to allocate a block of memory of a specified size and initialize all bytes in the block to zero. This can be useful when working with structures or arrays, as it ensures that all elements are initialized to zero before use.Free up memory that you’ve allocated on the heap with
malloc()
andcalloc()
.
Benefits of Using Pointers in Hash Table Implementation:
This approach proves especially advantageous when dealing with large structures. By saving memory in this way, we can optimize our program's performance and accommodate more data without exhausting system resources.
The second benefit of using pointers in hash table implementation lies in their ability to easily identify empty locations within the table. By assigning a null value to a pointer, we can indicate that a specific slot in the table is unoccupied.
What is djb2 algorithm ?
The djb2 algorithm, also known as the DJB2 hash function, is a widely used non-cryptographic hash function created by Daniel J. Bernstein. It is a simple and efficient algorithm designed to generate hash values for various data types, such as strings.
The djb2 algorithm operates by iteratively hashing each character of the input data and combining the hash value with a constant value using bitwise operations.
djb2 make use of:
Bit-shifting left by
n
positions is equivalent to multiplying the originalvalue by
2^n
.In the case of
hash << 5
, the left bit-shift by 5 positions is equivalent to multiplying thehash
value by2^5
, which is equal to 32.hash << 5
multiplies the hash value by 32. This operation is often used in hash table implementations as a simple and fast way to generate a new hash value that is well-distributed across the array of buckets. By multiplying the hash value by a large power of 2, the resulting hash value is likely to have more unique bits, which can reduce the likelihood of collisions and improve the overall performance of the hash table.
What is index in a hash table ?
An index in a hash table is the position in the underlying array where a key-value pair is stored, and it is determined by applying a hashing function to the key.
To return the index that has been hashed in a hash table, you typically need to implement a hash function and apply it to the key you want to hash. The resulting hash value is then mapped to an index within the hash table's underlying array.
Here's an example usage:
key = "example"
table_size = 10
index = hash_function(key, table_size)
The output will be the calculated index value for the given key based on the hash function and table size.
hint:
The purpose of using
%
with the table size is to "wrap around" the hash value and constrain it within the valid index range. By taking the modulo of the hash value with the table size, you effectively ensure that the resulting index falls within the bounds of the hash table's array.
see :