
Hash Table

A hash table, or hash map, is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). A hash table thus implements an associative array. The hash function transforms the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

The implementation of this calculation is the hash function f:

index = f(key, arrayLength)

The hash function calculates an index within the array from the data key; arrayLength is the size of the array.
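One common way to realize f is to reduce a general-purpose hash code modulo the array length. The sketch below assumes Python's built-in hash() as the underlying hash code; the name index_for is hypothetical and only illustrates the signature index = f(key, arrayLength).

```python
def index_for(key, array_length):
    """Map a hashable key to a slot index in the range [0, array_length)."""
    if array_length <= 0:
        raise ValueError("array_length must be positive")
    # hash() may return a negative integer, but Python's % with a positive
    # modulus always yields a non-negative result, so the index is valid.
    return hash(key) % array_length
```

Note that Python randomizes string hashes per process by default, so index_for("alice", 8) is stable within one run but may differ between runs.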

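A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution sends many keys to the same slots, which increases the number of collisions and the cost of resolving them.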

Load factor

The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its array of buckets.
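As a small worked illustration of the ratio n/s, the helper below computes the load factor from the two quantities defined above. The function name load_factor is an assumption made for this example.

```python
def load_factor(n_entries, n_buckets):
    """Return n/s: stored entries divided by the number of buckets."""
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    return n_entries / n_buckets


# 6 entries in 8 buckets gives 0.75.
print(load_factor(6, 8))
# With chaining, buckets can hold several entries, so the ratio may exceed 1.
print(load_factor(12, 8))
```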

Separate chaining

In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires adding a new entry record to either end of the list belonging to the hashed slot. Deletion requires searching the list and removing the element.
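The three operations described above can be sketched with a singly linked list per slot. This is a minimal illustration, not a production design: the class and method names are hypothetical, the table has a fixed number of buckets, new entries are added at the head of the chain (one of the two ends the text allows), and Python's built-in hash() is assumed as the hash code.

```python
class _Node:
    """One entry record in a bucket's singly linked list."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, n_buckets=8):
        # Each slot holds the head node of a chain, or None if empty.
        self._buckets = [None] * n_buckets
        self._count = 0

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self._buckets[i]
        while node is not None:        # scan the chain for an existing key
            if node.key == key:
                node.value = value     # key already present: overwrite
                return
            node = node.next
        # Key not present: add a new record at the head of the chain.
        self._buckets[i] = _Node(key, value, self._buckets[i])
        self._count += 1

    def get(self, key):
        node = self._buckets[self._index(key)]
        while node is not None:        # lookup scans the list
            if node.key == key:
                return node.value
            node = node.next
        raise KeyError(key)

    def remove(self, key):
        i = self._index(key)
        prev, node = None, self._buckets[i]
        while node is not None:        # search the list, then unlink
            if node.key == key:
                if prev is None:
                    self._buckets[i] = node.next
                else:
                    prev.next = node.next
                self._count -= 1
                return node.value
            prev, node = node, node.next
        raise KeyError(key)

    def __len__(self):
        return self._count
```

Constructing the table with n_buckets=1 forces every key into the same chain, which is a convenient way to exercise the collision path.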

Dynamic resizing

To keep the load factor under a certain limit, e.g. under 3/4, many table implementations expand the table when items are inserted. 

Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to new bucket locations.

To limit the proportion of memory wasted due to empty buckets, some implementations also shrink the table, followed by a rehash, when items are deleted. From the point of view of space-time tradeoffs, this operation is similar to deallocation in dynamic arrays.


Resizing by copying all entries

A common approach is to automatically trigger a complete resizing when the load factor exceeds some threshold r_max. Then a new larger table is allocated, all the entries of the old table are removed and inserted into this new table, and the old table is returned to the free storage pool. Symmetrically, when the load factor falls below a second threshold r_min, all entries are moved to a new smaller table.
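A rough sketch of this copy-everything strategy follows. Several details are assumptions made for illustration and do not come from the text: the class name, the use of Python lists as chains, doubling and halving as the growth policy, r_min = 0.25, and never shrinking below the initial bucket count. The default r_max = 0.75 mirrors the 3/4 example given earlier.

```python
class ResizingHashTable:
    def __init__(self, n_buckets=8, r_max=0.75, r_min=0.25):
        self._min_buckets = n_buckets
        self._buckets = [[] for _ in range(n_buckets)]
        self._count = 0
        self._r_max = r_max
        self._r_min = r_min

    def _load_factor(self):
        return self._count / len(self._buckets)

    def _resize(self, new_size):
        old = self._buckets
        self._buckets = [[] for _ in range(new_size)]
        # Rehash: every entry is re-inserted at its index in the new table.
        for chain in old:
            for key, value in chain:
                self._buckets[hash(key) % new_size].append((key, value))
        # The old table is reclaimed by the garbage collector after return.

    def put(self, key, value):
        chain = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite existing key
                return
        chain.append((key, value))
        self._count += 1
        if self._load_factor() > self._r_max:
            self._resize(2 * len(self._buckets))   # grow (assumed: double)

    def get(self, key):
        for k, v in self._buckets[hash(key) % len(self._buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        chain = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, v) in enumerate(chain):
            if k == key:
                del chain[i]
                self._count -= 1
                if (len(self._buckets) > self._min_buckets
                        and self._load_factor() < self._r_min):
                    # shrink (assumed: halve, but not below the initial size)
                    self._resize(max(self._min_buckets,
                                     len(self._buckets) // 2))
                return v
        raise KeyError(key)

    def bucket_count(self):
        return len(self._buckets)
```

Keeping r_min well below half of r_max avoids thrashing: right after doubling, the load factor is still above r_min, and right after halving it is still below r_max, so a single insertion or deletion cannot immediately trigger the opposite resize.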