Hash Tables: Hashtable

阿新 • Published: 2022-03-20

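1. What is a hash table?

A hash table is a data structure that accesses records directly by key; it stores key-value pairs. It maps each key to a position in the table and reads or writes the record at that position, which speeds up lookups.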

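2. Hashtable

2.1 Which package is it in?

java.util

2.2 Inheritance and implemented interfaces

Hashtable implements a hash table that maps keys to values (it implements Map<K,V>). Any non-null object can be used as a key or as a value.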

public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable {}

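2.2.1 How do you make sure objects can be stored in and retrieved from a Hashtable?

The objects used as keys must implement the hashCode method and the equals method.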
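
To make the key requirement concrete, here is a minimal sketch. PointKey is a hypothetical key type invented for this illustration; it is not part of the JDK. Because it overrides both equals and hashCode, a second, equal key object finds the value stored under the first one.

```java
import java.util.Hashtable;
import java.util.Objects;

final class KeyContractDemo {
    // Stores a value under one key object, then looks it up with a different but equal key object.
    static String lookupWithEqualKey() {
        Hashtable<PointKey, String> table = new Hashtable<>();
        table.put(new PointKey(1, 2), "stored");
        return table.get(new PointKey(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithEqualKey()); // prints "stored"
    }
}

// Hypothetical key type, used only for this illustration.
final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Two keys are equal when both coordinates match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PointKey)) {
            return false;
        }
        PointKey that = (PointKey) other;
        return x == that.x && y == that.y;
    }

    // Equal keys must return equal hash codes, otherwise they may land in different buckets.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If PointKey did not override these two methods, the lookup would fall back to identity-based equals and hashCode from Object and would normally return null.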

2.3 Which two parameters affect performance?

Initial capacity
Load factor

2.3.1 Capacity

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

2.3.2 Load factor

Generally, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the time cost to look up an entry (which is reflected in most Hashtable operations, including get and put).

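The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The default of 0.75 is usually a good balance: a higher value saves space but makes looking up an entry slower.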

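2.4 Illustration of a hash collision

In the figure, key2 and key3 collide: they hash to the same bucket, so that bucket stores two entries. More collisions mean more entries in a single bucket, and a lookup in such a bucket has to search its entries sequentially.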
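
A collision is easy to reproduce. The sketch below relies on the fact that the strings "Aa" and "BB" have the same hashCode (2112), and it assumes an 11-bucket table, the default size of new Hashtable(). The helper bucketIndex is a hypothetical name that repeats the index formula used in the Hashtable source.

```java
import java.util.Hashtable;

final class CollisionDemo {
    // Same bucket formula that the Hashtable source uses.
    static int bucketIndex(Object key, int capacity) {
        return (key.hashCode() & 0x7FFFFFFF) % capacity;
    }

    // Both colliding keys remain retrievable, because equals() tells them apart inside the bucket.
    static boolean bothRetrievable() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("Aa", 1);
        table.put("BB", 2);
        return table.get("Aa") == 1 && table.get("BB") == 2 && table.size() == 2;
    }

    public static void main(String[] args) {
        int capacity = 11; // assumed table size
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"
        System.out.println(bucketIndex("Aa", capacity));             // prints 0
        System.out.println(bucketIndex("BB", capacity));             // prints 0
        System.out.println(bothRetrievable());                       // prints true
    }
}
```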

2.5 Initial capacity

2.6 What is fail-fast?

/*
	The iterators returned by the iterator method of the collections returned by all of this class's "collection view methods" are fail-fast: if the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. The Enumerations returned by Hashtable's keys and elements methods are not fail-fast.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
*/
In plain words: the iterators obtained from a Hashtable's collection views (keySet, entrySet, values) are fail-fast. If the table is structurally modified after the iterator is created, meaning entries are added or removed by anything other than the iterator's own remove method, the iterator throws ConcurrentModificationException. Overwriting the value of an existing key is not a structural modification. The Enumerations returned by keys() and elements() are not fail-fast.

The check is best-effort only: it cannot be guaranteed in the presence of unsynchronized concurrent modification. Use it to detect bugs, and never write a program whose correctness depends on this exception.
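
The behaviour can be observed in a single thread. The following is a minimal sketch, not a test of every case: FailFastDemo and its method names are made up for this example, and the overwrite case relies on the put implementation shown in section 2.10, where replacing an existing value does not change modCount.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;

final class FailFastDemo {
    static Hashtable<String, Integer> sample() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);
        table.put("c", 3);
        return table;
    }

    // Removing through the table while iterating a view makes the next call to next() fail.
    static boolean removeThroughTableFails() {
        Hashtable<String, Integer> table = sample();
        try {
            for (String key : table.keySet()) {
                table.remove(key); // structural change that bypasses the iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator itself is the supported way.
    static int removeThroughIterator() {
        Hashtable<String, Integer> table = sample();
        Iterator<String> it = table.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        return table.size();
    }

    // Overwriting existing keys adds or removes nothing, so iteration continues normally.
    static boolean overwriteWhileIterating() {
        Hashtable<String, Integer> table = sample();
        for (String key : table.keySet()) {
            table.put(key, 0);
        }
        return table.values().stream().allMatch(v -> v == 0);
    }

    public static void main(String[] args) {
        System.out.println(removeThroughTableFails()); // prints true
        System.out.println(removeThroughIterator());   // prints 0
        System.out.println(overwriteWhileIterating()); // prints true
    }
}
```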

2.7 Choosing between Hashtable, HashMap, and ConcurrentHashMap

As of the Java 2 platform v1.2, this class was retrofitted to implement the Map interface, making it a member of the Java Collections Framework. Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.

In short: Hashtable has implemented the Map interface since Java 1.2 and, unlike the newer collection classes, is synchronized. If you do not need thread safety, use HashMap. If you need a thread-safe, highly concurrent map, use java.util.concurrent.ConcurrentHashMap.
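
As a rough illustration of the thread-safe options, the sketch below increments one shared counter from several threads. MapChoiceDemo, countHits, and the "hits" key are names invented for this example. It assumes Java 8 or later, where Map.merge exists, Hashtable overrides it as a synchronized method, and ConcurrentHashMap performs it atomically. A plain HashMap is deliberately left out because concurrent updates to it are unsafe.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MapChoiceDemo {
    // Increments the "hits" counter from several threads and returns the final count.
    static int countHits(Map<String, Integer> counters, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.merge("hits", 1, Integer::sum); // read-modify-write in one call
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counters.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countHits(new ConcurrentHashMap<>(), 4, 10_000)); // prints 40000
        System.out.println(countHits(new Hashtable<>(), 4, 10_000));         // prints 40000
    }
}
```

Both maps produce the correct total. The difference is in how they get there: Hashtable locks the whole table for every call, while ConcurrentHashMap is designed to let threads work on different parts of the map at the same time.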

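2.8 Member fields of the Hashtable class

 private transient Entry<?,?>[] table;	// the bucket array; Entry is a private nested class of Hashtable that holds one key-value pair (compare the figure in 2.4)
 private transient int count;			// total number of entries in the table
 private int threshold;					// the table is rehashed when count reaches this value; it equals (int)(capacity * loadFactor)
 private float loadFactor;				// load factor of the table
 private transient int modCount = 0;	// number of structural modifications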
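
As a worked example of the threshold formula, the sketch below evaluates (int)(capacity * loadFactor) for two capacities. ThresholdDemo is a hypothetical helper, and it ignores the MAX_ARRAY_SIZE cap that the real constructor applies.

```java
final class ThresholdDemo {
    // Mirrors the formula from the field comment: (int)(capacity * loadFactor).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(11, 0.75f)); // prints 8  (11 * 0.75 = 8.25, truncated)
        System.out.println(threshold(23, 0.75f)); // prints 17 (23 * 0.75 = 17.25, truncated)
    }
}
```

With the defaults (capacity 11, load factor 0.75), the table therefore holds 8 entries before the next insertion of a new key triggers a rehash.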

2.9 The four Hashtable constructors

2.9.1 Initial capacity and load factor both specified

public Hashtable(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)	// a negative initial capacity is rejected
            throw new IllegalArgumentException("Illegal Capacity: "+initialCapacity);
        if (loadFactor <= 0 || Float.isNaN(loadFactor))		// the load factor must be a positive number, not NaN
            throw new IllegalArgumentException("Illegal Load: "+loadFactor);

        if (initialCapacity==0)			// a capacity of 0 is bumped up to 1
            initialCapacity = 1;
        this.loadFactor = loadFactor;	// the load factor passed validation, so store it
        table = new Entry<?,?>[initialCapacity];	// allocate the bucket array with initialCapacity slots
        threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);	// threshold = capacity * loadFactor, capped at MAX_ARRAY_SIZE + 1
    }

This constructor validates both arguments and throws IllegalArgumentException if either one is invalid.
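
The validation rules can be checked from the outside. In this small sketch, ConstructorDemo and rejects are hypothetical names; the only JDK call used is the two-argument Hashtable constructor.

```java
import java.util.Hashtable;

final class ConstructorDemo {
    // Returns true when the constructor rejects the arguments with IllegalArgumentException.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new Hashtable<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // prints true: negative capacity
        System.out.println(rejects(0, 0.75f));      // prints false: 0 is accepted and treated as 1
        System.out.println(rejects(16, 0f));        // prints true: load factor must be positive
        System.out.println(rejects(16, Float.NaN)); // prints true: load factor must be a number
    }
}
```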

2.9.2 Initial capacity specified, default load factor

 public Hashtable(int initialCapacity) {
        this(initialCapacity, 0.75f);
    }

2.9.3 No-argument constructor (all defaults)

The default initial capacity of a Hashtable is 11 and the default load factor is 0.75f.

public Hashtable() {
        this(11, 0.75f);
    }

2.9.4 Constructor that takes a Map

 public Hashtable(Map<? extends K, ? extends V> t) {
        this(Math.max(2*t.size(), 11), 0.75f);	// capacity is at least 11 and at least twice the source map's size; the default load factor is used
        putAll(t);
    }

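2.10 How put is implemented [key point]

// synchronized keeps the method thread-safe when several threads use the table
public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();	// uses the key's hashCode() directly (a null key throws NullPointerException here); HashMap additionally spreads the bits
        int index = (hash & 0x7FFFFFFF) % tab.length;	// clear the sign bit, then take the remainder to get the bucket index
        @SuppressWarnings("unchecked")
        Entry<K,V> entry = (Entry<K,V>)tab[index];	// head of the bucket's singly linked list
        for(; entry != null ; entry = entry.next) {	// walk the chain; if the key already exists, overwrite its value
            if ((entry.hash == hash) && entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }
		
    	// reaching this point means no existing entry has this key, so add a new entry
        addEntry(hash, key, value, index);
        return null;
    }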

//------------------------------------addEntry----------------------------------------
private void addEntry(int hash, K key, V value, int index) {
        modCount++;	// count one more structural modification

        Entry<?,?> tab[] = table;
        if (count >= threshold) {	// the entry count has reached the threshold, so trigger a rehash()
            // Rehash the table if the threshold is exceeded
            rehash();

            tab = table;
            hash = key.hashCode();
            index = (hash & 0x7FFFFFFF) % tab.length;	// the table length changed, so recompute the bucket index
        }

        // Creates the new entry.
        @SuppressWarnings("unchecked")
        Entry<K,V> e = (Entry<K,V>) tab[index];	// current head of the bucket's chain
        tab[index] = new Entry<>(hash, key, value, e);	// the new entry becomes the head and links to the old head
        count++;	// one more entry
    }

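To summarize the flow:

1. Check that the value is not null. A null key is not checked explicitly; it fails with NullPointerException at key.hashCode().

2. Compute the bucket index with (hash & 0x7FFFFFFF) % tab.length.

3. Walk the entries in that bucket; if an entry with an equal key exists, overwrite its value and return the old value.

4. Otherwise call addEntry to insert a new entry.

5. addEntry first checks whether the entry count has reached the threshold. If so, it rehashes the table and then recomputes the bucket index for the new table length.

6. Finally, the new entry is inserted at the head of the bucket's chain.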
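
The visible effects of these steps can be checked with a few calls. PutDemo and throwsNpe are hypothetical names used only in this sketch.

```java
import java.util.Hashtable;

final class PutDemo {
    // Returns true when the action throws NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        System.out.println(table.put("a", 1)); // prints null: the key was new
        System.out.println(table.put("a", 2)); // prints 1: the old value is returned and overwritten
        System.out.println(table.size());      // prints 1: overwriting does not add an entry
        System.out.println(throwsNpe(() -> table.put("b", null))); // prints true: null value
        System.out.println(throwsNpe(() -> table.put(null, 3)));   // prints true: null key
    }
}
```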

2.11 How get works [key point]

 public synchronized V get(Object key) {
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();	// hash of the key
        int index = (hash & 0x7FFFFFFF) % tab.length;	// bucket index
     	// walk the bucket's chain looking for an entry with an equal key
        for (Entry<?,?> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return (V)e.value;
            }
        }
        return null;
    }

In short, get uses key.hashCode() to locate the target bucket and then walks that bucket's linked list to find the matching entry.
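
The mask in (hash & 0x7FFFFFFF) is there because hashCode() may be negative, and in Java the remainder of a negative number is negative, which would not be a valid array index. The sketch below shows the arithmetic; BucketIndexDemo and bucketIndex are hypothetical names, and 11 is assumed as the table length.

```java
final class BucketIndexDemo {
    // Clears the sign bit, then maps the hash into the range [0, capacity).
    static int bucketIndex(int hash, int capacity) {
        return (hash & 0x7FFFFFFF) % capacity;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 11);                            // prints -7: not usable as an index
        System.out.println(bucketIndex(-7, 11));                // prints 6
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);    // prints true: abs() cannot fix every hash
        System.out.println(bucketIndex(Integer.MIN_VALUE, 11)); // prints 0
    }
}
```

Masking works for every int, including Integer.MIN_VALUE, which is why it is preferred here over Math.abs.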

2.12 How remove works [key point]

// synchronized keeps the removal thread-safe
public synchronized V remove(Object key) {
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;	// index into the bucket array
        @SuppressWarnings("unchecked")
        Entry<K,V> e = (Entry<K,V>)tab[index];
     	// walk the singly linked list and unlink the matching node
        for(Entry<K,V> prev = null ; e != null ; prev = e, e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                modCount++;	// count one more structural modification
                if (prev != null) {
                    prev.next = e.next;
                } else {
                    tab[index] = e.next;
                }
                count--;
                V oldValue = e.value;
                e.value = null;
                return oldValue;
            }
        }
        return null;
    }
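
The unlinking step is the usual removal from a singly linked list with a trailing prev pointer. The sketch below isolates just that step on a single chain. Node is a simplified, hypothetical stand-in for Hashtable's private Entry class, and the method returns the new head instead of writing to tab[index].

```java
final class ChainRemoveDemo {
    // Simplified stand-in for one entry in a bucket's chain.
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Unlinks the node with the given key and returns the (possibly new) head of the chain.
    static Node remove(Node head, String key) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key.equals(key)) {
                if (prev != null) {
                    prev.next = e.next; // middle or tail: bypass the node
                    return head;
                }
                return e.next;          // head: the next node becomes the new head
            }
        }
        return head;                    // key not found: chain unchanged
    }

    // Joins the keys of a chain for display, for example "a,b,c".
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("a", new Node("b", new Node("c", null)));
        head = remove(head, "b");
        System.out.println(keys(head)); // prints "a,c"
        head = remove(head, "a");
        System.out.println(keys(head)); // prints "c"
    }
}
```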

2.13 How rehash grows the table [key point]

/* Increases the capacity of and internally reorganizes this hashtable, in order to accommodate and access its entries more efficiently. This method is called automatically when the number of keys in the hashtable exceeds this hashtable's capacity and load factor. */

// In practice, addEntry calls this method once the entry count reaches the threshold (capacity * load factor).

protected void rehash() {
        int oldCapacity = table.length;
        Entry<?,?>[] oldMap = table;

        // overflow-conscious code
        int newCapacity = (oldCapacity << 1) + 1;	// the key line: new capacity = 2 * old capacity + 1
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            if (oldCapacity == MAX_ARRAY_SIZE)
                // Keep running with MAX_ARRAY_SIZE buckets
                return;
            newCapacity = MAX_ARRAY_SIZE;
        }
        Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];

        modCount++;
        threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry<K,V> old = (Entry<K,V>)oldMap[i] ; old != null ; ) {
                Entry<K,V> e = old;
                old = old.next;

                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = (Entry<K,V>)newMap[index];
                newMap[index] = e;
            }
        }
    }

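The key line is int newCapacity = (oldCapacity << 1) + 1; shifting left by one bit doubles the capacity and then 1 is added, so each rehash grows the table to 2 * oldCapacity + 1 buckets, capped at MAX_ARRAY_SIZE.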
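
Applying that formula repeatedly gives the sequence of capacities a table goes through. The sketch below starts from the default capacity of 11; GrowthDemo and capacities are hypothetical names, and the MAX_ARRAY_SIZE cap is ignored because the numbers stay small.

```java
import java.util.Arrays;

final class GrowthDemo {
    // Returns the first `steps` capacities, starting at `initial` and growing by 2n + 1 each time.
    static int[] capacities(int initial, int steps) {
        int[] result = new int[steps];
        int capacity = initial;
        for (int i = 0; i < steps; i++) {
            result[i] = capacity;
            capacity = (capacity << 1) + 1; // same growth step as in rehash()
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(capacities(11, 5))); // prints [11, 23, 47, 95, 191]
    }
}
```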