Learning the Hash Storage Mechanism Through How Java's HashMap Stores and Retrieves Data

阿新 • Published: 2019-01-26

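I recently started working through the Java fundamentals again, this time by reading the source code, and I intend to keep sharing the notes I make along the way. Below is a summary of HashMap.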

HashMap constructors:

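No-arg constructor: initializes the map with the default initial capacity and load factor. The default initial capacity is 16 and the default load factor is 0.75f.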

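When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structure is rebuilt) so that it ends up with roughly twice as many buckets.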

 /**
  * The default initial capacity - MUST be a power of two.
  */
 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
 /**
  * The load factor used when none specified in constructor.
  */
 static final float DEFAULT_LOAD_FACTOR = 0.75f;
 public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
 }

Constructor with a custom initial capacity:

  /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

Constructor with a custom initial capacity and load factor:

/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }
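Note that this constructor only records the requested capacity in `threshold`. In this version of JDK 7 the table itself is allocated lazily by `inflateTable` (triggered by the `table == EMPTY_TABLE` check in `put` below), which rounds the capacity up to a power of two. The following is a minimal sketch of that rounding, assuming the JDK 7 behavior; the class name `CapacityRounding` is hypothetical and this is not the JDK source itself.

```java
/** Hypothetical helper mirroring how JDK 7's HashMap rounds a requested capacity. */
class CapacityRounding {
    // Same upper bound HashMap uses: 2^30.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Returns the smallest power of two >= number (at least 1, at most MAXIMUM_CAPACITY). */
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // Doubling (number - 1) and keeping only its highest set bit yields
        // the smallest power of two that is >= number.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
    }
}
```

So `new HashMap<>(10)` ends up with a 16-slot table, and `new HashMap<>(17)` with a 32-slot table.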

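Summary: a HashMap is created with a load factor that defaults to 0.75, a compromise between time and space costs. Raising the load factor lets more entries share the table before it is resized, which reduces the memory occupied by the hash table (the Entry array) but makes collision chains longer and so increases the cost of lookups. Lookup is the most frequent operation, since both get() and put() perform one. Lowering the load factor speeds up lookups but increases the memory occupied by the hash table. You can tune the load factor when creating a HashMap: if the program cares more about space and memory is tight, raise it somewhat; if the program cares more about time and memory is plentiful, lower it somewhat. In most cases there is no need to change it.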
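To make the trade-off concrete: the resize threshold is `capacity * loadFactor`, so with the defaults a resize is due once the map holds 12 entries (16 × 0.75). The sketch below only tabulates that arithmetic for successive doublings; it assumes the threshold formula described above and does not inspect a real HashMap. The class name `ThresholdTable` is hypothetical.

```java
/** Hypothetical illustration: resize thresholds for several load factors. */
class ThresholdTable {
    /** threshold = capacity * loadFactor, truncated to an int. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float[] loadFactors = {0.5f, 0.75f, 1.0f};
        for (float lf : loadFactors) {
            StringBuilder sb = new StringBuilder("loadFactor " + lf + ":");
            // Start at the default capacity and double, as each resize does.
            for (int capacity = 16; capacity <= 256; capacity <<= 1) {
                sb.append(' ').append(capacity).append("->").append(threshold(capacity, lf));
            }
            System.out.println(sb);
        }
    }
}
```

A lower load factor resizes earlier (more memory, shorter chains on average); a higher one resizes later (less memory, longer chains).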

The most commonly used HashMap method is put; its code is as follows:

 /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
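        // If the table has not been allocated yet, inflate (initialize) it
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // A null key is handled separately by putForNullKey
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key); // spread hash derived from key.hashCode()
        int i = indexFor(hash, table.length); // bucket index of the key; table is an Entry<K,V>[] array
        // Walk the chain in bucket i. If an entry has the same hash and an equal key,
        // replace its value, return the old value and keep the existing key.
        // Otherwise fall through and add a new entry at index i.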
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        // Add the new key-value pair at index i
        addEntry(hash, key, value, i);
        return null;
    }
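The return-value contract in the Javadoc above can be observed through the public API. This small demo uses `java.util.HashMap` directly (the class name `PutReturnDemo` is hypothetical); the contract is unchanged in later JDKs even though the internals were rewritten in Java 8.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));  // null: no previous mapping
        System.out.println(map.put("a", 2));  // 1: old value returned, value replaced
        System.out.println(map.get("a"));     // 2
        System.out.println(map.put(null, 3)); // null: a null key is allowed (putForNullKey in JDK 7)
        System.out.println(map.put(null, 4)); // 3
        System.out.println(map.size());       // 2: the keys "a" and null
    }
}
```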

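The put method above relies on an important inner class, HashMap$Entry; each Entry is simply a key-value pair. As the code shows, when HashMap decides where to store a key-value pair it does not look at the value at all: the storage location of each Entry is computed from the key alone. Once the key's location is determined, the value is simply stored alongside it. The source of Entry is as follows: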

 static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;       // the key
        V value;           // the value
        Entry<K,V> next;   // next Entry in the same bucket's chain
        int hash;          // cached hash of the key

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }
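`Entry.equals` and `Entry.hashCode` above follow the general `Map.Entry` contract: two entries are equal when both keys and values are equal, and the hash is `hashCode(key) ^ hashCode(value)`. That is why an entry taken from a HashMap compares equal to an entry of a completely different class. A quick check with the JDK's `AbstractMap.SimpleEntry`; the class name `EntryContractDemo` is hypothetical.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EntryContractDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 42);
        // The single entry stored inside the HashMap.
        Map.Entry<String, Integer> stored = map.entrySet().iterator().next();
        // An independent entry object with the same key and value.
        Map.Entry<String, Integer> simple = new AbstractMap.SimpleEntry<>("k", 42);

        System.out.println(stored.equals(simple));                  // true
        System.out.println(stored.hashCode() == simple.hashCode()); // true
        // Both equal Objects.hashCode(key) ^ Objects.hashCode(value).
        System.out.println(stored.hashCode()
                == (Objects.hashCode("k") ^ Objects.hashCode(42)));  // true
    }
}
```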

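put calls hash() to compute the key's hash. The method is pure arithmetic: it mixes the bits of key.hashCode() (and, when hashSeed is non-zero, uses an alternative hash for String keys):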

  final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
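To see why the extra shifts matter, the sketch below copies the bit-mixing lines of `hash()` (leaving out the `hashSeed` branch) and compares bucket indexes for two hash codes that differ only above the low 4 bits. The class and method names are hypothetical; the shift constants are the ones in the JDK 7 code above.

```java
/** Hypothetical demo of JDK 7's supplemental hash (hashSeed branch omitted). */
class HashSpreadDemo {
    /** Same XOR/shift mixing as the hash() method above, applied to a raw hashCode. */
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000; // a and b differ only in bits 16 and 17
        int b = 0x20000;
        // Without mixing, only the low 4 bits are used, so both land in bucket 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length)); // prints 0 0
        // With mixing, the high bits influence the low bits and the buckets differ.
        System.out.println(indexFor(spread(a), length) + " " + indexFor(spread(b), length)); // prints 1 2
    }
}
```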

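For a given map (that is, a fixed hashSeed), any object whose hashCode() returns the same value always gets the same hash from hash(). Next, the program calls indexFor(int h, int length) to compute the index of the table array at which the object should be stored. The code of indexFor(int h, int length) is: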

// h is the key's hash; length is the length of the table array
static int indexFor(int h, int length) {
    return h & (length - 1);
}

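This method is very clever: it always obtains the storage position with h & (table.length - 1), and the length of HashMap's underlying array is always a power of two (2^n). See the constructor discussion above, where the default capacity is documented as one that MUST be a power of two.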

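Because length is always a power of two, h & (length - 1) is an elegant choice: length - 1 is a run of low one bits (15 = 0b1111 for length 16), so the AND keeps exactly the low bits of h, which for non-negative h is the same as h % length. For example, with length = 16: h = 5 gives 5; h = 6 gives 6; h = 15 gives 15; h = 16 gives 0; h = 17 gives 1; and so on. This guarantees that the computed index always falls within the bounds of the table array.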
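The examples in the paragraph above can be checked directly. This sketch also shows one practical reason for using the mask instead of `%`: for a negative hash, `h % length` is negative and unusable as an array index, while `h & (length - 1)` always lands in `[0, length)`. It assumes `length` is a power of two; the class name `IndexForDemo` is hypothetical.

```java
/** Hypothetical demo of the power-of-two mask used by indexFor. */
class IndexForDemo {
    /** Valid only when length is a power of two: keeps the low log2(length) bits of h. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 6, 15, 16, 17}) {
            System.out.println(h + " -> " + indexFor(h, length)); // 5, 6, 15, 0, 1
        }
        // For non-negative h the mask equals the remainder...
        System.out.println(indexFor(12345, length) == 12345 % length); // true
        // ...but for negative h only the mask stays in range.
        System.out.println(-1 % length);          // -1
        System.out.println(indexFor(-1, length)); // 15
    }
}
```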

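The put source shows that when a key-value pair is put into a HashMap, the storage location of the Entry is first determined from the return value of the key's hashCode(): if the keys of two Entries return the same hashCode(), they are stored at the same location. (Different hash codes can also collide after the mask; put then treats them like case (2) below, because its loop also compares e.hash.) Entries at the same location fall into two cases: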

(1) If equals returns true for the two keys, the value of the newly added Entry overwrites the value of the existing Entry, but the key is not replaced.

(2) If equals returns false for the two keys, the newly added Entry forms an Entry chain with the existing one, and the new Entry is placed at the head of the chain; see the explanation of addEntry() below for details.

If no Entry with an equal key is found at that location, the key and value are simply added at index i.
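Both cases can be reproduced with the public API. `"Aa"` and `"BB"` are a well-known pair of Strings with the same `hashCode()` (2112), so they always share a bucket; a second key that is `equals` to the first but a different object shows that only the value is replaced. The class name `CollisionDemo` is hypothetical. The observable behavior is the same on JDK 8+, although there a colliding entry is appended to the chain rather than prepended.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112

        Map<String, String> map = new HashMap<>();
        // Case (2): same hashCode, equals() is false -> both entries are kept.
        map.put("Aa", "first");
        map.put("BB", "second");
        System.out.println(map.size()); // 2

        // Case (1): equals() is true -> the value is replaced, the original key object stays.
        String k1 = new String("key");
        String k2 = new String("key"); // equal to k1, but a distinct object
        map.put(k1, "old");
        map.put(k2, "new");
        System.out.println(map.get("key")); // new
        for (String k : map.keySet()) {
            if (k.equals("key")) {
                System.out.println(k == k1); // true: the key was not overwritten
            }
        }
    }
}
```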

The source of addEntry is as follows:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Resize when size has reached the threshold and bucket bucketIndex is already occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length); // double the length of the table array
            hash = (null != key) ? hash(key) : 0; // recompute the key's hash
            bucketIndex = indexFor(hash, table.length); // recompute the index in the new table
        }
        // Create the new Entry at bucketIndex; it points to the Entry previously stored there
        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

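The createEntry method above contains a very elegant design: the newly added Entry is always placed at index bucketIndex of the table array. If there is already an Entry at bucketIndex, the new Entry points to the existing one, producing an Entry chain. If there is no Entry at bucketIndex, the variable e in the code above is null, so the new Entry points to null: its next field is null and no chain is formed. Compare this with the Entry class above.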
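A stripped-down sketch of the head-insertion idea in `createEntry`: every new node becomes the bucket head and points at the previous head, so a traversal visits entries newest first. `Bucket` and `Node` are hypothetical names; this models a single bucket, not the full JDK class. (Java 8's HashMap appends to the tail of the chain instead.)

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-bucket model of JDK 7's head insertion. */
class Bucket<K, V> {
    /** Minimal chain node: key, value and the link to the next node. */
    static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V> head; // null while the bucket is empty

    /** Like createEntry: the new node points at the old head and becomes the head. */
    void addFirst(K key, V value) {
        head = new Node<>(key, value, head);
    }

    /** Keys in traversal order, i.e. from the head along the next links. */
    List<K> keysInChainOrder() {
        List<K> keys = new ArrayList<>();
        for (Node<K, V> e = head; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        Bucket<String, Integer> bucket = new Bucket<>();
        bucket.addFirst("a", 1);
        bucket.addFirst("b", 2);
        bucket.addFirst("c", 3);
        System.out.println(bucket.keysInChainOrder()); // [c, b, a]
    }
}
```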

A few terms explained:

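Bucket: HashMap and its subclasses use a hash algorithm to decide where the elements of the collection are stored. When the HashMap's table is initialized, an Entry array of length capacity is created; each slot of this array that can hold elements is called a "bucket". Every bucket has its own index, and the system can use that index to quickly access the elements stored in the bucket.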

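Entry chain: each HashMap bucket always holds just one element (one Entry) directly. However, an Entry contains a reference field (next, the last argument of the Entry constructor) that points to the next Entry, so a bucket may hold a single Entry that points to another Entry, which forms an Entry chain. The figure below is a simple sketch I drew of HashMap's storage structure: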

[Figure: HashMap storage structure]

The other most commonly used HashMap method is get; its source is as follows:

    public V get(Object key) {
        // A null key is looked up separately by getForNullKey
        if (key == null)
            return getForNullKey();
        // Otherwise locate the entry with getEntry
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        // Compute the key's hash (a null key always hashes to 0)
        int hash = (key == null) ? 0 : hash(key);
        // Use the hash to find the bucket index, then walk that bucket's Entry chain
        // until an entry with the same hash and an equal key is found
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
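Because `get` returns `null` both when `getEntry` finds nothing and when the stored value itself is `null`, `containsKey` (which in JDK 7 checks `getEntry(key) != null`) is the way to tell the two situations apart. A short demo with `java.util.HashMap`; the class name `GetNullDemo` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GetNullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);

        System.out.println(map.get("present"));         // null: mapped to null
        System.out.println(map.get("absent"));          // null: no mapping at all
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
    }
}
```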

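The code above shows that if every bucket of a HashMap holds only one Entry, HashMap can fetch that Entry quickly by index. When a hash collision occurs, a bucket holds an Entry chain rather than a single Entry, and the entries must be traversed in order until the wanted one is found. If the wanted Entry happens to be at the end of the chain (it was the first one put into that bucket), the loop must run all the way to the end. So HashMap performs best when each bucket holds a single Entry and no chain has formed through the next references: to fetch the value for a key, it only needs to compute the hash from the key's hashCode(), find the key's index in the table array from that hash, take the Entry at that index, and return the value associated with the key.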
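To make the cost of a long chain concrete, the sketch below models one bucket with head insertion (as in `createEntry`) and counts how many entries a lookup must examine. The earliest-inserted key sits at the end of the chain and needs the most comparisons. The class name `ChainLookup` and its methods are hypothetical, and the model stores keys only; the real getEntry also compares the cached `hash` before calling `equals`.

```java
/** Hypothetical single-bucket model that counts nodes examined during a lookup. */
class ChainLookup {
    /** Chain node holding only a key; values are omitted to keep the model small. */
    private static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    private Node head; // null while the bucket is empty

    /** Head insertion, as in JDK 7's createEntry (duplicate keys are not checked here). */
    void add(String key) {
        head = new Node(key, head);
    }

    /** Number of nodes visited until key is found (including the match), or -1 if absent. */
    int probes(String key) {
        int visited = 0;
        for (Node e = head; e != null; e = e.next) {
            visited++;
            if (e.key.equals(key)) {
                return visited;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChainLookup bucket = new ChainLookup();
        for (int i = 0; i < 5; i++) {
            bucket.add("k" + i); // k0 is inserted first, k4 last
        }
        System.out.println(bucket.probes("k4")); // 1: the newest key is at the head
        System.out.println(bucket.probes("k0")); // 5: the oldest key is at the tail
    }
}
```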
