1. 程式人生 > 程式設計 >Java HashMap原始碼及併發環境常見問題解決

Java HashMap原始碼及併發環境常見問題解決

HashMap原始碼簡單分析:

1 一切需要從HashMap屬性欄位說起:

/** The default initial capacity - MUST be a power of two. 初始容量 */
  static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

  /**
   * The maximum capacity,used if a higher value is implicitly specified
   * by either of the constructors with arguments.
   * MUST be a power of two <= 1<<30. 最大容量
   */
  static final int MAXIMUM_CAPACITY = 1 << 30;

  /**
   * The load factor used when none specified in constructor.    * 預設的負載因子,當map的size>=負載因子*capacity時候並且插入元素時候的table[i]!=null進行擴容   * 擴容判斷邏輯:java.util.HashMap#addEntry函式中   *
   */
  static final float DEFAULT_LOAD_FACTOR = 0.75f;

  /**
   * An empty table instance to share when the table is not inflated.
   */
  static final Entry<?,?>[] EMPTY_TABLE = {};

  /**
   * The table,resized as necessary. Length MUST Always be a power of two. 雜湊表
   */
  transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

  /**
   * The number of key-value mappings contained in this map. map的大小
   */
  transient int size;

  /**
   * The next size value at which to resize (capacity * load factor).
   * @serial
   */
  // If table == EMPTY_TABLE then this is the initial capacity at which the
  // table will be created when inflated. 擴容的閾值 = capacity * 負載因子
  int threshold;

  /**
   * The load factor for the hash table. 負載因子,預設是0.75,可以在建立HashMap時候通過建構函式指定
   *
   * @serial
   */
  final float loadFactor;

  /**
   * The number of times this HashMap has been structurally modified
   * Structural modifications are those that change the number of mappings in
   * the HashMap or otherwise modify its internal structure (e.g.,* rehash). This field is used to make iterators on Collection-views of
   * the HashMap fail-fast. (See ConcurrentModificationException).   * 修改次數:例如進行rehash或者返回hashMap檢視時候如果發生修改可以fast-fail
   */
  transient int modCount;

  /**
   * The default threshold of map capacity above which alternative hashing is
   * used for String keys. Alternative hashing reduces the incidence of
   * collisions due to weak hash code calculation for String keys.
   * <p/>
   * This value may be overridden by defining the system property
   * {@code jdk.map.althashing.threshold}. A property value of {@code 1}
   * forces alternative hashing to be used at all times whereas
   * {@code -1} value ensures that alternative hashing is never used.   * rehash時候判斷的一個閾值
   */
  static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

2: 接下來檢視一下HashMap的put方法:

/**
   * Associates the specified value with the specified key in this map.
   * If the map previously contained a mapping for the key,the old
   * value is replaced.
   *
   * @param key key with which the specified value is to be associated
   * @param value value to be associated with the specified key
   * @return the previous value associated with <tt>key</tt>,or
   *     <tt>null</tt> if there was no mapping for <tt>key</tt>.
   *     (A <tt>null</tt> return can also indicate that the map
   *     previously associated <tt>null</tt> with <tt>key</tt>.)
   */
  public V put(K key,V value) {
    if (table == EMPTY_TABLE) {//初始化雜湊表
      inflateTable(threshold);
    }
    if (key == null) //如果key 為null 儲存到table[0]位置
      return putForNullKey(value);
    int hash = hash(key); //計算hash值
    int i = indexFor(hash,table.length);//計算entry在table中的位置
    //for迴圈邏輯用於修改key對應的value的
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
      Object k;
      if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;//如果是更新返回舊值
      }
    }
    //修改次數++
    modCount++;
    //新增元素到雜湊表中
    addEntry(hash,key,value,i);
    // 如果是新增元素則返回null
    return null;
  }

3 put中呼叫的inflateTable方法:

/**
   * Inflates the table.
   */
  private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    //計算大於等於toSize的最小的2的整數次冪的值
    int capacity = roundUpToPowerOf2(toSize);
    //計算擴容閾值
    threshold = (int) Math.min(capacity * loadFactor,MAXIMUM_CAPACITY + 1);
    //初始化雜湊表
    table = new Entry[capacity];
    //更新一下rehash的判斷條件,便於以後判斷是否rehash
    initHashSeedAsNeeded(capacity);
  }

4 put方法中呼叫的indexFor方法:

/**
   * Returns index for hash code h. 返回雜湊值對應的雜湊表索引
   */
  static int indexFor(int h,int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
   //使用&操作,而不使用取餘原因:均勻分佈在雜湊表中 。length-1目的是:由於table的長度都是2的整數次冪進行擴容,length-1的二進位制全是1,計算效率高
    return h & (length-1);
  }

5 put方法中呼叫的addEntry方法:

/**
   * Adds a new entry with the specified key,value and hash code to
   * the specified bucket. It is the responsibility of this
   * method to resize the table if appropriate.
   *
   * Subclass overrides this to alter the behavior of put method.
   */
  void addEntry(int hash,K key,V value,int bucketIndex) {
   //判斷是否擴容,只有size大於等於閾值而且當前插入table[i]!=null(就是able[i]已經被佔用則擴容) 
   if ((size >= threshold) && (null != table[bucketIndex])) {
      resize(2 * table.length);
      hash = (null != key) ? hash(key) : 0;
      //如果需要擴容的話則需要更新再次重新計算雜湊表位置
      bucketIndex = indexFor(hash,table.length);
    }
    //將值插入到雜湊表中
    createEntry(hash,bucketIndex);
  }

6 addEntry方法中呼叫的createEntry方法:

/**
   * Like addEntry except that this version is used when creating entries
   * as part of Map construction or "pseudo-construction" (cloning,* deserialization). This version needn't worry about resizing the table.
   *
   * Subclass overrides this to alter the behavior of HashMap(Map),* clone,and readObject.
   */
  void createEntry(int hash,int bucketIndex) {
    // 獲取到雜湊表指定位置
    Entry<K,V> e = table[bucketIndex];
    // 連結串列的頭插入方式進行插入,插入邏輯在Entry的構造器中。然後將新節點儲存到 table[bucketIndex]中
    table[bucketIndex] = new Entry<>(hash,e);
    size++;//更新size即可
  }

Entry構造器:

/**
   * 
   * @param h hash值
   * @param k key
   * @param v value
   * @param n 原始連結串列
   */
  Entry(int h,K k,V v,Entry<K,V> n) {
    value = v;
    //將原始連結串列接該節點後面
    next = n;
    key = k;
    hash = h;
  }

7 接下來看一下java.util.HashMap#addEntry擴容機制:

當進行擴容時候需要重新計算雜湊值和在雜湊表中的位置。

void addEntry(int hash,int bucketIndex) {
    //滿足擴容條件進行擴容
    if ((size >= threshold) && (null != table[bucketIndex])) {
      //擴容,2倍進行擴容
      resize(2 * table.length);
      //重新計算哈數值
      hash = (null != key) ? hash(key) : 0;
      //重新計算雜湊表中的位置
      bucketIndex = indexFor(hash,table.length);
    }

    createEntry(hash,bucketIndex);
  }

接下來看一下java.util.HashMap#resize方法:

/**
   * Rehashes the contents of this map into a new array with a
   * larger capacity. This method is called automatically when the
   * number of keys in this map reaches its threshold.
   *
   * If current capacity is MAXIMUM_CAPACITY,this method does not
   * resize the map,but sets threshold to Integer.MAX_VALUE.
   * This has the effect of preventing future calls.
   *
   * @param newCapacity the new capacity,MUST be a power of two;
   *    must be greater than current capacity unless current
   *    capacity is MAXIMUM_CAPACITY (in which case value
   *    is irrelevant).
   */
  void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {//判斷當前old容量是否最最大容量,是的話更新閾值
      threshold = Integer.MAX_VALUE;
      return;
    }
    //建立新的表
    Entry[] newTable = new Entry[newCapacity];
    //元素轉移,根據initHashSeedAsNeeded結果判斷是否進行rehash
    transfer(newTable,initHashSeedAsNeeded(newCapacity));
    // 新表賦給table
    table = newTable;
    //更新閾值
    threshold = (int)Math.min(newCapacity * loadFactor,MAXIMUM_CAPACITY + 1);
  }

關於HashMap在併發情況下的常見問題,其實在多執行緒環境下使用HashMap本來就是有風險錯誤的,但是一般面試卻喜歡這麼問,下面列舉一下自己印象中的常見問題:

1:在進行擴容時候,其他執行緒是否可以進行進行插入操作(多執行緒環境下可能會導致HashMap進入死迴圈,此處暫不考慮)?

答:首先HashMap就不是一個執行緒安全的容器,所以在多執行緒環境下使用就是錯誤的。其次在擴容時候可以進行插入的,但是不安全。例如:

當主執行緒在呼叫transfer方法進行復制元素:

/**
   * Transfers all entries from current table to newTable.
   */
  void transfer(Entry[] newTable,boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
      while(null != e) {
        Entry<K,V> next = e.next;
        if (rehash) {
          e.hash = null == e.key ? 0 : hash(e.key);
        }
        int i = indexFor(e.hash,newCapacity);
        e.next = newTable[i];
        newTable[i] = e;
        e = next;
      }
    }
  }

此時另一個執行緒在新增新元素是可以的,新元素新增到table中。如果子執行緒需要擴容的話可以進行擴容,然後將新容器賦給table。而此時主執行緒轉移元素的工作就是將table中元素轉移到newTable中。注意main執行緒的transfer方法:

如果main執行緒剛進入transfer方法時候newTable大小是32的話,由於子執行緒的新增操作導致table此時元素如果有128的話。則128個元素就會儲存到大小為32的newTable中(此處不會擴容)。這就會導致HashMap效能下降!!!

可以使用多執行緒環境進行debug檢視即可確定(推薦Idea的debug,的確強大,尤其是Evaluate Expression功能)。

2:進行擴容時候元素是否需要重新Hash?

這個需要具體情況判斷,呼叫initHashSeedAsNeeded方法判斷(判斷邏輯這裡先不介紹)。

/**
   * Rehashes the contents of this map into a new array with a
   * larger capacity. This method is called automatically when the
   * number of keys in this map reaches its threshold.
   *
   * If current capacity is MAXIMUM_CAPACITY,MUST be a power of two;
   *    must be greater than current capacity unless current
   *    capacity is MAXIMUM_CAPACITY (in which case value
   *    is irrelevant).
   */
  void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
      threshold = Integer.MAX_VALUE;
      return;
    }

    Entry[] newTable = new Entry[newCapacity];
    //initHashSeedAsNeeded 判斷是否需要重新Hash
    transfer(newTable,initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor,MAXIMUM_CAPACITY + 1);
  }

然後進行轉移元素:

/**
   * Transfers all entries from current table to newTable.
   */
  void transfer(Entry[] newTable,boolean rehash) {
    int newCapacity = newTable.length;
    //多執行緒環境下,如果其他執行緒導致table快速擴大。newTable在此處無法擴容會導致效能下降。但是如果後面有再次呼叫put方法的話可以再次觸發resize。
    for (Entry<K,V> next = e.next;
        if (rehash) { //判斷是否需要重新Hash
          e.hash = null == e.key ? 0 : hash(e.key);
        }
        int i = indexFor(e.hash,newCapacity);
        e.next = newTable[i];
        newTable[i] = e;
        e = next;
      }
    }
  }

3:如何判斷是否需要重新Hash?

/**
   * Initialize the hashing mask value. We defer initialization until we
   * really need it.
   */
  final boolean initHashSeedAsNeeded(int capacity) {

    // hashSeed降低hash碰撞的hash種子,初始值為0
    boolean currentAltHashing = hashSeed != 0;
    //ALTERNATIVE_HASHING_THRESHOLD: 當map的capacity容量大於這個值的時候並滿足其他條件時候進行重新hash
    boolean useAltHashing = sun.misc.VM.isBooted() && (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    //TODO 異或操作,二者滿足一個條件即可rehash
    boolean switching = currentAltHashing ^ useAltHashing;
    if (switching) {
      // 更新hashseed的值
      hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;
    }
    return switching;
  }

4:HashMap在多執行緒環境下進行put操作如何導致的死迴圈?

死迴圈產生時機:

當兩個執行緒同時需要進行擴容,而且對雜湊表同一個桶(table[i])進行擴容時候,一個執行緒剛好確定e和next元素之後,執行緒被掛起。此時另一個執行緒得到cpu並順利對該桶完成轉移(需要要求被轉移之後的執行緒1中的e和next指的元素在新雜湊表的同一個桶中,此時e和next被逆序了)。接著執行緒從掛起恢復回來時候就會陷入死迴圈中。參考:https://coolshell.cn/articles/9606.html

產生原因:主要由於併發操作,對用一個桶的兩個節點構成了環,導致對環進行無法轉移完畢元素陷入死迴圈。

Java HashMap原始碼及併發環境常見問題解決

以上就是本文的全部內容,希望對大家的學習有所幫助,也希望大家多多支援我們。