HashMap's Underlying Array Length and Bit Operations

阿新 • Published: 2021-01-07

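HashMap data structure

Anyone who has read the HashMap source in the JDK knows that its underlying data structure is an array plus linked lists.
JDK 1.8 added an optimization: when a linked list grows longer than 8 nodes, the bucket is converted to a red-black tree (provided the table has at least 64 slots; otherwise the table is resized instead).
The structure looks like the diagram below.

[Figure: HashMap structure before JDK 1.8]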

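That is not the focus of this article. My goal is to clear up something that has long puzzled me when reading the source: the bit operations in the underlying code.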

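Initializing the array capacity

Initialization actually happens on the first put of an element, and is carried out in resize().

1. Without specifying initialCapacity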

// Uses the default load factor: the table is resized once its size exceeds 3/4 of the capacity
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Default initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
 
 
final Node<K,V>[] resize() {
    ...

    else {
        // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }


    return newTab;

}

2. Specifying initialCapacity


public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// Core method that computes the initial table size
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}


final Node<K,V>[] resize() {
    ...
    int oldThr = threshold;
    ...
    
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    ...  
    return newTab;
}

Analyzing the tableSizeFor method

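cap = 0: n = -1; every bit is already 1, so the shifts leave it at -1, and because n < 0 the method returns 1
cap = 1: n = 0; the shifts leave it at 0, and the method returns n + 1 = 1
cap > 1: n = cap - 1 > 0, so the binary form of n has at least one bit set to 1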

cap = 17
n = cap - 1 = 16 = 0001 0000  (for a positive number the sign-magnitude, ones' complement and two's complement forms are identical)

n |= n >>> 1
n >>> 1 = 0000 1000
n = 0001 0000 | 0000 1000 = 0001 1000     24

n |= n >>> 2
n >>> 2 = 0000 0110
n = 0001 1000 | 0000 0110 = 0001 1110     30

n |= n >>> 4
n >>> 4 = 0000 0001
n = 0001 1110 | 0000 0001 = 0001 1111     31

n |= n >>> 8
n >>> 8 = 0000 0000
n = 0001 1111 | 0000 0000 = 0001 1111     31

n |= n >>> 16
n >>> 16 = 0000 0000
n = 0001 1111 | 0000 0000 = 0001 1111     31

Returns n + 1 = 31 + 1 = 2^4 + 2^3 + 2^2 + 2^1 + 2^0 + 1 = 2^5 = 32
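The hand calculation above can be replayed in code. The following is only a sketch: the class name TableSizeTrace and the trace helper are made up for illustration and are not part of the JDK. It records the value of n after each of the five shift-and-OR steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class TableSizeTrace {
    // Returns the value of n after each "n |= n >>> shift" step (shift = 1, 2, 4, 8, 16).
    static List<Integer> trace(int cap) {
        List<Integer> steps = new ArrayList<>();
        int n = cap - 1;
        for (int shift = 1; shift <= 16; shift <<= 1) {
            n |= n >>> shift;
            steps.add(n);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Prints the binary form and decimal value of n after every step for cap = 17.
        for (int n : trace(17)) {
            System.out.println(Integer.toBinaryString(n) + " = " + n);
        }
    }
}
```

For cap = 17 the recorded values are 24, 30, 31, 31, 31, matching the steps worked out by hand.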

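The purpose of the bit operations is to set every bit below the highest 1 bit to 1, so that in the end

n = (leading zeros omitted) 1...111

The method then returns n + 1, which follows from the identity

2^k = 2^(k-1) + 2^(k-2) + ... + 2^0 + 1

so the result is the smallest power of two that is >= cap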
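One way to check this conclusion is to compare the method against a naive loop. The sketch below copies tableSizeFor as quoted earlier; the class name TableSizeForCheck and the smallestPowerOfTwoAtLeast helper are invented for this comparison, and MAXIMUM_CAPACITY is set to 1 << 30, the value HashMap uses.

```java
// Hypothetical test harness, not part of the JDK.
final class TableSizeForCheck {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as in HashMap

    // Copy of the method quoted from the HashMap source.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Reference implementation: keep doubling until the value reaches cap.
    static int smallestPowerOfTwoAtLeast(int cap) {
        int p = 1;
        while (p < cap) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // Compare both implementations over a range of small capacities.
        for (int cap = 0; cap <= (1 << 16); cap++) {
            if (tableSizeFor(cap) != smallestPowerOfTwoAtLeast(cap)) {
                throw new AssertionError("mismatch at cap = " + cap);
            }
        }
        System.out.println("tableSizeFor(17) = " + tableSizeFor(17));
    }
}
```

The initial cap - 1 is what keeps an exact power of two unchanged: tableSizeFor(16) stays 16 rather than jumping to 32.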

In addition, when resize actually grows the table, it doubles the capacity

final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        //newCap = oldCap << 1  = oldCap x 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    ...

}

So whether HashMap's underlying array is being initialized or resized, its length is always a power of two
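A small sketch of why doubling preserves this property: shifting a power of two left by one bit just moves its single 1 bit up one position. The DoublingDemo class and both helpers are illustrative names only, and the sketch ignores the MAXIMUM_CAPACITY cap that the real resize() applies.

```java
// Hypothetical demo class, not part of the JDK.
final class DoublingDemo {
    // A positive int is a power of two exactly when it has a single 1 bit.
    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    // Capacities seen after each doubling, starting from initialCapacity.
    static int[] capacities(int initialCapacity, int resizes) {
        int[] caps = new int[resizes + 1];
        caps[0] = initialCapacity;
        for (int i = 1; i <= resizes; i++) {
            caps[i] = caps[i - 1] << 1; // same doubling as newCap = oldCap << 1
        }
        return caps;
    }

    public static void main(String[] args) {
        for (int cap : capacities(16, 4)) {
            System.out.println(cap + " is a power of two: " + isPowerOfTwo(cap));
        }
    }
}
```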

The underlying bit operation (n - 1) & hash

Several methods inside HashMap use this bit operation

final Node<K,V> getNode(int hash, Object key)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict)
                   
final void treeifyBin(Node<K,V>[] tab, int hash)

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable)
                               
public V computeIfAbsent(K key,
                             Function<? super K, ? extends V> mappingFunction)
                             
public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction)
                     
public V merge(K key, V value,
                   BiFunction<? super V, ? super V, ? extends V> remappingFunction)
                   
final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                                  boolean movable)
                                  
// in a form like
int n;
n = tab.length;
int index = (n - 1) & hash;
// or
n = tab.length;
tab[i = (n - 1) & hash]

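In every case the goal is to determine the element's position in the array. Let's analyze (n - 1) & hash.

As established above, the array length is
n = tab.length = 2^m
n - 1 = 2^m - 1 = 2^(m-1) + 2^(m-2) + ... + 2^0
Written in binary:
n - 1 = (leading zeros omitted) 1...111, that is, m ones

(n - 1) & hash
By the nature of bitwise AND, this keeps exactly the low m bits of the binary hash value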

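For example, n = 16 = 2^4
n - 1 = 16 - 1 = 2^3 + 2^2 + 2^1 + 2^0 = 0000 1111
(n - 1) & hash = 0000 1111 & hash, which keeps the low 4 bits of the binary hash value

Whatever the hash value is,
the result can only fall in the range [0000 0000, 0000 1111] = [0, 15],
which corresponds exactly to the valid indexes of the array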
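The masking can be tried out directly. In the sketch below, IndexMaskDemo, indexFor and reachableIndices are hypothetical names chosen for illustration. It checks that for a power-of-two length the mask agrees with Math.floorMod (also for negative hash values), and it shows what would happen with a length that is not a power of two.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the JDK.
final class IndexMaskDemo {
    // Same expression HashMap uses to pick a bucket for a table of length n.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // All distinct indexes produced for hash values 0..1023 with table length n.
    static Set<Integer> reachableIndices(int n) {
        Set<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1024; hash++) {
            seen.add(indexFor(hash, n));
        }
        return seen;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 16, 181, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int hash : hashes) {
            int index = indexFor(hash, n);
            // For a power-of-two n, masking equals the non-negative remainder.
            if (index != Math.floorMod(hash, n)) {
                throw new AssertionError("mask and floorMod disagree for " + hash);
            }
            System.out.println(hash + " -> " + index);
        }
        System.out.println("n = 16 reaches " + reachableIndices(16).size() + " indexes");
        System.out.println("n = 10 reaches " + reachableIndices(10));
    }
}
```

With n = 16 all 16 slots are reachable. With n = 10 the mask is 1001 in binary, so only indexes 0, 1, 8 and 9 can ever be produced, which is why the trick depends on the length being a power of two.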

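So because the HashMap array length is a power of two, an element's position in the array can be computed from the key's hash value with a single cheap bit operation, which is a very neat design.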