JDK Source Code Analysis - HashMap (2)

阿新 · Published: 2019-12-31

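The previous article, "JDK Source Code Analysis - HashMap (1)", analyzed the internal structure of HashMap and how its main methods are implemented. Interviews, however, usually go on to ask many other questions, so this article briefly analyzes some of the common ones.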

Here again is the diagram of HashMap's internal structure (JDK 1.8):

[Figure: internal structure of HashMap in JDK 1.8]

FAQ:

Q1: Is HashMap thread-safe? Why?

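First of all, HashMap is not thread-safe. Many people probably know this already, and the HashMap source code says so too. But why is it unsafe, and where does that show up? The two examples below give a brief analysis (not necessarily exhaustive; for reference only).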

case 1:

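Thread T1 performs structural modifications such as put / remove, while thread T2 iterates over the map. In this situation T2 will typically throw ConcurrentModificationException (this fail-fast detection is best-effort, so it is not guaranteed on every run).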

Sample code (using put as the example):

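import java.util.HashMap;
import java.util.Map;

private static void test() {
    Map<Integer, Integer> map = new HashMap<>();
    // T1: structural modifications (inserting new keys)
    Thread t1 = new Thread(() -> {
        for (int i = 0; i < 5000; i++) {
            map.put(i, i);
        }
    });
    // T2: iterates over the same map at the same time
    Thread t2 = new Thread(() -> {
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            System.out.println(entry);
        }
    });
    t1.start();
    t2.start();
}
// Result (typical run):
// java.util.ConcurrentModificationException is thrown in thread t2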

The cause is this check:

if (modCount != expectedModCount) throw new ConcurrentModificationException();

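HashMap's iterators, including those of its collection views such as entrySet() and keySet(), perform this comparison. Its purpose is to detect whether the HashMap has been structurally modified since the iteration started (by another thread, or even by the same thread outside the iterator); if so, an exception is thrown.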

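PS: If you read the HashMap source code carefully, you will notice that every method that performs a structural modification contains the following line: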

++modCount;

This field records how many structural modifications have been made.
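The modCount check is not specific to multiple threads. As a minimal single-threaded sketch (the class and method names are made up for illustration, and the behavior shown is that of JDK 8+ HashMap), the code below shows three things. Adding a new key while iterating trips the check. Overwriting the value of an existing key does not, because it is not a structural modification and leaves modCount unchanged. Removing through Iterator.remove() is allowed, because that method resynchronizes expectedModCount.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    // Small map with keys 0..9 (default capacity 16, threshold 12, so no resize below).
    static Map<Integer, Integer> sample() {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, i);
        }
        return map;
    }

    // Adding a NEW key mid-iteration is structural (++modCount); the next call to
    // the iterator's next() sees modCount != expectedModCount and throws.
    static boolean addNewKeyDuringIterationThrows() {
        Map<Integer, Integer> map = sample();
        try {
            for (Integer key : map.keySet()) {
                if (key == 3) {
                    map.put(100, 100);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Replacing the value of an EXISTING key is not structural: modCount is unchanged.
    static boolean overwriteValueDuringIterationThrows() {
        Map<Integer, Integer> map = sample();
        try {
            for (Integer key : map.keySet()) {
                map.put(key, key * 10);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterator.remove() updates expectedModCount, so removing through the iterator is safe.
    static int removeEvensViaIterator() {
        Map<Integer, Integer> map = sample();
        Iterator<Map.Entry<Integer, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() % 2 == 0) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(addNewKeyDuringIterationThrows());      // true
        System.out.println(overwriteValueDuringIterationThrows()); // false
        System.out.println(removeEvensViaIterator());              // 5
    }
}
```

So the exception in case 1 comes from this bookkeeping, not from any explicit detection of a second thread.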

case 2:

Threads T1 and T2 perform structural modifications such as put / remove at the same time. Taking put as the example, elements can overwrite one another, so some insertions are lost.

Sample code:

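import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

private static void test() throws InterruptedException {
    Map<Integer, Integer> map = new HashMap<>();
    // T1 inserts keys 0..4999 while T2 inserts keys 5000..9999
    Thread t1 = new Thread(() -> {
        for (int i = 0; i < 5000; i++) {
            map.put(i, i);
        }
    });
    Thread t2 = new Thread(() -> {
        for (int i = 5000; i < 10000; i++) {
            map.put(i, i);
        }
    });
    t1.start();
    t2.start();
    // Wait long enough for both threads to finish
    TimeUnit.SECONDS.sleep(20);
    System.out.println(map);
    System.out.println("size: " + map.size());
}
// Output:
// {8192=8192, 8193=8193, 8194=8194, 8195=8195, ...
// size: 9666
// PS: This is the result of one run. Results differ between runs, but size = 10000 essentially never occurred.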

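The problem lies in the put method, and its cause follows from the earlier analysis of how put is implemented, so here is only a summary. putVal first checks whether the target bin is empty (tab[i] == null) and then writes a new node into it. If two threads both pass that check for the same bin before either one writes, the second write replaces the first node and that entry is lost. The ++size update is not atomic either, so the recorded size can drift as well.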
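To make this check-then-act race concrete without depending on thread timing, the sketch below replays one unlucky interleaving step by step on a toy bin array. Everything here is a hypothetical simplification, not the JDK code. The Node class and the 16-slot table are stand-ins, and indexing with key & 15 replaces HashMap's real hash spreading. Keys 1 and 17 are chosen because they fall into the same slot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a bin array, replaying two "threads" running putVal's
// check-then-act in the order: A checks, B checks, A writes, B writes.
final class LostUpdateReplay {
    static final class Node {
        final int key;
        final int value;
        Node next;

        Node(int key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Assumes keyA and keyB land in the same slot of a 16-slot table (e.g. 1 and 17).
    static List<Integer> replay(int keyA, int keyB) {
        Node[] table = new Node[16];
        int i = keyA & (table.length - 1);
        if (i != (keyB & (table.length - 1))) {
            throw new IllegalArgumentException("keys must share a bin");
        }
        boolean aSawEmpty = table[i] == null;                 // A: check
        boolean bSawEmpty = table[i] == null;                 // B: check, before A has written
        if (aSawEmpty) table[i] = new Node(keyA, keyA, null); // A: act
        if (bSawEmpty) table[i] = new Node(keyB, keyB, null); // B: act, replaces A's node
        List<Integer> keysInBin = new ArrayList<>();
        for (Node n = table[i]; n != null; n = n.next) {
            keysInBin.add(n.key);
        }
        return keysInBin;
    }

    public static void main(String[] args) {
        System.out.println(replay(1, 17)); // [17]: the entry for key 1 was lost
    }
}
```

Had B re-read the slot after A's write, it would have appended to A's node instead. That is the kind of atomicity a thread-safe map has to provide.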

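One more remark: HashMap is designed for efficiency in single-threaded use. Understanding why it is not thread-safe deepens our understanding of it and tells us where it should not be used. If thread safety is required, consider ConcurrentHashMap instead.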
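Here is a minimal sketch of the two scenarios above rerun with ConcurrentHashMap, assuming JDK 8 or later. Two writers insert disjoint key ranges while a third thread iterates. ConcurrentHashMap's iterators are weakly consistent and do not throw ConcurrentModificationException, and its put is safe under concurrency, so the final size should be 10000. The threads are joined rather than waiting a fixed 20 seconds. The class and helper names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

final class ConcurrentMapDemo {
    // Hypothetical helper: joins all threads, restoring the interrupt flag if interrupted.
    static void joinAll(Thread... threads) {
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Two writers insert disjoint key ranges while a reader iterates; returns the final size.
    static int putConcurrently(AtomicBoolean readerSawCme) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 5000; i++) {
                map.put(i, i);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 5000; i < 10000; i++) {
                map.put(i, i);
            }
        });
        Thread reader = new Thread(() -> {
            try {
                // Weakly consistent iteration: may or may not see in-flight puts.
                for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
                    entry.getKey(); // just touch each entry
                }
            } catch (ConcurrentModificationException e) {
                readerSawCme.set(true);
            }
        });
        t1.start();
        t2.start();
        reader.start();
        joinAll(t1, t2, reader); // wait for completion instead of sleeping
        return map.size();
    }

    public static void main(String[] args) {
        AtomicBoolean readerSawCme = new AtomicBoolean(false);
        System.out.println("size: " + putConcurrently(readerSawCme)); // size: 10000
        System.out.println("reader saw CME: " + readerSawCme.get());  // reader saw CME: false
    }
}
```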

Q2: Why are the thresholds for converting between a linked list and a red-black tree 8 and 6?

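First, consider why linked lists and red-black trees exist at all. Ideally, each bin of a HashMap holds at most one node, which gives the best lookup performance, O(1). Chaining nodes into a linked list, and further converting that list into a red-black tree, are responses to heavy hash collisions. A lookup in a list bin with k nodes costs O(k), whereas in a tree bin it costs O(log k), so HashMap stays reasonably efficient even in extreme cases.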
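Severe collisions are easy to produce on purpose. This small sketch relies on the documented String.hashCode() formula. "Aa" and "BB" both hash to 2112, so every string built from the same number of these two-character blocks shares one hashCode and lands in the same bin. The binIndex helper is an illustrative re-implementation of the JDK 8 hash spreading (h ^ (h >>> 16)) and index computation ((n - 1) & hash) covered in the previous article. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollidingKeys {
    // "Aa" and "BB" share hashCode 2112, so all strings made of the same number
    // of these blocks share a hashCode too: 2^blocks colliding keys.
    static List<String> colliding(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    // Illustrative re-implementation of JDK 8's hash spreading plus bin index.
    static int binIndex(Object key, int tableLength) {
        int h = key.hashCode();
        return (tableLength - 1) & (h ^ (h >>> 16));
    }

    public static void main(String[] args) {
        List<String> keys = colliding(3); // 8 distinct keys
        Map<String, Integer> map = new HashMap<>();
        for (String k : keys) {
            map.put(k, k.length());
            System.out.println(k + " hashCode=" + k.hashCode() + " bin=" + binIndex(k, 16));
        }
        // Lookups are still correct; they just have to walk one shared bin.
        System.out.println("size=" + map.size() + ", get(\"BBAaBB\")=" + map.get("BBAaBB"));
    }
}
```

Every line printed in the loop shows the same hashCode and bin. This is the situation that the linked-list and red-black-tree fallbacks exist to handle.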

As for why the threshold is 8, part of HashMap's Implementation notes reads:

/*
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins. In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used. Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */

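Because a TreeNode is about twice the size of a regular node (Node), a bin is converted to TreeNodes only when it contains enough nodes (the treeify threshold, TREEIFY_THRESHOLD). When a bin shrinks because nodes are removed or the table is resized, the red-black tree is converted back into a linked list.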

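When hashCodes are well distributed, tree bins are rarely used. Ideally, with random hashCodes, the number of nodes per bin follows a Poisson distribution, and the comment lists the first few probabilities. The probability that a bin's list reaches length 8 (0.00000006) is below one in ten million, which is why the conversion threshold is set to 8.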
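The table in the comment can be recomputed from the formula it quotes, P(k) = exp(-0.5) * pow(0.5, k) / k!. This is a short sketch, and the class and method names are illustrative. Its output should agree with the comment's table up to rounding in the last printed digit. It also sums the tail to check the "less than 1 in ten million" claim.

```java
import java.util.Locale;

final class PoissonBins {
    // Poisson probability of exactly k nodes in a bin, with mean lambda.
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    // Probability that a bin holds at least k nodes: 1 - P(0) - ... - P(k - 1).
    static double tailAtLeast(int k) {
        double below = 0.0;
        for (int i = 0; i < k; i++) {
            below += poisson(0.5, i);
        }
        return 1.0 - below;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            // Locale.ROOT keeps '.' as the decimal separator on every system.
            System.out.println(k + ": " + String.format(Locale.ROOT, "%.8f", poisson(0.5, k)));
        }
        System.out.println("8 or more: " + tailAtLeast(8));
    }
}
```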

The two conversion thresholds and their descriptions are:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

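As for why the threshold for converting a red-black tree back into a linked list is 6, one explanation found online is that it avoids frequent conversions. Suppose the tree-to-list threshold were also 8. If a HashMap kept inserting and removing elements so that a bin's length hovered around 8, that bin would be converted back and forth between a tree and a list over and over, which is very inefficient.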
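The hysteresis argument can be illustrated with a toy model. It is entirely hypothetical and does not follow HashMap's real code paths. A single bin's size alternates between 8 and 7. A tree is built when the size reaches the treeify threshold and torn down when the size falls to the untreeify threshold. Setting that second threshold to 7 models "convert back as soon as it drops below 8", while 6 leaves a gap between the two.

```java
// Hypothetical toy model of a single bin; it does not follow HashMap's real code paths.
final class HysteresisDemo {
    // Bin sizes alternating 8, 7, 8, 7, ... (one insert, one remove, repeated).
    static int[] oscillating(int steps) {
        int[] sizes = new int[steps];
        for (int i = 0; i < steps; i++) {
            sizes[i] = (i % 2 == 0) ? 8 : 7;
        }
        return sizes;
    }

    // Counts list->tree and tree->list conversions as the bin size follows `sizes`.
    static int countConversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean isTree = false;
        int conversions = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;
                conversions++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        int[] sizes = oscillating(20);
        System.out.println("untreeify at 7: " + countConversions(sizes, 8, 7) + " conversions"); // 20
        System.out.println("untreeify at 6: " + countConversions(sizes, 8, 6) + " conversions"); // 1
    }
}
```

With the gap, the bin converts once and then stays a tree. Without it, every single insert or remove triggers a conversion.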

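This explanation seems reasonable, and readers are encouraged to explore it further. Note also, from the comment above, that UNTREEIFY_THRESHOLD is the count used when a tree bin is split during a resize. On removal, HashMap relies on a separate "shrinkage detection" check: in the JDK 8 source, removeTreeNode converts the bin back to a list once the tree has become too small.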

Q3: Why is the load factor 0.75?

The relevant description in JDK 1.7:

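/*
 * As a general rule, the default load factor (.75) offers a good tradeoff
 * between time and space costs. Higher values decrease the space overhead
 * but increase the lookup cost (reflected in most of the operations of the
 * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>).
 */

In other words, 0.75 is a compromise between time and space. A higher load factor saves memory but lets bins grow longer before a resize, which raises the cost of get and put. A lower one keeps bins short at the price of a larger table that is resized more often.

For reference, java.lang.String computes hashCode() as follows (JDK 8 source, where value is the string's char[]):

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}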
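The recurrence in String.hashCode() above, h = 31 * h + val[i], can be checked against the library directly. In this sketch (names are illustrative), hash31 repeats the recurrence. hashShift uses (h << 5) - h, which equals 31 * h even when int arithmetic overflows. That shift-and-subtract form, together with 31 being an odd prime, is the reason commonly cited for choosing 31.

```java
final class StringHashDemo {
    // Same recurrence as String.hashCode(): h = 31 * h + c for each char.
    static int hash31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * h rewritten as (h << 5) - h; both wrap around identically on int overflow.
    static int hashShift(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash31("abc"));    // 96354
        System.out.println("abc".hashCode()); // 96354
        String longer = "JDK source code analysis: HashMap";
        System.out.println(hash31(longer) == longer.hashCode());    // true
        System.out.println(hashShift(longer) == longer.hashCode()); // true
    }
}
```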

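PS: The answers above are what I collected and thought through after searching online. They are for reference only and may not be entirely accurate, so read them with a skeptical eye. There are many more HashMap questions one could ask; I will not list them all here.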

References:

https://www.jianshu.com/p/7af5bb1b57e2
