Reading the source doc of HashMap's hash method
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
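In a recent interview I was asked: what is wrong with taking the long value of a key's memory address and computing its remainder (%) against the table length?
So I looked into it.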
first = tab[(n - 1) & hash]
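First, when HashMap computes a key's position in the table, it takes the bitwise AND of (table length - 1) and the hash value. It does not use the remainder (%) operation.
If the table length is 8, then n = 8 (1000) and n-1 = 7 (111). Whatever the hash value is, the result of the AND always falls between 000 and 111, that is 0-7, which is exactly the range of valid table indexes.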
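The difference between masking and the remainder operator can be sketched as follows. This is only an illustration: the class name IndexMaskDemo and its two helper methods are made up for this post and are not part of the JDK, and n is assumed to be a power of two.

```java
public class IndexMaskDemo {
    // Index via masking, in the style of tab[(n - 1) & hash].
    // Assumption: n is a power of two, so n - 1 is a run of 1 bits.
    static int indexByMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Index via the remainder operator, shown only for comparison.
    static int indexByRemainder(int hash, int n) {
        return hash % n;
    }

    public static void main(String[] args) {
        int n = 8;
        int[] samples = {5, 13, 1 << 20, -1, -13};
        for (int hash : samples) {
            System.out.printf("hash=%d mask=%d remainder=%d%n",
                    hash, indexByMask(hash, n), indexByRemainder(hash, n));
        }
    }
}
```

For non-negative hashes the two methods agree. For a negative hash, % returns a negative value in Java, which cannot be used as an array index, while the mask always stays within 0 to n-1.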
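The comment says: Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide.
In other words: the table length is always a power of two, so if a set of hash values differ from each other only in the high bits above the mask (111....1111), their bitwise AND with (n-1) always collides.
To sum it up in one sentence: only the low bits of the hash that line up with the 1 bits of (n-1) take part in computing the index, and the high bits have no influence at all.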
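The Float example mentioned in the JDK comment can be reproduced with a small sketch. The class HighBitCollisionDemo and the method rawIndex are hypothetical names for this post; rawIndex deliberately skips any bit spreading, so it is not what HashMap itself does. It relies on Float.hashCode() returning the IEEE 754 bit pattern of the value.

```java
public class HighBitCollisionDemo {
    // Hypothetical helper: masks the raw hashCode() with no spreading.
    // Assumption: n is a power of two.
    static int rawIndex(Object key, int n) {
        return (n - 1) & key.hashCode();
    }

    public static void main(String[] args) {
        int n = 16;
        for (float f = 1.0f; f <= 8.0f; f += 1.0f) {
            Float key = f;
            // For small whole numbers the low 16 bits of the bit pattern are all zero.
            System.out.printf("key=%.1f hashCode=0x%08x rawIndex=%d%n",
                    f, key.hashCode(), rawIndex(key, n));
        }
    }
}
```

All eight keys have hash codes that differ only in the high bits, so every one of them lands in bucket 0 of a 16-slot table when the raw hash code is masked directly.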
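The JDK authors' solution is (h = key.hashCode()) ^ (h >>> 16). The JDK doc states this right at the start: spreads (XORs) higher bits of hash to lower.
The influence of the high bits is spread down to the low bits, so that both the high and the low bits take part in the AND with (n-1).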
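As we all know, an int is 32 bits wide. hash >>> 16 shifts the value right by 16 bits and pads the left side with 0. After the shift, the low 16 bits are discarded, the original high 16 bits become the new low 16 bits, and the new high 16 bits are filled with 0.
0 XOR 0 is 0 and 1 XOR 0 is 1, so XORing a bit with 0 leaves it unchanged. Therefore the final result of hash ^ (hash >>> 16) is: the high 16 bits stay the same, and the low 16 bits are XORed with the high 16 bits.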
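The shift and XOR can be checked on a concrete value. The sketch below uses a hypothetical class SpreadDemo whose spread method mirrors the expression in the JDK's hash method, applied directly to an int; the sample value 0x12345678 is arbitrary.

```java
public class SpreadDemo {
    // Same expression as in HashMap.hash, applied to an int that is
    // assumed to be the result of key.hashCode().
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        int shifted = h >>> 16;   // 0x00001234: old high half moved to the low half
        int mixed = spread(h);    // high half 0x1234 unchanged, low half 0x5678 ^ 0x1234
        System.out.printf("h       = 0x%08x%n", h);
        System.out.printf("h >>> 16 = 0x%08x%n", shifted);
        System.out.printf("spread  = 0x%08x%n", mixed);
    }
}
```

The high half of the result is still 0x1234, while the low half becomes 0x5678 ^ 0x1234 = 0x444c.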
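If the binary representation of (n-1) has 16 bits, then n = 2 to the 16th power = 65536. As long as the capacity of the hashmap is no greater than 65536, the index is computed entirely from the low 16 bits, which are a mix of the original high and low halves.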
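To see the effect on the index, here is a small sketch that compares masking with and without the spreading step. The class SpreadIndexDemo and its methods are hypothetical, and the sample hashes (i << 16) are constructed on purpose so that they differ only in the high 16 bits.

```java
public class SpreadIndexDemo {
    // Masks the raw hash with no spreading (not what HashMap does).
    static int rawIndex(int h, int n) {
        return (n - 1) & h;
    }

    // Masks the hash after the h ^ (h >>> 16) step.
    // Assumption: n is a power of two.
    static int spreadIndex(int h, int n) {
        return (n - 1) & (h ^ (h >>> 16));
    }

    public static void main(String[] args) {
        int n = 16;
        for (int i = 0; i < 8; i++) {
            int h = i << 16;   // hashes that differ only above bit 15
            System.out.printf("h=0x%08x rawIndex=%d spreadIndex=%d%n",
                    h, rawIndex(h, n), spreadIndex(h, n));
        }
    }
}
```

Without spreading, all eight hashes collapse into bucket 0. With spreading, they land in buckets 0 to 7. The XOR is a cheap mix and not a full rehash, so it does not remove every collision pattern, which matches the tradeoff described in the JDK comment.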