How HashSet Is Implemented in Scala: A Detailed Explanation

阿新 • Published: 2019-02-05

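HashSet is one of the data structures used most often in everyday code, and it guarantees that its elements are unique.
This article uses a simple example to explain how HashSet works internally in Scala, and what add and remove actually do.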

Usage example

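    import scala.collection.mutable

    val s = mutable.HashSet[String]()
    s.add("a")
    s.add("a")   // duplicate: the set still holds a single "a"
    s.add("b")
    println(s)   // the set contains "a" and "b"; iteration order is not guaranteed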
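
The example above discards the result of each call. As a small sketch, the Boolean that add and remove return can be collected to see when the set actually changed; the object name ReturnValueDemo is only for illustration.

```scala
import scala.collection.mutable

object ReturnValueDemo {
  // Collects the Boolean result of each call so they can be inspected together.
  def run(): List[Boolean] = {
    val s = mutable.HashSet[String]()
    List(
      s.add("a"),    // new element
      s.add("a"),    // duplicate
      s.add("b"),    // new element
      s.remove("a"), // present
      s.remove("z")  // absent
    )
  }
}
```

ReturnValueDemo.run() evaluates to List(true, false, true, true, false): a call returns true only when it changes the set.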

How HashSet works

1. The add method

Let's look at the code that implements add:

    protected def addElem(elem: A): Boolean = {
      addEntry(elemToEntry(elem))
    }

    /**
     * Elems have type A, but we store AnyRef in the table. Plus we need to deal with
     * null elems, which need to be stored as NullSentinel
     */
    protected final def elemToEntry(elem: A): AnyRef =
      if (null == elem) NullSentinel else elem.asInstanceOf[AnyRef]

    protected def addEntry(newEntry: AnyRef): Boolean = {
      var h = index(newEntry.hashCode)
      var curEntry = table(h)
      while (null != curEntry) {
        if (curEntry == newEntry) return false
        h = (h + 1) % table.length
        curEntry = table(h)
        //Statistics.collisions += 1
      }
      table(h) = newEntry
      tableSize = tableSize + 1
      nnSizeMapAdd(h)
      if (tableSize >= threshold) growTable()
      true
    }

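First, the index h into the table array is computed from the hash code of the element newEntry.
Then the entry e stored at table(h) is examined:
if e is empty (null), the element is stored directly at table(h);
if e is not empty and equals the element being added, the element already exists and false is returned;
if e is not empty and differs from the new element, h is incremented by one each time (wrapping around to 0 at the end of the array) until table(h) is empty, and newEntry is stored there.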
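
The probing steps above can be sketched in isolation. This is a minimal illustration, not the library code: slotOf is a simplified, hypothetical stand-in for the library's index(hashCode), growth is left out, and the table is assumed to always contain at least one empty slot.

```scala
object ProbeSketch {
  // Simplified stand-in for index(hashCode): map a hash code to a slot.
  def slotOf(hash: Int, length: Int): Int = (hash & Int.MaxValue) % length

  // Inserts entry using linear probing. Returns false if it is already present.
  // Assumes the table always has at least one free (null) slot.
  def insert(table: Array[AnyRef], entry: AnyRef): Boolean = {
    var h = slotOf(entry.hashCode, table.length)
    while (table(h) != null) {
      if (table(h) == entry) return false // already stored
      h = (h + 1) % table.length          // step to the next slot, wrapping to 0
    }
    table(h) = entry
    true
  }
}
```

With a table of length 8, the boxed integers 1 and 9 both map to slot 1, so inserting 1 and then 9 places them in slots 1 and 2, and inserting 1 a second time returns false.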

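Growing the table works much like it does in HashMap.
When the number of elements in the HashSet reaches threshold (the code checks tableSize >= threshold),
the array is grown. threshold is the table length multiplied by the load factor, a value that trades memory use against collisions.
Taking the figures familiar from Java's HashMap as an illustration, a load factor of 0.75 and an array of size 16 (the defaults in the Scala source may differ), the threshold is 16*0.75=12. Once the set holds 12 elements,
the array is doubled to 2*16=32 and the position of every element is recomputed, which is an expensive operation: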

    private def growTable() {
      val oldtable = table
      table = new Array[AnyRef](table.length * 2)
      tableSize = 0
      nnSizeMapReset(table.length)
      seedvalue = tableSizeSeed
      threshold = newThreshold(_loadFactor, table.length)
      var i = 0
      while (i < oldtable.length) {
        val entry = oldtable(i)
        if (null != entry) addEntry(entry)
        i += 1
      }
      if (tableDebug) checkConsistent()
    }
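
To make the growth arithmetic concrete, here is a small sketch that lists the table lengths visited while elements are added. The names GrowthSketch, threshold and lengthsFor are hypothetical, and passing the load factor as a per-mille integer is an assumption of this sketch, chosen to keep the arithmetic in integers.

```scala
object GrowthSketch {
  // Threshold as a fraction of the table length (load factor given per mille).
  def threshold(loadFactorPerMille: Int, length: Int): Int =
    (length.toLong * loadFactorPerMille / 1000).toInt

  // Table lengths visited while adding n distinct elements, doubling whenever
  // the element count reaches the threshold, as addEntry and growTable do.
  def lengthsFor(n: Int, initialLength: Int, loadFactorPerMille: Int): List[Int] = {
    var length = initialLength
    var seen = List(length)
    var size = 0
    while (size < n) {
      size += 1
      if (size >= threshold(loadFactorPerMille, length)) {
        length *= 2
        seen = length :: seen
      }
    }
    seen.reverse
  }
}
```

With the illustrative figures from the text, GrowthSketch.lengthsFor(12, 16, 750) gives List(16, 32): the twelfth element triggers the first doubling.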

2. The remove method

Once add is understood, remove is much easier to follow.

    /**
     * Removes an elem from the hash table returning true if the element was found (and thus removed)
     * or false if it didn't exist.
     */
    protected def removeElem(elem: A) : Boolean = {
      if (tableDebug) checkConsistent()
      def precedes(i: Int, j: Int) = {
        val d = table.length >> 1
        if (i <= j) j - i < d
        else i - j > d
      }
      val removalEntry = elemToEntry(elem)
      var h = index(removalEntry.hashCode)
      var curEntry = table(h)
      while (null != curEntry) {
        if (curEntry == removalEntry) {
          var h0 = h
          var h1 = (h0 + 1) % table.length
          while (null != table(h1)) {
            val h2 = index(table(h1).hashCode)
            //Console.println("shift at "+h1+":"+table(h1)+" with h2 = "+h2+"? "+(h2 != h1)+precedes(h2, h0)+table.length)
            if (h2 != h1 && precedes(h2, h0)) {
              //Console.println("shift "+h1+" to "+h0+"!")
              table(h0) = table(h1)
              h0 = h1
            }
            h1 = (h1 + 1) % table.length
          }
          table(h0) = null
          tableSize -= 1
          nnSizeMapRemove(h0)
          if (tableDebug) checkConsistent()
          return true
        }
        h = (h + 1) % table.length
        curEntry = table(h)
      }
      false
    }

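For the element to delete, removalEntry, the index h into table is first computed from its hash code.
If table(h) is empty, false is returned. Otherwise
table(h) is compared with removalEntry: if they are equal the entry is deleted, and if not, h is incremented one step at a time until table(h) is empty, at which point false is returned.
Deleting is more than clearing the slot: the entries that follow in the same probe sequence are shifted back into the gap where precedes allows it, so that a later lookup does not stop early at an empty slot. The last vacated slot is then set to null and true is returned.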
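
The removal steps can be sketched as follows. This is a simplified illustration rather than the library code: slotOf is a hypothetical stand-in for index(hashCode), and instead of the library's precedes check it uses the textbook cyclic-interval test for backward-shift deletion. The table is assumed to always contain at least one empty slot.

```scala
object RemoveSketch {
  // Simplified stand-in for the library's index(hashCode).
  def slotOf(hash: Int, length: Int): Int = (hash & Int.MaxValue) % length

  // Linear-probing insert, as in addEntry but without growth.
  def insert(table: Array[AnyRef], entry: AnyRef): Boolean = {
    var h = slotOf(entry.hashCode, table.length)
    while (table(h) != null) {
      if (table(h) == entry) return false
      h = (h + 1) % table.length
    }
    table(h) = entry
    true
  }

  // Probes from the home slot until the entry or an empty slot is found.
  def contains(table: Array[AnyRef], entry: AnyRef): Boolean = {
    var h = slotOf(entry.hashCode, table.length)
    while (table(h) != null) {
      if (table(h) == entry) return true
      h = (h + 1) % table.length
    }
    false
  }

  // True if home lies cyclically in (hole, cur]. Such an entry must stay put,
  // because moving it to hole would place it before its home slot.
  private def mustStay(home: Int, hole: Int, cur: Int): Boolean =
    if (hole <= cur) hole < home && home <= cur
    else hole < home || home <= cur

  // Removes entry, then shifts later entries of the probe run back into the gap.
  def remove(table: Array[AnyRef], entry: AnyRef): Boolean = {
    var h = slotOf(entry.hashCode, table.length)
    while (table(h) != null) {
      if (table(h) == entry) {
        var hole = h
        var cur = (hole + 1) % table.length
        while (table(cur) != null) {
          val home = slotOf(table(cur).hashCode, table.length)
          if (!mustStay(home, hole, cur)) {
            table(hole) = table(cur) // fill the gap with this entry
            hole = cur               // the gap moves to where the entry was
          }
          cur = (cur + 1) % table.length
        }
        table(hole) = null
        return true
      }
      h = (h + 1) % table.length
    }
    false
  }
}
```

For example, with a table of length 8, inserting the boxed integers 1, 9, 17 and 2 fills slots 1 to 4. Removing 1 shifts the other three back by one slot each, so all of them are still found by contains.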

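Summary

In Java, HashSet and TreeSet are both implementations of the Set interface.
TreeSet implements SortedSet, a sub-interface of Set, and LinkedHashSet is a subclass of HashSet. The structure of the Set interface with these sub-interfaces and implementation classes is as follows:

    Set interface
      |-- SortedSet interface -- TreeSet class
      |-- HashSet class
      |-- LinkedHashSet class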

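HashSet has the following characteristics:
 It does not guarantee the order of its elements, and the order may change over time
 It is not synchronized
 Elements may be null, but only one null can be stored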
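
The last point can be checked from Scala by using java.util.HashSet directly. This is a small sketch; the object name NullDemo and the JHashSet alias are only for illustration.

```scala
import java.util.{HashSet => JHashSet}

object NullDemo {
  // Adds null twice to a java.util.HashSet and reports both results and the final size.
  def run(): (Boolean, Boolean, Int) = {
    val s = new JHashSet[String]()
    val first = s.add(null)  // null is accepted as an element
    val second = s.add(null) // a second null is a duplicate
    (first, second, s.size())
  }
}
```

NullDemo.run() evaluates to (true, false, 1).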

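TreeSet keeps its elements sorted automatically.

TreeSet is the standard general-purpose implementation of the SortedSet interface (the JDK also provides ConcurrentSkipListSet), and it ensures that the elements of the set stay in sorted order.
TreeSet supports two kinds of ordering, natural ordering and custom ordering (via a Comparator), with natural ordering as the default.
Objects added to a TreeSet should be of the same class, so that they can be compared with one another.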
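
A brief sketch of the two orderings, calling java.util.TreeSet from Scala. The object name TreeSetDemo is hypothetical, and reverse order is just one possible custom ordering.

```scala
import java.util.{Collections, TreeSet}

object TreeSetDemo {
  // Natural ordering: uses String's own compareTo.
  def natural(): String = {
    val t = new TreeSet[String]()
    t.add("pear"); t.add("apple"); t.add("fig")
    t.toString
  }

  // Custom ordering: a Comparator passed to the constructor, here reverse order.
  def reversed(): String = {
    val t = new TreeSet[String](Collections.reverseOrder[String]())
    t.add("pear"); t.add("apple"); t.add("fig")
    t.toString
  }
}
```

TreeSetDemo.natural() returns "[apple, fig, pear]" and TreeSetDemo.reversed() returns "[pear, fig, apple]", regardless of the order in which the elements were added.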

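LinkedHashSet also uses the hashCode of each element to decide where it is stored, but it additionally maintains a linked list that records the order of the elements.
As a result the elements appear to be kept in insertion order: when the set is traversed,
LinkedHashSet visits the elements in the order in which they were added.
LinkedHashSet performs better than HashSet when iterating over all the elements of the set, but is slightly slower than HashSet on insertion.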
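
The insertion-order behaviour can be seen in a short sketch that calls java.util.LinkedHashSet from Scala. The object name LinkedDemo is only for illustration.

```scala
import java.util.LinkedHashSet

object LinkedDemo {
  // Adds elements, including one duplicate, and returns the set's string form.
  def run(): String = {
    val s = new LinkedHashSet[String]()
    s.add("pear")
    s.add("apple")
    s.add("fig")
    s.add("pear") // duplicate: ignored, and the original position is kept
    s.toString
  }
}
```

LinkedDemo.run() returns "[pear, apple, fig]", the order in which the elements were first added.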
