Symbol Table

阿新 • Published: 2017-05-07

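I. Definition

A symbol table is a data structure that stores key-value pairs and supports two operations: insert, which puts a new key-value pair into the table, and search, which returns the value associated with a given key.

II. API

1. Unordered symbol table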
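As a rough summary, based only on the methods used later in this post (put, get, delete, contains, isEmpty, size, keys), the unordered API could be sketched as the interface below. The names UnorderedST and ListBackedST are hypothetical, and the list-backed class is a deliberately naive stand-in, not one of the implementations discussed later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface summarizing the unordered symbol-table operations
// used in this post; the name UnorderedST is an assumption.
interface UnorderedST<Key, Value> {
    void put(Key key, Value val);   // insert or overwrite; put(key, null) deletes
    Value get(Key key);             // value paired with key, or null if absent
    void delete(Key key);           // remove key and its value
    int size();                     // number of key-value pairs
    Iterable<Key> keys();           // all keys in the table

    // shorthand methods expressed through the core operations
    default boolean contains(Key key) { return get(key) != null; }
    default boolean isEmpty() { return size() == 0; }
}

// Deliberately naive implementation, only to show the interface in use.
class ListBackedST<Key, Value> implements UnorderedST<Key, Value> {
    private final List<Key> ks = new ArrayList<>();
    private final List<Value> vs = new ArrayList<>();

    public void put(Key key, Value val) {
        if (key == null) throw new IllegalArgumentException("key is null");
        if (val == null) { delete(key); return; }
        int i = ks.indexOf(key);
        if (i >= 0) vs.set(i, val);            // existing key: overwrite
        else { ks.add(key); vs.add(val); }     // new key: append
    }

    public Value get(Key key) {
        if (key == null) throw new IllegalArgumentException("key is null");
        int i = ks.indexOf(key);
        return i >= 0 ? vs.get(i) : null;
    }

    public void delete(Key key) {
        if (key == null) throw new IllegalArgumentException("key is null");
        int i = ks.indexOf(key);
        if (i >= 0) { ks.remove(i); vs.remove(i); }   // remove by index
    }

    public int size() { return ks.size(); }

    public Iterable<Key> keys() { return new ArrayList<>(ks); }
}
```

Writing contains and isEmpty as default methods mirrors the "shorthand methods" idea: they need no knowledge of the underlying data structure.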

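Several design decisions:

A. Generics

The methods do not fix the type of the objects they handle; they use generics instead.

Keys (Key) and values (Value) are kept as separate type parameters.

B. Duplicate keys

Rules:

Each key (Key) is associated with exactly one value (Value); that is, duplicate keys are not allowed.

When client code puts a key-value pair whose key is already in the table, the new value replaces the old one.

C. Null keys

Keys must not be null; passing a null key throws an exception (IllegalArgumentException in the code below).

D. Null values

Values must not be null either. The get method in the API returns null when the key is absent, so allowing null values in the table would make that result ambiguous.

This rule has two consequences:

Whether get() returns null tells us whether a given key is in the table.

put(key, null) can be used to implement deletion.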
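The conventions for duplicate keys, null keys, and null values can be seen together in a small sketch. TinyST is a hypothetical wrapper around java.util.HashMap written only for this illustration; it is not one of the implementations developed later in the post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper (the name TinyST is an assumption) that enforces the
// conventions above on top of java.util.HashMap.
class TinyST<Key, Value> {
    private final Map<Key, Value> map = new HashMap<>();

    public void put(Key key, Value val) {
        if (key == null) throw new IllegalArgumentException("key is null");
        if (val == null) map.remove(key);   // put(key, null) acts as delete
        else map.put(key, val);             // a duplicate key overwrites the old value
    }

    public Value get(Key key) {
        if (key == null) throw new IllegalArgumentException("key is null");
        return map.get(key);                // null can only mean "key not present"
    }

    public boolean contains(Key key) { return get(key) != null; }

    public int size() { return map.size(); }
}

class TinySTDemo {
    public static void main(String[] args) {
        TinyST<String, Integer> st = new TinyST<>();
        st.put("apple", 1);
        st.put("apple", 2);                        // same key: value replaced
        System.out.println(st.get("apple"));       // 2
        System.out.println(st.size());             // 1
        st.put("apple", null);                     // deletion via a null value
        System.out.println(st.contains("apple"));  // false
    }
}
```

Because null is never stored as a value, contains can be written as get(key) != null without a second lookup method.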

E. Shorthand methods

Some methods can be implemented in terms of others, for example:

    //shorthand methods
    public boolean contains(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function contains");
        
        return get(key) != null;
    }
    
    //shorthand methods
    public boolean isEmpty() {
        return size() == 0;
    }

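F. Key equality

Key equality is determined by the object's equals() method.

Prefer immutable data types as keys; otherwise the consistency of the table cannot be guaranteed.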
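To see why mutable keys are risky, here is a small sketch that uses java.util.HashMap (not the symbol tables from this post) with a mutable ArrayList as the key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns true if the entry can still be found after its key is mutated.
    static boolean reachableAfterMutation() {
        Map<List<Integer>, String> table = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        key.add(1);
        table.put(key, "value");
        key.add(2);                      // mutate the key while it sits in the table
        return table.containsKey(key);   // the key's hashCode changed, so the lookup misses
    }

    public static void main(String[] args) {
        System.out.println(reachableAfterMutation()); // false
    }
}
```

The entry is still stored, but it can no longer be found through its key. An immutable key type such as String or Integer rules this situation out.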

2. Ordered symbol table

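When keys implement Comparable, the symbol table can use that to keep its keys in order.

This greatly expands the API: many more useful operations can be defined in terms of the relative position of keys. Such a table is called an ordered symbol table.

Whenever a symbol table declares the generic type variable Key extends Comparable<Key>, it is an ordered symbol table.

A. Minimum and maximum keys

An ordered symbol table can return the largest (smallest) key and delete the largest (smallest) key.

The Priority Queue covered earlier has similar operations; the main difference is that a priority queue may hold duplicate keys while a symbol table may not.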
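The difference in how duplicates are handled can be illustrated with two standard library classes, java.util.PriorityQueue and java.util.TreeMap. They stand in here for the priority queue and the ordered symbol table; the class MinMaxDemo itself is hypothetical.

```java
import java.util.PriorityQueue;
import java.util.TreeMap;

class MinMaxDemo {
    // A priority queue keeps both copies of a duplicate key.
    static int priorityQueueSize() {
        PriorityQueue<String> pq = new PriorityQueue<>();
        pq.add("E");
        pq.add("E");
        return pq.size();
    }

    // An ordered table (java.util.TreeMap here) keeps one entry per key.
    static int orderedTableSize() {
        TreeMap<String, Integer> st = new TreeMap<>();
        st.put("E", 1);
        st.put("E", 2);
        return st.size();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> st = new TreeMap<>();
        st.put("S", 0);
        st.put("E", 1);
        st.put("A", 2);
        System.out.println(st.firstKey());  // A (min)
        System.out.println(st.lastKey());   // S (max)
        st.pollFirstEntry();                // plays the role of deleteMin
        System.out.println(st.firstKey());  // E
    }
}
```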

B. Floor and ceiling

Floor: find the largest key that is less than or equal to the given key.

Ceiling: find the smallest key that is greater than or equal to the given key.
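As an illustration of these two definitions, java.util.TreeMap offers floorKey and ceilingKey with the same meaning. The sketch below uses them in place of the post's own ordered table, and the sample keys are made up.

```java
import java.util.TreeMap;

class FloorCeilingDemo {
    // Small ordered table with keys 10, 20, 30.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> st = new TreeMap<>();
        st.put(10, "ten");
        st.put(20, "twenty");
        st.put(30, "thirty");
        return st;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> st = sample();
        System.out.println(st.floorKey(25));    // 20
        System.out.println(st.ceilingKey(25));  // 30
        System.out.println(st.floorKey(20));    // 20, an equal key qualifies
        System.out.println(st.floorKey(5));     // null, no key is <= 5
        System.out.println(st.ceilingKey(35));  // null, no key is >= 35
    }
}
```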

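C. Rank and select

rank(key): the number of keys in the table that are smaller than key.

select(k): returns the key of rank k, i.e. the key for which exactly k keys in the table are smaller (k ranges from 0 to size() - 1).

i == rank(select(i)) for every i from 0 to size() - 1, and key == select(rank(key)) for every key in the table.

These two identities help in understanding the two operations.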
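The two identities can be checked mechanically. The following is a minimal sketch over a sorted array of distinct strings; the class RankSelectDemo and its static helper methods are assumptions made for this example.

```java
class RankSelectDemo {
    // Number of keys in the sorted array that are strictly smaller than key.
    static int rank(String[] sorted, String key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(sorted[mid]);
            if (cmp < 0) hi = mid - 1;
            else if (cmp > 0) lo = mid + 1;
            else return mid;
        }
        return lo;
    }

    // Key of rank k; valid for 0 <= k < sorted.length.
    static String select(String[] sorted, int k) {
        return sorted[k];
    }

    // Checks both identities for every valid rank and every key in the array.
    static boolean identitiesHold(String[] sorted) {
        for (int i = 0; i < sorted.length; i++) {
            if (rank(sorted, select(sorted, i)) != i) return false;
            String key = sorted[i];
            if (!select(sorted, rank(sorted, key)).equals(key)) return false;
        }
        return true;
    }
}
```

For the keys A, C, E, H, rank("D") is 2 because A and C are smaller, and select(2) is E.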

D. Range queries

How many keys fall in a given range, and which keys are they?

public int size(Key lo, Key hi)

public Iterable<Key> keys(Key lo, Key hi)

These two methods answer those questions.
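One way to picture the two range operations is through java.util.TreeMap.subMap with both ends inclusive. This is a sketch under the assumption that the range [lo, hi] is closed at both ends, as in the size(lo, hi) code of this post; RangeQueryDemo and its sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class RangeQueryDemo {
    // Ordered table with keys A, C, E, H.
    static TreeMap<String, Integer> sample() {
        TreeMap<String, Integer> st = new TreeMap<>();
        String[] keys = {"A", "C", "E", "H"};
        for (int i = 0; i < keys.length; i++) st.put(keys[i], i);
        return st;
    }

    // Number of keys in [lo, hi], both ends inclusive.
    static int size(TreeMap<String, Integer> st, String lo, String hi) {
        if (lo.compareTo(hi) > 0) return 0;   // empty range; subMap would throw here
        return st.subMap(lo, true, hi, true).size();
    }

    // Keys in [lo, hi] in ascending order.
    static List<String> keys(TreeMap<String, Integer> st, String lo, String hi) {
        if (lo.compareTo(hi) > 0) return new ArrayList<>();
        return new ArrayList<>(st.subMap(lo, true, hi, true).keySet());
    }
}
```

With this sample table, size("B", "E") counts C and E, and keys("B", "G") returns the same two keys.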

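E. Exceptional cases

When a method must return a key but no key in the table fits, it either throws an exception or returns null.

F. Shorthand methods

Some methods can be implemented in terms of others, for example: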

    //shorthand methods
    public void deleteMin() {
        if(isEmpty())
            throw new NoSuchElementException("underflow error");
        delete(min());
    }
    
    //shorthand methods
    public void deleteMax() {
        if(isEmpty())
            throw new NoSuchElementException("underflow error");
        delete(max());
    }
    
    //shorthand methods
    public int size(Key lo, Key hi) {
        
        if (lo == null)
            throw new IllegalArgumentException("first argument to size() is null"); 
        if (hi == null)
            throw new IllegalArgumentException("second argument to size() is null"); 
        
        if(hi.compareTo(lo) < 0)
            return 0;
        else if(contains(hi))
            return rank(hi) - rank(lo) + 1;
        else 
            return rank(hi) - rank(lo);
    }
    
    //shorthand methods
    public Iterable<Key> keys() {
        if(isEmpty())
            return new LinkedList<>();
        return keys(min(), max());
    }

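G. Key equality

For any Comparable type, two values a and b should satisfy the rule that (a.compareTo(b) == 0) has the same value as a.equals(b).

To avoid ambiguity, an ordered symbol table compares keys only with compareTo(); that is, a.compareTo(b) == 0 decides whether a and b are equal.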
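java.math.BigDecimal is a well-known type whose compareTo and equals disagree: 1.0 and 1.00 compare as equal but are not equals() because their scales differ. The sketch below (class name hypothetical) shows how an equals-based set and a compareTo-based set then disagree about the number of distinct keys.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class EqualityConsistencyDemo {
    // HashSet relies on equals()/hashCode(): 1.0 and 1.00 are two different keys.
    static int hashSetSize() {
        Set<BigDecimal> keys = new HashSet<>();
        keys.add(new BigDecimal("1.0"));
        keys.add(new BigDecimal("1.00"));
        return keys.size();
    }

    // TreeSet relies on compareTo(): 1.0 and 1.00 are the same key.
    static int treeSetSize() {
        Set<BigDecimal> keys = new TreeSet<>();
        keys.add(new BigDecimal("1.0"));
        keys.add(new BigDecimal("1.00"));
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // 2
        System.out.println(treeSetSize()); // 1
    }
}
```

Using only compareTo() inside an ordered table avoids mixing the two notions of equality within one data structure.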

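H. Cost model

Whether an implementation uses a.compareTo(b) == 0 or a.equals(b), the term compare refers to the operation of comparing a symbol-table entry against a search key, and the number of compares is what we count.

When compares are not in the inner loop, we count array accesses instead.

III. Test clients

Two clients: a test client that traces behavior on small inputs, and a performance client used to look for more efficient implementations.

1. Test Client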

    public static void main(String[] args) {
        
        ST<String, Integer> st = new ST<String, Integer>();
        
        for(int i = 0; !StdIn.isEmpty(); i++) {
            
            String key = StdIn.readString();
            
            if(key.equals("-")) {
                key = StdIn.readString();
                st.delete(key);
                StdOut.println("delete " + key);
            } else {
                st.put(key, i);
                StdOut.println("put" + " " + key + " " + i);
            }
            
            StdOut.print(st.size() + " key-value pairs:");
            for(String s : st.keys()) 
                StdOut.print(" " + s + " " + st.get(s));
            StdOut.println();
            
        }
        
    }

Unlike the textbook version, this client adds a delete operation so that the behavior is easier to observe.

2. Performance Client

    public static void main(String[] args) {
        
        int minlen = Integer.parseInt(args[0]);
        SequentialSearchST<String, Integer> st = new SequentialSearchST<>();
        
        //build symbol table and count frequencies
        while(!StdIn.isEmpty()) {
            String word = StdIn.readString();
            if(word.length() < minlen)
                continue;//ignore short keys
            Integer freq = st.get(word);
            if(freq == null)
                st.put(word, 1);
            else
                st.put(word, freq + 1);
        }
        
        //find a key with the highest frequency count
        String max = "";
        st.put(max, 0);
        for(String word : st.keys())
            if(st.get(word) > st.get(max))
                max = word;
        StdOut.println(max + " " + st.get(max));
        
    }

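This is an improved version of the textbook client that eliminates the call to contains().

This client is used for performance testing throughout the chapter. Its characteristics:

Search and insert operations are intermixed.

There are a large number of distinct keys.

Search operations far outnumber insert operations.

Although unpredictable, the pattern of searches and insertions is not random.

IV. Elementary implementations

1. Sequential search in an unordered linked list

Sequential search uses a linked list as its data structure; each node stores one key-value pair.

get() traverses the list, using equals() to compare the search key with the key in each node. If a matching key is found, the associated value is returned.

Otherwise, null is returned.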

    //search for key, return associated value
    public Value get(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function get");
        
        for(Node current = first; current != null; current = current.next) {
            if(key.equals(current.key))
                return current.val;//search hit
        }
        
        return null;//search miss
    }

put() also traverses the list: if a matching key is found, its value is updated; otherwise a new node is inserted at the head of the list.

    public void put(Key key, Value val) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function put");
        
        if(val == null) {
            delete(key);
            return;
        }
        
        for(Node current = first; current != null; current = current.next) {
            if(key.equals(current.key)) {
                current.val = val;//search hit: update val
                return;
            }
        }
        
        first = new Node(key, val, first);//search miss: add new node.
        n++;
        
    }

Implementation of delete():

    public void delete(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function delete");
        
        Node fakehead = new Node(null, null, first);
        
        for(Node prev = fakehead; prev.next != null; prev = prev.next) {
            if(key.equals(prev.next.key)) {
                prev.next = prev.next.next;
                n--;
                break;
            }
        }
        
        first = fakehead.next;
        
    }

Complete code:

package com.qiusongde;

import java.util.LinkedList;
import java.util.Queue;

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;

public class SequentialSearchST<Key, Value> {
    
    private Node first;
    private int n;
    
    public SequentialSearchST() {
        first = null;
        n = 0;
    }
    
    public void put(Key key, Value val) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function put");
        
        if(val == null) {
            delete(key);
            return;
        }
        
        for(Node current = first; current != null; current = current.next) {
            if(key.equals(current.key)) {
                current.val = val;//search hit: update val
                return;
            }
        }
        
        first = new Node(key, val, first);//search miss: add new node.
        n++;
        
    }
    
    //search for key, return associated value
    public Value get(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function get");
        
        for(Node current = first; current != null; current = current.next) {
            if(key.equals(current.key))
                return current.val;//search hit
        }
        
        return null;//search miss
    }
    
    public void delete(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function delete");
        
        Node fakehead = new Node(null, null, first);
        
        for(Node prev = fakehead; prev.next != null; prev = prev.next) {
            if(key.equals(prev.next.key)) {
                prev.next = prev.next.next;
                n--;
                break;
            }
        }
        
        first = fakehead.next;
        
    }
    
    public Iterable<Key> keys() {
        
        Queue<Key> queue = new LinkedList<>();
        
        for(Node cur = first; cur != null; cur = cur.next) {
            queue.add(cur.key);
        }
        
        return queue;
        
    }
    
    //shorthand methods
    public boolean contains(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function contains");
        
        return get(key) != null;
    }
    
    //shorthand methods
    public boolean isEmpty() {
        return size() == 0;
    }
    
    public int size() {
        return n;
    }
    
    //inner class
    private class Node {
        
        Key key;
        Value val;
        Node next;
        
        public Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
        
    }
    
    public static void main(String[] args) {
        
        SequentialSearchST<String, Integer> st = new SequentialSearchST<String, Integer>();
        
        for(int i = 0; !StdIn.isEmpty(); i++) {
            
            String key = StdIn.readString();
            
            if(key.equals("-")) {
                key = StdIn.readString();
                st.delete(key);
                StdOut.println("delete " + key);
            } else {
                st.put(key, i);
                StdOut.println("put" + " " + key + " " + i);
            }
            
            StdOut.print(st.size() + " key-value pairs:");
            for(String s : st.keys()) 
                StdOut.print(" " + s + " " + st.get(s));
            StdOut.println();
            
        }
        
    }
    
}

Test data:

A
B
C
D
-
B
C
E
F
-
A
B
-
B
-
F
-
D

Test Client output:

put A 0
1 key-value pairs: A 0
put B 1
2 key-value pairs: B 1 A 0
put C 2
3 key-value pairs: C 2 B 1 A 0
put D 3
4 key-value pairs: D 3 C 2 B 1 A 0
delete B
3 key-value pairs: D 3 C 2 A 0
put C 5
3 key-value pairs: D 3 C 5 A 0
put E 6
4 key-value pairs: E 6 D 3 C 5 A 0
put F 7
5 key-value pairs: F 7 E 6 D 3 C 5 A 0
delete A
4 key-value pairs: F 7 E 6 D 3 C 5
put B 9
5 key-value pairs: B 9 F 7 E 6 D 3 C 5
delete B
4 key-value pairs: F 7 E 6 D 3 C 5
delete F
3 key-value pairs: E 6 D 3 C 5
delete D
2 key-value pairs: E 6 C 5

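2. Performance analysis of the unordered linked-list symbol table

Proposition 1: In a symbol table with N key-value pairs based on an unordered linked list, search misses and insertions both use N compares, and a search hit uses N compares in the worst case.

Corollary: Inserting N distinct keys into an initially empty table uses ~N²/2 compares.

For a random search hit, i.e. when every key in the table is equally likely to be the one searched for, the average number of compares is (1 + 2 + ... + N)/N = (N + 1)/2 ~ N/2.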
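These counts can be reproduced with a small experiment. The sketch below is a stripped-down linked-list table with int keys and an added compare counter; the class CountingSequentialST and its helper methods are assumptions made for the experiment, not the post's SequentialSearchST.

```java
class CountingSequentialST {
    private static class Node {
        final int key;
        int val;
        final Node next;
        Node(int key, int val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    private Node first;
    long compares;   // number of key compares performed so far

    void put(int key, int val) {
        for (Node x = first; x != null; x = x.next) {
            compares++;
            if (x.key == key) { x.val = val; return; }   // search hit: update
        }
        first = new Node(key, val, first);               // search miss: add at head
    }

    Integer get(int key) {
        for (Node x = first; x != null; x = x.next) {
            compares++;
            if (x.key == key) return x.val;
        }
        return null;
    }

    // Compares used to insert n distinct keys into an empty table: 0 + 1 + ... + (n - 1).
    static long comparesToInsert(int n) {
        CountingSequentialST st = new CountingSequentialST();
        for (int i = 0; i < n; i++) st.put(i, i);
        return st.compares;
    }

    // Compares used to search once for each of the n keys: 1 + 2 + ... + n.
    static long comparesToHitAll(int n) {
        CountingSequentialST st = new CountingSequentialST();
        for (int i = 0; i < n; i++) st.put(i, i);
        st.compares = 0;
        for (int i = 0; i < n; i++) st.get(i);
        return st.compares;
    }
}
```

For n = 100 the insertions cost 4950 compares, which is N(N - 1)/2 ~ N²/2, and the 100 search hits cost 5050 compares in total, an average of (N + 1)/2 = 50.5 per hit.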

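3. Binary search in an ordered array

The data structure is a pair of parallel arrays, one holding the keys and one holding the values.

The algorithm keeps the Comparable keys in the array sorted and then uses array indices to implement the other operations efficiently.

The heart of this implementation is the rank method: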

    //return the number of keys in the table that are smaller than key
    //the heart method
    public int rank(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function rank");
        
        int lo = 0, hi = n - 1;
        while(lo <= hi) {
            int mid = lo + (hi - lo)/2;
            int cmp = key.compareTo(keys[mid]);
            
            if(cmp < 0) {
                hi = mid - 1;
            }
            else if(cmp > 0) {
                lo = mid + 1;
            }
            else {
                return mid;
            }
        }
        
        return lo;
    }

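For get, rank tells us where the key would sit in the array. If the key at that position matches, its value is returned; otherwise the key is not in the table and get returns null.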

    public Value get(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function get");
        
        if(isEmpty())
            return null;
        
        int k = rank(key);
        if(k < n && keys[k].compareTo(key) == 0)
            return vals[k];
        else
            return null;
        
    }

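For put, rank gives either the position of the existing key whose value should be updated or the position at which the new key should be inserted.

On insertion, all larger keys must be shifted one position to the right (for large arrays this shifting is very expensive).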

    public void put(Key key, Value val) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function put");
        
        if(val == null) {
            delete(key);
            return;
        }
        
        //it works when ST is empty(n = 0)
        int k = rank(key);
        if(k < n && keys[k].compareTo(key) == 0) {
            vals[k] = val;//key is already in symbol table
            return;
        }
        
        //move
        for(int j = n; j > k; j--) {
            keys[j] = keys[j-1];
            vals[j] = vals[j-1];
        }
        
        keys[k] = key;
        vals[k] = val;
        n++;
    }

delete: for large arrays, the shifting here is also very expensive.

    public void delete(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function delete");
        
        if(isEmpty())
            return;
        
        int k = rank(key);
        
        if(k < n && keys[k].compareTo(key) == 0) {//key is in symbol table
            
            //move
            for(int j = k; j < n - 1; j++) {
                keys[j] = keys[j+1];
                vals[j] = vals[j+1];
            }
            
            n--;
            
            keys[n] = null;// to avoid loitering
            vals[n] = null;
            
        }
        
    }

The remaining operations:

    public Key min() {
        if(isEmpty())
            return null;
        return keys[0];
    }
    
    public Key max() {
        if(isEmpty())
            return null;
        return keys[n -1];
    }
    
    public Key select(int k) {
        if(k < 0 || k >= n)
            return null;
        return keys[k];
    }
    
    public Key ceiling(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function ceiling");
        
        if(isEmpty())
            return null;
        
        int k = rank(key);
        if(k == n)
            return null;
        else
            return keys[k];
        
    }
    
    public Key floor(Key key) {
        
        if(key == null)
            throw new IllegalArgumentException("key is null in function floor");
        
        if(isEmpty())
            return null;
        
        int k = rank(key);
        if(k < n && keys[k].compareTo(key) == 0)
            return keys[k];
        else if(k == 0)
            return null;
        else
            return keys[k-1];
        
    }
    
    public Iterable<Key> keys(Key lo, Key hi) {
        
        if(lo == null || hi == null)
            throw new IllegalArgumentException("one of the arguments is null in function keys");
        
        Queue <Key> queue = new LinkedList<>();
        
        if(lo.compareTo(hi) > 0 || isEmpty())
            return queue;//special case
        
        for(int i = rank(lo); i < rank(hi); i++) {
            queue.add(keys[i]);
        }
        if(contains(hi))
            queue.add(keys[rank(hi)]);
        
        return queue;
        
    }
    
    //shorthand methods
    public boolean isEmpty() {
        return size() == 0;
    }
    
    //shorthand methods
    public boolean contains(Key key) {
        return get(key) != null;
    }
    
    public int size() {
        return n;
    }
    
    //shorthand methods
    public void deleteMin() {
        if(isEmpty())
            throw new NoSuchElementException("underflow error");
        delete(min());
    }
    
    //shorthand methods
    public void deleteMax() {
        if(isEmpty())
            throw new NoSuchElementException("underflow error");
        delete(max());
    }
    
    //shorthand methods
    public int size(Key lo, Key hi) {
        
        if (lo == null)
            throw new IllegalArgumentException("first argument to size() is null"); 
        if (hi == null)
            throw new IllegalArgumentException("second argument to size() is null"); 
        
        if(hi.compareTo(lo) < 0)
            return 0;
        else if(contains(hi))
            return rank(hi) - rank(lo) + 1;
        else 
            return rank(hi) - rank(lo);
    }
    
    //shorthand methods
    public Iterable<Key> keys() {
        if(isEmpty())
            return new LinkedList<>();
        return keys(min(), max());
    }

Test Client results (same data as above):

put A 0
1 key-value pairs: A 0
put B 1
2 key-value pairs: A 0 B 1
put C 2
3 key-value pairs: A 0 B 1 C 2
put D 3
4 key-value pairs: A 0 B 1 C 2 D 3
delete B
3 key-value pairs: A 0 C 2 D 3
put C 5
3 key-value pairs: A 0 C 5 D 3
put E 6
4 key-value pairs: A 0 C 5 D 3 E 6
put F 7
5 key-value pairs: A 0 C 5 D 3 E 6 F 7
delete A
4 key-value pairs: C 5 D 3 E 6 F 7
put B 9
5 key-value pairs: B 9 C 5 D 3 E 6 F 7
delete B
4 key-value pairs: C 5 D 3 E 6 F 7
delete F
3 key-value pairs: C 5 D 3 E 6
delete D
2 key-value pairs: C 5 E 6

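4. Performance analysis of binary search

Proposition 1: Binary search in an ordered array with N keys uses at most lg N + 1 compares for a search (successful or unsuccessful).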
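The bound can be checked empirically. This sketch runs the same loop as rank on a sorted int array and counts one three-way compare per iteration; the class BinarySearchCompares and the choice of even-valued keys (so that odd values are guaranteed misses) are assumptions made for the experiment.

```java
class BinarySearchCompares {
    // Runs the binary search loop and returns how many compares it performed.
    static int comparesFor(int[] sorted, int key) {
        int compares = 0;
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            compares++;                          // one three-way compare per iteration
            if (key < sorted[mid]) hi = mid - 1;
            else if (key > sorted[mid]) lo = mid + 1;
            else break;                          // search hit
        }
        return compares;
    }

    // Largest number of compares over every hit and every miss in a table of n keys.
    static int worstCase(int n) {
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = 2 * i;      // keys 0, 2, 4, ...
        int worst = 0;
        for (int key = -1; key <= 2 * n - 1; key++)          // odd keys are misses
            worst = Math.max(worst, comparesFor(sorted, key));
        return worst;
    }
}
```

For N = 1000 the worst case is 10 compares, and for N = 1024 it is 11, matching floor(lg N) + 1.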

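However, put and delete are too slow.

Proposition 2: Inserting a new key into an ordered array of size N uses ~2N array accesses in the worst case, so inserting N keys into an initially empty symbol table uses ~N² array accesses in the worst case.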
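The worst case arises when every new key is smaller than all keys already stored, so each insertion shifts the whole array. The sketch below counts accesses under two stated assumptions: only the key array is counted (not the parallel value array), and the accesses made by the binary search itself are ignored. The class InsertionAccessCount is hypothetical.

```java
class InsertionAccessCount {
    // Inserts n distinct keys in descending order into an ordered int array and
    // returns the number of key-array accesses spent moving and storing keys.
    static long worstCaseAccesses(int n) {
        int[] keys = new int[n];
        int size = 0;
        long accesses = 0;
        for (int key = n; key >= 1; key--) {
            int k = 0;                       // the new key is the smallest, so its rank is 0
            for (int j = size; j > k; j--) {
                keys[j] = keys[j - 1];       // one read and one write per shifted key
                accesses += 2;
            }
            keys[k] = key;                   // store the new key
            accesses++;
            size++;
        }
        return accesses;
    }
}
```

Under these assumptions the i-th insertion costs 2i + 1 accesses, so n insertions cost exactly n² in total; for n = 100 that is 10000.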

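V. Summary

Binary search suits static tables (no insertions allowed), which are sorted once at initialization.

It is not suited to very large symbol tables in which searches and insertions are intermixed.

Many applications today need both search and insert to be efficient.

What algorithms and data structures can guarantee logarithmic performance for both search and insert?

First, efficient insertion calls for a linked structure, but a singly linked list cannot support binary search.

To combine the efficiency of binary search with the flexibility of a linked list, a more complex data structure is needed: the binary search tree.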
