
From Misusing TreeSet to Thinking About How Java's Sorted Collections Require Ordering to Be Consistent with equals

1. Discovering the Problem
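I had a task: sort a group of students by score. To obtain a sorted collection quickly, I chose TreeSet, a sorted data structure, to do the job. Two points convinced me that TreeSet would give me a sorted collection of students quickly:
(1) TreeSet is implemented on top of a red-black tree, which is a balanced binary search tree, so inserting n elements (and thereby sorting them) takes O(n log n) time in total.
(2) In the early stage of insertion, while the tree is still small, log n is small.
In other words, I believed that using TreeSet beats collecting all the students first and then sorting them with an O(n log n) algorithm. In any case, I implemented my idea.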
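For comparison, the "collect everything, then sort" alternative that the paragraph above weighs TreeSet against can be sketched roughly as follows. This is only an illustration under assumptions: the class name SortAfterCollecting and the method numbersByScore are hypothetical, and students are represented as (studentNumber, score) pairs via Map.entry instead of the Student class so that the snippet is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SortAfterCollecting {
    /**
     * Returns the student numbers ordered by ascending score.
     * Each entry is a hypothetical (studentNumber, score) pair.
     */
    static List<String> numbersByScore(List<Map.Entry<String, Integer>> students) {
        List<Map.Entry<String, Integer>> copy = new ArrayList<>(students);
        // List.sort is a stable merge sort: O(n log n), ties keep their input order.
        copy.sort(Map.Entry.comparingByValue());
        List<String> numbers = new ArrayList<>();
        for (Map.Entry<String, Integer> student : copy) {
            numbers.add(student.getKey());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(numbersByScore(
                List.of(Map.entry("3", 90), Map.entry("2", 80), Map.entry("1", 90))));
    }
}
```

Both approaches cost O(n log n) comparisons overall, so the choice between them is mostly about what semantics the resulting collection should have.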
The Student class:

package com.liyuncong.learn.test.sortedset;

public class Student implements Comparable<Student> {
    private String studentNumber;
    private String name;
    private int score;

    public String getStudentNumber() {
        return studentNumber;
    }

    public void setStudentNumber(String studentNumber) {
        this.studentNumber = studentNumber;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getScore() {
        return score;
    }

    public void setScore(int score) {
        this.score = score;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + score;
        result = prime * result + ((studentNumber == null) ? 0 : studentNumber.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Student other = (Student) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (score != other.score)
            return false;
        if (studentNumber == null) {
            if (other.studentNumber != null)
                return false;
        } else if (!studentNumber.equals(other.studentNumber))
            return false;
        return true;
    }

    /**
     * The natural ordering of Student (by student number).
     */
    @Override
    public int compareTo(Student o) {
        return this.studentNumber.compareTo(o.getStudentNumber());
    }

    @Override
    public String toString() {
        return "Student [studentNumber=" + studentNumber + ", name=" + name + ", score=" + score + "]";
    }
}

Sorting the students:

package com.liyuncong.learn.test.sortedset;

import java.util.Comparator;
import java.util.TreeSet;

public class SortStudentTest {
    public static void main(String[] args) {
        Student student1 = new Student();
        student1.setStudentNumber("1");
        student1.setName("張三");
        student1.setScore(90);
        Student student2 = new Student();
        student2.setStudentNumber("2");
        student2.setName("李四");
        student2.setScore(80);
        Student student3 = new Student();
        student3.setStudentNumber("3");
        student3.setName("王二麻子");
        student3.setScore(90);

        TreeSet<Student> treeSet = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                return o1.getScore() - o2.getScore();
            }
        });
        treeSet.add(student3);
        treeSet.add(student2);
        treeSet.add(student1);
        for (Student student : treeSet) {
            System.out.println(student);
        }
    }
}

Sorted output:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
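I had implemented my idea full of confidence, but the result was somewhat unexpected: three objects went into the set, yet only two came out.
2. Finding the Cause
The program output shows that “張三” was not added. According to the Java Set specification, an element is rejected only when the set already contains it (as determined by the equals method); yet when “張三” was added, the set contained no element equal to him. To get to the bottom of this, I decided to read the source code. First, TreeSet's add method: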

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * the set contains no element {@code e2} such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the specified
     *         element
     * @throws ClassCastException if the specified object cannot be compared
     *         with the elements currently in this set
     * @throws NullPointerException if the specified element is null
     *         and this set uses natural ordering, or its comparator
     *         does not permit null elements
     */
    public boolean add(E e) {
        return m.put(e, PRESENT)==null;
    }

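The comment on add shows that TreeSet claims to follow the Set specification, detecting duplicate elements with the equals method. But there is no concrete implementation here, so let's keep reading the source. The add method calls m's put method to add the element to the set. What is m?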

    /**
     * The backing map.
     */
    private transient NavigableMap<E,Object> m;

    public TreeSet() {
        this(new TreeMap<E,Object>());
    }

So m is a TreeMap: like HashSet, TreeSet is built on top of the corresponding Map. Now let's look at TreeMap's put method:

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     *
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     * @throws ClassCastException if the specified key cannot be compared
     *         with the keys currently in the map
     * @throws NullPointerException if the specified key is null
     *         and this map uses natural ordering, or its comparator
     *         does not permit null keys
     */
    public V put(K key, V value) {
        Entry<K,V> t = root;
        if (t == null) {
            compare(key, key); // type (and possibly null) check

            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }

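So element equality is actually decided by the Comparator's compare method (or by compareTo of the Comparable interface). This violates the specification of the Set interface, and I thought I had found a bug in the Java class library. But first I had to solve my problem.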
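The behaviour traced through put above can be reproduced in a few lines without the Student class. The sketch below uses the JDK's String.CASE_INSENSITIVE_ORDER as the comparator; the class name ComparatorDecidesDuplicates and the method caseInsensitiveSet are made up for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

final class ComparatorDecidesDuplicates {
    /** Adds "a" and then "A" to a case-insensitive TreeSet and returns the set. */
    static Set<String> caseInsensitiveSet() {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("a");
        // compare("A", "a") == 0, so TreeMap.put takes the "return t.setValue(value)"
        // branch: no new entry is created and add returns false.
        set.add("A");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = caseInsensitiveSet();
        System.out.println("a".equals("A"));   // false: equals sees two different strings
        System.out.println(set.size());        // 1: the comparator saw a duplicate
        System.out.println(set.contains("A")); // true: lookups use the comparator too
    }
}
```

The two strings are different according to equals, yet the set treats them as the same element, which is exactly what happened to the two students with a score of 90.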
3. Solving the Problem
Knowing where the problem lay, I only needed a simple change to the Comparator to achieve my original goal:

        TreeSet<Student> treeSet2 = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                int result = o1.getScore() - o2.getScore();
                return result == 0 ? 1 : result;
            }
        });

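In other words, two elements compared through this Comparator can never be equal. Note that such a comparator breaks the Comparator contract, which requires sgn(compare(x, y)) == -sgn(compare(y, x)): lookups such as contains and remove can no longer find any element, and even the same object can be added twice. It is enough for this one-off sort, though. Running the sort above again gives the expected result: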
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Student [studentNumber=1, name=張三, score=90]
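A comparator that never returns 0 keeps all three students, but lookups in such a set stop working. One possible alternative, sketched below under assumptions, is to break ties on a field that is unique per student, so that only the same student compares as 0. The class Pupil is a hypothetical, stripped-down stand-in for Student, and TieBreakDemo, BY_SCORE_THEN_NUMBER, NEVER_EQUAL and fill are names invented for this example. With this alternative, students with equal scores come out ordered by student number rather than in the order shown above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class TieBreakDemo {
    /** Hypothetical stand-in for Student: only a number and a score. */
    static final class Pupil {
        final String number;
        final int score;

        Pupil(String number, int score) {
            this.number = number;
            this.score = score;
        }

        String getNumber() { return number; }
        int getScore() { return score; }
    }

    /** Orders by score, then by student number, so only the same student compares as 0. */
    static final Comparator<Pupil> BY_SCORE_THEN_NUMBER =
            Comparator.comparingInt(Pupil::getScore).thenComparing(Pupil::getNumber);

    /** The "never equal" comparator from the text, kept here for contrast. */
    static final Comparator<Pupil> NEVER_EQUAL = (a, b) -> {
        int result = Integer.compare(a.getScore(), b.getScore());
        return result == 0 ? 1 : result;
    };

    /** Builds a TreeSet with the given ordering and adds the pupils one by one. */
    static TreeSet<Pupil> fill(Comparator<Pupil> order, List<Pupil> pupils) {
        TreeSet<Pupil> set = new TreeSet<>(order);
        set.addAll(pupils);
        return set;
    }

    public static void main(String[] args) {
        Pupil p1 = new Pupil("1", 90);
        Pupil p2 = new Pupil("2", 80);
        Pupil p3 = new Pupil("3", 90);
        List<Pupil> pupils = List.of(p3, p2, p1);

        TreeSet<Pupil> tieBroken = fill(BY_SCORE_THEN_NUMBER, pupils);
        TreeSet<Pupil> neverEqual = fill(NEVER_EQUAL, pupils);
        System.out.println(tieBroken.size() + " " + tieBroken.contains(p1));   // 3 true
        System.out.println(neverEqual.size() + " " + neverEqual.contains(p1)); // 3 false
    }
}
```

Both sets hold three elements, but only the tie-breaking comparator lets contains find an element that was just added.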
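4. Further Thoughts
The problem was solved, but that was not the end of it. I had, after all, found a bug in the Java class library. Before announcing this bug to everyone, though, I had to prepare thoroughly and confirm it further, so as not to make a fool of myself. So I read the documentation of these interfaces and classes: Collection, Set, SortedSet, NavigableSet, TreeSet, TreeMap, Comparable, and Object. Because TreeMap's red-black tree is implemented following Introduction to Algorithms (a comment in TreeMap reads: Algorithms are adaptations of those in Cormen, Leiserson, and Rivest’s Introduction to Algorithms), I also briefly reviewed its treatment of binary search trees and red-black trees, and of course I read what some blog posts online say about the problem I ran into. All right, I now feel qualified to talk about this.
From the point of view of TreeSet's add method, this really is a bug; but from the point of view of SortedSet as a whole, it is only a design flaw, because the SortedSet documentation already describes the issue: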

 * <p>Note that the ordering maintained by a sorted set (whether or not an
 * explicit comparator is provided) must be <i>consistent with equals</i> if
 * the sorted set is to correctly implement the <tt>Set</tt> interface.  (See
 * the <tt>Comparable</tt> interface or <tt>Comparator</tt> interface for a
 * precise definition of <i>consistent with equals</i>.)  This is so because
 * the <tt>Set</tt> interface is defined in terms of the <tt>equals</tt>
 * operation, but a sorted set performs all element comparisons using its
 * <tt>compareTo</tt> (or <tt>compare</tt>) method, so two elements that are
 * deemed equal by this method are, from the standpoint of the sorted set,
 * equal.  The behavior of a sorted set <i>is</i> well-defined even if its
 * ordering is inconsistent with equals; it just fails to obey the general
 * contract of the <tt>Set</tt> interface.

The “precise definition of consistent with equals” is:

 * The natural ordering for a class <tt>C</tt> is said to be <i>consistent
 * with equals</i> if and only if <tt>e1.compareTo(e2) == 0</tt> has
 * the same boolean value as <tt>e1.equals(e2)</tt> 
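A well-known JDK class whose natural ordering is not consistent with equals is BigDecimal: equals takes the scale into account, while compareTo does not. The sketch below shows how the same two values behave in a HashSet and in a TreeSet; the class name ConsistentWithEqualsDemo and its two helper methods are made up for this illustration.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

final class ConsistentWithEqualsDemo {
    /** Number of distinct elements according to equals/hashCode. */
    static int hashSetSize(List<BigDecimal> values) {
        return new HashSet<>(values).size();
    }

    /** Number of distinct elements according to compareTo (natural ordering). */
    static int treeSetSize(List<BigDecimal> values) {
        return new TreeSet<>(values).size();
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.equals(b));                // false: the scales differ
        System.out.println(a.compareTo(b) == 0);        // true: same numeric value
        System.out.println(hashSetSize(List.of(a, b))); // 2
        System.out.println(treeSetSize(List.of(a, b))); // 1
    }
}
```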

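The SortedSet documentation states that if the ordering is “inconsistent”, the Set specification is violated; more specifically, the rule that duplicate objects are determined by the equals method is violated. Since the documentation spells this out, the problem above cannot be called a bug. However, like some of the points raised in Effective Java, I consider it a design flaw. I base this conclusion on the following three points:
(1) Both Comparator and Comparable exist to order objects, as their documentation shows:
Comparable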

 * This interface imposes a total ordering on the objects of each class that
 * implements it.

Comparator

 * A comparison function, which imposes a <i>total ordering</i> on some
 * collection of objects

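SortedSet, however, uses them not only for ordering but also in place of the equals method to decide whether objects are equal. This violates the single responsibility principle and makes the design ugly.
(2) The Comparable documentation does not require consistency with equals: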

 * It is strongly recommended (though not required) that natural orderings be
 * consistent with equals.  This is so because sorted sets (and sorted maps)
 * without explicit comparators behave "strangely" when they are used with
 * elements (or keys) whose natural ordering is inconsistent with equals.  In
 * particular, such a sorted set (or sorted map) violates the general contract
 * for set (or map), which is defined in terms of the <tt>equals</tt>
 * method.<p>

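(3) Viewed from outside the program, requiring that two objects be equal if and only if they compare as equal in one particular respect is unreasonable in itself.
Finally, let me guess why the library designers did it this way. In a binary search tree, searching for a key compares only that key, and the ordered layout of the keys lets the tree find the object for that key quickly; this key corresponds to whatever the Comparator compares. If the implementation tested equality on something other than the key, the storage advantage of the binary search tree would be lost. (Personally, I think this problem can be solved. For example, if we assume that two objects being equal is a sufficient condition for the Comparator returning 0, the tree could still search by key and then decide equality with equals, without increasing the algorithmic complexity.)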
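The idea in the parenthesis above can be sketched as a small collection that finds a group of elements through the comparator and then decides duplicates inside that group with equals. This is a hypothetical illustration, not how the JDK works: EqualsAwareSortedSet and all of its methods are invented names, and it relies on the stated assumption that equal objects always compare as 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: ordered by a comparator, de-duplicated by equals. */
final class EqualsAwareSortedSet<E> {
    // Each key stands for one group of elements that compare as 0; the bucket
    // holds the members of that group that are distinct according to equals.
    private final TreeMap<E, List<E>> buckets;
    private int size;

    EqualsAwareSortedSet(Comparator<? super E> order) {
        this.buckets = new TreeMap<>(order);
    }

    /** Returns false only if an element that equals e is already present. */
    boolean add(E e) {
        // Tree search by comparator: O(log n).
        List<E> bucket = buckets.computeIfAbsent(e, key -> new ArrayList<>());
        if (bucket.contains(e)) { // List.contains decides with equals
            return false;
        }
        bucket.add(e);
        size++;
        return true;
    }

    boolean contains(E e) {
        List<E> bucket = buckets.get(e);
        return bucket != null && bucket.contains(e);
    }

    int size() {
        return size;
    }

    /** Elements in comparator order; ties keep their insertion order. */
    List<E> toList() {
        List<E> all = new ArrayList<>();
        for (List<E> bucket : buckets.values()) {
            all.addAll(bucket);
        }
        return all;
    }

    public static void main(String[] args) {
        EqualsAwareSortedSet<String> set =
                new EqualsAwareSortedSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("b");
        set.add("a");
        set.add("A"); // compares as 0 with "a" but is not equal to it: kept
        set.add("a"); // equals an existing element: rejected
        System.out.println(set.toList());
    }
}
```

In this sketch the cost of add and contains is O(log n) for the tree search plus a linear scan of one bucket, so the claim that complexity does not increase holds only while few elements share the same comparison key.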