From Misusing TreeSet to Thinking About Why Java's Sorted Collections Require Ordering to Be Consistent with Equals

阿新 • Published: 2019-02-04

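1. Discovering the Problem
The task: sort a group of students by score. To obtain a sorted collection quickly, I chose TreeSet, a sorted data structure. Two things made me think TreeSet would give me the sorted students fast:
(1) TreeSet is backed by a red-black tree, which is a balanced binary search tree, so inserting n elements takes O(n log n) time in total;
(2) early on, while the tree is still small, log n is small.
In other words, I assumed that using a TreeSet would beat collecting all the students first and then sorting them with an O(n log n) algorithm. Either way, I implemented it as planned.
The Student class: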

package com.liyuncong.learn.test.sortedset;

public class Student implements Comparable<Student> {
    private String studentNumber;
    private String name;
    private int score;

    public String getStudentNumber() {
        return studentNumber;
    }

    public void setStudentNumber(String studentNumber) {
        this.studentNumber = studentNumber;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getScore() {
        return score;
    }

    public void setScore(int score) {
        this.score = score;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + score;
        result = prime * result + ((studentNumber == null) ? 0 : studentNumber.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Student other = (Student) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (score != other.score)
            return false;
        if (studentNumber == null) {
            if (other.studentNumber != null)
                return false;
        } else if (!studentNumber.equals(other.studentNumber))
            return false;
        return true;
    }

    /**
     * Natural ordering of Student: by student number.
     */
    @Override
    public int compareTo(Student o) {
        return this.studentNumber.compareTo(o.getStudentNumber());
    }

    @Override
    public String toString() {
        return "Student [studentNumber=" + studentNumber + ", name=" + name + ", score=" + score + "]";
    }

}

Sorting the students:

package com.liyuncong.learn.test.sortedset;

import java.util.Comparator;
import java.util.TreeSet;

public class SortStudentTest {
    public static void main(String[] args) {
        Student student1 = new Student();
        student1.setStudentNumber("1");
        student1.setName("張三");
        student1.setScore(90);
        Student student2 = new Student();
        student2.setStudentNumber("2");
        student2.setName("李四");
        student2.setScore(80);
        Student student3 = new Student();
        student3.setStudentNumber("3");
        student3.setName("王二麻子");
        student3.setScore(90);

        TreeSet<Student> treeSet = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                return o1.getScore() - o2.getScore();
            }
        });
        treeSet.add(student3);
        treeSet.add(student2);
        treeSet.add(student1);
        for (Student student : treeSet) {
            System.out.println(student);
        }
    }
}

Sorted output:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
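I had implemented my idea full of confidence, but the result was a surprise: three objects went into the set and only two came out.
2. Finding the Cause
The output shows that 張三 was never added. According to the Set contract, adding an element is a no-op only when the set already contains an equal element (as determined by equals); yet when 張三 was added, the set held no element equal to him. To get to the bottom of this, I went into the source code, starting with TreeSet's add method: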

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * the set contains no element {@code e2} such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the specified
     *         element
     * @throws ClassCastException if the specified object cannot be compared
     *         with the elements currently in this set
     * @throws NullPointerException if the specified element is null
     *         and this set uses natural ordering, or its comparator
     *         does not permit null elements
     */
    public boolean add(E e) {
        return m.put(e, PRESENT)==null;
    }

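The Javadoc on add says that TreeSet follows the Set contract: duplicates are detected with equals. But there is no real implementation here, so let's keep reading. add delegates to m.put to insert the element. What is m?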

    /**
     * The backing map.
     */
    private transient NavigableMap<E,Object> m;

    public TreeSet() {
        this(new TreeMap<E,Object>());
    }

So m is a TreeMap. Like HashSet, TreeSet is built on top of the corresponding Map. Now look at TreeMap's put method:

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     *
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     * @throws ClassCastException if the specified key cannot be compared
     *         with the keys currently in the map
     * @throws NullPointerException if the specified key is null
     *         and this map uses natural ordering, or its comparator
     *         does not permit null keys
     */
    public V put(K key, V value) {
        Entry<K,V> t = root;
        if (t == null) {
            compare(key, key); // type (and possibly null) check

            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }

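So element equality is decided by the Comparator's compare method (or by Comparable's compareTo): when the comparison returns 0, put only overwrites the value of the existing entry and no new node is created. This seemed to violate the contract of the Set interface, and I thought I had found a bug in the Java class library. But first I had to solve my problem.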
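The behaviour can be reproduced without the Student class. The following is a minimal sketch (the class and method names are made up for illustration) that uses the JDK's String.CASE_INSENSITIVE_ORDER comparator: "a" and "A" are not equal according to equals, but the comparator returns 0 for them, so the second add is rejected.

```java
import java.util.Set;
import java.util.TreeSet;

public class ComparatorDecidesDuplicates {

    /** Adds "a" and "A" to a case-insensitively ordered TreeSet and returns the final size. */
    public static int sizeAfterAddingBothCases() {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("a");
        // "A".equals("a") is false, but the comparator returns 0,
        // so TreeSet treats "A" as already present and rejects it.
        set.add("A");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingBothCases()); // prints 1
    }
}
```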
3. Solving the Problem
Knowing where the problem was, a small change to the Comparator was enough to reach the original goal:

        TreeSet<Student> treeSet2 = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                int result = o1.getScore() - o2.getScore();
                return result == 0 ? 1 : result;
            }
        });

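In other words, two elements compared through this Comparator can never be equal. Note that this trick breaks the Comparator contract (for equal scores, compare(a, b) and compare(b, a) both return 1), so contains and remove can no longer find any element; it is only usable for inserting and then iterating. Running the sort again gives the expected result: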
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Student [studentNumber=1, name=張三, score=90]
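A comparator that never returns 0 is fragile. One alternative, sketched here under the assumption that student numbers are unique, is to break ties on the student number, so that only the same student compares as 0. The trimmed-down Student class and the names below are hypothetical stand-ins, not the article's original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TieBreakDemo {

    /** Hypothetical stand-in for the article's Student, reduced to the two fields used here. */
    static final class Student {
        private final String studentNumber;
        private final int score;

        Student(String studentNumber, int score) {
            this.studentNumber = studentNumber;
            this.score = score;
        }

        String getStudentNumber() { return studentNumber; }
        int getScore() { return score; }
    }

    // Order by score, then by student number (assumed unique),
    // so two different students never compare as 0.
    static final Comparator<Student> BY_SCORE_THEN_NUMBER =
            Comparator.comparingInt(Student::getScore)
                      .thenComparing(Student::getStudentNumber);

    /** Inserts the article's three students and returns their numbers in iteration order. */
    public static List<String> sortedNumbers() {
        TreeSet<Student> set = new TreeSet<>(BY_SCORE_THEN_NUMBER);
        set.add(new Student("3", 90));
        set.add(new Student("2", 80));
        set.add(new Student("1", 90));
        List<String> numbers = new ArrayList<>();
        for (Student s : set) {
            numbers.add(s.getStudentNumber());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(sortedNumbers()); // prints [2, 1, 3]
    }
}
```

Unlike the always-nonzero comparator, this ordering is symmetric, so contains and remove keep working. Students with equal scores come out ordered by student number rather than by insertion order.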
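4. Further Thoughts
The problem was solved, but I wasn't done: I had, after all, found a bug in the Java class library. Before announcing it, though, I wanted to prepare properly and double-check, so as not to embarrass myself. So I read the documentation of Collection, Set, SortedSet, NavigableSet, TreeSet, TreeMap, Comparable and Object. Because TreeMap's red-black tree is based on the presentation in Introduction to Algorithms (a comment in TreeMap reads: "Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms"), I also briefly reviewed its treatment of binary search trees and red-black trees, and I read a few blog posts on the problem I had run into. With that, I felt qualified to talk about it.
Looking only at TreeSet's add method, this really is a bug; looking at SortedSet as a whole, it is merely a design flaw, because the SortedSet documentation already describes the issue: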

 * <p>Note that the ordering maintained by a sorted set (whether or not an
 * explicit comparator is provided) must be <i>consistent with equals</i> if
 * the sorted set is to correctly implement the <tt>Set</tt> interface.  (See
 * the <tt>Comparable</tt> interface or <tt>Comparator</tt> interface for a
 * precise definition of <i>consistent with equals</i>.)  This is so because
 * the <tt>Set</tt> interface is defined in terms of the <tt>equals</tt>
 * operation, but a sorted set performs all element comparisons using its
 * <tt>compareTo</tt> (or <tt>compare</tt>) method, so two elements that are
 * deemed equal by this method are, from the standpoint of the sorted set,
 * equal.  The behavior of a sorted set <i>is</i> well-defined even if its
 * ordering is inconsistent with equals; it just fails to obey the general
 * contract of the <tt>Set</tt> interface.

The "precise definition of consistent with equals" is:

 * The natural ordering for a class <tt>C</tt> is said to be <i>consistent
 * with equals</i> if and only if <tt>e1.compareTo(e2) == 0</tt> has
 * the same boolean value as <tt>e1.equals(e2)</tt>
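A well-known JDK class whose natural ordering is not consistent with equals is BigDecimal: new BigDecimal("1.0") and new BigDecimal("1.00") compare as 0 but are not equal, because equals also takes the scale into account. The sketch below (class and method names are illustrative) shows how a HashSet and a TreeSet disagree about the same two values.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ConsistentWithEqualsDemo {

    /** Returns {hashSetSize, treeSetSize} after adding 1.0 and 1.00 to both kinds of set. */
    public static int[] sizes() {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");

        // HashSet relies on equals/hashCode: a.equals(b) is false, so both are kept.
        Set<BigDecimal> hashSet = new HashSet<>();
        hashSet.add(a);
        hashSet.add(b);

        // TreeSet relies on compareTo: a.compareTo(b) == 0, so b counts as a duplicate.
        Set<BigDecimal> treeSet = new TreeSet<>();
        treeSet.add(a);
        treeSet.add(b);

        return new int[] { hashSet.size(), treeSet.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]); // prints 2 1
    }
}
```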

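The SortedSet documentation states that if the ordering is "inconsistent", the set violates the Set contract; more precisely, it violates the rule that duplicates are detected with equals. Since this is documented, the problem above cannot be called a bug. But, in the spirit of some of the points raised in Effective Java, I consider it a design flaw, for the following three reasons:
(1) Both Comparator and Comparable exist to order objects, as their documentation shows: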
Comparable

 * This interface imposes a total ordering on the objects of each class that
 * implements it.

Comparator

 * A comparison function, which imposes a <i>total ordering</i> on some
 * collection of objects

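SortedSet, however, uses them not only for ordering but also in place of equals to decide whether objects are equal. This violates the single responsibility principle and makes the design ugly.
(2) The Comparable documentation does not require consistency with equals: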

 * It is strongly recommended (though not required) that natural orderings be
 * consistent with equals.  This is so because sorted sets (and sorted maps)
 * without explicit comparators behave "strangely" when they are used with
 * elements (or keys) whose natural ordering is inconsistent with equals.  In
 * particular, such a sorted set (or sorted map) violates the general contract
 * for set (or map), which is defined in terms of the <tt>equals</tt>
 * method.<p>

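(3) From a viewpoint outside the program, requiring that two objects be equal if and only if they compare as equal in one particular respect is unreasonable in itself.
Finally, let me guess why the library designers did it this way. In a binary search tree, searching for a key compares only that key, and the ordered storage of keys lets you quickly find the object that corresponds to it; this key corresponds to whatever the Comparator compares. If the implementation used something other than the key to decide equality, the binary search tree would lose its storage advantage. (Personally, I think this is solvable: for example, if equality of two objects were required to be a sufficient condition for the Comparator returning 0, the tree could still search by key and then confirm equality with equals, without increasing the algorithmic complexity.)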
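One possible reading of that idea can be sketched as follows. This is not how the JDK works and not a full Set implementation; EqualsAwareSortedSet and its methods are hypothetical names. It assumes that equal objects always compare as 0, locates a bucket with the comparator, and then uses equals inside the bucket to decide whether the element is a duplicate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class EqualsAwareSortedSet<E> {

    // Each key represents one group of elements that compare as 0 with each other.
    private final TreeMap<E, List<E>> buckets;
    private int size;

    public EqualsAwareSortedSet(Comparator<? super E> comparator) {
        this.buckets = new TreeMap<>(comparator);
    }

    /** Adds e unless an element equal to it (by equals) is already present. */
    public boolean add(E e) {
        // Tree search by comparator: O(log n) to find or create the bucket.
        List<E> bucket = buckets.computeIfAbsent(e, k -> new ArrayList<>());
        // Duplicate check by equals: linear in the number of tied elements.
        if (bucket.contains(e)) {
            return false;
        }
        bucket.add(e);
        size++;
        return true;
    }

    public int size() {
        return size;
    }

    /** Returns all elements in comparator order; ties keep insertion order. */
    public List<E> toSortedList() {
        List<E> out = new ArrayList<>(size);
        for (List<E> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        EqualsAwareSortedSet<String> set =
                new EqualsAwareSortedSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("b");
        set.add("a");
        set.add("A");
        set.add("a"); // rejected: equal to an existing element
        System.out.println(set.toSortedList()); // prints [a, A, b]
    }
}
```

The equals check costs time proportional to the number of tied elements, so the claim that complexity does not increase holds only when ties are few.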
