From Misusing TreeSet to Rethinking Why Java's Sorted Collections Require Ordering to Be Consistent with Equals
1. Discovering the Problem
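The task: sort a group of students by score. To get a sorted collection quickly, I chose the sorted data structure TreeSet to do the job. Two things convinced me that TreeSet would give me a sorted set of students quickly: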
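(1) TreeSet is built on a red-black tree, which is a balanced binary search tree. Each insertion therefore costs O(log n), so sorting n elements by inserting them costs O(n log n).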
(2) In the early stages of insertion …
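In other words, I believed that using TreeSet would be better than gathering all the students first and then running a sort over the whole list.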
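For comparison, the two strategies weighed here (keeping a TreeSet ordered during insertion versus collecting everything and sorting once at the end) can be sketched on plain `Integer` scores. The class and method names are hypothetical; `List.sort(null)` sorts by natural ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortStrategies {
    // Strategy A: keep the data ordered while inserting; each TreeSet.add is O(log n)
    static List<Integer> viaTreeSet(List<Integer> input) {
        TreeSet<Integer> set = new TreeSet<>();
        for (Integer x : input) {
            set.add(x);
        }
        return new ArrayList<>(set); // a TreeSet iterates in ascending order
    }

    // Strategy B: collect everything first, then sort once in O(n log n)
    static List<Integer> viaSortAtEnd(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        list.sort(null); // a null comparator means natural ordering
        return list;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(90, 80, 95);
        System.out.println(viaTreeSet(scores));   // [80, 90, 95]
        System.out.println(viaSortAtEnd(scores)); // [80, 90, 95]

        // With a repeated value the two differ: a Set keeps one copy, a List keeps both
        List<Integer> withTie = Arrays.asList(90, 80, 90);
        System.out.println(viaTreeSet(withTie));   // [80, 90]
        System.out.println(viaSortAtEnd(withTie)); // [80, 90, 90]
    }
}
```

Both end up at O(n log n). The practical difference is that a set, by contract, never holds two elements it considers the same.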
The Student class:
package com.liyuncong.learn.test.sortedset;
public class Student implements Comparable<Student> {
    private String studentNumber;
    private String name;
    private int score;
    public String getStudentNumber() {
        return studentNumber;
    }
    public void setStudentNumber(String studentNumber) {
        this.studentNumber = studentNumber;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getScore() {
        return score;
    }
    public void setScore(int score) {
        this.score = score;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + score;
        result = prime * result + ((studentNumber == null) ? 0 : studentNumber.hashCode());
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Student other = (Student) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (score != other.score)
            return false;
        if (studentNumber == null) {
            if (other.studentNumber != null)
                return false;
        } else if (!studentNumber.equals(other.studentNumber))
            return false;
        return true;
    }
    /**
     * Natural ordering of Student: by student number
     */
    @Override
    public int compareTo(Student o) {
        return this.studentNumber.compareTo(o.getStudentNumber());
    }
    @Override
    public String toString() {
        return "Student [studentNumber=" + studentNumber + ", name=" + name + ", score=" + score + "]";
    }
}
Sorting the students:
package com.liyuncong.learn.test.sortedset;
import java.util.Comparator;
import java.util.TreeSet;
public class SortStudentTest {
    public static void main(String[] args) {
        Student student1 = new Student();
        student1.setStudentNumber("1");
        student1.setName("張三");
        student1.setScore(90);
        Student student2 = new Student();
        student2.setStudentNumber("2");
        student2.setName("李四");
        student2.setScore(80);
        Student student3 = new Student();
        student3.setStudentNumber("3");
        student3.setName("王二麻子");
        student3.setScore(90);
        TreeSet<Student> treeSet = new TreeSet<>(new Comparator<Student>() {
            @Override
            public int compare(Student o1, Student o2) {
                return o1.getScore() - o2.getScore();
            }
        });
        treeSet.add(student3);
        treeSet.add(student2);
        treeSet.add(student1);
        for (Student student : treeSet) {
            System.out.println(student);
        }
    }
}
Output:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Confident in my idea, I implemented it, but the result was a surprise: three objects went into the set, and only two came out.
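The symptom can be reduced to a few lines. The sketch below uses a hypothetical trimmed-down `Stu` class (not the article's `Student`) whose `equals` compares all fields, together with a score-only comparator built with `Comparator.comparingInt`. It shows that `add` reports the second 90-point student as already present, and that `contains` finds a student who was never inserted.

```java
import java.util.Comparator;
import java.util.Objects;
import java.util.TreeSet;

public class DuplicateByComparatorDemo {
    // Hypothetical, trimmed-down stand-in for the Student class above
    static final class Stu {
        final String number;
        final String name;
        final int score;

        Stu(String number, String name, int score) {
            this.number = number;
            this.name = name;
            this.score = score;
        }

        // equals compares all three fields, like the generated Student.equals
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Stu)) return false;
            Stu s = (Stu) o;
            return score == s.score && number.equals(s.number) && name.equals(s.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(number, name, score);
        }
    }

    // Orders by score only, like the anonymous Comparator in SortStudentTest
    static TreeSet<Stu> newByScoreSet() {
        return new TreeSet<>(Comparator.comparingInt((Stu s) -> s.score));
    }

    public static void main(String[] args) {
        Stu wang = new Stu("3", "Wang Er Mazi", 90);
        Stu zhang = new Stu("1", "Zhang San", 90);
        TreeSet<Stu> set = newByScoreSet();

        System.out.println(set.add(wang));       // true
        System.out.println(wang.equals(zhang));  // false: two different students
        System.out.println(set.add(zhang));      // false: compare(...) == 0, so it counts as a duplicate
        System.out.println(set.contains(zhang)); // true, although zhang was never stored
        System.out.println(set.size());          // 1
    }
}
```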
2. Finding the Cause
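The output shows that “張三” was never added. According to the Java Set contract, an element is rejected only if the set already contains an element equal to it (as determined by the equals method). Yet when “張三” was added, the set held no element equal to him. To get to the bottom of this, I went into the source code, starting with TreeSet's add method: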
/**
 * Adds the specified element to this set if it is not already present.
 * More formally, adds the specified element {@code e} to this set if
 * the set contains no element {@code e2} such that
 * <tt>(e==null ? e2==null : e.equals(e2))</tt>.
 * If this set already contains the element, the call leaves the set
 * unchanged and returns {@code false}.
 *
 * @param e element to be added to this set
 * @return {@code true} if this set did not already contain the specified
 *         element
 * @throws ClassCastException if the specified object cannot be compared
 *         with the elements currently in this set
 * @throws NullPointerException if the specified element is null
 *         and this set uses natural ordering, or its comparator
 *         does not permit null elements
 */
public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}
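The Javadoc of add shows that TreeSet does follow the Set contract: duplicates are identified with the equals method. There is no real implementation here, though, so let's keep reading. The add method calls m's put method to insert the element. What is m?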
/**
 * The backing map.
 */
private transient NavigableMap<E,Object> m;
public TreeSet() {
    this(new TreeMap<E,Object>());
}
So m is a TreeMap: like HashSet, TreeSet is implemented on top of the corresponding Map. Now let's look at TreeMap's put method:
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 *
 * @return the previous value associated with {@code key}, or
 *         {@code null} if there was no mapping for {@code key}.
 *         (A {@code null} return can also indicate that the map
 *         previously associated {@code null} with {@code key}.)
 * @throws ClassCastException if the specified key cannot be compared
 *         with the keys currently in the map
 * @throws NullPointerException if the specified key is null
 *         and this map uses natural ordering, or its comparator
 *         does not permit null keys
 */
public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check
        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        @SuppressWarnings("unchecked")
            Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    Entry<K,V> e = new Entry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
}
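So element equality is decided by Comparator's compare method (or Comparable's compareTo), not by equals. This violates the Set contract, and I thought I had found a bug in the Java class library. First, though, I had to solve my problem.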
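The heart of `put` is a descent loop that consults only the comparator. A stripped-down, unbalanced binary search tree makes the consequence easy to see. `NaiveTreeSet` below is hypothetical: it has no red-black rebalancing and is not the JDK code. When `compare` returns 0 the loop stops and the new key is never stored, and `equals` is never called.

```java
import java.util.Comparator;

public class NaiveTreeSet<K> {
    // One tree node; unlike TreeMap.Entry there is no color and no parent pointer
    private static final class Node<T> {
        final T key;
        Node<T> left, right;
        Node(T key) { this.key = key; }
    }

    private final Comparator<? super K> cmp;
    private Node<K> root;
    private int size;

    public NaiveTreeSet(Comparator<? super K> cmp) {
        this.cmp = cmp;
    }

    // Mirrors the comparator branch of TreeMap.put: left on < 0, right on > 0, stop on == 0
    public boolean add(K key) {
        if (root == null) {
            root = new Node<>(key);
            size = 1;
            return true;
        }
        Node<K> t = root;
        Node<K> parent;
        int c;
        do {
            parent = t;
            c = cmp.compare(key, t.key);
            if (c < 0) t = t.left;
            else if (c > 0) t = t.right;
            else return false; // "equal" according to the comparator; equals() is never consulted
        } while (t != null);
        Node<K> n = new Node<>(key);
        if (c < 0) parent.left = n;
        else parent.right = n;
        size++;
        return true;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        // Order strings by length only: "ab" and "cd" differ but compare as equal
        NaiveTreeSet<String> set = new NaiveTreeSet<>(Comparator.comparingInt(String::length));
        System.out.println(set.add("ab"));  // true
        System.out.println(set.add("cd"));  // false, even though !"ab".equals("cd")
        System.out.println(set.add("xyz")); // true
        System.out.println(set.size());     // 2
    }
}
```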
3. Solving the Problem
Knowing where the problem lay, I only needed a small change to the Comparator to reach my original goal:
TreeSet<Student> treeSet2 = new TreeSet<>(new Comparator<Student>() {
    @Override
    public int compare(Student o1, Student o2) {
        int result = o1.getScore() - o2.getScore();
        // Never return 0, so no two students are ever treated as duplicates
        return result == 0 ? 1 : result;
    }
});
In other words, two elements compared with this Comparator can never be equal. Running the sort above again, the result now looks right:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Student [studentNumber=1, name=張三, score=90]
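One caveat about this fix: a comparator that never returns 0 breaks the `Comparator` contract. The contract requires `sgn(compare(x, y)) == -sgn(compare(y, x))`, and here even `compare(x, x)` returns 1. Iteration order looks right, but lookups that rely on finding a 0 stop working, so `contains` and `remove` fail for elements that are in the set. A contract-respecting alternative is to break ties with a second key. The sketch below breaks ties on student number via `Comparator.thenComparing`, assuming student numbers are unique, and uses a hypothetical trimmed-down `Pupil` class.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TieBreakDemo {
    // Hypothetical trimmed-down student; number is assumed to be unique
    static final class Pupil {
        final String number;
        final int score;

        Pupil(String number, int score) {
            this.number = number;
            this.score = score;
        }

        @Override
        public String toString() {
            return number + ":" + score;
        }
    }

    // The "never return 0" comparator from the text, kept only for comparison
    static final Comparator<Pupil> NEVER_ZERO = (a, b) -> {
        int r = Integer.compare(a.score, b.score);
        return r == 0 ? 1 : r;
    };

    // Score first, then student number: returns 0 only for the same student number
    static final Comparator<Pupil> SCORE_THEN_NUMBER =
            Comparator.comparingInt((Pupil p) -> p.score).thenComparing((Pupil p) -> p.number);

    public static void main(String[] args) {
        Pupil s1 = new Pupil("1", 90), s2 = new Pupil("2", 80), s3 = new Pupil("3", 90);

        TreeSet<Pupil> hack = new TreeSet<>(NEVER_ZERO);
        hack.add(s3); hack.add(s2); hack.add(s1);
        System.out.println(hack);              // [2:80, 3:90, 1:90]
        System.out.println(hack.contains(s1)); // false: the search never sees a 0

        TreeSet<Pupil> fixed = new TreeSet<>(SCORE_THEN_NUMBER);
        fixed.add(s3); fixed.add(s2); fixed.add(s1);
        System.out.println(fixed);              // [2:80, 1:90, 3:90]
        System.out.println(fixed.contains(s1)); // true
    }
}
```

With the tie-breaker, students with equal scores are ordered by student number rather than by insertion order, so number 1 now comes before number 3.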
4. Further Thoughts
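The problem was solved, but I was not done: I had, after all, found a bug in the Java class library. Before announcing it, though, I wanted to be thorough and double-check so as not to make a fool of myself. I read the documentation of these interfaces and classes: Collection, Set, SortedSet, NavigableSet, TreeSet, TreeMap, Comparable and Object. Because TreeMap's red-black tree follows the presentation in Introduction to Algorithms (a comment in TreeMap says: Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms), I also briefly reviewed that book's treatment of binary search trees and red-black trees, and of course read some blog posts about the problem I had run into. Now I feel qualified to talk about it.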
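From the standpoint of TreeSet's add method, this really is a bug. From the standpoint of SortedSet as a whole, however, it is merely a design flaw, because the SortedSet documentation already spells out the issue: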
* <p>Note that the ordering maintained by a sorted set (whether or not an
* explicit comparator is provided) must be <i>consistent with equals</i> if
* the sorted set is to correctly implement the <tt>Set</tt> interface. (See
* the <tt>Comparable</tt> interface or <tt>Comparator</tt> interface for a
* precise definition of <i>consistent with equals</i>.) This is so because
* the <tt>Set</tt> interface is defined in terms of the <tt>equals</tt>
* operation, but a sorted set performs all element comparisons using its
* <tt>compareTo</tt> (or <tt>compare</tt>) method, so two elements that are
* deemed equal by this method are, from the standpoint of the sorted set,
* equal. The behavior of a sorted set <i>is</i> well-defined even if its
* ordering is inconsistent with equals; it just fails to obey the general
* contract of the <tt>Set</tt> interface.
The “precise definition of consistent with equals” (from the Comparable documentation) reads:
* The natural ordering for a class <tt>C</tt> is said to be <i>consistent
* with equals</i> if and only if <tt>e1.compareTo(e2) == 0</tt> has
* the same boolean value as <tt>e1.equals(e2)</tt>
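The JDK itself contains a class whose natural ordering is inconsistent with equals, and the Comparable documentation names it: `java.math.BigDecimal`. `new BigDecimal("1.0")` and `new BigDecimal("1.00")` compare as 0 but are not `equals`, because their scales differ. The sketch below checks the definition directly for one pair and shows the practical effect. A `HashSet`, which goes through `equals`/`hashCode`, keeps both values, while a `TreeSet`, which goes through `compareTo`, keeps one. The helper `consistentWithEquals` is hypothetical and tests only a single pair, not a whole class.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ConsistentWithEqualsCheck {
    // One-pair check of the definition: (a.compareTo(b) == 0) must equal a.equals(b)
    static <T extends Comparable<? super T>> boolean consistentWithEquals(T a, T b) {
        return (a.compareTo(b) == 0) == a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("1.0");
        BigDecimal y = new BigDecimal("1.00");

        System.out.println(x.compareTo(y));             // 0: same numeric value
        System.out.println(x.equals(y));                // false: different scale
        System.out.println(consistentWithEquals(x, y)); // false

        List<BigDecimal> values = Arrays.asList(x, y);
        Set<BigDecimal> hashSet = new HashSet<>(values); // duplicates decided by equals/hashCode
        Set<BigDecimal> treeSet = new TreeSet<>(values); // duplicates decided by compareTo
        System.out.println(hashSet.size()); // 2
        System.out.println(treeSet.size()); // 1
    }
}
```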
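This documentation states that if the ordering is “inconsistent” with equals, the Set contract is violated. More precisely, the rule that duplicates are identified with the equals method is violated. Since the documentation says so explicitly, the problem above cannot be called a bug. But, in the spirit of some of the points raised in Effective Java, I consider it a design flaw, for three reasons: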
(1) Both Comparator and Comparable exist to order objects, as their documentation shows:
Comparable
This interface imposes a total ordering on the objects of each class that
* implements it.
Comparator
* A comparison function, which imposes a <i>total ordering</i> on some
* collection of objects
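SortedSet, however, uses them not only for ordering but also in place of equals to decide whether two objects are equal. This violates the single responsibility principle and makes the design ugly.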
(2) The Comparable documentation does not require consistency with equals; it only strongly recommends it:
* It is strongly recommended (though not required) that natural orderings be
* consistent with equals. This is so because sorted sets (and sorted maps)
* without explicit comparators behave "strangely" when they are used with
* elements (or keys) whose natural ordering is inconsistent with equals. In
* particular, such a sorted set (or sorted map) violates the general contract
* for set (or map), which is defined in terms of the <tt>equals</tt>
* method.<p>
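(3) Seen from outside any particular program, requiring that two objects be equal if and only if they compare equal on one particular aspect is unreasonable in itself.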
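Finally, let me guess why the library designers did it this way. In a binary search tree, searching for a key compares only that key, and because keys are stored in order, the object for a key can be found quickly; that key corresponds to whatever the Comparator compares. If the implementation used something other than the key to test equality, the binary search tree would lose its storage advantage. (Personally, I think this can be worked around. For example, if equal objects in Java were guaranteed to compare as 0, that is, if equality were a sufficient condition for a Comparator result of 0, one could still search by key and then confirm equality with equals, without increasing the algorithmic complexity.)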
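The workaround suggested in the parenthesis can be sketched as a sorted collection that orders by a comparator but decides duplicates with `equals`. Elements that compare as 0 share a bucket. `EqualsAwareSortedSet` is hypothetical, does not implement `java.util.Set`, and assumes that `a.equals(b)` implies `compare(a, b) == 0`. The tree descent stays O(log n). The extra cost is a linear `equals` scan over the elements that tie under the comparator, so it stays small only while ties are few.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical: orders by a comparator, but decides duplicates with equals.
// Assumes a.equals(b) implies cmp.compare(a, b) == 0.
public class EqualsAwareSortedSet<E> {
    private final TreeMap<E, List<E>> buckets; // one bucket per group of comparator-equal elements
    private int size;

    public EqualsAwareSortedSet(Comparator<? super E> cmp) {
        this.buckets = new TreeMap<>(cmp);
    }

    // O(log n) descent to the bucket, then a linear equals-scan inside it
    public boolean add(E e) {
        List<E> bucket = buckets.computeIfAbsent(e, k -> new ArrayList<>());
        if (bucket.contains(e)) { // List.contains uses equals
            return false;
        }
        bucket.add(e);
        size++;
        return true;
    }

    public boolean contains(E e) {
        List<E> bucket = buckets.get(e);
        return bucket != null && bucket.contains(e);
    }

    public int size() {
        return size;
    }

    // Elements in comparator order; ties keep their insertion order within a bucket
    public List<E> toList() {
        List<E> out = new ArrayList<>(size);
        for (List<E> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        // Order by length; "ab" and "cd" tie under the comparator but are not equal
        EqualsAwareSortedSet<String> set = new EqualsAwareSortedSet<>(Comparator.comparingInt(String::length));
        System.out.println(set.add("ab")); // true
        System.out.println(set.add("cd")); // true: same bucket, but not equal to "ab"
        System.out.println(set.add("ab")); // false: equals finds it in the bucket
        System.out.println(set.add("x"));  // true
        System.out.println(set.toList());  // [x, ab, cd]
    }
}
```

Whether this pays off depends on how many elements tie under the comparator. With many ties the `equals` scan dominates, which is one reason a tie-breaking comparator is usually the simpler fix in practice.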