A Thousand and One Nights: Analyzing Several Ways to Check Whether an Array Contains a Target Element
阿新 • Published: 2019-02-01
I have recently been reading programcreek's "Simple Java" material. In the article "How to Check if an Array Contains a Value in Java Efficiently", the author lists four solutions: using a List, a Set, a loop, and binarySearch.
The tests used arrays of three sizes: 5, 1k and 10k elements. The code is as follows:

package atlas;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * @author atlas
 */
// Four Different Ways to Check If an Array Contains a Value
public class CheckArrayContainsAValue {

    // use a List: wraps the array and scans it linearly
    public boolean useList(String[] arr, String targetValue) {
        return Arrays.asList(arr).contains(targetValue);
    }

    // use a Set: copies every element into a HashSet, then does one lookup
    public boolean useSet(String[] arr, String targetValue) {
        Set<String> set = new HashSet<String>(Arrays.asList(arr));
        return set.contains(targetValue);
    }

    // use a plain loop (throws NullPointerException if arr contains null)
    public boolean useLoop(String[] arr, String targetValue) {
        for (String s : arr) {
            if (s.equals(targetValue))
                return true;
        }
        return false;
    }

    // use binary search: only correct if arr is sorted
    public boolean useArraysBinarySearch(String[] arr, String targetValue) {
        int a = Arrays.binarySearch(arr, targetValue);
        // a match can be at index 0, so test >= 0 (not > 0)
        return a >= 0;
    }
}
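Why `>= 0` rather than `> 0`: `Arrays.binarySearch` returns the index of the match, which can be 0, or, if the key is absent, the negative value `-(insertion point) - 1`. A check written as `a > 0` therefore misses a match in the first slot. A minimal sketch (the class and method names are hypothetical):

```java
import java.util.Arrays;

class BinarySearchResult {

    // Buggy check: misses a match at index 0.
    static boolean containsBuggy(String[] sorted, String target) {
        return Arrays.binarySearch(sorted, target) > 0;
    }

    // Correct check: any non-negative result is the index of a match.
    static boolean contains(String[] sorted, String target) {
        return Arrays.binarySearch(sorted, target) >= 0;
    }

    public static void main(String[] args) {
        String[] sorted = {"a", "b", "c"};
        System.out.println(Arrays.binarySearch(sorted, "a"));  // 0: found at index 0
        System.out.println(Arrays.binarySearch(sorted, "bb")); // -3: -(insertion point 2) - 1
        System.out.println(containsBuggy(sorted, "a"));        // false
        System.out.println(contains(sorted, "a"));             // true
    }
}
```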
On my machine, the running times were:
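The result is clear: binary search is the fastest, which is easy to understand given its O(log n) complexity. But don't forget its precondition: the array must be sorted! (Sorting an unsorted array first costs O(n log n), more than a single linear scan.) Think the article ends here? Not so fast. The other three approaches also differ quite a lot from one another. Why is that? That is what we will dig into today.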
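To make the precondition concrete, here is a small sketch (the class `BinarySearchPrecondition` and method `containsSorted` are hypothetical names). The `Arrays.binarySearch` Javadoc says the result is undefined when the array is not sorted; with the OpenJDK implementation, the unsorted search below misses an element that is actually present. The usual fix is to sort a copy first, which costs O(n log n) and so only pays off when the sorted array is reused for many lookups.

```java
import java.util.Arrays;

class BinarySearchPrecondition {

    // Searches a sorted copy, leaving the caller's array untouched.
    static boolean containsSorted(String[] arr, String target) {
        String[] sorted = arr.clone();
        Arrays.sort(sorted);                              // O(n log n)
        return Arrays.binarySearch(sorted, target) >= 0;  // O(log n)
    }

    public static void main(String[] args) {
        String[] unsorted = {"b", "c", "a"};

        // Undefined per the Javadoc; with OpenJDK this returns a negative
        // value even though "a" is in the array.
        int raw = Arrays.binarySearch(unsorted, "a");
        System.out.println("binarySearch on unsorted array: " + raw);

        System.out.println("after sorting a copy: " + containsSorted(unsorted, "a")); // true
    }
}
```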
First, let's analyze the two approaches whose times are close: the List approach and the loop approach.
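The loop approach is easy to understand: it is Java's enhanced for loop (for-each). Note that when the loop runs over an array, the compiler turns it into an ordinary index-based loop and no Iterator is involved; an Iterator is only used when looping over an Iterable such as a List. Of these two approaches, the loop is the faster one.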
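For arrays, the Java Language Specification (§14.14.2, the enhanced for statement) defines for-each in terms of a plain index loop, so the two methods below do the same work and neither creates an Iterator. The class and method names are hypothetical; `indexed` is a hand-written version of what the compiler effectively generates for `forEach`.

```java
class LoopForms {

    // Enhanced for loop over an array, as in useLoop.
    static boolean forEach(String[] arr, String target) {
        for (String s : arr) {
            if (s.equals(target)) return true;
        }
        return false;
    }

    // Equivalent index-based loop: copy the array reference, walk it by index.
    static boolean indexed(String[] arr, String target) {
        String[] a = arr;
        for (int i = 0; i < a.length; i++) {
            String s = a[i];
            if (s.equals(target)) return true;
        }
        return false;
    }
}
```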
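Next, the List approach: why does it take a bit longer than the loop? To explain this we need to look at two things: (1) what it costs to convert the array to a list, and (2) how list.contains works. Taking them one at a time: the array is converted to a list by calling Arrays.asList(). Here is its implementation in the Arrays source: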
It calls a constructor of ArrayList, passing in the array, and returns a fixed-size list (not a resizable one, as the Javadoc below states). Note that this ArrayList is the private nested class Arrays.ArrayList, not java.util.ArrayList:

/**
 * Returns a fixed-size list backed by the specified array. (Changes to
 * the returned list "write through" to the array.) This method acts
 * as bridge between array-based and collection-based APIs, in
 * combination with {@link Collection#toArray}. The returned list is
 * serializable and implements {@link RandomAccess}.
 *
 * <p>This method also provides a convenient way to create a fixed-size
 * list initialized to contain several elements:
 * <pre>
 *     List<String> stooges = Arrays.asList("Larry", "Moe", "Curly");
 * </pre>
 *
 * @param a the array by which the list will be backed
 * @return a list view of the specified array
 */
public static <T> List<T> asList(T... a) {
    return new ArrayList<T>(a);
}
private static class ArrayList<E> extends AbstractList<E>
        implements RandomAccess, java.io.Serializable
{
    private static final long serialVersionUID = -2764017481108945198L;
    private final E[] a;

    ArrayList(E[] array) {
        if (array==null)
            throw new NullPointerException();
        a = array;
    }
    ...
}
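A quick way to check that Arrays.asList copies nothing and that the list it returns has a fixed size (the class `AsListView` and its helper `rejectsAdd` are hypothetical names): a write through the list shows up in the array and vice versa, and trying to grow the list throws UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.List;

class AsListView {

    // Returns true if the list refuses add(), i.e. it cannot grow.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("d");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        List<String> list = Arrays.asList(arr);  // wraps arr, no copy

        list.set(0, "x");                        // writes through to the array
        System.out.println(arr[0]);              // prints x

        arr[1] = "y";                            // array writes show up in the list
        System.out.println(list.get(1));         // prints y

        if (rejectsAdd(list)) {
            System.out.println("add() not supported: the list has a fixed size");
        }
    }
}
```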
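So the "conversion" is really just a reference assignment: the wrapper keeps a pointer to the original array and copies nothing, so apart from allocating one small wrapper object it costs almost nothing, whatever the array size. Next, let's look at how contains is implemented. The code below is quoted from java.util.ArrayList (note its elementData and size fields); in the JDK, Arrays.ArrayList overrides contains and indexOf with the same linear scan over its backing array a:
/**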
* Returns <tt>true</tt> if this list contains the specified element.
* More formally, returns <tt>true</tt> if and only if this list contains
* at least one element <tt>e</tt> such that
* <tt>(o==null ? e==null : o.equals(e))</tt>.
*
* @param o element whose presence in this list is to be tested
* @return <tt>true</tt> if this list contains the specified element
*/
public boolean contains(Object o) {
    return indexOf(o) >= 0;
}
/**
* Returns the index of the first occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the lowest index <tt>i</tt> such that
* <tt>(o==null ? get(i)==null : o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
As you can see, contains also finds the element with a for loop that compares the elements one by one, exactly like the loop approach.
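So the search itself costs the same in both approaches. The List version's small extra cost is per-call overhead: allocating the Arrays.ArrayList wrapper and going through contains → indexOf. Because asList copies nothing, this overhead is constant and does not grow with the array size; converting an array to a list is cheap, not expensive.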
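One behavioral difference is worth noting, though: indexOf checks for null, while useLoop calls s.equals(...) on every element and throws NullPointerException if the array contains null. A null-safe loop that mirrors indexOf could look like this (the class and method names are hypothetical); it uses `java.util.Objects.equals`, available since Java 7.

```java
import java.util.Arrays;
import java.util.Objects;

class NullSafeLoop {

    // Same linear scan as indexOf, returning a boolean.
    // Objects.equals(a, b) is true if both are null, or if a.equals(b).
    static boolean contains(String[] arr, String target) {
        for (String s : arr) {
            if (Objects.equals(s, target)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] arr = {"a", null, "c"};
        System.out.println(contains(arr, "c"));                // true
        System.out.println(contains(arr, null));               // true
        System.out.println(Arrays.asList(arr).contains("c"));  // true, same answer
    }
}
```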
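Finally, let's look at the slowest approach, the Set. Why is it the slowest? You have probably guessed that the conversion is expensive, and this time there are two conversions, array → List → HashSet: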
Set<String> set = new HashSet<String>(Arrays.asList(arr));
Inside HashSet, the backing field and the constructor used here look like this:
private transient HashMap<E,Object> map;
/**
* Constructs a new set containing the elements in the specified
* collection. The <tt>HashMap</tt> is created with default load factor
* (0.75) and an initial capacity sufficient to contain the elements in
* the specified collection.
*
* @param c the collection whose elements are to be placed into this set
* @throws NullPointerException if the specified collection is null
*/
public HashSet(Collection<? extends E> c) {
    map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
/**
* {@inheritDoc}
*
* <p>This implementation iterates over the specified collection, and adds
* each object returned by the iterator to this collection, in turn.
*
* <p>Note that this implementation will throw an
* <tt>UnsupportedOperationException</tt> unless <tt>add</tt> is
* overridden (assuming the specified collection is non-empty).
*
* @throws UnsupportedOperationException {@inheritDoc}
* @throws ClassCastException {@inheritDoc}
* @throws NullPointerException {@inheritDoc}
* @throws IllegalArgumentException {@inheritDoc}
* @throws IllegalStateException {@inheritDoc}
*
* @see #add(Object)
*/
public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    Iterator<? extends E> e = c.iterator();
    while (e.hasNext()) {
        if (add(e.next()))
            modified = true;
    }
    return modified;
}
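First a HashMap is allocated, sized to hold all the elements; then addAll() puts the list's elements in one by one through an iterator. Each add() calls map.put(e, PRESENT), which computes the element's hashCode() and creates a new entry in the table. Only then is contains called. HashSet.contains(o) simply delegates to map.containsKey(o), which looks the key up with getEntry(key). The snippet below is actually the contains of HashMap's inner EntrySet class, but it goes through the same getEntry lookup: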
    public Iterator<Map.Entry<K,V>> iterator() {
        return newEntryIterator();
    }
    public boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<K,V> e = (Map.Entry<K,V>) o;
        Entry<K,V> candidate = getEntry(e.getKey());
        return candidate != null && candidate.equals(e);
    }
    public boolean remove(Object o) {
        return removeMapping(o) != null;
    }
    public int size() {
        return size;
    }
    public void clear() {
        HashMap.this.clear();
    }
}
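Unlike the List case, however, this lookup is not a loop over all the elements: getEntry hashes the key, jumps straight to one bucket of the table, and compares only the few entries stored in that bucket, so a lookup is O(1) on average. The Set approach is slow because of how it is used here: to answer a single query it first hashes every element and allocates an entry for each one, which is O(n) work with a much larger constant than a plain scan, and then throws the set away.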
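To see why a hash lookup does not need to visit every element, here is a deliberately simplified separate-chaining set. `TinyHashSet` is a hypothetical class with a fixed bucket count and no resizing; it is not the JDK's HashMap code, which additionally spreads hash bits, resizes, and since Java 8 can turn long chains into trees.

```java
import java.util.LinkedList;

class TinyHashSet {
    private final LinkedList<String>[] buckets;

    // capacity must be positive; it is never changed (no resizing).
    @SuppressWarnings("unchecked")
    TinyHashSet(int capacity) {
        buckets = new LinkedList[capacity];
    }

    // Maps an element's hash code to a bucket index in [0, buckets.length).
    private int indexFor(Object o) {
        int h = (o == null) ? 0 : o.hashCode();
        return Math.floorMod(h, buckets.length);
    }

    // Returns false if the element was already present.
    boolean add(String s) {
        int i = indexFor(s);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        if (buckets[i].contains(s)) return false;
        buckets[i].add(s);
        return true;
    }

    // Scans only the one bucket the key hashes to, not the whole set.
    boolean contains(String s) {
        LinkedList<String> bucket = buckets[indexFor(s)];
        return bucket != null && bucket.contains(s);
    }
}
```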
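That wraps up our analysis of how long each approach takes and why. In real projects, for a one-off lookup in a small array, the plain loop is still the recommended choice; building a HashSet only pays off when the set is reused for many lookups, and binarySearch only when the array is already sorted. Got it?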
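That advice can be sketched in code (the class `LookupStrategies` and its methods are hypothetical names): scan directly for a one-off query, and pay the O(n) cost of building a HashSet only once when the same array will be queried many times.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookupStrategies {

    // One-off query: a plain linear scan, no extra allocation.
    static boolean containsOnce(String[] arr, String target) {
        for (String s : arr) {
            if (s.equals(target)) return true;
        }
        return false;
    }

    // Many queries: build the set once (O(n)); each lookup is then O(1) on average.
    static int countPresent(String[] arr, List<String> queries) {
        Set<String> set = new HashSet<>(Arrays.asList(arr));
        int found = 0;
        for (String q : queries) {
            if (set.contains(q)) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        String[] arr = {"apple", "banana", "cherry"};
        System.out.println(containsOnce(arr, "banana"));  // true
        System.out.println(countPresent(arr, Arrays.asList("apple", "kiwi", "cherry")));  // 2
    }
}
```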