Optimizing Nested Loops over Large Arrays

阿新 • Published: 2019-01-02

1. Introduction

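Over the past few days the service kept raising alerts in the early morning, and each episode lasted about half an hour before things returned to normal. After analyzing the logs and reviewing the code the next day, I found a blacklist/whitelist filtering step. The blacklist holds about 390,000 entries, the whitelist about 300,000, and the data being processed about 800,000. The business logic first filters the two lists against each other and then filters the data against both lists. This step alone took roughly 30 minutes, and the next round of the low-frequency task only started after nearly 40 minutes, so CAT kept alerting. Such a long delay before the next round is unacceptable, so I optimized this part of the code.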

2. Optimizing nested array loops

2.1 Code logic

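While reviewing the code I found the following blocks.

After fetching the blacklist and whitelist, entries that also appear in the whitelist are filtered out of the blacklist. The blacklist has about 390,000 entries and the whitelist about 350,000:

for (String blackData : blackDatas) {
    if (whiteDatas.contains(blackData)) {
        continue;
    }
    filterBlackDatas.add(blackData);
}

Whitelist entries that are missing from the poiId data (the difference between the whitelist and the poiId data) are then added to the poiId data, which holds about 800,000 entries:

for (String whiteData : whiteDatas) {
    if (!dataIds.contains(whiteData)) {
        dataIds.add(whiteData);
    }
}

Finally, the poiId data is checked against the blacklist, and entries found in the blacklist are skipped:

for (String dataId : dataIds) {
    if (blackDatas.contains(dataId)) {
        continue;
    }
    // processing of the remaining ids is not shown in the original
}

Almost all of the elapsed time is spent in these loops.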

2.2 Timing analysis

2.2.1 Code analysis

Each block above calls contains inside a loop. Here is the ArrayList source for contains:

/**
 * Returns <tt>true</tt> if this list contains the specified element.
 * More formally, returns <tt>true</tt> if and only if this list contains
 * at least one element <tt>e</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>.
 *
 * @param o element whose presence in this list is to be tested
 * @return <tt>true</tt> if this list contains the specified element
 */
public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

/**
 * Returns the index of the first occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the lowest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

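contains delegates to indexOf, which is itself a for loop over the backing array, so a single call is O(n). Calling it inside an outer loop over m elements makes the whole filter O(m·n), which is effectively O(n^2) when the two lists are of similar size, and that is far too slow. The production code had no timing logs around these loops when the change went live, so only the post-optimization figures exist for production. The original cost can still be simulated locally; the results and test setup follow.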
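
To put that cost in numbers, here is a rough sketch that computes the worst-case number of equals() calls for the first loop, using the approximate list sizes quoted in section 2.1. The class and method names are made up for this illustration, and the figure is an upper bound rather than a measurement, since contains stops early whenever it finds a match.

```java
// Back-of-the-envelope estimate; the sizes are the approximate figures quoted in section 2.1.
final class ComparisonEstimate {

    /** Worst-case equals() calls when each of `outer` elements triggers a full scan of `inner` elements. */
    static long worstCaseComparisons(long outer, long inner) {
        return outer * inner;
    }

    public static void main(String[] args) {
        long blacklist = 390_000L;   // outer loop: one contains() call per blacklist entry
        long whitelist = 350_000L;   // inner scan: contains() walks the whitelist
        // prints 136500000000
        System.out.println(worstCaseComparisons(blacklist, whitelist));
    }
}
```

On the order of 10^11 string comparisons for a single step is consistent with a run time measured in minutes.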

2.2.2 Local data simulation

The local simulation uses the step that filters the blacklist against the whitelist. The code is:

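// Keeps the blacklist entries that do not appear in the whitelist.
private static void normalFilter(List<String> blacks, List<String> whites) {
    List<String> filters = new ArrayList<>();
    for (String blackData : blacks) {
        // contains() is a linear scan of the whole whitelist
        if (whites.contains(blackData)) {
            continue;
        }
        filters.add(blackData);
    }
}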

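The elapsed time was 425,965 ms, roughly 7 minutes. The larger poiId data still has to be filtered against both lists afterwards, so a total of 30 to 40 minutes is entirely plausible. That is unacceptable, which calls for a new approach.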
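
The original post does not show how the test data was built or how the time was taken. The following is a minimal sketch of one way to reproduce such a measurement; the ids helper, the default size and the overlap between the two lists are assumptions made for this illustration, and the filter returns its result so that it can be checked.

```java
import java.util.ArrayList;
import java.util.List;

final class NormalFilterTiming {

    // Same logic as normalFilter above, but returning the result so it can be inspected.
    static List<String> normalFilter(List<String> blacks, List<String> whites) {
        List<String> filters = new ArrayList<>();
        for (String blackData : blacks) {
            if (!whites.contains(blackData)) {   // linear scan of whites
                filters.add(blackData);
            }
        }
        return filters;
    }

    // Hypothetical generator: the numbers from `from` (inclusive) to `to` (exclusive) as strings.
    static List<String> ids(int from, int to) {
        List<String> out = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            out.add(String.valueOf(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed size, kept small so the run finishes quickly; pass a larger n as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        List<String> blacks = ids(0, n);
        List<String> whites = ids(n / 2, n + n / 2);   // overlaps the upper half of blacks

        long start = System.nanoTime();
        List<String> result = normalFilter(blacks, whites);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("kept " + result.size() + " of " + n + " in " + elapsedMs + " ms");
    }
}
```

A single run like this includes JIT warm-up, so the figure is only a rough indication; for a gap as large as the one discussed here that is usually enough.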

2.3 Optimization

2.3.1 Choosing an approach

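The goal is simply to reduce the number of loop iterations and with it the elapsed time. My first idea was to look for a set-difference utility in the Apache Commons toolkit, which turned out to be CollectionUtils.subtract. I used it directly without first analyzing how it works: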

// Returns the blacklist entries left after subtracting the whitelist.
private static Collection<String> apacheFilter(List<String> blacks, List<String> whites) {
    return CollectionUtils.subtract(blacks, whites);
}

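The time dropped to about 49 s, roughly one ninth of the previous figure. That is still long, but much shorter than the hand-written loop. The analysis of the implementation follows.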

2.3.2 How it works

The source of CollectionUtils.subtract is:

/**
 * Returns a new {@link Collection} containing <tt><i>a</i> - <i>b</i></tt>.
 * The cardinality of each element <i>e</i> in the returned {@link Collection}
 * will be the cardinality of <i>e</i> in <i>a</i> minus the cardinality
 * of <i>e</i> in <i>b</i>, or zero, whichever is greater.
 *
 * @param a  the collection to subtract from, must not be null
 * @param b  the collection to subtract, must not be null
 * @return a new collection with the results
 * @see Collection#removeAll
 */
public static Collection subtract(final Collection a, final Collection b) {
    ArrayList list = new ArrayList( a );
    for (Iterator it = b.iterator(); it.hasNext();) {
        list.remove(it.next());
    }
    return list;
}

It iterates over b, the collection to exclude, and calls remove on the copy of a for each element. The remove operation is:

/**
 * Removes the first occurrence of the specified element from this list,
 * if it is present.  If the list does not contain the element, it is
 * unchanged.  More formally, removes the element with the lowest index
 * <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
 * (if such an element exists).  Returns <tt>true</tt> if this list
 * contained the specified element (or equivalently, if this list
 * changed as a result of the call).
 *
 * @param o element to be removed from this list, if present
 * @return <tt>true</tt> if this list contained the specified element
 */
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

/*
 * Private remove method that skips bounds checking and does not
 * return the value removed.
 */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
}

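remove is again a for loop, and fastRemove does nothing special. The difference is that every successful remove shrinks the copy of a, so later scans are shorter: when every removal succeeds, the total number of iterations is on the order of (n - 1)n/2. That is only about half of n^2, so the complexity is still quadratic, even though this version performed noticeably better in the test. An elapsed time of nearly 49 s is still not acceptable, so a better approach is needed.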
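
One detail worth checking before swapping the hand-written loop for subtract is that the two do not treat duplicates the same way. The sketch below re-implements the copy-then-remove approach with plain JDK classes (it is not the library code itself, and the class and method names are made up for this illustration) and compares it with the contains-based loop on a tiny input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveBasedSubtract {

    // Copy a, then remove one occurrence for each element of b.
    static <T> List<T> subtract(List<T> a, List<T> b) {
        List<T> list = new ArrayList<>(a);
        for (T element : b) {
            list.remove(element);   // remove(Object): drops the first match only
        }
        return list;
    }

    // The hand-written loop: keep every element of a that b does not contain.
    static <T> List<T> containsFilter(List<T> a, List<T> b) {
        List<T> kept = new ArrayList<>();
        for (T element : a) {
            if (!b.contains(element)) {
                kept.add(element);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "x", "y");
        List<String> b = Arrays.asList("x");
        System.out.println(subtract(a, b));        // prints [x, y]
        System.out.println(containsFilter(a, b));  // prints [y]
    }
}
```

If the blacklist and whitelist contain no duplicate ids, both give the same result; if duplicates can occur, the two approaches are not interchangeable.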

2.4 A better optimization

2.4.1 Choosing an approach

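Better performance again comes down to either reducing the number of loop iterations or using a thread pool for parallel processing. A thread pool is more cumbersome to use, and multiple threads might not even beat a further reduction in iterations, so I kept looking at the iteration count. By coincidence, the approach above used Apache Commons Collections 3.2, but for local testing I had downloaded the 4.0 package. With 4.0, and the same test code as before, the performance was far better.

The run took 170 ms. Compared with 49,446 ms for the previous approach, this exceeded expectations and fully resolves the computation time problem. I therefore looked at the implementation in the new package, shown below.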

2.4.2 How it works

The source of CollectionUtils.subtract in 4.0 is:

 /**
 * Returns a new {@link Collection} containing {@code <i>a</i> - <i>b</i>}.
 * The cardinality of each element <i>e</i> in the returned {@link Collection}
 * will be the cardinality of <i>e</i> in <i>a</i> minus the cardinality
 * of <i>e</i> in <i>b</i>, or zero, whichever is greater.
 *
 * @param a  the collection to subtract from, must not be null
 * @param b  the collection to subtract, must not be null
 * @param <O> the generic type that is able to represent the types contained
 *        in both input collections.
 * @return a new collection with the results
 * @see Collection#removeAll
 */
public static <O> Collection<O> subtract(final Iterable<? extends O> a, final Iterable<? extends O> b) {
    final Predicate<O> p = TruePredicate.truePredicate();
    return subtract(a, b, p);
}

public static <O> Collection<O> subtract(final Iterable<? extends O> a,
                                         final Iterable<? extends O> b,
                                         final Predicate<O> p) {
    final ArrayList<O> list = new ArrayList<O>();
    final HashBag<O> bag = new HashBag<O>();
    for (final O element : b) {
        if (p.evaluate(element)) {
            bag.add(element);
        }
    }
    for (final O element : a) {
        if (!bag.remove(element, 1)) {
            list.add(element);
        }
    }
    return list;
}

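In the code above, the first for loop stores the elements of b in a HashBag. HashBag is backed by a HashMap internally and can be treated as one here. The second loop tries to remove one occurrence of each element of a from the bag; elements for which nothing could be removed, meaning they are not (or no longer) in the bag, are added to a new list, which is then returned. With a HashMap holding the elements and their counts, the difference is computed in two passes. This trades space for time: the cost drops to about 2n hash operations, that is, linear on average, which is far fewer iterations than n^2 or (n - 1)n/2 and is enough to meet the requirement.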
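
The same idea can be sketched with JDK classes only, for cases where adding or upgrading the library is not an option. This is an illustration under stated assumptions, not the library's implementation: a HashMap of counts stands in for the HashBag, and the class and method names are invented here. A second method shows the simpler HashSet variant that matches the original contains-based loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HashSubtract {

    // Bag-style difference: each occurrence in b cancels at most one occurrence in a.
    static <T> List<T> subtract(Iterable<? extends T> a, Iterable<? extends T> b) {
        Map<T, Integer> counts = new HashMap<>();
        for (T element : b) {
            counts.merge(element, 1, Integer::sum);   // first pass: count the elements of b
        }
        List<T> result = new ArrayList<>();
        for (T element : a) {                         // second pass: one hash lookup per element of a
            Integer remaining = counts.get(element);
            if (remaining == null) {
                result.add(element);                  // nothing left in b to cancel it
            } else if (remaining == 1) {
                counts.remove(element);
            } else {
                counts.put(element, remaining - 1);
            }
        }
        return result;
    }

    // Set-style filter: drop every element of a that occurs in b, like the contains-based loop.
    static <T> List<T> filterNotIn(List<T> a, Collection<T> b) {
        Set<T> lookup = new HashSet<>(b);
        List<T> kept = new ArrayList<>();
        for (T element : a) {
            if (!lookup.contains(element)) {
                kept.add(element);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "x", "y");
        List<String> b = Arrays.asList("x");
        System.out.println(subtract(a, b));      // prints [x, y]
        System.out.println(filterNotIn(a, b));   // prints [y]
    }
}
```

For the three loops in section 2.1, which only ask whether an id is present, the HashSet variant is probably the closer fit; the count-based version matters only when duplicate entries must be preserved one for one.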

3. Summary

Pay closer attention to the services you are responsible for; optimization never hurts.
