Sorting for Fun: Partial Sort
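Suppose we have an unordered sequence and want to know the m largest elements under some ordering, or the top few elements in sorted order. For example, I have 100 students and only want the ranked list of the top 20; I don't care about the rest. How do we get it? The first thing that comes to mind is sort: sort everything once and take the first 20. True enough, sorting is the conventional approach and the first idea anyone has. What I want to introduce here is an algorithm that is faster and more direct than a full sort: partial sort (partial_sort). It comes from the STL algorithm library. I ran into it while studying the STL source, it caught my eye immediately, so I'm sharing it here.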
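partial_sort takes an iterator middle that lies within the range [first, last) and rearranges [first, last) so that the middle - first smallest elements end up in [first, middle), sorted in the specified order. The remaining elements are placed in [middle, last) with no guaranteed order. So partial_sort does not guarantee that the whole result is sorted, and the sorted portion is never larger than the whole range. When we only need the first m of n elements in order, it is clearly cheaper than a full sort: roughly O(n log m) comparisons versus O(n log n). The smaller m is, the cheaper it gets; when m equals n it amounts to a full sort.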
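How partial_sort works: the prototype lives in the STL algorithm library, and from its code it is easy to see that partial_sort borrows the idea of heap sort for its underlying implementation. The principle can be described as follows. Suppose we have a sequence of n elements and need the m smallest, with m <= n. First mark off the range [first, middle), where middle = first + m, and use make_heap() to organize it into a max-heap. Then traverse the remaining range [middle, last), comparing each element with the top of the max-heap (the top is the largest element in the heap and sits at the first position, so it is cheap to read). If the traversed element is smaller than the heap top, swap it with the heap top and readjust the heap so that it stays a max-heap. When the traversal ends, [first, middle) holds the m smallest elements; one final heap sort over that range puts them in ascending order.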
The following demonstrates the algorithm in use:
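#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> vc;
    for (int i = 0; i < 10; i++) {
        vc.push_back(rand() % 100);  // rand() is unseeded, so every run yields the same sequence
    }
    // Print the original, unordered sequence.
    for (size_t i = 0; i < vc.size(); i++) cout << vc[i] << " ";
    cout << endl;
    // Move the 4 smallest elements, in ascending order, to the front.
    partial_sort(vc.begin(), vc.begin() + 4, vc.end());
    for (size_t i = 0; i < vc.size(); i++) cout << vc[i] << " ";
    cout << endl;
    return 0;
}

Run result: the first line printed is the original sequence; in the second line the first four values are the four smallest, in ascending order, and the other six appear in no particular order.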
STL source:
template <class RandomAccessIterator>
inline void partial_sort(RandomAccessIterator first,
                         RandomAccessIterator middle,
                         RandomAccessIterator last) {
  __partial_sort(first, middle, last, value_type(first));
}
template <class RandomAccessIterator, class T>
void __partial_sort(RandomAccessIterator first, RandomAccessIterator middle,
                    RandomAccessIterator last, T*) {
  make_heap(first, middle);  // build a max-heap over [first, middle)
  for (RandomAccessIterator i = middle; i < last; ++i)
    if (*i < *first)  // visit each element outside the heap; a smaller one replaces the heap top
      __pop_heap(first, middle, i, T(*i), distance_type(first));
  sort_heap(first, middle);  // sort the final heap
}
heap source:
template <class RandomAccessIterator>
inline void make_heap(RandomAccessIterator first, RandomAccessIterator last) {
  __make_heap(first, last, value_type(first), distance_type(first));
}
template <class RandomAccessIterator, class T, class Distance>
void __make_heap(RandomAccessIterator first, RandomAccessIterator last, T*,
                 Distance*) {
  if (last - first < 2) return;  // 0 or 1 elements: already a heap
  Distance len = last - first;
  Distance parent = (len - 2)/2;  // last node that has a child
  while (true) {
    __adjust_heap(first, parent, len, T(*(first + parent)));
    if (parent == 0) return;
    parent--;
  }
}
template <class RandomAccessIterator, class Distance, class T>
void __adjust_heap(RandomAccessIterator first, Distance holeIndex,
                   Distance len, T value) {
  Distance topIndex = holeIndex;
  Distance secondChild = 2 * holeIndex + 2;  // right child of the hole
  while (secondChild < len) {
    if (*(first + secondChild) < *(first + (secondChild - 1)))
      secondChild--;  // pick the larger of the two children
    *(first + holeIndex) = *(first + secondChild);  // move it up into the hole
    holeIndex = secondChild;
    secondChild = 2 * (secondChild + 1);
  }
  if (secondChild == len) {  // only a left child exists
    *(first + holeIndex) = *(first + (secondChild - 1));
    holeIndex = secondChild - 1;
  }
  __push_heap(first, holeIndex, topIndex, value);  // sift value back up from the hole
}
template <class RandomAccessIterator, class Distance, class T>
void __push_heap(RandomAccessIterator first, Distance holeIndex,
                 Distance topIndex, T value) {
  Distance parent = (holeIndex - 1) / 2;
  while (holeIndex > topIndex && *(first + parent) < value) {
    *(first + holeIndex) = *(first + parent);  // parent drops into the hole
    holeIndex = parent;
    parent = (holeIndex - 1) / 2;
  }
  *(first + holeIndex) = value;
}
template <class RandomAccessIterator>
inline void pop_heap(RandomAccessIterator first, RandomAccessIterator last) {
  __pop_heap_aux(first, last, value_type(first));
}
template <class RandomAccessIterator, class T>
inline void __pop_heap_aux(RandomAccessIterator first,
                           RandomAccessIterator last, T*) {
  __pop_heap(first, last-1, last-1, T(*(last-1)), distance_type(first));
}
template <class RandomAccessIterator, class T, class Distance>
inline void __pop_heap(RandomAccessIterator first, RandomAccessIterator last,
                       RandomAccessIterator result, T value, Distance*) {
  *result = *first;  // store the old heap top at result
  __adjust_heap(first, Distance(0), Distance(last - first), value);  // re-insert value starting from the root
}