Comparing Merge Sort and Quicksort

1 Merge Sort and Quicksort

1.1 Merge Sort

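The idea of merge sort is to split the array into two halves, sort each half, and then merge the two sorted halves. The main difficulty lies in the merge step: merging needs a temporary array to hold the merged result, which is then copied back into the original array.
Below is a Python implementation of merge sort.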

# coding:utf-8
__author__ = 'devin'

'''
data: array
low,high: index of array
'''
def merge_sort(data, low, high):
    if low < high:
        mid = (low + high) // 2  # integer division so mid stays a valid index
        merge_sort(data, low, mid)
        merge_sort(data, mid+1, high)
        merge(data, low, mid, high)


def merge(data, low, mid, high):
    temp = []
    i = low
    j = mid+1
    # Take the smaller head element of the two sorted halves each time.
    while i <= mid and j <= high:
        if data[i] <= data[j]:
            temp.append(data[i])
            i += 1
        else:
            temp.append(data[j])
            j += 1
    # Append whatever is left in the half that was not exhausted.
    if i > mid:
        while j <= high:
            temp.append(data[j])
            j += 1
    else:
        while i <= mid:
            temp.append(data[i])
            i += 1
    # Copy the merged result back into the original array.
    i = low
    j = 0
    while i <= high:
        data[i] = temp[j]
        i += 1
        j += 1


if __name__ == "__main__":
    data = [1, 3, 2, 6, 3, 7, 2, 12, 15, 11, 10, 131, 1]
    merge_sort(data, 0, len(data) - 1)
    print(data)

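
Since the merge step is where most of the work happens, it can help to look at it in isolation. The following is only a minimal sketch and not part of the program above: `merge_sorted` is a hypothetical helper name, and it returns a new list instead of copying the result back into `data`.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into a new sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= keeps equal elements in their original order (stable merge).
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of the two slices below is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


print(merge_sorted([1, 3, 6], [2, 3, 7]))  # prints [1, 2, 3, 3, 6, 7]
```

The temporary list `result` plays the same role as `temp` in the in-place version: its size equals the number of elements being merged.
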
1.1 Linked-List-Based Merge Sort

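Linked-list-based merge sort differs from the usual merge sort mainly in that it uses a linked list to store the indices of the original array's elements. It allocates extra space for the links and sorts the indices, leaving the order of the elements in the original array unchanged. The linked list is built from an array, and after sorting the algorithm returns the index of the list head, which is the index in the original array of the first element in sorted order. Compared with the ordinary merge sort this is somewhat more efficient, because it avoids copying the array back and forth, which can improve performance on large inputs.
Below is an example of the algorithm, implemented in Python.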

# coding: utf-8
__author__ = 'devin'
import random

class MergeSortLink(object):
    def __init__(self, data, link):
        self.data = data
        self.link = link

    def insert_sort(self, low, high):
        if low == high:
            return low
        head = low
        i = low + 1
        while i <= high:
            temp = self.data[i]
            p = head
            pre = p
            while p != -1:
                if temp >= self.data[p]:
                    pre = p
                    p = self.link[p]
                else:
                    break
            if p == -1:  # insert at the tail
                self.link[pre] = i
                self.link[i] = -1
            elif p == head:  # insert at the head
                self.link[i] = p
                head = i
            else:  # insert in the middle
                self.link[pre] = i
                self.link[i] = p
            i += 1
        return head

    def merge_sort_link(self, low, high):
        if high-low + 1 < 16:
            return self.insert_sort(low, high)
        # if low == high:
        #     return low
        else:
            mid = (low+high) // 2  # integer division so mid stays a valid index
            q = self.merge_sort_link(low, mid)
            r = self.merge_sort_link(mid+1, high)
            return self.merge(q, r)

    def merge(self, q, r):
        i = q
        j = r
        p = None
        k = 0
        while True:
            if self.data[i] <= self.data[j]:
                if p is None:
                    p = i
                else:
                    self.link[k] = i
                k = i
                i = self.link[i]
            else:
                if p is None:
                    p = j
                else:
                    self.link[k] = j
                k = j
                j = self.link[j]
            if i == -1 or j == -1:
                break

        if i == -1:
            self.link[k] = j
        else:
            self.link[k] = i
        return p

    def print_link(self, p):
        sorted_data = []
        while p != -1:
            sorted_data.append(self.data[p])
            p = self.link[p]
        print(sorted_data)

if __name__ == "__main__":
    test_data = [random.randint(1, 100) for i in range(50)]
    print(test_data)
    link = [-1 for i in range(len(test_data))]  # -1 marks the end of the list; 0 cannot be used because it is a valid index
    sort_link = MergeSortLink(test_data, link)
    p = sort_link.merge_sort_link(0, len(test_data)-1)
    sort_link.print_link(p)
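
To make the role of the link array concrete, here is a small hand-built example. It is a sketch for illustration only: `walk_links` is a hypothetical helper, and the `link` values are written by hand rather than produced by the sorter. It shows that the sorted order lives entirely in `link` and `head`, while `data` keeps its original order.

```python
def walk_links(data, link, head):
    """Follow the index chain from head and collect the values in visit order."""
    ordered = []
    p = head
    while p != -1:          # -1 marks the end of the chain
        ordered.append(data[p])
        p = link[p]
    return ordered


data = [30, 10, 20]         # the array itself is never reordered
link = [-1, 2, 0]           # link[i] is the index of the element that follows data[i]
head = 1                    # index of the smallest element (10)

print(walk_links(data, link, head))  # prints [10, 20, 30]
print(data)                          # prints [30, 10, 20]
```
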
1.2 Quicksort

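The idea of quicksort is simple: pick one element of the array as the pivot, partition the array into the elements no larger than the pivot and the elements larger than it, and then recursively quicksort each part. The core of the algorithm is therefore the choice of pivot; the code below picks it at random.
The example code below is implemented in Python.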

# coding: utf-8
__author__ = 'devin'
import random

def partition(data, low, high):
    index = random.randint(low, high)
    temp = data[index]
    data[index] = data[low]
    data[low] = temp
    i = low
    j = high
    v = data[low]
    while True:
        while i <= high and data[i] <= v:
            i += 1
        while j >= low and data[j] > v:
            j -= 1
        if i < j:
            temp = data[i]
            data[i] = data[j]
            data[j] = temp
        else:
            break
    data[low] = data[j]
    data[j] = v
    return j


def quick_sort(data, low, high):
    if low < high:
        p = partition(data, low, high)
        quick_sort(data, low, p-1)
        quick_sort(data, p+1, high)

if __name__ == "__main__":
    test_data = [random.randint(1, 100) for i in range(40)]
    print(test_data)
    quick_sort(test_data, 0, len(test_data)-1)
    print(test_data)
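
To see the divide-and-recurse idea without the index bookkeeping, the sketch below builds new lists instead of partitioning in place. It is not the in-place algorithm shown above: `quick_sort_simple` is a hypothetical name, and this version trades extra memory for readability.

```python
import random


def quick_sort_simple(items):
    """Return a sorted copy of items, using a randomly chosen pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]    # the pivot and its duplicates
    larger = [x for x in items if x > pivot]
    # Only the strictly smaller and strictly larger parts need more sorting.
    return quick_sort_simple(smaller) + equal + quick_sort_simple(larger)


print(quick_sort_simple([5, 2, 9, 2, 7]))  # prints [2, 2, 5, 7, 9]
```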

2 Algorithm Testing

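Ten arrays of lengths 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 are sorted to measure how the running time of the algorithms from Section 1 grows with input size.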

2.1 Writing the Test Program

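To make testing easier, we write a test program whose input is a data scale factor. For example, with an input of 100, the i-th of the 10 test arrays has length i*100, that is 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000. The program then uses Python's plotting library matplotlib to draw a time-versus-scale line chart for the three implementations.
The test code is as follows: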

# coding: utf-8
__author__ = 'devin'
import time
import random
from MergeSort import merge_sort
from MergeSortL import MergeSortLink
from QuickSort import quick_sort
import matplotlib.pyplot as plt


if __name__ == '__main__':
    factor = input()
    data_scale = [i * int(factor) for i in range(1, 11)]
    merge_sort_time = []
    merge_sort_l_time = []
    quick_sort_time = []
    print(data_scale)
    for i in range(10):
        scale = data_scale[i]
        test_data = [random.randint(1, scale*2) for _ in range(scale)]

        # Each sort works on the list it is given, so every algorithm gets
        # its own copy of the unsorted input.
        data_1 = list(test_data)
        start = time.time()
        merge_sort(data_1, 0, len(data_1) - 1)
        end = time.time()
        merge_sort_time.append(end-start)

        data_2 = list(test_data)
        link = [-1 for _ in range(len(data_2))]  # -1 marks the end of the list; 0 cannot be used because it is a valid index
        start = time.time()
        sort_link = MergeSortLink(data_2, link)
        p = sort_link.merge_sort_link(0, len(data_2)-1)
        end = time.time()
        # sort_link.print_link(p)
        merge_sort_l_time.append(end-start)

        data_3 = list(test_data)
        start = time.time()
        quick_sort(data_3, 0, len(data_3) - 1)
        end = time.time()
        # print(data_3)
        quick_sort_time.append(end-start)

    print("Scale: ", data_scale)
    print("Merge: ", merge_sort_time)
    print("MergeL: ", merge_sort_l_time)
    print("QuickSort: ", quick_sort_time)

    merge_sort_plot = plt.plot(data_scale, merge_sort_time, 'b', label='MergeSort')
    merge_sort_l_plot = plt.plot(data_scale, merge_sort_l_time, 'r', label='MergeSortLink')
    quick_sort_plot = plt.plot(data_scale, quick_sort_time, 'g', label='QuickSort')

    # Linear reference line, scaled to pass through the first quicksort measurement.
    linear_plot = plt.plot(data_scale,
                           [quick_sort_time[0]*data_scale[i]/data_scale[0] for i in range(10)],
                           'y', label='O(n)')
    max_time = max(merge_sort_time[9], merge_sort_l_time[9], quick_sort_time[9])
    plt.xlabel("Scale")
    plt.ylabel("time")
    plt.ylim(0, max_time * 1.2)
    plt.title('Algorithm Time & Data Scale')
    plt.legend()
    plt.show()
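
Because all three sort routines modify the data they are given, each timed run needs its own copy of the input, otherwise later runs would see data that is already sorted. A small helper can make that explicit. This is only a sketch under a few assumptions: `time_sort` and `builtin_sort` are hypothetical names, `builtin_sort` is a stand-in used so the example runs on its own, and `time.perf_counter` requires Python 3.

```python
import random
import time


def time_sort(sort_fn, data):
    """Run sort_fn on a private copy of data; return (seconds, sorted_copy)."""
    work = list(data)                 # copy so the caller's list stays unsorted
    start = time.perf_counter()
    sort_fn(work, 0, len(work) - 1)   # same (data, low, high) signature as above
    elapsed = time.perf_counter() - start
    return elapsed, work


def builtin_sort(data, low, high):
    """Stand-in with the same signature, only to keep this example self-contained."""
    data[low:high + 1] = sorted(data[low:high + 1])


sample = [random.randint(1, 200) for _ in range(100)]
original = list(sample)
seconds, result = time_sort(builtin_sort, sample)
print(sample == original)          # prints True
print(result == sorted(original))  # prints True
```

With a helper of this shape, `merge_sort` and `quick_sort` could be passed in place of `builtin_sort`; the linked-list version would need a thin wrapper because it is a class.
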
2.2 Analysis of Results

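Because the running times are very short when the scale factor is 100, a factor of 10000 was chosen for the test. The results are shown in the figure below.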


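In the line chart, the solid yellow line is an O(n) reference, blue is the ordinary merge sort, red is the linked-list-based merge sort, and green is quicksort. The running time of all three sorting algorithms clearly grows faster than O(n) but slower than O(n^2). At small data scales the three take about the same time. At larger scales, quicksort and the linked-list-based merge sort are clearly faster than the ordinary merge sort, and quicksort performs best, so quicksort is the fastest of the three on large inputs. All three have a time complexity of O(n log n); for quicksort this is the expected running time with a random pivot, and its worst case is O(n^2).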
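
To compare the measured curves with more than the O(n) line, reference curves for other growth rates can be built the same way, by scaling each one to pass through the first data point. The sketch below is only an illustration: `reference_curve` is a hypothetical helper, and `first_time` is a made-up value in arbitrary units, not a measurement from the test above.

```python
import math


def reference_curve(scales, first_time, growth):
    """Scale growth(n) so the curve passes through (scales[0], first_time)."""
    base = growth(scales[0])
    return [first_time * growth(n) / base for n in scales]


scales = [10000 * i for i in range(1, 11)]
first_time = 1.0  # hypothetical value in arbitrary units, for illustration only

linear = reference_curve(scales, first_time, lambda n: n)
n_log_n = reference_curve(scales, first_time, lambda n: n * math.log(n))
quadratic = reference_curve(scales, first_time, lambda n: n * n)

print(linear[-1])     # prints 10.0
print(n_log_n[-1])    # close to 12.5, since log(10**5) / log(10**4) = 5/4
print(quadratic[-1])  # prints 100.0
```

Over a tenfold increase in n, the O(n log n) reference ends only slightly above the linear one and far below the quadratic one, which is consistent with measured curves that sit just above the yellow line.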