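Comparing Merge Sort and Quicksort
1 Merge Sort and Quicksort
1.1 Merge Sort
The idea of merge sort is to split the array into two halves, sort each half, and then merge the two sorted halves. The main difficulty lies in the merge step: merging needs a new temporary array to hold the merged result, which is then copied back into the original array.
Below is a Python implementation of merge sort.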
# coding: utf-8
__author__ = 'devin'


def merge_sort(data, low, high):
    """Sort data[low..high] in place.

    data: array
    low, high: inclusive indices into the array
    """
    if low < high:
        mid = (low + high) // 2  # integer division keeps mid a valid index
        merge_sort(data, low, mid)
        merge_sort(data, mid + 1, high)
        merge(data, low, mid, high)


def merge(data, low, mid, high):
    """Merge the sorted runs data[low..mid] and data[mid+1..high]."""
    temp = []  # temporary array that holds the merged result
    i = low
    j = mid + 1
    while i <= mid and j <= high:
        if data[i] <= data[j]:
            temp.append(data[i])
            i += 1
        else:
            temp.append(data[j])
            j += 1
    # Append whatever is left of the run that was not exhausted
    if i > mid:
        while j <= high:
            temp.append(data[j])
            j += 1
    else:
        while i <= mid:
            temp.append(data[i])
            i += 1
    # Copy the merged result back into the original array
    i = low
    j = 0
    while i <= high:
        data[i] = temp[j]
        i += 1
        j += 1


if __name__ == "__main__":
    data = [1, 3, 2, 6, 3, 7, 2, 12, 15, 11, 10, 131, 1]
    merge_sort(data, 0, len(data) - 1)
    print(data)
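1.1 Linked-List-Based Merge Sort
Linked-list-based merge sort differs from the usual merge sort mainly in that it uses a linked list to store the indices of the original array's elements. Extra space is allocated for the links, and sorting rearranges only those links, so the order of the elements in the original array never changes. The algorithm builds the linked list out of an array; after sorting it returns the index of the list head, which is the index in the original array of the first element in sorted order. Compared with ordinary merge sort, this avoids copying array elements back and forth, which can improve performance somewhat on large inputs.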
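To make the index-based linked list concrete, here is a minimal sketch of what a finished `link` array looks like for a three-element input. The values and the helper name `walk` are illustrative assumptions, not part of the original program.

```python
# The data array is never reordered.
data = [30, 10, 20]

# link[i] is the index of the element that follows data[i] in sorted order;
# -1 marks the end of the list.
link = [-1, 2, 0]

# Index of the smallest element, i.e. the head of the list.
head = 1


def walk(data, link, head):
    """Follow the links from head and collect the values in sorted order."""
    result = []
    p = head
    while p != -1:
        result.append(data[p])
        p = link[p]
    return result


print(walk(data, link, head))  # [10, 20, 30]
print(data)                    # [30, 10, 20], unchanged
```

Reading the sorted sequence costs one pass over the links, while `data` itself stays in its original order.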
Below is an example implementation of the algorithm in Python.
# coding: utf-8
__author__ = 'devin'
import random


class MergeSortLink(object):
    def __init__(self, data, link):
        self.data = data  # elements to sort; never reordered
        self.link = link  # link[i]: index of the element after data[i], -1 at the end

    def insert_sort(self, low, high):
        """Link data[low..high] into a sorted list and return its head index."""
        if low == high:
            return low
        head = low
        i = low + 1
        while i <= high:
            temp = self.data[i]
            p = head
            pre = p
            # Walk the list until the first element greater than temp
            while p != -1:
                if temp >= self.data[p]:
                    pre = p
                    p = self.link[p]
                else:
                    break
            if p == -1:  # insert at the tail
                self.link[pre] = i
                self.link[i] = -1
            elif p == head:  # insert at the head
                self.link[i] = p
                head = i
            else:  # insert in the middle
                self.link[pre] = i
                self.link[i] = p
            i += 1
        return head

    def merge_sort_link(self, low, high):
        # Ranges shorter than 16 elements are handled by insertion sort
        if high - low + 1 < 16:
            return self.insert_sort(low, high)
        # if low == high:
        #     return low
        else:
            mid = (low + high) // 2
            q = self.merge_sort_link(low, mid)
            r = self.merge_sort_link(mid + 1, high)
            return self.merge(q, r)

    def merge(self, q, r):
        """Merge the sorted lists headed by q and r; return the new head index."""
        i = q
        j = r
        p = None  # head of the merged list
        k = 0     # index of the last node appended to the merged list
        while True:
            if self.data[i] <= self.data[j]:
                if p is None:
                    p = i
                else:
                    self.link[k] = i
                k = i
                i = self.link[i]
            else:
                if p is None:
                    p = j
                else:
                    self.link[k] = j
                k = j
                j = self.link[j]
            if i == -1 or j == -1:
                break
        # Attach the remainder of the list that was not exhausted
        if i == -1:
            self.link[k] = j
        else:
            self.link[k] = i
        return p

    def print_link(self, p):
        sorted_data = []
        while p != -1:
            sorted_data.append(self.data[p])
            p = self.link[p]
        print(sorted_data)


if __name__ == "__main__":
    test_data = [random.randint(1, 100) for i in range(50)]
    print(test_data)
    # -1 marks the end of the list; 0 cannot be used because it is a valid index
    link = [-1 for i in range(len(test_data))]
    sort_link = MergeSortLink(test_data, link)
    p = sort_link.merge_sort_link(0, len(test_data) - 1)
    sort_link.print_link(p)
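1.2 Quicksort
The idea of quicksort is simple: pick one value from the array (the pivot), partition the array into the elements no greater than the pivot and the elements greater than it, and then recursively quicksort the two parts. The core of the algorithm is therefore the choice of the pivot.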
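Why the pivot choice matters can be seen by counting comparisons. The sketch below is an illustration only, not the in-place implementation used in this article: it partitions with list copies, and the names `count_comparisons`, `first_pivot` and `random_pivot` are hypothetical helpers. It charges one comparison per non-pivot element in each partition step.

```python
import random


def count_comparisons(values, choose_pivot):
    """Count element-vs-pivot comparisons made by a simple quicksort.

    choose_pivot(n) returns the position of the pivot in a part of length n.
    An explicit stack is used instead of recursion so that badly unbalanced
    partitions cannot hit Python's recursion limit.
    """
    total = 0
    stack = [list(values)]
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        pivot_index = choose_pivot(len(part))
        pivot = part[pivot_index]
        rest = part[:pivot_index] + part[pivot_index + 1:]
        total += len(rest)  # each remaining element is compared with the pivot
        stack.append([x for x in rest if x <= pivot])
        stack.append([x for x in rest if x > pivot])
    return total


def first_pivot(n):
    return 0


def random_pivot(n):
    return random.randrange(n)


data = list(range(200))  # already sorted, distinct values
worst = count_comparisons(data, first_pivot)
typical = count_comparisons(data, random_pivot)
print(worst)    # 19900, i.e. 199 + 198 + ... + 1
print(typical)  # varies from run to run
```

On sorted input, always taking the first element peels off one element per step and gives n(n-1)/2 comparisons. A random pivot tends to split the array more evenly, which is the motivation for the random choice in `partition`.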
The example code below is implemented in Python.
# coding: utf-8
__author__ = 'devin'
import random


def partition(data, low, high):
    """Partition data[low..high] around a random pivot; return the pivot's final index."""
    # Pick a random pivot and move it to the front
    index = random.randint(low, high)
    temp = data[index]
    data[index] = data[low]
    data[low] = temp
    i = low
    j = high
    v = data[low]  # pivot value
    while True:
        # Advance i past elements <= pivot, move j back past elements > pivot
        while i <= high and data[i] <= v:
            i += 1
        while j >= low and data[j] > v:
            j -= 1
        if i < j:
            temp = data[i]
            data[i] = data[j]
            data[j] = temp
        else:
            break
    # Put the pivot into its final position
    data[low] = data[j]
    data[j] = v
    return j


def quick_sort(data, low, high):
    if low < high:
        p = partition(data, low, high)
        quick_sort(data, low, p - 1)
        quick_sort(data, p + 1, high)


if __name__ == "__main__":
    test_data = [random.randint(1, 100) for i in range(40)]
    print(test_data)
    quick_sort(test_data, 0, len(test_data) - 1)
    print(test_data)
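2 Algorithm Testing
Ten arrays of lengths 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1000 are sorted to measure how the running time of the three implementations from Section 1 grows with input size.
2.1 Writing the Test Program
To make testing convenient, a test program is written whose input is a data scale factor. For example, with an input of 100, the i-th of the 10 test arrays has length i*100, that is 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1000. The Python plotting library matplotlib is then used to draw a line chart of running time against data scale for the three algorithms.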
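The measurement loop described above can be sketched in isolation as follows. This is a minimal sketch under stated assumptions: `time_sort` is a hypothetical helper, the built-in `list.sort` stands in for the algorithms under test, and `time.perf_counter` is used as the timer. Each run works on its own copy of the input, so every algorithm would see the same unsorted data.

```python
import random
import time


def time_sort(sort_fn, data):
    """Run sort_fn on a copy of data; return (elapsed seconds, sorted copy)."""
    work = list(data)  # copy, so the caller's list stays unsorted
    start = time.perf_counter()
    sort_fn(work)
    elapsed = time.perf_counter() - start
    return elapsed, work


factor = 100  # data scale factor, as entered by the user
scales = [i * factor for i in range(1, 11)]

timings = []
for scale in scales:
    sample = [random.randint(1, scale * 2) for _ in range(scale)]
    # Stand-in sort; an in-place sort function would be plugged in here
    elapsed, result = time_sort(lambda xs: xs.sort(), sample)
    timings.append(elapsed)

print(scales)
print(timings)
```

The pairs of `scales` and `timings` are what end up on the x and y axes of the time-scale chart.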
The test code is as follows:
# coding: utf-8
__author__ = 'devin'
import time
import random
from MergeSort import merge_sort
from MergeSortL import MergeSortLink
from QuickSort import quick_sort
import matplotlib.pyplot as plt

if __name__ == '__main__':
    factor = input()  # data scale factor, e.g. 100 (raw_input() under Python 2)
    data_scale = [i * int(factor) for i in range(1, 11)]
    merge_sort_time = []
    merge_sort_l_time = []
    quick_sort_time = []
    print(data_scale)
    for i in range(10):
        scale = data_scale[i]
        test_data = [random.randint(1, scale * 2) for _ in range(scale)]

        # Each algorithm sorts its own copy, so all three see the same unsorted input
        data_1 = list(test_data)
        start = time.time()
        merge_sort(data_1, 0, len(data_1) - 1)
        end = time.time()
        merge_sort_time.append(end - start)

        data_2 = list(test_data)
        # -1 marks the end of the list; 0 cannot be used because it is a valid index
        link = [-1 for _ in range(len(data_2))]
        start = time.time()
        sort_link = MergeSortLink(data_2, link)
        p = sort_link.merge_sort_link(0, len(data_2) - 1)
        end = time.time()
        # sort_link.print_link(p)
        merge_sort_l_time.append(end - start)

        data_3 = list(test_data)
        start = time.time()
        quick_sort(data_3, 0, len(data_3) - 1)
        end = time.time()
        # print(data_3)
        quick_sort_time.append(end - start)

    print("Scale: ", data_scale)
    print("Merge: ", merge_sort_time)
    print("MergeL: ", merge_sort_l_time)
    print("QuickSort: ", quick_sort_time)

    plt.plot(data_scale, merge_sort_time, 'b', label='MergeSort')
    plt.plot(data_scale, merge_sort_l_time, 'r', label='MergeSortLink')
    plt.plot(data_scale, quick_sort_time, 'g', label='QuickSort')
    # Reference line: the first quicksort timing scaled linearly with the data size
    plt.plot(data_scale,
             [quick_sort_time[0] * data_scale[i] / data_scale[0] for i in range(10)],
             'y', label='O(n)')
    max_time = max(merge_sort_time[9], merge_sort_l_time[9], quick_sort_time[9])
    plt.xlabel("Scale")
    plt.ylabel("time")
    plt.ylim(0, max_time * 1.2)
    plt.title('Algorithm Time & Data Scale')
    plt.legend()
    plt.show()
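2.2 Analysis of Results
Because the running time is very short when the scale factor is 100, a factor of 10000 was chosen for the test. The results are shown in the figure below.
In the line chart, the solid yellow line represents O(n) time complexity, blue is ordinary merge sort, red is linked-list-based merge sort, and green is quicksort. It is clear that the time complexity of all three sorting algorithms is noticeably greater than O(n) and less than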