Implementing Huffman Coding in Python

阿新 • Published: 2018-12-14

A Python Implementation of Compression Software Based on Huffman Coding

Copyright notice: This is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/xanxus46/article/details/41359841

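Huffman coding is a text compression algorithm built on a greedy strategy. It first counts how many times each character occurs in the file and stores the counts in an array. The characters are then sorted by count in ascending order, and the two elements with the smallest counts are linked into a subtree whose count is the sum of the two nodes' counts. The two elements are removed from the array, the subtree is put into the array, the array is sorted again, and these steps repeat until a single tree remains.

To make decompression possible, the compressor writes the following to the output file, in order: the length of the serialized Huffman code map, the serialized map itself, the length of the last group after the encoded string has been split into groups (the encoded string's length modulo the group length), and finally the encoded string. This implementation uses one byte (8 bits) as the group length and splits the encoded binary string into groups of that size. The complete program is listed below.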
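As a small worked example of the merging procedure described above, and separate from the author's program, the following sketch builds a code table for a short string. The names `Node`, `build_codes`, and `walk` are illustrative choices, and the fallback code `"0"` for an input with a single distinct symbol is an assumption added here, not something the description specifies.

```python
from collections import Counter


class Node:
    """Tree node: leaves carry a symbol, internal nodes only a frequency."""

    def __init__(self, frequency, value=None, left=None, right=None):
        self.frequency = frequency
        self.value = value
        self.left = left
        self.right = right


def build_codes(text):
    """Return {symbol: bit string} using the sort-and-merge procedure."""
    nodes = [Node(freq, value=sym) for sym, freq in Counter(text).items()]
    if not nodes:
        return {}
    nodes.sort(key=lambda n: n.frequency)
    while len(nodes) > 1:
        # Link the two least frequent nodes into a subtree whose
        # frequency is the sum of its children's frequencies.
        first, second = nodes[0], nodes[1]
        merged = Node(first.frequency + second.frequency,
                      left=first, right=second)
        nodes[:2] = [merged]
        nodes.sort(key=lambda n: n.frequency)

    codes = {}

    def walk(node, prefix):
        # A leaf has no children; its code is the path taken from the root.
        if node.left is None and node.right is None:
            codes[node.value] = prefix or "0"  # assumption: lone symbol gets "0"
            return
        walk(node.left, prefix + "0")
        walk(node.right, prefix + "1")

    walk(nodes[0], "")
    return codes


codes = build_codes("aaaabbc")
encoded = "".join(codes[ch] for ch in "aaaabbc")
print(codes)          # the most frequent symbol gets the shortest code
print(len(encoded))   # 10 bits instead of 7 * 8 = 56
```

For "aaaabbc" the frequencies are a: 4, b: 2, c: 1, so `c` and `b` are merged first and `a` ends up one step below the root with a one-bit code.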

__author__ = 'linfuyuan'

import pickle
import struct

mode = int(input('please input the type number(0 for compress, 1 for decompress):'))
path = input('please input the filepath:')


class Node:
    def __init__(self):
        self.value = None
        self.left = None
        self.right = None
        self.frequency = 0
        self.code = ''


# let the unique value be the key in the map
def change_value_to_key(huffmap):
    return {value: key for (key, value) in huffmap.items()}


if mode == 0:
    # count the frequency of each byte
    lettermap = {}

    def give_code(node):
        # a left edge appends '0' to the parent's code, a right edge appends '1'
        if node.left:
            node.left.code = '%s%s' % (node.code, '0')
            give_code(node.left)
        if node.right:
            node.right.code = '%s%s' % (node.code, '1')
            give_code(node.right)

    def print_code(node):
        # debugging helper: print the code of every leaf
        if not node.left and not node.right:
            print("%s %s" % (node.value, node.code))
        if node.left:
            print_code(node.left)
        if node.right:
            print_code(node.right)

    def save_code(code_map, node):
        if not node.left and not node.right:
            code_map[node.value] = node.code
        if node.left:
            save_code(code_map, node.left)
        if node.right:
            save_code(code_map, node.right)

    # read raw bytes; each byte value (0-255) is one symbol
    with open(path, 'rb') as f:
        origindata = f.read()
    for j in origindata:
        lettermap[j] = lettermap.get(j, 0) + 1
    if not lettermap:
        raise SystemExit('the input file is empty, nothing to compress')

    nodelist = []
    for (key, value) in lettermap.items():
        node = Node()
        node.value = key
        node.frequency = value
        nodelist.append(node)
    nodelist.sort(key=lambda n: n.frequency)

    # repeatedly merge the two least frequent nodes until one tree remains
    for i in range(len(nodelist) - 1):
        node1 = nodelist[0]
        node2 = nodelist[1]
        node = Node()
        node.left = node1
        node.right = node2
        node.frequency = node1.frequency + node2.frequency
        nodelist[0] = node
        nodelist.pop(1)
        nodelist.sort(key=lambda n: n.frequency)

    # give the code
    root = nodelist[0]
    if not root.left and not root.right:
        # only one distinct byte in the file: it still needs a one-bit code
        root.code = '0'
    give_code(root)
    huffman_map = {}
    # save the node code to a map
    save_code(huffman_map, root)
    code_data = ''.join(huffman_map[letter] for letter in origindata)

    with open('%s_compress' % path, 'wb') as f:
        huffman_map_bytes = pickle.dumps(huffman_map)
        f.write(struct.pack('I', len(huffman_map_bytes)))
        f.write(huffman_map_bytes)
        # number of bits in the last group; 0 means the last group is a full byte
        f.write(struct.pack('B', len(code_data) % 8))
        for i in range(0, len(code_data), 8):
            # the last slice may be shorter than 8 bits
            f.write(struct.pack('B', int(code_data[i:i + 8], 2)))
    print('finished compressing')

if mode == 1:
    with open(path, 'rb') as f:
        size = struct.unpack('I', f.read(4))[0]
        huffman_map = pickle.loads(f.read(size))
        left = struct.unpack('B', f.read(1))[0]
        payload = f.read()

    # every byte expands to 8 bits, except the last one, which holds `left` bits
    datalist = [format(b, '08b') for b in payload]
    if datalist:
        datalist[-1] = format(payload[-1], '0%db' % (left or 8))
    encode_data = ''.join(datalist)

    current_code = ''
    huffman_map = change_value_to_key(huffman_map)
    with open('%s_origin' % path, 'wb') as f:
        for letter in encode_data:
            current_code += letter
            # no code is a prefix of another, so the first match is the symbol
            if current_code in huffman_map:
                f.write(bytes([huffman_map[current_code]]))
                current_code = ''
    print('finished decompressing')

input('please press Enter to quit')

The code uses the pickle module to serialize objects and the struct module to read and write binary files.
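To make the roles of the two modules concrete, here is a minimal sketch of the container layout described at the top of the article, written against an in-memory buffer rather than a file. The function names `pack` and `unpack` are hypothetical, and treating a stored last-group length of 0 as "the last byte is full" is an assumption of this sketch. The `'I'` format is 4 bytes on common platforms, which the reader relies on.

```python
import io
import pickle
import struct


def pack(code_map, bits):
    """Serialize a code map and a string of '0'/'1' characters into bytes."""
    out = io.BytesIO()
    map_bytes = pickle.dumps(code_map)
    out.write(struct.pack("I", len(map_bytes)))  # length of the serialized map
    out.write(map_bytes)                         # the serialized map itself
    out.write(struct.pack("B", len(bits) % 8))   # bits in the last group
    for i in range(0, len(bits), 8):
        # One byte per group; the last slice may be shorter than 8 bits.
        out.write(struct.pack("B", int(bits[i:i + 8], 2)))
    return out.getvalue()


def unpack(blob):
    """Inverse of pack(): return (code_map, bit string)."""
    src = io.BytesIO(blob)
    (size,) = struct.unpack("I", src.read(4))
    code_map = pickle.loads(src.read(size))
    (last_len,) = struct.unpack("B", src.read(1))
    payload = src.read()
    # Restore leading zeros: 8 bits per byte, last_len bits for the last one.
    groups = [format(b, "08b") for b in payload]
    if groups:
        groups[-1] = format(payload[-1], "0%db" % (last_len or 8))
    return code_map, "".join(groups)


code_map = {"a": "1", "b": "01", "c": "00"}
bits = "1111010100"  # "aaaabbc" encoded with the map above
restored_map, restored_bits = unpack(pack(code_map, bits))
print(restored_map == code_map, restored_bits == bits)
```

Storing the last group's length matters because `int(..., 2)` drops leading zeros: without it, a final group such as "00" could not be told apart from "0" or "000". Note also that `pickle.loads` should only be used on files from a trusted source.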

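Because the most expensive part of the algorithm is the sort nested inside the loop, the time complexity is O(n^2 log n), where n is the number of distinct characters.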
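One common way to avoid re-sorting on every merge is to keep the nodes in a binary heap, so each of the n - 1 merges costs O(log n) and tree construction becomes O(n log n). The sketch below swaps in Python's `heapq` module for the sorted list; it is not the author's method. The name `build_codes_with_heap` and the tuple representation of nodes (a 1-tuple for a leaf, a 2-tuple for an internal node) are illustrative assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes_with_heap(data):
    """Return {symbol: bit string}, building the tree with a binary heap."""
    tiebreak = count()  # unique sequence numbers so nodes are never compared
    heap = [(freq, next(tiebreak), (sym,))
            for sym, freq in Counter(data).items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pop the two least frequent nodes and push their merged subtree.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if len(node) == 1:
            # Leaf: a 1-tuple holding the symbol.
            codes[node[0]] = prefix or "0"  # assumption: lone symbol gets "0"
        else:
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
    return codes


heap_codes = build_codes_with_heap("aaaabbc")
print(sorted(heap_codes.items()))
```

When several nodes share a frequency, the heap and the sorted list may break ties differently, so the two approaches can produce different but equally short code tables.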

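After compression, the two files were 110 KB and 931 KB, down from 190 KB and 2.1 MB respectively (about 58% and 44% of their original sizes), so the compression is clearly effective.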

I hope this is useful to everyone.
