Path of Mathematics - Python Computation in Practice (4) - Lempel-Ziv Compression (2)
Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The 'Standard size' column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='.
All content on this blog is original; if you repost it, please cite the source.
http://blog.csdn.net/myhaspl/
Format | C Type | Python type | Standard size | Notes |
---|---|---|---|---|
x | pad byte | no value | | |
c | char | string of length 1 | 1 | |
b | signed char | integer | 1 | (3) |
B | unsigned char | integer | 1 | (3) |
? | _Bool | bool | 1 | (1) |
h | short | integer | 2 | (3) |
H | unsigned short | integer | 2 | (3) |
i | int | integer | 4 | (3) |
I | unsigned int | integer | 4 | (3) |
l | long | integer | 4 | (3) |
L | unsigned long | integer | 4 | (3) |
q | long long | integer | 8 | (2), (3) |
Q | unsigned long long | integer | 8 | (2), (3) |
f | float | float | 4 | (4) |
d | double | float | 8 | (4) |
s | char[] | string | | |
p | char[] | string | | |
P | void * | integer | | (5), (3) |
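To check the standard sizes in the table, here is a minimal Python 3 sketch (the helper name standard_sizes is hypothetical) that asks struct.calcsize for each format character under every standard-size prefix. In standard mode no alignment padding is inserted, so "<bi" takes 1 + 4 = 5 bytes; native mode ("@", the default) may add padding and may use other sizes, for example 8 bytes for "l" on 64-bit Linux. P is left out because it is only available with native byte ordering.

```python
import struct

# Standard sizes (in bytes) copied from the table above.
EXPECTED_STANDARD_SIZES = {
    "x": 1, "c": 1, "b": 1, "B": 1, "?": 1,
    "h": 2, "H": 2, "i": 4, "I": 4, "l": 4, "L": 4,
    "q": 8, "Q": 8, "f": 4, "d": 8,
}


def standard_sizes(prefix="<"):
    """Map each format character to struct.calcsize(prefix + char).

    prefix should be one of '<', '>', '!' or '=', which select standard sizes.
    """
    return {code: struct.calcsize(prefix + code) for code in EXPECTED_STANDARD_SIZES}


for prefix in "<>!=":
    # Every standard-size prefix should reproduce the sizes in the table.
    assert standard_sizes(prefix) == EXPECTED_STANDARD_SIZES

# No padding in standard mode: 1 byte (b) + 4 bytes (i).
print(struct.calcsize("<bi"))  # 5
```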
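struct.pack(fmt, v1, v2, ...): Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.
struct.unpack(fmt, string): Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).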
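pack and unpack are what a compressor like this one needs to turn codeword indexes into bytes. The post does not show its on-disk format (that part of the code is elided), so the layout below is purely an assumption for illustration: a 4-byte little-endian pair count as the header, then one record per pair made of a 4-byte codeword index and one byte. The function names encode_pairs and decode_pairs are hypothetical. The sketch is Python 3, where pack returns bytes rather than the str described in the Python 2 text above.

```python
import struct

# Hypothetical layout (an assumption, not the blog's elided format):
#   header: number of pairs, unsigned 32-bit little-endian ("<I")
#   record: codeword index ("<I") followed by one raw byte value ("B")
HEADER_FMT = "<I"
RECORD_FMT = "<IB"


def encode_pairs(pairs):
    """Pack (index, byte_value) pairs into one bytes object."""
    chunks = [struct.pack(HEADER_FMT, len(pairs))]
    for index, byte_value in pairs:
        chunks.append(struct.pack(RECORD_FMT, index, byte_value))
    return b"".join(chunks)


def decode_pairs(data):
    """Inverse of encode_pairs; each unpack call gets exactly calcsize(fmt) bytes."""
    header_size = struct.calcsize(HEADER_FMT)   # 4
    record_size = struct.calcsize(RECORD_FMT)   # 5: no padding in standard mode
    (count,) = struct.unpack(HEADER_FMT, data[:header_size])  # always a tuple
    pairs = []
    for i in range(count):
        start = header_size + i * record_size
        pairs.append(struct.unpack(RECORD_FMT, data[start:start + record_size]))
    return pairs


blob = encode_pairs([(0, 97), (1, 98)])
print(len(blob))           # 14
print(decode_pairs(blob))  # [(0, 97), (1, 98)]
```

Passing unpack a slice of the wrong length raises struct.error, which is the "exactly the amount of data" rule quoted above.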
```python
# -*- coding: utf-8 -*-
# Lempel-Ziv algorithm (Python 2; run with PyPy in this post)
# code:[email protected]
import struct

print "\nReading the source file"
with open('test2.txt', 'r') as mytextfile:
    mystr = mytextfile.read()
my_str = mystr
# codeword dictionary (code table)
codeword_dictionary = {}
# length of the text to compress
str_len = len(my_str)
# maximum codeword length
dict_maxlen = 1
# position of the next text segment to parse (start of the next parse)
now_index = 0
# largest index in the code table
max_index = 0
# compressed data
print "\nGenerating compressed data"
compresseddata = []
while now_index < str_len:
    # how far to advance after this step
    mystep = 0
    # current match length: try the longest possible codeword first
    now_len = dict_maxlen
    if now_len > str_len - now_index:
        now_len = str_len - now_index
    # index of the matched codeword; 0 means not found
    cw_addr = 0
    while now_len > 0:
        cw_index = codeword_dictionary.get(my_str[now_index:now_index + now_len])
        if cw_index is not None:
            # codeword found
            cw_addr = cw_index
            mystep = now_len
            break
        now_len -= 1
    if cw_addr == 0:
        # no codeword found: add the single character as a new codeword
        max_index += 1
        mystep = 1
        codeword_dictionary[my_str[now_index:now_index + mystep]] = max_index
        print "don't find the Code word,add Code word:%s index:%d" % (my_str[now_index:now_index + mystep], max_index)
    else:
        # codeword found: add (match + next character) as a new codeword
        max_index += 1
        if now_index + mystep + 1 <= str_len:
            codeword_dictionary[my_str[now_index:now_index + mystep + 1]] = max_index
            if mystep + 1 > dict_maxlen:
                dict_maxlen = mystep + 1
        print "find the Code word:%s add Code word:%s index:%d" % (my_str[now_index:now_index + now_len], my_str[now_index:now_index + mystep + 1], max_index)
.......
......
    my_codeword_dictionary[my_maxindex] = my_codeword_dictionary[cwkey] + cwlaster
    uncompressdata.append(my_codeword_dictionary[cwkey])
    uncompressdata.append(cwlaster)
    print ".",
uncompress_str = uncompress_str.join(uncompressdata)
uncompressstr = uncompress_str
print "\nWriting the decompressed result to file..\n"
with open('uncompress.txt', 'w') as uncompress_file:
    uncompress_file.write(uncompressstr)
    print "\nDecompression succeeded, output written to uncompress.txt!\n"
```
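Judging from the decoder fragment above (each step appends my_codeword_dictionary[cwkey] and then cwlaster), the compressed stream stores pairs of (codeword index, following character), which is the LZ78 variant of Lempel-Ziv. The sketch below is a textbook LZ78 round trip in Python 3, not a reconstruction of the elided code: the names lz78_compress and lz78_decompress are hypothetical, index 0 stands for the empty phrase, and the text is handled as str rather than as Python 2 byte strings.

```python
# Textbook LZ78 sketch (Python 3). Function names are hypothetical; this is not
# the blog's elided code, only an illustration of the (index, next char) scheme.

def lz78_compress(text):
    """Return a list of (prefix_index, next_char) pairs; index 0 is the empty phrase."""
    dictionary = {"": 0}  # phrase -> codeword index
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # new codeword = match + next char
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as (its prefix, its last char).
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decompress(pairs):
    """Rebuild the text; each pair defines one new phrase = phrases[index] + char."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs = lz78_compress("abababab")
print(pairs)                   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (0, 'b')]
print(lz78_decompress(pairs))  # abababab
```

Because every new dictionary entry is an existing entry plus one character, every prefix of an entry is also in the dictionary; that is what lets the final, unfinished phrase be emitted as (index of its prefix, last character).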
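The following compresses the text of the Chinese Wikipedia article on Python:
Running the program first compresses the input into a compressed file, then opens that compressed file and decompresses it.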
$ pypy lempel-ziv-compress.py python.txt python.lzv
………………..
find the Code word: C add Code word: CP index:9938
index:9939de word:ython add Code word:ython
find the Code word:
^ add Code word:
^ h index:9940
find the Code word:ttp add Code word:ttp: index:9941
find the Code word:// add Code word://e index:9942
find the Code word:dit add Code word:ditr index:9943
find the Code word:a. add Code word:a.o index:9944
Generating the compressed data header
Writing the compressed data to the compressed file
…………….
. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . .
Writing the decompressed result to file..
Decompression succeeded, output written to uncompress.txt!
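Some lines in the log above look garbled, for example "index:9939de word:ython add Code word:ython". This is most likely because some codewords contain control characters such as a carriage return or a newline, which the terminal acts on instead of displaying: a trailing carriage return sends the cursor back to the start of the line, so " index:9939" overwrites "find the Co". One way to keep such a log readable is to print codewords with repr() (%r). The helper below is a hypothetical Python 3 sketch, not part of the blog's program.

```python
# Hypothetical helper (not part of the blog's code): build a log line with
# repr()-escaped codewords so control characters stay visible.
def format_log_line(found, added, index):
    # %r applies repr(), which shows a carriage return as the two characters \ and r.
    return "find the Code word:%r add Code word:%r index:%d" % (found, added, index)


print(format_log_line("ython", "ython\r", 9939))
# find the Code word:'ython' add Code word:'ython\r' index:9939
```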
Check the compression result:
$ ls -l -h
…………….
-rw-rw-r-- 1 deep deep 5.0K Jul 1 20:55 lempel-ziv-compress.py
-rw-rw-r-- 1 deep deep 30K Jul 1 20:55 python.lzv
-rw-rw-r-- 1 deep deep 36K Jul 1 20:57 python.txt
-rw-rw-r-- 1 deep deep 36K Jul 1 20:55 uncompress.txt
As the listing shows, the text was 36K before compression and 30K after.
Compressing the complete source code of SQLite 3.8.5:
$ pypy lempel-ziv-compress.py sqlitesrc.txt sqlitesrc.lzv
Check the compression result:
$ ls -l -h
…………….
-rw-rw-r-- 1 deep deep 3.2M Jul 1 21:18 sqlitesrc.lzv
-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:16 sqlitesrc.txt
-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:18 uncompress.txt
The source was 5.2M before compression and 3.2M after.
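To put the two results side by side, the small Python 3 helper below (the name space_saving is hypothetical) computes the compressed-to-original ratio and the fraction of space saved. The inputs are the sizes printed by ls -h, which are rounded, so the results are approximate: roughly 17% saved on the Wikipedia text and 38% on the SQLite source.

```python
def space_saving(original_size, compressed_size):
    """Return (compressed/original ratio, fraction of space saved).

    Both sizes must be in the same unit (K with K, M with M).
    """
    ratio = compressed_size / original_size
    return ratio, 1 - ratio


# Sizes as reported by `ls -l -h` above (rounded, so results are approximate).
for name, before, after in [("python.txt", 36, 30), ("sqlitesrc.txt", 5.2, 3.2)]:
    ratio, saved = space_saving(before, after)
    print("%s: ratio %.2f, saved %.0f%%" % (name, ratio, saved * 100))
# python.txt: ratio 0.83, saved 17%
# sqlitesrc.txt: ratio 0.62, saved 38%
```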