Python Dict and File -- Python dictionaries and file I/O

Tags (space-separated): Python

Dict Hash Table

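Python's hash table structure is called a dict (dictionary). A dict is written as a collection of key:value pairs enclosed in curly braces. Strings, numbers, and tuples can all serve as keys, and a value can be of any type. To check whether a key is in the dict, use in or dict.get(key).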

## Can build up a dict by starting with the empty dict {}
## and storing key/value pairs into the dict like this:
## dict[key] = value-for-that-key
dict = {}
dict['a'] = 'alpha'
dict['g'] = 'gamma'
dict['o'] = 'omega'

print dict  ## {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

print dict['a']     ## Simple lookup, returns 'alpha'
dict['a'] = 6       ## Put new key/value into dict
'a' in dict         ## True
## print dict['z']                  ## Throws KeyError
if 'z' in dict: print dict['z']     ## Avoid KeyError
print dict.get('z')  ## None (instead of KeyError)
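The same operations can be sketched in Python 3 syntax (an assumption; the snippet above is Python 2). The name `greek` is hypothetical and is used instead of `dict` so that the built-in type is not shadowed.

```python
# Python 3 sketch; `greek` is a hypothetical name chosen for illustration.
greek = {}
greek['a'] = 'alpha'
greek['g'] = 'gamma'
greek['o'] = 'omega'

print(greek['a'])        # simple lookup
print('a' in greek)      # membership test on keys
print('z' in greek)

# Indexing a missing key raises KeyError; .get() returns None
# (or a caller-supplied default) instead.
try:
    greek['z']
except KeyError:
    missing_raised = True
else:
    missing_raised = False

print(greek.get('z'))
print(greek.get('z', 'n/a'))
```

Using `in` reads best when the missing case needs its own branch, while `.get()` with a default is shorter when a fallback value is enough.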

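A for loop over a dict iterates over its keys, and the keys come out in arbitrary order. dict.keys() and dict.values() return lists of the keys and the values. There is also items(), which returns a list of (key, value) tuples; this is the most efficient way to examine all the key/value data in the dict. All of these lists can be passed to the sorted() function.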

## By default, iterating over a dict iterates over its keys.
## Note that the keys are in a random order.
for key in dict: print key
## prints a g o

## Exactly the same as above
for key in dict.keys(): print key

## Get the .keys() list:
print dict.keys()  ## ['a', 'o', 'g']

## Likewise, there's a .values() list of values
print dict.values()  ## ['alpha', 'omega', 'gamma']

## Common case -- loop over the keys in sorted order,
## accessing each key/value
for key in sorted(dict.keys()):
    print key, dict[key]

## .items() is the dict expressed as (key, value) tuples
print dict.items()  ## [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')]

## This loop syntax accesses the whole dict by looping
## over the .items() tuple list, accessing one (key, value)
## pair on each iteration.
for k, v in dict.items(): print k, '>', v
## a > alpha    o > omega    g > gamma
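As a minimal sketch of the same loops in Python 3 syntax (an assumption; the snippet above is Python 2), with a hypothetical dict named `greek`:

```python
# Python 3 sketch; `greek` is a hypothetical example dict.
greek = {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

# Iterating over a dict yields its keys.
keys_seen = [key for key in greek]

# Loop over the keys in sorted order, looking up each value.
sorted_pairs = []
for key in sorted(greek):
    sorted_pairs.append((key, greek[key]))
    print(key, greek[key])

# .items() yields (key, value) pairs, unpacked here into k and v.
lines = []
for k, v in greek.items():
    lines.append('%s > %s' % (k, v))
print(lines)
```

Sorting the keys first gives a stable, readable order regardless of how the dict happens to store its entries.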

The variants iterkeys(), itervalues(), and iteritems() avoid building the full list, which helps when the dict holds a large amount of data.
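For readers on Python 3 (an assumption; the text above describes Python 2): iterkeys(), itervalues(), and iteritems() no longer exist there, and keys(), values(), and items() themselves return lightweight view objects rather than lists. A minimal sketch with a hypothetical dict `squares`:

```python
# Python 3 sketch; `squares` is a hypothetical example dict.
squares = {n: n * n for n in range(5)}

items_view = squares.items()      # a view, not a list copy
is_list = isinstance(items_view, list)

# Views reflect later changes to the dict.
squares[5] = 25
view_length = len(items_view)

# Summing over values() does not build an intermediate list.
total = sum(squares.values())

# Wrap in list() only when an actual list is needed.
key_list = list(squares.keys())
```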

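Return the key/value pair with the largest value

## argmax is assumed to be numpy.argmax (from numpy import argmax); temp is an existing dict
temp[vector.keys()[argmax(vector.values())]] = max(vector.values())

Sort a dict by value

ll = sorted(dic.iteritems(), key=lambda d:d[1])

Sort a dict by key

ll = sorted(dic.iteritems(), key=lambda d:d[0])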
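These three one-liners can also be written with built-ins only. The sketch below assumes Python 3 (items() instead of iteritems()) and a hypothetical dict `scores`; it is one way to get the same results, not necessarily what the snippets above intend.

```python
# Python 3 sketch; `scores` is a hypothetical example dict.
scores = {'b': 3, 'a': 7, 'c': 5}

# Key/value pair with the largest value: compare pairs by their value.
best_key, best_value = max(scores.items(), key=lambda kv: kv[1])

# Pairs sorted by value, then pairs sorted by key.
by_value = sorted(scores.items(), key=lambda kv: kv[1])
by_key = sorted(scores.items(), key=lambda kv: kv[0])

print(best_key, best_value)
print(by_value)
print(by_key)
```

Passing a key function to max() avoids the separate index lookup through keys() and values().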

Dict Formatting

The % operator conveniently substitutes values from a dict into a string by name:

hash = {}
hash['word'] = 'garfield'
hash['count'] = 42
s = 'I want %(count)d copies of %(word)s' % hash  # %d for int, %s for string
# 'I want 42 copies of garfield'
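A minimal Python 3 sketch of the same substitution, shown alongside str.format_map as an alternative spelling. The name `fields` is hypothetical and replaces `hash` so the built-in hash() is not shadowed.

```python
# Python 3 sketch; `fields` is a hypothetical name (avoids shadowing hash()).
fields = {'word': 'garfield', 'count': 42}

# %-formatting with named placeholders looks values up in the dict.
with_percent = 'I want %(count)d copies of %(word)s' % fields

# str.format_map performs the same lookup with {name} placeholders.
with_format = 'I want {count:d} copies of {word}'.format_map(fields)

print(with_percent)
print(with_format)
```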

A better way to add elements to a dict

As an example, suppose we want to count how many times each element occurs. Typically we might write something like this:

n = 16
myDict = {}
for i in range(0, n):
    char = 'abcd'[i%4]
    if char in myDict:
        myDict[char] += 1
    else:
        myDict[char] = 1
print(myDict)

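When the dict is large and most keys are already present, the following version can be considerably faster, because it skips the separate membership test and only takes the exception path the first time each key is seen. If missing keys are frequent, the cost of raising exceptions can cancel out that gain.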

n = 16
myDict = {}
for i in range(0, n):
    char = 'abcd'[i%4]
    try:
        myDict[char] += 1
    except KeyError:
        myDict[char] = 1
print(myDict)
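Beyond the two loops above, the standard library offers a few other ways to write the same count. The sketch below is an illustration under the same input, not a benchmark; the variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Same input as the loops above: 'abcd' repeated four times.
chars = ['abcd'[i % 4] for i in range(16)]

# 1. dict.get with a default of 0: no branch and no exception.
counts_get = {}
for char in chars:
    counts_get[char] = counts_get.get(char, 0) + 1

# 2. defaultdict(int) creates missing entries as 0 on first access.
counts_default = defaultdict(int)
for char in chars:
    counts_default[char] += 1

# 3. Counter does the whole tally in one call.
counts_counter = Counter(chars)

print(counts_get)
```

Which variant is fastest depends on the data and the Python version, so it is worth measuring with timeit before choosing one for performance reasons.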

Del

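The del operator performs deletions. It can remove a variable as well as list elements or dict entries, for example: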

var = 6
del var  # var no more!

list = ['a', 'b', 'c', 'd']
del list[0]     ## Delete first element
del list[-2:]   ## Delete last two elements
print list      ## ['b']

dict = {'a':1, 'b':2, 'c':3}
del dict['b']   ## Delete 'b' entry
print dict      ## {'a':1, 'c':3}
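A short Python 3 sketch of the same deletions, plus what happens when the key is absent. The names `letters` and `table` are hypothetical and avoid shadowing the built-in list and dict types.

```python
# Python 3 sketch; names are hypothetical and avoid shadowing list/dict.
letters = ['a', 'b', 'c', 'd']
del letters[0]      # remove the first element
del letters[-2:]    # remove the last two elements

table = {'a': 1, 'b': 2, 'c': 3}
del table['b']      # remove the 'b' entry

# Deleting a key that is not present raises KeyError.
try:
    del table['z']
except KeyError:
    delete_failed = True
else:
    delete_failed = False

# pop() with a default removes the key if present and never raises.
removed = table.pop('z', None)
```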

Files

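The open() function opens a file and returns a file handle, which can then be used for reading or writing. f = open('name', 'r') opens the file for reading and assigns the handle to the variable f; call f.close() when finished. Use 'w' instead of 'r' for writing and 'a' for appending. The special mode 'rU' converts different line endings to a plain '\n'. A for loop is an efficient way to iterate over the lines of a file, but note that this only works for text files, not binary files.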

# Echo the contents of a file
f = open('foo.txt', 'rU')
for line in f:   ## iterates over the lines of the file
    print line,    ## trailing , so print does not add an end-of-line char
                   ## since 'line' already includes the end-of-line.
f.close()
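One common way to avoid forgetting f.close() is the with statement. The sketch below assumes Python 3 and creates its own temporary file so it can run anywhere; the file name and contents are hypothetical.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
# The file name and contents are hypothetical.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

# The with statement closes the file when the block exits,
# even if an exception is raised inside it.
echoed = []
with open(path, 'r') as f:
    for line in f:                 # one line at a time
        echoed.append(line)
        print(line, end='')        # line already ends with '\n'

file_closed = f.closed
```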

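Reading one line at a time avoids using too much memory. The f.readlines() method reads the whole file into memory and returns a list of its lines, while the f.read() method reads the whole file into a single string.
For writing, the f.write() method is the simplest way to write data to an open output file. Alternatively, print >> f, string prints the string to the file f instead of to the screen.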
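The read and write methods can be compared side by side. This is a minimal Python 3 sketch using a hypothetical temporary file; in Python 3 the print-to-file form is spelled print(..., file=f).

```python
import os
import tempfile

# Hypothetical throwaway file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:
    f.write('alpha\n')              # write() adds no newline of its own
    print('beta', file=f)           # Python 3 spelling of `print >> f, 'beta'`

with open(path, 'a') as f:          # 'a' appends to the existing content
    f.write('gamma\n')

with open(path) as f:
    as_lines = f.readlines()        # list of lines, newlines included

with open(path) as f:
    as_text = f.read()              # the whole file as one string
```

Both readlines() and read() load the entire file, so the line-by-line loop remains the safer choice for large inputs.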

Files Unicode

The codecs module provides support for reading Unicode files.

import codecs

f = codecs.open('foo.txt', 'rU', 'utf-8')
for line in f:
    # here line is a *unicode* string
    pass  # placeholder so the loop body is valid
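In Python 3 (an assumption; the snippet above targets Python 2), the built-in open() accepts an encoding argument directly, so codecs.open is usually not needed for this. A minimal sketch with a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical throwaway file containing non-ASCII text.
path = os.path.join(tempfile.mkdtemp(), 'unicode.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('字典\ncafé\n')

# In Python 3 the built-in open() decodes text when given an encoding,
# so each line is already a str of Unicode code points.
decoded = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        decoded.append(line.rstrip('\n'))

# Reading in binary mode shows the underlying UTF-8 bytes.
with open(path, 'rb') as f:
    raw = f.read()
```

Stating the encoding explicitly avoids depending on the platform default, which differs between systems.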