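Chapter 3. Collection Data Types
I. Sequence Types
Python provides five built-in sequence types: bytearray, bytes, list, str, and tuple. The first two are used in Chapter 7 on file handling. Other sequence types are provided by the standard library, for example collections.namedtuple. This section covers tuples, named tuples, and lists.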
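1. Tuples
Like strings, tuples are immutable. To change a tuple, convert it to a list with list(), modify the list, and convert it back if needed. Calling tuple() with no arguments returns an empty tuple.
-- Shallow and deep copying (see "Copying Collections" in part IV)
t.count(x) returns the number of times object x occurs in tuple t
t.index(x) returns the index position of the first occurrence of object x in tuple t (or raises a ValueError exception if x is not found)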
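The points above can be sketched as follows. The data and variable names are made up for illustration; this is a minimal sketch, not the book's own example.

```python
# Hypothetical data, used only for illustration.
hair = ("black", "brown", "blonde", "red", "brown")

# Tuples are immutable, so convert to a list, change it, and convert back.
temp = list(hair)
temp[0] = "gray"
hair2 = tuple(temp)

brown_count = hair.count("brown")   # number of occurrences of "brown"
first_brown = hair.index("brown")   # index of the first occurrence

try:
    hair.index("green")             # "green" is not in the tuple
    missing_raises = False
except ValueError:
    missing_raises = True

print(hair2, brown_count, first_brown, missing_raises)
```

The original tuple hair is left unchanged; only the list copy is modified.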
Example:
>>> hair = "black", "brown", "blonde", "red"
>>> hair[:2], "gray", hair[2:]
(('black', 'brown'), 'gray', ('blonde', 'red'))
>>> hair[:2] + ("gray",) + hair[2:]  # concatenates the tuples into a single tuple containing all the items
('black', 'brown', 'gray', 'blonde', 'red')
Example (this is the book's coding style: parentheses around a tuple are omitted on the left of a binary operator and on the right of a unary statement):
a, b = (1, 2)  # left of binary operator
del a, b  # right of unary statement
def f(x):
    return x, x ** 2  # right of unary statement
for x, y in ((1, 1), (2, 4), (3, 9)):  # left of binary operator
    print(x, y)
Example (nested tuples):
>>> things = (1, -7.5, ("pea", (5, "Xyz"), "queue"))
>>> things[2][1][1][2]
'z'
Items in a nested tuple can be of any type. Deep nesting quickly becomes confusing; one remedy is to give the index positions names:
>>> MANUFACTURER, MODEL, SEATING = (0, 1, 2)
>>> MINIMUM, MAXIMUM = (0, 1)
>>> aircraft = ("Airbus", "A320-200", (100, 220))
>>> aircraft[SEATING][MAXIMUM]
220
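2. Named Tuples
A named tuple behaves like a plain tuple but also lets items be accessed by name. Custom Python objects (classes) can be used in place of named tuples.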
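As a rough sketch of how collections.namedtuple works, the aircraft data from the tuple example can be rewritten with named fields. The type names Aircraft and Seating and their field names are assumptions chosen for this illustration.

```python
import collections

# "Aircraft" and "Seating" are hypothetical names chosen for this sketch.
Aircraft = collections.namedtuple("Aircraft", "manufacturer model seating")
Seating = collections.namedtuple("Seating", "minimum maximum")

aircraft = Aircraft("Airbus", "A320-200", Seating(100, 220))

max_by_name = aircraft.seating.maximum   # access by field name
max_by_index = aircraft[2][1]            # plain index access still works
print(max_by_name, max_by_index)
```

Field names replace the hand-written index constants, and the object is still an ordinary tuple.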
3. Lists
Unlike strings and tuples, lists are mutable: items can be inserted, replaced, and deleted. Like tuples, lists can be nested, iterated over, and sliced.
Table 3.1. List Methods

| Syntax | Description |
|---|---|
| L.append(x) | Appends item x to the end of list L |
| L.count(x) | Returns the number of times item x occurs in list L |
| L.extend(m); L += m | Appends all of iterable m's items to the end of list L; the operator += does the same thing |
| L.index(x, start, end) | Returns the index position of the leftmost occurrence of item x in list L (or in the start:end slice of L); otherwise, raises a ValueError exception |
| L.insert(i, x) | Inserts item x into list L at index position int i |
| L.pop() | Returns and removes the rightmost item of list L |
| L.pop(i) | Returns and removes the item at index position int i in L |
| L.remove(x) | Removes the leftmost occurrence of item x from list L, or raises a ValueError exception if x is not found |
| L.reverse() | Reverses list L in-place |
| L.sort(...) | Sorts list L in-place; this method accepts the same key and reverse optional arguments as the built-in sorted() |
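A short sketch exercising several of the methods in Table 3.1; the numbers are arbitrary and the comments show the list after each step.

```python
L = [3, 1, 4, 1, 5]
L.append(9)          # [3, 1, 4, 1, 5, 9]
L.extend([2, 6])     # [3, 1, 4, 1, 5, 9, 2, 6]
L.insert(0, 7)       # [7, 3, 1, 4, 1, 5, 9, 2, 6]
ones = L.count(1)    # 2
pos = L.index(1)     # 2, the leftmost occurrence
last = L.pop()       # removes and returns 6
L.remove(1)          # removes the leftmost 1 -> [7, 3, 4, 1, 5, 9, 2]
L.sort()             # sorts in place -> [1, 2, 3, 4, 5, 7, 9]
print(L, ones, pos, last)
```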
-- Unpacking operator
>>> first, *rest = [9, 2, -4, 8, 7]
>>> first, rest
(9, [2, -4, 8, 7])
>>> first, *mid, last = "Charles Philip Arthur George Windsor".split()
>>> first, mid, last
('Charles', ['Philip', 'Arthur', 'George'], 'Windsor')
>>> *directories, executable = "/usr/local/bin/gvim".split("/")
>>> directories, executable
(['', 'usr', 'local', 'bin'], 'gvim')
-- Adding items
Given woods = ["Cedar", "Yew", "Fir"], the following two operations have the same result:
woods += ["Kauri", "Larch"]
woods.extend(["Kauri", "Larch"])
Either way, woods becomes ['Cedar', 'Yew', 'Fir', 'Kauri', 'Larch'].
-- Modifying items
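The notes leave this heading empty. As a minimal sketch (not taken from the source), items can be replaced by assigning to an index position or to a slice:

```python
woods = ["Cedar", "Yew", "Fir", "Kauri", "Larch"]

woods[2] = "Spruce"                  # replace a single item by index
woods[0:2] = ["Ash", "Oak", "Elm"]   # replace a slice; the lengths may differ
print(woods)
```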
-- Deleting items
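This heading is also empty in the notes. One way to sketch the usual options (an illustration, not the source's own example) is with del, remove(), pop(), and assignment of an empty list to a slice:

```python
woods = ["Cedar", "Yew", "Fir", "Kauri", "Larch"]

del woods[0]            # delete by index   -> ['Yew', 'Fir', 'Kauri', 'Larch']
woods.remove("Kauri")   # delete by value   -> ['Yew', 'Fir', 'Larch']
popped = woods.pop()    # remove and return the last item, 'Larch'
woods[0:1] = []         # delete a slice by assigning an empty list
print(woods, popped)
```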
4. List Comprehensions
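No text follows this heading in the notes. A list comprehension has the general forms [expression for item in iterable] and [expression for item in iterable if condition]. The sketch below, with an arbitrarily chosen year range, builds the same list with a loop and with a comprehension.

```python
# Leap years in 1900..1939, first with an explicit loop.
leaps_loop = []
for year in range(1900, 1940):
    if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
        leaps_loop.append(year)

# The same list as a comprehension with a condition.
leaps = [y for y in range(1900, 1940)
         if (y % 4 == 0 and y % 100 != 0) or (y % 400 == 0)]
print(leaps)
```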
***
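II. Set Types
Sets support the membership operator in and the len() function. They also provide the set.isdisjoint() method, comparison operators, and the bitwise operators (used for union, intersection, and similar calculations). Python provides two built-in set types: the mutable set and the immutable frozenset.
Only hashable objects can be added to a set. Hashable objects have a __hash__() special method and can be compared for equality using the __eq__() special method.
The built-in immutable data types (float, frozenset, int, str, and tuple) are hashable, so they can be added to a set. The built-in mutable data types (dict, list, and set) are not hashable and cannot be added to a set.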
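The hashability rule can be checked directly. In this sketch the candidate values are arbitrary; adding an unhashable object raises TypeError.

```python
s = set()
s.add((1, 2))              # a tuple of hashable items is hashable
s.add(frozenset({3, 4}))   # a frozenset is hashable

rejected = []
for candidate in ([1, 2], {"a": 1}, {3, 4}):   # list, dict, set
    try:
        s.add(candidate)
    except TypeError:      # raised for unhashable types
        rejected.append(type(candidate).__name__)
print(rejected)
```

Note that a tuple is only hashable if all of its items are hashable.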
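-- Sets
A set is mutable, so items can be added and removed. Because a set is unordered, it has no notion of index position, so its items cannot be accessed by index.
S = {7, "veil", 0, -29, ("x", 11), "sun", frozenset({8, 4, 7}), 913}
Note the curly braces. An empty set must be created with set(), because {} creates an empty dict.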
Table 3.2. Set Methods and Operators

| Syntax | Description |
|---|---|
| s.add(x) | Adds item x to set s if it is not already in s |
| s.clear() | Removes all the items from set s |
| s.copy() [*] | Returns a shallow copy of set s |
| s.difference(t); s - t [*] | Returns a new set that has every item that is in set s that is not in set t |
| s.difference_update(t); s -= t | Removes every item that is in set t from set s |
| s.discard(x) | Removes item x from set s if it is in s; see also set.remove() |
| s.intersection(t); s & t [*] | Returns a new set that has each item that is in both set s and set t |
| s.intersection_update(t); s &= t | Makes set s contain the intersection of itself and set t |
| s.isdisjoint(t) [*] | Returns True if sets s and t have no items in common |
| s.issubset(t); s <= t [*] | Returns True if set s is equal to or a subset of set t; use s < t to test whether s is a proper subset of t |
| s.issuperset(t); s >= t [*] | Returns True if set s is equal to or a superset of set t; use s > t to test whether s is a proper superset of t |
| s.pop() | Returns and removes an arbitrary item from set s, or raises a KeyError exception if s is empty |
| s.remove(x) | Removes item x from set s, or raises a KeyError exception if x is not in s; see also set.discard() |
| s.symmetric_difference(t); s ^ t [*] | Returns a new set that has every item that is in set s and every item that is in set t, but excluding items that are in both sets |
| s.symmetric_difference_update(t); s ^= t | Makes set s contain the symmetric difference of itself and set t |
| s.union(t); s \| t [*] | Returns a new set that has all the items in set s and all the items in set t that are not in set s |
| s.update(t); s \|= t | Adds every item in set t that is not in set s, to set s |

[*] This method and its operator (if it has one) can also be used with frozensets.
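A brief sketch of the main operators from Table 3.2, using two small sets of letters chosen only for illustration:

```python
a = set("pecan")   # {'p', 'e', 'c', 'a', 'n'}
b = set("pie")     # {'p', 'i', 'e'}

union = a | b          # items in either set
intersection = a & b   # items in both sets
difference = a - b     # items in a but not in b
symmetric = a ^ b      # items in exactly one of the sets

# Non-mutating operators also work on frozensets; the result takes
# the type of the left operand.
frozen = frozenset(a) & b

a.discard("z")         # no error if the item is absent
try:
    a.remove("z")      # remove() raises KeyError instead
    remove_raises = False
except KeyError:
    remove_raises = True
```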
A common use of sets is fast membership testing:
if len(sys.argv) == 1 or sys.argv[1] in {"-h", "--help"}:
Another common use is to make sure that duplicate data is not processed:
for ip in set(ips):
    process_ip(ip)
Another common use is to eliminate unwanted items:
filenames = set(filenames)
for makefile in {"MAKEFILE", "Makefile", "makefile"}:
    filenames.discard(makefile)
An equivalent statement: filenames = set(filenames) - {"MAKEFILE", "Makefile", "makefile"}
-- Set Comprehensions
{expression for item in iterable}
{expression for item in iterable if condition}
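As an illustration of the second form, the sketch below collects the lowercased names of HTML files from a list. The filenames are invented, and the duplicates that differ only in case collapse into one set item.

```python
# Hypothetical file names, used only for illustration.
filenames = ["index.HTM", "notes.txt", "About.html", "index.htm", "logo.png"]

html = {name.lower() for name in filenames
        if name.lower().endswith((".htm", ".html"))}
print(html)
```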
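III. Mapping Types
Python provides two mapping types: the built-in dict and the standard library's collections.defaultdict. Only hashable objects can be used as dictionary keys, so immutable data types such as float, frozenset, int, str, and tuple can be keys, but mutable types such as dict, list, and set cannot.
Dictionaries
Examples of the syntax for creating a dictionary (all five produce the same dictionary):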
d1 = dict({"id": 1948, "name": "Washer", "size": 3})
d2 = dict(id=1948, name="Washer", size=3)
d3 = dict([("id", 1948), ("name", "Washer"), ("size", 3)])
d4 = dict(zip(("id", "name", "size"), (1948, "Washer", 3)))
d5 = {"id": 1948, "name": "Washer", "size": 3}
Table 3.3. Dictionary Methods

| Syntax | Description |
|---|---|
| d.clear() | Removes all items from dict d |
| d.copy() | Returns a shallow copy of dict d |
| d.fromkeys(s, v) | Returns a dict whose keys are the items in sequence s and whose values are None or v if v is given |
| d.get(k) | Returns key k's associated value, or None if k isn't in dict d |
| d.get(k, v) | Returns key k's associated value, or v if k isn't in dict d |
| d.items() | Returns a view of all the (key, value) pairs in dict d |
| d.keys() | Returns a view of all the keys in dict d |
| d.pop(k) | Returns key k's associated value and removes the item whose key is k, or raises a KeyError exception if k isn't in d |
| d.pop(k, v) | Returns key k's associated value and removes the item whose key is k, or returns v if k isn't in dict d |
| d.popitem() | Returns and removes an arbitrary (key, value) pair from dict d, or raises a KeyError exception if d is empty |
| d.setdefault(k, v) | The same as the dict.get() method, except that if the key is not in dict d, a new item is inserted with the key k, and with a value of None or of v if v is given |
| d.update(a) | Adds every (key, value) pair from a that isn't in dict d to d, and for every key that is in both d and a, replaces the corresponding value in d with the one in a. The argument a can be a dictionary, an iterable of (key, value) pairs, or keyword arguments |
| d.values() | Returns a view of all the values in dict d |
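A few of the methods in Table 3.3 in action, sketched with the Washer dictionary from the creation examples. The "color" key and the replacement values are invented for this illustration.

```python
d = {"id": 1948, "name": "Washer", "size": 3}

missing = d.get("color")            # None: no exception for a missing key
fallback = d.get("color", "n/a")    # "n/a": the supplied default
d.setdefault("color", "red")        # key absent, so ("color", "red") is inserted
kept = d.setdefault("size", 99)     # key present, so the value 3 is kept
size = d.pop("size")                # removes "size" and returns 3
gone = d.pop("size", None)          # key already removed: returns the default
d.update(id=2000, name="Bolt")      # keyword arguments replace existing values
print(d)
```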
Iterating over a dictionary:
for item in d.items():
    print(item[0], item[1])
for key, value in d.items():
    print(key, value)
Dictionary Comprehensions
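The notes give no text under this heading. A dictionary comprehension follows the same pattern as a set comprehension but produces key: value pairs. A minimal sketch, with invented file sizes:

```python
# Hypothetical file sizes in bytes, used only for illustration.
sizes = {"a.txt": 120, "b.txt": 0, "c.txt": 45}

# Keep only the non-empty files.
non_empty = {name: size for name, size in sizes.items() if size > 0}

# Invert the mapping (safe here because the sizes are all distinct).
inverted = {size: name for name, size in sizes.items()}
print(non_empty, inverted)
```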
Default Dictionaries
Default dictionaries have the same operators and methods as plain dictionaries; the only difference is how they handle missing keys. Compare the two snippets:
When words is a plain dictionary:
words[word] = words.get(word, 0) + 1
When words is a default dictionary:
words = collections.defaultdict(int)
words[word] += 1
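The two approaches can be compared side by side in a small word count. The sample text is arbitrary.

```python
import collections

text = "the quick brown fox jumps over the lazy dog the end"

# Plain dictionary: supply the default explicitly with get().
plain = {}
for word in text.split():
    plain[word] = plain.get(word, 0) + 1

# Default dictionary: a missing key is created with int(), which is 0.
default = collections.defaultdict(int)
for word in text.split():
    default[word] += 1

print(plain["the"], default["the"])
```

One design point to keep in mind: merely reading a missing key from a defaultdict inserts it, which a plain dictionary's get() does not do.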
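IV. Iterating and Copying Collections
-- Iterators and Iterable Operations and Functions
An iterable data type is one that has an __iter__() method, which returns an iterator.
An iterator is an object that provides a __next__() method, which returns each successive item and raises a StopIteration exception when there are no more items.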
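To make the protocol concrete, here is a minimal sketch of an iterable and its iterator. The class names Countdown and CountdownIterator are hypothetical and exist only for this illustration.

```python
class Countdown:
    """Hypothetical iterable that counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call returns a fresh, independent iterator.
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator for Countdown; it is also iterable itself."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals that the items are exhausted
        value = self.current
        self.current -= 1
        return value


result = list(Countdown(3))
print(result)
```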
Table 3.4. Common Iterable Operators and Functions

| Syntax | Description |
|---|---|
| s + t | Returns a sequence that is the concatenation of sequences s and t |
| s * n | Returns a sequence that is int n concatenations of sequence s |
| x in i | Returns True if item x is in iterable i; use not in to reverse the test |
| all(i) | Returns True if every item in iterable i evaluates to True |
| any(i) | Returns True if any item in iterable i evaluates to True |
| enumerate(i, start) | Normally used in for ... in loops to provide a sequence of (index, item) tuples with indexes starting at 0 or start; see text |
| len(x) | Returns the "length" of x. If x is a collection it is the number of items; if x is a string it is the number of characters. |
| max(i, key) | Returns the biggest item in iterable i or the item with the biggest key(item) value if a key function is given |
| min(i, key) | Returns the smallest item in iterable i or the item with the smallest key(item) value if a key function is given |
| range(start, stop, step) | Returns an integer iterator. With one argument (stop), the iterator goes from 0 to stop - 1; with two arguments (start, stop) the iterator goes from start to stop - 1; with three arguments it goes from start to stop - 1 in steps of step. |
| reversed(i) | Returns an iterator that returns the items from iterator i in reverse order |
| sorted(i, key, reverse) | Returns a list of the items from iterator i in sorted order; key is used to provide DSU (Decorate, Sort, Undecorate) sorting. If reverse is True the sorting is done in reverse order. |
| sum(i, start) | Returns the sum of the items in iterable i plus start (which defaults to 0); i may not contain strings |
| zip(i1, ..., iN) | Returns an iterator of tuples using the iterators i1 to iN; see text |
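When a for item in iterable loop is used, Python in effect calls iter(iterable) to obtain an iterator, then calls next() on it at each step until StopIteration is raised. The two snippets below are equivalent:
product = 1
for i in [1, 2, 4, 8]:
    product *= i
print(product)  # prints: 64

product = 1
i = iter([1, 2, 4, 8])
while True:
    try:
        product *= next(i)
    except StopIteration:
        break
print(product)  # prints: 64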
-- Using the enumerate() function
enumerate() takes an iterator as its argument and returns an enumerator object, which is itself an iterator. On each iteration it returns a 2-tuple: the first item is the iteration number (starting from 0 by default) and the second item is the next item from the iterator that enumerate() was called on.
import sys

if len(sys.argv) < 3:
    print("usage: grepword.py word infile1 [infile2 [... infileN]]")
    sys.exit()
word = sys.argv[1]
for filename in sys.argv[2:]:
    # start=1 so that reported line numbers begin at 1
    for lino, line in enumerate(open(filename), start=1):
        if word in line:
            print("{0}:{1}:{2:.40}".format(filename, lino, line.rstrip()))
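An iterable can be unpacked into separate function arguments with the * operator, and this works for a range as well. In the examples below calculate is a function that takes four arguments, and all three calls pass the same four values: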
calculate(1, 2, 3, 4)
t = (1, 2, 3, 4)
calculate(*t)
calculate(*range(1, 5))
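The notes do not define calculate(), so the body below is a hypothetical stand-in that makes the three calls runnable and shows that they receive identical arguments.

```python
def calculate(a, b, c, d):
    """Hypothetical four-argument function; not defined in the source."""
    return a * b + c * d


t = (1, 2, 3, 4)
r1 = calculate(1, 2, 3, 4)        # arguments written out
r2 = calculate(*t)                # tuple unpacked into four arguments
r3 = calculate(*range(1, 5))      # range 1..4 unpacked the same way
print(r1, r2, r3)
```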
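-- The sorted() and reversed() functions
These are two more iteration-related functions: sorted() returns a sorted copy of the items as a new list, and reversed() returns a reverse iterator.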
>>> list(range(6))
[0, 1, 2, 3, 4, 5]
>>> list(reversed(range(6)))
[5, 4, 3, 2, 1, 0]
Of the two, sorted() is the more sophisticated. Some examples of its use:
>>> x = []
>>> for t in zip(range(-10, 0, 1), range(0, 10, 2), range(1, 10, 2)):
...     x += t
>>> x
[-10, 0, 1, -9, 2, 3, -8, 4, 5, -7, 6, 7, -6, 8, 9]
>>> sorted(x)
[-10, -9, -8, -7, -6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sorted(x, reverse=True)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -6, -7, -8, -9, -10]
>>> sorted(x, key=abs)
[0, 1, 2, 3, 4, 5, 6, -6, -7, 7, -8, 8, -9, 9, -10]
The following two snippets are functionally equivalent:
x = sorted(x, key=str.lower)

temp = []
for item in x:
    temp.append((item.lower(), item))  # decorate
x = []
for key, value in sorted(temp):  # sort
    x.append(value)  # undecorate
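Python's sort algorithm is an adaptive stable mergesort. Sorting compares items with the < operator. Python can also sort collections that contain collections; the nested collections are compared item by item in the same way.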
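Both properties can be observed in a short sketch. The records are invented; stability means that items with equal keys keep their original relative order.

```python
# Hypothetical (name, count) records.
records = [("pear", 3), ("apple", 1), ("fig", 3), ("kiwi", 1)]

# Sort by count only; ties keep their original order (stable sort).
by_count = sorted(records, key=lambda r: r[1])

# Nested lists are compared item by item using <.
nested = sorted([[2, "b"], [1, "z"], [1, "a"]])

# Items that cannot be compared with < make sorting fail.
try:
    sorted([1, "a"])
    mixed_fails = False
except TypeError:
    mixed_fails = True
print(by_count, nested, mixed_fails)
```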
-- Copying Collections
Shallow copy versus deep copy
Starting point: plain assignment copies only the object reference, so no copy is made and both names refer to the same list:
>>> songs = ["Because", "Boys", "Carol"]
>>> beatles = songs
>>> beatles, songs
(['Because', 'Boys', 'Carol'], ['Because', 'Boys', 'Carol'])
>>> beatles[2] = "Cayenne"
>>> beatles, songs
(['Because', 'Boys', 'Cayenne'], ['Because', 'Boys', 'Cayenne'])
Shallow copy: slicing creates a new list, but the nested list is still shared between the two:
>>> x = [53, 68, ["A", "B", "C"]]
>>> y = x[:]  # shallow copy
>>> x, y
([53, 68, ['A', 'B', 'C']], [53, 68, ['A', 'B', 'C']])
>>> y[1] = 40
>>> x[2][0] = 'Q'
>>> x, y
([53, 68, ['Q', 'B', 'C']], [53, 40, ['Q', 'B', 'C']])
Deep copy, by contrast, copies the nested list as well:
>>> import copy
>>> x = [53, 68, ["A", "B", "C"]]
>>> y = copy.deepcopy(x)
>>> y[1] = 40
>>> x[2][0] = 'Q'
>>> x, y
([53, 68, ['Q', 'B', 'C']], [53, 40, ['A', 'B', 'C']])
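More on shallow copies: dictionaries and sets provide dict.copy() and set.copy(). The copy module's copy() function also returns a shallow copy of the object it is given. Another way to copy a built-in collection is to pass it as the argument to the function with the same name as its type, for example:
copy_of_dict_d = dict(d)
copy_of_list_L = list(L)
copy_of_set_s = set(s)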
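The same shallow versus deep distinction applies to dictionaries. The sketch below uses an invented dictionary with one nested list.

```python
import copy

# Hypothetical data with one nested (mutable) value.
d = {"ids": [1, 2], "name": "Washer"}

shallow = copy.copy(d)       # same effect as d.copy() or dict(d)
deep = copy.deepcopy(d)      # nested objects are copied too

d["ids"].append(3)           # mutates the list shared with the shallow copy
d["name"] = "Bolt"           # rebinding a key affects only d

print(shallow, deep)
```

The shallow copy sees the appended item because it shares the nested list with d, while the deep copy does not.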