
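Chapter 3: Collection Data Types

I. Sequence Types

Python provides five built-in sequence types: bytearray, bytes, list, str, and tuple. The first two are used in Chapter 7, which covers file handling. Other sequence types are provided by the standard library, for example collections.namedtuple. This section covers tuples, named tuples, and lists.

1. Tuples

Like strings, tuples are immutable. To change a tuple's contents, convert it to a list with list(). Called with no arguments, tuple() returns an empty tuple.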

——Shallow and deep copying

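t.count(x) returns the number of times object x occurs in tuple t

t.index(x) returns the index position of the leftmost occurrence of object x in tuple t (and raises a ValueError exception if x is not in t)

Example: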

>>> hair = "black", "brown", "blonde", "red"

>>> hair[:2], "gray", hair[2:]

(('black', 'brown'), 'gray', ('blonde', 'red'))

>>> hair[:2] + ("gray",) + hair[2:]  # concatenate into a single tuple containing all the items

('black', 'brown', 'gray', 'blonde', 'red')

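Example (this is the book's coding style: the parentheses around a tuple are omitted when it is on the left of a binary operator or on the right of a unary statement):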

a, b = (1, 2)           # left of binary operator

del a, b                # right of unary statement

def f(x):

   return x, x ** 2    # right of unary statement

for x, y in ((1, 1), (2, 4), (3, 9)):  # left of binary operator

   print(x, y)

Example (nested tuples):

>>> things = (1, -7.5, ("pea", (5, "Xyz"), "queue"))

>>> things[2][1][1][2]

'z'

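Items in a nested tuple can be of any data type. Deeply nested indexing quickly becomes confusing; one way to keep it readable is to give the index positions names:

>>> MANUFACTURER, MODEL, SEATING = (0, 1, 2)

>>> MINIMUM, MAXIMUM = (0, 1)

>>> aircraft = ("Airbus", "A320-200", (100, 220))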

>>> aircraft[SEATING][MAXIMUM]

220

2. Named Tuples

Ordinary Python objects (instances of custom classes) can be used in place of named tuples.
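As a rough illustration of what a named tuple offers, the aircraft tuple from the nested-tuple example can be rewritten with collections.namedtuple. The type names Aircraft and Seating and their field names are assumptions made up for this sketch.

```python
import collections

# Hypothetical record types for illustration; the field names are assumptions.
Aircraft = collections.namedtuple("Aircraft", "manufacturer model seating")
Seating = collections.namedtuple("Seating", "minimum maximum")

aircraft = Aircraft("Airbus", "A320-200", Seating(100, 220))

# Items can be reached by name or, as with any tuple, by index position.
print(aircraft.seating.maximum)
print(aircraft[2][1])
```

Both print calls refer to the same item, so named access can replace the index constants without changing how the data is stored.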

3. Lists

Unlike strings and tuples, lists are mutable: items can be inserted, replaced, and deleted. Like tuples, lists can be nested, iterated over, and sliced.

Table 3.1. List Methods

Syntax                  Description
----------------------  -----------
L.append(x)             Appends item x to the end of list L
L.count(x)              Returns the number of times item x occurs in list L
L.extend(m), L += m     Appends all of iterable m's items to the end of list L; the operator += does the same thing
L.index(x, start, end)  Returns the index position of the leftmost occurrence of item x in list L (or in the start:end slice of L); otherwise, raises a ValueError exception
L.insert(i, x)          Inserts item x into list L at index position int i
L.pop()                 Returns and removes the rightmost item of list L
L.pop(i)                Returns and removes the item at index position int i in L
L.remove(x)             Removes the leftmost occurrence of item x from list L, or raises a ValueError exception if x is not found
L.reverse()             Reverses list L in-place
L.sort(...)             Sorts list L in-place; this method accepts the same key and reverse optional arguments as the built-in sorted()

——unpacking operator

>>> first, *rest = [9, 2, -4, 8, 7]

>>> first, rest

(9, [2, -4, 8, 7])

>>> first, *mid, last = "Charles Philip Arthur George Windsor".split()

>>> first, mid, last

('Charles', ['Philip', 'Arthur', 'George'], 'Windsor')

>>> *directories, executable = "/usr/local/bin/gvim".split("/")

>>> directories, executable

(['', 'usr', 'local', 'bin'], 'gvim')

——Adding items

Given woods = ["Cedar", "Yew", "Fir"], either of the following two statements produces the same result:

woods += ["Kauri", "Larch"]

woods.extend(["Kauri", "Larch"])

After either one, woods is ['Cedar', 'Yew', 'Fir', 'Kauri', 'Larch']

——Modifying items

——Deleting items
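The notes leave these headings empty. As a minimal sketch, the insert, replace, and delete operations mentioned earlier for lists might look like this; the particular items and positions are chosen only for illustration.

```python
woods = ["Cedar", "Yew", "Fir"]

woods.insert(1, "Kauri")         # insert at index position 1
woods[2] = "Larch"               # replace a single item by index
woods[0:1] = ["Ash", "Beech"]    # replace a slice with another iterable
del woods[-1]                    # delete by index position
woods.remove("Kauri")            # delete the leftmost matching item

print(woods)
```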

4. List Comprehensions

***

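II. Set Types

A set type supports the membership operator in and the len() function, and is iterable. Set types also provide the set.isdisjoint() method, comparisons, and the bitwise operators (which, for sets, compute unions, intersections, and so on). Python provides two built-in set types: the mutable set and the immutable frozenset.

Only hashable objects can be added to a set. A hashable object has a __hash__() special method, whose return value never changes during the object's lifetime, and can be compared for equality using the __eq__() special method.

The built-in immutable data types, such as float, frozenset, int, str, and tuple, are hashable, so they can be added to a set (a tuple only if all of its items are hashable). The built-in mutable data types, such as dict, list, and set, are not hashable and cannot be added to a set.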
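A small sketch of the hashability rule: hashable objects are accepted by set.add(), while the built-in mutable collections raise TypeError. The sample values are arbitrary.

```python
s = set()
s.add(("x", 11))           # a tuple of hashable items is hashable
s.add(frozenset({8, 4}))   # frozenset is immutable, so it is hashable

errors = []
for unhashable in ([1, 2], {"a": 1}, {3, 4}):
    try:
        s.add(unhashable)
    except TypeError:
        # list, dict, and set are mutable and therefore unhashable
        errors.append(type(unhashable).__name__)

print(len(s), errors)
```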

——Sets

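Sets are mutable: items can be added and removed. Because sets are unordered, they have no notion of index position, so items cannot be accessed by index.

S = {7, "veil", 0, -29, ("x", 11), "sun", frozenset({8, 4, 7}), 913} (note the curly braces)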


Table 3.2. Set Methods and Operators

Syntax                                    Description
----------------------------------------  -----------
s.add(x)                                  Adds item x to set s if it is not already in s
s.clear()                                 Removes all the items from set s
s.copy() [*]                              Returns a shallow copy of set s
s.difference(t), s - t [*]                Returns a new set that has every item that is in set s that is not in set t
s.difference_update(t), s -= t            Removes every item that is in set t from set s
s.discard(x)                              Removes item x from set s if it is in s; see also set.remove()
s.intersection(t), s & t [*]              Returns a new set that has each item that is in both set s and set t
s.intersection_update(t), s &= t          Makes set s contain the intersection of itself and set t
s.isdisjoint(t) [*]                       Returns True if sets s and t have no items in common
s.issubset(t), s <= t [*]                 Returns True if set s is equal to or a subset of set t; use s < t to test whether s is a proper subset of t
s.issuperset(t), s >= t [*]               Returns True if set s is equal to or a superset of set t; use s > t to test whether s is a proper superset of t
s.pop()                                   Returns and removes an arbitrary item from set s, or raises a KeyError exception if s is empty
s.remove(x)                               Removes item x from set s, or raises a KeyError exception if x is not in s; see also set.discard()
s.symmetric_difference(t), s ^ t [*]      Returns a new set that has every item that is in set s and every item that is in set t, but excluding items that are in both sets
s.symmetric_difference_update(t), s ^= t  Makes set s contain the symmetric difference of itself and set t
s.union(t), s | t [*]                     Returns a new set that has all the items in set s and all the items in set t that are not in set s
s.update(t), s |= t                       Adds every item in set t that is not in set s, to set s

[*] This method and its operator (if it has one) can also be used with frozensets.

One common use of sets is fast membership testing:

if len(sys.argv) == 1 or sys.argv[1] in {"-h", "--help"}:

Another common use is to make sure duplicate data is not processed:

for ip in set(ips):

    process_ip(ip)

Sets are also commonly used to eliminate unwanted items:

filenames = set(filenames)

for makefile in {"MAKEFILE", "Makefile", "makefile"}:

    filenames.discard(makefile)

An equivalent statement: filenames = set(filenames) - {"MAKEFILE", "Makefile", "makefile"}
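To check that the two approaches really agree, here is a short sketch run on a made-up file listing; the names list is hypothetical.

```python
# Hypothetical file listing, for illustration only.
names = ["main.c", "Makefile", "util.c", "makefile", "util.c"]
unwanted = {"MAKEFILE", "Makefile", "makefile"}

# Approach 1: discard each unwanted name in turn.
by_discard = set(names)
for makefile in unwanted:
    by_discard.discard(makefile)   # no error if the name is absent

# Approach 2: a single set difference.
by_difference = set(names) - unwanted

print(by_discard == by_difference)
print(sorted(by_difference))
```

discard() is used rather than remove() because not every unwanted name is necessarily present, and remove() would raise KeyError for a missing one.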

——Set Comprehensions

{expression for item in iterable}

{expression for item in iterable if condition}
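A minimal sketch of both forms, using an invented list of file names:

```python
# Hypothetical file names, for illustration only.
files = ["index.HTM", "logo.png", "about.html", "notes.txt", "INDEX.htm"]

# Unconditional form: the set of lowercased names (duplicates collapse).
lowered = {name.lower() for name in files}

# Conditional form: keep only names ending in .htm or .html.
html = {name for name in files if name.lower().endswith((".htm", ".html"))}

print(sorted(lowered))
print(sorted(html))
```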

III. Mapping Types

Python provides two mapping types: the built-in dict type and the standard library's collections.defaultdict. Only hashable objects can be used as dictionary keys, so immutable data types such as float, frozenset, int, str, and tuple can be keys, but mutable types such as dict, list, and set cannot.

Dictionaries

Examples of the syntax for creating a dictionary:

d1 = dict({"id": 1948, "name": "Washer", "size": 3})

d2 = dict(id=1948, name="Washer", size=3)

d3 = dict([("id", 1948), ("name", "Washer"), ("size", 3)])

d4 = dict(zip(("id", "name", "size"), (1948, "Washer", 3)))

d5 = {"id": 1948, "name": "Washer", "size": 3}

Table 3.3. Dictionary Methods

Syntax              Description
------------------  -----------
d.clear()           Removes all items from dict d
d.copy()            Returns a shallow copy of dict d
d.fromkeys(s, v)    Returns a dict whose keys are the items in sequence s and whose values are None or v if v is given
d.get(k)            Returns key k's associated value, or None if k isn't in dict d
d.get(k, v)         Returns key k's associated value, or v if k isn't in dict d
d.items()           Returns a view[*] of all the (key, value) pairs in dict d
d.keys()            Returns a view[*] of all the keys in dict d
d.pop(k)            Returns key k's associated value and removes the item whose key is k, or raises a KeyError exception if k isn't in d
d.pop(k, v)         Returns key k's associated value and removes the item whose key is k, or returns v if k isn't in dict d
d.popitem()         Returns and removes an arbitrary (key, value) pair from dict d, or raises a KeyError exception if d is empty
d.setdefault(k, v)  The same as the dict.get() method, except that if the key is not in dict d, a new item is inserted with the key k, and with a value of None or of v if v is given
d.update(a)         Adds every (key, value) pair from a that isn't in dict d to d, and for every key that is in both d and a, replaces the corresponding value in d with the one in a; a can be a dictionary, an iterable of (key, value) pairs, or keyword arguments
d.values()          Returns a view[*] of all the values in dict d

[*] A dictionary view is a read-only iterable that reflects the dictionary's current contents.
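A few of the methods in Table 3.3 differ only in how they treat a missing key. The sketch below contrasts get(), setdefault(), and pop() on the Washer dictionary used earlier; the extra keys ("colour", "stock") are invented for the example.

```python
d = {"id": 1948, "name": "Washer", "size": 3}

missing = d.get("colour")                # missing key, no default given
fallback = d.get("colour", "unknown")    # missing key, default returned
kept = d.setdefault("size", 10)          # key exists, so d is unchanged
added = d.setdefault("stock", 0)         # key missing, so it is inserted
popped = d.pop("id")                     # returns the value and removes the item
popped_again = d.pop("id", None)         # already removed, default returned

d.update(size=4, name="Washer (M4)")     # keyword arguments replace values
print(missing, fallback, kept, added, popped, popped_again)
print(d)
```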

Iterating over a dictionary:

for item in d.items():

    print(item[0], item[1])

for key, value in d.items():

    print(key, value)

Dictionary Comprehensions

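Default Dictionaries

Default dictionaries have the same operators and methods as plain dictionaries; the only difference is how they handle missing keys. Compare the two code snippets below.

When words is a plain dictionary:

words[word] = words.get(word, 0) + 1

When words is a default dictionary:

words = collections.defaultdict(int)

words[word] += 1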
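Put side by side in a runnable form, the two word-counting idioms give the same counts. The sample text is made up for this sketch.

```python
import collections

text = "the quick brown fox jumps over the lazy dog the end"

plain = {}
for word in text.split():
    plain[word] = plain.get(word, 0) + 1   # get() supplies 0 for a new word

counts = collections.defaultdict(int)      # int() returns 0 for a missing key
for word in text.split():
    counts[word] += 1

print(plain == counts)
print(counts["the"])
```

One thing to keep in mind: merely looking up a missing key in a defaultdict inserts it, whereas dict.get() never modifies the dictionary.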


IV. Iterating and Copying Collections

——Iterators and Iterable Operations and Functions

An iterable data type has an __iter__() method that returns an iterator.

An iterator provides a __next__() method that returns each successive item and raises a StopIteration exception when there are no more items.

Table 3.4. Common Iterable Operators and Functions

Syntax                    Description
------------------------  -----------
s + t                     Returns a sequence that is the concatenation of sequences s and t
s * n                     Returns a sequence that is int n concatenations of sequence s
x in i                    Returns True if item x is in iterable i; use not in to reverse the test
all(i)                    Returns True if every item in iterable i evaluates to True
any(i)                    Returns True if any item in iterable i evaluates to True
enumerate(i, start)       Normally used in for ... in loops to provide a sequence of (index, item) tuples with indexes starting at 0 or start; see text
len(x)                    Returns the "length" of x. If x is a collection it is the number of items; if x is a string it is the number of characters.
max(i, key)               Returns the biggest item in iterable i or the item with the biggest key(item) value if a key function is given
min(i, key)               Returns the smallest item in iterable i or the item with the smallest key(item) value if a key function is given
range(start, stop, step)  Returns an integer iterator. With one argument (stop), the iterator goes from 0 to stop - 1; with two arguments (start, stop) the iterator goes from start to stop - 1; with three arguments it goes from start to stop - 1 in steps of step.
reversed(i)               Returns an iterator that returns the items from iterator i in reverse order
sorted(i, key, reverse)   Returns a list of the items from iterator i in sorted order; key is used to provide DSU (Decorate, Sort, Undecorate) sorting. If reverse is True the sorting is done in reverse order.
sum(i, start)             Returns the sum of the items in iterable i plus start (which defaults to 0); i may not contain strings
zip(i1, ..., iN)          Returns an iterator of tuples using the iterators i1 to iN; see text

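When a for item in iterable loop runs, Python in effect calls iter(iterable) to obtain an iterator and then calls next() on it at each iteration until StopIteration is raised. The two snippets below are therefore equivalent: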

    product = 1

    for i in [1, 2, 4, 8]:

        product *= i

    print(product)  # prints: 64

    product = 1

    i = iter([1, 2, 4, 8])

    while True:

        try:

            product *= next(i)

        except StopIteration:

            break

    print(product) # prints: 64

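——Using the enumerate() function:

enumerate() takes an iterator and returns an enumerator object. That object can itself be treated as an iterator: at each iteration it returns a 2-tuple whose first item is the iteration number (starting from 0 by default) and whose second item is the next item from the iterator enumerate() was called on.

import sys

if len(sys.argv) < 3: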

    print("usage: grepword.py word infile1 [infile2 [... infileN]]")

    sys.exit()

word = sys.argv[1]

for filename in sys.argv[2:]:

    for lino, line in enumerate(open(filename), start=1):

        if word in line:

            print("{0}:{1}:{2:.40}".format(filename, lino,

                                           line.rstrip()))
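The same enumerate() pattern can be tried without any files. In this sketch a hypothetical list of lines stands in for the open file, and the matches are collected rather than printed one at a time.

```python
# Hypothetical in-memory "file"; real code would iterate over an open file.
lines = ["alpha beta\n", "gamma\n", "beta delta\n"]
word = "beta"

matches = []
for lino, line in enumerate(lines, start=1):   # line numbers start at 1
    if word in line:
        matches.append((lino, line.rstrip()))

print(matches)
```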

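An iterable can be unpacked into separate function arguments with the * operator, and this works for range() too. In the examples below calculate() is a function that takes four arguments, and all three calls pass it the same values: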

calculate(1, 2, 3, 4)

t = (1, 2, 3, 4)

calculate(*t)

calculate(*range(1, 5))
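The source does not say what calculate() does, so the sketch below uses a hypothetical stand-in body purely to show that the three call styles pass identical arguments.

```python
def calculate(a, b, c, d):
    # Hypothetical stand-in; the source only says it takes four arguments.
    return (a + b) * (c + d)

t = (1, 2, 3, 4)

r1 = calculate(1, 2, 3, 4)
r2 = calculate(*t)              # the tuple's items become four arguments
r3 = calculate(*range(1, 5))    # range(1, 5) yields 1, 2, 3, 4

print(r1, r2, r3)
```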

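——The sorted() and reversed() functions

These are two more iteration-related functions: sorted() returns a new list containing the items in sorted order (the original is left unchanged), and reversed() returns an iterator that iterates in reverse order.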

>>> list(range(6))

[0, 1, 2, 3, 4, 5]

>>> list(reversed(range(6)))

[5, 4, 3, 2, 1, 0]

sorted() is the more sophisticated of the two. Some examples of its use:

>>> x = []

>>> for t in zip(range(-10, 0, 1), range(0, 10, 2), range(1, 10, 2)):

...     x += t

>>> x

[-10, 0, 1, -9, 2, 3, -8, 4, 5, -7, 6, 7, -6, 8, 9]

>>> sorted(x)

[-10, -9, -8, -7, -6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> sorted(x, reverse=True)

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -6, -7, -8, -9, -10]

>>> sorted(x, key=abs)

[0, 1, 2, 3, 4, 5, 6, -6, -7, 7, -8, 8, -9, 9, -10]

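The following two snippets are functionally equivalent (one difference: when two items have the same lowercased key, the second snippet orders them by comparing the items themselves, whereas the first keeps their original order):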

    x = sorted(x, key=str.lower)

    temp = []

    for item in x:

        temp.append((item.lower(), item))

    x = []

    for key, value in sorted(temp):

        x.append(value)
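A compact check of the equivalence on a small list of strings; the sample words are arbitrary and contain no two items that differ only in case.

```python
x = ["pear", "Apple", "fig", "Banana"]

by_key = sorted(x, key=str.lower)

# Decorate: pair each item with its sort key.
temp = [(item.lower(), item) for item in x]
# Sort the pairs, then undecorate by keeping only the original item.
by_dsu = [value for key, value in sorted(temp)]

print(by_key)
print(by_key == by_dsu)
```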

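Python's sort algorithm is an adaptive stable mergesort. Items are compared with the < operator, and a collection that contains nested collections can be sorted too, because the nested collections are themselves compared item by item.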

——Copying Collections

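Shallow copy

Deep copy

Shallow copying, the basics. Plain assignment with = does not copy anything; it only binds a second name to the same list: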

>>> songs = ["Because", "Boys", "Carol"]

>>> beatles = songs

>>> beatles, songs

(['Because', 'Boys', 'Carol'], ['Because', 'Boys', 'Carol'])

>>> beatles[2] = "Cayenne"

>>> beatles, songs

(['Because', 'Boys', 'Cayenne'], ['Because', 'Boys', 'Cayenne'])

>>> x = [53, 68, ["A", "B", "C"]]

>>> y = x[:]  # shallow copy

>>> x, y

([53, 68, ['A', 'B', 'C']], [53, 68, ['A', 'B', 'C']])

>>> y[1] = 40

>>> x[2][0] = 'Q'

>>> x, y

([53, 68, ['Q', 'B', 'C']], [53, 40, ['Q', 'B', 'C']])

By contrast, with a deep copy:

>>> import copy

>>> x = [53, 68, ["A", "B", "C"]]

>>> y = copy.deepcopy(x)

>>> y[1] = 40

>>> x[2][0] = 'Q'

>>> x, y

([53, 68, ['Q', 'B', 'C']], [53, 40, ['A', 'B', 'C']])

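More on shallow copying:

Dictionaries and sets provide their own shallow-copy methods:

dict.copy() and set.copy()

The copy module's copy() function also returns a shallow copy of the object it is given.

Another way to copy a built-in collection is to pass it as the argument to the function with the same name as its type, for example: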

copy_of_dict_d = dict(d)

copy_of_list_L = list(L)

copy_of_set_s = set(s)
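A short sketch showing that all of these techniques produce shallow copies: the outer object is new, but nested objects are shared. The sample values for d, L, and s are invented.

```python
import copy

d = {"a": [1, 2]}
L = [[1, 2], 3]
s = {1, 2, 3}

copy_of_dict_d = dict(d)
copy_of_list_L = list(L)
copy_of_set_s = set(s)
copy_of_list_2 = copy.copy(L)

# Each copy is a new outer object...
print(copy_of_list_L is L, copy_of_set_s is s)

# ...but nested objects are shared, so this change shows up in every list copy.
L[0].append(99)
print(copy_of_list_L[0], copy_of_list_2[0], copy_of_dict_d["a"] is d["a"])
```

When the nested items must be independent as well, copy.deepcopy() is the tool to reach for.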