Day 25: Three Commonly Used Classes in Python's collections Module
阿新 • Published: 2021-07-16
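1) namedtuple: replace integer indexes with more readable field names
2) Counter: fast counting
3) defaultdict: a dictionary whose values are initialized to a default of a given type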
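namedtuple
In data analysis and machine learning, using namedtuple well leads to code that is more readable and easier to maintain.
During feature engineering it is tempting to put features into a list for later use. Retrieving them then requires integer indexes, which hurts readability. A namedtuple avoids integer indexes by letting you access each value by field name: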
from collections import namedtuple

# Create a namedtuple class named Person with 14 fields
Person = namedtuple('Person', ['id', 'age', 'height', 'name', 'address', 'province', 'city',
                               'town', 'country', 'birth_address', 'father_name',
                               'mother_name', 'telephone', 'emergency_telephone'])

# Instantiate Person to create an object with id=10086.
# Arguments are positional, so height is left empty to keep 'xiaoming' in the name field.
a = [''] * 10
Person(10086, 19, '', 'xiaoming', *a)

output:
Person(id=10086, age=19, height='', name='xiaoming', address='', province='', city='', town='', country='', birth_address='', father_name='', mother_name='', telephone='', emergency_telephone='')
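To make the readability gain concrete, here is a minimal sketch comparing index access with field access. The three-field `Feature` type and its values are hypothetical and exist only for this illustration; `_fields` and `_asdict()` are part of the standard namedtuple API.

```python
from collections import namedtuple

# Hypothetical three-field record, used only for this illustration.
Feature = namedtuple('Feature', ['name', 'mean', 'std'])

as_list = ['height', 170.2, 6.5]
as_tuple = Feature(name='height', mean=170.2, std=6.5)

# With a plain list the reader has to remember what index 1 means.
mean_from_list = as_list[1]

# With a namedtuple the field name documents itself.
mean_from_tuple = as_tuple.mean

# A namedtuple is still a tuple, so unpacking keeps working.
name, mean, std = as_tuple

# _fields lists the field names; _asdict() converts the record to a mapping.
field_names = Feature._fields
as_dict = as_tuple._asdict()

print(mean_from_list, mean_from_tuple)
print(field_names)
print(as_dict)
```

Because the record is still a tuple, code written against positional access continues to work while new code can switch to field names.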
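Suppose that an old dataset already exists and a new dataset arrives. The task is to compare them and list the people whose residential address (field address), contact telephone (field telephone), or birthplace (field birth_address) has changed.
Using namedtuple: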
def update_persons_info(old_data, new_data):
    """Return the ids of people whose address, telephone, or birth address changed."""
    # Index the old records by id so each new record can be matched directly.
    old_persons = {}
    for line in old_data:
        # The split fields must line up with Person's 14 fields.
        old_person = Person(*line.split())
        old_persons[old_person.id] = old_person

    changed_list = []
    for line in new_data:
        new_person = Person(*line.split())
        old_person = old_persons.get(new_person.id)
        if old_person is None:
            continue  # no old record to compare against
        if (old_person.address != new_person.address
                or old_person.telephone != new_person.telephone
                or old_person.birth_address != new_person.birth_address):
            changed_list.append(new_person.id)
    return changed_list
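This convenience comes with a caveat: once a namedtuple has been created, its field values cannot be modified. The fields are read-only, so whether this is a problem depends on how you use it.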
from collections import namedtuple

# Create a namedtuple class named Person with 14 fields
Person = namedtuple('Person', ['id', 'age', 'height', 'name', 'address', 'province', 'city',
                               'town', 'country', 'birth_address', 'father_name',
                               'mother_name', 'telephone', 'emergency_telephone'])

# Instantiate Person to create an object with id=10086
a = [''] * 10
xiaoming = Person(10086, 19, '', 'xiaoming', *a)
print(type(xiaoming))
xiaoming.age = 20  # fields are read-only, so this raises AttributeError

output:
<class '__main__.Person'>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> xiaoming.age = 20
AttributeError: can't set attribute
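When a changed value is needed, one option is `_replace`, which returns a new namedtuple and leaves the original untouched. A minimal sketch, using a hypothetical two-field `Student` type to keep it short:

```python
from collections import namedtuple

# Hypothetical two-field record, used only for this illustration.
Student = namedtuple('Student', ['name', 'age'])

xiaoming = Student(name='xiaoming', age=19)

# Direct assignment fails because namedtuple fields are read-only.
try:
    xiaoming.age = 20
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace builds a new instance with the given fields changed.
older = xiaoming._replace(age=20)

print(assignment_failed)  # True
print(xiaoming)           # Student(name='xiaoming', age=19)
print(older)              # Student(name='xiaoming', age=20)
```

The original record stays as it was, which is often what you want when comparing old and new versions of the same data.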
Counter
Counter is mainly used for counting in statistics, and it tends to make such code much more compact.
from collections import Counter

# Count how many times each value occurs
freq = [3, 8, 3, 10, 3, 3, 1, 3, 7, 6, 1, 2, 7, 0, 7, 9, 1, 5, 1, 0]
Counter(freq).most_common()

output:
[(3, 5), (1, 4), (7, 3), (0, 2), (8, 1), (10, 1), (6, 1), (2, 1), (9, 1), (5, 1)]
Better still, most_common() returns the results sorted by frequency, from highest to lowest.
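A few related Counter behaviours are worth knowing alongside the sorted output. The sketch below reuses the same `freq` list; the value 42 is an arbitrary element chosen only because it does not occur in the data.

```python
from collections import Counter

freq = [3, 8, 3, 10, 3, 3, 1, 3, 7, 6, 1, 2, 7, 0, 7, 9, 1, 5, 1, 0]
counts = Counter(freq)

# most_common(n) returns only the n most frequent items.
top_two = counts.most_common(2)

# Looking up an element that never occurred returns 0 instead of raising KeyError.
missing = counts[42]

# update() adds more observations to the existing counts.
counts.update([3, 3, 42])

print(top_two)                 # [(3, 5), (1, 4)]
print(missing)                 # 0
print(counts[3], counts[42])   # 7 1
```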
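Counter can also quickly count how often each word occurs in a sentence, or how often each character occurs in a string. The example below counts the characters of a code snippet stored as a string: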
text = """
def update_persons_info(old_data,new_data):
    changed_list = []
    for line in new_data:
        new_props = line.split()
        new_person = Person(new_props) # new_props 與 Person 參數卡對好
        for old in old_data:
            old_props = old.split()
            old_person = Person(old_props)
            if old_person.id != new_person.id:
                changed_list.append(old_person.id)
            elif old_person.address != new_person.address:
                changed_list.append(old_person.address)
            elif old_person.birth_address != new_person.birth_address:
                changed_list.append(old_person.birth_address)
    return changed_list"""
Counter(text).most_common()

output:
[(' ', 182), ('e', 45), ('d', 42), ('s', 40), ('n', 38), ('o', 36), ('r', 33), ('p', 31), ('_', 30), ('l', 24),
 ('a', 23), ('i', 21), ('t', 16), ('\n', 15), ('.', 14), ('w', 9), ('(', 8), (')', 8), ('h', 8), ('=', 8),
 ('f', 7), (':', 6), ('c', 5), ('g', 5), ('P', 3), ('!', 3), ('b', 3), ('u', 2), (',', 1), ('[', 1),
 (']', 1), ('#', 1), ('與', 1), ('參', 1), ('數', 1), ('卡', 1), ('對', 1), ('好', 1)]
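Counting words works the same way once the sentence is split into tokens. A minimal sketch, assuming whitespace-separated words; the sentence itself is made up for this illustration.

```python
from collections import Counter

# Hypothetical sentence, used only for this illustration.
sentence = "the quick brown fox jumps over the lazy dog and the cat"

# split() breaks the sentence on whitespace; Counter tallies each word.
word_counts = Counter(sentence.split())

# Passing a string directly tallies its characters instead.
char_counts = Counter("collections")

print(word_counts.most_common(1))   # [('the', 3)]
print(char_counts['c'], char_counts['o'], char_counts['l'])   # 2 2 2
```

The only difference between the two cases is what gets iterated: a list of words versus the characters of a string.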
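defaultdict
A defaultdict initializes missing keys automatically: the first time a key is accessed, the default factory is called to create its value, so every key behaves as if it had already been initialized.
Here is how to create a dictionary whose values default to a given type: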
from collections import defaultdict

# Create a defaultdict whose values are of type int:
dict_1 = defaultdict(int)
# Create a defaultdict whose values are of type list:
dict_2 = defaultdict(list)
dict_2

output:
defaultdict(list, {})

s = 'from collections import defaultdict'
# Record every index at which each character appears
for index, char in enumerate(s):
    dict_2[char].append(index)
print(dict_2)

output:
defaultdict(<class 'list'>, {'f': [0, 26], 'r': [1, 21], 'o': [2, 6, 13, 20], 'm': [3, 18], ' ': [4, 16, 23], 'c': [5, 10, 33], 'l': [7, 8, 29], 'e': [9, 25], 't': [11, 22, 30, 34], 'i': [12, 17, 32], 'n': [14], 's': [15], 'p': [19], 'd': [24, 31], 'a': [27], 'u': [28]})
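The example above only exercises the list version. As a complementary sketch, the int version can be used for counting, shown here next to the equivalent plain-dict loop. The variable names `char_counts`, `plain_counts`, and `probe` are introduced only for this illustration.

```python
from collections import defaultdict

s = 'from collections import defaultdict'

# int() returns 0, so every missing key starts at 0.
char_counts = defaultdict(int)
for char in s:
    char_counts[char] += 1

# The same loop with a plain dict needs an explicit default.
plain_counts = {}
for char in s:
    plain_counts[char] = plain_counts.get(char, 0) + 1

# Merely reading a missing key inserts it with the default value.
probe = defaultdict(list)
_ = probe['unseen']

print(char_counts['o'], char_counts['t'])   # 4 4
print(dict(char_counts) == plain_counts)    # True
print(dict(probe))                          # {'unseen': []}
```

The last part is worth keeping in mind: a lookup on a defaultdict has the side effect of creating the key, so use `in` or `.get()` when you only want to test for presence.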