A small trick for speeding up Python dictionary lookups
阿新 • Published: 2020-12-18
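Consider this problem: a Python dictionary already holds 10 million key-value pairs, and 1,000 new pairs need to be merged into it. What is the fastest way to do that?
I tested two versions of the code. The slow one took about 300 seconds and the fast one about 0.3 seconds. The slow version spends almost all of its time on the lookup it performs in every iteration.
The likely culprit is the test if k in C14.keys(). If the script runs on Python 2, dict.keys() builds a new list of all 10 million keys on every call and then scans it linearly. On Python 3, keys() returns a view and the membership test is a hash lookup, the same as writing k in C14.
The improved version uses defaultdict(int), which removes the membership test altogether: a missing key starts at 0, so C14[k] += v handles both existing and new keys. For this kind of accumulation, prefer it to a dictionary created with dict().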
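To show that the lookup styles differ only in cost and not in result, here is a minimal sketch; it is not the code from the tests. The function names and the ten-key base dictionary are made up for illustration, and the sample values are the example list given in a comment in this post's code.

```python
from collections import Counter, defaultdict


def merge_with_membership_test(totals, new_values):
    """Plain dict: test the dict itself, without calling .keys()."""
    for key, count in Counter(new_values).items():
        if key in totals:  # hash lookup on the dict
            totals[key] += count
        else:
            totals[key] = count
    return totals


def merge_with_get(totals, new_values):
    """Plain dict: dict.get() supplies 0 for a missing key."""
    for key, count in Counter(new_values).items():
        totals[key] = totals.get(key, 0) + count
    return totals


def merge_with_defaultdict(totals, new_values):
    """defaultdict(int): a missing key is created as 0 on first access."""
    for key, count in Counter(new_values).items():
        totals[key] += count
    return totals


# Hypothetical stand-in for the 10-million-key dictionary.
base = {i: i for i in range(10)}
new_values = [42, 523, 23, 24, 3, 4, 1, 5, 3]

a = merge_with_membership_test(dict(base), new_values)
b = merge_with_get(dict(base), new_values)
c = merge_with_defaultdict(defaultdict(int, base), new_values)

print(a == b == c)   # all three approaches agree
print(a[3], a[42])   # key 3 existed (3 + 2), key 42 is new (1)
```

All three variants avoid building a list of keys, so any of them should scale to a large dictionary; defaultdict(int) is simply the shortest to write.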
Original code: extremely slow (especially when the existing dictionary is large)
# test slower code
import pandas as pd
import pickle
from collections import Counter
import os
from tqdm import tqdm
import time
from collections import defaultdict

C14 = dict()  # note: a plain dict here, not a defaultdict
for i in tqdm(range(10000000)):
    C14[i] = i

print("start processing test data:")
s_time = time.time()
data = pd.read_csv('../../test.gz')
print("read test.gz over")

print("start to process C14:")
s_tt = time.time()
# data is a DataFrame; data['C14'].values is a NumPy array that can be
# iterated like a list, e.g. [42,523,23,24,3,4,1,5,3]
C14_list = data['C14'].values
for k, v in tqdm(Counter(C14_list).items()):
    if k in C14.keys():  # this membership test is where the time goes
        C14[k] += v
    else:
        C14[k] = v
e_tt = time.time()
print("C14 over,cost time:{} seconds".format(e_tt - s_tt))
e_time = time.time()
print("test data processing over, cost {} minutes".format((e_time - s_time) / 60))
Improved code: extremely fast
# test code
import pandas as pd
import pickle
from collections import Counter
import os
from tqdm import tqdm
import time
from collections import defaultdict

# defaultdict(int): reading a key that does not exist yet creates it
# with the default value int(), which is 0
C14 = defaultdict(int)
for i in tqdm(range(10000000)):
    C14[i] = i

print("start processing test data:")
s_time = time.time()
data = pd.read_csv('../../test.gz')
print("read test.gz over")

print("start to process C14:")
s_tt = time.time()
C14_list = data['C14'].values
for k, v in tqdm(Counter(C14_list).items()):
    C14[k] += v
    # the four lines below are no longer needed
    # if k in C14.keys():
    #     C14[k] += v
    # else:
    #     C14[k] = v
e_tt = time.time()
print("C14 over,cost time:{} seconds".format(e_tt - s_tt))
e_time = time.time()
print("test data processing over, cost {} minutes".format((e_time - s_time) / 60))
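The scripts above need test.gz, so here is a self-contained timing sketch that can be run without it. It rests on several assumptions: the sizes are scaled down from 10 million keys and 1,000 updates, the function names are hypothetical, and list(d.keys()) is used to imitate a keys() call that returns a list (as on Python 2) so that the effect is visible on Python 3 as well. Absolute timings will vary by machine.

```python
import time
from collections import defaultdict

N_EXISTING = 200000  # assumption: scaled down from 10 million keys
N_UPDATES = 100      # assumption: scaled down from 1,000 new pairs

# Half of the update keys already exist, half are new.
first_key = N_EXISTING - N_UPDATES // 2
updates = [(k, 1) for k in range(first_key, first_key + N_UPDATES)]


def run_list_scan():
    """Membership test against a freshly built list of all keys."""
    d = {i: i for i in range(N_EXISTING)}
    start = time.perf_counter()
    for k, v in updates:
        if k in list(d.keys()):  # imitates a list-returning keys()
            d[k] += v
        else:
            d[k] = v
    return d, time.perf_counter() - start


def run_defaultdict():
    """No membership test: a missing key starts at 0."""
    d = defaultdict(int, ((i, i) for i in range(N_EXISTING)))
    start = time.perf_counter()
    for k, v in updates:
        d[k] += v
    return d, time.perf_counter() - start


slow_result, slow_seconds = run_list_scan()
fast_result, fast_seconds = run_defaultdict()

print("list scan:   {:.6f} seconds".format(slow_seconds))
print("defaultdict: {:.6f} seconds".format(fast_seconds))
print("same result:", slow_result == fast_result)
```

Both functions produce the same dictionary; only the time spent per update differs, and the gap grows with the number of existing keys because the list-based test has to copy and scan all of them on every iteration.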