Using groupby to fix slow per-group filtering of a DataFrame
阿新 • Published: 2019-01-30
Original code:
import os
import time

# df_all_performance, list_valid_perfor_inventory and cs are defined elsewhere in the script.
for name in list_valid_perfor_inventory:
    time_stamp = time.time()
    # df_all_performance has about 1.7 million rows; this statement takes about 2 s.
    df_tmp1 = df_all_performance[df_all_performance['res_ins_id'] == name]
    if df_tmp1.empty:
        continue
    del df_tmp1['res_ins_id']
    print('choose time ')
    print(time.time() - time_stamp)
    time_stamp = time.time()
    df_tmp1.to_csv(path_or_buf=os.path.join(cs.max_avg_busy_dir, str(name) + '.csv'))
    print(time.time() - time_stamp)
Optimized code:
import os

groups = df_all_performance.groupby('res_ins_id')  # group once, up front
for name in list_valid_perfor_inventory:
    # get_group raises KeyError for a name that has no rows, so test membership first.
    if name not in groups.groups:
        continue
    df_tmp1 = groups.get_group(name)  # then fetch each group's rows; returns a DataFrame
    del df_tmp1['res_ins_id']
    df_tmp1.to_csv(path_or_buf=os.path.join(cs.max_avg_busy_dir, str(name) + '.csv'))
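Why this helps: the boolean filter compares every row of the frame against `name` on each loop iteration, so the whole 1.7-million-row column is rescanned once per name, while `groupby` partitions the rows a single time and each `get_group` call only looks up rows that are already indexed. The sketch below is a minimal, self-contained check that the two approaches return the same per-name frames. The tiny DataFrame, the `value` column, and the helper names `split_by_mask` and `split_by_groupby` are made up for illustration and do not come from the original script.

```python
import pandas as pd

# Synthetic stand-in for df_all_performance (hypothetical data, not from the post).
df_all_performance = pd.DataFrame({
    "res_ins_id": ["a", "b", "a", "c", "b", "a"],
    "value": [1, 2, 3, 4, 5, 6],
})
# "d" has no rows, mimicking an inventory entry without performance data.
list_valid_perfor_inventory = ["a", "b", "d"]


def split_by_mask(df, names):
    """Original approach: one boolean scan over the whole frame per name."""
    out = {}
    for name in names:
        part = df[df["res_ins_id"] == name]
        if part.empty:
            continue
        out[name] = part.drop(columns="res_ins_id")
    return out


def split_by_groupby(df, names):
    """Optimized approach: group once, then look up each name."""
    groups = df.groupby("res_ins_id")
    out = {}
    for name in names:
        # get_group raises KeyError for unknown keys, so skip them explicitly.
        if name not in groups.groups:
            continue
        out[name] = groups.get_group(name).drop(columns="res_ins_id")
    return out


by_mask = split_by_mask(df_all_performance, list_valid_perfor_inventory)
by_group = split_by_groupby(df_all_performance, list_valid_perfor_inventory)

# Both approaches should yield identical frames for every name that has rows.
same = all(by_mask[name].equals(by_group[name]) for name in by_mask)
```

If every group needs to be written out rather than a chosen subset, iterating directly with `for name, part in df_all_performance.groupby('res_ins_id')` visits only the groups that exist, which removes the need for the membership check.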