
safegraph data preprocessing (3): splitting a CSV file by the distinct values of a specified column

Tags: safegraph, python

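Split Nin1.csv by the distinct values of the region column and save each subset as xxx-region.csv. After the split, the sizes of all 55 sub-files were checked and add up to the size of the parent file.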

import pandas as pd
import time
# fileLocation='D:/2020-06-08-weekly-patterns.csv'
# fileLocation='D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part1.csv'
file_loc='D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-5in1.csv'
# Time how long it takes to load the whole CSV
timee = time.process_time()
df = pd.read_csv(file_loc)
print(time.process_time() - timee)
timee = time.process_time()

# Distinct values of the 'region' column
regions = df['region'].unique()
print(regions)
# df1=df[df["region"].str.contains("NY")]
# print(time.process_time()-timee)
# timee=time.process_time()
length = len(regions)

for i in range(0, length):
    print('processing '+regions[i]+', schedule:'+str(i+1)+'/'+str(length)+'...')
    # Exact match on the region code (str.contains would do a regex substring match)
    df_by_region = df[df['region'] == regions[i]]
    # file_loc[:-8] drops the trailing '5in1.csv' so the region code can replace it
    new_file_loc = file_loc[:-8] + regions[i] + '.csv'
    print('new_file_loc:')
    print(new_file_loc)
    # Save the subset
    df_by_region.to_csv(new_file_loc, index=False)
    print(regions[i] + ' success!')

print('ALL region SUCCESS!')

Output:

['IA' 'TX' 'OK' 'OR' 'NC' 'AR' 'PA' 'NY' 'WA' 'MS' 'MA' 'NJ' 'IL' 'VA'
 'VT' 'FL' 'WV' 'OH' 'MI' 'AZ' 'IN' 'GA' 'MN' 'ME' 'MO' 'TN' 'SC' 'CA'
 'WY' 'WI' 'NH' 'CO' 'UT' 'NE' 'KS' 'AL' 'MD' 'AK' 'LA' 'SD' 'HI' 'DE'
 'KY' 'MT' 'CT' 'RI' 'NV' 'ID' 'NM' 'PR' 'DC' 'ND' 'GU' 'VI' 'AS']
processing IA, schedule:1/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-IA.csv
IA success!
processing TX, schedule:2/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-TX.csv
TX success!
...
...
...
processing VI, schedule:54/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-VI.csv
VI success!
processing AS, schedule:55/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-AS.csv
AS success!
ALL region SUCCESS!
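
The script filters the full DataFrame once for every region. As a hedged alternative, the same split can be sketched with `groupby`, which partitions the rows in a single pass. The function name `split_csv_by_column`, its parameters, and the tiny demo data below are made up for illustration and are not part of the original script; the output naming (`<stem>-<value>.csv`) is also an assumption that only loosely mirrors the original `xxx-region.csv` pattern.

```python
import os
import tempfile

import pandas as pd


def split_csv_by_column(src_path, column, out_dir):
    """Write one CSV per distinct value of `column`; return {value: output path}."""
    df = pd.read_csv(src_path)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    written = {}
    # groupby partitions the rows in one pass. Rows whose key is NaN are
    # dropped by default, so they would not end up in any sub-file.
    for value, group in df.groupby(column, sort=False):
        out_path = os.path.join(out_dir, f"{stem}-{value}.csv")
        group.to_csv(out_path, index=False)
        written[value] = out_path
    return written


# Tiny made-up dataset, only to exercise the function.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "patterns.csv")
pd.DataFrame(
    {"region": ["NY", "TX", "NY"], "visits": [3, 5, 7]}
).to_csv(demo_src, index=False)

demo_files = split_csv_by_column(demo_src, "region", demo_dir)
print(sorted(demo_files))  # ['NY', 'TX']
```

For a real weekly-patterns file the call would take the CSV path, `'region'`, and a target directory. Whether this is faster than the loop depends on the data and was not measured here.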

The large file has been split into smaller files, one per region:
[Image]
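
The post states that the sub-files add up to the parent file. Because every sub-file carries its own copy of the header row, comparing row counts is one alternative way to run such a check. The sketch below is an assumption about how that could look, not the check the author actually ran; `count_rows`, `row_counts`, and the made-up demo data are hypothetical names introduced here.

```python
import os
import tempfile

import pandas as pd


def count_rows(path, chunksize=100_000):
    """Count data rows (header excluded) while reading the file in chunks."""
    return sum(len(chunk) for chunk in pd.read_csv(path, chunksize=chunksize))


def row_counts(parent_path, child_paths):
    """Return (parent_rows, total_child_rows) so the two can be compared."""
    parent_rows = count_rows(parent_path)
    child_rows = sum(count_rows(p) for p in child_paths)
    return parent_rows, child_rows


# Made-up parent file and per-region children, only to exercise the check.
check_dir = tempfile.mkdtemp()
parent = os.path.join(check_dir, "parent.csv")
frame = pd.DataFrame({"region": ["IA", "TX", "IA", "OK"], "visits": [1, 2, 3, 4]})
frame.to_csv(parent, index=False)

children = []
for code in frame["region"].unique():
    child = os.path.join(check_dir, f"parent-{code}.csv")
    frame[frame["region"] == code].to_csv(child, index=False)
    children.append(child)

parent_rows, child_rows = row_counts(parent, children)
print(parent_rows, child_rows)  # 4 4
```

If the two numbers differ, some rows were lost or duplicated during the split, for example rows with a missing region value.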