Scripts for splitting and naming large files - Python
阿新 • Published: 2019-09-29
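Splitting and naming log files
At work I often receive log files from testers and customers, and it is not unusual for them to run from several hundred MB to a GB; an overnight stress test produces a considerable volume of logs. Splitting such logs is therefore unavoidable. Because tracking down a problem usually starts from a point in time, it is best to name each split file after the start and end timestamps of the log entries it contains, which is the most intuitive form to work with. Below I share two scripts, one for splitting and one for naming, in the hope that they help a little.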
Splitting a large file
Usage:
- python split_big_file.py
- Enter the full path of the file
- Enter the number of lines you want in each split file
Just wait.
# -*- coding: utf-8 -*-
# Split a big text file into part files with a fixed number of lines each.
import os
import subprocess
import sys

path = input('Full path of the big file: ').strip()
if not os.path.isfile(path):
    print('Not a file: ' + path)
    sys.exit(1)

line_num = int(input('Lines per part file: '))
if line_num <= 0:
    print('The line count must be positive')
    sys.exit(1)

# Parts go into a sibling directory named DIR_<file name>, with dots
# in the file name replaced by underscores.
parent, ori_name = os.path.split(path)
dir_name = os.path.join(parent, ('DIR_' + ori_name).replace('.', '_'))
os.makedirs(dir_name, exist_ok=True)

count = 0
part_file = None
with open(path, 'rb') as f:
    for line in f:
        # Start a new part every line_num lines: 0.part.txt, 1.part.txt, ...
        if count % line_num == 0:
            if part_file:
                part_file.close()
            part_name = '%d.part.txt' % (count // line_num)
            part_file = open(os.path.join(dir_name, part_name), 'wb')
        # Each line already ends with its own newline, so write it unchanged.
        part_file.write(line)
        count += 1
        if count % 100000 == 0:
            print('Processed: %d lines' % count)
if part_file:
    part_file.close()
print('Processed: %d lines' % count)

# Rename every part after its first and last timestamps.
renamer = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                       'get_name_logfile.py')
subprocess.call([sys.executable, renamer, dir_name])
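The same line-count split can also be packaged as a function, which makes it easier to reuse and to test. The following is only a minimal sketch under stated assumptions: `split_by_lines` and its parameter names are hypothetical, chosen for illustration, and are not part of the script above. It relies on `itertools.islice` to take a fixed number of lines at a time from the open file, so at most one part is held in memory.

```python
import itertools
import os


def split_by_lines(src_path, out_dir, lines_per_part):
    """Split src_path into numbered part files of at most lines_per_part lines.

    Returns the part-file paths in order. The function name and signature
    are illustrative assumptions, not part of the original scripts.
    """
    if lines_per_part <= 0:
        raise ValueError("lines_per_part must be positive")
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(src_path, "rb") as src:
        for index in itertools.count():
            # islice consumes up to lines_per_part lines from the open file.
            chunk = list(itertools.islice(src, lines_per_part))
            if not chunk:
                break
            part_path = os.path.join(out_dir, "%d.part.txt" % index)
            with open(part_path, "wb") as part:
                # Lines keep their original newlines, so write them as-is.
                part.writelines(chunk)
            parts.append(part_path)
    return parts
```

Calling `split_by_lines("big.log", "DIR_big_log", 100000)` would write `0.part.txt`, `1.part.txt`, and so on into `DIR_big_log`; the last part simply holds whatever lines remain.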
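Renaming files by the timestamps of their first and last lines
Usage:
- python get_name_logfile.py log.txt
- python get_name_logfile.py logs
The argument can be either a file or a folder. For a folder, every file directly inside it is processed (files in nested subfolders are not visited).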
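The file-or-folder rule described above can be sketched as a small helper. This is an illustrative assumption rather than the script's own code: `collect_files` is a hypothetical name, and it uses `os.path.join` so that no manual path separator is needed.

```python
import os


def collect_files(path):
    """Return the regular files named by path, without recursing.

    A file path yields a one-element list; a directory yields its direct
    child files only; anything else yields an empty list.
    """
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        # sorted() only makes the order predictable; listdir order is arbitrary.
        children = [os.path.join(path, entry) for entry in sorted(os.listdir(path))]
        # Subdirectories fail the isfile check and are skipped, not entered.
        return [child for child in children if os.path.isfile(child)]
    return []
```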
# -*- coding: utf-8 -*-
# Rename log files to "<first timestamp>__<last timestamp><extension>".
import os
import re
import sys

path = sys.argv[1]
names = []
if os.path.isfile(path):
    names.append(path)
elif os.path.isdir(path):
    # Direct children only; subfolders are not visited.
    for entry in os.listdir(path):
        fullpath = os.path.join(path, entry)
        if os.path.isfile(fullpath):
            names.append(fullpath)
else:
    print('Not a file or folder: ' + path)
print('%d file(s) to rename' % len(names))

# Expected timestamp format: 05-26 18:20:42.093
# For ISO-style timestamps, use this pattern instead:
#   r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}'
date_reg = r'\d{2}-\d{2} +\d{2}:\d{2}:\d{2}\.\d{1,10}'
time_reg = r'\d{2}:\d{2}:\d{2}\.\d{1,10}'

for name in names:
    print('name=' + name)
    with open(name, 'r', encoding='utf-8', errors='replace') as f:
        lines = f.read().split('\n')

    # Search at most 10 lines from each end, and never more than half the file.
    head_len = min(10, len(lines) // 2)

    # Head: the first line that carries a date gives the start time.
    start_time = '(start_time-'
    for line in lines[:head_len]:
        res = re.search(date_reg, line)
        if res:
            start_time = res.group(0)
            break

    # Tail: scan backwards from the last line for the end time.
    end_time = '-end_time)'
    for line in reversed(lines[len(lines) - head_len:]):
        res = re.search(time_reg, line)
        if res:
            end_time = res.group(0)
            break

    folder, ori_name = os.path.split(name)
    # ':' is not allowed in Windows file names, so replace it with '-'.
    new_name = (start_time.replace(':', '-') + '__' +
                end_time.replace(':', '-') + os.path.splitext(ori_name)[1])
    print('%s -> %s' % (ori_name, new_name))
    os.replace(name, os.path.join(folder, new_name))
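To see how the new name is assembled, the naming step can be isolated into a pure function that touches no files. This is a simplified sketch, not the script itself: `timestamp_name`, `ext`, and `scan` are hypothetical names, the sample log lines in the usage note are made up, and unlike the script it always scans up to `scan` lines from each end rather than capping the scan at half the file.

```python
import re

# Same timestamp shapes as in the script: "05-26 18:20:42.093" and "18:20:42.093".
DATE_RE = re.compile(r"\d{2}-\d{2} +\d{2}:\d{2}:\d{2}\.\d{1,10}")
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{1,10}")


def timestamp_name(lines, ext=".txt", scan=10):
    """Build '<start>__<end><ext>' from the first and last timestamps in lines."""
    if scan <= 0:
        raise ValueError("scan must be positive")
    # Placeholders are kept when no timestamp is found, as in the script.
    start = "(start_time-"
    end = "-end_time)"
    for line in lines[:scan]:
        match = DATE_RE.search(line)
        if match:
            start = match.group(0)
            break
    for line in reversed(lines[-scan:]):
        match = TIME_RE.search(line)
        if match:
            end = match.group(0)
            break
    # ':' is replaced because it is not valid in Windows file names.
    return start.replace(":", "-") + "__" + end.replace(":", "-") + ext
```

For example, with the lines `05-26 18:20:42.093 I/app: boot` and `05-26 18:25:01.500 I/app: done`, the function returns `05-26 18-20-42.093__18-25-01.500.txt`.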
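Finally
Feel free to browse my GitHub to see whether there is anything else you can use. At the moment it mainly holds my own machine learning projects, assorted Python script tools, data analysis and mining projects, plus the people I follow and the projects I have forked: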
https://github.com/NemoHoHal