
Crawler logs: keep only the last 7 days of logs

If a crawler runs continuously on a server, all of its log output ends up in a single, ever-growing file, which makes the logs hard to manage.

import datetime

custom_settings = {
    'DEFAULT_REQUEST_HEADERS': {
        'User-Agent':
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Host': 'v.qq.com',
        'Proxy-Connection': 'keep-alive',
    },
    # write each run's log to a new, timestamp-named file under logs/
    'LOG_FILE': 'logs/PlayInfoDemoSpider_' + str(datetime.datetime.now()) + '.log',
    'REDIRECT_ENABLED': False,
    'DOWNLOAD_DELAY': 0,
    'DOWNLOAD_TIMEOUT': 3,
    'RETRY_TIMES': 30,
    'CONCURRENT_REQUESTS': 30,
    'CONCURRENT_REQUESTS_PER_DOMAIN': 200,
    'CONCURRENT_REQUESTS_PER_IP': 0,
    # 'DOWNLOADER_MIDDLEWARES': {'bo_lib.scrapy_tools.BOProxyMiddlewareVPS': 740},
}
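For orientation, here is a minimal sketch of where such a custom_settings block typically sits. Only the logging-related setting from above is repeated; the class skeleton itself is my assumption rather than the original spider (which the post does not show), and the logs/ directory is assumed to already exist.

import datetime
import scrapy

class PlayInfoDemoSpider(scrapy.Spider):
    # hypothetical skeleton; only the log-related setting from above is kept
    name = 'PlayInfoDemoSpider'
    custom_settings = {
        # each run writes to a fresh file such as
        # logs/PlayInfoDemoSpider_2024-01-05 09:30:12.345678.log
        # (the logs/ directory must already exist)
        'LOG_FILE': 'logs/PlayInfoDemoSpider_' + str(datetime.datetime.now()) + '.log',
    }

    def parse(self, response):
        pass

Because str(datetime.datetime.now()) renders as 'YYYY-MM-DD HH:MM:SS.ffffff', the date part of every log file name can later be recovered with file_name.split('_')[1].split(' ')[0], which is exactly what the deletion code below relies on.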

The 'LOG_FILE' entry in custom_settings configures how the crawler's logs are generated: each run writes to its own timestamped log file.

The following code deletes the old log files:

import datetime
import os

def delete_old_logs(name, days):
    today_str = str(datetime.date.today())
    # convert back to a datetime; the time part is midnight (0:00) of today
    today = datetime.datetime.strptime(today_str, '%Y-%m-%d')
    target_day = today - datetime.timedelta(days=days)
    # first entry of os.walk: the logs/ directory itself
    root, dirs, files = [x for x in os.walk('logs')][0]
    for file_name in files:
        if name not in file_name:
            continue
        try:
            # file names look like '<name>_YYYY-MM-DD HH:MM:SS.ffffff.log'
            log_create_day_str = file_name.split('_')[1].split(' ')[0]
            log_create_day = datetime.datetime.strptime(log_create_day_str, '%Y-%m-%d')
        except (IndexError, ValueError):
            continue
        if log_create_day < target_day:
            file_path = root + '/' + file_name
            os.remove(file_path)
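The post does not show where delete_old_logs is invoked, so here is a minimal usage sketch. The spider name and the 7-day window follow the snippet and title above; the call site (a small script run before each crawl, or from a daily cron job) is my assumption.

if __name__ == '__main__':
    # prune logs whose date prefix is more than 7 days old;
    # today's log file (and anything newer than the cutoff) is kept
    delete_old_logs('PlayInfoDemoSpider', 7)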