
torch.nn.parallel.DistributedDataParallel: A Summary

Tags: pytorch

Add to the config (argument parser):

parser.add_argument('--local_rank', type=int, default=-1)
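torch.distributed.launch passes --local_rank to each spawned process automatically, so the option only needs a default (here -1, which can also serve as a flag for non-distributed runs).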

Add in the training script:

import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler

Whenever the code performs a write operation (logging, saving checkpoints, etc.), check local_rank first so that only one process writes.
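A minimal sketch of such a guard, following the self.opt convention used in the snippets below; the checkpoint filename is just a placeholder:

if self.opt.local_rank == 0:
    # only rank 0 touches the filesystem / console
    torch.save(self.netD.state_dict(), "netD.pth")  # placeholder path
    print("checkpoint saved")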

Initialization

dist.init_process_group(backend='nccl') 
torch.cuda.set_device(self.opt.local_rank)
torch.autograd.set_detect_anomaly(True)  # anomaly detection for debugging; comment out for real training runs
self.device = torch.device("cuda", self.opt.local_rank) if torch.cuda.is_available() else torch.device("cpu")
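With the default env:// initialization, init_process_group reads the MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE environment variables that torch.distributed.launch sets for every process, so no extra arguments are needed here.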

Model setup (if the model contains BatchNorm layers, an extra SyncBatchNorm conversion is needed; remember to pass the GPU index for every model you wrap):

self.netD = torch.nn.SyncBatchNorm.convert_sync_batchnorm(self.netD)
self.netD = torch.nn.parallel.DistributedDataParallel(
    self.netD, device_ids=[self.opt.local_rank], output_device=self.opt.local_rank,
    find_unused_parameters=True)
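Note that the model is expected to already sit on the local GPU (e.g. self.netD.to(self.device)) before it is wrapped; DistributedDataParallel does not move it. Also, find_unused_parameters=True is only needed when some parameters receive no gradient in a given forward pass, and it adds per-iteration overhead, so drop it if every parameter is always used.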

DataLoader setup (do not set shuffle=True, because DistributedSampler shuffles on its own; the test set can be left as an ordinary DataLoader):

train_dataset = self.dataset(
    self.opt.data_path, train_filenames, self.opt.data_height, self.opt.data_width,
    self.opt.data_frame_ids, 4, is_train=True, img_ext=img_ext)
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
self.train_loader = torch.utils.data.DataLoader(
    train_dataset, self.opt.batch_size,  # shuffle=True,
    num_workers=self.opt.data_workers, pin_memory=True, drop_last=True, sampler=train_sampler)
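One detail worth adding to the epoch loop: DistributedSampler only reshuffles when it is told the current epoch, so call set_epoch() every epoch. A minimal sketch, assuming a hypothetical self.opt.num_epochs option:

for epoch in range(self.opt.num_epochs):
    # without this call every epoch sees the same sample order
    train_sampler.set_epoch(epoch)
    for inputs in self.train_loader:
        ...  # forward / backward as usual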
        

Training

export CUDA_VISIBLE_DEVICES=0,1
python -m torch.distributed.launch --nproc_per_node=2 train_ablation_multi.py

nproc_per_node is the number of GPUs to use on this machine.
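On newer PyTorch versions the same launch can be done with torchrun --nproc_per_node=2 train_ablation_multi.py; note that torchrun exposes the local rank through the LOCAL_RANK environment variable rather than the --local_rank argument, so the argparse option above would need adjusting.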

References:

https://www.cnblogs.com/JunzhaoLiang/archive/2004/01/13/13535952.html

https://www.cnblogs.com/yh-blog/p/12877922.html