faster-rcnn Code Reading 2
阿新 • Published: 2018-12-16
2. Training
Next, go back to line 160 of train.py, where training is started by calling the sw.train_model method:
    def train_model(self, max_iters):
        """Network training loop."""
        last_snapshot_iter = -1
        timer = Timer()
        model_paths = []
        while self.solver.iter < max_iters:
            # Make one SGD update
            timer.tic()
            self.solver.step(1)
            timer.toc()
            if self.solver.iter % (10 * self.solver_param.display) == 0:
                print 'speed: {:.3f}s / iter'.format(timer.average_time)

            if self.solver.iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
                last_snapshot_iter = self.solver.iter
                model_paths.append(self.snapshot())

        if last_snapshot_iter != self.solver.iter:
            model_paths.append(self.snapshot())
        return model_paths
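In this method, self.solver.step(1) runs one forward pass and one backward pass through the network, followed by one SGD update. During the forward pass, data flows from the first layer to the last, where the loss is computed. The gradient of the loss with respect to each layer's input is then computed from the last layer back to the first. The sections below walk through the faster-rcnn algorithm layer by layer.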
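The snapshot bookkeeping in train_model can be tried out without Caffe. The following is a minimal sketch, not the project's code: StubSolver is a hypothetical stand-in for caffe.SGDSolver that only counts iterations, and the function records iteration numbers instead of writing model files.

```python
class StubSolver(object):
    """Hypothetical stand-in for caffe.SGDSolver: it only counts iterations."""

    def __init__(self):
        self.iter = 0

    def step(self, n):
        # A real solver would run n forward/backward passes and update weights.
        self.iter += n


def train_model(solver, max_iters, snapshot_iters):
    """Mirror the loop structure of train_model, recording snapshot iterations."""
    last_snapshot_iter = -1
    snapshots = []
    while solver.iter < max_iters:
        solver.step(1)
        if solver.iter % snapshot_iters == 0:
            last_snapshot_iter = solver.iter
            snapshots.append(solver.iter)
    # Take a final snapshot only if the last iteration was not already saved.
    if last_snapshot_iter != solver.iter:
        snapshots.append(solver.iter)
    return snapshots


print(train_model(StubSolver(), max_iters=25, snapshot_iters=10))  # [10, 20, 25]
print(train_model(StubSolver(), max_iters=20, snapshot_iters=10))  # [10, 20]
```

The check after the loop is what guarantees a final model is saved when max_iters is not a multiple of the snapshot interval, while avoiding a duplicate snapshot when it is.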
2.1 input-data layer
The first layer is implemented in Python code. Its prototxt description is:
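For reference, in the py-faster-rcnn end-to-end training prototxt the input-data layer is declared roughly as follows. The num_classes value of 21 is an assumption that corresponds to PASCAL VOC (20 classes plus background); other datasets will use a different value.

```prototxt
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}
```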
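From this we can see that the input-data layer has three outputs: data, im_info, and gt_boxes. It is implemented by the RoIDataLayer class in faster-rcnn/lib/roi_data_layer/layer.py. While the network is being constructed (that is, at self.solver = caffe.SGDSolver(solver_prototxt)), the class's setup method is called: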
    # Config values referenced by setup()
    __C.TRAIN.IMS_PER_BATCH = 1
    __C.TRAIN.SCALES = [600]
    __C.TRAIN.MAX_SIZE = 1000
    __C.TRAIN.HAS_RPN = True
    __C.TRAIN.BBOX_REG = True

    def setup(self, bottom, top):
        """Setup the RoIDataLayer."""

        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.load(self.param_str_)

        self._num_classes = layer_params['num_classes']

        self._name_to_top_map = {}

        # data blob: holds a batch of N images, each with 3 channels
        idx = 0
        top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
            max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
        self._name_to_top_map['data'] = idx
        idx += 1

        if cfg.TRAIN.HAS_RPN:
            top[idx].reshape(1, 3)
            self._name_to_top_map['im_info'] = idx
            idx += 1

            top[idx].reshape(1, 4)
            self._name_to_top_map['gt_boxes'] = idx
            idx += 1
        else: # not using RPN
            # rois blob: holds R regions of interest, each is a 5-tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle (x1, y1, x2, y2)
            top[idx].reshape(1, 5)
            self._name_to_top_map['rois'] = idx
            idx += 1

            # labels blob: R categorical labels in [0, ..., K] for K foreground
            # classes plus background
            top[idx].reshape(1)
            self._name_to_top_map['labels'] = idx
            idx += 1

            if cfg.TRAIN.BBOX_REG:
                # bbox_targets blob: R bounding-box regression targets with 4
                # targets per class
                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_targets'] = idx
                idx += 1

                # bbox_inside_weights blob: At most 4 targets per roi are active;
                # this binary vector specifies the subset of active targets
                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_inside_weights'] = idx
                idx += 1

                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_outside_weights'] = idx
                idx += 1

        print 'RoiDataLayer: name_to_top:', self._name_to_top_map
        assert len(top) == len(self._name_to_top_map)
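The setup method mainly defines the shape of each output (top) blob, which also allocates its memory. Note that during forward propagation the shape of each top is redefined, and the shapes set at these two points usually differ.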
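To make the difference between the two sets of shapes concrete, here is a minimal sketch in plain Python. setup_shapes repeats the placeholder shapes from setup for the HAS_RPN case. forward_shapes is a hypothetical helper that rests on assumptions not shown in this section: the image's shorter side is scaled to SCALES[0], the longer side is capped at MAX_SIZE, and each ground-truth box is stored as (x1, y1, x2, y2, class).

```python
# Config values quoted in the listing above.
IMS_PER_BATCH = 1
SCALES = [600]
MAX_SIZE = 1000


def setup_shapes():
    """Placeholder top shapes assigned by setup() when HAS_RPN is True."""
    return {
        'data': (IMS_PER_BATCH, 3, max(SCALES), MAX_SIZE),
        'im_info': (1, 3),
        'gt_boxes': (1, 4),
    }


def forward_shapes(im_height, im_width, num_gt_boxes):
    """Assumed per-image top shapes at forward time (hypothetical helper)."""
    short_side = min(im_height, im_width)
    long_side = max(im_height, im_width)
    # Assumption: scale the shorter side to the target size ...
    scale = float(SCALES[0]) / short_side
    # ... unless that would push the longer side past MAX_SIZE.
    if round(scale * long_side) > MAX_SIZE:
        scale = float(MAX_SIZE) / long_side
    height = int(round(im_height * scale))
    width = int(round(im_width * scale))
    return {
        'data': (1, 3, height, width),
        'im_info': (1, 3),
        # Assumption: 4 box coordinates plus a class label per box.
        'gt_boxes': (num_gt_boxes, 5),
    }


print(setup_shapes()['data'])               # (1, 3, 600, 1000)
print(forward_shapes(375, 500, 2)['data'])  # (1, 3, 600, 800)
```

Under these assumptions a 375x500 image with two annotated objects yields a data blob of 600x800 rather than the 600x1000 placeholder, and gt_boxes grows from one row to one row per object, which is why the shapes set in setup act only as initial allocations.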