==6== Training Mask R-CNN with the TensorFlow Object Detection API
1. Data Preparation
Create a folder my_mask_rcnn under object_detection and put the downloaded data into it.
If you don't want to label the data yourself, download the related files directly from the link; it mainly contains the original images, the annotated data in json format, and Abyssinian_label_map.pbtxt (the class label map).
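For reference, a one-class label map in the pbtxt format used by the TF Object Detection API looks roughly like this (the name string must match the label used in the labelme json files):

```
item {
  id: 1
  name: 'Abyssinian'
}
```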
2. Generating train.record and val.record
The create_tf_record.py used here is a version modified by shirhe-lyh:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Convert a raw dataset to TFRecord for object_detection.

Created on Sun Aug 26 10:57:09 2018
@author: shirhe-lyh

Please note that this tool only applies to labelme's annotations (json files).

Example usage:
    python3 create_tf_record.py \
        --images_dir=your absolute path to read images \
        --annotations_json_dir=your path to annotation json files \
        --label_map_path=your path to label_map.pbtxt \
        --output_path=your path to write .record
"""

import cv2
import glob
import hashlib
import io
import json
import numpy as np
import os
import PIL.Image
import tensorflow as tf

import read_pbtxt_file


flags = tf.app.flags

flags.DEFINE_string('images_dir', None, 'Path to images directory.')
flags.DEFINE_string('annotations_json_dir', 'datasets/annotations/train',
                    'Path to annotations directory.')
flags.DEFINE_string('label_map_path', None, 'Path to label map proto.')
flags.DEFINE_string('output_path', None, 'Path to the output tfrecord.')

FLAGS = flags.FLAGS


def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def bytes_list_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))


def float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def create_tf_example(annotation_dict, label_map_dict=None):
    """Converts an image and its annotations to a tf.Example proto.

    Args:
        annotation_dict: A dictionary containing the following keys:
            ['height', 'width', 'filename', 'sha256_key', 'encoded_jpg',
             'format', 'xmins', 'xmaxs', 'ymins', 'ymaxs', 'masks',
             'class_names'].
        label_map_dict: A dictionary mapping class names to indices.

    Returns:
        example: The converted tf.Example.

    Raises:
        ValueError: If label_map_dict is None or does not contain a
            class_name.
    """
    if annotation_dict is None:
        return None
    if label_map_dict is None:
        raise ValueError('`label_map_dict` is None')

    height = annotation_dict.get('height', None)
    width = annotation_dict.get('width', None)
    filename = annotation_dict.get('filename', None)
    sha256_key = annotation_dict.get('sha256_key', None)
    encoded_jpg = annotation_dict.get('encoded_jpg', None)
    image_format = annotation_dict.get('format', None)
    xmins = annotation_dict.get('xmins', None)
    xmaxs = annotation_dict.get('xmaxs', None)
    ymins = annotation_dict.get('ymins', None)
    ymaxs = annotation_dict.get('ymaxs', None)
    masks = annotation_dict.get('masks', None)
    class_names = annotation_dict.get('class_names', None)

    labels = []
    for class_name in class_names:
        # Note: the default must be None (not the string 'None'), otherwise
        # a missing class would never trigger the error below.
        label = label_map_dict.get(class_name, None)
        if label is None:
            raise ValueError('`label_map_dict` does not contain {}.'.format(
                class_name))
        labels.append(label)

    # PNG-encode every binary instance mask.
    encoded_masks = []
    for mask in masks:
        pil_image = PIL.Image.fromarray(mask.astype(np.uint8))
        output_io = io.BytesIO()
        pil_image.save(output_io, format='PNG')
        encoded_masks.append(output_io.getvalue())

    feature_dict = {
        'image/height': int64_feature(height),
        'image/width': int64_feature(width),
        'image/filename': bytes_feature(filename.encode('utf8')),
        'image/source_id': bytes_feature(filename.encode('utf8')),
        'image/key/sha256': bytes_feature(sha256_key.encode('utf8')),
        'image/encoded': bytes_feature(encoded_jpg),
        'image/format': bytes_feature(image_format.encode('utf8')),
        'image/object/bbox/xmin': float_list_feature(xmins),
        'image/object/bbox/xmax': float_list_feature(xmaxs),
        'image/object/bbox/ymin': float_list_feature(ymins),
        'image/object/bbox/ymax': float_list_feature(ymaxs),
        'image/object/mask': bytes_list_feature(encoded_masks),
        'image/object/class/label': int64_list_feature(labels)}
    example = tf.train.Example(features=tf.train.Features(
        feature=feature_dict))
    return example


def _get_annotation_dict(images_dir, annotation_json_path):
    """Gets bounding boxes and masks for one labelme json file.

    Args:
        images_dir: Path to images directory.
        annotation_json_path: Path to the annotated json file corresponding
            to the image. The json file is produced by labelme and has keys:
            ['lineColor', 'imageData', 'fillColor', 'imagePath', 'shapes',
             'flags'].

    Returns:
        annotation_dict: A dictionary containing the following keys:
            ['height', 'width', 'filename', 'sha256_key', 'encoded_jpg',
             'format', 'xmins', 'xmaxs', 'ymins', 'ymaxs', 'masks',
             'class_names'], or None if the image or the json file does not
            exist.
    """
    if (not os.path.exists(images_dir) or
            not os.path.exists(annotation_json_path)):
        return None

    with open(annotation_json_path, 'r') as f:
        json_text = json.load(f)
    shapes = json_text.get('shapes', None)
    if shapes is None:
        return None
    image_relative_path = json_text.get('imagePath', None)
    if image_relative_path is None:
        return None
    image_name = image_relative_path.split('/')[-1]
    image_path = os.path.join(images_dir, image_name)
    image_format = image_name.split('.')[-1].replace('jpg', 'jpeg')
    if not os.path.exists(image_path):
        return None

    with tf.gfile.GFile(image_path, 'rb') as fid:
        encoded_jpg = fid.read()
    image = cv2.imread(image_path)
    height = image.shape[0]
    width = image.shape[1]
    key = hashlib.sha256(encoded_jpg).hexdigest()

    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    masks = []
    class_names = []
    hole_polygons = []
    for mark in shapes:
        class_name = mark.get('label')
        polygon = mark.get('points')
        # cv2.fillPoly expects integer pixel coordinates.
        polygon = np.array(polygon, dtype=np.int32)
        if class_name == 'hole':
            hole_polygons.append(polygon)
        else:
            # Only non-hole shapes are object instances, so only they get a
            # class name, a mask and a bounding box.
            class_names.append(class_name)
            mask = np.zeros(image.shape[:2])
            cv2.fillPoly(mask, [polygon], 1)
            masks.append(mask)
            # Bounding box in normalized coordinates.
            x = polygon[:, 0]
            y = polygon[:, 1]
            xmin = np.min(x)
            xmax = np.max(x)
            ymin = np.min(y)
            ymax = np.max(y)
            xmins.append(float(xmin) / width)
            xmaxs.append(float(xmax) / width)
            ymins.append(float(ymin) / height)
            ymaxs.append(float(ymax) / height)
    # Remove the hole regions from every mask (fillPoly works in place).
    if hole_polygons:
        for mask in masks:
            cv2.fillPoly(mask, hole_polygons, 0)

    annotation_dict = {'height': height,
                       'width': width,
                       'filename': image_name,
                       'sha256_key': key,
                       'encoded_jpg': encoded_jpg,
                       'format': image_format,
                       'xmins': xmins,
                       'xmaxs': xmaxs,
                       'ymins': ymins,
                       'ymaxs': ymaxs,
                       'masks': masks,
                       'class_names': class_names}
    return annotation_dict


def main(_):
    if not os.path.exists(FLAGS.images_dir):
        raise ValueError('`images_dir` does not exist.')
    if not os.path.exists(FLAGS.annotations_json_dir):
        raise ValueError('`annotations_json_dir` does not exist.')
    if not os.path.exists(FLAGS.label_map_path):
        raise ValueError('`label_map_path` does not exist.')

    label_map = read_pbtxt_file.get_label_map_dict(FLAGS.label_map_path)

    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

    num_annotations_skipped = 0
    annotations_json_path = os.path.join(FLAGS.annotations_json_dir, '*.json')
    for i, annotation_file in enumerate(glob.glob(annotations_json_path)):
        if i % 100 == 0:
            print('On image %d' % i)
        annotation_dict = _get_annotation_dict(FLAGS.images_dir,
                                               annotation_file)
        if annotation_dict is None:
            num_annotations_skipped += 1
            continue
        tf_example = create_tf_example(annotation_dict, label_map)
        writer.write(tf_example.SerializeToString())
    writer.close()

    print('Successfully created TFRecord to {}.'.format(FLAGS.output_path))


if __name__ == '__main__':
    tf.app.run()
```
You can use the first 8 of the downloaded json files for train and the last two for val;
in other words, split the datasets/annotations folder in the downloaded data into a train folder and a val folder (a small split script is sketched below).
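A minimal split script, assuming the folder layout described above (datasets/annotations holding all json files, with datasets/train and datasets/val as the targets):

```python
# Move the first 8 labelme json files to train/ and the rest to val/.
import glob
import os
import shutil

json_files = sorted(glob.glob('datasets/annotations/*.json'))
os.makedirs('datasets/train', exist_ok=True)
os.makedirs('datasets/val', exist_ok=True)
for path in json_files[:8]:
    shutil.move(path, 'datasets/train/')
for path in json_files[8:]:
    shutil.move(path, 'datasets/val/')
```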
Suppose all your original images are at path_to_images_dir, all labelme-produced json files for training are at path_to_train_annotations_json_dir (train), and all json files for validation are at path_to_val_annotations_json_dir (val). Enter my_mask_rcnn and run the following commands in the terminal one after another:
```
$ python3 create_tf_record.py \
    --images_dir=datasets/images \
    --annotations_json_dir=datasets/train \
    --label_map_path=Abyssinian_label_map.pbtxt \
    --output_path=my_train.record

$ python3 create_tf_record.py \
    --images_dir=datasets/images \
    --annotations_json_dir=datasets/val \
    --label_map_path=Abyssinian_label_map.pbtxt \
    --output_path=my_val.record
```
If you get an error that cv2 is missing, install it with pip3 install opencv-python.
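As an optional sanity check (not part of the original walkthrough), you can count how many examples were written to a record using TF1's record iterator:

```python
# Count the tf.Example protos stored in the generated TFRecord.
import tensorflow as tf

count = sum(1 for _ in tf.python_io.tf_record_iterator('my_train.record'))
print('my_train.record contains {} examples'.format(count))
```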
3. Model Download
Model download link
I downloaded mask_rcnn_inception_v2_coco_2018_01_28.tar.gz, placed it under my_mask_rcnn/, and extracted it into the current directory.
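Assuming the standard TensorFlow detection model zoo URL for this archive, the download and extraction look like this:

```
$ cd my_mask_rcnn
$ wget http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_v2_coco_2018_01_28.tar.gz
$ tar -xzvf mask_rcnn_inception_v2_coco_2018_01_28.tar.gz
```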
4. Modifying the Config File
Change num_classes: 90 to the total number of classes you want to detect; since we only detect one class here (the Abyssinian cat), change it to 1. You also need to change the 5 PATH_TO_BE_CONFIGURED placeholders in the file to the corresponding file paths. You can start from the config file that ships with the downloaded, extracted model:
```
# Mask R-CNN with Inception V2
# Configured for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 800
        max_dimension: 1365
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        predict_instance_masks: true
        mask_height: 15
        mask_width: 15
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 2
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "my_mask_rcnn/mask_rcnn_inception_v2_coco/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 2000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "my_mask_rcnn/my_train.record"
  }
  label_map_path: "my_mask_rcnn/Abyssinian_label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  num_examples: 10
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "my_mask_rcnn/my_val.record"
  }
  label_map_path: "my_mask_rcnn/Abyssinian_label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}
```
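The five PATH_TO_BE_CONFIGURED placeholders correspond to fine_tune_checkpoint in train_config, input_path and label_map_path in train_input_reader, and input_path and label_map_path in eval_input_reader; in the config above they have already been filled in with this tutorial's paths.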
5. Training
model_dir is where the data produced during training is stored (in older versions the flag was called train_dir).
```
$ python3 model_main.py \
    --model_dir=my_mask_rcnn/model_dir \
    --pipeline_config_path=my_mask_rcnn/mask_rcnn_inception_v2_coco.config
```
Training ran to around the 70th step~
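While training runs you can monitor the loss curves by pointing TensorBoard at model_dir:

```
$ tensorboard --logdir=my_mask_rcnn/model_dir
```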
Training also kept failing with out-of-memory errors. How can that be solved?
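One common mitigation (my suggestion, not something from the original post) is to lower the input resolution in the config's image_resizer, which directly cuts activation memory; the values below are only illustrative:

```
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 400   # default above was 800
    max_dimension: 683   # default above was 1365
  }
}
```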
6. Exporting the Results and Testing
6.1 Exporting the frozen graph
Although training ran out of memory, checkpoints were still saved up to a certain step (step 200 here).
The data generated in model_dir looks like this:
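(For a TF1 estimator run, this typically means a checkpoint index file, model.ckpt-200.data-00000-of-00001, model.ckpt-200.index, model.ckpt-200.meta, and events.* files readable by TensorBoard.)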
Run the export under object_detection.
First create a folder trained under my_mask_rcnn to hold the exported files.
```
$ python3 export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path my_mask_rcnn/mask_rcnn_inception_v2_coco.config \
    --trained_checkpoint_prefix my_mask_rcnn/model_dir/model.ckpt-200 \
    --output_directory my_mask_rcnn/trained/
```
Among the exported files there will be a frozen graph (frozen_inference_graph.pb).
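A quick optional check that the export succeeded is to parse the frozen graph with the TF1 API:

```python
# Load the exported frozen graph and report how many nodes it contains.
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('my_mask_rcnn/trained/frozen_inference_graph.pb',
                    'rb') as f:
    graph_def.ParseFromString(f.read())
print('Loaded graph with {} nodes'.format(len(graph_def.node)))
```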
6.2 Testing
First, copy two cat images into object_detection/test_images and rename them image<index>.jpg (e.g. image1.jpg, image2.jpg).
Next, right-click in research/ to open a terminal and run:

```
$ jupyter-notebook
```

In the browser, click object_detection and open object_detection_tutorial.ipynb.
Modify the notebook as follows (a sketch of the edits appears after these steps):
Change the test-image range: if the image indices go up to 5, change it to (1, 6), i.e. 1, 2, 3, 4, 5.
Because the model was downloaded ahead of time, there is no need to download it again, so delete the model-download cell.
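In the 2018-era tutorial notebook the relevant variables are named roughly as below (names may differ slightly between versions; the paths follow this tutorial's layout):

```python
import os

# Point the notebook at the exported frozen graph and our label map.
PATH_TO_FROZEN_GRAPH = 'my_mask_rcnn/trained/frozen_inference_graph.pb'
PATH_TO_LABELS = 'my_mask_rcnn/Abyssinian_label_map.pbtxt'

# Image indices 1..5 -> range(1, 6), matching image1.jpg ... image5.jpg.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR,
                                 'image{}.jpg'.format(i))
                    for i in range(1, 6)]
```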
The results are shown below. Because only 8 images were used for training and only 200 iterations were run, the output is rough; the point was just to get familiar with the process.
Summary: Reviewing the Process
1. Data needed for training: the original images (3-channel .jpg), the labeled data (.json), and the class label map (Abyssinian_label_map.pbtxt).
2. Convert to TFRecord format (requires create_tf_record.py, the original images, the .json data, and the class label map):
```
$ python3 create_tf_record.py \
    --images_dir=datasets/images \
    --annotations_json_dir=datasets/train \
    --label_map_path=Abyssinian_label_map.pbtxt \
    --output_path=my_train.record
```
3. Download the model and modify the config file that ships with it.
4. Export the frozen graph and test!