Converting a labelme dataset to COCO format

阿新 • Published: 2021-10-14

The file layout is as follows:

|-- images
|     |---  1.jpg
|     |---  1.json
|     |---  2.jpg
|     |---  2.json
|     |---  .......
|-- labelme2coco.py
|-- labels.txt

1️⃣ The images directory holds the original images of your dataset together with the JSON files produced by labelme.

2️⃣ The source code of labelme2coco.py is given at the end of this post.

3️⃣ labels.txt lists your class labels. If I have two classes (lm, ls), labels.txt looks like this:

__ignore__
_background_
lm
ls
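The script assigns category ids by line position: the first line must be `__ignore__` and is skipped, so `_background_` gets id 0 and the real classes start at 1 (the script also writes `_background_` into `categories` with id 0). A minimal standalone sketch of that mapping; the function name `parse_labels` is hypothetical and not part of labelme2coco.py:

```python
def parse_labels(text: str) -> dict:
    """Map class names to ids the way to_coco() does: line index minus one."""
    class_name_to_id = {}
    for i, line in enumerate(text.splitlines()):
        class_id = i - 1  # the first line gets -1
        class_name = line.strip()
        if class_id == -1:
            # The first line is a sentinel and must be __ignore__.
            assert class_name == "__ignore__", "labels.txt must start with __ignore__"
            continue
        class_name_to_id[class_name] = class_id
    return class_name_to_id


labels_txt = "__ignore__\n_background_\nlm\nls\n"
print(parse_labels(labels_txt))  # {'_background_': 0, 'lm': 1, 'ls': 2}
```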

Open a terminal in the directory that contains labelme2coco.py and run:

 python labelme2coco.py --input_dir images --output_dir coco --labels labels.txt

1️⃣ --input_dir: the images folder

2️⃣ --output_dir: your output folder

3️⃣ --labels: your labels.txt file
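Only images that have a labelme JSON next to them end up in the dataset, so an unlabeled image is silently skipped. A small stdlib pre-flight check, assuming `.jpg` images as in the layout above (the helper `find_unpaired` is hypothetical), lists every basename that is missing its partner:

```python
import os
import os.path as osp


def find_unpaired(filenames):
    """Return (jpg stems without a .json, json stems without a .jpg)."""
    stems = {".jpg": set(), ".json": set()}
    for name in filenames:
        stem, ext = osp.splitext(name)
        if ext in stems:  # exact, case-sensitive extension match
            stems[ext].add(stem)
    return sorted(stems[".jpg"] - stems[".json"]), sorted(stems[".json"] - stems[".jpg"])


if __name__ == "__main__" and osp.isdir("images"):
    missing_json, missing_jpg = find_unpaired(os.listdir("images"))
    print("jpg without json:", missing_json)
    print("json without jpg:", missing_jpg)
```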

The output of the run is shown in the figure below:

[Figure: console output of labelme2coco.py]

The generated coco directory looks like this:

|-- annotations
|     |---  instances_train2017.json
|     |---  instances_val2017.json
|-- train2017
|     |---  2.jpg
|     |---  5.jpg
|     |---  .......
|-- val2017
|     |---  1.jpg
|     |---  3.jpg
|     |---  .......
|-- visualization
|     |---  1.jpg
|     |---  2.jpg
|     |---  .......
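In the generated JSON, each image's `file_name` is not a bare file name: the script stores it relative to the annotations folder via `osp.relpath(out_img_file, osp.dirname(out_ann_file))`. With POSIX paths it works out as below (on Windows the separators are backslashes, e.g. `..\train2017\2.jpg`):

```python
import posixpath

out_img_file = "coco/train2017/2.jpg"
out_ann_file = "coco/annotations/instances_train2017.json"

# Path of the image as seen from the folder that holds the annotation file.
file_name = posixpath.relpath(out_img_file, posixpath.dirname(out_ann_file))
print(file_name)  # ../train2017/2.jpg

# Joining an image root with file_name still resolves to the same file.
print(posixpath.normpath(posixpath.join("coco/train2017", file_name)))  # coco/train2017/2.jpg
```

The official COCO annotation files store bare names such as `000000397133.jpg`, so a tool that expects that convention may need `osp.basename(file_name)` instead.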

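Only the first three folders, annotations, train2017 and val2017, are needed for training. The visualization folder lets you check how your annotations look. The source code below is fully commented, and most of it comes from labelme's official script: labelme/labelme2coco.py · GitHub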
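Besides the visualization images, a quick numeric check is to count the annotations per category in each generated file. A stdlib-only sketch, assuming the output folder is `coco` as in the command above (the helper `count_per_category` is hypothetical):

```python
import collections
import json
import os.path as osp


def count_per_category(coco: dict) -> dict:
    """Return {category name: number of annotations} for a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = collections.Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


for split in ("train2017", "val2017"):
    path = osp.join("coco", "annotations", f"instances_{split}.json")
    if osp.exists(path):
        with open(path) as f:
            data = json.load(f)
        print(split, len(data["images"]), "images", count_per_category(data))
```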

To change the ratio between the training and validation sets, search for test_size in the labelme2coco.py source.
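With `test_size=0.3`, roughly 30% of the JSON files go to val2017. As far as I know, scikit-learn rounds the validation count up when `test_size` is a float (`ceil(test_size * n)`), so 10 files give a 7/3 split. The stdlib sketch below reproduces that count with a seeded shuffle so the split is repeatable; it is an illustration, not scikit-learn's implementation, and `split_files` is a hypothetical name:

```python
import math
import random


def split_files(files, test_size=0.3, seed=0):
    """Shuffle a copy of `files` and split it into (train, val)."""
    files = sorted(files)  # glob order is arbitrary; sort first so the seed is meaningful
    rng = random.Random(seed)
    rng.shuffle(files)
    n_val = math.ceil(test_size * len(files))  # round the validation share up
    return files[n_val:], files[:n_val]


train, val = split_files([f"images/{i}.json" for i in range(1, 11)])
print(len(train), len(val))  # 7 3
```

To get a repeatable split from the script itself, passing a fixed `random_state` integer to `train_test_split` has the same effect.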

labelme2coco.py source:

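# Run from the command line: python labelme2coco.py --input_dir images --output_dir coco --labels labels.txt
# The output directory must not exist yet: the script creates it and exits if it is already there.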

import argparse
import collections
import datetime
import glob
import json
import os
import os.path as osp
import sys
import uuid
import imgviz
import numpy as np
import labelme
from sklearn.model_selection import train_test_split

try:
    import pycocotools.mask
except ImportError:
    print("Please install pycocotools:\n\n    pip install pycocotools\n")
    sys.exit(1)


def to_coco(args, label_files, train):

    # Build the top-level annotation dict `data`
    now = datetime.datetime.now()
    data = dict(
        info=dict(
            description=None,
            url=None,
            version=None,
            year=now.year,
            contributor=None,
            date_created=now.strftime("%Y-%m-%d %H:%M:%S.%f"),
        ),
        licenses=[dict(url=None, id=0, name=None,)],
        images=[
            # license, url, file_name, height, width, date_captured, id
        ],
        type="instances",
        annotations=[
            # segmentation, area, iscrowd, image_id, bbox, category_id, id
        ],
        categories=[
            # supercategory, id, name
        ],
    )

    # Build a {class name: id} dict and register each class in data["categories"].
    class_name_to_id = {}
    with open(args.labels) as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        class_id = i - 1  # starts with -1
        class_name = line.strip()   # drop surrounding whitespace and the trailing newline
        if class_id == -1:
            assert class_name == "__ignore__"   # first line is skipped; then background: 0, class1: 1, ...
            continue
        class_name_to_id[class_name] = class_id
        data["categories"].append(
            dict(supercategory=None, id=class_id, name=class_name,)
        )


    if train:
        out_ann_file = osp.join(args.output_dir, "annotations","instances_train2017.json")
    else:
        out_ann_file = osp.join(args.output_dir, "annotations","instances_val2017.json")


    for image_id, filename in enumerate(label_files):

        label_file = labelme.LabelFile(filename=filename)
        base = osp.splitext(osp.basename(filename))[0]      # file name without extension
        if train:
            out_img_file = osp.join(args.output_dir, "train2017", base + ".jpg")
        else:
            out_img_file = osp.join(args.output_dir, "val2017", base + ".jpg")
        
        print("| ",out_img_file)

        # ************************** Image handling: start *******************************************
        # Save the image belonging to this label file into the matching folder: train -> train2017/, val -> val2017/
        img = labelme.utils.img_data_to_arr(label_file.imageData)   # the .json carries the image; decode it into an array
        imgviz.io.imsave(out_img_file, img)     # write the image to the output path

        # ************************** Image handling: end *******************************************

        # ************************** Label handling: start *******************************************
        data["images"].append(
            dict(
                license=0,
                url=None,
                file_name=osp.relpath(out_img_file, osp.dirname(out_ann_file)),
                #   out_img_file = "/coco/train2017/1.jpg"
                #   out_ann_file = "/coco/annotations/instances_train2017.json"
                #   osp.dirname(out_ann_file) = "/coco/annotations"
                #   file_name = "../train2017/1.jpg" (..\train2017\1.jpg on Windows): the image path relative to the annotation file's folder
                height=img.shape[0],
                width=img.shape[1],
                date_captured=None,
                id=image_id,
            )
        )

        masks = {}  # for area
        segmentations = collections.defaultdict(list)  # for segmentation
        for shape in label_file.shapes:
            points = shape["points"]
            label = shape["label"]
            group_id = shape.get("group_id")
            shape_type = shape.get("shape_type", "polygon")
            mask = labelme.utils.shape_to_mask(
                img.shape[:2], points, shape_type
            )

            if group_id is None:
                group_id = uuid.uuid1()

            instance = (label, group_id)

            if instance in masks:
                masks[instance] = masks[instance] | mask
            else:
                masks[instance] = mask

            if shape_type == "rectangle":
                (x1, y1), (x2, y2) = points
                x1, x2 = sorted([x1, x2])
                y1, y2 = sorted([y1, y2])
                points = [x1, y1, x2, y1, x2, y2, x1, y2]
            else:
                points = np.asarray(points).flatten().tolist()

            segmentations[instance].append(points)
        segmentations = dict(segmentations)

        for instance, mask in masks.items():
            cls_name, group_id = instance
            if cls_name not in class_name_to_id:
                continue
            cls_id = class_name_to_id[cls_name]

            mask = np.asfortranarray(mask.astype(np.uint8))
            mask = pycocotools.mask.encode(mask)
            area = float(pycocotools.mask.area(mask))
            bbox = pycocotools.mask.toBbox(mask).flatten().tolist()

            data["annotations"].append(
                dict(
                    id=len(data["annotations"]),
                    image_id=image_id,
                    category_id=cls_id,
                    segmentation=segmentations[instance],
                    area=area,
                    bbox=bbox,
                    iscrowd=0,
                )
            )
        # ************************** Label handling: end *******************************************

        # ************************** Visualization: start *******************************************
        if not args.noviz:
            viz = img  # images without any instance of a known class are saved unannotated
            viz_items = [
                (class_name_to_id[cnm], cnm, msk)
                for (cnm, gid), msk in masks.items()
                if cnm in class_name_to_id
            ]
            # Guard: zip(*[]) cannot be unpacked into three values when nothing is drawable.
            if viz_items:
                labels, captions, viz_masks = zip(*viz_items)
                viz = imgviz.instances2rgb(
                    image=img,
                    labels=labels,
                    masks=viz_masks,
                    captions=captions,
                    font_size=15,
                    line_width=2,
                )
            out_viz_file = osp.join(
                args.output_dir, "visualization", base + ".jpg"
            )
            imgviz.io.imsave(out_viz_file, viz)
        # ************************** Visualization: end *******************************************

    with open(out_ann_file, "w") as f:  # once every label file is collected into data, write the annotation file
        json.dump(data, f)


# Entry point
def main():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("--input_dir", help="input annotated directory", required=True)
    parser.add_argument("--output_dir", help="output dataset directory", required=True)
    parser.add_argument("--labels", help="labels file", required=True)
    parser.add_argument("--noviz", help="no visualization", action="store_true")
    args = parser.parse_args()

    if osp.exists(args.output_dir):
        print("Output directory already exists:", args.output_dir)
        sys.exit(1)
    os.makedirs(args.output_dir)
    print("| Creating dataset dir:", args.output_dir)
    if not args.noviz:
        os.makedirs(osp.join(args.output_dir, "visualization"))

    # Create the output sub-folders
    if not os.path.exists(osp.join(args.output_dir, "annotations")):
        os.makedirs(osp.join(args.output_dir, "annotations"))
    if not os.path.exists(osp.join(args.output_dir, "train2017")):
        os.makedirs(osp.join(args.output_dir, "train2017"))
    if not os.path.exists(osp.join(args.output_dir, "val2017")):
        os.makedirs(osp.join(args.output_dir, "val2017"))

    # List all .jpg files in the input directory (only used for the count below)
    feature_files = glob.glob(osp.join(args.input_dir, "*.jpg"))
    print('| Image number: ', len(feature_files))

    # List all .json files in the input directory
    label_files = sorted(glob.glob(osp.join(args.input_dir, "*.json")))
    print('| Json number: ', len(label_files))

    # Split only the label files: each .json carries its image, so the .jpg list is not
    # needed here (passing both lists would require them to have equal lengths).
    # test_size: share of the files that go to the validation set
    train_label_files, val_label_files = train_test_split(label_files, test_size=0.3)
    print("| Train number:", len(train_label_files), '\t Val number:', len(val_label_files))

    # Convert the training labels to COCO format and save their images to /train2017/
    print("—"*50)
    print("| Train images:")
    to_coco(args, train_label_files, train=True)

    # Convert the validation labels to COCO format and save their images to /val2017/
    print("—"*50)
    print("| Val images:")
    to_coco(args, val_label_files, train=False)
    

if __name__ == "__main__":
    print("—"*50)
    main()
    print("—"*50)
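For rectangles, labelme stores only two corner points, so `to_coco()` expands them into a four-corner polygon before writing `segmentation`; the `bbox` itself comes from `pycocotools.mask.toBbox` on the rasterized mask and uses COCO's `[x, y, width, height]` order. The stdlib sketch below reproduces the corner expansion and computes a polygon-based bbox for comparison. The helper names are hypothetical, and because the script measures the rasterized mask, its bbox can differ slightly from these polygon extents.

```python
def rectangle_to_polygon(points):
    """Expand labelme's two rectangle corners into a flat COCO polygon."""
    (x1, y1), (x2, y2) = points
    x1, x2 = sorted([x1, x2])  # corners may be dragged in any direction
    y1, y2 = sorted([y1, y2])
    # top-left, top-right, bottom-right, bottom-left
    return [x1, y1, x2, y1, x2, y2, x1, y2]


def polygon_bbox(flat):
    """Return [x, y, width, height] enclosing a flat [x0, y0, x1, y1, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


poly = rectangle_to_polygon([[10, 20], [4, 5]])
print(poly)                # [4, 5, 10, 5, 10, 20, 4, 20]
print(polygon_bbox(poly))  # [4, 5, 6, 15]
```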
