Building a Dataset with TFRecord in TensorFlow

阿新 • Published: 2019-01-15

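Earlier posts covered reading CSV data with TensorFlow (http://blog.csdn.net/xuan_zizizi/article/details/78400839) and reading image data with three TensorFlow functions, gfile, WholeFileReader, and read_file (http://blog.csdn.net/xuan_zizizi/article/details/78418351). These approaches work well for small datasets, but they become inefficient for the large image collections used in deep learning. Broadly, data can be read either by loading it directly into memory or by loading it through a queue. The TFRecord format covered in this post takes the queue-based route: the image set is converted into a single binary dataset file, which makes better use of memory, is easier to copy and move, and needs no separate label file. The details follow.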
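1. Generating the TFRecord file
Writing TFRecord data: a tf.train.Example protocol buffer contains a Features field. The feature entries inside Features package the image's binary data together with its label. The Example is then serialized to a string, and tf.python_io.TFRecordWriter writes it to the TFRecords file. The comments in the following code walk through the process: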

import os
import tensorflow as tf
from PIL import Image  # needed for Image.open below

# cwd = os.getcwd()  # get the current directory automatically
cwd = '/home/zcm/tensorf/ball/'  # dataset root, set by hand
classes = ['ball1', 'ball2']  # class folders; a list keeps the label order fixed (a set does not)
# Writer that stores the serialized examples in "ball_train.tfrecords"
writer = tf.python_io.TFRecordWriter("ball_train.tfrecords")
for index, name in enumerate(classes):
    class_path = cwd + name + "/"
    for img_name in os.listdir(class_path):
        img_path = class_path + img_name  # path of each image
        img = Image.open(img_path).convert('RGB')  # force 3 channels so every record has the same size
        img = img.resize((224, 224))  # resize the image to 224x224
        img_raw = img.tobytes()  # convert the image to raw bytes

        # tf.train.Example holds a Features field; its feature entries
        # package the image bytes and the label together
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))
        writer.write(example.SerializeToString())  # serialize the Example to a string and write it
writer.close()

This produces a file named ball_train.tfrecords in the current working directory.
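
Each image's label is simply the index that enumerate(classes) assigns to its folder, so the class collection needs a fixed order. A Python set of strings can iterate in a different order from one run to the next, which could silently change which folder gets which label. A minimal sketch of a stable mapping, assuming the two folder names from the script above (label_of and name_of are illustrative names, not part of the original code):

```python
# A list has a fixed order, so enumerate() always yields the same labels.
classes = ['ball1', 'ball2']

# Folder name -> integer label, exactly what enumerate(classes) produces.
label_of = {name: index for index, name in enumerate(classes)}

# Integer label -> folder name, handy when inspecting decoded records later.
name_of = {index: name for name, index in label_of.items()}

print(label_of)
print(name_of)
```

Keeping the inverse mapping around makes it easy to turn a label read back from the file into a readable class name.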
That completes dataset generation. The next step is reading the data back.
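
Before reading from disk, it may help to see the serialization step in isolation. The sketch below builds one Example in memory, serializes it, and parses it back, with no file involved. The pixel bytes are a made-up stand-in for img.tobytes(), and the label value 1 is arbitrary:

```python
import tensorflow as tf

# Made-up stand-in for img.tobytes(): four RGB pixels, 12 bytes in total.
fake_pixels = bytes([0, 128, 255] * 4)

# Package the label and the raw bytes into one Example, as the writer script does.
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[fake_pixels])),
}))

# This byte string is what a TFRecord writer stores for each record.
serialized = example.SerializeToString()

# Parse the bytes back into an Example and pull the two features out again.
parsed = tf.train.Example.FromString(serialized)
label = parsed.features.feature['label'].int64_list.value[0]
img_raw = parsed.features.feature['img_raw'].bytes_list.value[0]

print(label, len(img_raw))
```

The feature keys act as the contract between writer and reader: whatever names are used when writing must be used again when parsing.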
2. Reading TFRecord data
(1) The simplest way is to iterate over the file with tf.python_io.tf_record_iterator:

import os
import tensorflow as tf

for serialized_example in tf.python_io.tf_record_iterator("ball_train.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # the keys must match the ones used when writing: 'img_raw' and 'label'
    image = example.features.feature['img_raw'].bytes_list.value
    label = example.features.feature['label'].int64_list.value
    # any preprocessing could be done here
    print(image, label)
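
The bytes printed here are raw pixels as produced by img.tobytes(), not an encoded JPEG or PNG. For an RGB image they are laid out row by row, with the R, G, B values of each pixel interleaved, which is why such a buffer can be reshaped to [height, width, 3]. A small sketch of that layout in plain Python, using a tiny 4x2 buffer as a stand-in for the real 224x224 one (the pixel helper is hypothetical, written only for this illustration):

```python
WIDTH, HEIGHT, CHANNELS = 4, 2, 3  # tiny stand-in for 224 x 224 x 3

# Fake raw buffer in the same layout: rows first, R, G, B interleaved per pixel.
raw = bytes(range(WIDTH * HEIGHT * CHANNELS))

def pixel(raw, x, y, width=WIDTH, channels=CHANNELS):
    """Return the (R, G, B) tuple stored for column x, row y."""
    start = (y * width + x) * channels
    return tuple(raw[start:start + channels])

# Size of one record's image payload at the resolution used in this post.
bytes_per_image = 224 * 224 * 3

print(pixel(raw, 0, 0), pixel(raw, 1, 0), pixel(raw, 0, 1))
print(bytes_per_image)
```

Every stored image must have exactly this many bytes for a fixed reshape to succeed, so grayscale or RGBA inputs need converting to RGB before tobytes() is called.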

(2) A more practical approach is to build a queue with tf.train.string_input_producer(), then use tf.TFRecordReader() and tf.parse_single_example() to parse each Example protocol buffer into tensors. The steps are as follows:

import os
import tensorflow as tf
from PIL import Image  # used below to save the decoded images

cwd = '/home/zcm/tensorf/ball/'  # output directory, set by hand
filename_queue = tf.train.string_input_producer(["ball_train.tfrecords"])  # queue of input files
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)  # returns a (key, serialized record) pair
features = tf.parse_single_example(serialized_example,
                                   features={
                                       'label': tf.FixedLenFeature([], tf.int64),
                                       'img_raw': tf.FixedLenFeature([], tf.string),
                                   })  # extract the features holding the image and the label
image = tf.decode_raw(features['img_raw'], tf.uint8)
image = tf.reshape(image, [224, 224, 3])
label = tf.cast(features['label'], tf.int32)
with tf.Session() as sess:  # start a session
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(20):
        example, l = sess.run([image, label])  # fetch one image and its label in the session
        img = Image.fromarray(example, 'RGB')  # Image comes from the PIL import above
        img.save(cwd + str(i) + '_Label_' + str(l) + '.jpg')  # save the image
        print('----------------------------')
        print(example, l)
    coord.request_stop()
    coord.join(threads)

Running this prints each image's pixel array and label, and saves the images to disk.
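
The queue mechanics can be easier to follow outside TensorFlow. The sketch below is only an analogy built on Python's standard queue and threading modules, not how TensorFlow implements its queue runners: a background thread keeps refilling a bounded queue (the role of the threads started by start_queue_runners), the main thread takes items out (the role of sess.run), and an event asks the worker to stop (the role of coord.request_stop() followed by coord.join()). All names here are made up for the illustration:

```python
import queue
import threading

def producer(q, stop_event, items):
    """Feed items into the queue, cycling through them, until asked to stop."""
    i = 0
    while not stop_event.is_set():
        try:
            # The timeout lets the loop notice the stop request even when the queue is full.
            q.put(items[i % len(items)], timeout=0.1)
            i += 1
        except queue.Full:
            continue

q = queue.Queue(maxsize=4)        # bounded, so the producer cannot run far ahead
stop_event = threading.Event()    # plays the role of the coordinator's stop flag
records = ["record-0", "record-1", "record-2"]

worker = threading.Thread(target=producer, args=(q, stop_event, records))
worker.start()

# Consume more items than there are records: the producer wraps around.
consumed = [q.get() for _ in range(5)]

stop_event.set()   # analogous to coord.request_stop()
worker.join()      # analogous to coord.join(threads)

print(consumed)
```

The wrap-around mirrors the default behaviour of tf.train.string_input_producer, which cycles through its file list indefinitely, so a loop of 20 reads starts repeating images once the file holds fewer than 20 records.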
(3) Alternatively, define a function read_and_decode(filename) that handles the reading and decoding:

import os
import tensorflow as tf
from PIL import Image  # used below to save the decoded images

cwd = '/home/zcm/tensorf/ball/'  # output directory, set by hand
def read_and_decode(filename):  # reads a TFRecords file such as ball_train.tfrecords
    filename_queue = tf.train.string_input_producer([filename])  # create a filename queue

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # returns a (key, serialized record) pair
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })  # extract the image data and the label

    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])  # reshape to a 224x224 image with 3 channels
    # Normalize to [-0.5, 0.5]; comment this line out when saving images,
    # since Image.fromarray with 'RGB' expects uint8 pixel data
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int32)  # label tensor
    return img, label
image, label = read_and_decode("ball_train.tfrecords")  # build the input tensors with the function
with tf.Session() as sess:  # start a session
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(20):
        example, l = sess.run([image, label])  # fetch one image and its label in the session
        img = Image.fromarray(example, 'RGB')  # Image comes from the PIL import above
        img.save(cwd + str(i) + '_Label_' + str(l) + '.jpg')  # save the image
        print('----------------------------')
        print(example, l)
    coord.request_stop()
    coord.join(threads)

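Note: comment out the normalization step when displaying or saving images, but keep it when preprocessing data for training. The image in the original post shows the result of displaying the data with normalization applied.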
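
An alternative to commenting the line out is to undo the normalization just before display. The sketch below works through the arithmetic in plain Python for single pixel values, assuming the same scaling as in read_and_decode, x = p * (1/255) - 0.5. The two helper functions are hypothetical and not part of the original code:

```python
def normalize(p):
    """Map a uint8 pixel value in [0, 255] to a float in [-0.5, 0.5]."""
    return p * (1.0 / 255) - 0.5

def denormalize(x):
    """Invert normalize(): map a float in [-0.5, 0.5] back to an integer pixel value."""
    return int(round((x + 0.5) * 255))

# The extremes of the pixel range land on the ends of the normalized range.
low, high = normalize(0), normalize(255)

# Every possible pixel value should survive the round trip unchanged.
round_trip_ok = all(denormalize(normalize(p)) == p for p in range(256))

print(low, high, round_trip_ok)
```

Applied to a whole array, the same inverse followed by a cast to uint8 would give data suitable for saving as an image, while the network still receives the normalized floats.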
To summarize the workflow for building your own dataset:
