Batch Data Reading in TensorFlow: Worked Examples, and Writing and Reading TFRecord Files

A Xin • Published: 2020-07-01

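Reading one sample at a time:

  Method 1: slice_input_producer()

# The returned tensors can be evaluated directly with Session.run([images,labels]); the first argument must be wrapped in a list, e.g. [...]
[images,labels] = tf.train.slice_input_producer([images,labels],num_epochs=None,shuffle=True)

  Method 2: string_input_producer()

# Define a file reader, then call its read() method to fetch data (it returns key,value); evaluate the result with Session.run(value)
file_queue = tf.train.string_input_producer(filename,shuffle=True)

reader = tf.WholeFileReader()      # define the file reader
key,value = reader.read(file_queue)  # key: file name; value: file contents

  Note: num_epochs=None sets no limit on the number of passes, so the number of elements the queue can serve is unbounded.

  Note: if num_epochs is not None, the function creates a local counter named epochs, and the local variables must be initialized with local_variables_initializer().

  Note: both functions above can be used to build a file-name queue.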
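The effect of num_epochs can be illustrated without TensorFlow. The following is a minimal pure-Python simulation, not the library's implementation: `epoch_limited`, `take`, and the `OutOfRange` exception are hypothetical names standing in for the input producer, repeated `Session.run()` calls, and `tf.errors.OutOfRangeError`.

```python
import itertools


class OutOfRange(Exception):
    """Stand-in for tf.errors.OutOfRangeError in this simulation."""


def epoch_limited(items, num_epochs=None):
    """Yield items forever if num_epochs is None, otherwise for num_epochs passes."""
    if num_epochs is None:
        yield from itertools.cycle(items)
    else:
        for _ in range(num_epochs):
            yield from items


def take(producer, n):
    """Fetch n elements, raising OutOfRange if the producer runs dry first."""
    out = []
    for _ in range(n):
        try:
            out.append(next(producer))
        except StopIteration:
            raise OutOfRange("no more elements in the queue") from None
    return out


samples = ['image1.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg']

# num_epochs=None: any number of reads succeeds.
unlimited = take(epoch_limited(samples, num_epochs=None), 10)

# num_epochs=2: exactly 2*4 = 8 reads succeed, and the 9th fails.
limited = epoch_limited(samples, num_epochs=2)
first_eight = take(limited, 8)
try:
    take(limited, 1)
    ninth_failed = False
except OutOfRange:
    ninth_failed = True
```

In the real API the same situation is usually handled by catching tf.errors.OutOfRangeError around the read loop and then stopping the coordinator.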

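(Shuffled) batch reading:

batchsize = 2  # number of samples fetched per call
tf.train.batch(tensors,batch_size=batchsize)
tf.train.shuffle_batch(tensors,batch_size=batchsize,capacity=batchsize*10,min_after_dequeue=batchsize*5) # capacity must be greater than min_after_dequeue

  Note: for every reading method above, the queue-runner threads must be started with tf.train.start_queue_runners() before the data tensors are evaluated with Session.run().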
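To make capacity and min_after_dequeue more concrete, here is a single-threaded pure-Python model of a shuffling queue. It is a sketch under simplifying assumptions (the real queue is filled by background threads), and `shuffle_batches` is a hypothetical helper, not a TensorFlow function. An element may be dequeued only while more than min_after_dequeue elements are buffered, which is why capacity has to be larger than min_after_dequeue.

```python
import random


def shuffle_batches(source, batch_size, capacity, min_after_dequeue, seed=0):
    """Yield shuffled batches from `source` through a bounded buffer."""
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be greater than min_after_dequeue")
    rng = random.Random(seed)
    it = iter(source)
    buffer, batch, exhausted = [], [], False
    while not exhausted or buffer:
        # Enqueue side: add one element if input remains and there is room.
        if not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(it))
            except StopIteration:
                exhausted = True
        # Dequeue side: allowed only if enough elements stay behind for mixing,
        # or if the input is finished and the buffer is being drained.
        if buffer and (exhausted or len(buffer) > min_after_dequeue):
            batch.append(buffer.pop(rng.randrange(len(buffer))))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # A trailing partial batch, if any, is dropped in this sketch.


batchsize = 2
batches = list(shuffle_batches(range(20), batch_size=batchsize,
                               capacity=batchsize * 10,
                               min_after_dequeue=batchsize * 5))
```

A larger min_after_dequeue mixes samples more thoroughly, at the cost of more buffered data and a longer wait before the first batch is available.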

Writing and reading TFRecord files

1. Reading one sample at a time

Method 1: slice_input_producer()

def slice_input_producer(tensor_list,num_epochs=None,shuffle=True,seed=None,capacity=32,shared_name=None,name=None)

Example 1:

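import tensorflow as tf

images = ['image1.jpg','image2.jpg','image3.jpg','image4.jpg']
labels = [1,2,3,4]

# [images,labels] = tf.train.slice_input_producer([images,labels],num_epochs=None,shuffle=True)

# With num_epochs=2 the queue holds only 2*4=8 samples, so fetching the 9th sample raises an error
# [images,labels] = tf.train.slice_input_producer([images,labels],num_epochs=2,shuffle=True)

data = tf.train.slice_input_producer([images,labels],num_epochs=None,shuffle=True)
print(type(data))  # <class 'list'>

with tf.Session() as sess:
  # Needed when num_epochs is set, because the epoch counter is a local variable
  sess.run(tf.local_variables_initializer())
  coord = tf.train.Coordinator() # thread coordinator
  threads = tf.train.start_queue_runners(sess,coord) # start the queue runners collected in the graph

  for i in range(10):
    print(sess.run(data))

  coord.request_stop()
  coord.join(threads)

"""

Output (excerpt; the order varies between runs because shuffle=True):

[b'image2.jpg',2]
[b'image1.jpg',1]
[b'image3.jpg',3]
[b'image4.jpg',4]
[b'image2.jpg',2]
[b'image3.jpg',3]
"""

  Note: the first argument of slice_input_producer() must be a list; each element of that list can be a List or a Tensor, e.g. [images,labels].

  Note: set num_epochs to limit the number of passes over the data. With num_epochs=None the queue never runs out; with a finite value, reads beyond the last epoch raise tf.errors.OutOfRangeError.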

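Method 2: string_input_producer()

def string_input_producer(string_tensor,num_epochs=None,shuffle=True,seed=None,capacity=32,shared_name=None,name=None,cancel_op=None)

File readers

  Different file types call for different file readers, referred to here as reader objects.

  A reader's read() method takes the next file from the queue and returns a key (the file name) and a value (the file contents).

reader = tf.TextLineReader()   ### reads one line at a time; suitable for any text file

reader = tf.TFRecordReader()   ### A Reader that outputs the records from a TFRecords file

reader = tf.WholeFileReader()   ### reads a whole file at once; suitable for images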
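The readers differ mainly in what one read() call returns. The sketch below imitates that in plain Python, with no TensorFlow involved: `whole_file_records` and `text_line_records` are hypothetical helpers that mimic the key/value shapes of a whole-file reader (one record per file) and a line reader (one record per line, keyed as file:line_number).

```python
import os
import tempfile


def whole_file_records(paths):
    """One (key, value) per file: key is the path, value is the entire contents."""
    for path in paths:
        with open(path, 'rb') as f:
            yield path.encode(), f.read()


def text_line_records(paths):
    """One (key, value) per line: key is 'path:line_number', value is the line
    without its trailing newline."""
    for path in paths:
        with open(path, 'rb') as f:
            for lineno, line in enumerate(f, start=1):
                yield '{}:{}'.format(path, lineno).encode(), line.rstrip(b'\n')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'A.csv')
    with open(path, 'wb') as f:
        f.write(b'1.jpg,1\n2.jpg,2\n3.jpg,3\n')
    whole = list(whole_file_records([path]))   # 1 record for the file
    lines = list(text_line_records([path]))    # 3 records, one per line
```

For CSV data the line-oriented form is usually the more convenient one, since each value is a single row that can be split into fields.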

Example 2: reading CSV files

import tensorflow as tf

filename = ['data/A.csv','data/B.csv','data/C.csv']

file_queue = tf.train.string_input_producer(filename,num_epochs=2)  # build the file-name queue
reader = tf.WholeFileReader()      # file reader (reads a whole file at once)
# reader = tf.TextLineReader()      # file reader (reads one line at a time)
key,value = reader.read(file_queue)  # key: file name; value: file contents
print(type(file_queue))

init = [tf.global_variables_initializer(),tf.local_variables_initializer()]
with tf.Session() as sess:
  sess.run(init)
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess,coord=coord)
  try:
    while not coord.should_stop():
      for i in range(6):
        print(sess.run([key,value]))
      break
  except tf.errors.OutOfRangeError:
    print('read done')
  finally:
    coord.request_stop()
  coord.join(threads)

"""
reader = tf.WholeFileReader()      # file reader (reads a whole file at once)
Output:
[b'data/C.csv',b'7.jpg,7\n8.jpg,8\n9.jpg,9\n']
[b'data/B.csv',b'4.jpg,4\n5.jpg,5\n6.jpg,6\n']
[b'data/A.csv',b'1.jpg,1\n2.jpg,2\n3.jpg,3\n']
[b'data/A.csv',b'1.jpg,1\n2.jpg,2\n3.jpg,3\n']
[b'data/B.csv',b'4.jpg,4\n5.jpg,5\n6.jpg,6\n']
[b'data/C.csv',b'7.jpg,7\n8.jpg,8\n9.jpg,9\n']
"""
"""
reader = tf.TextLineReader()      # file reader (reads one line at a time)
Output:
[b'data/B.csv:1',b'4.jpg,4']
[b'data/B.csv:2',b'5.jpg,5']
[b'data/B.csv:3',b'6.jpg,6']
[b'data/C.csv:1',b'7.jpg,7']
[b'data/C.csv:2',b'8.jpg,8']
[b'data/C.csv:3',b'9.jpg,9']
"""

Example 3: reading images (each read returns a whole image file, not a single line)

import tensorflow as tf

filename = ['1.jpg','2.jpg']
filename_queue = tf.train.string_input_producer(filename,shuffle=False,num_epochs=1)
reader = tf.WholeFileReader()       # file reader
key,value = reader.read(filename_queue)  # key: file name; value: image data as bytes

with tf.Session() as sess:
  tf.local_variables_initializer().run()
  coord = tf.train.Coordinator()   # thread coordinator
  threads = tf.train.start_queue_runners(sess,coord)

  for i in range(len(filename)):
    image_data = sess.run(value)
    # Write the bytes back out unchanged, one output file per input image
    with open('img_%d.jpg' % i,'wb') as f:
      f.write(image_data)
  coord.request_stop()
  coord.join(threads)

2. (Shuffled) batch reading:

  Purpose: shuffle_batch() and batch() both fetch data from a queue in batches, and they are used in much the same way.

Example 4: slice_input_producer() with batch()

import tensorflow as tf
import numpy as np

images = np.arange(20).reshape([10,2])
label = np.asarray(range(0,10))
images = tf.cast(images,tf.float32)  # optional; the example also runs without this cast
label = tf.cast(label,tf.int32)      # optional; the example also runs without this cast

batchsize = 6  # number of elements fetched per call
input_queue = tf.train.slice_input_producer([images,label],num_epochs=None,shuffle=False)
image_batch,label_batch = tf.train.batch(input_queue,batch_size=batchsize)

# Fetch batchsize elements in random order; capacity is the queue capacity and must be greater than min_after_dequeue
# image_batch,label_batch = tf.train.shuffle_batch(input_queue,batch_size=batchsize,capacity=64,min_after_dequeue=10)

with tf.Session() as sess:
  coord = tf.train.Coordinator()   # thread coordinator
  threads = tf.train.start_queue_runners(sess,coord)   # start the queue runners collected in the graph
  for cnt in range(2):
    print("Fetch {}: batch={}...".format(cnt+1,batchsize))
    image_batch_v,label_batch_v = sess.run([image_batch,label_batch])
    print(image_batch_v,label_batch_v,len(label_batch_v))

  coord.request_stop()
  coord.join(threads)

"""

Output:
Fetch 1: batch=6...
[[ 0. 1.]
[ 2. 3.]
[ 4. 5.]
[ 6. 7.]
[ 8. 9.]
[10. 11.]] [0 1 2 3 4 5] 6
Fetch 2: batch=6...
[[12. 13.]
[14. 15.]
[16. 17.]
[18. 19.]
[ 0. 1.]
[ 2. 3.]] [6 7 8 9 0 1] 6
"""

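The second batch in the output above wraps around to the start of the data, because with no epoch limit the unshuffled producer keeps cycling through the ten samples while batch() cuts the stream into fixed-size pieces. A small pure-Python sketch of that behaviour follows; `fixed_batches` is a hypothetical helper used only for illustration.

```python
import itertools


def fixed_batches(samples, batch_size):
    """Yield consecutive batches from an endlessly repeating, unshuffled stream."""
    stream = itertools.cycle(samples)
    while True:
        yield list(itertools.islice(stream, batch_size))


labels = list(range(10))
batches = fixed_batches(labels, batch_size=6)
first = next(batches)    # [0, 1, 2, 3, 4, 5]
second = next(batches)   # [6, 7, 8, 9, 0, 1]
```

Because 10 is not a multiple of 6, batch boundaries drift relative to epoch boundaries, so a single batch can mix samples from two consecutive passes.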
Example 5: reading images from disk in batches: string_input_producer() with batch()

import tensorflow as tf
import glob
import cv2 as cv

def read_imgs(filename,picture_format,input_image_shape,batch_size):
  """
  Read images from disk in batches
  :param filename: image paths (including the file names); list
  :param picture_format: image format, ".bmp" or ".jpg" (the two formats decoded here); string
  :param input_image_shape: shape of the input images; (h,w,c) tuple or list
  :param batch_size: number of images loaded from the file queue per batch; int
  :return: a batch of batch_size images; Tensor
  """
  # Create the file-name queue
  file_queue = tf.train.string_input_producer(filename,num_epochs=1,shuffle=True)
  # Create the file reader
  reader = tf.WholeFileReader()
  # Read one file from the queue
  _,img_bytes = reader.read(file_queue)
  # print(img_bytes)  # Tensor("ReaderReadV2_19:1",shape=(),dtype=string)
  # Decode the image
  if picture_format == ".bmp":
    new_img = tf.image.decode_bmp(img_bytes,channels=1)
  elif picture_format == ".jpg":
    new_img = tf.image.decode_jpeg(img_bytes,channels=3)
  else:
    raise ValueError("unsupported picture_format: " + picture_format)
  # Set the image shape (tf.train.batch needs fully defined shapes)
  # new_img = tf.image.resize_images(new_img,input_image_shape)
  new_img = tf.reshape(new_img,input_image_shape)
  # Set the image data type
  new_img = tf.image.convert_image_dtype(new_img,tf.uint8)

  # return new_img
  return tf.train.batch([new_img],batch_size)


def main():
  image_path = glob.glob(r'F:\demo\FaceRecognition\人臉庫\ORL\*.bmp')
  image_batch = read_imgs(image_path,".bmp",(112,92,1),5)
  print(type(image_batch))
  # image_path = glob.glob(r'.\*.jpg')
  # image_batch = read_imgs(image_path,".jpg",(313,500,3),1)

  sess = tf.Session()
  sess.run(tf.local_variables_initializer())
  tf.train.start_queue_runners(sess=sess)

  image_batch = sess.run(image_batch)
  print(type(image_batch))  # <class 'numpy.ndarray'>

  for i in range(len(image_batch)):
    cv.imshow("win_"+str(i),image_batch[i])
  cv.waitKey()
  cv.destroyAllWindows()

def start():
  image_path = glob.glob(r'F:\demo\FaceRecognition\人臉庫\ORL\*.bmp')
  image_batch = read_imgs(image_path,".bmp",(112,92,1),5)
  print(type(image_batch))  # <class 'tensorflow.python.framework.ops.Tensor'>


  with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()   # thread coordinator
    threads = tf.train.start_queue_runners(sess,coord)   # start the queue runners collected in the graph
    image_batch = sess.run(image_batch)
    print(type(image_batch))  # <class 'numpy.ndarray'>

    for i in range(len(image_batch)):
      cv.imshow("win_"+str(i),image_batch[i])
    cv.waitKey()
    cv.destroyAllWindows()

    # When the Session is opened with `with`, omitting the two lines below causes:
    # ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled
    # Cause: the queue threads are still working (images remain in the queue) when the session
    # closes automatically after batch_size images are loaded, which also shuts down the queue threads
    coord.request_stop()
    coord.join(threads)


if __name__ == "__main__":
  # main()
  start()
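The request_stop()/join() pair at the end of start() follows a general pattern: ask background producers to stop, then wait for them before the resources they use go away. The sketch below shows the same pattern with the standard library only. It is an analogy rather than TensorFlow's implementation, with `threading.Event` standing in for the coordinator and a hypothetical `producer` function standing in for a queue runner.

```python
import queue
import threading


def producer(q, stop_event, items):
    """Keep enqueuing items in a loop until asked to stop."""
    i = 0
    while not stop_event.is_set():
        try:
            q.put(items[i % len(items)], timeout=0.05)
            i += 1
        except queue.Full:
            continue  # queue is full: re-check the stop flag, then retry


q = queue.Queue(maxsize=4)
stop_event = threading.Event()
worker = threading.Thread(target=producer,
                          args=(q, stop_event, ['1.bmp', '2.bmp', '3.bmp']))
worker.start()

batch = [q.get() for _ in range(5)]  # consume one batch of 5 elements

stop_event.set()   # comparable to coord.request_stop()
worker.join()      # comparable to coord.join(threads)
```

Without the stop request the producer would still be blocked on a full queue when the consumer finishes, which is the situation behind the "Enqueue operation was cancelled" message.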

Example 6: writing and reading TFRecord files

Writing a TFRecord file

import numpy as np
import tensorflow as tf

def write_TFRecord(filename,data,labels,is_shuffler=True):
  """
  Pack data into the TFRecord format
  :param filename: output path; a bare file name is created in the project directory; String
  :param data: paths of the files to pack; list
  :param labels: labels of the corresponding files; list
  :param is_shuffler: whether to shuffle the samples before writing, default True; Bool
  :return: None
  """
  im_data = list(data)
  im_labels = list(labels)

  index = list(range(len(im_data)))
  if is_shuffler:
    np.random.shuffle(index)

  # Create the writer, then use it to write each Example
  writer = tf.python_io.TFRecordWriter(filename)
  for i in range(len(im_data)):
    im_d = im_data[index[i]]    # path of image number index[i]
    im_l = im_labels[index[i]]  # label of that image

    # # Method one: store the decoded pixels (requires import cv2)
    # img = cv2.imread(im_d)
    # # Build the example
    # ex = tf.train.Example(
    #   features=tf.train.Features(
    #     feature={
    #       "image": tf.train.Feature(
    #         bytes_list=tf.train.BytesList(
    #           value=[img.tobytes()])),  # must be packed as bytes
    #       "label": tf.train.Feature(
    #         int64_list=tf.train.Int64List(
    #           value=[im_l])),
    #     }
    #   )
    # )
    # Method two: store the encoded file bytes; the packed file is less than half the size of method one
    with tf.gfile.GFile(im_d,"rb") as f:
      img_bytes = f.read()
    ex = tf.train.Example(
      features=tf.train.Features(
        feature={
          "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(
              value=[img_bytes])),  # img_bytes is already of type bytes
          "label": tf.train.Feature(
            int64_list=tf.train.Int64List(
              value=[im_l])),
        }
      )
    )

    # Write the serialized example
    writer.write(ex.SerializeToString())
  # Close the writer
  writer.close()
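write_TFRecord() shuffles a list of indices rather than the data and label lists themselves, which keeps every image paired with its own label. A minimal stdlib sketch of that idea follows; `shuffled_pairs` and its `seed` parameter are hypothetical and exist only for this illustration.

```python
import random


def shuffled_pairs(data, labels, shuffle=True, seed=None):
    """Return (data, label) pairs in shuffled order without breaking the pairing."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    index = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(index)  # shuffle positions, not the lists
    return [(data[i], labels[i]) for i in index]


paths = ['a.bmp', 'b.bmp', 'c.bmp', 'd.bmp']
labels = [0, 1, 2, 3]
pairs = shuffled_pairs(paths, labels, seed=42)
in_order = shuffled_pairs(paths, labels, shuffle=False)
```

Shuffling the two lists independently would scramble the image-to-label mapping, which is the bug the shared index list avoids.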

Reading a TFRecord file

import tensorflow as tf
import cv2

def read_TFRecord(file_list,batch_size):
  """
  Read TFRecord files
  :param file_list: names of the TFRecord files; List
  :param batch_size: number of images read per batch
  :return: the decoded images and their labels
  """
  file_queue = tf.train.string_input_producer(file_list,shuffle=True)
  reader = tf.TFRecordReader()
  _,ex = reader.read(file_queue)
  batch = tf.train.shuffle_batch([ex],batch_size,capacity=batch_size * 10,min_after_dequeue=batch_size * 5)

  feature = {
    'image': tf.FixedLenFeature([],tf.string),
    'label': tf.FixedLenFeature([],tf.int64)
  }
  example = tf.parse_example(batch,features=feature)

  # decode_raw expects raw pixel bytes (method one of write_TFRecord); encoded file bytes
  # written by method two need an image decoder such as tf.image.decode_jpeg instead
  images = tf.decode_raw(example['image'],tf.uint8)
  # Assumes 32x32 images with 3 channels; adjust to the shape of the stored images
  images = tf.reshape(images,[-1,32,32,3])

  return images,example['label']



def main():
  # filelist = ['data/train.tfrecord']
  filelist = ['data/test.tfrecord']
  num_batches = 1  # placeholder: how many batches to fetch and display
  images,labels = read_TFRecord(filelist,2)
  with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess,coord=coord)

    try:
      while not coord.should_stop():
        for i in range(num_batches):
          image_bth,label_bth = sess.run([images,labels])
          print(label_bth)

          cv2.imshow("image_0",image_bth[0])
          cv2.imshow("image_1",image_bth[1])
        break
    except tf.errors.OutOfRangeError:
      print('read done')
    finally:
      coord.request_stop()
    coord.join(threads)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
  main()
