A Detailed Look at the Read Operations in TensorFlow's CIFAR-10 Tutorial

阿新 • Published: 2020-02-10

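Preface

The convolutional neural network chapter of the official TensorFlow documentation includes an experiment that uses the CIFAR-10 image dataset. Building the convolutional network itself is not hard, but the cifar10_input file took me real effort to work through. With the official documentation alongside I understood most of it, though a few points are still unclear to me. I will mark those as open questions and record here the parts I have figured out.

Investigation

The read operation in cifar10_input.py mainly comes down to the following code: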

if not eval_data:
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
else:
  filenames = [os.path.join(data_dir, 'test_batch.bin')]
  num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
...
filename_queue = tf.train.string_input_producer(filenames)

...

label_bytes = 1  # 2 for CIFAR-100
result.height = 32
result.width = 32
result.depth = 3
image_bytes = result.height * result.width * result.depth
# Every record consists of a label followed by the image, with a
# fixed number of bytes for each.
record_bytes = label_bytes + image_bytes

# Read a record, getting filenames from the filename_queue. No
# header or footer in the CIFAR-10 format, so we leave header_bytes
# and footer_bytes at their default of 0.
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
result.key, value = reader.read(filename_queue)

...

if shuffle:
  images, label_batch = tf.train.shuffle_batch(
      [image, label],
      batch_size=batch_size,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * batch_size,
      min_after_dequeue=min_queue_examples)
else:
  images, label_batch = tf.train.batch(
      [image, label],
      batch_size=batch_size,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * batch_size)
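
The fixed-length record layout used in the excerpt (1 label byte followed by 32 * 32 * 3 image bytes) can be illustrated without TensorFlow. The following is only a sketch in plain Python: split_records is a hypothetical helper, and the two synthetic records stand in for a real data_batch_*.bin file.

```python
# Layout constants taken from the cifar10_input.py excerpt.
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES


def split_records(raw):
    """Split a bytes buffer into (label, image_bytes) pairs.

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    if len(raw) % RECORD_BYTES != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    records = []
    for start in range(0, len(raw), RECORD_BYTES):
        record = raw[start:start + RECORD_BYTES]
        label = record[0]             # first byte is the class label
        image = record[LABEL_BYTES:]  # remaining bytes are the pixels
        records.append((label, image))
    return records


# Two synthetic records (labels 3 and 7); real data would come from a .bin file.
fake_file = (bytes([3]) + bytes(IMAGE_BYTES)
             + bytes([7]) + bytes([255]) * IMAGE_BYTES)
parsed = split_records(fake_file)
print(RECORD_BYTES, [label for label, _ in parsed])  # prints: 3073 [3, 7]
```

Each record is therefore 3073 bytes long, which is the value passed as record_bytes to the reader in the excerpt.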

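At first I had no idea what this code was for, and the more I read the more confused I got. Until then the most I had done with TensorFlow was feed data through tf.placeholder(); I had never used TensorFlow's own reading and writing facilities, so the code above was hard going. Then I found this in the How-To section of the official documentation: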

Batching

# Template from the how-to: tf.SomeReader, tf.some_decoder and
# some_processing are placeholders, not real functions.
def read_my_file_format(filename_queue):
  reader = tf.SomeReader()
  key, record_string = reader.read(filename_queue)
  example, label = tf.some_decoder(record_string)
  processed_example = some_processing(example)
  return processed_example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
  filename_queue = tf.train.string_input_producer(
      filenames, num_epochs=num_epochs, shuffle=True)
  example, label = read_my_file_format(filename_queue)
  # min_after_dequeue defines how big a buffer we will randomly sample
  #   from -- bigger means better shuffling but slower start up and more
  #   memory used.
  # capacity must be larger than min_after_dequeue and the amount larger
  #   determines the maximum we will prefetch. Recommendation:
  #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
  min_after_dequeue = 10000
  capacity = min_after_dequeue + 3 * batch_size
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, capacity=capacity,
      min_after_dequeue=min_after_dequeue)
  return example_batch, label_batch
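
The capacity recommendation in the comment above is plain arithmetic, and it can be written out as a small helper. This is only a sketch: recommended_capacity is a hypothetical function, the batch size of 128 is an assumed value, and the default safety margin of 2 is chosen so that one thread reproduces the 3 * batch_size term of the template.

```python
def recommended_capacity(min_after_dequeue, batch_size,
                         num_threads=1, safety_margin=2):
    """Rule of thumb quoted in the how-to comment:
    min_after_dequeue + (num_threads + a small safety margin) * batch_size.

    Hypothetical helper; the default safety_margin of 2 is an assumption.
    """
    if min(min_after_dequeue, batch_size, num_threads, safety_margin) < 0:
        raise ValueError("all arguments must be non-negative")
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


# min_after_dequeue=10000 comes from the template; batch_size=128 is assumed.
capacity = recommended_capacity(min_after_dequeue=10000, batch_size=128)
print(capacity)  # 10000 + 3 * 128 = 10384
```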

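That made things click. After some more study of the official API documentation I could roughly work out what is going on. The official documentation also provides the most representative diagram, although it does not explain much.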

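[Figure: the input pipeline diagram from the official documentation, showing the filename queue, the Readers, and the example queue]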

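I will not explain the API call by call; instead, let's understand it through experiments.

Experiment

First, create two files in the TensorFlow directory, named test.txt and test2.txt, with the following contents: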

test.txt:

test line1
test line2
test line3
test line4
test line5
test line6

test2.txt:

test2 line1
test2 line2
test2 line3
test2 line4
test2 line5
test2 line6

Then enter the following commands one at a time in the Python interactive shell:

import tensorflow as tf
filenames=['test.txt','test2.txt']
# Create the filename_queue shown in the figure above
filename_queue=tf.train.string_input_producer(filenames)
# Use TextLineReader, which reads one line at a time
reader=tf.TextLineReader()
init=tf.initialize_all_variables()
# Read from the files, i.e. create the Reader in the figure above
key,value=reader.read(filename_queue)
# Read a batch; batch_size is set to 1 to make the output easy to follow
bs=tf.train.batch([value],batch_size=1,num_threads=1,capacity=2)
sess=tf.Session()
# Crucial: this starts the threads that connect the queues in the graph
tf.train.start_queue_runners(sess=sess)
# Count the records the reader has produced so far
b=reader.num_records_produced()

Then we run:

>>> sess.run(bs)
array(['test line1'],dtype=object)
>>> sess.run(b)
4
>>> sess.run(bs)
array(['test line2'],dtype=object)
>>> sess.run(b)
5
>>> sess.run(bs)
array(['test line3'],dtype=object)
>>> sess.run(bs)
array(['test line4'],dtype=object)
>>> sess.run(bs)
array(['test line5'],dtype=object)
>>> sess.run(bs)
array(['test line6'],dtype=object)
>>> sess.run(bs)
array(['test2 line1'],dtype=object)
>>> sess.run(bs)
array(['test2 line2'],dtype=object)
>>> sess.run(bs)
array(['test2 line3'],dtype=object)
>>> sess.run(bs)
array(['test2 line4'],dtype=object)
>>> sess.run(bs)
array(['test2 line5'],dtype=object)
>>> sess.run(bs)
array(['test2 line6'],dtype=object)
>>> sess.run(bs)
array(['test line1'],dtype=object)

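We find that with batch_size set to 1, bs outputs the file contents line by line. The reason is that we are using a single Reader: it first opens test.txt, reads it line by line, and sends each line to the example queue (see the figure above). Because the batch size is 1 and tf.train.batch() does no shuffling, the lines naturally come out in order, after which the Reader moves on to test2.txt. One thing puzzles me: why does the first value of reader.num_records_produced not start from 1? I am not sure. A plausible explanation is that the queue runner thread keeps reading ahead in the background to fill the example queue, so the Reader has already produced more records than have been dequeued. In addition, printing the size of filename_queue gives: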

>>> sess.run(filename_queue.size())
32

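The size of filename_queue is as large as 32! This puzzled me at first. It is in fact the default value of the capacity argument of tf.train.string_input_producer; since num_epochs is not set, the file names are enqueued over and over until the queue is full.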
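
One possible reading of the num_records_produced values seen earlier (4 after the first batch, 5 after the second) is that the background thread reads ahead: one record has been dequeued, two sit in the example queue (capacity=2), and one more has been read but is blocked waiting for space. The toy model below only illustrates that hypothesis. ReadAheadModel is a hypothetical class, not TensorFlow code, and it replaces the real background thread with a synchronous loop.

```python
from collections import deque


class ReadAheadModel:
    """Toy model of a reader feeding a bounded queue (an assumption about
    the mechanism, not TensorFlow's implementation)."""

    def __init__(self, lines, capacity):
        self._source = iter(lines)
        self._capacity = capacity
        self._queue = deque()
        self._pending = None       # record read but not yet enqueued
        self.records_produced = 0  # plays the role of num_records_produced

    def _run_producer(self):
        # Keep reading until the queue is full and one extra record is
        # held by the blocked enqueue, or until the input runs out.
        while True:
            if self._pending is None:
                try:
                    self._pending = next(self._source)
                except StopIteration:
                    return
                self.records_produced += 1
            if len(self._queue) >= self._capacity:
                return  # the enqueue would block here
            self._queue.append(self._pending)
            self._pending = None

    def dequeue(self):
        self._run_producer()
        item = self._queue.popleft()
        self._run_producer()  # the producer refills right after a dequeue
        return item


model = ReadAheadModel(["test line%d" % i for i in range(1, 7)], capacity=2)
print(model.dequeue(), model.records_produced)  # prints: test line1 4
print(model.dequeue(), model.records_produced)  # prints: test line2 5
```

Under these assumptions the counter is always ahead of the number of dequeued records by the queue capacity plus one, which matches the values observed in the session above.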

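We can change the experiment and set batch_size to 2. The output is still in order, and each call returns 2 lines of text (matching batch_size).

Next we change the experiment again, replacing tf.train.batch with tf.train.shuffle_batch while keeping the text data the same: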
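
The in-order behaviour, including the wrap-around from the last line of test2.txt back to the first line of test.txt, can be mimicked with plain Python generators. This is only an analogy, not how TensorFlow is implemented: filename_producer, line_reader and batches are hypothetical names, the files are shortened to three lines each and held in memory, and the file order is fixed (the real tf.train.string_input_producer defaults to shuffle=True, which shuffles the file names within each epoch).

```python
import itertools


def filename_producer(filenames, num_epochs=None):
    """Yield file names epoch after epoch; num_epochs=None repeats forever."""
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for name in filenames:
            yield name


def line_reader(filename_stream, files):
    """Read each named file line by line, like a single TextLineReader."""
    for name in filename_stream:
        for line in files[name]:
            yield line


def batches(stream, batch_size):
    """Group consecutive items into lists of batch_size, without shuffling."""
    while True:
        chunk = list(itertools.islice(stream, batch_size))
        if len(chunk) < batch_size:
            return
        yield chunk


# In-memory stand-ins for test.txt and test2.txt (shortened to 3 lines each).
files = {
    "test.txt": ["test line%d" % i for i in range(1, 4)],
    "test2.txt": ["test2 line%d" % i for i in range(1, 4)],
}
stream = line_reader(filename_producer(["test.txt", "test2.txt"]), files)
first_four = list(itertools.islice(batches(stream, batch_size=2), 4))
for batch in first_four:
    print(batch)
```

The fourth batch starts again at the first line of test.txt, because the file name producer never stops when num_epochs is None.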

import tensorflow as tf
filenames=['test.txt','test2.txt']
filename_queue=tf.train.string_input_producer(filenames)
reader=tf.TextLineReader()
init=tf.initialize_all_variables()
key,value=reader.read(filename_queue)
# batch_size is a required argument; 1 matches the single-element outputs below
bs=tf.train.shuffle_batch([value],batch_size=1,capacity=4,min_after_dequeue=2)
sess=tf.Session()
tf.train.start_queue_runners(sess=sess)
b=reader.num_records_produced()

Run the same commands as before:

>>> sess.run(bs)
array(['test2 line2'],dtype=object)
>>> sess.run(bs)
array(['test line2'],dtype=object)
>>> sess.run(bs)
array(['test line3'],dtype=object)

With shuffling in place, the output of bs is clearly different and no longer follows any fixed order. Now look at the size of filename_queue:

>>> sess.run(filename_queue.size())
32

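It is again 32. This is not derived from the file sizes: 32 is the default capacity of the queue created by tf.train.string_input_producer. Note the arguments capacity=4 and min_after_dequeue=2 here. capacity is the maximum length of the example queue, and min_after_dequeue is the minimum number of elements that must remain in the example queue after a dequeue. The latter is needed to make the mixing more thorough; together these two arguments are what make shuffling possible.

At this point the general idea should be clear. However, the experiments above all use a single Reader, which does not quite match the figure in the previous section. Following the official tutorial, we can use multiple Readers like this: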
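
How a bounded buffer produces a shuffled stream can be sketched in plain Python. This is an idealized model, not TensorFlow's implementation: shuffle_stream is a hypothetical function, the producer is assumed to always keep the buffer filled to capacity (so the buffer never drops to min_after_dequeue while input remains), and once the input ends the sketch simply drains the buffer.

```python
import random


def shuffle_stream(items, capacity, min_after_dequeue, seed=0):
    """Yield items in shuffled order using a buffer of at most `capacity`.

    Idealized model: the buffer is refilled to `capacity` before every
    dequeue, so it always holds more than `min_after_dequeue` elements
    while input remains.
    """
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be larger than min_after_dequeue")
    rng = random.Random(seed)
    source = iter(items)
    buffer = []
    exhausted = False
    while True:
        # Producer side: top the buffer up to its capacity.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            return
        # Consumer side: remove one element chosen uniformly at random.
        yield buffer.pop(rng.randrange(len(buffer)))


shuffled = list(shuffle_stream(range(12), capacity=4, min_after_dequeue=2))
print(shuffled)
```

With capacity=4 an element can come out at most three positions earlier than it went in, which is why a small buffer only mixes neighbouring lines while a larger one mixes more thoroughly.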

import tensorflow as tf
filenames=['test.txt','test2.txt']
filename_queue=tf.train.string_input_producer(filenames)
reader=tf.TextLineReader()
init=tf.initialize_all_variables()
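# Build two read ops on the same filename queue; each yields a (key, value) pair
example_list=[reader.read(filename_queue) for _ in range(2)]
# batch_size and capacity are required; the values here are assumed,
# matching the earlier shuffle_batch example
bs2=tf.train.shuffle_batch_join(example_list,batch_size=1,capacity=4,min_after_dequeue=2)
sess=tf.Session()
sess.run(init)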
tf.train.start_queue_runners(sess=sess)

The results are as follows:

>>> sess.run(bs2)
[array(['test2.txt:2'],dtype=object),array(['test2 line2'],dtype=object)]
>>> sess.run(bs2)
[array(['test2.txt:5'],dtype=object),array(['test2 line5'],dtype=object)]
>>> sess.run(bs2)
[array(['test2.txt:6'],dtype=object),array(['test2 line6'],dtype=object)]
>>> sess.run(bs2)
[array(['test2.txt:4'],dtype=object),array(['test2 line4'],dtype=object)]
>>> sess.run(bs2)
[array(['test2.txt:3'],dtype=object),array(['test2 line3'],dtype=object)]
>>> sess.run(bs2)
[array(['test2.txt:1'],dtype=object),array(['test2 line1'],dtype=object)]
>>> sess.run(bs2)
[array(['test.txt:4'],dtype=object),array(['test line4'],dtype=object)]
>>> sess.run(bs2)
[array(['test.txt:3'],dtype=object),array(['test line3'],dtype=object)]
>>> sess.run(bs2)
[array(['test.txt:2'],dtype=object),array(['test line2'],dtype=object)]
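
The idea of several readers pulling file names from one queue and pushing (key, value) records into a shared example queue can be sketched with Python's standard queue and threading modules. This is an analogy under stated assumptions, not TensorFlow's implementation: reader_worker and read_all are hypothetical names, the files are held in memory, only one epoch is read, and None is used as a stop signal. The key format "filename:line number" mirrors the keys in the output above.

```python
import queue
import threading


def reader_worker(filename_queue, example_queue, files):
    """Take file names until a None sentinel arrives; emit (key, line) pairs."""
    while True:
        name = filename_queue.get()
        if name is None:
            example_queue.put(None)  # tell the consumer this reader is done
            return
        for line_number, line in enumerate(files[name], start=1):
            example_queue.put(("%s:%d" % (name, line_number), line))


def read_all(files, num_readers=2, capacity=4):
    """Read every line of every file once using num_readers threads."""
    filename_queue = queue.Queue()
    example_queue = queue.Queue(maxsize=capacity)  # bounded, like capacity=4
    for name in files:
        filename_queue.put(name)
    for _ in range(num_readers):
        filename_queue.put(None)  # one stop signal per reader
    workers = [
        threading.Thread(target=reader_worker,
                         args=(filename_queue, example_queue, files))
        for _ in range(num_readers)
    ]
    for worker in workers:
        worker.start()
    records = []
    finished = 0
    while finished < num_readers:
        item = example_queue.get()
        if item is None:
            finished += 1
        else:
            records.append(item)
    for worker in workers:
        worker.join()
    return records


files = {
    "test.txt": ["test line%d" % i for i in range(1, 7)],
    "test2.txt": ["test2 line%d" % i for i in range(1, 7)],
}
records = read_all(files)
print(len(records), records[0])
```

The order in which records from the two files interleave depends on thread scheduling, so only the set of records is fixed from run to run, not their order.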

