TensorFlow: Reading Binary Data

阿新 • Published: 2018-12-10

I. The CIFAR-10 Binary Dataset

https://www.cs.toronto.edu/~kriz/cifar.html

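Binary version data files

The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, and test_batch.bin. Each file has the following format, where every sample consists of a target value (the label) followed by its feature values (the pixels):

<1 x label><3072 x pixel>
...
<1 x label><3072 x pixel>

The first byte is the label of the first image, a number in the range 0-9. The next 3072 bytes are the pixel values of the image: the first 1024 bytes are the red channel, the next 1024 the green channel, and the final 1024 the blue channel.

The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image. Each file contains 10000 such 3073-byte "rows" of images, with nothing delimiting the rows, so each file should be exactly 30730000 bytes long.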
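
The record layout described above can be checked with a few lines of plain Python. This is only a sketch: the helper name split_record and the synthetic record are made up for illustration and are not part of the dataset or of TensorFlow.

```python
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
LABEL_BYTES = 1
IMAGE_BYTES = HEIGHT * WIDTH * CHANNELS    # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES   # 3073
RECORDS_PER_FILE = 10000


def split_record(record):
    """Split one 3073-byte record into (label, red, green, blue)."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    plane = HEIGHT * WIDTH                 # 1024 bytes per color channel
    label = record[0]                      # first byte is the label
    pixels = record[LABEL_BYTES:]
    return label, pixels[:plane], pixels[plane:2 * plane], pixels[2 * plane:]


# Synthetic record (hypothetical data): label 7, then constant R, G, B planes.
record = bytes([7]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, red, green, blue = split_record(record)

print(label, len(red), red[0], green[0], blue[0])   # 7 1024 10 20 30
print(RECORD_BYTES * RECORDS_PER_FILE)              # 30730000
```

The last line reproduces the expected file size: 10000 records of 3073 bytes each.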

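II. Reading the CIFAR-10 Binary Data

1. Analysis

Build the file queue
Read the binary data and decode it
Fix the shape and data type of the image data, then return it in batches
Open a session and start the queue-runner threads to run it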
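
Before turning to the TensorFlow code, the same read, decode and batch steps can be sketched in plain Python. This is an illustration under assumptions, not how TensorFlow implements its readers: read_records and batches are hypothetical helpers, and an in-memory io.BytesIO stands in for a real .bin file.

```python
import io

RECORD_BYTES = 3073  # 1 label byte + 3072 pixel bytes


def read_records(stream, record_bytes=RECORD_BYTES):
    """Yield fixed-length records from a binary stream until it is exhausted."""
    while True:
        record = stream.read(record_bytes)
        if not record:
            return
        if len(record) != record_bytes:
            raise ValueError("truncated record at end of stream")
        yield record


def batches(records, batch_size):
    """Decode each record into (label, pixels) and group them into batches."""
    batch = []
    for record in records:
        batch.append((record[0], record[1:]))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


# Fake "file" with 25 records whose labels cycle through 0-9 (hypothetical data).
fake_file = io.BytesIO(b"".join(bytes([i % 10]) + bytes(3072) for i in range(25)))
all_batches = list(batches(read_records(fake_file), batch_size=10))

print([len(b) for b in all_batches])  # [10, 10, 5]
```

Unlike this sketch, tf.train.string_input_producer cycles through the file list repeatedly by default, and tf.train.batch only emits full batches unless allow_smaller_final_batch=True is passed.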

2. Code

Define the CIFAR class and set the image-related attributes

class CifarRead(object):
    """
    Read binary files; store and read TFRecords
    """

    def __init__(self):
        # Define some attributes of the images
        self.height = 32
        self.width = 32
        self.channel = 3

        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channel
        self.bytes = self.label_bytes + self.image_bytes

Implement the data-reading method bytes_read(self, file_list)

Build the file queue

# 1. Build the file queue
file_queue = tf.train.string_input_producer(file_list)

Read with tf.FixedLengthRecordReader(bytes)

# 2. Read with tf.FixedLengthRecordReader(bytes)
# The record length must be given so that each read returns exactly one sample
reader = tf.FixedLengthRecordReader(self.bytes)

_, value = reader.read(file_queue)

Decode

# 3. Decode
# Shape (?,); each record holds 3073 values = label (1,) + features (3072,)
label_image = tf.decode_raw(value, tf.uint8)
# For training, the features and the target are usually handled separately
print(label_image)

Split the label from the image

# Slice with tf.slice
label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)

image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])

print(label, image)

Fix the shape of the data and batch it

# Fix the type and the shape of the image data
# Image shape
# reshape (3072,) ----> [channel, height, width]
# transpose [channel, height, width] ----> [height, width, channel]
depth_major = tf.reshape(image, [self.channel, self.height, self.width])
print(depth_major)

image_reshape = tf.transpose(depth_major, [1, 2, 0])

print(image_reshape)

# 4. Batch
image_batch, label_batch = tf.train.batch([image_reshape, label], batch_size=10, num_threads=1, capacity=10)

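Going from the 1-D arrangement to 3-D data when setting the image shape brings in the concepts of NHWC and NCHW:

1) NHWC and NCHW

There are two formats for the image shape when reading data:

With "NHWC", the order is [batch, height, width, channels];

With "NCHW", the order is [batch, channels, height, width].

N is the number of images in the batch, H is the number of pixels in the vertical direction, W is the number of pixels in the horizontal direction, and C is the number of channels.

TensorFlow's default for a single image is [height, width, channel].

For an RGB image with three channels, the two formats differ as follows:

1 Understanding

Suppose 1, 2, 3, 4 are red; 5, 6, 7, 8 are green; 9, 10, 11, 12 are blue.

If the channel is on axis 0, [channel, height, width], the values are split into three groups by color, and the three RGB planes are found along the first dimension.
If the channel is on axis 2, [height, width, channel], the three RGB values of each pixel are found along the third dimension.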

# 1. We want [2 height, 2 width, 3 channel], but reshaping directly gives the wrong result (1, 2, 3 are all red values)
In [7]: tf.reshape([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [2, 2, 3]).eval()
Out[7]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)

# 2. So reshape to channel-first [3, 2, 2] instead
In [8]: tf.reshape([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [3, 2, 2]).eval()
Out[8]:
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]], dtype=int32)
# Then apply tf.transpose; 0, 1, 2 label the three dimensions
# Convert from [depth, height, width] to [height, width, depth].
# 0, 1, 2 -----> 1, 2, 0
# (depth_major is the [3, 2, 2] tensor produced in In [8])
In [17]: tf.transpose(depth_major, [1, 2, 0]).eval()
Out[17]:
array([[[ 1,  5,  9],
        [ 2,  6, 10]],

       [[ 3,  7, 11],
        [ 4,  8, 12]]], dtype=int32)
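
The reshape-then-transpose result can be reproduced without TensorFlow by using the index formula for channel-major storage: the value for channel c, row h, column w sits at flat position c * height * width + h * width + w. The sketch below uses a hypothetical helper, chw_to_hwc, written only for this illustration.

```python
def chw_to_hwc(flat, channels, height, width):
    """Reorder a channel-major flat list into nested [height][width][channel] lists."""
    if len(flat) != channels * height * width:
        raise ValueError("flat list has the wrong length")
    return [
        [
            [flat[c * height * width + h * width + w] for c in range(channels)]
            for w in range(width)
        ]
        for h in range(height)
    ]


flat = list(range(1, 13))  # 1-4 red, 5-8 green, 9-12 blue
hwc = chw_to_hwc(flat, channels=3, height=2, width=2)

print(hwc)  # [[[1, 5, 9], [2, 6, 10]], [[3, 7, 11], [4, 8, 12]]]
```

Each innermost triple is one pixel's (R, G, B) values, matching the tf.transpose output above.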

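2 Transposition API

tf.transpose(a, perm=None)
- Transposes a. Permutes the dimensions according to perm.
  - Changes the order of the dimensions
- a: the input data
- perm: a list of axis indices giving the new order of the dimensions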
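
The effect of perm on the shape follows one rule: dimension i of the result is dimension perm[i] of the input. A minimal sketch of that rule, using a hypothetical helper transposed_shape that only computes shapes and moves no data:

```python
def transposed_shape(shape, perm):
    """Return the shape after permuting the axes of `shape` according to `perm`."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the axis indices")
    return [shape[axis] for axis in perm]


# [channel, height, width] -> [height, width, channel]
print(transposed_shape([3, 32, 32], [1, 2, 0]))         # [32, 32, 3]

# A batch in NHWC -> NCHW
print(transposed_shape([10, 32, 32, 3], [0, 3, 1, 2]))  # [10, 3, 32, 32]
```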

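2) Handling the image shape

So, when fixing the shape of the data after reading it:

1 image has shape (3072,). The shape passed to tf.reshape(image, []) is [channel, height, width], because the bytes are stored channel by channel, so first go from the flat (3072,) layout to [depth, height, width].
2 Then use tf.transpose to turn the [depth, height, width] data into TensorFlow's default [height, width, channel].

3 Complete code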

import tensorflow as tf
import os


class Cifar(object):

    # Initialization
    def __init__(self):
        # Image size
        self.height = 32
        self.width = 32
        self.channels = 3

        # Number of bytes per label, per image, and per record
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channels
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self, file_list):
        # Read the binary files
        # print("read_and_decode:\n", file_list)
        # 1. Build the filename queue
        file_queue = tf.train.string_input_producer(file_list)

        # 2. Build the fixed-length binary file reader
        reader = tf.FixedLengthRecordReader(self.bytes)
        key, value = reader.read(file_queue)

        print("key:\n", key)
        print("value:\n", value)
        # 3. Decode
        decoded = tf.decode_raw(value, tf.uint8)
        print("decoded:\n", decoded)

        # 4. Basic data processing
        # Slice to separate the label from the feature values
        label = tf.slice(decoded, [0], [self.label_bytes])
        image = tf.slice(decoded, [self.label_bytes], [self.image_bytes])

        print("label:\n", label)
        print("image:\n", image)
        # Reshape the image to [channels, height, width]
        image_reshaped = tf.reshape(image, [self.channels, self.height, self.width])
        # Transpose to [height, width, channels]
        image_transposed = tf.transpose(image_reshaped, [1, 2, 0])
        print("image_transposed:\n", image_transposed)

        # Type conversion
        label_cast = tf.cast(label, tf.float32)
        image_cast = tf.cast(image_transposed, tf.float32)

        # 5. Batch
        label_batch, image_batch = tf.train.batch([label_cast, image_cast], batch_size=10, num_threads=1, capacity=10)
        return label_batch, image_batch


if __name__ == "__main__":
    # Build the list of file names
    file_name = os.listdir("./cifar-10-batches-bin")
    print("file_name:\n", file_name)
    file_list = [os.path.join("./cifar-10-batches-bin/", file) for file in file_name if file[-3:] == "bin"]
    print("file_list:\n", file_list)

    # Call the method that reads the binary files
    cf = Cifar()
    label, image = cf.read_and_decode(file_list)

    # Open a session
    with tf.Session() as sess:
        # Create the thread coordinator
        coord = tf.train.Coordinator()
        # Start the queue-runner threads
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        # Fetch both tensors in a single run so the labels and images
        # come from the same batch, then print them
        label_value, image_value = sess.run([label, image])
        print("label:\n", label_value)
        print("image:\n", image_value)

        # Release resources
        coord.request_stop()
        coord.join(threads)
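
Before handing the file list to the reader, it can help to confirm that every file holds a whole number of 3073-byte records. The sketch below is an optional sanity check, not part of the original script: list_bin_files and count_records are hypothetical helpers, and a temporary directory with made-up files stands in for ./cifar-10-batches-bin.

```python
import os
import tempfile

RECORD_BYTES = 3073  # 1 label byte + 3072 pixel bytes


def list_bin_files(data_dir):
    """Return the sorted paths of the .bin files in data_dir."""
    return sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if name.endswith(".bin")
    )


def count_records(path, record_bytes=RECORD_BYTES):
    """Return the number of records in a file, checking that its size is consistent."""
    size = os.path.getsize(path)
    if size % record_bytes != 0:
        raise ValueError("%s is %d bytes, not a multiple of %d" % (path, size, record_bytes))
    return size // record_bytes


# Demo on a throwaway directory (hypothetical files, not the real dataset).
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "data_batch_1.bin"), "wb") as f:
        f.write(bytes(RECORD_BYTES * 4))      # four empty records
    with open(os.path.join(demo_dir, "notes.txt"), "wb") as f:
        f.write(b"not a batch file")          # skipped by the suffix filter
    found = list_bin_files(demo_dir)
    counts = [count_records(path) for path in found]

print(len(found), counts)  # 1 [4]
```

On the real dataset each file should report 10000 records. Note that a suffix filter like this one, or the file[-3:] == "bin" test in the script above, also picks up test_batch.bin, so training and test data end up in the same queue unless that file is excluded.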
