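TensorFlow TFRecord Files Explained

TFRecord is TensorFlow's built-in file format. It is a binary format with the following advantages:

1. It provides a uniform way to handle many kinds of input files.

2. It makes better use of memory and is easy to copy and move.

3. It stores binary data and labels in the same file.

Introduction

Before looking at TFRecord in detail, it helps to understand the following building blocks.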

tf.train.Int64List(value=list_data)

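It wraps a list of integers in a protobuf message whose repeated value field holds each element.

Note that the input must be a list (any iterable of ints also works, e.g. range(2)), and every element must have the same type, matching the list type: here, an integer.

# value = tf.constant([1, 2])     ### This raises an error: a Tensor is not accepted
ss = 1               ### Int64List elements must be Python ints; the other list types follow the same rule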
tt = 2
out1 = tf.train.Int64List(value = [ss, tt])
print(out1)
# value: 1
# value: 2

ss = [1 ,2]
out2 = tf.train.Int64List(value = ss)
print(out2)
# value: 1
# value: 2

There are two more classes of the same kind:

tf.train.FloatList
tf.train.BytesList
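As a minimal sketch of the two sibling classes, assuming TensorFlow is installed, the snippet below builds a FloatList and a BytesList. The variable names and sample values are illustrative only and do not come from the original post.

```python
import tensorflow as tf

# FloatList holds Python floats (stored as 32-bit floats).
floats = tf.train.FloatList(value=[0.5, 2.0])

# BytesList holds bytes objects; a str must be encoded first.
names = tf.train.BytesList(value=[b"cat", "dog".encode("utf-8")])

float_values = list(floats.value)
byte_values = list(names.value)
print(float_values)
print(byte_values)
```

Passing a plain str to BytesList, or a float to Int64List, raises a TypeError, which is why each feature has to be matched to the right list type.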

tf.train.Feature(int64_list=)

It builds a single feature of one type, for example an integer feature:

out = tf.train.Feature(int64_list=tf.train.Int64List(value=[33, 22]))
print(out)
# int64_list {
#   value: 33
#   value: 22
# }

Other types work the same way:

tf.train.Feature(float_list=tf.train.FloatList())
tf.train.Feature(bytes_list=tf.train.BytesList())

tf.train.Features(feature=dict_data)

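It builds a collection of features that may have different types, expressed as a dict that maps each feature name to a tf.train.Feature.

out = tf.train.Features(feature={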
                            "suibian": tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 4])),
                            "a": tf.train.Feature(float_list=tf.train.FloatList(value=[5., 7.]))
                        })
print(out)
# feature {
#   key: "a"
#   value {
#     float_list {
#       value: 5.0
#       value: 7.0
#     }
#   }
# }
# feature {
#   key: "suibian"
#   value {
#     int64_list {
#       value: 1
#       value: 2
#       value: 4
#     }
#   }
# }

tf.train.Example(features=tf.train.Features())

It creates one sample: each Example corresponds to exactly one sample.

example = tf.train.Example(features=
                           tf.train.Features(feature={
                               'a': tf.train.Feature(int64_list=tf.train.Int64List(value=range(2))),
                               'b': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'm',b'n']))
                           }))
print(example)
# features {
#   feature {
#     key: "a"
#     value {
#       int64_list {
#         value: 0
#         value: 1
#       }
#     }
#   }
#   feature {
#     key: "b"
#     value {
#       bytes_list {
#         value: "m"
#         value: "n"
#       }
#     }
#   }
# }

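[Figure: one diagram summarizing the code above]

Example Protocol Buffer

It is essentially a data storage format, comparable to XML or JSON.

The classes above are how you build data in this format.

One Example protocol buffer corresponds to one sample. A sample has several features, and each feature holds several elements (see the figure above).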

message Example{
    Features features = 1;
}
message Features{
    map<string, Feature> feature = 1;
}
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
}
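The nesting in these message definitions maps directly onto attribute access in Python. The following is a small sketch, assuming TensorFlow is installed; the feature names "a" and "b" simply mirror the earlier example.

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "a": tf.train.Feature(int64_list=tf.train.Int64List(value=[0, 1])),
    "b": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"m", b"n"])),
}))

# Example -> Features -> map<string, Feature> -> oneof kind
feature_a = example.features.feature["a"]
kind_a = feature_a.WhichOneof("kind")            # name of the list type that is set
values_a = list(feature_a.int64_list.value)
kind_b = example.features.feature["b"].WhichOneof("kind")

print(kind_a, values_a, kind_b)
```

Because kind is a oneof, a single Feature holds exactly one of the three list types at a time.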

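A TFRecord file stores its records in this Example protocol buffer format.

TFRecord Files

Files of this type can be written, and files of other types can be converted into them. Converting simply means reading the other file first and then writing its contents to a TFRecord file.

Files of this type can also be read.

Writing a TFRecord File

Writing takes two steps:

1. Create a writer.

2. Build an Example protocol buffer for each sample.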

tf.python_io.TFRecordWriter(file_name)

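This constructs the writer (in TensorFlow 2.x the same class is available as tf.io.TFRecordWriter). It has two commonly used methods:

write(record): writes one sample to the file
close(): closes the writer

Note: record here is a serialized Example, produced by Example.SerializeToString(). Serialization encodes the Example, including its feature map, as a compact binary string, which takes far less space than a text representation.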
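To make the serialization step concrete, here is a minimal round-trip sketch, assuming TensorFlow is installed. The feature name "label" and the value 7 are illustrative only.

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
}))

# SerializeToString() returns bytes, which is what writer.write() expects.
record = example.SerializeToString()

# FromString() is the standard protobuf way to decode the bytes again.
restored = tf.train.Example.FromString(record)
restored_label = restored.features.feature["label"].int64_list.value[0]
print(type(record), restored_label)
```

The decoded message compares equal to the original, so nothing is lost by storing only the serialized bytes.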

Example 1: saving the MNIST dataset as a TFRecord file

import tensorflow as tf
import numpy as np
import input_data    # MNIST tutorial helper module (input_data.py), assumed to be on the import path


# Build an integer feature
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Build a bytes (string) feature, used here for the image contents
def _string_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# Read the image data and some of its attributes
mnist = input_data.read_data_sets('../../../data/MNIST_data', dtype=tf.uint8, one_hot=True)
images = mnist.train.images
labels = mnist.train.labels
pixels = images.shape[1]        # images.shape is (55000, 784), so pixels is 784
num_examples = mnist.train.num_examples        # 55000

file_name = 'output.tfrecords'          # output file name
writer = tf.python_io.TFRecordWriter(file_name)     # the writer (tf.io.TFRecordWriter in TensorFlow 2.x)

for index in range(num_examples):
    # Iterate over the samples
    image_raw = images[index].tobytes()        # raw image bytes (tostring() is deprecated in NumPy)
    example = tf.train.Example(features=tf.train.Features(feature={
        'pixel': _int64_feature(pixels),
        'label': _int64_feature(int(np.argmax(labels[index]))),   # one-hot label back to a class index
        'image_raw': _string_feature(image_raw)
    }))
    writer.write(example.SerializeToString())       # write the serialized Example to the TFRecord file
writer.close()
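The raw bytes stored in image_raw carry no shape or dtype information, which is why the example also stores the pixel count. The NumPy-only sketch below illustrates this; the 3 x 4 array is a made-up stand-in for one image.

```python
import numpy as np

image = np.arange(12, dtype=np.uint8).reshape(3, 4)   # stand-in for one image
image_raw = image.tobytes()                           # 12 raw bytes, no shape or dtype attached

# Decoding needs the dtype, and the shape has to be supplied again.
flat = np.frombuffer(image_raw, dtype=np.uint8)       # 1-D array of 12 values
restored = flat.reshape(3, 4)

print(len(image_raw), flat.shape, restored.shape)
```

On the TensorFlow side, tf.decode_raw followed by tf.reshape plays the same role as frombuffer and reshape here.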

Example 2: saving a CSV file as a TFRecord file

import pandas as pd
import tensorflow as tf

train_frame = pd.read_csv("../myfiles/xx3.csv")
train_labels_frame = train_frame.pop(item="label")
train_values = train_frame.values
train_labels = train_labels_frame.values
print("values shape: ", train_values.shape)     # values shape:  (2, 3)
print("labels shape:", train_labels.shape)      # labels shape: (2,)

writer = tf.python_io.TFRecordWriter("xx3.tfrecords")

for i in range(train_values.shape[0]):
    image_raw = train_values[i].tobytes()       # raw bytes of one row (tostring() is deprecated in NumPy)
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(train_labels[i])]))
            }
        )
    )
    writer.write(record=example.SerializeToString())
writer.close()

Example 3: saving PNG files as a TFRecord file

import glob
import tensorflow as tf
from PIL import Image

# filenames = tf.train.match_filenames_once('../myfiles/*.png')
filenames = glob.iglob('../myfiles/*.png')    # forward slashes avoid invalid backslash escapes

writer = tf.python_io.TFRecordWriter('png.tfrecords')

for filename in filenames:
    img = Image.open(filename)
    img_raw = img.tobytes()     # raw pixel bytes, without the PNG header
    label = 1                   # every image gets the same placeholder label
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
            }
        )
    )
    writer.write(record=example.SerializeToString())
writer.close()

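Reading a TFRecord File

Reading a TFRecord file works much like reading other data in TensorFlow; see my earlier blog post on reading data.

tf.TFRecordReader()

Creates the reader. Its read method takes a filename queue and returns the next (key, value) pair.

tf.parse_single_example(serialized, features=None, name=None)

Parses a single Example protocol buffer.

serialized: a scalar string Tensor holding one serialized Example, i.e. the value returned by the reader
features: a dict whose keys are the feature names to read and whose values are FixedLenFeature objects
return: a dict mapping each feature name to the parsed Tensor

The values in features can also be tf.VarLenFeature(), although this is used less often. It returns a SparseTensor, a format that stores only the entries that are present (the non-zero part). It is enough to know that it exists.

tf.FixedLenFeature(shape, dtype)

shape: the shape of the input data; it is usually left as an empty list, which means a single scalar value
dtype: the data type, which must match the type written to the file; only float32, int64, and string are allowed
return: a fixed-length Tensor (every position is stored, zeros included)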
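The difference between the two feature specs is easiest to see side by side. This is a minimal sketch that assumes TensorFlow 2.x with eager execution, where the same functions live under tf.io; the feature names "label" and "tags" are hypothetical.

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    "tags": tf.train.Feature(int64_list=tf.train.Int64List(value=[10, 20, 30])),
}))
serialized = example.SerializeToString()

parsed = tf.io.parse_single_example(serialized, features={
    "label": tf.io.FixedLenFeature(shape=[], dtype=tf.int64),   # exactly one value -> dense scalar
    "tags": tf.io.VarLenFeature(dtype=tf.int64),                # any number of values -> SparseTensor
})

label = int(parsed["label"])
tags = tf.sparse.to_dense(parsed["tags"]).numpy().tolist()
print(label, tags)
```

If "tags" were declared as FixedLenFeature with shape [], parsing would fail because the record holds three values rather than one.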

Example code

import tensorflow as tf
from PIL import Image

filename = 'png.tfrecords'
file_queue = tf.train.string_input_producer([filename], shuffle=True)

reader = tf.TFRecordReader()
key, value = reader.read(file_queue)

# The keys in features must match the ones used when writing, and so must the dtypes; shape may be empty
dict_data = tf.parse_single_example(value, features={'label': tf.FixedLenFeature(shape=(1,1), dtype=tf.int64),
                                                     'image_raw': tf.FixedLenFeature(shape=(), dtype=tf.string)})
label = tf.cast(dict_data['label'], tf.int32)
img = tf.decode_raw(dict_data['image_raw'], tf.uint8)       # decode the raw bytes into uint8 values

image_tensor = tf.reshape(img, [500, 500, -1])      # assumes every image is 500 x 500 pixels

sess = tf.Session()
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners(sess=sess)

while True:     # runs until interrupted; the filename queue repeats the file indefinitely
    # print(sess.run(key))        # b'png.tfrecords:0'
    image = sess.run(image_tensor)
    img_PIL = Image.fromarray(image)
    img_PIL.show()
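The queue-based reader above is TensorFlow 1.x API; in TensorFlow 2.x it is only available under tf.compat.v1. As a rough sketch of the same read path in TensorFlow 2.x, assuming eager execution, the snippet below writes a tiny file and reads it back with tf.data.TFRecordDataset. The file name, the 2 x 2 "images", and the parse helper are made up for illustration.

```python
import os
import tempfile
import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")

# Write three tiny records: 4 raw bytes plus an integer label each.
with tf.io.TFRecordWriter(path) as writer:
    for label in range(3):
        raw = bytes([label] * 4)
        example = tf.train.Example(features=tf.train.Features(feature={
            "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[raw])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# Keys and dtypes must match what was written.
feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    pixels = tf.io.decode_raw(parsed["image_raw"], tf.uint8)   # bytes -> uint8 values
    return tf.reshape(pixels, [2, 2]), parsed["label"]

dataset = tf.data.TFRecordDataset(path).map(parse)
results = [(img.numpy().tolist(), int(lbl)) for img, lbl in dataset]
print(results)
```

Unlike the queue version, iteration stops at the end of the file, and no session or queue runners are needed.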

References:

https://blog.csdn.net/chengshuhao1991/article/details/78656724

https://www.cnblogs.com/yanshw/articles/12419616.html
