Batch data input in TensorFlow

阿新 • Published: 2019-01-04

Two ways to handle batch data input in TensorFlow:
1. The TFRecords format
2. Arrays stored with the h5py library

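When writing CNN code in TensorFlow, I found that the framework itself was not the hard part. Most of my effort went into preprocessing the images and feeding them into the network.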

I used two methods:
Method 1:
Store the data in TFRecords format. If the data need labels, the labels can be written at the same time as the data.
① Create the TFRecords file

import os
import tensorflow as tf  # TensorFlow 1.x API
from PIL import Image

orig_image = '/home/images/train_image/'
gen_image = '/home/images/image_train.tfrecords'

def create_record():
    writer = tf.python_io.TFRecordWriter(gen_image)
    class_path = orig_image
    for img_name in os.listdir(class_path):  # iterate over every image
        img_path = os.path.join(class_path, img_name)
        img = Image.open(img_path)  # load the image
        # img = img.resize((256, 256))  # set the image size; other preprocessing can go here
        img_raw = img.tobytes()  # convert the image to raw bytes

        example = tf.train.Example(
            features=tf.train.Features(feature={
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[0])),  # label
                'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))  # image data
            }))
        writer.write(example.SerializeToString())
    writer.close()
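
The function above writes the label 0 for every image. When the images belong to several classes, one possible way to label them is to keep one sub-folder per class and derive the integer label from the folder name. The sketch below is only an illustration under that assumption: the folder layout, the class names `cat` and `dog`, and the helpers `build_label_map` and `list_labeled_images` are hypothetical and do not come from the original code.

```python
import os
import tempfile


def build_label_map(root_dir):
    """Map each class sub-folder name to an integer label (sorted for stability)."""
    class_names = sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )
    return {name: index for index, name in enumerate(class_names)}


def list_labeled_images(root_dir):
    """Return (image_path, label) pairs for every file under each class folder."""
    label_map = build_label_map(root_dir)
    pairs = []
    for name, label in label_map.items():
        class_dir = os.path.join(root_dir, name)
        for img_name in sorted(os.listdir(class_dir)):
            pairs.append((os.path.join(class_dir, img_name), label))
    return pairs


# Tiny demonstration on a throwaway directory tree (hypothetical class names).
with tempfile.TemporaryDirectory() as demo_root:
    for class_name, files in {"cat": ["a.png"], "dog": ["b.png", "c.png"]}.items():
        os.makedirs(os.path.join(demo_root, class_name))
        for file_name in files:
            open(os.path.join(demo_root, class_name, file_name), "wb").close()
    demo_label_map = build_label_map(demo_root)
    demo_pairs = list_labeled_images(demo_root)

print(demo_label_map)
```

Each `(path, label)` pair could then be passed into the loop of `create_record()`, with `label` replacing the hard-coded `0` in the `Int64List`.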

② Read the TFRecords file

def read_and_decode(filename):
    # Create the filename queue; no limit is set on how many times the data is read
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    features = tf.parse_single_example(
            serialized_example,
            features={
                    'label': tf.FixedLenFeature([], tf.int64),
                    'img_raw': tf.FixedLenFeature([], tf.string)})
    label = features['label']
    img = features['img_raw']
    img = tf.decode_raw(img, tf.uint8)  # tf.float32
    img = tf.image.convert_image_dtype(img, dtype=tf.float32)
    img = tf.reshape(img, [256, 256, 1])  # the stored bytes must be 256x256, single channel
    label = tf.cast(label, tf.int32)
    return img, label
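
The reshape to `[256, 256, 1]` is needed because `img.tobytes()` stores only the pixel values, not the image shape. The following NumPy sketch mirrors the write and read steps without TensorFlow to make that visible. The test image is arbitrary stand-in data, and the reshape is done before the float conversion here, which gives the same values as the order used in `read_and_decode`.

```python
import numpy as np

# A stand-in for one 256x256 grayscale image (values are arbitrary test data).
height, width = 256, 256
image = (np.arange(height * width) % 256).astype(np.uint8).reshape(height, width)

# Like img.tobytes(): a flat byte string that carries no shape information.
raw = image.tobytes()

# Like tf.decode_raw(..., tf.uint8): a flat 1-D array of length height * width.
flat = np.frombuffer(raw, dtype=np.uint8)

# Like tf.reshape(img, [256, 256, 1]): the reader has to supply the shape.
restored = flat.reshape(height, width, 1)

# Like tf.image.convert_image_dtype(..., tf.float32): uint8 is scaled to [0, 1].
as_float = restored.astype(np.float32) / 255.0

print(len(raw), flat.shape, restored.shape, as_float.dtype)
```

If the stored images had a different size or more channels, the byte count would no longer equal 256 * 256 and the reshape would fail, so resizing before `tobytes()` keeps the two sides consistent.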

③ Read the data in batches with tf.train.batch

batch_size = 20  # assumed value, same as in Method 2
min_after_dequeue = 10000
capacity = min_after_dequeue + 3 * batch_size
num_samples = len(os.listdir(orig_image))
create_record()
img, label = read_and_decode(gen_image)
total_batch = num_samples // batch_size  # number of full batches
image_batch, label_batch = tf.train.batch([img, label], batch_size=batch_size,
                                          num_threads=32, capacity=capacity)
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
with tf.Session() as sess:
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)  # start the threads that fill the queues
    for i in range(total_batch):
        cur_image_batch, cur_label_batch = sess.run([image_batch, label_batch])
    coord.request_stop()
    coord.join(threads)

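Method 2:
With h5py, the data are stored as arrays.
I prefer this method. A CNN workflow often needs several related sets of data, for example when each image goes through two or more transformations and every result has to be stored separately. In that situation h5py is more convenient.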

import os
import h5py
import matplotlib.pyplot as plt
import numpy as np
import random
from scipy.interpolate import griddata
from skimage import img_as_float, io
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
class_path = '/home/awen/Juanjuan/Python Project/train_BSDS/test_gray_0_1/'
orig_image, sample_near, sample_line = [], [], []  # one list per stored variant
for img_name in os.listdir(class_path):
    img_path = os.path.join(class_path, img_name)
    img = io.imread(img_path)
    m1 = img_as_float(img)
    m2, m3 = sample_inter1(m1)  # a data-processing function (not shown here)
    m1 = m1.reshape([256, 256, 1])
    m2 = m2.reshape([256, 256, 1])
    m3 = m3.reshape([256, 256, 1])
    orig_image.append(m1)
    sample_near.append(m2)
    sample_line.append(m3)

arrorig_image = np.asarray(orig_image) # [?, 256, 256, 1]
arrlsample_near = np.asarray(sample_near) # [?, 256, 256, 1]  
arrlsample_line = np.asarray(sample_line) # [?, 256, 256, 1] 

save_path = '/home/awen/Juanjuan/Python Project/train_BSDS/test_sample/train.h5'
def make_data(path):
    # Write each array as a named dataset in the HDF5 file at `path`
    with h5py.File(path, 'w') as hf:
        hf.create_dataset('orig_image', data=arrorig_image)
        hf.create_dataset('sample_near', data=arrlsample_near)
        hf.create_dataset('sample_line', data=arrlsample_line)

def read_data(path):
    with h5py.File(path, 'r') as hf:
        orig_image = np.array(hf.get('orig_image'))  # the key must match the dataset name used above
        sample_near = np.array(hf.get('sample_near'))
        sample_line = np.array(hf.get('sample_line'))
    return orig_image, sample_near, sample_line
make_data(save_path)
orig_image1, sample_near1, sample_line1 = read_data(save_path)
total_number = len(orig_image1)
batch_size = 20
batch_index = total_number // batch_size  # integer number of full batches
for i in range(batch_index):
    batch_orig = orig_image1[i*batch_size:(i+1)*batch_size]
    batch_sample_near = sample_near1[i*batch_size:(i+1)*batch_size]
    batch_sample_line = sample_line1[i*batch_size:(i+1)*batch_size]
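
The slicing loop above covers only full batches, so any samples left over after the last full batch are never used. A small generator is one way to keep several arrays aligned while also yielding the final, smaller batch and optionally shuffling. This is a sketch rather than the method used in the post: `iterate_batches` and the stand-in arrays are hypothetical, and the demo uses 8x8 images instead of 256x256 to stay light.

```python
import numpy as np


def iterate_batches(arrays, batch_size, shuffle=False, seed=None):
    """Yield tuples of aligned batches; the last batch may be smaller."""
    total = len(arrays[0])
    if any(len(a) != total for a in arrays):
        raise ValueError("all arrays must have the same number of samples")
    order = np.arange(total)
    if shuffle:
        # The same permutation is applied to every array, so samples stay paired.
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, total, batch_size):
        idx = order[start:start + batch_size]
        yield tuple(a[idx] for a in arrays)


# Stand-in data laid out like the arrays above: [N, height, width, 1].
demo_orig = np.zeros((45, 8, 8, 1), dtype=np.float32)
demo_near = np.ones((45, 8, 8, 1), dtype=np.float32)

batch_shapes = [
    b_orig.shape[0]
    for b_orig, b_near in iterate_batches([demo_orig, demo_near], batch_size=20)
]
print(batch_shapes)
```

With the arrays returned by `read_data`, the call would look like `iterate_batches([orig_image1, sample_near1, sample_line1], batch_size)`.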

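When h5py is used to generate a very large file, reading it back can fail with: ioerror: unable to open file (bad object header version number)
This basically means the generated file is unusable. Storing less data in it resolved the problem for me.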
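
One way to store less data per file is to split the samples across several smaller files instead of writing everything into one. The sketch below only computes the index ranges and file names for such a split. The helpers `shard_ranges` and `shard_paths`, the numbered naming scheme, and the limit of 1000 samples per file are all assumptions for illustration, and whether a given limit avoids the error would need to be checked on the actual data.

```python
def shard_ranges(total_samples, max_per_file):
    """Split range(total_samples) into consecutive (start, stop) pairs."""
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")
    return [
        (start, min(start + max_per_file, total_samples))
        for start in range(0, total_samples, max_per_file)
    ]


def shard_paths(base_path, num_shards):
    """Derive one numbered file name per shard from a base .h5 path."""
    stem = base_path[:-3] if base_path.endswith(".h5") else base_path
    return ["{}_{:03d}.h5".format(stem, i) for i in range(num_shards)]


ranges = shard_ranges(total_samples=2500, max_per_file=1000)
paths = shard_paths("train.h5", len(ranges))
for path, (start, stop) in zip(paths, ranges):
    print(path, start, stop)
```

Each `(start, stop)` pair could then be used to slice the arrays before writing them, for example `arrorig_image[start:stop]`, with one file written per pair.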
