A Summary of TensorFlow Data-Reading Methods

阿新 • Published: 2018-12-25

1. Reading in-memory data with placeholder

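The simplest approach is to define a placeholder and then pass data into it through feed_dict when the graph is run. The code in this post targets the TensorFlow 1.x graph-mode API (tf.placeholder, tf.Session). For example: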

from __future__ import print_function
import tensorflow as tf
import numpy as np

x1 = tf.placeholder(tf.float32,shape=(3,2))
y1 = tf.placeholder(tf.float32,shape=(2,3))
z1 = tf.matmul(x1,y1)

x2 = tf.placeholder(tf.float32,shape=None)
y2 = tf.placeholder(tf.float32,shape=None)
z2 = x2 + y2

# feed values into the placeholders with feed_dict
with tf.Session() as sess:
    z2_value = sess.run(z2,feed_dict={x2:1,y2:2}) 
    print(z2_value)
    rand_x = np.random.rand(3,2)
    rand_y = np.random.rand(2,3)
    z1_value,z2_value = sess.run(
        [z1,z2],                   # run together
        feed_dict={
            x1:rand_x,y1:rand_y,
            x2:1,y2:2
        }
    )
    print(z1_value,z2_value)
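
A placeholder with a fixed shape such as (3,2) only accepts arrays of exactly that shape. The following is a minimal sketch, not part of the original post, of leaving the first dimension as None so that one placeholder accepts batches of different sizes. It assumes a TensorFlow version that ships the tf.compat.v1 module; the alias `tf1` and the helper `row_sums` are hypothetical names chosen for illustration.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1             # assumption: TensorFlow with the compat.v1 module
tf1.disable_eager_execution()  # use graph mode, as in the rest of this post


def row_sums(batches):
    """Feed arrays with different row counts through a single placeholder."""
    with tf.Graph().as_default():
        # The first dimension is left open; the second must always be 2.
        x = tf1.placeholder(tf.float32, shape=(None, 2))
        total = tf.reduce_sum(x, axis=1)
        with tf1.Session() as sess:
            return [sess.run(total, feed_dict={x: b}) for b in batches]


results = row_sums([np.ones((3, 2)), np.ones((5, 2))])
print([r.shape for r in results])
```

Feeding an array whose second dimension is not 2 would still be rejected, so the partially specified shape keeps some checking in place.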

2. Reading on-disk data with queues

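See the referenced link for details. That said, queue-based reading feels fairly complicated, and now that the Dataset API is available this method is rarely needed.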

3. Dataset API

A Dataset can be viewed as an ordered list of "elements" of the same type. In practice, a single element can be a vector, a string, an image, or even a tuple or dict.
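
To make the idea of an "element" concrete, here is a small sketch, written for this summary rather than taken from the original code, that builds datasets whose elements are a vector, a tuple, and a dict, and fetches the first element of each. It assumes a TensorFlow version with the tf.compat.v1 module; `tf1` and `first_element` are hypothetical names.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1             # assumption: TensorFlow with the compat.v1 module
tf1.disable_eager_execution()


def first_element(data):
    """Build a dataset from `data` and return its first element as NumPy values."""
    with tf.Graph().as_default():
        dataset = tf.data.Dataset.from_tensor_slices(data)
        element = tf1.data.make_one_shot_iterator(dataset).get_next()
        with tf1.Session() as sess:
            return sess.run(element)


# Each element is a vector of shape (2,).
vec = first_element(np.array([[1.0, 2.0], [3.0, 4.0]]))
# Each element is a tuple (number, string).
pair = first_element((np.array([1, 2]), np.array([b"a", b"b"])))
# Each element is a dict with keys "x" and "y".
rec = first_element({"x": np.array([1, 2]), "y": np.array([10, 20])})

print(vec, pair, rec)
```

In every case from_tensor_slices slices along the first dimension, so the structure of one element mirrors the structure of the input.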

Note the inheritance relationships shown in the figure below.

[Figure: inheritance diagram of the Dataset classes]

tf.data.TextLineDataset

Reads data directly from files.

__init__(
    filenames,
    compression_type=None,
    buffer_size=None
)

Code example:

with tf.Graph().as_default(),tf.Session() as sess:
    # create a dataset: np.array() => tf.constant => dataset elements
    dataset = tf.data.Dataset.from_tensor_slices(np.array([1,2,3,4,5]))
    # tf.data.TextLineDataset also works here because it inherits from tf.data.Dataset
    # dataset = tf.data.TextLineDataset.from_tensor_slices(np.array([1,2,3,4,5]))
    # get an iterator over the elements of this dataset
    iterator = dataset.make_one_shot_iterator()
    element = iterator.get_next() # every element is a number
    for i in range(5):
        print(sess.run(element))  # 1,2,3,4,5


##### read data from file
"""
we have a file test.csv:
1,2,0
4,5,1
7,8,2
"""
with tf.Graph().as_default(),tf.Session() as sess:
    dataset = tf.data.TextLineDataset("test.csv")
    iterator = dataset.make_one_shot_iterator()
    element = iterator.get_next() # each element is one line of the file, as a string
    try:
        while True:
            print(sess.run(element))
    except tf.errors.OutOfRangeError:
        print("end!")

##### more complex dataset
"""
1,2,0
4,5,1
7,8,2
the last column is the label; we build batches of (features, label)
"""

with tf.Graph().as_default(),tf.Session() as sess:
    def to_tensor(line):
        # decode_csv returns a list with one scalar tensor per column
        parsed_line = tf.decode_csv(line,[[0.],[0.],[0]])
        label = parsed_line[-1]      # the last column is the label
        features = parsed_line[:-1]  # the remaining columns are the features
        features_names = ['feature_1','feature_2']
        return dict(zip(features_names,features)),label

    dataset = tf.data.TextLineDataset("test.csv").map(to_tensor).batch(2)
    iterator = dataset.make_one_shot_iterator()
    batch_features,batch_labels = iterator.get_next()
    try:
        while True:
            batch_fea,batch_lab = sess.run([batch_features,batch_labels])           
            print(batch_fea,batch_lab)
    except tf.errors.OutOfRangeError:
        print("end!")
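
The snippets above assume that test.csv already exists. As a self-contained sketch under that same file layout, the following writes the three rows to a temporary directory and reads them back in batches of two, which also shows that the final batch holds only the leftover row. It assumes a TensorFlow version with tf.compat.v1 and tf.io.decode_csv; `tf1`, `parse_line`, and `read_batches` are hypothetical names.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1             # assumption: TensorFlow with the compat.v1 module
tf1.disable_eager_execution()


def parse_line(line):
    # Two float features followed by an integer label.
    f1, f2, label = tf.io.decode_csv(line, record_defaults=[[0.0], [0.0], [0]])
    return {"feature_1": f1, "feature_2": f2}, label


def read_batches(path, batch_size):
    """Return every (features, labels) batch read from the CSV file at `path`."""
    batches = []
    with tf.Graph().as_default():
        dataset = tf.data.TextLineDataset(path).map(parse_line).batch(batch_size)
        next_batch = tf1.data.make_one_shot_iterator(dataset).get_next()
        with tf1.Session() as sess:
            try:
                while True:
                    batches.append(sess.run(next_batch))
            except tf.errors.OutOfRangeError:
                pass  # the dataset is exhausted
    return batches


# Write the same three rows used in the examples above.
csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(csv_path, "w") as f:
    f.write("1,2,0\n4,5,1\n7,8,2\n")

batches = read_batches(csv_path, batch_size=2)
for features, labels in batches:
    print(features["feature_1"], features["feature_2"], labels)
```

With three rows and a batch size of two, the first batch contains two rows and the second contains one.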

Note how the data loader is set up:

# create dataloader
dataset = tf.data.Dataset.from_tensor_slices((tfx,tfy)) #reference tf_dataset_basic.py
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.batch(32)
dataset = dataset.repeat(5)
iterator = dataset.make_initializable_iterator()
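
The order of batch and repeat in such a pipeline matters. The sketch below, an illustration added here and not code from the original post, counts batch sizes for a 10-element dataset under both orderings. Shuffling is left out so the result is deterministic. It assumes a TensorFlow version with tf.compat.v1; `tf1` and `batch_sizes` are hypothetical names.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1             # assumption: TensorFlow with the compat.v1 module
tf1.disable_eager_execution()


def batch_sizes(build):
    """Apply `build` to a 10-element dataset and list the size of every batch."""
    sizes = []
    with tf.Graph().as_default():
        dataset = build(tf.data.Dataset.from_tensor_slices(np.arange(10)))
        next_batch = tf1.data.make_one_shot_iterator(dataset).get_next()
        with tf1.Session() as sess:
            try:
                while True:
                    sizes.append(len(sess.run(next_batch)))
            except tf.errors.OutOfRangeError:
                pass  # the dataset is exhausted
    return sizes


# batch before repeat: every epoch ends with its own partial batch
batch_then_repeat = batch_sizes(lambda ds: ds.batch(4).repeat(2))
# repeat before batch: batches may span the boundary between epochs
repeat_then_batch = batch_sizes(lambda ds: ds.repeat(2).batch(4))

print(batch_then_repeat)  # [4, 4, 2, 4, 4, 2]
print(repeat_then_batch)  # [4, 4, 4, 4, 4]
```

The pipeline in this post batches first and then repeats, so batches never mix samples from two epochs.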

A concrete example using Dataset:

import numpy as np
import tensorflow as tf

x = np.random.uniform(-1,1,(1000,1))
y = np.power(x,2) + np.random.normal(0,0.1,size=x.shape)
x_train,x_test = np.split(x,[800])
y_train,y_test = np.split(y,[800])
print(
    '\nx_train shape',x_train.shape,
    '\ny_train shape',y_train.shape,
)
"""
plt.scatter(x_train,y_train)
plt.show()
"""

tfx = tf.placeholder(x_train.dtype,x_train.shape)
tfy = tf.placeholder(y_train.dtype,y_train.shape)

# create dataloader
dataset = tf.data.Dataset.from_tensor_slices((tfx,tfy)) #reference tf_dataset_basic.py
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.batch(32)
dataset = dataset.repeat(5)
iterator = dataset.make_initializable_iterator()

# build the network
batch_x,batch_y = iterator.get_next()  # batch_x: (32,1)
h1 = tf.layers.dense(batch_x,10,tf.nn.relu) # h1: (32,10)
out = tf.layers.dense(h1,1) # out: (32,1)
loss = tf.losses.mean_squared_error(batch_y,out)
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    # initialize the iterator (feeding the training data) and the variables
    sess.run([iterator.initializer,tf.global_variables_initializer()],
            feed_dict={tfx:x_train,tfy:y_train})
    for step in range(301):
        try:
            _,train_loss = sess.run([train,loss])
            if step % 10 == 0:
                test_loss = sess.run(loss,{batch_x:x_test,batch_y:y_test})
                print('\nstep:',step,
                    '\ntrain loss:',train_loss,
                    '\ntest loss:',test_loss,
                )
        except tf.errors.OutOfRangeError:
            print("finish!")
            break
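
One reason to pair an initializable iterator with placeholders is that the same pipeline can be re-initialized with different data. The following is a minimal sketch of that idea, assuming a TensorFlow version with tf.compat.v1. It is an illustration and not the author's code; `tf1`, `collect`, and the small arrays are hypothetical.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1             # assumption: TensorFlow with the compat.v1 module
tf1.disable_eager_execution()


def collect(arrays, batch_size=4):
    """Re-initialize one iterator per array and gather everything it yields."""
    out = []
    with tf.Graph().as_default():
        data = tf1.placeholder(tf.float32, shape=(None, 1))
        dataset = tf.data.Dataset.from_tensor_slices(data).batch(batch_size)
        iterator = tf1.data.make_initializable_iterator(dataset)
        next_batch = iterator.get_next()
        with tf1.Session() as sess:
            for arr in arrays:
                # Running the initializer binds the iterator to the fed array.
                sess.run(iterator.initializer, feed_dict={data: arr})
                batches = []
                try:
                    while True:
                        batches.append(sess.run(next_batch))
                except tf.errors.OutOfRangeError:
                    pass  # this array is exhausted
                out.append(np.concatenate(batches))
    return out


train = np.arange(8, dtype=np.float32).reshape(8, 1)
test = np.arange(100, 103, dtype=np.float32).reshape(3, 1)
train_seen, test_seen = collect([train, test])
print(train_seen.shape, test_seen.shape)
```

The example above takes a different route for evaluation and feeds x_test and y_test directly into batch_x and batch_y, which bypasses the iterator for that run.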

The complete code is on my GitHub.
