
Reading Data in TensorFlow with Dataset

I. Three ways to read data into TensorFlow
1 Feeding: Python code provides the data when running each step.
2 Reading from files: an input pipeline reads the data from files at the beginning of a TensorFlow graph.
3 Preloaded data: a constant or variable in the TensorFlow graph holds all the data (for small data sets).
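To make the difference between feeding and preloading concrete, here is a minimal sketch. It assumes the TF 1.x-style graph API is reachable through `tf.compat.v1` (TensorFlow 1.13 or later, including 2.x). The array `data` and the names `fed_sum` and `preloaded_sum` are illustrative and do not come from the original article.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF 1.x-style API exposed under tf.compat.v1

# Illustrative data: 3 rows of 2 values each.
data = np.arange(6, dtype=np.float32).reshape(3, 2)

with tf.Graph().as_default():
    # Feeding: a placeholder receives the data from Python on every run call.
    x = tf1.placeholder(tf.float32, shape=[None, 2])
    fed_sum = tf.reduce_sum(x, axis=1)

    # Preloaded data: a constant inside the graph holds the whole (small) data set.
    preloaded = tf.constant(data)
    preloaded_sum = tf.reduce_sum(preloaded, axis=1)

    with tf1.Session() as sess:
        fed_result = sess.run(fed_sum, feed_dict={x: data})
        preloaded_result = sess.run(preloaded_sum)

print("fed:", fed_result)
print("preloaded:", preloaded_result)
```

Both paths compute the same row sums. The difference is only where the data lives: in Python, passed in at each step, or inside the graph itself.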
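II. Dataset
1 The Dataset API belongs to the second approach (reading from files). It makes reading data and applying complex format transformations easier.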
2 A tf.data.Dataset represents a sequence of elements, in which each element contains one or more tf.Tensor objects. For example, in an image pipeline, an element might be a single training example, with a pair of tensors representing the image data and a label. There are two distinct ways to create a dataset.
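3 A dataset can be created either from tensors or from files:
dataset1 = tf.data.Dataset.from_tensor_slices(...) creates a dataset from tensors
dataset1 = tf.contrib.data.TextLineDataset(src_file) creates a dataset from the lines of a text file (from TensorFlow 1.4 on, the same class is also available as tf.data.TextLineDataset)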
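As a rough illustration of the two sources, the sketch below builds one dataset from an in-memory tensor and one from a text file, then reads a few elements from each. It assumes `tf.compat.v1` is available and uses `tf.data.TextLineDataset` rather than the `tf.contrib` path. The temporary file `sample.txt` and its contents are hypothetical stand-ins for `src_file`.

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF 1.x-style API exposed under tf.compat.v1

# Hypothetical two-line text file standing in for src_file.
src_file = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(src_file, "w") as f:
    f.write("first line\nsecond line\n")

with tf.Graph().as_default():
    # From tensors: each row of the tensor becomes one dataset element.
    tensor_ds = tf.data.Dataset.from_tensor_slices(tf.constant([[1, 2], [3, 4], [5, 6]]))
    # From a file: each line of the file becomes one string element.
    line_ds = tf.data.TextLineDataset(src_file)

    tensor_iter = tf1.data.make_initializable_iterator(tensor_ds)
    line_iter = tf1.data.make_initializable_iterator(line_ds)
    tensor_next = tensor_iter.get_next()
    line_next = line_iter.get_next()

    with tf1.Session() as sess:
        sess.run([tensor_iter.initializer, line_iter.initializer])
        first_row = sess.run(tensor_next)
        lines = [sess.run(line_next), sess.run(line_next)]

print("first tensor element:", first_row)
print("lines:", lines)
```

String elements come back from `sess.run` as Python `bytes`, so they usually need a `.decode()` before further text processing.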
4 Main Dataset APIs, used mostly for data transformation:
tf.data.Dataset.zip
dataset1.map
dataset1.padded_batch
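The three transformations listed above can be combined as in the following sketch. It assumes `tf.compat.v1` is available. The toy data (vectors whose length equals their value) is invented for illustration, chosen so that `padded_batch` has something to pad.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF 1.x-style API exposed under tf.compat.v1

with tf.Graph().as_default():
    lengths = tf.data.Dataset.from_tensor_slices(tf.constant([1, 2, 3, 4]))

    # map: turn each scalar n into a vector of length n filled with n.
    sequences = lengths.map(lambda n: tf.fill([n], n))

    # zip: pair every variable-length sequence with its length.
    paired = tf.data.Dataset.zip((sequences, lengths))

    # padded_batch: pad sequences with zeros up to the longest one in each batch;
    # the scalar lengths need no padding, hence the empty shape [].
    batched = paired.padded_batch(2, padded_shapes=([None], []))

    iterator = tf1.data.make_initializable_iterator(batched)
    next_batch = iterator.get_next()

    with tf1.Session() as sess:
        sess.run(iterator.initializer)
        seq_batch1, len_batch1 = sess.run(next_batch)
        seq_batch2, len_batch2 = sess.run(next_batch)

print(seq_batch1, len_batch1)
print(seq_batch2, len_batch2)
```

Using `None` in `padded_shapes` pads each batch only to its own longest element, so the two batches here end up with different widths.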
5 Creating an iterator:
dataset1.make_initializable_iterator()
6 Usage flow:
Construct the Dataset object
Create the iterator
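The flow above can be sketched end to end as follows, including the steps that usually come after creating the iterator: initializing it and reading until the data runs out. This assumes `tf.compat.v1` is available. The values `10, 20, 30` and the list `consumed` are illustrative only.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF 1.x-style API exposed under tf.compat.v1

with tf.Graph().as_default():
    # Step 1: construct the Dataset object.
    dataset = tf.data.Dataset.from_tensor_slices(tf.constant([10, 20, 30]))

    # Step 2: create the iterator and the op that yields the next element.
    iterator = tf1.data.make_initializable_iterator(dataset)
    next_element = iterator.get_next()

    consumed = []
    with tf1.Session() as sess:
        # An initializable iterator must be initialized before first use.
        sess.run(iterator.initializer)
        while True:
            try:
                consumed.append(int(sess.run(next_element)))
            except tf.errors.OutOfRangeError:
                # Raised once the dataset is exhausted.
                break

print(consumed)
```

Running `iterator.initializer` again would rewind the iterator, which is one way to start a new epoch over the same dataset.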
III. Code snippet

import tensorflow as tf

#An element contains one or more tf.Tensor objects, called components
dataset1 = tf.data.Dataset.from_tensor_slices(tf.random_uniform([4, 10]))
print("dataset1.output_types",dataset1.output_types) 
print("dataset1.output_shapes",dataset1.output_shapes)

dataset2 = tf.data.Dataset.from_tensor_slices(
    {"a": tf.random_uniform([4]),
     "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print("dataset2.output_types", dataset2.output_types)
print("dataset2.output_shapes", dataset2.output_shapes)

dataset3 = tf.data.Dataset.zip((dataset1, dataset2))
print("dataset3.output_types", dataset3.output_types)
print("dataset3.output_shapes", dataset3.output_shapes)

# map applies a function to each element; the element structure determines the arguments of the function
dataset1 = dataset1.map(lambda x: x + 1)

# dataset1 = dataset1.padded_batch(2, padded_shapes=[11])
dataset1 = dataset1.padded_batch(2, padded_shapes=[None])

iterator = dataset1.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op)  # the initializer returns None, so there is nothing to print
    print("batched data 1:", sess.run(next_element))
    print("batched data 2:", sess.run(next_element))