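Training Your First Deep Neural Network with the Iris Dataset

阿新 • Published: 2019-01-03

This article uses the Iris dataset as an example to show how to train a simple deep neural network.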

Environment

Python 3.5.4

TensorFlow 1.4

Complete source code

import os
import urllib.request

import numpy as np
import tensorflow as tf

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

def main():
    # If the training and test sets aren't stored locally, download them.
    if not os.path.exists(IRIS_TRAINING):
        raw = urllib.request.urlopen(IRIS_TRAINING_URL).read().decode()
        with open(IRIS_TRAINING, "w") as f:
            f.write(raw)
    if not os.path.exists(IRIS_TEST):
        raw = urllib.request.urlopen(IRIS_TEST_URL).read().decode()
        with open(IRIS_TEST, "w") as f:
            f.write(raw)
    # Load datasets.
    training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TRAINING,
        target_dtype=np.int,
        features_dtype=np.float32)
    test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TEST,
        target_dtype=np.int,
        features_dtype=np.float32)
    # Specify that all features have real-value data
    feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                          hidden_units=[10, 20, 10],
                                          n_classes=3,
                                          model_dir="/tmp/iris_model")
    # Define the training inputs
    def get_train_inputs():
        x = tf.constant(training_set.data)
        y = tf.constant(training_set.target)

        return x, y
    # Fit model.
    classifier.fit(input_fn=get_train_inputs, steps=2000)
    # Define the test inputs
    def get_test_inputs():
        x = tf.constant(test_set.data)
        y = tf.constant(test_set.target)
        return x, y
    # Evaluate accuracy.
    accuracy_score = classifier.evaluate(input_fn=get_test_inputs,
                                   steps=1)["accuracy"]
    print("\nTest Accuracy: {0:f}\n".format(accuracy_score))
    # Classify two new flower samples.
    def new_samples():
        return np.array(
        [[6.4, 3.2, 4.5, 1.5],
        [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
    predictions = list(classifier.predict(input_fn=new_samples))
    print(
        "New Samples, Class Predictions:    {}\n"
        .format(predictions))
if __name__ == "__main__":
    main()

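Step-by-step walkthrough

Downloading and loading the dataset

The Iris dataset contains 150 samples and is mainly used for classifying flower species. Its structure is shown below:

As the figure shows, there are three species, encoded as 0, 1, and 2, and each sample has 4 features. In this example the 150 samples are split into a training set (120 samples) and a test set (30 samples).

The first time you run the code, the training and test data must be downloaded from the TensorFlow servers.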

import os
import urllib.request

import numpy as np
import tensorflow as tf

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"


# If a dataset is not stored locally, download it.
if not os.path.exists(IRIS_TRAINING):
    raw = urllib.request.urlopen(IRIS_TRAINING_URL).read().decode()
    with open(IRIS_TRAINING, "w") as f:
        f.write(raw)
if not os.path.exists(IRIS_TEST):
    raw = urllib.request.urlopen(IRIS_TEST_URL).read().decode()
    with open(IRIS_TEST, "w") as f:
        f.write(raw)
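
The two download branches follow the same pattern, so they could be folded into one helper. The following is a minimal sketch using only the standard library; the name download_if_missing is hypothetical and does not appear in the original code.

```python
import os
import urllib.request


def download_if_missing(path, url):
    """Download url to path unless path already exists.

    Returns True if a download happened, False if the file was already there.
    (Hypothetical helper, not part of TensorFlow or the original listing.)
    """
    if os.path.exists(path):
        return False
    # Read the whole response body and decode it as text (UTF-8 by default).
    raw = urllib.request.urlopen(url).read().decode()
    # Write only after a successful download, so the file is never opened
    # when there is nothing to write.
    with open(path, "w") as f:
        f.write(raw)
    return True
```

With this helper, the download step would reduce to calling download_if_missing(IRIS_TRAINING, IRIS_TRAINING_URL) and download_if_missing(IRIS_TEST, IRIS_TEST_URL).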

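Next, load the training and test sets into Datasets using the load_csv_with_header() method in learn.datasets.base. load_csv_with_header() takes three required arguments:

1. filename: the path to the CSV file.

2. target_dtype: the numpy data type of the dataset's target values.

3. features_dtype: the numpy data type of the dataset's feature values.

Here the target (the value the model is trained to predict) is the flower species, an integer from 0 to 2, so the appropriate numpy data type is np.int: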

training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)

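A Dataset in tf.contrib.learn is a named tuple; you can access the feature data and target values through its data and target fields. Here training_set.data and training_set.target hold the feature data and target values of the training set, and test_set.data and test_set.target hold those of the test set.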
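
To make the named-tuple structure concrete, here is a rough pure-Python sketch of what such a loader might do. It is not the TensorFlow implementation: it assumes the first CSV row is a header whose first two fields are the sample count and the feature count, that the label is the last column of each data row, and it returns plain lists instead of numpy arrays. The function name load_csv_with_header_sketch is hypothetical.

```python
import collections
import csv

# Same field names as the Dataset named tuple described above.
Dataset = collections.namedtuple("Dataset", ["data", "target"])


def load_csv_with_header_sketch(filename, target_type=int, feature_type=float):
    """Parse a CSV whose header row starts with n_samples, n_features.

    Assumption: each data row holds the features first and the label last.
    """
    data, target = [], []
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        n_samples, n_features = int(header[0]), int(header[1])
        for row in reader:
            # Convert the feature columns and the trailing label column.
            data.append([feature_type(v) for v in row[:n_features]])
            target.append(target_type(row[-1]))
    if len(data) != n_samples:
        raise ValueError("header sample count does not match the file")
    return Dataset(data=data, target=target)
```

The result is used the same way as in the article: dataset.data is the list of feature rows and dataset.target is the list of labels.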

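Building a DNN classifier

tf.contrib.learn provides a collection of predefined models called Estimators, which make it easy to train and evaluate on your data. Here we configure a deep neural network classifier to fit the Iris data; with tf.contrib.learn, a tf.contrib.learn.DNNClassifier can be instantiated in a single line of code.

First, define the model's feature columns, as in the following code: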

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

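This specifies the data type of the features in the dataset. All of the feature data is continuous, so tf.contrib.layers.real_valued_column is the appropriate function for constructing the feature columns. The dataset has four features (sepal length, sepal width, petal length, and petal width), so dimension must be set to 4 to hold all of the data.

The code then creates the DNNClassifier model with the following arguments:

feature_columns=feature_columns. The set of feature columns defined above.

hidden_units=[10, 20, 10]. Three hidden layers with 10, 20, and 10 neurons respectively.

n_classes=3. Three target classes, representing the three Iris species.

model_dir=/tmp/iris_model. The directory in which TensorFlow saves checkpoint data during model training. For more on logging and monitoring with TensorFlow, see Logging and Monitoring Basics with tf.contrib.learn.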
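
To give a feel for what hidden_units=[10, 20, 10] and n_classes=3 imply, the sketch below builds a fully connected network of shape 4 → 10 → 20 → 10 → 3 in plain Python and counts its parameters. It is only an illustration: the weights are random and untrained, the ReLU hidden activation and softmax output are assumptions about a typical setup, and none of these function names come from TensorFlow.

```python
import math
import random

# 4 input features, three hidden layers, 3 output classes.
LAYER_SIZES = [4, 10, 20, 10, 3]


def count_parameters(sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply ReLU on hidden layers and softmax on the output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    # Numerically stable softmax over the output scores.
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


layers = init_layers(LAYER_SIZES)
probs = forward(layers, [6.4, 3.2, 4.5, 1.5])
print(count_parameters(LAYER_SIZES))  # 513
print(len(probs))  # 3, one probability per class
```

The parameter count works out to 50 + 220 + 210 + 33 = 513, which shows how small this network is.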

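Training the classifier on the Iris training set

Now that the DNNClassifier model is configured, you can fit it to the Iris training data with the fit method. Pass the feature data (training_set.data), the target values (training_set.target), and the number of training steps (2000 here) as arguments. (The complete listing above feeds the same data through an input_fn instead.)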

classifier.fit(x=training_set.data, y=training_set.target, steps=2000)

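The model's state is preserved in the classifier, which means you can train iteratively if you like.

The output of the run is shown in the figure below:

As you can see, the loss in the final iteration is 0.0252. At this point we already have a reasonably well-trained model.

Evaluating the trained model

The Iris training data has now been fitted to the DNNClassifier model, so you can check its accuracy on the Iris test data with the evaluate method. Like fit, evaluate takes the feature data and target values as arguments, and it returns a dict containing the evaluation results. The following code passes the Iris test data (test_set.data and test_set.target) to evaluate and prints the resulting accuracy: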

accuracy_score = classifier.evaluate(x=test_set.data, y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))

The resulting accuracy is around 97%, although the exact value may differ from machine to machine.
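
Accuracy here is simply the fraction of test samples whose predicted class matches the true class. The sketch below computes it by hand; the label lists are made up for illustration (30 samples with exactly one mistake) and are not the model's real output.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels: 30 test samples, one of them misclassified.
actual = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(actual)
predicted[15] = 2  # a class-1 sample predicted as class 2

print("Accuracy: {0:f}".format(accuracy(predicted, actual)))  # Accuracy: 0.966667
```

With only 30 test samples, each misclassified sample changes the accuracy by about 3.3 percentage points, so a figure near 97% corresponds to a single error.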

Classifying new samples

When new samples arrive, the predict() method can be used to classify them.

The following code predicts their species:

new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
prediction = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {}'.format(str(prediction)))

The predict() method returns the predictions, one result per sample.
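
The predicted values are class indices (0, 1, or 2), so a small lookup makes them readable. This is a sketch under one assumption: that the indices follow the order setosa, versicolor, virginica. The example indices are hypothetical and are not the classifier's actual output.

```python
# Assumed mapping from class index to species name.
SPECIES = ["setosa", "versicolor", "virginica"]


def to_species(class_ids):
    """Translate integer class indices into species names."""
    return [SPECIES[int(i)] for i in class_ids]


example_ids = [2, 0]  # hypothetical indices, for illustration only
print(to_species(example_ids))  # ['virginica', 'setosa']
```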

Conclusion

With that, we have built and trained a deep neural network end to end on the Iris data. As you can see, TensorFlow makes this work efficient. TF is great.
