[A Taste of Deep Learning with Keras] Hands-On 4: IMDB Movie Review Text Classification with Embedding
阿新 · Published: 2018-12-22
This hands-on example comes from the official TensorFlow Keras tutorial.
The code is posted here first; I will add comments and a walkthrough later.
# TensorFlow and tf.keras
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import tensorflow as tf
from tensorflow import keras
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
1.12.0
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 12s 1us/step
print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))
Training entries: 25000, labels: 25000
print(train_data[0])
len(train_data[0]), len(train_data[1])
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32] (218, 189)
# A dictionary mapping words to an integer index
word_index = imdb.get_word_index()
# The first indices are reserved
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2 # unknown
word_index["<UNUSED>"] = 3
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 2s 1us/step
decode_review(train_data[0])
"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)
test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)
len(train_data[0]), len(train_data[1])
(256, 256)
print(train_data[0])
[ 1 14 22 16 43 530 973 1622 1385 65 458 4468 66 3941
4 173 36 256 5 25 100 43 838 112 50 670 2 9
35 480 284 5 150 4 172 112 167 2 336 385 39 4
172 4536 1111 17 546 38 13 447 4 192 50 16 6 147
2025 19 14 22 4 1920 4613 469 4 22 71 87 12 16
43 530 38 76 15 13 1247 4 22 17 515 17 12 16
626 18 2 5 62 386 12 8 316 8 106 5 4 2223
5244 16 480 66 3785 33 4 130 12 16 38 619 5 25
124 51 36 135 48 25 1415 33 6 22 12 215 28 77
52 5 14 407 16 82 2 8 4 107 117 5952 15 256
4 2 7 3766 5 723 36 71 43 530 476 26 400 317
46 7 4 2 1029 13 104 88 4 381 15 297 98 32
2071 56 26 141 6 194 7486 18 4 226 22 21 134 476
26 480 5 144 30 5535 18 51 36 28 224 92 25 104
4 226 65 16 38 1334 88 12 16 283 5 16 4472 113
103 32 15 16 5345 19 178 32 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0]
Model architecture overview:
1. The input to the network has shape (-1, 256).
2. The Embedding layer maps this to (-1, 256, 16); its weight matrix has shape (10000, 16).
3. GlobalAveragePooling1D averages over the sequence dimension, giving (-1, 16) (see the short sketch after this list).
4. The first Dense layer outputs (-1, 16); parameters: w: 16×16 + b: 1×16 = 272 in total.
5. The second Dense layer outputs (-1, 1); parameters: w: 16×1 + b: 1×1 = 17 in total.
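To make step 3 concrete: GlobalAveragePooling1D simply takes the mean over the sequence (time) axis, collapsing (batch, 256, 16) to (batch, 16). A minimal NumPy sketch with random data, for illustration only:
import numpy as np
x = np.random.rand(2, 256, 16).astype(np.float32)  # a fake batch of 2 embedded reviews
pooled = x.mean(axis=1)                             # average over the 256 time steps
print(pooled.shape)                                 # (2, 16), matching the (-1, 16) above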
vocab_size = 10000
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 16) 160000
_________________________________________________________________
global_average_pooling1d (Gl (None, 16) 0
_________________________________________________________________
dense (Dense) (None, 16) 272
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
_________________________________________________________________
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='binary_crossentropy',
              metrics=['accuracy'])
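As a reminder of what binary_crossentropy computes: for a true label y in {0, 1} and a predicted probability p from the sigmoid output, the per-sample loss is -(y*log(p) + (1-y)*log(1-p)), averaged over the batch. A minimal NumPy sketch (not how Keras implements it internally):
def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0), then average the per-sample losses.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))
print(binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))  # ≈ 0.164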
x_val = train_data[:10000]
partial_x_train = train_data[10000:]
y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)
Train on 15000 samples, validate on 10000 samples
Epoch 1/20
15000/15000 [==============================] - 3s 215us/step - loss: 0.6919 - acc: 0.5925 - val_loss: 0.6899 - val_acc: 0.6360
Epoch 2/20
15000/15000 [==============================] - 2s 159us/step - loss: 0.6863 - acc: 0.7131 - val_loss: 0.6824 - val_acc: 0.7418
Epoch 3/20
15000/15000 [==============================] - 2s 155us/step - loss: 0.6746 - acc: 0.7652 - val_loss: 0.6676 - val_acc: 0.7583
Epoch 4/20
15000/15000 [==============================] - 2s 153us/step - loss: 0.6534 - acc: 0.7707 - val_loss: 0.6440 - val_acc: 0.7636
Epoch 5/20
15000/15000 [==============================] - 2s 153us/step - loss: 0.6221 - acc: 0.7933 - val_loss: 0.6104 - val_acc: 0.7872
Epoch 6/20
15000/15000 [==============================] - 2s 153us/step - loss: 0.5820 - acc: 0.8095 - val_loss: 0.5713 - val_acc: 0.7985
Epoch 7/20
15000/15000 [==============================] - 2s 154us/step - loss: 0.5368 - acc: 0.8271 - val_loss: 0.5297 - val_acc: 0.8163
Epoch 8/20
15000/15000 [==============================] - 2s 159us/step - loss: 0.4907 - acc: 0.8427 - val_loss: 0.4891 - val_acc: 0.8306
Epoch 9/20
15000/15000 [==============================] - 3s 170us/step - loss: 0.4478 - acc: 0.8557 - val_loss: 0.4525 - val_acc: 0.8405
Epoch 10/20
15000/15000 [==============================] - 2s 165us/step - loss: 0.4089 - acc: 0.8692 - val_loss: 0.4213 - val_acc: 0.8482
Epoch 11/20
15000/15000 [==============================] - 2s 156us/step - loss: 0.3760 - acc: 0.8791 - val_loss: 0.3977 - val_acc: 0.8541
Epoch 12/20
15000/15000 [==============================] - 2s 153us/step - loss: 0.3483 - acc: 0.8852 - val_loss: 0.3745 - val_acc: 0.8616
Epoch 13/20
15000/15000 [==============================] - 3s 171us/step - loss: 0.3236 - acc: 0.8929 - val_loss: 0.3581 - val_acc: 0.8661
Epoch 14/20
15000/15000 [==============================] - 3s 171us/step - loss: 0.3031 - acc: 0.8981 - val_loss: 0.3436 - val_acc: 0.8711
Epoch 15/20
15000/15000 [==============================] - 3s 178us/step - loss: 0.2854 - acc: 0.9033 - val_loss: 0.3322 - val_acc: 0.8732
Epoch 16/20
15000/15000 [==============================] - 3s 173us/step - loss: 0.2702 - acc: 0.9057 - val_loss: 0.3230 - val_acc: 0.8755
Epoch 17/20
15000/15000 [==============================] - 2s 165us/step - loss: 0.2557 - acc: 0.9131 - val_loss: 0.3152 - val_acc: 0.8771
Epoch 18/20
15000/15000 [==============================] - 2s 155us/step - loss: 0.2431 - acc: 0.9171 - val_loss: 0.3087 - val_acc: 0.8799
Epoch 19/20
15000/15000 [==============================] - 2s 155us/step - loss: 0.2315 - acc: 0.9213 - val_loss: 0.3033 - val_acc: 0.8812
Epoch 20/20
15000/15000 [==============================] - 2s 164us/step - loss: 0.2213 - acc: 0.9236 - val_loss: 0.2991 - val_acc: 0.8821
results = model.evaluate(test_data, test_labels)
print(results)
25000/25000 [==============================] - 1s 38us/step
[0.3124048164367676, 0.87232]
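With the trained model, a new review can be scored end-to-end. This is a minimal sketch using the hypothetical encode_review helper from earlier, padding to the same length of 256 used during training:
sample = encode_review("this film was just brilliant and i loved it")
sample = keras.preprocessing.sequence.pad_sequences([sample],
                                                    value=word_index["<PAD>"],
                                                    padding='post',
                                                    maxlen=256)
print(model.predict(sample))  # a sigmoid score close to 1.0 means a positive review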
history_dict = history.history
history_dict.keys()
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# b is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.clf() # clear figure
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()