CS20SI TensorFlow for Deep Learning Course Notes (4): word2vec with NCE loss and visualize the embeddings

阿新 • Published: 2019-01-01

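I. How to build a TensorFlow model

Phase 1: Define the TensorFlow graph
1. Define placeholders for the inputs and outputs
2. Define the weights
3. Define the inference model
4. Define the loss function
5. Define the optimizer

Phase 2: Execute the computation
1. Initialize all model variables
2. Feed values to the placeholders
3. Run the model on the training data
4. Compute the loss
5. Adjust the model parameters to minimize (or maximize) the loss function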
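
The split between the two phases can be illustrated without TensorFlow at all. The following is a toy sketch in plain Python, not TensorFlow's implementation: the `Placeholder` and `Op` classes and the `run` function are hypothetical names made up here to mimic the define-then-run pattern.

```python
# Toy define-then-run sketch (hypothetical classes, not the TensorFlow API).

class Placeholder:
    """A named input whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name


class Op:
    """A node that applies `fn` to the evaluated values of its inputs."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict):
    """Phase 2: evaluate `node`, reading placeholder values from feed_dict."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Op):
        return node.fn(*(run(i, feed_dict) for i in node.inputs))
    return node  # a plain Python constant


# Phase 1: define the graph. Nothing is computed here.
x = Placeholder("x")
y = Placeholder("y")
w = 3.0                                   # stands in for a model weight
prediction = Op(lambda a, b: a * b, w, x)
loss = Op(lambda p, t: (p - t) ** 2, prediction, y)

# Phase 2: feed values and execute.
loss_value = run(loss, {x: 2.0, y: 5.0})  # (3.0 * 2.0 - 5.0) ** 2
print(loss_value)
```

Building `loss` only records how to compute it; the arithmetic happens inside `run`, which plays the role that `sess.run` with a `feed_dict` plays in the notes.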

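II. Building the word2vec model in TensorFlow

Phase 1: Define the TensorFlow graph

1. Define placeholders for the inputs and outputs

First fix BATCH_SIZE, the number of samples used in each training step.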

center_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE])
# nce_loss expects labels of shape [batch_size, num_true], hence the extra dimension
target_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE, 1])

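2. Define the weights

The embedding matrix has one row per word in the vocabulary (VOCAB_SIZE rows) and EMBED_SIZE columns, the dimension of each word vector. Its entries are initialized uniformly between -1.0 and 1.0.

embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0))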

3. Define the forward pass (inference)

In the call below, params is embed_matrix and ids holds word indices; each index selects one row of embed_matrix.

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None)

embed = tf.nn.embedding_lookup(embed_matrix, center_words)
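
To make the row selection concrete, here is a plain-Python sketch with toy sizes. It is only an illustration of what the lookup computes, not how TensorFlow implements it; the helper `one_hot_matmul` is a hypothetical name added to show that the lookup gives the same result as multiplying one-hot vectors by the matrix.

```python
# Plain-Python sketch of what an embedding lookup computes (toy sizes).

def embedding_lookup(params, ids):
    """Return the rows of `params` selected by the integer indices in `ids`."""
    return [params[i] for i in ids]


def one_hot_matmul(params, ids):
    """Same result via one-hot vectors times the matrix (much more work)."""
    vocab_size, embed_size = len(params), len(params[0])
    rows = []
    for i in ids:
        one_hot = [1.0 if j == i else 0.0 for j in range(vocab_size)]
        rows.append([sum(one_hot[j] * params[j][k] for j in range(vocab_size))
                     for k in range(embed_size)])
    return rows


# A 4-word vocabulary with 3-dimensional embeddings.
embed_matrix = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
center_words = [2, 0, 2]  # word indices, one per example in the batch

embed = embedding_lookup(embed_matrix, center_words)
print(embed)
```

The result has one row per entry of `center_words`, so its shape is (batch size, embedding size), which is what the loss function expects as its inputs.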


4. Define the loss function

NCE is fairly awkward to implement by hand in Python, but TensorFlow already wraps it for us.
Note that the labels argument takes the true target values y, while inputs takes the embedded word vectors.

tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')

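weights.shape = (N, K): each row corresponds to one word and is called its auxiliary vector
biases.shape = (N)
inputs.shape = (batch_size, K)
labels.shape = (batch_size, num_true)
num_true: the number of true (positive) classes per example
num_sampled: how many negative samples to draw
num_classes = N: the number of distinct words
sampled_values: the sampled negatives; if None, a sampler is used to draw them
remove_accidental_hits: whether to discard a sampled negative that happens to coincide with a true class
partition_strategy: the strategy used when the embedding_lookup on weights is performed in parallel. TF's embedding_lookup is implemented on the CPU, so lock contention during multi-threaded lookups has to be considered here.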
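
The roles of weights, biases, labels, inputs and the sampled negatives can be sketched for a single example in plain Python. This is a simplified illustration under stated assumptions, not the code behind tf.nn.nce_loss: the function name `nce_loss_single` is hypothetical, and the sketch leaves out details of the real op such as batching, num_true > 1 and the correction for sampling probabilities.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def nce_loss_single(weights, biases, label, inputs, sampled):
    """Simplified NCE-style loss for one example.

    weights: list of N rows, each of length K (one auxiliary vector per word)
    biases:  list of N floats
    label:   index of the true target word
    inputs:  embedding of the center word, length K
    sampled: indices of the negative samples
    """
    true_logit = dot(weights[label], inputs) + biases[label]
    loss = -log_sigmoid(true_logit)          # push the true word's score up
    for k in sampled:
        neg_logit = dot(weights[k], inputs) + biases[k]
        loss += -log_sigmoid(-neg_logit)     # push each negative's score down
    return loss


# Toy setup: N = 4 words, K = 2 dimensions.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]
inputs = [2.0, 0.0]

good = nce_loss_single(weights, biases, label=0, inputs=inputs, sampled=[2])
bad = nce_loss_single(weights, biases, label=2, inputs=inputs, sampled=[0])
print(good, bad)
```

The loss is small when the true word's auxiliary vector aligns with the input embedding and the negatives do not, and large in the opposite case. Only num_sampled + 1 rows of weights are touched per example, which is why this is cheaper than a full softmax over all N words.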

If we do not pass sampled values directly via sampled_values, TensorFlow draws them with a sampler for us. The sampling code is:

if sampled_values is None:
    sampled_values = candidate_sampling_ops.log_uniform_candidate_sampler(
        true_classes=labels,
        num_true=num_true,
        num_sampled=num_sampled,
        unique=True,
        range_max=num_classes)

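The sampler's documentation states: "This operation randomly samples a tensor of sampled classes (sampled_candidates) from the range of integers [0, range_max)." The probability of drawing class k is

P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1)

So the larger k is, the less likely it is to be sampled. Where does k come from? See the source code below.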

import collections

# vocabulary_size is assumed to be defined elsewhere at module level
def build_dataset(words):
  count = [['UNK', -1]]
  # count how often each word occurs, keeping only the most common ones
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()  # maps each word to its integer id
  for word, _ in count:
    dictionary[word] = len(dictionary)  # more frequent words get smaller ids
  data = list()       # the text stored as a sequence of ids
  unk_count = 0       # number of out-of-vocabulary words
  for word in words:
    if word in dictionary:
      index = dictionary[word]   # look up the id
    else:
      index = 0  # dictionary['UNK']
      unk_count += 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
  return data, count, dictionary, reverse_dictionary

From the code above, the more frequent a word is, the smaller its id, and hence the more likely it is to be sampled.
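
A small check of this claim, as a sketch on a made-up corpus: the helper name `log_uniform_prob` and the toy sentence are assumptions for illustration, while the formula is the P(k) quoted above. Ids are assigned by decreasing frequency with id 0 reserved for 'UNK', as in build_dataset.

```python
import collections
import math


def log_uniform_prob(k, range_max):
    """P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1)."""
    return (math.log(k + 2) - math.log(k + 1)) / math.log(range_max + 1)


# Toy corpus: ids follow decreasing frequency, with id 0 reserved for 'UNK'.
words = "the cat sat on the mat the cat".split()
ranked = ['UNK'] + [w for w, _ in collections.Counter(words).most_common()]
word_to_id = {w: i for i, w in enumerate(ranked)}

range_max = len(word_to_id)
probs = {w: log_uniform_prob(i, range_max) for w, i in word_to_id.items()}

for w in ranked:
    print('{:>4} id={} P={:.3f}'.format(w, word_to_id[w], probs[w]))
```

The probabilities decrease with the id and sum to 1 over [0, range_max). Note that id 0, which build_dataset gives to 'UNK', receives the largest probability of all.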

Next we define nce_weight and nce_bias.

# each row corresponds to one word and is called its auxiliary vector
nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE],
                                             stddev=1.0 / EMBED_SIZE ** 0.5))
nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]))

Then we define the loss function itself.

loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                     biases=nce_bias,
                                     labels=target_words,
                                     inputs=embed,
                                     num_sampled=NUM_SAMPLED,
                                     num_classes=VOCAB_SIZE))

5. Define the optimizer

We use the most basic optimizer, gradient descent.

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)
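
What each call to the optimizer does can be sketched on a one-parameter toy problem. This is an illustration of the plain gradient descent update rule only; the function `gradient_descent` and the quadratic loss are assumptions chosen for the example, and in TensorFlow the gradients are derived from the graph rather than supplied by hand.

```python
def gradient_descent(grad_fn, w, learning_rate, num_steps):
    """Repeatedly apply w <- w - learning_rate * grad_fn(w)."""
    for _ in range(num_steps):
        w = w - learning_rate * grad_fn(w)
    return w


# Toy loss: loss(w) = (w - 3) ** 2, whose gradient is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w=0.0,
                           learning_rate=0.1, num_steps=100)
print(w_final)
```

Each step moves the parameter against the gradient, so `w_final` ends up very close to the minimizer 3. In the word2vec graph the same rule is applied to embed_matrix, nce_weight and nce_bias.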

Phase 2: Execute the computation

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    average_loss = 0.0
    for index in range(NUM_TRAIN_STEPS):
        batch = next(batch_gen)
        loss_batch, _ = sess.run([loss, optimizer],
                                 feed_dict={center_words: batch[0], target_words: batch[1]})
        average_loss += loss_batch  # accumulate so the running average below is meaningful
        if (index + 1) % 2000 == 0:
            print('Average loss at step {}: {:5.1f}'.format(index + 1,
                                                            average_loss / (index + 1)))
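
The loop above assumes that batch_gen yields (centers, targets) pairs of index lists. The notes do not show how it is built, so the following is a hypothetical minimal generator for skip-gram pairs: the names `generate_pairs` and `get_batches` and the fixed context window are assumptions for illustration, not the course's process_data module.

```python
def generate_pairs(index_words, skip_window):
    """Yield (center, target) index pairs for skip-gram training."""
    for i, center in enumerate(index_words):
        # Words up to skip_window positions before the center word.
        for target in index_words[max(0, i - skip_window):i]:
            yield center, target
        # Words up to skip_window positions after the center word.
        for target in index_words[i + 1:i + skip_window + 1]:
            yield center, target


def get_batches(pair_gen, batch_size):
    """Group pairs into (centers, targets) batches.

    targets is a list of one-element lists, matching the
    [batch_size, 1] shape expected for the labels.
    An incomplete final batch is dropped.
    """
    centers, targets = [], []
    for center, target in pair_gen:
        centers.append(center)
        targets.append([target])
        if len(centers) == batch_size:
            yield centers, targets
            centers, targets = [], []


# Toy text already converted to word ids.
index_words = [5, 1, 7, 2, 9]
batch_gen = get_batches(generate_pairs(index_words, skip_window=1), batch_size=4)
centers, targets = next(batch_gen)
print(centers, targets)
```

Each center word is paired with every neighbor inside the window, so a word in the middle of the text contributes 2 * skip_window training pairs.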

For a cleaner visualization, we wrap related ops in with tf.name_scope(name_of_that_scope), which groups the nodes together in the graph.

The complete code is as follows.

""" word2vec with NCE loss 
and code to visualize the embeddings on TensorBoard
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

import numpy as np
from tensorflow.contrib.tensorboard.plugins import projector
import tensorflow as tf

from process_data import process_data

VOCAB_SIZE = 50000
BATCH_SIZE = 128
EMBED_SIZE = 128 # dimension of the word embedding vectors
SKIP_WINDOW = 1 # the context window
NUM_SAMPLED = 64    # Number of negative examples to sample.
LEARNING_RATE = 1.0
NUM_TRAIN_STEPS = 100000
WEIGHTS_FLD = 'processed/'
SKIP_STEP = 2000

class SkipGramModel:
    """ Build the graph for word2vec model """
    def __init__(self, vocab_size, embed_size, batch_size, num_sampled, learning_rate):
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.batch_size = batch_size
        self.num_sampled = num_sampled
        self.lr = learning_rate
        self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

    def _create_placeholders(self):
        """ Step 1: define the placeholders for input and output """
        with tf.name_scope("data"):
            self.center_words = tf.placeholder(tf.int32, shape=[self.batch_size], name='center_words')
            self.target_words = tf.placeholder(tf.int32, shape=[self.batch_size, 1], name='target_words')

    def _create_embedding(self):
        """ Step 2: define weights. In word2vec, it's actually the weights that we care about """
        # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU
        with tf.device('/cpu:0'):
            with tf.name_scope("embed"):
                self.embed_matrix = tf.Variable(tf.random_uniform([self.vocab_size, 
                                                                    self.embed_size], -1.0, 1.0), 
                                                                    name='embed_matrix')

    def _create_loss(self):
        """ Step 3 + 4: define the model + the loss function """
        with tf.device('/cpu:0'):
            with tf.name_scope("loss"):
                # Step 3: define the inference
                embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embed')

                # Step 4: define loss function
                # construct variables for NCE loss
                nce_weight = tf.Variable(tf.truncated_normal([self.vocab_size, self.embed_size],
                                                            stddev=1.0 / (self.embed_size ** 0.5)), 
                                                            name='nce_weight')
                nce_bias = tf.Variable(tf.zeros([self.vocab_size]), name='nce_bias')

                # define loss function to be NCE loss function
                self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, 
                                                    biases=nce_bias, 
                                                    labels=self.target_words, 
                                                    inputs=embed, 
                                                    num_sampled=self.num_sampled, 
                                                    num_classes=self.vocab_size), name='loss')
    def _create_optimizer(self):
        """ Step 5: define optimizer """
        with tf.device('/cpu:0'):
            self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(self.loss, 
                                                              global_step=self.global_step)

    def _create_summaries(self):
        with tf.name_scope("summaries"):
            tf.summary.scalar("loss", self.loss)
            tf.summary.histogram("histogram_loss", self.loss)
            # because you have several summaries, we should merge them all
            # into one op to make it easier to manage
            self.summary_op = tf.summary.merge_all()

    def build_graph(self):
        """ Build the graph for our model """
        self._create_placeholders()
        self._create_embedding()
        self._create_loss()
        self._create_optimizer()
        self._create_summaries()

def train_model(model, batch_gen, num_train_steps, weights_fld):
    saver = tf.train.Saver() # defaults to saving all variables - in this case embed_matrix, nce_weight, nce_bias

    initial_step = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
        # if that checkpoint exists, restore from checkpoint
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)

        total_loss = 0.0 # we use this to calculate the average loss over the last SKIP_STEP steps
        writer = tf.summary.FileWriter('improved_graph/lr' + str(LEARNING_RATE), sess.graph)
        initial_step = model.global_step.eval()
        for index in range(initial_step, initial_step + num_train_steps):
            centers, targets = next(batch_gen)
            feed_dict={model.center_words: centers, model.target_words: targets}
            loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], 
                                              feed_dict=feed_dict)
            writer.add_summary(summary, global_step=index)
            total_loss += loss_batch
            if (index + 1) % SKIP_STEP == 0:
                print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP))
                total_loss = 0.0
                saver.save(sess, 'checkpoints/skip-gram', index)

        ####################
        # code to visualize the embeddings on TensorBoard
        final_embed_matrix = sess.run(model.embed_matrix)

        # it has to be a variable; constants don't work here, and you can't reuse model.embed_matrix
        embedding_var = tf.Variable(final_embed_matrix[:1000], name='embedding')
        sess.run(embedding_var.initializer)

        config = projector.ProjectorConfig()
        summary_writer = tf.summary.FileWriter('processed')

        # add embedding to the config file
        embedding = config.embeddings.add()
        embedding.tensor_name = embedding_var.name

        # link this tensor to its metadata file, in this case the first 1000 words of vocab
        embedding.metadata_path = 'processed/vocab_1000.tsv'

        # saves a configuration file that TensorBoard will read during startup
        projector.visualize_embeddings(summary_writer, config)
        saver_embed = tf.train.Saver([embedding_var])
        saver_embed.save(sess, 'processed/model3.ckpt', 1)

def main():
    model = SkipGramModel(VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED, LEARNING_RATE)
    model.build_graph()
    batch_gen = process_data(VOCAB_SIZE, BATCH_SIZE, SKIP_WINDOW)
    train_model(model, batch_gen, NUM_TRAIN_STEPS, WEIGHTS_FLD)

if __name__ == '__main__':
    main()
