
Nematus (2): Program Execution Flow Analysis

Nematus, a neural machine translation tool

Program execution flow analysis

nematus/nmt.py: train (the program entry point; the analysis starts from this function)

  • 1.1 Read the source- and target-language vocabularies
    # collect the hyperparameters that were passed in
    model_options = locals().copy()
    print 'Model options:', model_options

    # load the dictionaries and build the inverted (id -> token) versions
    worddicts = [None] * len(dictionaries)
    worddicts_r = [None] * len(dictionaries)
    for ii, dd in enumerate(dictionaries):
        worddicts[ii] = load_dict(dd)
        worddicts_r[ii] = dict()
        for kk, vv in worddicts[ii].iteritems():
            worddicts_r[ii][vv] = kk

    # if the vocabulary sizes were not set, default to the dictionary sizes
    if n_words_src is None:
        n_words_src = len(worddicts[0])
        model_options['n_words_src'] = n_words_src
    if n_words_tgt is None:
        n_words_tgt = len(worddicts[1])
        model_options['n_words_tgt'] = n_words_tgt
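To make the inversion concrete, here is a toy sketch (the tokens and ids below are made up; load_dict is assumed to return a token-to-id mapping of this shape):

    # toy vocabulary (hypothetical tokens and ids), shaped like load_dict's output
    worddict = {'eos': 0, 'UNK': 1, 'the': 2, 'cat': 3}

    # invert it for id -> token lookups, mirroring the loop above
    worddict_r = dict()
    for kk, vv in worddict.iteritems():
        worddict_r[vv] = kk

    print worddict_r[3]   # prints: cat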
  • 1.2 Load the training and validation sets
    # load data
    print 'Loading data ...'
    train = TextIterator(datasets[0], datasets[1],
                         dictionaries[0], dictionaries[1],
                         n_words_source=n_words_src,
                         n_words_target=n_words_tgt,
                         batch_size=batch_size,
                         maxlen=maxlen,
                         shuffle_each_epoch=shuffle_each_epoch,
                         sort_by_length=sort_by_length,
                         maxibatch_size=maxibatch_size)
    valid = TextIterator(valid_datasets[0], valid_datasets[1],
                         dictionaries[0], dictionaries[1],
                         n_words_source=n_words_src,
                         n_words_target=n_words_tgt,
                         batch_size=valid_batch_size,
                         maxlen=maxlen)
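For orientation, iterating over train yields one minibatch at a time: x and y are parallel lists of up to batch_size sentences, each sentence given as a list of word ids (the ids in the comments are made up):

    # illustrative only -- the ids are hypothetical
    for x, y in train:
        print len(x)   # number of sentence pairs in this minibatch
        print x[0]     # e.g. [10, 20, 3]: word ids of the first source sentence
        break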
  • 1.3 Initialize the model parameters: init_params(model_options)
    # initialize the model parameters
    print 'Init parameters ...'
    params = init_params(model_options)
  • 1.4 Reload the model by calling load_params(saveto, params)
    # reload the model, so that training can resume after an unexpected interruption
    if reload_ and os.path.exists(saveto):
        print 'Reloading model parameters'
        params = load_params(saveto, params)
  • 1.5 Turn the network parameters into shared variables; parameters can only be updated once they are shared variables: init_theano_params(params)
    # turn the network's W and b into theano shared variables
    tparams = init_theano_params(params)
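A minimal sketch of what this step amounts to (the function body is an assumption modelled on the surrounding code, not a verbatim copy): every numpy parameter is wrapped in a theano shared variable so that the optimizer can later update it in place:

    from collections import OrderedDict
    import theano

    def init_theano_params_sketch(params):
        # wrap each numpy array in a shared variable, keyed by parameter name
        tparams = OrderedDict()
        for kk, vv in params.iteritems():
            tparams[kk] = theano.shared(vv, name=kk)
        return tparams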
  • 1.6 Build the model, i.e. construct the computation graph: define the network's forward pass and the loss function: build_model(tparams, model_options)
    # build the model
    print 'Building model ...'

    trng, use_noise, x, x_mask, y, y_mask, \
        opt_ret, cost, ctx, tt, _ = build_model(tparams, model_options)

    inps = [x, x_mask, y, y_mask]
  • 1.7 Build the sampler, used at validation/sampling time
    # build the sampler
    if validFreq or sampleFreq:
        print 'Building sampler ...'
        f_init, f_next = build_sampler(tparams, model_options, use_noise, trng)
  • 1.8 Regularization
    • Apply L2 regularization to the weights
    • Regularize the attention weights
    • Apply L2 regularization towards the loaded model's parameters (MAP training)

    # apply L2 regularization on weights
    if decay_c > 0.:
        decay_c = theano.shared(numpy.float32(decay_c), name='decay_c')
        weight_decay = 0.
        for kk, vv in tparams.iteritems():
            weight_decay += (vv ** 2).sum()
        weight_decay *= decay_c
        cost += weight_decay  # add the regularization term to the cost

    # regularize the alpha weights
    if alpha_c > 0. and not model_options['decoder'].endswith('simple'):
        alpha_c = theano.shared(numpy.float32(alpha_c), name='alpha_c')
        alpha_reg = alpha_c * (
            (tensor.cast(y_mask.sum(0)//x_mask.sum(0), 'float32')[:, None] -
             opt_ret['dec_alphas'].sum(0))**2).sum(1).mean()
        cost += alpha_reg

    # apply L2 regularisation to loaded model (map training)
    if map_decay_c > 0:
        map_decay_c = theano.shared(numpy.float32(map_decay_c), name="map_decay_c")
        weight_map_decay = 0.
        for kk, vv in tparams.iteritems():
            init_value = theano.shared(vv.get_value(), name=kk + "_init")
            weight_map_decay += ((vv - init_value) ** 2).sum()
        weight_map_decay *= map_decay_c
        cost += weight_map_decay
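A toy numpy illustration of the first penalty (the values are made up): for a single weight matrix W, the term added to the cost is decay_c * sum(W ** 2):

    import numpy

    decay_c = 1e-4
    W = numpy.array([[1., 2.], [3., 4.]], dtype='float32')
    weight_decay = decay_c * (W ** 2).sum()   # 1e-4 * 30 = 0.003

The alpha term works differently: it pushes the total attention each source position receives, summed over all target steps, towards the ratio of target length to source length, so that no source word is systematically ignored or over-attended.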
  • 1.9 Compute the gradients of the loss function with respect to every parameter in the network
    print 'Computing gradient...',
    grads = tensor.grad(cost, wrt=itemlist(tparams))
    print 'Done'
  • 1.10 Apply the gradient clipping strategy
    # apply gradient clipping here
    if clip_c > 0.:
        g2 = 0.
        for g in grads:
            g2 += (g**2).sum()
        new_grads = []
        for g in grads:
            new_grads.append(tensor.switch(g2 > (clip_c**2),
                                           g / tensor.sqrt(g2) * clip_c,
                                           g))
        grads = new_grads
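The same global-norm clipping, sketched in plain numpy (the gradient values are made up): if the global L2 norm of all gradients exceeds clip_c, every gradient is rescaled so that the global norm becomes exactly clip_c:

    import numpy

    def clip_global_norm(grads, clip_c):
        # global squared norm across all gradient arrays
        g2 = sum((g ** 2).sum() for g in grads)
        if g2 > clip_c ** 2:
            grads = [g / numpy.sqrt(g2) * clip_c for g in grads]
        return grads

    print clip_global_norm([numpy.array([3., 4.])], 1.)   # [array([ 0.6,  0.8])]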
  • 1.11 Define the learning-rate scalar and compile the optimizer; the optimizer then updates the parameters using this learning rate
    # compile the optimizer, the actual computational graph is compiled here
    lr = tensor.scalar(name='lr')

    print 'Building optimizers...',
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost, profile=profile)
    print 'Done'
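All optimizers in nmt.py share this two-function interface. As a hedged sketch, a plain SGD version might look as follows (modelled on the dl4mt-style optimizers; details vary between versions):

    import theano
    import theano.tensor as tensor

    def sgd(lr, tparams, grads, inps, cost, profile=False):
        # shared variables that carry the gradients between the two calls
        gshared = [theano.shared(p.get_value() * 0., name='%s_grad' % k)
                   for k, p in tparams.iteritems()]

        # f_grad_shared: forward + backward pass; stores the gradients in gshared
        f_grad_shared = theano.function(inps, cost,
                                        updates=zip(gshared, grads),
                                        profile=profile)

        # f_update: one SGD step, parameterised by the learning rate
        pup = [(p, p - lr * g) for p, g in zip(tparams.values(), gshared)]
        f_update = theano.function([lr], [], updates=pup, profile=profile)

        return f_grad_shared, f_update

Splitting the work this way means the expensive graph (f_grad_shared) is compiled once, while f_update remains a cheap element-wise update.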
  • 1.12 Start the optimization process…

Reload the training history, including the update counter uidx and the validation error history history_errs.

    # start optimization
    print 'Optimization'

    best_p = None
    bad_counter = 0
    uidx = 0
    estop = False
    history_errs = []
    # reload history
    if reload_ and os.path.exists(saveto):
        rmodel = numpy.load(saveto)
        history_errs = list(rmodel['history_errs'])
        if 'uidx' in rmodel:
            uidx = rmodel['uidx']

    if validFreq == -1:
        validFreq = len(train[0])/batch_size
    if saveFreq == -1:
        saveFreq = len(train[0])/batch_size
    if sampleFreq == -1:
        sampleFreq = len(train[0])/batch_size

    valid_err = None

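The history reload above works because the checkpoint is a single .npz archive that stores the parameters alongside history_errs and uidx. A minimal round-trip sketch (the file name and parameter are made up):

    import numpy

    params = {'W': numpy.zeros((2, 2), dtype='float32')}
    numpy.savez('model.npz', history_errs=[], uidx=0, **params)

    rmodel = numpy.load('model.npz')
    history_errs = list(rmodel['history_errs'])
    uidx = rmodel['uidx']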
The optimization loop proper starts here…
- max_epochs: the maximum number of training epochs
- prepare_data: prepares a minibatch. The input x is a list with batch_size entries, each entry holding the word ids of one sentence.

prepare_data(x, y, maxlen=maxlen, …) turns that list into a matrix with one column per sentence: the end-of-sentence marker eos (word id 0) is appended to every sentence, and shorter sentences are zero-padded to the length of the longest one.
- prepare_data also returns x_mask, a matrix of the same shape that contains 1 at every real word and at each sentence's eos position, and 0 at the padding positions.
- The role of x_mask: in the GRU, once a sentence has ended, the hidden state from its last valid time step is simply carried forward. The hidden states form a 3-D array of shape (x_len, batch_size, dim), so the final time slice of that array holds the last-time-step hidden state of every sentence; the 1 at each eos position ensures that the hidden state produced for eos is kept as well. A runnable sketch of the padding and masking follows below.
[figure: the hidden-state array (x_len, batch_size, dim), with each sentence's final hidden state preserved]
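As referenced above, a sketch of this padding-and-masking behaviour (the word ids are toy values; the real prepare_data also filters out sentence pairs longer than maxlen and processes y the same way):

    import numpy

    def prepare_data_sketch(seqs_x):
        lengths = [len(s) for s in seqs_x]
        n_samples = len(seqs_x)
        x_len = max(lengths) + 1                       # +1 for the final eos
        x = numpy.zeros((x_len, n_samples)).astype('int64')
        x_mask = numpy.zeros((x_len, n_samples)).astype('float32')
        for idx, s in enumerate(seqs_x):
            x[:lengths[idx], idx] = s                  # one sentence per column
            x_mask[:lengths[idx] + 1, idx] = 1.        # words + the eos position
        return x, x_mask                               # remaining 0s are padding

    x, x_mask = prepare_data_sketch([[10, 20, 3], [5, 8]])
    # x[:, 1]      == [5, 8, 0, 0]
    # x_mask[:, 1] == [1., 1., 1., 0.]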
  • Compute the loss value; f_grad_shared also computes the gradients and copies them into the shared variables: cost = f_grad_shared(x, x_mask, y, y_mask)
  • Apply the parameter update with learning rate lrate: f_update(lrate)
for eidx in xrange(max_epochs):
        n_samples = 0

        for x, y in train:
            n_samples += len(x)
            uidx += 1
            use_noise.set_value(1.)
            # prepare the minibatch for training
            x, x_mask, y, y_mask = prepare_data(x, y, maxlen=maxlen,
                                                n_words_src=n_words_src,
                                                n_words=n_words_tgt)
            # x is None when the minibatch contains no sentence pair within maxlen
            if x is None:
                print 'Minibatch with zero sample under length ', maxlen
                uidx -= 1
                continue

            ud_start = time.time()

            # compute the cost and the gradients, and copy the gradients to shared variables
            cost = f_grad_shared(x, x_mask, y, y_mask)
            # do the update on the parameters, with learning rate lrate
            f_update(lrate)
  • Display Epoch (epoch number), Update (update count), Cost (loss value) and UD (wall-clock time of one update)
  • Save the best network parameters (kept in best_p), together with history_errs and the update counter uidx
  • Save the model parameters of the current iteration in a separate file
            ud = time.time() - ud_start

            # check for bad numbers, usually we remove non-finite elements
            # and continue training - but not done here
            if numpy.isnan(cost) or numpy.isinf(cost):
                print 'NaN detected'
                return 1., 1., 1.

            # verbose
            if numpy.mod(uidx, dispFreq) == 0:
                print 'Epoch ', eidx, 'Update ', uidx, 'Cost ', cost, 'UD ', ud

            # save the best model so far, in addition, save the latest model
            # into a separate file with the iteration number for external eval
            if numpy.mod(uidx, saveFreq) == 0:
                print 'Saving the best model...',  # save the best parameters so far
                if best_p is not None:
                    params = best_p
                else:
                    params = unzip_from_theano(tparams)
                numpy.savez(saveto, history_errs=history_errs, uidx=uidx, **params)
                json.dump(model_options, open('%s.json' % saveto, 'wb'), indent=2)
                print 'Done'

                # save with uidx
                if not overwrite:
                    print 'Saving the model at iteration {}...'.format(uidx),
                    saveto_uidx = '{}.iter{}.npz'.format(
                        os.path.splitext(saveto)[0], uidx)
                    numpy.savez(saveto_uidx, history_errs=history_errs,
                                uidx=uidx, **unzip_from_theano(tparams))
                    print 'Done'
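For reference, the per-iteration file name assembled above looks like this (illustrative values):

    import os

    # e.g. saveto = 'model.npz' and uidx = 30000
    saveto_uidx = '{}.iter{}.npz'.format(os.path.splitext('model.npz')[0], 30000)
    print saveto_uidx   # model.iter30000.npz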
  • Generate some translation samples under the current model parameters and display them
            # generate some samples with the model and display them

            if sampleFreq and numpy.mod(uidx, sampleFreq) == 0:
                # FIXME: random selection?
                for jj in xrange(numpy.minimum(5, x.shape[1])):
                    stochastic = True
                    sample, score, sample_word_probs, alignment = gen_sample([f_init], [f_next],
                                               x[:, jj][:, None],