阿新 • Published: 2021-11-15
Finally we get to the transformer. The previous pieces are mostly done, so what's left is stacking those building blocks into a model. First, a look at the transformer model itself: OK, it really is just that same set of components.
The transformer is an architecture built purely on attention, but it still keeps the encoder-decoder structure we saw before.
Layer Normalization
Layer normalization is used here, and it differs from the batch normalization we covered earlier.
I'm referring to the torch documentation here:
N is the batch-size dimension; batch norm normalizes each feature across the batch, whereas layer norm normalizes each vector within a sequence on its own, independently of the other examples in the batch.
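To make the difference concrete, here is a minimal sketch (the toy shapes are my own, just for illustration): nn.LayerNorm normalizes each token's feature vector on its own, while nn.BatchNorm1d normalizes each feature channel across the whole batch.

import torch
from torch import nn

# Toy input: (batch=2, seq=3, features=4); shapes are made up for illustration
X = torch.randn(2, 3, 4)

# LayerNorm(4) normalizes every token vector over its 4 features,
# independently of the other tokens and of the batch dimension N
ln = nn.LayerNorm(4)
print(ln(X).shape)  # torch.Size([2, 3, 4])

# BatchNorm1d expects (N, C, L), so the feature dimension has to come second;
# it normalizes each feature channel over the whole batch and sequence
bn = nn.BatchNorm1d(4)
print(bn(X.permute(0, 2, 1)).shape)  # torch.Size([2, 4, 3])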
Encoder
import torch
import torch.nn as nn
from d2l import torch as d2l

class AddNorm(nn.Module):
    """Residual connection followed by layer normalization."""
    def __init__(self, norm_shape, dropout=0):
        super(AddNorm, self).__init__()
        self.norm = nn.LayerNorm(norm_shape)
        self.dropout = nn.Dropout(dropout)

    def forward(self, X, Y):
        # Assumes X and Y have the same shape
        return self.norm(X + self.dropout(Y))

class EncoderBlock(nn.Module):
    def __init__(self, embed_dim, norm_shape):
        super(EncoderBlock, self).__init__()
        # batch_first=True so inputs are (batch, seq, embed_dim)
        self.attention = nn.MultiheadAttention(embed_dim, 8, batch_first=True)
        self.add_norm1 = AddNorm(norm_shape=norm_shape)
        # Position-wise feed-forward network
        self.ffn = nn.Sequential(nn.Linear(embed_dim, embed_dim),
                                 nn.ReLU(),
                                 nn.Linear(embed_dim, embed_dim))
        self.add_norm2 = AddNorm(norm_shape=norm_shape)

    def forward(self, X):
        # Self-attention: queries, keys and values are all X
        Y, _ = self.attention(X, X, X)
        X = self.add_norm1(X, Y)
        Y = self.ffn(X)
        X = self.add_norm2(X, Y)
        return X

class Encoder(nn.Module):
    def __init__(self, embed_dim, norm_shape, num_block):
        super(Encoder, self).__init__()
        self.pos_encoding = d2l.PositionalEncoding(embed_dim, dropout=0)
        # Use nn.ModuleList (not a plain Python list) so the blocks' parameters are registered
        self.blocks = nn.ModuleList(
            [EncoderBlock(embed_dim, norm_shape) for _ in range(num_block)])

    def forward(self, X):
        X = self.pos_encoding(X)
        for blk in self.blocks:
            X = blk(X)
        return X

# norm_shape=[35, 128] normalizes over (seq_len, embed_dim), which ties the model
# to sequences of length 35
model = Encoder(128, [35, 128], 2)
s = torch.zeros((64, 35, 128))
s = model(s)
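As a sanity check, the last three lines push a zero tensor of shape (64, 35, 128) through the encoder; the output keeps that shape, since self-attention, the feed-forward network, and the add-and-norm sub-layers are all shape-preserving.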
So that's an encoder implemented with torch. I don't feel like writing the decoder, so I'll leave it at that; from here on I'll just call the framework's implementation directly.
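For reference, here is a rough sketch of what "just calling the framework" means: torch ships nn.TransformerEncoderLayer and nn.TransformerEncoder, so the hand-written EncoderBlock above can be replaced by the built-ins. The hyperparameters below (d_model=128, 8 heads, feed-forward width 128, 2 layers) simply mirror the toy example above; note that the built-in encoder does not add positional encodings for you.

import torch
from torch import nn

# One encoder layer: self-attention + feed-forward, with add-and-norm around each
layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, dim_feedforward=128,
                                   batch_first=True)
# Stack two identical layers into a full encoder
encoder = nn.TransformerEncoder(layer, num_layers=2)

s = torch.zeros((64, 35, 128))
print(encoder(s).shape)  # torch.Size([64, 35, 128])

nn.TransformerDecoder and the full nn.Transformer module cover the decoder side in the same way.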