Learning the DGL Framework from the Official Docs, Day 12: Stochastic Training for Node Classification on Large Graphs
References
- https://docs.dgl.ai/en/latest/guide/minibatch.html
- https://docs.dgl.ai/en/latest/guide/minibatch-node.html#guide-minibatch-node-classification-sampler
Overview
The graph neural network training methods we have studied so far all run on the entire graph. For large graphs, the number of nodes or edges can reach the millions or even billions. For an $L$-layer GCN with hidden-state dimension $H$ on a graph with $N$ nodes, storing the intermediate hidden-state vectors alone requires $O(NLH)$ memory, which easily exceeds the capacity of a single GPU.
By analogy with traditional mini-batch training, we can train on only a subset of nodes at a time. Then we only need to send those nodes and their $L$-hop neighbors to the GPU, rather than the features of every node in the graph.
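To get a feel for the scale, here is a rough back-of-the-envelope estimate; the concrete values of $N$, $L$, and $H$ below are illustrative assumptions, not numbers from the DGL docs.
# Rough memory needed just for the intermediate hidden states, O(NLH).
# All numbers here are assumed for illustration.
N = 1_000_000          # nodes in the graph
L = 2                  # GCN layers
H = 128                # hidden-state dimension
bytes_per_float = 4    # float32
print(N * L * H * bytes_per_float / 1024**3, "GB")  # roughly 1 GB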
Neighborhood Sampling
As mentioned above, in mini-batch training we compute the layer-$L$ outputs of only batch_size nodes at a time. To obtain those nodes' layer-$L$ representations, we need the representations of their layer-$(L-1)$ neighbors, and so on recursively, until we reach layer 0, i.e. the input features.
As shown in the figure below, suppose we want the layer-2 representation of node 8. We need the layer-1 representations of its neighbors, which in turn need the layer-0 representations of their neighbors. This yields a subgraph, and message passing on that subgraph alone suffices to compute node 8's layer-2 representation. You can think of it as a recursion: starting from the target node 8, we recurse layer by layer to its neighbors, then its neighbors' neighbors, hit the base case at layer 0, and then propagate the results back up.
Neighborhood sampling means that, in the process above, we do not have to include every neighbor at every layer in the subgraph; instead we can sample neighbors according to some strategy. DGL provides several built-in sampling methods, which will be covered in detail in a later post; a small taste follows below.
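For instance, DGL's built-in MultiLayerNeighborSampler caps how many neighbors are sampled at each layer. A minimal sketch, with arbitrary fanout values assumed for illustration (not used in the walkthrough below):
import dgl

# Sample at most 10 neighbors per node for the first layer
# and at most 15 for the second, instead of all neighbors
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 15])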
Stochastic Training: Node Classification on a Homogeneous Graph
To convert the earlier full-graph training models to stochastic training, only three steps are needed:
- define a neighbor sampler;
- adjust the model;
- modify the training loop.
Defining a Neighbor Sampler
Here we use the simplest built-in sampler, "MultiLayerFullNeighborSampler", which does no actual sampling: training uses all neighbors. Built-in samplers are meant to be used together with "NodeDataLoader". The argument "2" in "MultiLayerFullNeighborSampler(2)" below means the GCN has two layers.
We use the "Citeseer" dataset, randomly pick 1000 nodes as the training set, and set the batch size to 256. "num_workers" is the number of worker processes used for data loading; multi-process loading is unavailable on Windows, so set it to 0 there.
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]
# Randomly pick 1000 node IDs as the training set
train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)  # number of loader processes; must be 0 on Windows
"dataloader" yields three things per batch. The first is a tensor of input node IDs (layer 0), the second is a tensor of output node IDs (layer $L$, here layer 2), and the third is blocks, the subgraphs used for message passing at each layer. blocks[0], blocks[1], ... correspond to the layer-0-to-layer-1 subgraph, the layer-1-to-layer-2 subgraph, and so on, so there are $L$ blocks in total.
input_nodes, output_nodes, blocks = next(iter(dataloader))
print(blocks)
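A quick way to sanity-check this structure is to print the number of source and destination nodes of each block from the batch above; this snippet is a small illustrative addition, not from the official guide:
for i, block in enumerate(blocks):
    # Messages flow from a block's source nodes (layer i)
    # to its destination nodes (layer i + 1)
    print(i, block.num_src_nodes(), block.num_dst_nodes())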
Adjusting the Model
Below is the original model, written for full-graph training.
class TwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, g, x):
        x = F.relu(self.conv1(g, x))
        x = F.relu(self.conv2(g, x))
        return x
To adapt it for stochastic training, simply replace the full graph "g" with the per-layer subgraphs "blocks[0]" and "blocks[1]".
class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x
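To convince yourself the adjusted model lines up with the blocks, you can check that the number of output rows equals the number of destination nodes in the last block. This is a hypothetical sanity check, reusing the dataloader from above and the hyperparameters from the complete code at the end:
model = StochasticTwoLayerGCN(g.ndata['feat'].shape[1], 100, dataset.num_labels)
input_nodes, output_nodes, blocks = next(iter(dataloader))
out = model(blocks, blocks[0].srcdata['feat'])
# One prediction row per output (seed) node
assert out.shape[0] == blocks[-1].num_dst_nodes() == len(output_nodes)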
Modifying the Training Loop (CPU Version)
If the blocks from step one made sense, the changes to the training loop are straightforward. blocks[0] is the layer-0-to-layer-1 subgraph, so the input features are the source-node features of blocks[0]; the last block, blocks[-1] (here blocks[1]), is the layer-1-to-layer-2 subgraph, so the ground-truth labels are the labels of its destination nodes.
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
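If you also want a rough per-batch training accuracy, you can compare predictions with labels inside the same loop; this is an illustrative addition, not part of the original loop:
# Inside the training loop above, after computing output_predictions:
acc = (output_predictions.argmax(1) == output_labels).float().mean()
print('batch accuracy: ', acc.item())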
Modifying the Training Loop (GPU Version)
Compared with the CPU version, the only extra step is moving the model and the sampled blocks to the GPU.
model = model.cuda()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    # Moving the blocks also moves the features/labels stored on them
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
Stochastic Training: Node Classification on a Heterogeneous Graph
Defining a Neighbor Sampler
We again use the synthetic heterogeneous graph dataset built in "Day 8" of this series.
We can still use DGL's built-in sampler together with "NodeDataLoader". The difference from the homogeneous case is that the training node IDs must be given as a dictionary mapping node type to node IDs. For simplicity, the training set here contains only "user" nodes, but note that the sampled subgraphs still contain both node types.
n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10
follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)
hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})
hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()
g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)  # must be 0 on Windows
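On a heterogeneous graph, real sampling (rather than full neighborhoods) can also be configured per edge type: MultiLayerNeighborSampler accepts one fanout dictionary per layer. A minimal sketch, with arbitrary assumed fanouts, illustrative only; the walkthrough keeps the full-neighbor sampler above:
# One dict per layer, mapping edge type -> neighbors to sample
fanout = {'follow': 5, 'followed-by': 5, 'click': 3,
          'clicked-by': 3, 'dislike': 3, 'disliked-by': 3}
sampler = dgl.dataloading.MultiLayerNeighborSampler([fanout, fanout])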
Adjusting the Model
As in the homogeneous case, simply replace the original "g" with the "blocks".
class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        # One GraphConv per relation, combined by HeteroGraphConv
        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
            for rel in rel_names})
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
            for rel in rel_names})

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x
Modifying the Training Loop (CPU Version)
Still not much different from the homogeneous case, except that both the input features "input_features" and the labels "output_labels" are now dictionaries keyed by node type. Since the model output is also such a dictionary, the loss has to be computed separately for each node type. For simplicity, the training set uses only "user" nodes, so we take the "user" entry of the output to compute the loss.
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
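If the training set contained several node types rather than just "user", one way to combine the per-type losses is to sum over the prediction dictionary; a sketch, assuming every type in the dictionary has labels:
# Hypothetical multi-type loss, replacing the single-type loss
# line inside the loop above
loss = sum(F.cross_entropy(output_predictions[ntype], output_labels[ntype])
           for ntype in output_predictions)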
Modifying the Training Loop (GPU Version)
model = model.cuda()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    opt.step()
Complete Code
Stochastic Training: Node Classification on a Homogeneous Graph
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]
train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)  # must be 0 on Windows
class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x
in_features = g.ndata['feat'].shape[1]
hidden_features = 100
out_features = dataset.num_labels
model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
# GPU version of the training loop:
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())
# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions, output_labels)
#     opt.zero_grad()
#     loss.backward()
#     opt.step()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
Stochastic Training: Node Classification on a Heterogeneous Graph
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10
follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)
hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})
hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()
g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)  # must be 0 on Windows
class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
            for rel in rel_names})
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
            for rel in rel_names})

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x
in_features = n_hetero_features
hidden_features = 100
out_features = n_user_classes
model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, g.etypes)
# GPU version of the training loop:
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())
# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
#     opt.zero_grad()
#     loss.backward()
#     opt.step()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()