
Learning the DGL Framework from the Official Documentation, Day 12: Stochastic Training on Large Graphs — Node Classification

References

  1. https://docs.dgl.ai/en/latest/guide/minibatch.html
  2. https://docs.dgl.ai/en/latest/guide/minibatch-node.html#guide-minibatch-node-classification-sampler

Overview

The GNN training methods we have studied so far all operate on the entire graph. For large graphs, the number of nodes or edges can reach the millions or even hundreds of millions. Suppose we have an $L$-layer GCN whose hidden states have dimension $H$, on a graph with $N$ nodes; then merely storing the intermediate hidden state vectors takes $O(NLH)$ memory, which can easily exceed GPU memory.
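As a rough back-of-the-envelope check, with hypothetical sizes (10 million nodes, 3 layers, hidden dimension 128, float32; these numbers are illustrative, not from the tutorial):

N, L, H = 10_000_000, 3, 128
bytes_per_float = 4  # float32
# Storing the hidden states of every node at every layer:
total_gb = N * L * H * bytes_per_float / 1024**3
print('%.1f GB of activations' % total_gb)  # ~14.3 GB, before counting gradients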

By analogy with traditional mini-batch training, we train on only a subset of nodes at a time, so we only need to send those nodes and their $L$-hop neighbors to the GPU, rather than the features of every node.

Neighborhood Sampling

As mentioned above, in mini-batch training we compute the layer-$L$ outputs of only batch_size nodes at a time. To obtain their layer-$L$ representations we need the layer-$(L-1)$ representations of their neighbors, and so on recursively, until we reach layer 0, i.e. the input features.

As the figure below shows, to compute the layer-2 representation of node 8 we need the layer-1 representations of its neighbors, which in turn need the layer-0 representations of their neighbors. This gives us a subgraph, and message passing on this subgraph alone is enough to obtain node 8's layer-2 representation. You can think of it as recursion: starting from the target node 8, we recurse layer by layer through its neighbors and its neighbors' neighbors, hit the base case at layer 0, and then propagate the results back up.

[Figure: the subgraph produced by gathering node 8's 2-hop neighborhood for message passing]
Neighborhood sampling means that, in the process above, we do not have to include every neighbor at every layer in the subgraph; instead we can sample neighbors according to some strategy, as in the sketch below. DGL provides several sampling methods, which will be covered in detail later.
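For example, DGL's built-in MultiLayerNeighborSampler caps how many neighbors are drawn at each layer; the fanouts below are arbitrary illustrative values:

# Draw at most 10 neighbors per node for the first layer and 15 for the second.
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 15])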

Stochastic Training: Node Classification on a Homogeneous Graph

Converting the earlier full-graph training models into stochastic ones takes only three steps:

  1. define a neighbor sampler;
  2. adapt the model;
  3. modify the training loop.

Defining a Neighbor Sampler

Here we use DGL's simplest built-in sampler, MultiLayerFullNeighborSampler, which actually does no sampling at all: training uses every neighbor. Built-in samplers are meant to be used together with NodeDataLoader. The argument 2 in MultiLayerFullNeighborSampler(2) means there are two GCN layers.

We use the Citeseer dataset, randomly pick 1000 nodes as the training set, and set the batch size to 256. num_workers is the number of worker processes for data loading; multi-process loading is unavailable on Windows, so we set it to 0 here.

import dgl
import numpy as np

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]

train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)

Each iteration of the dataloader yields three things. The first is a tensor of input node IDs (layer 0), the second a tensor of output node IDs (layer $L$, here layer 2), and the third is blocks, the subgraphs used for message passing at each layer. blocks[0], blocks[1], ... correspond to the layer-0-to-layer-1 subgraph, the layer-1-to-layer-2 subgraph, and so on, so there are $L$ blocks in total.

input_nodes, output_nodes, blocks = next(iter(dataloader))
print(blocks)
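To make the structure concrete: each block is a bipartite graph whose destination nodes also appear among its source nodes. A quick inspection (the exact counts depend on the sampled batch):

for i, block in enumerate(blocks):
    print(i, block.num_src_nodes(), block.num_dst_nodes())
# The destination nodes of the last block are exactly the output nodes.
assert blocks[-1].num_dst_nodes() == len(output_nodes)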

Adapting the Model

Below is the original full-graph training model.

class TwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, g, x):
        x = F.relu(self.conv1(g, x))
        x = F.relu(self.conv2(g, x))
        return x

To adapt it for stochastic training, simply replace the full graph g with the block subgraphs blocks[0] and blocks[1].

class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x
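A quick shape check of the adapted model on one sampled batch (hidden size 100 and dataset.num_labels match the complete code at the end of this post):

model = StochasticTwoLayerGCN(g.ndata['feat'].shape[1], 100, dataset.num_labels)
out = model(blocks, blocks[0].srcdata['feat'])
# One prediction row per destination node of the last block.
assert out.shape[0] == blocks[-1].num_dst_nodes()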

Modifying the Training Loop (CPU)

If you understood blocks in step one, the changes to the training loop are easy to follow. blocks[0] is the layer-0-to-layer-1 subgraph, so the input features are the source-node features of blocks[0]; blocks[-1] (here blocks[1]) is the layer-1-to-layer-2 subgraph, so the ground-truth labels are those of its destination nodes.

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
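If you also want a per-batch training-accuracy readout, a minimal addition inside the loop:

    acc = (output_predictions.argmax(dim=1) == output_labels).float().mean()
    print('batch accuracy: ', acc.item())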

Modifying the Training Loop (GPU)

Compared with the CPU version, the only extra steps are moving the model and the blocks to the GPU.

model = model.cuda()
opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

Stochastic Training: Node Classification on a Heterogeneous Graph

Defining a Neighbor Sampler

We again use the synthetic heterogeneous graph dataset constructed in "Learning the DGL Framework from the Official Documentation, Day 8".

We can still use DGL's built-in samplers together with NodeDataLoader. The difference from the homogeneous case is that the training node IDs must be given as a dictionary mapping node type to node IDs. For simplicity, the training set here contains only "user" nodes, but note that the sampled subgraphs still contain both node types.

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()

g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
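The loader yields the same three items as in the homogeneous case, except that the node ID containers are now dictionaries keyed by node type (counts vary per batch):

input_nodes, output_nodes, blocks = next(iter(dataloader))
print({ntype: ids.shape[0] for ntype, ids in input_nodes.items()})   # both 'user' and 'item'
print({ntype: ids.shape[0] for ntype, ids in output_nodes.items()})  # only 'user' here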

Adapting the Model

As with the homogeneous graph, simply replace the original g with blocks; instantiation is shown right after the class definition.

class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                for rel in rel_names
            })
        self.conv2 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                for rel in rel_names
            })

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x
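The model is parameterized by the graph's relation (edge type) names, so it can be instantiated directly from g.etypes; the sizes below match the complete code at the end:

model = StochasticTwoLayerRGCN(n_hetero_features, 100, n_user_classes, g.etypes)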

Modifying the Training Loop (CPU)

Still not much difference from the homogeneous case, but note that the input features input_features, the labels output_labels, and the model output output_predictions are all dictionaries keyed by node type. Because the model output is a dictionary, the loss has to be computed separately for each node type; since the training set here contains only "user" nodes, we take the "user" entries to compute the loss. (A sketch for combining several node types follows the loop.)

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
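If the training set covered several node types, the per-type losses would have to be combined explicitly; a minimal sketch, assuming every labeled type appears in both dictionaries:

    loss = sum(F.cross_entropy(output_predictions[ntype], output_labels[ntype])
               for ntype in output_labels)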

Modifying the Training Loop (GPU)

As before, the only addition is moving the model and the blocks to the GPU.

model = model.cuda()
opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']     
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    opt.step()

Complete Code

Stochastic training: node classification on a homogeneous graph

import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]

train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)

class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x

in_features = g.ndata['feat'].shape[1]
hidden_features = 100
out_features = dataset.num_labels
model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())

# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions, output_labels)
#     opt.zero_grad()
#     loss.backward()
#     opt.step()

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()

Stochastic training: node classification on a heterogeneous graph

import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()

g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)


class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                for rel in rel_names
            })
        self.conv2 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                for rel in rel_names
            })

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x

in_features = n_hetero_features
hidden_features = 100
out_features = n_user_classes
model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, g.etypes)
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())

# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']     
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
#     opt.zero_grad()
#     loss.backward()
#     opt.step()

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()