Link prediction with graph neural networks
阿新 • Published: 2021-11-11
gcn for prediction of protein interactions
Project: https://github.com/jiangnanboy/gcn_for_prediction_of_protein_interactions
This project applies several graph neural networks to link prediction of protein interactions.
Guide
Intro
At present, link prediction is implemented mainly on the protein data in data/yeast/yeast.edgelist.
Model
The models are graph neural networks, such as GAE and VGAE.
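- 1. GCNModelVAE (src/vgae): graph convolutional auto-encoder and variational graph convolutional auto-encoder (the config selects between the plain and the variational auto-encoder), following "Variational Graph Auto-Encoders".
- 2. GCNModelARGA (src/arga): adversarially regularized graph auto-encoder, which uses GAE/VGAE as the generator and a three-layer feed-forward network as the discriminator, following "Adversarially Regularized Graph Autoencoder for Graph Embedding".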
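To make the auto-encoder idea above concrete, here is a minimal, dependency-free sketch of a GAE-style forward pass: normalize the adjacency matrix, apply one graph convolution to obtain node embeddings, then decode edge probabilities with an inner product. It is not the repository's code. The helper names, the toy graph, and the hand-picked weights are all assumptions made for illustration; a real encoder would stack trained layers, and the variational variant would additionally predict a mean and a log-variance per node.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCN."""
    n = len(adj)
    # Add self-loops so every node keeps its own features (and has degree >= 1).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weights, relu=True):
    """One graph convolution: act(A_norm @ X @ W)."""
    out = matmul(a_norm, matmul(features, weights))
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def inner_product_decoder(z):
    """Reconstruct edge probabilities as sigmoid(z_i . z_j)."""
    z_t = [list(col) for col in zip(*z)]
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matmul(z, z_t)]


# Toy graph (hypothetical): a path 0 - 1 - 2 with identity features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]  # hand-picked, not trained

a_norm = normalize_adjacency(adj)
z = gcn_layer(a_norm, features, weights, relu=False)  # 3 nodes x 2 dimensions
adj_rec = inner_product_decoder(z)                    # 3 x 3 edge probabilities
```

In the adversarial variant, a separate discriminator network would also receive `z` and be trained to tell it apart from samples of a prior distribution; that part is omitted here.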
Usage
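- Configuration: the parameters for each model are set in the config.cfg file in that model's folder; this file is loaded for both training and prediction.
- Training and prediction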
1. GCNModelVAE (src/vgae)
(1) Training
    from src.vgae.train import Train

    # Train the model with the settings in config.cfg
    train = Train()
    train.train_model('config.cfg')
    Epoch: 0001 train_loss = 0.73368 val_roc_score = 0.77485 average_precision_score = 0.69364 time= 0.79382
    Epoch: 0002 train_loss = 0.73334 val_roc_score = 0.80637 average_precision_score = 0.74248 time= 0.78920
    Epoch: 0003 train_loss = 0.73341 val_roc_score = 0.85901 average_precision_score = 0.84317 time= 0.78759
    Epoch: 0004 train_loss = 0.73353 val_roc_score = 0.86936 average_precision_score = 0.85909 time= 0.78880
    Epoch: 0005 train_loss = 0.73334 val_roc_score = 0.86945 average_precision_score = 0.86092 time= 0.78438
    Epoch: 0006 train_loss = 0.73353 val_roc_score = 0.87117 average_precision_score = 0.86205 time= 0.78761
    Epoch: 0007 train_loss = 0.73352 val_roc_score = 0.87235 average_precision_score = 0.86407 time= 0.78210
    Epoch: 0008 train_loss = 0.73338 val_roc_score = 0.87317 average_precision_score = 0.86462 time= 0.78477
    Epoch: 0009 train_loss = 0.73341 val_roc_score = 0.87462 average_precision_score = 0.86755 time= 0.78378
    Epoch: 0010 train_loss = 0.73348 val_roc_score = 0.87606 average_precision_score = 0.86853 time= 0.78587
    Epoch: 0011 train_loss = 0.73344 val_roc_score = 0.87686 average_precision_score = 0.86923 time= 0.78406
    Epoch: 0012 train_loss = 0.73331 val_roc_score = 0.87665 average_precision_score = 0.86880 time= 0.78253
    Epoch: 0013 train_loss = 0.73357 val_roc_score = 0.87426 average_precision_score = 0.86521 time= 0.78202
    Epoch: 0014 train_loss = 0.73327 val_roc_score = 0.87218 average_precision_score = 0.86192 time= 0.78299
    Epoch: 0015 train_loss = 0.73336 val_roc_score = 0.87118 average_precision_score = 0.85946 time= 0.78166
    Epoch: 0016 train_loss = 0.73336 val_roc_score = 0.86960 average_precision_score = 0.85835 time= 0.78792
    Epoch: 0017 train_loss = 0.73355 val_roc_score = 0.87126 average_precision_score = 0.85940 time= 0.78401
    Epoch: 0018 train_loss = 0.73357 val_roc_score = 0.87050 average_precision_score = 0.85648 time= 0.78511
    Epoch: 0019 train_loss = 0.73332 val_roc_score = 0.86737 average_precision_score = 0.84906 time= 0.78132
    Epoch: 0020 train_loss = 0.73345 val_roc_score = 0.86632 average_precision_score = 0.84532 time= 0.78603
    test roc score: 0.863696753293295
    test ap score: 0.8381410617542567
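The val_roc_score and average_precision_score columns in the log above are ranking metrics over candidate edges. As a rough illustration, and assuming they correspond to ROC AUC and average precision computed over held-out true edges and sampled non-edges (the source does not spell this out), the two quantities can be sketched in plain Python as follows. The function names and the toy scores are hypothetical; without tied scores the results match what scikit-learn's `roc_auc_score` and `average_precision_score` compute.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random true edge outscores a random non-edge (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """Mean of the precision measured at the rank of each true edge (assumes no ties)."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda t: t[0],
        reverse=True,
    )
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)


# Made-up reconstruction scores for three held-out edges and three sampled non-edges.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = roc_auc(pos, neg)
ap = average_precision(pos, neg)
```

Both metrics depend only on the ordering of the scores, which is why they can keep improving or degrading while train_loss barely moves.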
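(2) Prediction
    from src.vgae.predict import Predict

    predict = Predict()
    predict.load_model_adj('config.cfg')
    # Returns the original graph adjacency matrix and the adjacency matrix
    # obtained by inner-product decoding of the model's hidden embeddings.
    # Comparing the two matrices yields the link predictions.
    adj_orig, adj_rec = predict.predict()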
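One way to compare the two matrices returned by `predict()` is to rank the node pairs that have no edge in the original graph by their reconstructed score. The sketch below is only an illustration: `top_candidate_links` is a hypothetical helper, not part of the repository, and it assumes both matrices are dense, square, and indexable as `adj[i][j]` (nested lists or dense arrays). The source does not state the actual return types.

```python
def top_candidate_links(adj_orig, adj_rec, k=5):
    """Return the k highest-scoring node pairs that have no edge in adj_orig."""
    n = len(adj_orig)
    candidates = []
    for i in range(n):
        for j in range(i + 1, n):        # undirected graph: visit each pair once
            if adj_orig[i][j] == 0:      # keep only pairs without a known edge
                candidates.append((float(adj_rec[i][j]), i, j))
    candidates.sort(reverse=True)        # highest reconstructed score first
    return [(i, j, score) for score, i, j in candidates[:k]]


# Toy 4-node example with made-up reconstruction scores.
adj_orig = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
adj_rec = [
    [0.90, 0.80, 0.70, 0.10],
    [0.80, 0.90, 0.85, 0.20],
    [0.70, 0.85, 0.90, 0.30],
    [0.10, 0.20, 0.30, 0.90],
]
links = top_candidate_links(adj_orig, adj_rec, k=2)
```

Because only the ordering matters, the ranking is the same whether `adj_rec` holds raw inner products or sigmoid probabilities.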
2. GCNModelARGA (src/arga)
(1) Training
    from src.arga.train import Train

    # Train the model with the settings in config.cfg
    train = Train()
    train.train_model('config.cfg')
    Epoch: 0001 train_loss = 2.17176 val_roc_score = 0.77090 average_precision_score = 0.69050 time= 0.81113
    Epoch: 0002 train_loss = 2.16173 val_roc_score = 0.84636 average_precision_score = 0.81340 time= 0.81458
    Epoch: 0003 train_loss = 2.14979 val_roc_score = 0.87660 average_precision_score = 0.86472 time= 0.80898
    Epoch: 0004 train_loss = 2.13698 val_roc_score = 0.87735 average_precision_score = 0.86534 time= 0.80995
    Epoch: 0005 train_loss = 2.12339 val_roc_score = 0.87765 average_precision_score = 0.86592 time= 0.80865
    Epoch: 0006 train_loss = 2.10753 val_roc_score = 0.87756 average_precision_score = 0.86571 time= 0.80748
    Epoch: 0007 train_loss = 2.08996 val_roc_score = 0.87806 average_precision_score = 0.86621 time= 0.80738
    Epoch: 0008 train_loss = 2.06920 val_roc_score = 0.87801 average_precision_score = 0.86623 time= 0.80744
    Epoch: 0009 train_loss = 2.04701 val_roc_score = 0.87795 average_precision_score = 0.86618 time= 0.80932
    Epoch: 0010 train_loss = 2.02241 val_roc_score = 0.87830 average_precision_score = 0.86643 time= 0.80722
    Epoch: 0011 train_loss = 1.99754 val_roc_score = 0.87807 average_precision_score = 0.86620 time= 0.80533
    Epoch: 0012 train_loss = 1.97255 val_roc_score = 0.87749 average_precision_score = 0.86586 time= 0.80859
    Epoch: 0013 train_loss = 1.94664 val_roc_score = 0.87607 average_precision_score = 0.86483 time= 0.80660
    Epoch: 0014 train_loss = 1.92208 val_roc_score = 0.87408 average_precision_score = 0.86320 time= 0.80300
    Epoch: 0015 train_loss = 1.89869 val_roc_score = 0.87290 average_precision_score = 0.86218 time= 0.80400
    Epoch: 0016 train_loss = 1.87584 val_roc_score = 0.87244 average_precision_score = 0.86186 time= 0.80392
    Epoch: 0017 train_loss = 1.85415 val_roc_score = 0.87554 average_precision_score = 0.86400 time= 0.80675
    Epoch: 0018 train_loss = 1.83373 val_roc_score = 0.87653 average_precision_score = 0.86473 time= 0.80762
    Epoch: 0019 train_loss = 1.81515 val_roc_score = 0.87718 average_precision_score = 0.86532 time= 0.80596
    Epoch: 0020 train_loss = 1.79975 val_roc_score = 0.87745 average_precision_score = 0.86551 time= 0.80889
    test roc score: 0.8797451083479302
    test ap score: 0.8681038618348471
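(2) Prediction
    from src.arga.predict import Predict

    predict = Predict()
    predict.load_model_adj('config.cfg')
    # Returns the original graph adjacency matrix and the adjacency matrix
    # obtained by inner-product decoding of the model's hidden embeddings.
    # Comparing the two matrices yields the link predictions.
    adj_orig, adj_rec = predict.predict()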
Dataset
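The data come from the yeast protein-protein interaction network (yeast). Each line lists one pair of interacting proteins, as shown below; see the data folder for the full file.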
YLR418C YOL145C
YOL145C YLR418C
YLR418C YOR123C
YOR123C YLR418C
...... ......
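For readers who want to inspect the file themselves, the following sketch shows one way such an edge list could be turned into an adjacency matrix. It assumes whitespace-separated pairs of protein identifiers, one pair per line, as in the excerpt above; `load_edgelist` and `to_adjacency` are hypothetical helpers and not the repository's loader.

```python
def load_edgelist(lines):
    """Parse 'nodeA nodeB' lines into a node index and a set of undirected edges."""
    node_index = {}   # protein identifier -> integer id, in order of first appearance
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        u, v = (node_index.setdefault(name, len(node_index)) for name in parts)
        edges.add((min(u, v), max(u, v)))  # both directions collapse to one edge
    return node_index, edges


def to_adjacency(num_nodes, edges):
    """Build a dense symmetric 0/1 adjacency matrix from undirected edges."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


# The four sample lines shown above.
sample = [
    "YLR418C YOL145C",
    "YOL145C YLR418C",
    "YLR418C YOR123C",
    "YOR123C YLR418C",
]
node_index, edges = load_edgelist(sample)
adj = to_adjacency(len(node_index), edges)

# For the full dataset, pass an open file instead of a list, for example:
#   with open("data/yeast/yeast.edgelist") as f:
#       node_index, edges = load_edgelist(f)
```

Storing each pair as `(min, max)` means the two directed lines per interaction in the file yield a single undirected edge.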