Pre-Trained Models for Visual Common Sense in AI

If you’ve been following our blog this past summer, you will already have noticed that we released Something-Something V2, the world’s largest and most suitable video dataset for gauging visual common sense in AI. Something-Something is also one of the datasets on which our powerful SuperModel was trained.

With 220,847 video clips (translating into many millions of frames) across 174 action labels, the Something-Something V2 dataset makes model training highly compute-intensive. To save our fellow researchers the time of training these models from scratch, we are providing three ready-to-use pre-trained models, so that you can extract features on your own video datasets and add more experiments to your paper submissions to conferences such as CVPR, ICCV, BMVC, and NIPS.
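To illustrate what feature extraction with one of these networks might look like, here is a minimal PyTorch sketch. The small 3D-CNN below is only a stand-in for the released model3D_* architectures, and the checkpoint name is hypothetical; the actual model definitions and weights come from the GitHub repository linked at the end of this post.

```python
# Minimal sketch: using a pre-trained 3D CNN as a video feature extractor.
# The architecture and checkpoint path are placeholders, not the released code.
import torch
import torch.nn as nn

class Simple3DBackbone(nn.Module):
    """Stand-in for one of the released model3D_* networks."""
    def __init__(self, num_classes=174):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),      # global pooling over time and space
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        x = self.features(clip).flatten(1)   # (batch, feature_dim)
        return self.classifier(x)

model = Simple3DBackbone()
# model.load_state_dict(torch.load("model3D_1.pth"))  # hypothetical checkpoint name
model.eval()

# Dummy clip: batch of 1, 3 colour channels, 16 frames, 84x84 pixels.
clip = torch.randn(1, 3, 16, 84, 84)
with torch.no_grad():
    features = model.features(clip).flatten(1)   # penultimate-layer features
print(features.shape)   # torch.Size([1, 32])
```

The extracted feature vectors can then be fed into whatever downstream classifier or experiment your paper needs.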

The models and their top-1/top-5 accuracy on the validation set are:

  • model3D_1: top-1 49.88%, top-5 78.82%
  • model3D_1_224: top-1 47.67%, top-5 77.35%
  • model3D_1 with left-right augmentation and fps jitter: top-1 51.33%, top-5 80.46%
You can use the notebook we provide to visualize saliency maps on any validation sample.
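As a rough idea of what such a visualization involves, here is a gradient-based saliency sketch; the notebook's exact method may differ. It reuses the stand-in model from the feature-extraction example above, and the same steps apply to the real pre-trained models.

```python
# Minimal sketch: gradient-based saliency for a video clip.
# Requires `model` from the feature-extraction sketch above.
clip = torch.randn(1, 3, 16, 84, 84, requires_grad=True)
logits = model(clip)
logits[0].max().backward()   # gradient of the top-class score w.r.t. the input

# Per-frame saliency: absolute gradient, maximized over colour channels.
saliency = clip.grad.abs().max(dim=1).values   # shape (1, 16, 84, 84)
print(saliency.shape)
```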

For more information and instructions on using the pre-trained models, please refer to the GitHub repository linked below. Enjoy deep learning!