Video Understanding Papers and Datasets
Posted by 阿新 on 2019-01-05
Reposted from: https://github.com/sujiongming/awesome-video-understanding
Awesome Video Understanding
Understanding Video: "Perceiving dynamic actions could be a huge advance in how software makes sense of the world." (from MIT Technology Review, December 6, 2017)
A list of resources for video understanding. Most of these papers can be found via scholar.google.com.
This list was last updated on December 13, 2017.
Table of Contents
- Video Classification
- Action Recognition
- Video Captioning: will be updated
- Temporal Action Detection: will be updated
- Video Datasets
Papers
Video Classification
- image-based methods
- Zha S, Luisier F, Andrews W, et al. Exploiting Image-trained CNN Architectures for Unconstrained Video Classification[J]. Computer Science, 2015.
- Sánchez J, Perronnin F, Mensink T, et al. Image Classification with the Fisher Vector: Theory and Practice[J]. International Journal of Computer Vision, 2013, 105: 222-245.
- CNN-based methods
- Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
- Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
- Fernando B, Gould S. Learning end-to-end video classification with rank-pooling[C]//International Conference on Machine Learning. 2016: 1187-1196.
- RNN-based methods
- Wu Z, Wang X, Jiang Y G, et al. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification[C]//Proceedings of the 23rd ACM international conference on Multimedia. ACM, 2015: 461-470.
- Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: Deep networks for video classification[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 4694-4702.
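A recurring idea across the image-based and CNN-based methods above (e.g. Karpathy et al.; Yue-Hei Ng et al.) is to score each frame independently with an image classifier and then pool the per-frame predictions over time into one video-level prediction. A minimal pure-Python sketch of such late temporal pooling (the frame scores and class count below are made up for illustration):

```python
def pool_frame_scores(frame_scores, method="mean"):
    """Aggregate per-frame class scores (a T x C list of lists)
    into a single video-level score vector of length C."""
    columns = list(zip(*frame_scores))  # one tuple of T scores per class
    if method == "mean":                # average pooling over time
        return [sum(c) / len(c) for c in columns]
    if method == "max":                 # max pooling over time
        return [max(c) for c in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Toy example: 4 frames, 3 classes (hypothetical softmax outputs of a frame-level CNN).
scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
]
video_score = pool_frame_scores(scores, "mean")
predicted_class = video_score.index(max(video_score))
print(predicted_class)  # -> 0
```

Mean pooling treats all frames equally; the recurrent methods below replace this fixed aggregation with a learned temporal model.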
Action Recognition
- CNN-based methods
- Ji S, Xu W, Yang M, et al. 3D Convolutional Neural Networks for Human Action Recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
- Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
- Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. arXiv preprint arXiv:1604.04494, 2016.
- Sun L, Jia K, Yeung D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4597-4605.
- Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in neural information processing systems. 2014: 568-576.
- Ye H, Wu Z, Zhao R W, et al. Evaluating two-stream CNN for video classification[C]//Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015: 435-442.
- Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 4305-4314.
- Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1933-1941.
- Wang L, Xiong Y, Wang Z, et al. Temporal segment networks: Towards good practices for deep action recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 20-36.
- Zhang B, Wang L, Wang Z, et al. Real-time action recognition with enhanced motion vector CNNs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2718-2726.
- Wang X, Farhadi A, Gupta A. Actions ~ transformations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2658-2667.
- Zhu W, Hu J, Sun G, et al. A key volume mining deep framework for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1991-1999.
- Bilen H, Fernando B, Gavves E, et al. Dynamic image networks for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3034-3042.
- Fernando B, Anderson P, Hutter M, et al. Discriminative hierarchical rank pooling for activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1924-1932.
- Cherian A, Fernando B, Harandi M, et al. Generalized rank pooling for activity recognition[J]. arXiv preprint arXiv:1704.02112, 2017.
- Fernando B, Gavves E, Oramas J, et al. Rank pooling for action recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(4): 773-787.
- Fernando B, Gould S. Discriminatively Learned Hierarchical Rank Pooling Networks[J]. arXiv preprint arXiv:1705.10420, 2017.
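The rank-pooling line of work above (Fernando et al.; Bilen et al.'s dynamic images) summarizes a video by the parameters of a ranking function that orders its frames in time. Bilen et al. derive a closed-form approximation in which the video descriptor reduces to a fixed weighted sum of the frame features. A minimal sketch of those approximate rank-pooling weights (pure Python; frame features are flattened to plain lists purely for illustration):

```python
def rank_pool_weights(T):
    """Approximate rank-pooling coefficients from Bilen et al. (2016):
    alpha_t = 2*(T - t + 1) - (T + 1)*(H_T - H_{t-1}),
    where H_t is the t-th harmonic number (H_0 = 0)."""
    H = [0.0]
    for i in range(1, T + 1):
        H.append(H[-1] + 1.0 / i)
    return [2 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
            for t in range(1, T + 1)]

def dynamic_image(frames):
    """Weighted sum of flattened frames: early frames get negative weight,
    late frames positive, and the weights sum to zero, so the result
    encodes the video's temporal evolution in a single 'image'."""
    alphas = rank_pool_weights(len(frames))
    pooled = [0.0] * len(frames[0])
    for a, frame in zip(alphas, frames):
        for i, v in enumerate(frame):
            pooled[i] += a * v
    return pooled

print(rank_pool_weights(4))  # zero-sum weights, negative early and positive late
```

Because the descriptor is just a linear combination of frames, it can be computed in one pass and even applied to raw pixels, which is what makes the dynamic-image representation cheap enough to feed into a standard image CNN.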
- RNN-based methods
- Baccouche M, Mamalet F, Wolf C, et al. Sequential deep learning for human action recognition[C]//International Workshop on Human Behavior Understanding. Springer, Berlin, Heidelberg, 2011: 29-39.
- Donahue J, Anne Hendricks L, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 2625-2634.
- Veeriah V, Zhuang N, Qi G J. Differential recurrent neural networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4041-4049.
- Li Q, Qiu Z, Yao T, et al. Action recognition by learning deep multi-granular spatio-temporal video representation[C]//Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016: 159-166.
- Wu Z, Jiang Y G, Wang X, et al. Multi-stream multi-class fusion of deep networks for video classification[C]//Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016: 791-800.
- Sharma S, Kiros R, Salakhutdinov R. Action recognition using visual attention[J]. arXiv preprint arXiv:1511.04119, 2015.
- Li Z, Gavves E, Jain M, et al. VideoLSTM convolves, attends and flows for action recognition[J]. arXiv preprint arXiv:1607.01794, 2016.
- Unsupervised learning methods
- Taylor G W, Fergus R, LeCun Y, et al. Convolutional learning of spatio-temporal features[C]//European conference on computer vision. Springer, Berlin, Heidelberg, 2010: 140-153.
- Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011: 3361-3368.
- Yan X, Chang H, Shan S, et al. Modeling video dynamics with deep dynencoder[C]//European Conference on Computer Vision. Springer, Cham, 2014: 215-230.
- Srivastava N, Mansimov E, Salakhutdinov R. Unsupervised learning of video representations using LSTMs[C]//International Conference on Machine Learning. 2015: 843-852.
- Pan Y, Li Y, Yao T, et al. Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure[C]//IJCAI. 2016: 3832-3838.
- Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[J]. arXiv preprint arXiv:1511.06432, 2015.
Video Datasets
- HMDB51
- Kuehne H, Jhuang H, Garrote E, et al. HMDB: a large video database for human motion recognition[C]//Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011: 2556-2563.
- state-of-the-art: 75%
- Lan Z, Zhu Y, Hauptmann A G. Deep Local Video Feature for Action Recognition[J]. arXiv preprint arXiv:1701.07368, 2017.
- UCF-101
- Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv:1212.0402, 2012.
- state-of-the-art: 95.6%
- Diba A, Sharma V, Van Gool L. Deep temporal linear encoding networks[J]. arXiv preprint arXiv:1611.06678, 2016.
- ActivityNet
- Caba Heilbron F, Escorcia V, Ghanem B, et al. Activitynet: A large-scale video benchmark for human activity understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 961-970.
- state-of-the-art: 91.3%
- Wang L, Xiong Y, Lin D, et al. UntrimmedNets for Weakly Supervised Action Recognition and Detection[J]. arXiv preprint arXiv:1703.03329, 2017.
- Sports-1M
- Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
- state-of-the-art: 67.6%
- Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
- YouTube-8M
- Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
- state-of-the-art: 84.967%
- Miech A, Laptev I, Sivic J. Learnable pooling with Context Gating for video classification[J]. arXiv preprint arXiv:1706.06905, 2017.
- Kinetics
- Kay W, Carreira J, Simonyan K, et al. The Kinetics Human Action Video Dataset[J]. arXiv preprint arXiv:1705.06950, 2017.
- state-of-the-art: ?
- Moments in Time Dataset
- Monfort M, Zhou B, Bargal S A, et al. Moments in Time Dataset: one million videos for event understanding. Tech report.
- state-of-the-art: ?