I maintain this page to record things about computer vision that I have read, am working on, or plan to look at. I used to write short notes on the papers I read, which is a good way to remember and understand the authors' ideas. But I gradually found that I was forgetting much of what I had learnt, because besides papers I also pick up knowledge from other people's blogs, online courses, and reports, none of which I was recording. I also need a place to list things I should look at but cannot get to when I first discover them. This page is therefore much like a catalog.



  • DetNet: A Backbone network for Object Detection (PDF)
  • Zero-Shot Object Detection (PDF)
  • Unsupervised Discovery of Object Landmarks as Structural Representations (PDF, Project/Code)
  • Cascade R-CNN: Delving into High Quality Object Detection (PDF, Project/Code)
  • Path Aggregation Network for Instance Segmentation (PDF)
  • ClickBAIT-v2: Training an Object Detector in Real-Time (PDF)
  • Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection (PDF)
  • Complex-YOLO: Real-time 3D Object Detection on Point Clouds (PDF)
  • Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
  • Domain Adaptive Faster R-CNN for Object Detection in the Wild (PDF)
  • Chinese Text in the Wild (PDF, Project/Code)
  • TSSD: Temporal Single-Shot Detector Based on Attention and LSTM for Robotic Intelligent Perception (PDF)
  • Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection (PDF, Reading Note)
  • Object Detection in Videos by Short and Long Range Object Linking (PDF)
  • Learning a Rotation Invariant Detector with Rotatable Bounding Box (PDF, Project/Code)
  • Detecting Curve Text in the Wild: New Dataset and New Solution (PDF, Project/Code)
  • Single Shot Text Detector with Regional Attention (PDF, Project/Code)
  • Single-Shot Refinement Neural Network for Object Detection (PDF, Project/Code, Reading Note)
  • S3FD: Single Shot Scale-invariant Face Detector (PDF, Reading Note)
  • MegDet: A Large Mini-Batch Object Detector (PDF)
  • Light-Head R-CNN: In Defense of Two-Stage Object Detector (PDF)
  • Interpretable R-CNN (PDF)
  • Cascade Region Proposal and Global Context for Deep Object Detection (PDF)
  • PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection (PDF, Project/Code, Reading Note)
  • Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks (PDF, Reading Note)
  • Object Detection from Video Tubelets with Convolutional Neural Networks (PDF, Reading Note)
  • R-FCN: Object Detection via Region-based Fully Convolutional Networks (PDF, Project/Code, Reading Note)
  • SSD: Single Shot MultiBox Detector (PDF, Project/Code, Reading Note)
  • Pushing the Limits of Deep CNNs for Pedestrian Detection (PDF, Reading Note)
  • Object Detection by Labeling Superpixels (PDF, Reading Note)
  • Crafting GBD-Net for Object Detection (PDF, Project/Code)
    code for CUImage and CUVideo, the object detection champion of ImageNet 2016.
  • Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection (PDF, Reading Note)
  • Training Region-based Object Detectors with Online Hard Example Mining (PDF, Reading Note)
  • Detecting People in Artwork with CNNs (PDF, Project/Code)
  • Deeply supervised salient object detection with short connections (PDF)
  • Learning to detect and localize many objects from few examples (PDF)
  • Multi-Scale Saliency Detection using Dictionary Learning (PDF)
  • Straight to Shapes: Real-time Detection of Encoded Shapes (PDF)
  • Weakly Supervised Cascaded Convolutional Networks (PDF, Reading Note)
  • Speed/accuracy trade-offs for modern convolutional object detectors (PDF, Reading Note)
  • Object Detection via End-to-End Integration of Aspect Ratio and Context Aware Part-based Models and Fully Convolutional Networks (PDF)
  • Feature Pyramid Networks for Object Detection (PDF, Reading Note)
  • COCO-Stuff: Thing and Stuff Classes in Context (PDF)
  • Finding Tiny Faces (PDF)
  • Beyond Skip Connections: Top-Down Modulation for Object Detection (PDF, Reading Note)
  • YOLO9000: Better, Faster, Stronger (PDF, Project/Code, Reading Note)
  • Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study (PDF)
  • To Boost or Not to Boost? On the Limits of Boosted Trees for Object Detection (PDF)
  • DSSD: Deconvolutional Single Shot Detector (PDF, Reading Note)
  • A Fast and Compact Salient Score Regression Network Based on Fully Convolutional Network (PDF)
  • Wide-Residual-Inception Networks for Real-time Object Detection (PDF)
  • Zoom Out-and-In Network with Recursive Training for Object Proposal (PDF, Project/Code)
  • Improving Object Detection with Region Similarity Learning (PDF)
  • Tree-Structured Reinforcement Learning for Sequential Object Localization (PDF)
  • Weakly Supervised Object Localization Using Things and Stuff Transfer (PDF)
  • Unsupervised learning from video to detect foreground objects in single images (PDF)
  • A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection (PDF, Project/Code)
  • Learning non-maximum suppression (PDF)
  • Real Time Image Saliency for Black Box Classifiers (PDF)
  • An Efficient Approach for Object Detection and Tracking of Objects in a Video with Variable Background (PDF)
  • RON: Reverse Connection with Objectness Prior Networks for Object Detection (PDF, Project/Code)
  • Deformable Part-based Fully Convolutional Network for Object Detection (PDF, Reading Note)
  • Recurrent Scale Approximation for Object Detection in CNN (PDF)
  • DSOD: Learning Deeply Supervised Object Detectors from Scratch (PDF, Project/Code, Reading Note)
  • PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN (PDF)
  • Focal Loss for Dense Object Detection (PDF)
  • Learning Uncertain Convolutional Features for Accurate Saliency Detection (PDF)
  • Optimizing Region Selection for Weakly Supervised Object Detection (PDF)
  • Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition (PDF)
  • Flow-Guided Feature Aggregation for Video Object Detection (PDF)
  • BlitzNet: A Real-Time Deep Network for Scene Understanding (PDF, Project/Code)
  • Soft Proposal Networks for Weakly Supervised Object Localization (PDF, Project/Code)
  • Feature-Fused SSD: Fast Detection for Small Objects (PDF)
  • Light Cascaded Convolutional Neural Networks for Accurate Player Detection (PDF)
  • Personalized Saliency and its Prediction (PDF)
  • WeText: Scene Text Detection under Weak Supervision (PDF)
  • VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (PDF, Project/Code)


  • Deep Object Co-Segmentation (PDF)
  • Fusing Hierarchical Convolutional Features for Human Body Segmentation and Clothing Fashion Classification (PDF)
  • ShuffleSeg: Real-time Semantic Segmentation Network (PDF)
  • Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (PDF, Project/Code)
  • Learning random-walk label propagation for weakly-supervised semantic segmentation (PDF)
  • Panoptic Segmentation (PDF, Reading Note)
  • Learning to Segment Every Thing (PDF)
  • Deep Extreme Cut: From Extreme Points to Object Segmentation (PDF)
  • Instance-aware Semantic Segmentation via Multi-task Network Cascades (PDF, Project/Code)
  • ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (PDF, Reading Note)
  • Learning Deconvolution Network for Semantic Segmentation (PDF, Reading Note)
  • Semantic Object Parsing with Graph LSTM (PDF, Reading Note)
  • Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (PDF, Reading Note)
  • Learning to Segment Moving Objects in Videos (PDF, Reading Note)
  • Deep Structured Features for Semantic Segmentation (PDF)

    We propose a highly structured neural network architecture for semantic segmentation of images that combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stages i) and ii) are completely pre-specified, only the linear classifier is learned from data. Thanks to its high degree of structure, our architecture has a very small memory footprint and thus fits onto low-power embedded and mobile platforms. We apply the proposed architecture to outdoor scene and aerial image semantic segmentation and show that the accuracy of our architecture is competitive with conventional pixel classification CNNs. Furthermore, we demonstrate that the proposed architecture is data efficient in the sense of matching the accuracy of pixel classification CNNs when trained on a much smaller data set.

  • CNN-aware Binary Map for General Semantic Segmentation (PDF)

  • Learning to Refine Object Segments (PDF)
  • Clockwork Convnets for Video Semantic Segmentation (PDF, Project/Code)
  • Convolutional Gated Recurrent Networks for Video Segmentation (PDF)
  • Efficient Convolutional Neural Network with Binary Quantization Layer (PDF)
  • Fully Convolutional Instance-aware Semantic Segmentation (PDF, Project/Code, Reading Note)
  • Semantic Segmentation using Adversarial Networks (PDF)
  • Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes (PDF)
  • Deep Watershed Transform for Instance Segmentation (PDF)
  • InstanceCut: from Edges to Instances with MultiCut (PDF)
  • The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (PDF)
  • Improving Fully Convolution Network for Semantic Segmentation (PDF)
  • Video Scene Parsing with Predictive Feature Learning (PDF)
  • Training Bit Fully Convolutional Network for Fast Semantic Segmentation (PDF)
  • Pyramid Scene Parsing Network (PDF, Reading Note)
  • Mining Pixels: Weakly Supervised Semantic Segmentation Using Image Labels (PDF)
  • FastMask: Segment Object Multi-scale Candidates in One Shot (PDF, Project/Code, Reading Note)
  • A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction (PDF, Reading Note)
  • FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos (PDF)
  • Visual Saliency Prediction Using a Mixture of Deep Neural Networks (PDF)
  • PixelNet: Representation of the pixels, by the pixels, and for the pixels (PDF, Project/Code)
  • Super-Trajectory for Video Segmentation (PDF)
  • Understanding Convolution for Semantic Segmentation (PDF, Reading Note)
  • Adversarial Examples for Semantic Image Segmentation (PDF)
  • Large Kernel Matters – Improve Semantic Segmentation by Global Convolutional Network (PDF)
  • Deep Image Matting (PDF, Reading Note)
  • Predicting Deeper into the Future of Semantic Segmentation (PDF)
  • Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks (PDF, Project/Code)
  • One-Shot Video Object Segmentation (PDF, Project/Code)
  • Semantic Instance Segmentation via Deep Metric Learning (PDF)
  • Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade (PDF)
  • Semantically-Guided Video Object Segmentation (PDF)
  • Recurrent Multimodal Interaction for Referring Image Segmentation (PDF)
  • Loss Max-Pooling for Semantic Image Segmentation (PDF)
  • Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation (PDF)
  • Learning Video Object Segmentation with Visual Memory (PDF)
  • A Review on Deep Learning Techniques Applied to Semantic Segmentation (PDF)
  • BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks (PDF)
  • Rethinking Atrous Convolution for Semantic Image Segmentation (PDF)
  • Discriminative Localization in CNNs for Weakly-Supervised Segmentation of Pulmonary Nodules (PDF)
  • Superpixel-based semantic segmentation trained by statistical process control (PDF)
  • The Devil is in the Decoder (PDF)
  • Semantic Segmentation with Reverse Attention (PDF)
  • Learning Deconvolution Network for Semantic Segmentation (PDF, Project/Code)
  • Depth Adaptive Deep Neural Network for Semantic Segmentation (PDF)
  • Semantic Instance Segmentation with a Discriminative Loss Function (PDF)
  • A Cost-Sensitive Visual Question-Answer Framework for Mining a Deep And-OR Object Semantics from Web Images (PDF)
  • ICNet for Real-Time Semantic Segmentation on High-Resolution Images (PDF, Project/Code)
  • Learning to Segment Instances in Videos with Spatial Propagation Network (PDF, Project/Code)
  • Learning Affinity via Spatial Propagation Networks (PDF, Project/Code)


  • Trajectory Factory: Tracklet Cleaving and Re-connection by Deep Siamese Bi-GRU for Multiple Object Tracking (PDF)
  • Machine Learning Methods for Solving Assignment Problems in Multi-Target Tracking (PDF)
  • Multi-Target, Multi-Camera Tracking by Hierarchical Clustering: Recent Progress on DukeMTMC Project (PDF)
  • Detect-and-Track: Efficient Pose Estimation in Videos (PDF)
  • Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking (PDF)
  • Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking (PDF, Reading Note)
  • Joint Tracking and Segmentation of Multiple Targets (PDF, Reading Note)
  • Deep Tracking on the Move: Learning to Track the World from a Moving Vehicle using Recurrent Neural Networks (PDF)
  • Convolutional Regression for Visual Tracking (PDF)
  • Kernelized Correlation Filters (Project, Code1, Code2)
  • Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering (PDF)
  • SANet: Structure-Aware Network for Visual Tracking (PDF)
  • Semantic tracking: Single-target tracking with inter-supervised convolutional networks (PDF)
  • On The Stability of Video Detection and Tracking (PDF)
  • Dual Deep Network for Visual Tracking (PDF)
  • Deep Motion Features for Visual Tracking (PDF)
  • Robust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation (PDF, Project/Code)
  • Instance Flow Based Online Multiple Object Tracking (PDF)
  • PathTrack: Fast Trajectory Annotation with Path Supervision (PDF)
  • Good Features to Correlate for Visual Tracking (PDF)
  • Re3: Real-Time Recurrent Regression Networks for Object Tracking (PDF)
  • Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning (PDF, Project/Code)
  • Simple Online and Realtime Tracking with a Deep Association Metric (PDF)
  • Learning Policies for Adaptive Tracking with Deep Feature Cascades (PDF)
  • Recurrent Filter Learning for Visual Tracking (PDF)
  • Tracking Persons-of-Interest via Unsupervised Representation Adaptation (PDF)
  • Detect to Track and Track to Detect (PDF, Project/Code, Reading Note)


  • Simple Baselines for Human Pose Estimation and Tracking (PDF)
  • End-to-end Recovery of Human Shape and Pose (PDF, Project/Code, Code)
  • PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model (PDF)
  • DensePose: Dense Human Pose Estimation In The Wild (PDF, Project/Code)
  • Cascaded Pyramid Network for Multi-Person Pose Estimation (PDF)
  • Chained Predictions Using Convolutional Neural Networks (PDF, Reading Note)
  • CRF-CNN: Modeling Structured Information in Human Pose Estimation (PDF)
  • Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (PDF, Project/Code, Reading Note)
  • Towards Accurate Multi-person Pose Estimation in the Wild (PDF, Reading Note)
  • Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation (PDF)
  • Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose (PDF, Project/Code)
  • Learning Feature Pyramids for Human Pose Estimation (PDF, Project/Code)
  • Joint Multi-Person Pose Estimation and Semantic Part Segmentation (PDF)
  • DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation (PDF)
  • Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image (PDF)
  • Human Pose Regression by Combining Indirect Part Detection and Contextual Information (PDF)
  • Dual Path Networks for Multi-Person Human Pose Estimation (PDF)


  • PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation (PDF, Project/Code)
  • Superframes, A Temporal Video Segmentation (PDF)
  • Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation (PDF)
  • 2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning (PDF)
  • Real-Time End-to-End Action Detection with Two-Stream Networks (PDF)
  • Learning Video-Story Composition via Recurrent Neural Network (PDF)
  • Real-world Anomaly Detection in Surveillance Videos (PDF)
  • Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition (PDF)
  • Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward (PDF, Project/Code)
  • Making a long story short: A Multi-Importance Semantic for Fast-Forwarding Egocentric Videos (PDF)
  • Attentional Pooling for Action Recognition (PDF, Project/Code)
  • Pooling the Convolutional Layers in Deep ConvNets for Action Recognition (PDF, Reading Note)
  • Two-Stream Convolutional Networks for Action Recognition in Videos (PDF, Reading Note)
  • YouTube-8M: A Large-Scale Video Classification Benchmark (PDF, Project/Code)
  • Spatiotemporal Residual Networks for Video Action Recognition (PDF)
  • An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data (PDF)
  • Fast Video Classification via Adaptive Cascading of Deep Models (PDF)
  • Video Pixel Networks (PDF)
  • Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection (PDF)
  • EM-Based Mixture Models Applied to Video Event Detection (PDF)
  • Video Captioning and Retrieval Models with Semantic Attention (PDF)
  • Title Generation for User Generated Videos (PDF)
  • Review of Action Recognition and Detection Methods (PDF)
  • Self-Supervised Video Representation Learning With Odd-One-Out Networks (PDF)
  • Recurrent Memory Addressing for describing videos (PDF)
  • Online Real time Multiple Spatiotemporal Action Localisation and Prediction on a Single Platform (PDF)
  • Real-Time Video Highlights for Yahoo Esports (PDF)
  • Surveillance Video Parsing with Single Frame Supervision (PDF)
  • Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks (PDF)
  • Action Recognition with Dynamic Image Networks (PDF)
  • ActionFlowNet: Learning Motion Representation for Action Recognition (PDF)
  • Video Propagation Networks (PDF)
  • Detecting events and key actors in multi-person videos (PDF)
  • A Pursuit of Temporal Accuracy in General Activity Detection (PDF, Reading Note)
  • Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos (PDF)
  • Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos (PDF)
  • Incremental Tube Construction for Human Action Detection (PDF)
  • Unsupervised Action Proposal Ranking through Proposal Recombination (PDF)
  • CERN: Confidence-Energy Recurrent Network for Group Activity Recognition (PDF)
  • Forecasting Human Dynamics from Static Images (PDF)
  • Interpretable 3D Human Action Analysis with Temporal Convolutional Networks (PDF)
  • Training object class detectors with click supervision (PDF)
  • Skeleton-based Action Recognition with Convolutional Neural Networks (PDF)
  • Online growing neural gas for anomaly detection in changing surveillance scenes (PDF)
  • Learning Person Trajectory Representations for Team Activity Analysis (PDF)
  • Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition (PDF)
  • Video Imagination from a Single Image with Transformation Generation (PDF, Project/Code)
  • Optimizing Deep CNN-Based Queries over Video Streams at Scale (PDF, Project/Code, Reading Note)
  • Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning (PDF)
  • Predicting Human Activities Using Stochastic Grammar (PDF)
  • Discriminative convolutional Fisher vector network for action recognition (PDF)
  • Exploiting Semantic Contextualization for Interpretation of Human Activity in Videos (PDF)
  • Lattice Long Short-Term Memory for Human Action Recognition (PDF)
  • Kinship Verification from Videos using Spatio-Temporal Texture Features and Deep Learning (PDF)
  • Fast-Forward Video Based on Semantic Extraction (PDF)
  • Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks (PDF)
  • ConvNet Architecture Search for Spatiotemporal Feature Learning (PDF, Project/Code, GitHub)
  • Fully Context-Aware Video Prediction (PDF)


  • Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks (PDF, Reading Note)
  • MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices (PDF)
  • Survey of Face Detection on Low-quality Images (PDF)
  • PyramidBox: A Context-assisted Single Shot Face Detector (PDF)
  • SFace: An Efficient Network for Face Detection in Large Scale Variations (PDF)
  • Deep Facial Expression Recognition: A Survey (PDF)
  • Deep Face Recognition: A Survey (PDF)
  • Deep Semantic Face Deblurring (PDF, Project/Code)
  • Evaluation of Dense 3D Reconstruction from 2D Face Images in the Wild (PDF)
  • SSH: Single Stage Headless Face Detector (PDF, Project/Code)
  • Detecting and counting tiny faces (PDF, Project/Code)
  • Training Deep Face Recognition Systems with Synthetic Data (PDF)
  • Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (PDF, Project/Code)
  • Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (PDF, Project/Code, Code Caffe)
  • Deep Architectures for Face Attributes (PDF)
  • Face Detection with End-to-End Integration of a ConvNet and a 3D Model (PDF, Reading Note, Project/Code)
  • A CNN Cascade for Landmark Guided Semantic Part Segmentation (PDF, Project/Code)
  • Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario (PDF)
  • An All-In-One Convolutional Neural Network for Face Analysis (PDF)
  • Fast Face-swap Using Convolutional Neural Networks (PDF)
  • Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval (Project/Code)
  • CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection (Project/Code)
  • Face Synthesis from Facial Identity Features (PDF)
  • DeepFace: Face Generation using Deep Learning (PDF)
  • Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns (PDF, Project/Code)
  • EmotioNet Challenge: Recognition of facial expressions of emotion in the wild (PDF)
  • Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation (PDF)
  • Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network (PDF)
  • Deep Alignment Network: A convolutional neural network for robust face alignment (PDF, Project/Code)
  • Scale-Aware Face Detection (PDF)
  • AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild (PDF)
  • SphereFace: Deep Hypersphere Embedding for Face Recognition (PDF, Project/Code)
  • Age Group and Gender Estimation in the Wild with Deep RoR Architecture (PDF)
  • Island Loss for Learning Discriminative Features in Facial Expression Recognition (PDF)
  • Temporal Non-Volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition (PDF)


  • DeepFlow: Large displacement optical flow with deep matching (PDF, Project/Code)
  • Guided Optical Flow Learning (PDF)


  • Image Inpainting for Irregular Holes Using Partial Convolutions (PDF)
  • Neural Aesthetic Image Reviewer (PDF, Reading Note)
  • Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression (PDF)
  • Learning Intelligent Dialogs for Bounding Box Annotation (PDF)
  • Real-time video stabilization and mosaicking for monitoring and surveillance (PDF, Project/Code)
  • Learning Recursive Filter for Low-Level Vision via a Hybrid Neural Network (PDF, Project/Code)
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (PDF, Project/Code)
  • A Learned Representation For Artistic Style (PDF)
  • Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification (PDF, Project/Code)
  • Pixel Recurrent Neural Networks (PDF)
  • Conditional Image Generation with PixelCNN Decoders (PDF, Project/Code)
  • RAISR: Rapid and Accurate Image Super Resolution (PDF)
  • Photo-Quality Evaluation based on Computational Aesthetics: Review of Feature Extraction Techniques (PDF)
  • Fast color transfer from multiple images (PDF)
  • Bringing Impressionism to Life with Neural Style Transfer in Come Swim (PDF)
  • PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications (PDF, [Project/Code](https://github.com/openai/pixel-cnn))
  • Deep Photo Style Transfer (PDF)
  • A Neural Representation of Sketch Drawings (PDF)
  • Visual Attribute Transfer through Deep Image Analogy (PDF)
  • Deep Semantics-Aware Photo Adjustment (PDF)
  • Diversified Texture Synthesis with Feed-forward Networks (PDF, Project/Code)
  • Real-Time Neural Style Transfer for Videos (PDF)
  • Creatism: A deep-learning photographer capable of creating professional work (PDF)
  • Deep Image Harmonization (PDF, Project/Code)
  • Neural Color Transfer between Images (PDF)
  • Deeper, Broader and Artier Domain Generalization (PDF)


  • Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling (PDF, Project/Code)


  • An Information-Theoretic View for Deep Learning (PDF)
  • Understanding Individual Neuron Importance Using Information Theory (PDF)
  • Understanding Convolutional Neural Network Training with Information Theory (PDF)
  • The unreasonable effectiveness of the forget gate (PDF)
  • Discovering Hidden Factors of Variation in Deep Networks (PDF)
  • Regularizing Deep Networks by Modeling and Predicting Label Structure (PDF)
  • Hierarchical Novelty Detection for Visual Object Recognition (PDF)
  • Guide Me: Interacting with Deep Networks (PDF)
  • Studying Invariances of Trained Convolutional Neural Networks (PDF)
  • Deep Residual Networks and Weight Initialization (PDF)
  • WNGrad: Learn the Learning Rate in Gradient Descent (PDF)
  • Understanding the Loss Surface of Neural Networks for Binary Classification (PDF)
  • Tell Me Where to Look: Guided Attention Inference Network (PDF)
  • Convolutional Neural Networks with Alternately Updated Clique (PDF, Project/Code)
  • Visual Interpretability for Deep Learning: a Survey (PDF)
  • Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey (PDF)
  • CNNs are Globally Optimal Given Multi-Layer Support (PDF)
  • Take it in your stride: Do we need striding in CNNs? (PDF)
  • Gradients explode - Deep Networks are shallow - ResNet explained (PDF)
  • Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates (PDF, Project/Code)
  • Data Distillation: Towards Omni-Supervised Learning (PDF)
  • Peephole: Predicting Network Performance Before Training (PDF)
  • AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks (PDF)
  • Gradual Tuning: a better way of Fine Tuning the parameters of a Deep Neural Network (PDF)
  • CondenseNet: An Efficient DenseNet using Learned Group Convolutions (PDF, Project/Code)
  • Population Based Training of Neural Networks (PDF)
  • Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN (PDF)
  • Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions (PDF)
  • Unleashing the Potential of CNNs for Interpretable Few-Shot Learning (PDF)
  • Non-local Neural Networks (PDF, Caffe2)
  • Log-DenseNet: How to Sparsify a DenseNet (PDF)
  • Don’t Decay the Learning Rate, Increase the Batch Size (PDF)
  • Guarding Against Adversarial Domain Shifts with Counterfactual Regularization (PDF)
  • UberNet: Training a ‘Universal’ Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory (PDF, Project/Code)
  • What makes ImageNet good for transfer learning? (PDF, Project/Code, Reading Note)

    The tremendous success of features learnt using the ImageNet classification task on a wide range of transfer tasks begs the question: what are the intrinsic properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of training examples per class? Does adding m