Explainability in Deep Neural Networks
The wild success of Deep Neural Network (DNN) models across a variety of domains has created considerable excitement in the machine learning community. Despite this success, a deep understanding of why DNNs perform so well, and whether their performance is somehow brittle, has been lacking. This gap was highlighted by the discovery [1] that many DNN models are vulnerable to adversarial examples: it is often possible to perturb the input to a DNN classifier (e.g. an image classifier) in a way that is imperceptible to a human, yet drastically changes the classifier's output. For example, a classifier that correctly labels an image as a school bus can be fooled into classifying it as an ostrich by adding an imperceptible change to the image. Beyond the obvious security implications, the existence of adversarial examples suggests that DNNs may not really be learning the "essence" of a concept, which would presumably make them robust to such attacks. This has opened up a variety of research avenues aimed at developing methods to train adversarially robust networks and at examining the properties of adversarially trained networks.
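As a concrete illustration of how such a perturbation can be constructed, the sketch below uses the well-known fast gradient sign method (FGSM): it nudges each input pixel a small amount in the direction that increases the classifier's loss. This is a minimal sketch, not the specific attack of [1]; it assumes a PyTorch classifier that returns logits, inputs scaled to [0, 1], and the hypothetical names `model`, `x`, `label`, and `epsilon`.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=8 / 255):
    """Return an adversarially perturbed copy of x using the
    fast gradient sign method (one simple attack among many)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step each pixel by at most epsilon in the direction that
    # increases the loss; small epsilon keeps the change imperceptible.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the perturbed image in the valid pixel range (assumed [0, 1]).
    return torch.clamp(x_adv, 0.0, 1.0).detach()

# Hypothetical usage: the perturbed batch often receives a different
# label from the classifier even though it looks unchanged to a human.
# x_adv = fgsm_perturb(classifier, image_batch, true_labels)
```

Even this single gradient step is frequently enough to flip the predicted class, which is what makes the brittleness described above so striking.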