
2016.4.15 nature deep learning review[1]

Today I originally wanted to pay tribute, so I dug up the ancient backpropagation paper published in Nature, but I couldn't get through it... So instead I pulled out the 2015 Nature paper "Deep Learning", which is essentially a review, to read through. The citations feel especially important, so this highly cited hub of a paper is worth studying.

Related resources:

English original:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.894&rep=rep1&type=pdf

Chinese translation:

http://www.csdn.net/article/2015-06-01/2824811

http://www.csdn.net/article/2015-06-02/2824825 

Visualization resources:

http://colah.github.io/

The abstract says that deep learning has achieved excellent results across many fields in recent years.

The first paragraph sets up the big picture. In traditional machine learning, to do classification and similar tasks you had to extract features by hand and then run the downstream task on them; this requires a lot of domain expertise and is hard to get working in engineering practice. Hence the field of representation learning: given the input data, learn features that make the target classes easy to discriminate, or in other words, re-express the raw data in a form convenient for the subsequent classification and other processing. Deep learning is remarkable here: even knowing nothing about the domain, it can extract features at different levels of abstraction and learn from them. It is now widely applied across many fields.
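To make the representation-learning idea concrete, here is a minimal sketch (my own illustration, not from the paper): the raw input is pushed through a stack of layers, each of which re-expresses its input at a higher level of abstraction. The weights here are random placeholders; in a real system they would be learned by backpropagation.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def deep_features(x, layers):
    """Pass raw input through a stack of layers; each layer
    re-represents its input at a higher level of abstraction."""
    for W, b in layers:
        x = relu(x @ W + b)
    return x

rng = np.random.default_rng(0)
raw = rng.random((4, 64))  # e.g. 4 tiny flattened images
# Random placeholder weights -- in practice these are learned.
layers = [(rng.normal(0, 0.1, (64, 32)), np.zeros(32)),  # low-level features
          (rng.normal(0, 0.1, (32, 16)), np.zeros(16))]  # higher-level features
print(deep_features(raw, layers).shape)  # (4, 16) learned representation
```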

Below are some recent references:

Image recognition

1. Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012). This report was a breakthrough that used convolutional nets to almost halve the error rate for object recognition, and precipitated the rapid adoption of deep learning by the computer vision community.

2. Farabet, C., Couprie, C., Najman, L. & LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1915–1929 (2013).

3. Tompson, J., Jain, A., LeCun, Y. & Bregler, C. Joint training of a convolutional network and a graphical model for human pose estimation. In Proc. Advances in Neural Information Processing Systems 27 1799–1807 (2014).

4. Szegedy, C. et al. Going deeper with convolutions. Preprint at http://arxiv.org/abs/1409.4842 (2014).

Speech recognition

5. Mikolov, T., Deoras, A., Povey, D., Burget, L. & Cernocky, J. Strategies for training large scale neural network language models. In Proc. Automatic Speech Recognition and Understanding 196–201 (2011).

6. Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, 82–97 (2012). This joint paper from the major speech recognition laboratories, summarizing the breakthrough achieved with deep learning on the task of phonetic classification for automatic speech recognition, was the first major industrial application of deep learning.

7. Sainath, T., Mohamed, A.-R., Kingsbury, B. & Ramabhadran, B. Deep convolutional neural networks for LVCSR. In Proc. Acoustics, Speech and Signal Processing 8614–8618 (2013).

Drug molecules

8. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).

Particle accelerator data

9. Ciodaro, T., Deva, D., de Seixas, J. & Damazio, D. Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series 368, 012030 (2012).

10. Kaggle. Higgs boson machine learning challenge. Kaggle https://www.kaggle.com/c/higgs-boson (2014).

Reconstructing brain circuits

11. Helmstaedter, M. et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500, 168–174 (2013).

Gene expression and genetic determinants of disease

12. Leung, M. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).

13. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 6218 (2015).

Natural language understanding

14. Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).

Question answering

15. Bordes, A., Chopra, S. & Weston, J. Question answering with subgraph embeddings. In Proc. Empirical Methods in Natural Language Processing http://arxiv.org/abs/1406.3676v3 (2014).

Machine translation

16. Jean, S., Cho, K., Memisevic, R. & Bengio, Y. On using very large target vocabulary for neural machine translation. In Proc. ACL-IJCNLP http://arxiv.org/abs/1412.2007 (2015).

17. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014).

This paper showed state-of-the-art machine translation results with the architecture introduced in ref. 72, with a recurrent network trained to read a sentence in one language, produce a semantic representation of its meaning, and generate a translation in another language.

The supervised learning section argues that the older approaches, hand-extracted features fed to a linear classifier, or shallow nonlinear classifiers, did not work very well; deep nonlinearity can extract invariant features while also picking out the salient content from the background.
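A classic illustration of why deep nonlinearity matters (my own example, not from the paper): no linear classifier can separate XOR, but a single hidden ReLU layer with hand-picked weights can.

```python
import numpy as np

# XOR: no linear classifier separates these two classes,
# but one hidden ReLU layer with hand-picked weights does.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

relu = lambda z: np.maximum(z, 0.0)

# Hidden units: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1);
# then x1 XOR x2 = h1 - 2*h2.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])  # (2 inputs, 2 hidden units)
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

pred = relu(X @ W1 + b1) @ w2
print(pred)  # [0. 1. 1. 0.] -- matches y exactly
```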

Rather laborious training methods

18. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. In Proc. Advances in Neural Information Processing Systems 20 161–168 (2007).

Classifiers that split the input space into half-spaces

19. Duda, R. O. & Hart, P. E. Pattern Classification and Scene Analysis (Wiley, 1973).

Kernel methods

20. Schölkopf, B. & Smola, A. Learning with Kernels (MIT Press, 2002).

Gaussian kernels

21. Bengio, Y., Delalleau, O. & Le Roux, N. The curse of highly variable functions for local kernel machines. In Proc. Advances in Neural Information Processing Systems 18 107–114 (2005).

The section on multilayer architectures and backpropagation explains that networks can be trained with the backpropagation algorithm, but in the 1990s people believed that inferring useful features from so little prior knowledge was nonsense, and that training would get trapped in poor local optima, so neural networks gradually fell out of favor. With big data, however, poor local optima are rare: starting from different initial conditions, the final solutions differ only slightly. At the start of this century deep networks caught fire again, because CIFAR-funded groups used unsupervised learning to learn features for initializing the network and then fine-tuned with backpropagation, with very good results, especially on handwritten digit recognition and pedestrian detection. The practical advice at the time: if you have plenty of labelled data, just train directly; if labelled data is scarce, it is better to pre-train on unlabelled data first. Convolutional neural networks have also risen in recent years, especially in computer vision.
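As a reminder of what backpropagation plus gradient descent actually does, here is a minimal sketch (my own toy setup, not from the paper): a 1-16-1 network with a ReLU hidden layer fitted to sin(x) by full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: fit y = sin(x) on [-pi, pi] with a tiny 1-16-1 ReLU net.
X = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(X)

W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.01

for step in range(2000):
    # Forward pass.
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)              # ReLU hidden layer
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass: apply the chain rule layer by layer.
    g_pred = 2 * (pred - y) / len(X)     # dL/dpred
    g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
    g_h = g_pred @ W2.T
    g_z1 = g_h * (z1 > 0)                # ReLU passes gradient only where active
    g_W1 = X.T @ g_z1; g_b1 = g_z1.sum(0)

    # Gradient descent step.
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

print(f"final MSE: {loss:.4f}")
```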

Early pattern recognition

22. Selfridge, O. G. Pandemonium: a paradigm for learning in mechanisation of thought processes. In Proc. Symposium on Mechanisation of Thought Processes 513–526 (1958).

23. Rosenblatt, F. The Perceptron — A Perceiving and Recognizing Automaton. Tech. Rep. 85-460-1 (Cornell Aeronautical Laboratory, 1957).

Training neural networks by simple stochastic gradient descent in the 1980s and 1990s

24. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard Univ. (1974).

25. Parker, D. B. Learning Logic Report TR–47 (MIT Press, 1985).

26. LeCun, Y. Une procédure d’apprentissage pour Réseau à seuil assymétrique in Cognitiva 85: a la Frontière de l’Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences [in French] 599–604 (1985).

27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

Using ReLU makes unsupervised pre-training unnecessary

28. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (2011).

This paper showed that supervised training of very deep neural networks is much faster if the hidden layers are composed of ReLU.

There are hardly any poor local optima; rather, there are saddle points

29. Dauphin, Y. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Proc. Advances in Neural Information Processing Systems 27 2933–2941 (2014).

30. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B. & LeCun, Y. The loss surface of multilayer networks. In Proc. Conference on AI and Statistics http://arxiv.org/abs/1412.0233 (2014).

The revival of deep networks

31. Hinton, G. E. What kind of graphical model is the brain? In Proc. 19th International Joint Conference on Artificial intelligence 1765–1775 (2005).

32. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527–1554 (2006).

This paper introduced a novel and effective way of training very deep neural networks by pre-training one hidden layer at a time using the unsupervised learning procedure for restricted Boltzmann machines.

33. Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. Greedy layer-wise training of deep networks. In Proc. Advances in Neural Information Processing Systems 19 153–160 (2006).

This report demonstrated that the unsupervised pre-training method introduced in ref. 32 significantly improves performance on test data and generalizes the method to other unsupervised representation-learning techniques, such as auto-encoders.

34. Ranzato, M., Poultney, C., Chopra, S. & LeCun, Y. Efficient learning of sparse representations with an energy-based model. In Proc. Advances in Neural Information Processing Systems 19 1137–1144 (2006).

Unsupervised initialization, then backprop fine-tuning


35. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

Pre-training + fine-tuning on small datasets for handwritten digit recognition and pedestrian detection

36. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. International Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1212.0142 (2013).

Training on GPUs

37. Raina, R., Madhavan, A. & Ng, A. Y. Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning 873–880 (2009).

Major breakthroughs in speech recognition:

Small datasets: ref. 38

Large datasets: ref. 39

38. Mohamed, A.-R., Dahl, G. E. & Hinton, G. Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22 (2012).

39. Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 33–42 (2012).

On small datasets, pre-training helps prevent overfitting

40. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Machine Intell. 35, 1798–1828 (2013).

Convolutional neural networks

41. LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Proc. Advances in Neural Information Processing Systems 396–404 (1990).

This is the first paper on convolutional networks trained by backpropagation for the task of classifying low-resolution images of handwritten digits.

42. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

This overview paper on the principles of end-to-end training of modular systems such as deep neural networks using gradient-based optimization showed how neural networks (and in particular convolutional nets) can be combined with search or inference mechanisms to model complex outputs that are interdependent, such as sequences of characters associated with the content of a document.

The convolutional neural network section also describes the classic layer types, such as the convolutional layer and the pooling layer, along with some of their classic properties.
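To pin down what those two layer types compute, here is a minimal numpy sketch (my own; the kernel and sizes are made up): a convolution slides one small shared filter over the image, and max pooling keeps only the strongest response in each patch, giving some translation invariance.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in convnets):
    the same small filter is applied at every location (shared weights)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per patch."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.random.default_rng(1).random((8, 8))
edge = np.array([[1.0, -1.0]])             # crude vertical-edge detector
fmap = np.maximum(conv2d(img, edge), 0.0)  # convolution + ReLU
print(max_pool(fmap).shape)                # (4, 3) pooled feature map
```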

People usually explain the local connectivity of convolutions by saying that a local feature may also appear elsewhere in the image, but to me this feels more like a matter of probability: what I actually observe is some probability distribution of that pattern's occurrences.

Back to the topic: the overall architecture closely resembles the LGN-V1-V2-V4-IT hierarchy of the visual system. When a monkey and a convnet are shown the same image, the activations of half the randomly sampled high-level units in the convnet are very similar to those of the monkey's neurons (roughly translated). Convnets have their roots in the neocognitron; the architectures are somewhat similar, but the neocognitron had no end-to-end supervised learning algorithm like backpropagation. A one-dimensional convnet, called a time-delay neural network, can be used to recognize phonemes and simple words.
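The core of such a time-delay network is just a 1-D convolution over time; a minimal sketch (my own toy signal and filter) showing how one shared filter detects a pattern wherever it occurs in the sequence:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid 1-D convolution over time."""
    k = len(kernel)
    return np.array([np.dot(signal[t:t + k], kernel)
                     for t in range(len(signal) - k + 1)])

# The same 3-tap filter is applied at every time step (shared weights),
# so the bump pattern is detected no matter when it occurs.
sig = np.array([0., 0., 1., 2., 1., 0., 0., 1., 2., 1., 0.])
kernel = np.array([1., 2., 1.])  # matched filter for the bump
print(conv1d(sig, kernel))       # peaks (value 6) wherever the bump appears
```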

Back in the 1990s there were many applications of time-delay neural networks (1-D convnets), for example in speech recognition and document reading. The document reading system used a convnet trained jointly with a probabilistic model that implemented language constraints. By the late 1990s this system was reading over 10% of all checks; convnet-based optical character recognition and handwriting recognition systems were later developed by Microsoft. In the early 1990s convnets were also used for object detection in natural images, such as face and hand detection, and for face recognition.

Visual neurons inspired the convolutional and pooling layers

43. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).

44. Felleman, D. J. & Essen, D. C. V. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).

A study comparing the high-level representations of a convnet and of monkey neurons viewing the same images

45. Cadieu, C. F. et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comp. Biol. 10, e1003963 (2014).

The relation between convnets and the neocognitron

46. Fukushima, K. & Miyake, S. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition 15, 455–469 (1982).

One-dimensional convnets (time-delay neural nets) for recognizing phonemes and simple words

47. Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K. & Lang, K. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process. 37, 328–339 (1989).

48. Bottou, L., Fogelman-Soulié, F., Blanchet, P. & Lienard, J. Experiments with time delay networks and dynamic time warping for speaker independent isolated digit recognition. In Proc. EuroSpeech 89 537–540 (1989).

Microsoft's optical character recognition and handwritten digit recognition

49. Simard, D., Steinkraus, P. Y. & Platt, J. C. Best practices for convolutional neural networks. In Proc. Document Analysis and Recognition 958–963 (2003).

Object detection in natural images

50. Vaillant, R., Monrocq, C. & LeCun, Y. Original approach for the localisation of objects in images. In Proc. Vision, Image, and Signal Processing 141, 245–250 (1994).

51. Nowlan, S. & Platt, J. in Neural Information Processing Systems 901–908 (1995).

Face recognition

52. Lawrence, S., Giles, C. L., Tsoi, A. C. & Back, A. D. Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks 8, 98–113 (1997).
