
2016.4.12 Nature deep learning review [2]

Section on image understanding with convolutional neural networks: since the beginning of the twenty-first century, convolutional neural networks have been applied with great success to detection, segmentation and recognition tasks, chiefly in domains where large amounts of labelled data were available.

Pixel-level recognition can be applied to autonomous mobile robots, self-driving cars and many other systems. Other application areas include speech recognition and natural language understanding.

Until 2012, CNNs had not really taken off, but AlexNet made everything possible. A recent result combines a CNN for image recognition with an RNN for language processing to generate captions describing images.

For networks containing huge numbers of parameters, advances in software and hardware have cut training times from weeks to hours.

These results have also spurred industry: many companies are now doing research in this area, and because CNNs are easy to implement efficiently on FPGAs (field-programmable gate arrays), companies such as NVIDIA and Qualcomm have carried out related work.

Traffic sign recognition

53. Ciresan, D., Meier, U., Masci, J. & Schmidhuber, J. Multi-column deep neural network for traffic sign classification. Neural Networks 32, 333–338 (2012).

Biological image segmentation

54. Ning, F. et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process. 14, 1360–1371 (2005).

Neural connectomics

55. Turaga, S. C. et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput. 22, 511–538 (2010).

Face detection, pedestrian detection, human body detection, etc.

36. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. International Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1212.0142 (2013).

50. Vaillant, R., Monrocq, C. & LeCun, Y. Original approach for the localisation of objects in images. In Proc. Vision, Image, and Signal Processing 141, 245–250 (1994).

51. Nowlan, S. & Platt, J. in Neural Information Processing Systems 901–908 (1995).

56. Garcia, C. & Delakis, M. Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell. 26, 1408–1423 (2004).

57. Osadchy, M., LeCun, Y. & Miller, M. Synergistic face detection and pose estimation with energy-based models. J. Mach. Learn. Res. 8, 1197–1215 (2007).

58. Tompson, J., Goroshin, R. R., Jain, A., LeCun, Y. Y. & Bregler, C. C. Efficient object localization using convolutional networks. In Proc. Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1411.4280 (2014).

Face recognition

59. Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. Deepface: closing the gap to human-level performance in face verification. In Proc. Conference on Computer Vision and Pattern Recognition 1701–1708 (2014).

Self-driving cars using CNNs

60. Hadsell, R. et al. Learning long-range vision for autonomous off-road driving. J. Field Robot. 26, 120–144 (2009).

61. Farabet, C., Couprie, C., Najman, L. & LeCun, Y. Scene parsing with multiscale feature learning, purity trees, and optimal covers. In Proc. International Conference on Machine Learning http://arxiv.org/abs/1202.2160 (2012).

Natural language understanding

14. Collobert, R., et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).

Speech recognition

7. Sainath, T., Mohamed, A.-R., Kingsbury, B. & Ramabhadran, B. Deep convolutional neural networks for LVCSR. In Proc. Acoustics, Speech and Signal Processing 8614–8618 (2013).

AlexNet

1. Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012). This report was a breakthrough that used convolutional nets to almost halve the error rate for object recognition, and precipitated the rapid adoption of deep learning by the computer vision community.

Dropout

62. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Machine Learning Res. 15, 1929–1958 (2014).

Recognition and detection

4. Szegedy, C. et al. Going deeper with convolutions. Preprint at http://arxiv.org/abs/1409.4842 (2014).

58. Tompson, J., Goroshin, R. R., Jain, A., LeCun, Y. Y. & Bregler, C. C. Efficient object localization using convolutional networks. In Proc. Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1411.4280 (2014).

59. Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. Deepface: closing the gap to human-level performance in face verification. In Proc. Conference on Computer Vision and Pattern Recognition 1701–1708 (2014).

63. Sermanet, P. et al. Overfeat: integrated recognition, localization and detection using convolutional networks. In Proc. International Conference on Learning Representations http://arxiv.org/abs/1312.6229 (2014).

64. Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. Conference on Computer Vision and Pattern Recognition 580–587 (2014).

65. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations http://arxiv.org/abs/1409.1556 (2014).

CNNs on FPGAs

66. Boser, B., Sackinger, E., Bromley, J., LeCun, Y. & Jackel, L. An analog neural network processor with programmable topology. J. Solid State Circuits 26, 2017–2025 (1991).

67. Farabet, C. et al. Large-scale FPGA-based convolutional networks. In Scaling up Machine Learning: Parallel and Distributed Approaches (eds Bekkerman, R., Bilenko, M. & Langford, J.) 399–419 (Cambridge Univ. Press, 2011).

Distributed representations and language processing

Deep-learning theory shows that networks using distributed representations have fundamental advantages over methods that do not use them. These advantages stem from the regularities and compositional structure underlying the data distribution.

Multiple hidden layers allow a network to predict the next output from local context. Each word is fed in as an n-dimensional one-hot vector: a single component is 1 and all the others are 0. In the first layer, each word then produces a distinct pattern of activations, i.e. a word vector. In a language model, the remaining layers of the network learn to convert the input word vectors into an output vector for the predicted next word. The network learns word vectors in which many active components each correspond to a separate feature of the word. These semantic features are not explicitly present in the input; they are discovered during learning as a good way to capture the many micro-regularities between inputs and outputs. Learning word vectors as feature representations works better and better as the amount of text data grows.
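As a concrete illustration of the first step described above, here is a minimal sketch (the toy vocabulary, vector dimensionality and random embedding matrix are assumptions for illustration, not part of the review) of how a one-hot word vector is mapped to a dense word vector by the first layer of a neural language model:

```python
import numpy as np

# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of words.
vocab = ["the", "cat", "sat", "on", "mat"]
V = len(vocab)   # vocabulary size (the n of the one-hot vectors)
D = 3            # dimensionality of the learned word vectors

rng = np.random.default_rng(0)
E = rng.normal(size=(V, D))  # embedding matrix; in practice learned by training

def one_hot(word):
    """n-dimensional vector with a single 1 at the word's index."""
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

# Multiplying the one-hot vector by E simply selects one row of E:
# that row is the word's distributed representation (its pattern of activations).
x = one_hot("cat")
word_vector = x @ E
assert np.allclose(word_vector, E[vocab.index("cat")])
```

Because the one-hot input has a single non-zero entry, this first layer reduces to a table lookup; the later layers then combine these dense vectors to predict the next word.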

Before neural networks were applied to language, the standard statistical approach did not use distributed representations; it was based on counting the frequencies of word sequences up to length n (n-grams). This requires very large training corpora, and it cannot generalize to semantically related sequences of words.
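To contrast with the count-based approach just described, a minimal bigram sketch (the toy corpus is an assumption for illustration) shows why such models fail to generalize: a context never seen in training gets probability zero, even when a semantically similar context was seen:

```python
from collections import Counter

# Hypothetical toy corpus; real n-gram models need far larger data.
corpus = "the cat sat on the mat the dog sat on the rug".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
unigrams = Counter(corpus)                  # counts of single words

def prob(w2, w1):
    """Maximum-likelihood bigram estimate of P(w2 | w1)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

print(prob("sat", "cat"))    # seen in training: non-zero
print(prob("sat", "puppy"))  # unseen context word: zero, despite similarity to "dog"
```

A distributed representation, by contrast, would place "puppy" near "dog" in feature space, letting statistics learned for one transfer to the other.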

distributed representations

21. Bengio, Y., Delalleau, O. & Le Roux, N. The curse of highly variable functions for local kernel machines. In Proc. Advances in Neural Information Processing Systems 18 107–114 (2005).

The overall structure underlying the data distribution

40. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Machine Intell. 35, 1798–1828 (2013).

Distributed representations enhance generalization

68. Bengio, Y. Learning Deep Architectures for AI (Now, 2009).

69. Montufar, G. & Morton, J. When does a mixture of products contain a product of mixtures? J. Discrete Math. 29, 321–347 (2014).

Depth increases representational power

70. Montufar, G. F., Pascanu, R., Cho, K. & Bengio, Y. On the number of linear regions of deep neural networks. In Proc. Advances in Neural Information Processing Systems 27 2924–2932 (2014).

Predicting the next output from local context

71. Bengio, Y., Ducharme, R. & Vincent, P. A neural probabilistic language model. In Proc. Advances in Neural Information Processing Systems 13 932–938 (2001). This paper introduced neural language models, which learn to convert a word symbol into a word vector or word embedding composed of learned semantic features in order to predict the next word in a sequence.

The network learns distinct activation patterns (features) for each word

27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

In a large corpus, any single rule is unreliable on its own; many micro-rules are needed

71. Bengio, Y., Ducharme, R. & Vincent, P. A neural probabilistic language model. In Proc. Advances in Neural Information Processing Systems 13 932–938 (2001). This paper introduced neural language models, which learn to convert a word symbol into a word vector or word embedding composed of learned semantic features in order to predict the next word in a sequence.

Vector representations of text

14. Collobert, R., et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).

17. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014). This paper showed state-of-the-art machine translation results with the architecture introduced in ref. 72, with a recurrent network trained to read a sentence in one language, produce a semantic representation of its meaning, and generate a translation in another language.

72. Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. Conference on Empirical Methods in Natural Language Processing 1724–1734 (2014).

73. Schwenk, H. Continuous space language models. Computer Speech Lang. 21, 492–518 (2007).

74. Socher, R., Lin, C. C-Y., Manning, C. & Ng, A. Y. Parsing natural scenes and natural language with recursive neural networks. In Proc. International Conference on Machine Learning 129–136 (2011).

75. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26 3111–3119 (2013).

76. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proc. International Conference on Learning Representations http://arxiv.org/abs/1409.0473 (2015).

Neural network language models

71. Bengio, Y., Ducharme, R. & Vincent, P. A neural probabilistic language model. In Proc. Advances in Neural Information Processing Systems 13 932–938 (2001). This paper introduced neural language models, which learn to convert a word symbol into a word vector or word embedding composed of learned semantic features in order to predict the next word in a sequence.