Teaching Machines to Understand Us, Part 2: The History of Deep Learning

Deep history

The roots of deep learning reach back further than LeCun’s time at Bell Labs. He and a few others who pioneered the technique were actually resuscitating a long-dead idea in artificial intelligence.

When the field got started, in the 1950s, biologists were just beginning to develop simple mathematical theories of how intelligence and learning emerge from signals passing between neurons in the brain. The core idea — still current today — was that the links between neurons are strengthened if those cells communicate frequently. The fusillade of neural activity triggered by a new experience adjusts the brain’s connections so it can understand it better the second time around.

In 1956, the psychologist Frank Rosenblatt used those theories to invent a way of making simple simulations of neurons in software and hardware. The New York Times announced his work with the headline “Electronic ‘Brain’ Teaches Itself.” Rosenblatt’s perceptron, as he called his design, could learn how to sort simple images into categories—for instance, triangles and squares. Rosenblatt usually implemented his ideas on giant machines thickly tangled with wires, but they established the basic principles at work in artificial neural networks today.

One computer he built had eight simulated neurons, made from motors and dials connected to 400 light detectors. Each of the neurons received a share of the signals from the light detectors, combined them, and, depending on what they added up to, spit out either a 1 or a 0. Together those digits amounted to the perceptron’s “description” of what it saw. Initially the results were garbage. But Rosenblatt used a method called supervised learning to train a perceptron to generate results that correctly distinguished different shapes. He would show the perceptron an image along with the correct answer. Then the machine would tweak how much attention each neuron paid to its incoming signals, shifting those “weights” toward settings that would produce the right answer. After many examples, those tweaks endowed the computer with enough smarts to correctly categorize images it had never seen before. Today’s deep-learning networks use sophisticated algorithms and have millions of simulated neurons, with billions of connections between them. But they are trained in the same way.

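Rosenblatt's training procedure can be sketched in a few lines of modern code. The toy data, learning rate, and shapes below are illustrative, not taken from his 400-detector hardware: a simulated neuron sums weighted inputs and emits a 1 or a 0, and supervised learning nudges the weights toward settings that produce the right answer.

```python
# Minimal perceptron sketch with illustrative toy data: a neuron sums
# weighted inputs and emits 1 or 0; training shifts the weights toward
# whichever setting would have produced the correct answer.

def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, n_inputs, epochs=20, lr=0.1):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            # show the "image" together with the correct answer
            error = label - predict(weights, bias, inputs)
            # nudge each weight in the direction that fixes the error
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy task: two-"pixel" images labeled by whether the left pixel is lit.
samples = [([1, 1], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0)]
weights, bias = train(samples, n_inputs=2)
print([predict(weights, bias, x) for x, _ in samples])  # → [1, 1, 0, 0]
```

After enough passes the weights settle and the perceptron labels every training image correctly; today's training loops apply the same show-compare-adjust cycle at vastly larger scale.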

Rosenblatt predicted that perceptrons would soon be capable of feats like greeting people by name, and his idea became a linchpin of the nascent field of artificial intelligence. Work focused on making perceptrons with more complex networks, arranged into a hierarchy of multiple learning layers. Passing images or other data successively through the layers would allow a perceptron to tackle more complex problems. Unfortunately, Rosenblatt’s learning algorithm didn’t work on multiple layers. In 1969 the AI pioneer Marvin Minsky, who had gone to high school with Rosenblatt, published a book-length critique of perceptrons that killed interest in neural networks at a stroke. Minsky claimed that getting more layers working wouldn’t make perceptrons powerful enough to be useful. Artificial intelligence researchers abandoned the idea of making software that learned. Instead, they turned to using logic to craft working facets of intelligence—such as an aptitude for chess. Neural networks were shoved to the margins of computer science.

Nonetheless, LeCun was mesmerized when he read about perceptrons as an engineering student in Paris in the early 1980s. “I was amazed that this was working and wondering why people abandoned it,” he says. He spent days at a research library near Versailles, hunting for papers published before perceptrons went extinct. Then he discovered that a small group of researchers in the United States were covertly working on neural networks again. “This was a very underground movement,” he says. In papers carefully purged of words like “neural” and “learning” to avoid rejection by reviewers, they were working on something very much like Rosenblatt’s old problem of how to train neural networks with multiple layers.

LeCun joined the underground after he met its central figures in 1985, including a wry Brit named Geoff Hinton, who now works at Google and the University of Toronto. They immediately became friends, mutual admirers—and the nucleus of a small community that revived the idea of neural networking. They were sustained by a belief that using a core mechanism seen in natural intelligence was the only way to build artificial intelligence. “The only method that we knew worked was a brain, so in the long run it had to be that systems something like that could be made to work,” says Hinton.

LeCun’s success at Bell Labs came about after he, Hinton, and others perfected a learning algorithm for neural networks with multiple layers. It was known as backpropagation, and it sparked a rush of interest from psychologists and computer scientists. But after LeCun’s check-reading project ended, backpropagation proved tricky to adapt to other problems, and a new way to train software to sort data was invented by a Bell Labs researcher down the hall from LeCun. It didn’t involve simulated neurons and was seen as mathematically more elegant. Very quickly it became a cornerstone of Internet companies such as Google, Amazon, and LinkedIn, which use it to train systems that block spam or suggest things for you to buy.

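The backpropagation idea itself is the chain rule applied layer by layer: the error at the output is converted into a gradient for each weight by passing it backward through the network. A minimal sketch with one hidden neuron and one output neuron, using illustrative numbers and a finite-difference check rather than the article's actual systems:

```python
import math

# Backpropagation in miniature: push the output error back through a
# two-layer network (one hidden neuron, one output neuron) via the
# chain rule, then verify the analytic gradients numerically.
# All weights and inputs are illustrative numbers.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x, t):
    h = sigmoid(w1 * x)          # hidden layer
    y = sigmoid(w2 * h)          # output layer
    return 0.5 * (y - t) ** 2    # squared error against target t

x, t = 1.0, 1.0
w1, w2 = 0.3, 0.7

# forward pass, keeping the intermediate activations
h = sigmoid(w1 * x)
y = sigmoid(w2 * h)

# backward pass: chain rule, layer by layer
dy = (y - t) * y * (1 - y)       # error signal at the output neuron
grad_w2 = dy * h
dh = dy * w2 * h * (1 - h)       # error propagated back to the hidden layer
grad_w1 = dh * x

# the analytic gradients should agree with finite differences
eps = 1e-6
num_w1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
num_w2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
print(abs(grad_w1 - num_w1) < 1e-8, abs(grad_w2 - num_w2) < 1e-8)  # True True
```

Subtracting these gradients from the weights, over many examples, is what lets the error signal train every layer at once, which single-layer perceptron learning could not do.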

After LeCun got to NYU in 2003, he, Hinton, and a third collaborator, University of Montreal professor Yoshua Bengio, formed what LeCun calls “the deep-learning conspiracy.” To prove that neural networks would be useful, they quietly developed ways to make them bigger, train them with larger data sets, and run them on more powerful computers. LeCun’s handwriting recognition system had had five layers of neurons, but now they could have 10 or many more. Around 2010, what was now dubbed deep learning started to beat established techniques on real-world tasks like sorting images. Microsoft, Google, and IBM added it to speech recognition systems. But neural networks were still alien to most researchers and not considered widely useful. In early 2012 LeCun wrote a fiery letter—initially published anonymously—after a paper claiming to have set a new record on a standard vision task was rejected by a leading conference. He accused the reviewers of being “clueless” and “negatively biased.”

Everything changed six months later. Hinton and two grad students used a network like the one LeCun made for reading checks to rout the field in the leading contest for image recognition. Known as the ImageNet Large Scale Visual Recognition Challenge, it asks software to identify 1,000 types of objects as diverse as mosquito nets and mosques. The Toronto entry correctly identified the object in an image within five guesses about 85 percent of the time, more than 10 percentage points better than the second-best system (see Innovator Under 35 Ilya Sutskever, page 47). The deep-learning software’s initial layers of neurons optimized themselves for finding simple things like edges and corners, with the layers after that looking for successively more complex features like basic shapes and, eventually, dogs or people.

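The "edges first, complex features later" division of labor can be illustrated by the computation a single early layer performs: sliding a small filter over the image and responding where a simple pattern appears. The hand-picked filter below is purely illustrative; in a trained network such weights are learned from data.

```python
# Sketch of one early-layer operation: convolve a small filter over an
# image so the output lights up where a simple feature (here, a vertical
# edge) occurs. Filter weights are hand-picked for illustration only.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Vertical-edge filter: responds where brightness changes between columns.
edge_kernel = [[1, -1],
               [1, -1]]

# 4x4 "image": dark left half, bright right half.
image = [[0, 0, 9, 9]] * 4

print(convolve2d(image, edge_kernel))  # → [[0, -18, 0], [0, -18, 0], [0, -18, 0]]
```

Deeper layers apply the same kind of operation to the previous layer's outputs, which is how responses to edges and corners get combined into responses to shapes and, eventually, whole objects.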

LeCun recalls seeing the community that had mostly ignored neural networks pack into the room where the winners presented a paper on their results. “You could see right there a lot of senior people in the community just flipped,” he says. “They said, ‘Okay, now we buy it. That’s it, now—you won.’”

Journey of Acceptance

1956: Psychologist Frank Rosenblatt uses theories about how brain cells work to design the perceptron, an artificial neural network that can be trained to categorize simple shapes.

1969: AI pioneers Marvin Minsky and Seymour Papert write a book critical of perceptrons that quashes interest in neural networks for decades.

1986: Yann LeCun and Geoff Hinton perfect backpropagation to train neural networks that pass data through successive layers of artificial neurons, allowing them to learn more complex skills.

1987: Terry Sejnowski at Johns Hopkins University creates a system called NETtalk that can be trained to pronounce text, going from random babbling to recognizable speech.

1990: At Bell Labs, LeCun uses backpropagation to train a network that can read handwritten text. AT&T later uses it in machines that can read checks.

1995: Bell Labs mathematician Vladimir Vapnik publishes an alternative method for training software to categorize data such as images. This sidelines neural networks again.

2006: Hinton’s research group at the University of Toronto develops ways to train much larger networks with tens of layers of artificial neurons.

June 2012: Google uses deep learning to cut the error rate of its speech recognition software by 25 percent.

October 2012: Hinton and two colleagues from the University of Toronto win the largest challenge for software that recognizes objects in photos, almost halving the previous error rate.

March 2013: Google buys DNN Research, the company founded by the Toronto team to develop their ideas. Hinton starts working at Google.

March 2014: Facebook starts using deep learning to power its facial recognition feature, which identifies people in uploaded photos.

May 2015: Google Photos launches. The service uses deep learning to group photos of the same people and let you search your snapshots using terms like “beach” or “dog.”

Academics working on computer vision quickly abandoned their old methods, and deep learning suddenly became one of the main strands in artificial intelligence. Google bought a company founded by Hinton and the two others behind the 2012 result, and Hinton started working there part time on a research team known as Google Brain. Microsoft and other companies created new projects to investigate deep learning. In December 2013, Facebook CEO Mark Zuckerberg stunned academics by showing up at the largest neural-network research conference, hosting a party where he announced that LeCun was starting FAIR (though he still works at NYU one day a week).

LeCun still harbors mixed feelings about the 2012 research that brought the world around to his point of view. “To some extent this should have come out of my lab,” he says. Hinton shares that assessment. “It was a bit unfortunate for Yann that he wasn’t the one who actually made the breakthrough system,” he says. LeCun’s group had done more work than anyone else to prove out the techniques used to win the ImageNet challenge. The victory could have been his had student graduation schedules and other commitments not prevented his own group from taking on ImageNet, he says. LeCun’s hunt for deep learning’s next breakthrough is now a chance to even the score.
