The Past, Present, and Future of Speech Recognition Technology

The earliest advances in speech recognition focused mainly on the creation of vowel sounds, as the basis of a system that might also learn to interpret phonemes (the building blocks of speech) from nearby interlocutors.

The inventors behind these early efforts were hampered by the technology of their time, with only basic means at their disposal to build a talking machine. Nonetheless, their work provides important background to more recent innovations.

Dictation machines, pioneered by Thomas Edison in the late 19th century, were capable of recording speech and grew in popularity among doctors and secretaries with a lot of notes to take on a daily basis.

However, it was not until the 1950s that this line of inquiry would lead to genuine speech recognition. Up to this point, we see attempts at speech creation and recording, but not yet interpretation.

Audrey, a machine created by Bell Labs, could understand the digits 0–9, with a 90% accuracy rate. Interestingly, this accuracy level was only recorded when its inventor spoke; it hovered between 70% and 80% when other people spoke to Audrey.

This hints at some of the persistent challenges of speech recognition: each individual has a different voice, and spoken language can be very inconsistent. Unlike text, which has a much greater level of standardization, the spoken word varies greatly with regional dialect, speed, emphasis, even social class and gender. Scaling any speech recognition system beyond a single speaker has therefore always been a significant challenge.

Alexander Waibel, who worked on Harpy, a machine developed at Carnegie Mellon University that could understand over 1,000 words, built on this point:

“So you have things like ‘euthanasia’, which could be ‘youth in Asia’. Or if you say ‘Give me a new display’ it could be understood as ‘give me a nudist play’.”

Until the 1990s, even the most successful systems were based on template matching, where sound waves would be translated into a set of numbers and stored. A stored template would then be triggered when a matching sound was spoken into the machine. Of course, this meant that one would have to speak very clearly, slowly, and in an environment with no background noise to have a good chance of the sounds being recognized.
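To make the idea concrete, here is a minimal sketch of template matching. It assumes each word is stored as a short list of made-up numbers (one per slice of audio) and compares an incoming utterance against every stored template using dynamic time warping, a technique commonly used for this kind of comparison. The article does not say which comparison method any particular system used, so the function names, the feature values, and the choice of dynamic time warping are all illustrative assumptions.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Either sequence may be stretched, or both may advance together
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))


# Hypothetical stored templates: invented feature values, not real audio
TEMPLATES = {
    "one": [1.0, 3.0, 4.0, 3.0, 1.0],
    "two": [5.0, 5.0, 2.0, 1.0, 1.0],
}

# The word "one" spoken more slowly, so most values are repeated
slow_one = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize(TEMPLATES, slow_one))  # prints "one"
```

Time warping tolerates a change in speaking speed, but a different voice or background noise shifts the numbers themselves, which is why such systems needed slow, clear speech in a quiet room.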

IBM Tangora, released in the mid-1980s and named after Albert Tangora, then the world’s fastest typist, could adjust to the speaker’s voice. It still required slow, clear speech and no background noise, but its use of hidden Markov models allowed for increased flexibility through data clustering and the prediction of upcoming phonemes based on recent patterns.

Although it required 20 minutes of training data (in the form of recorded speech) from each user, Tangora could recognize up to 20,000 English words and some full sentences.
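The hidden Markov model idea can be sketched with a toy example. The code below is not Tangora's implementation. It is a small Viterbi decoder, the standard way to find the most likely sequence of hidden states in such a model, run on two invented "phonemes" and two invented acoustic observations. Every state name and probability here is an assumption chosen for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that makes arriving in state s most probable
            prev = max(states, key=lambda p: best[-1][p][0] * trans_p[p][s])
            prob = best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs]
            column[s] = (prob, prev)
        best.append(column)
    # Start from the most probable final state and follow the back-pointers
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical model: two phonemes and two kinds of acoustic evidence
STATES = ["s", "t"]
START_P = {"s": 0.6, "t": 0.4}
TRANS_P = {"s": {"s": 0.7, "t": 0.3}, "t": {"s": 0.4, "t": 0.6}}
EMIT_P = {"s": {"hiss": 0.9, "burst": 0.1}, "t": {"hiss": 0.2, "burst": 0.8}}

heard = ["hiss", "hiss", "burst"]
print(viterbi(heard, STATES, START_P, TRANS_P, EMIT_P))  # prints ['s', 's', 't']
```

The transition table is what lets the model anticipate the next phoneme from the recent ones, and the emission table is what lets it tolerate variation in how each phoneme sounds. In a real system both tables would be estimated from recorded speech rather than written by hand.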

The seeds are sown here for voice recognition, one of the most significant and essential developments in this field. It was a long-established truism that speech recognition could only succeed by adapting to each person’s unique way of communicating, but arriving at this breakthrough has been much easier said than done.

It was only in 1997 that the world’s first “continuous speech recognizer” (i.e., one no longer had to pause between words) was released, in the form of Dragon’s NaturallySpeaking software. Capable of understanding 100 words per minute, it is still in use today (albeit in an upgraded form) and is favored by doctors for note-taking.

Machine learning, as in so many fields of scientific discovery, has provided the majority of speech recognition breakthroughs in this century. Google combined the latest technology with the power of cloud-based computing to share data and improve the accuracy of machine learning algorithms.

This culminated in the launch of the Google Voice Search app for iPhone in 2008.

Driven by huge volumes of training data, the Voice Search app showed remarkable improvements over the accuracy levels of previous speech recognition technologies. Google built on this to introduce elements of personalization into its voice search results, and used this data to develop its Hummingbird algorithm, arriving at a much more nuanced understanding of language in use. These strands have been tied together in the Google Assistant, which is now resident on almost 50% of all smartphones.

It was Siri, Apple’s entry into the voice recognition market, that first captured the public’s imagination, however. As the result of decades of research, this AI-powered digital assistant brought a touch of humanity to the sterile world of speech recognition.

After Siri, Microsoft launched Cortana, Amazon launched Alexa, and the wheels were set in motion for the current battle for supremacy among the tech giants’ respective speech recognition platforms.

In essence, we have spent hundreds of years teaching machines to complete a journey that takes the average person just a few years. Starting with the phoneme and building up to individual words, then to phrases and finally sentences, machines are now able to understand speech with close to 100% accuracy.

The techniques used to make these leaps forward have grown in sophistication, to the extent that they are now loosely based on the workings of the human brain. Cloud-based computers have entered millions of homes and can be controlled by voice, even offering conversational responses to a wide range of queries.

That journey is still incomplete, but we have travelled quite some distance from the room-sized computers of the 1950s.
