Children learn to speak, as well as recognize objects, people, and places, long before they learn to read or write. They can learn from hearing, seeing, and interacting without being given any instructions. So why shouldn't artificial intelligence systems be able to work the same way?
That's the key insight driving a research project under way at MIT that takes a novel approach to speech and image recognition: Teaching a computer to successfully associate specific elements of images with corresponding sound files in order to identify imagery (say, a lighthouse in a photographic landscape) when someone in an audio clip says the word "lighthouse."
Though in the very early stages of what could be a years-long process of research and development, the implications of the MIT project, led by PhD student David Harwath and senior research scientist Jim Glass, are substantial. Along with being able to automatically surface images based on corresponding audio clips and vice versa, the research opens a path to creating language-to-language translation without needing to go through the laborious steps of training AI systems on the correlation between two languages' words.
That could be particularly important for deciphering languages that are dying because there aren't enough native speakers to warrant the expensive investment in manual annotation of vocabulary by bilingual speakers, which has traditionally been the cornerstone of AI-based translation. Of 7,000 spoken languages, Harwath says, speech recognition systems have been applied to fewer than 100.
It could even eventually be possible, Harwath suggested, for the system to translate languages with little to no written record, a breakthrough that would be a huge boon to anthropologists.
"Because our model is just working on the level of audio and images," Harwath told Fast Company, "we believe it to be language-agnostic. It shouldn't care what language it's working on."
t-SNE analysis of the 150 lowest-variance audio pattern cluster centroids for k = 500. Displayed is the majority-vote transcription of each audio cluster. All clusters shown contained a minimum of 583 members and an average of 2,482, with an average purity of 0.668.
The MIT project isn't the first to consider the idea that computers could automatically associate audio and imagery. But the research being done at MIT may well be the first to pursue it at scale, thanks to the "renaissance" in deep neural networks, which involve multiple layers of neural units that mimic the way the human brain solves problems. The networks require churning through massive amounts of data, and so they've only taken off as a meaningful AI technique in recent years as computers' processing power has increased.
That's led just about every major technology company to go on hiring sprees in a bid to automate services like search, surfacing relevant photos and news, restaurant recommendations, and so on. Many consider AI to be perhaps the next major computing paradigm.
"It is the most important computing development in the last 20 years," Jen-Hsun Huang, the CEO of Nvidia, one of the world's largest makers of the kinds of graphics processors powering many AI initiatives, told Fast Company last year, "and [big tech companies] are going to have to race to make sure that AI is a core competency."
Now that computers are powerful enough to begin using deep neural networks in speech recognition, the key is to develop better algorithms. In the case of the MIT project, Harwath and Glass believe that by employing more organic speech recognition algorithms, they can move faster down the path to truly artificially intelligent systems along the lines of characters like C-3PO in the Star Wars movies.
To be sure, we're many years away from such systems, but the MIT project is aiming to excise one of the most time-consuming and expensive pieces of the translation puzzle: requiring people to train models by manually labeling countless collections of images or vocabularies. That laborious process involves people going through large collections of imagery and annotating them, one by one, with descriptive keywords.
Harwath acknowledges that his team spent quite a lot of time starting in late 2014 doing that kind of manual, or supervised, learning on sound files and imagery, and that afforded them a "big collection of audio."
Now, they're on to the second version of the project, which is to build algorithms that can learn both language and the real-world concepts the language is grounded in, and to do so using very unstructured data.
Here's how it works: The MIT team sets out to train neural networks on what amounts to a game of "which one of these things is not like the other," Harwath explains.
They want to teach the system to understand the difference between matching pairs (an image of a dog with a fluffy hat and an audio clip with the caption "dog with a fluffy hat") and mismatched pairs, like the same audio clip and a photo of a cat.
Matches get a high score and mismatches get a low score, and when the goal is for the system to learn individual objects within an image and individual words in an audio stream, they apply the neural network to small regions of an image, or small intervals of the audio.
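The article does not say which objective the team uses to push matched scores up and mismatched scores down. One common way to express that idea is a margin-based ranking loss, sketched below under that assumption. The function names, the dot-product similarity, and the `margin` value are all illustrative choices, not details from the MIT project.

```python
def dot(u, v):
    """Similarity score between two embedding vectors (assumed: dot product)."""
    return sum(a * b for a, b in zip(u, v))


def margin_ranking_loss(image_vecs, audio_vecs, margin=1.0):
    """Hypothetical loss: image_vecs[i] and audio_vecs[i] form a matching pair.

    Every other image/audio combination is treated as a mismatch. The loss is
    zero only when each matching pair outscores its mismatches by `margin`.
    """
    n = len(image_vecs)
    total = 0.0
    for i in range(n):
        match = dot(image_vecs[i], audio_vecs[i])
        for j in range(n):
            if j == i:
                continue
            # Wrong image paired with caption i should score lower than the match.
            total += max(0.0, margin - match + dot(image_vecs[j], audio_vecs[i]))
            # Wrong caption paired with image i should score lower than the match.
            total += max(0.0, margin - match + dot(image_vecs[i], audio_vecs[j]))
    return total


# Toy embeddings: index 0 stands in for "dog", index 1 for "cat".
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_audio = [[2.0, 0.0], [0.0, 2.0]]   # captions agree with their images
swapped_audio = [[0.0, 2.0], [2.0, 0.0]]   # captions describe the other image

loss_aligned = margin_ranking_loss(images, aligned_audio)
loss_swapped = margin_ranking_loss(images, swapped_audio)
```

In this toy setup the aligned pairs incur no penalty while the swapped pairs do, which is the behavior a training procedure would exploit to adjust the networks producing the embeddings.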
Right now the system is trained on only about 500 words. Yet it's often able to recognize those words in new audio clips it has never encountered. The system is nowhere near perfect: for some word categories, Harwath says, the accuracy is in the 15%-20% range. But in others, it's as high as 90%.
"The really exciting thing," he says, "is it's able to make the association between the acoustic patterns and the visual patterns. So when I say 'lighthouse,' I'm referring to a particular [area] in an image that has a lighthouse, [and it can] associate it with the start and stop time in the audio where you say 'lighthouse.'"
A different task that they frequently run the system through is essentially an image retrieval task, something like a Google image search. They give it a spoken query, say, "Show me an image of a girl wearing a blue dress in front of a lighthouse," and then wait for the neural network to search for an image that's relevant to the query.
Here's where it's important not to get too excited about the technology being ready for prime time. Harwath says the team considers the results of the query accurate if the appropriate image comes up in the top 10 results from a library of only about 1,000 images. The system is currently able to do that just under 50% of the time.
The number is improving, though. When Harwath and Glass wrote a paper on the project for an upcoming conference in France, it was 43%. Still, he believes that although there are regular improvements and increased accuracy every time they train a new model, they're held back by the available computational power. Even with a set of eight powerful GPUs, it can still take two weeks to train a single model.
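The top-10 criterion described above is often called recall at k. Here is a minimal sketch of how such a figure could be computed, assuming a precomputed table of similarity scores in which the correct image for query `q` sits at index `q`. The function name and data layout are hypothetical, not taken from the team's code.

```python
def recall_at_k(scores, k=10):
    """Fraction of queries whose correct image ranks in the top k.

    scores[q][i] is the (assumed) similarity between spoken query q and
    image i; the correct image for query q is assumed to be image q.
    """
    hits = 0
    for q, row in enumerate(scores):
        # Rank image indices from most to least similar to this query.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Three toy queries against a three-image library.
toy_scores = [
    [0.9, 0.1, 0.2],  # query 0: correct image ranked first
    [0.8, 0.3, 0.5],  # query 1: correct image ranked last
    [0.1, 0.2, 0.7],  # query 2: correct image ranked first
]

recall_top1 = recall_at_k(toy_scores, k=1)
recall_top3 = recall_at_k(toy_scores, k=3)
```

With a library of about 1,000 images and k = 10, a score just under 0.5 would correspond to the "just under 50% of the time" figure Harwath cites.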
An example of our grounding method. The left image displays a grid defining the allowed start and end coordinates for the bounding box proposals. The bottom spectrogram displays several audio region proposals drawn as the families of stacked red line segments. The image on the right and spectrogram on the top display the final output of the grounding algorithm. The top spectrogram also displays the time-aligned text transcript of the caption, so as to demonstrate which words were captured by the groundings. In this example, the top three groundings have been kept, with the colors indicating the audio segment that is grounded to each bounding box.
Perhaps the most exciting potential of the research is in breakthroughs for language-to-language translation.
"The way to think about it is this," Harwath says. "If you have an image of a lighthouse, and if we speak different languages but describe the same image, and if the system can figure out the word I'm using and the word you're using, then implicitly, it has a model for translating my word to your word . . . It would bypass the need for manual translations and a need for someone who's bilingual. It would be amazing if we could just completely bypass that."
To be sure, that is entirely theoretical today. But the MIT team is confident that at some point in the future, the system could reach that goal. It could be 10 years, or it could be 20. "I really have no idea," he says. "We're always wrong when we make predictions."
In the meantime, another challenge is coming up with enough quality data to satisfy the system. Deep neural networks are very hungry models.
Traditional machine learning models were limited by diminishing returns on additional data. "If you think of a machine learning algorithm as an engine, data is like the gasoline," he says. "Then, traditionally, the more gas you pour into the engine, the faster it runs, but it only works up to a point, and then levels off.
"With deep neural networks, you have a much higher capacity. The more data you give it, the faster and faster it goes. It just goes beyond what older algorithms were capable of."
But he thinks no one's sure of the outer limits of deep neural networks' capacities. The big question, he says, is how far a deep neural network will scale. Will it saturate at some point and stop learning, or will it just keep going?
"We haven't reached this point yet," Harwath says, "because people have been consistently showing that the more data you give them, the better they work. We don't know how far we can push it."
Read more:
AI For Matching Images With Spoken Word Gets A Boost From MIT - Fast Company