As technology lends you its ear, these technologies will determine what it hears – TechCrunch

Posted: May 17, 2017 at 1:44 am

In 1999, while traveling through Eastern Europe, I was robbed in a relatively remote area of the Czech Republic.

Someone called the police, but we quickly realized we couldn't communicate: They didn't speak English and I could offer no Czech. Even the local high school English teacher, who offered her assistance, didn't speak English well enough for me to effectively communicate with the police.

This was well before smartphones, and I didn't realize then that technologists were already hard at work on innovations that could one day play a vital role in situations like the one I faced.

In 1994, several influential computer scientists at Microsoft, led by Xuedong Huang, began laying the groundwork to tackle our global language barrier through technology. Microsoft was building a new voice recognition team, one of the first in the world.

Image: Getty Images/dane_mark/DigitalVision

In the early days of the technology, voice recognition was imperfect. We measure the accuracy of voice recognition with something called the Word Error Rate (WER). The WER measures how many words are interpreted incorrectly. If I say five words and four of them are understood correctly while one word is not, we have a WER of 20 percent. Back in the 1990s, the WER was nearly 100 percent. Almost every word spoken was incorrectly heard by these computer systems.
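To make the metric concrete, here is a minimal sketch of how WER is typically computed. Note that the standard definition counts substitutions, deletions, and insertions against a reference transcript, a slight generalization of the simplified description above; the example sentences below are hypothetical.

```python
# A minimal sketch of word error rate (WER), the metric described above.
# Standard WER counts substitutions (S), deletions (D), and insertions (I)
# against a reference transcript of N words: WER = (S + D + I) / N.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One word wrong out of five gives the 20 percent figure from the text.
print(wer("turn on the bedroom lights", "turn on the living lights"))  # 0.2
```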

But computer scientists such as Huang and his team continued to work. Slowly but surely, the technology improved. By 2013, the WER had dropped to roughly 25 percent, an improvement to be sure, but still not sufficient to be truly helpful.

While a WER of 25 percent might seem adequate, imagine the frustration a user might feel in a home automation environment when they say, "turn on the BEDROOM lights," and the LIVING ROOM lights go on. Or imagine trying to dictate something and having to correct a quarter of your work after the fact. The long-promised productivity gains simply hadn't materialized after decades of effort.

And then the magic of innovation and technology began to kick in.

Over the last three years, the WER has dropped from roughly 25 percent to around five percent.

The team at Microsoft recently declared they had achieved human parity with the technology: it was now as good at interpreting human speech as humans are. We have seen more progress in the last 30 months than we saw in the first 30 years.

Image: Mina De La O/Getty Images

Many of us have experienced the seeming magic that voice recognition has become. In using voice recognition platforms in recent years, you've also likely watched as the words transcribed are updated and changed after additional words are spoken.

Speech recognition is going beyond individual word recognition to account for context and grammar as well. Network effects are kicking in, and the application of big data is enabling the technology to advance at a pace unseen in its history.

Today, we talk to computers on an increasingly regular basis. While packing for a trip to Singapore, I talked with Google Home's voice-activated digital assistant to prepare for my trip, going back and forth on everything from weather and history to the religious breakdown of the city-state.

Similarly, Amazon's Alexa will order you an Uber or a pizza, read off your Fitbit stats or update you on the balance in your bank account. Alexa can help around the house, too, if you ask: dishing you the daily news while you're in the kitchen or reading you an audiobook before bed. And paired with the right hardware, she'll lock your front door, turn off your lights, or adjust the temperature in your home.

To be sure, the technology has a long way to go before it is omnipresent. But it is beginning to be deployed in new and interesting ways. And at CES 2017, voice recognition was one of the clear winners, permeating every corner of the show floor. From Ford and Volkswagen to Martian Watches and LG refrigerators, voice integration transcended every category. Voice is becoming the common OS stitching together diverse systems across a myriad of user applications.

As we have made these astronomical improvements in voice accuracy, I foresee two important directions voice will go from here.

First, digitization and connectivity will beget personalization. In the future, it won't be enough that we can talk to the connected objects around us. Each member of a household or office can and will have a unique relationship with voice-enabled objects. Google has started to push Google Home in this direction.

Second, remember that voice is the user-interface layer to a much richer computing environment. Siri, Cortana, Alexa, Google Home and others are bringing individuals face-to-face with an AI-infused computing experience. For many daily tasks where we might use voice today, our phones or other devices may still be more efficient because we can see extra information on screen. But the role of AI in these voice systems will begin to transform the user experience.

Context is the next dimension for voice-optimized platforms. For example, when I can open my refrigerator, read off a series of potential ingredients I have on hand, and get back recipe suggestions, I'll have accomplished something with my voice that would be cumbersome in other computing environments. Context is king, and voice will make that more apparent and readily accessible than ever.

I sometimes think back to that incident in Eastern Europe when even the local English teacher couldn't communicate with me. Today, I could speak to her mobile phone and get a relevant reply in return. The technology now available to us would have changed my experience. And likewise, this technology will forever change how we interact with computing and with each other.
