The Turing Test is obsolete. It’s time to build a new barometer for AI – Fast Company

Posted: December 29, 2020 at 12:22 am

This year marks 70 years since Alan Turing published his paper introducing the concept of the Turing Test in response to the question, "Can machines think?" The test's goal was to determine whether a machine can exhibit conversational behavior indistinguishable from a human's. Turing predicted that by the year 2000, an average human would have less than a 70% chance of distinguishing an AI from a human in an imitation game where who is responding (a human or an AI) is hidden from the evaluator.

Why haven't we as an industry been able to achieve that goal, 20 years past that mark? I believe the goal put forth by Turing is not a useful one for AI scientists like myself to work toward. The Turing Test is fraught with limitations, some of which Turing himself debated in his seminal paper. With AI now ubiquitously integrated into our phones, cars, and homes, it's become increasingly obvious that people care much more that their interactions with machines be useful, seamless, and transparent, and that the concept of machines being indistinguishable from a human is out of touch. Therefore, it is time to retire the lore that has served as an inspiration for seven decades, and set a new challenge that inspires researchers and practitioners equally.

In the years that followed its introduction, the Turing Test served as the AI north star for academia. The earliest chatbots of the '60s and '70s, ELIZA and PARRY, were centered around passing the test. As recently as 2014, the chatbot Eugene Goostman was declared to have passed the Turing Test by convincing 33% of the judges that it was human. However, as others have pointed out, the bar of fooling 30% of judges is arbitrary, and even then the victory felt outdated to some.

Still, the Turing Test continues to drive popular imagination. OpenAI's Generative Pre-trained Transformer 3 (GPT-3) language model has set off headlines about its potential to beat the Turing Test. Similarly, I'm still asked by journalists, business leaders, and other observers, "When will Alexa pass the Turing Test?" Certainly, the Turing Test is one way to measure Alexa's intelligence, but is it consequential and relevant to measure Alexa's intelligence that way?

To answer that question, let's go back to when Turing first laid out his thesis. In 1950, the first commercial computer had yet to be sold, groundwork for fiber-optic cables wouldn't be published for another four years, and the field of AI hadn't been formally established; that would come in 1956. We now have 100,000 times more computing power on our phones than Apollo 11 had, and together with cloud computing and high-bandwidth connectivity, AIs can now make decisions based on huge amounts of data within seconds.

While Turing's original vision continues to be inspiring, interpreting his test as the ultimate mark of AI's progress is limited by the era when it was introduced. For one, the Turing Test all but discounts AI's machine-like attributes of fast computation and information lookup, features that are some of modern AI's most effective. The emphasis on tricking humans means that for an AI to pass Turing's test, it has to inject pauses in responses to questions like "Do you know what the cube root of 3434756 is?" or "How far is Seattle from Boston?" In reality, AI knows these answers instantaneously, and pausing to make its answers sound more human isn't the best use of its skills. Moreover, the Turing Test doesn't take into account AI's increasing ability to use sensors to hear, see, and feel the outside world. Instead, it's limited simply to text.

To make AI more useful today, these systems need to accomplish our everyday tasks efficiently. If you're asking your AI assistant to turn off your garage lights, you aren't looking to have a dialogue. Instead, you'd want it to fulfill that request and notify you with a simple acknowledgment, "OK" or "done." Even when you engage in an extensive dialogue with an AI assistant on a trending topic or have a story read to your child, you'd still like to know it is an AI and not a human. In fact, fooling users by pretending to be human poses a real risk. Imagine the dystopian possibilities, as we've already begun to see with bots seeding misinformation and the emergence of deepfakes.

Instead of obsessing about making AIs indistinguishable from humans, our ambition should be building AIs that augment human intelligence and improve our daily lives in a way that is equitable and inclusive. A worthy underlying goal is for AIs to exhibit human-like attributes of intelligence (including common sense, self-supervision, and language proficiency) and combine them with machine-like efficiency such as fast searches, memory recall, and accomplishing tasks on your behalf. The end result is an AI that learns and completes a variety of tasks and adapts to novel situations, far beyond what a regular person can do.

This focus informs current research into areas of AI that truly matter: sensory understanding, conversing, broad and deep knowledge, efficient learning, reasoning for decision-making, and eliminating any inappropriate bias or prejudice (i.e., fairness). Progress in these areas can be measured in a variety of ways. One approach is to break a challenge into constituent tasks. For example, Kaggle's Abstraction and Reasoning Challenge focuses on solving reasoning tasks the AI hasn't seen before. Another approach is to design a large-scale real-world challenge for human-computer interaction such as the Alexa Prize Socialbot Grand Challenge, a competition focused on conversational AI for university students.

In fact, when we launched the Alexa Prize in 2016, we had intense debate on how the competing socialbots should be evaluated. Are we trying to convince people that the socialbot is a human, deploying a version of the Turing Test? Or are we trying to make the AI worthy of conversing naturally to advance learning, provide entertainment, or just offer a welcome distraction?

We landed on a rubric that asks socialbots to converse coherently and engagingly for 20 minutes with humans on a wide range of popular topics including entertainment, sports, politics, and technology. During the development phases leading up to the finals, customers score the bots on whether they'd like to converse with the bots again. In the finals, independent human judges assess for coherency and naturalness and assign a score on a 5-point scale; if any of the socialbots converses for an average duration of 20 minutes and scores 4.0 or higher, it will meet the grand challenge. While the grand challenge hasn't been met yet, this methodology is guiding AI development that has human-like conversational abilities powered by deep learning-based neural methods. It prioritizes methods that allow AIs to exhibit humor and empathy where appropriate, all without pretending to be a human.
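To make the finals criterion concrete, here is a minimal sketch of how such a check could be expressed. Only the two thresholds (an average of 20 minutes and a score of 4.0 on a 5-point scale) come from the rubric described above; the `FinalsConversation` type, the `meets_grand_challenge` function, and the way judge scores are averaged are illustrative assumptions, not the competition's actual scoring code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FinalsConversation:
    """One finals conversation (hypothetical record layout)."""
    duration_minutes: float
    judge_scores: list[float]  # each judge's rating on the 5-point scale


def meets_grand_challenge(conversations, min_avg_minutes=20.0, min_avg_score=4.0):
    """Return True if a socialbot clears both thresholds.

    Assumption: the bot's score is the mean of per-conversation mean
    judge scores; the real aggregation method is not stated in the text.
    """
    if not conversations:
        return False
    avg_duration = mean(c.duration_minutes for c in conversations)
    avg_score = mean(mean(c.judge_scores) for c in conversations)
    return avg_duration >= min_avg_minutes and avg_score >= min_avg_score


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        FinalsConversation(22.0, [4.0, 5.0]),
        FinalsConversation(19.0, [4.0, 4.0]),
    ]
    print(meets_grand_challenge(sample))
```

Note that both conditions must hold at once: a bot that holds long conversations but scores below 4.0, or one that scores well in short exchanges, would not qualify under this reading.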

The broad adoption of AI like Alexa in our daily lives is another incredible opportunity to measure progress in AI. While these AI services depend on human-like conversational skills to complete both simple transactions (e.g., setting an alarm) and complex tasks (e.g., planning a weekend), to maximize utility they are going beyond conversational AI to "Ambient AI," where the AI answers your requests when you need it, anticipates your needs, and fades into the background when you don't. For example, Alexa can detect the sound of glass breaking and alert you to take action. If you set an alarm while going to bed, it suggests turning off a connected light downstairs that's been left on. Another aspect of such AIs is that they need to be experts in a large, ever-increasing number of tasks, which is only possible with more generalized learning capability instead of task-specific intelligence. Therefore, for the next decade and beyond, the utility of AI services, with their conversational and proactive assistance abilities on ambient devices, is a worthy test.

None of this is to denigrate Turing's original vision; Turing's imitation game was designed as a thought experiment, not as the ultimate test for useful AI. However, now is the time to move past the Turing Test and get inspired by Alan Turing's bold vision to accelerate progress in building AIs that are designed to help humans.

Rohit Prasad is vice president and head scientist of Alexa at Amazon.
