An AI learned to play hide-and-seek. The strategies it came up with were astounding. – Vox.com

Posted: September 27, 2019 at 7:48 am

This week, leading AI lab OpenAI released their latest project: an AI that can play hide-and-seek. Its the latest example of how, with current machine learning techniques, a very simple setup can produce shockingly sophisticated results.

The AI agents play a very simple version of the game, where the seekers get points whenever the hiders are in their field of view. The hiders get a little time at the start to set up a hiding place and get points when theyve successfully hidden themselves; both sides can move objects around the playing field (like blocks, walls, and ramps) for an advantage.

The results from this simple setup were quite impressive. Over the course of 481 million games of hide-and-seek, the AI seemed to develop strategies and counterstrategies, and the AI agents moved from running around at random to coordinating with their allies to make complicated strategies work. (Along the way, they showed off their ability to break the game physics in unexpected ways, too; more on that below.)

Its the latest example of how much can be done with a simple AI technique called reinforcement learning, where AI systems get rewards for desired behavior and are set loose to learn, over millions of games, the best way to maximize their rewards.

Reinforcement learning is incredibly simple, but the strategic behavior it produces isnt simple at all. Researchers have in the past leveraged reinforcement learning among other techniques to build AI systems that can play complex wartime strategy games, and some researchers think that highly sophisticated systems could be built just with reinforcement learning. This simple game of hide-and-seek makes for a great example of how reinforcement learning works in action and how simple instructions produce shockingly intelligent behavior. AI capabilities are continuing to march forward, for better or for worse.

You can watch the whole video here, or check out these highlights.

It may have taken a few million games of hide-and-seek, but eventually the AI agents figured out the basics of the game: chasing one another around the map.

AI agents have the ability to lock blocks in place. Only the team that locked a block can unlock it. After millions of games of practice, the AI agents learned to build a shelter out of the available blocks; you can see them doing that here. In the shelter, the seeker agents cant find them, so this is a win for the hiders at least until someone comes up with a new idea.

Millions of generations later, the seekers have figured out how to handle this behavior by the hiders: they can drag a ramp over, climb the ramp, and find the hiders.

After a while, the hiders learned a counterattack: they could freeze the ramps in place so the seekers couldnt move them. OpenAIs team notes that they thought this would be the end of the game, but they were wrong.

Eventually, seekers learned to push a box over to the frozen ramps, climb onto the box, and surf it over to the shelter where they can once again find the hiders.

Theres an obvious counterstrategy for the hiders here: freezing everything around so the seekers have no tools to work with. Indeed, thats what they learn how to do.

Thats how a game of hide-and-seek between AI agents with millions of games of experience goes. The interesting thing here is that none of the behavior on display was directly taught or even directly rewarded. Agents only get rewards when they win the game. But that simple incentive was enough to encourage lots of creative in-game behavior.

Many AI researchers think that reinforcement learning can be used to solve complicated tasks with real-world implications, too. The way powerful strategic decision-making emerges from simple instructions is promising but its also concerning. Solving problems with reinforcement learning leads, as weve seen, to lots of unexpected behavior charming in a game of hide-and-seek, but potentially alarming in a drug meant to treat cancer (if the unintended behavior causes life-threatening complications) or an algorithm meant to improve power plant output (if the AI arranges to exploit some obscure condition in its goals rather than simply provide consistent power).

Thats the hazardous flip side of techniques like reinforcement learning. On the one hand, theyre powerful techniques that can produce advanced behavior from a simple starting point. On the other hand, theyre powerful techniques that can produce unexpected and sometimes undesired advanced behavior from a simple starting point.

As AI systems grow more powerful, we need to give careful consideration to how to ensure they do what we want.

Sign up for the Future Perfect newsletter. Twice a week, youll get a roundup of ideas and solutions for tackling our biggest challenges: improving public health, decreasing human and animal suffering, easing catastrophic risks, and to put it simply getting better at doing good.

Read the original post:

An AI learned to play hide-and-seek. The strategies it came up with were astounding. - Vox.com

Related Posts