A Google AI Watched 30,000 Hours of Video Games – Now It Makes Its Own – Singularity Hub

Posted: March 8, 2024 at 6:25 am

AI continues to generate plenty of light and heat. The best models in text and images, now commanding subscriptions and being woven into consumer products, are competing for inches. OpenAI, Google, and Anthropic are all, more or less, neck and neck.

It's no surprise then that AI researchers are looking to push generative models into new territory. As AI requires prodigious amounts of data, one way to forecast where things are going next is to look at what data is widely available online, but still largely untapped.

Video, of which there is plenty, is an obvious next step. Indeed, last month, OpenAI previewed a new text-to-video AI called Sora that stunned onlookers.

But what about video games?

It turns out there are quite a few gamer videos online. Google DeepMind says it trained a new AI, Genie, on 30,000 hours of curated video footage showing gamers playing simple platformers (think early Nintendo games), and now it can create examples of its own.

Genie turns a simple image, photo, or sketch into an interactive video game.

Given a prompt, say a drawing of a character and its surroundings, the AI can then take input from a player to move a character through its world. In a blog post, DeepMind showed Genie's creations navigating 2D landscapes, walking around or jumping between platforms. Like a snake eating its tail, some of these worlds were even sourced from AI-generated images.

In contrast to traditional video games, Genie generates these interactive worlds frame by frame. Given a prompt and command to move, it predicts the most likely next frames and creates them on the fly. It even learned to include a sense of parallax, a common feature in platformers where the foreground moves faster than the background.

Notably, the AI's training didn't include labels. Rather, Genie learned to correlate input commands, like go left, right, or jump, with in-game movements simply by observing examples in its training. That is, when a character in a video moved left, there was no label linking the command to the motion. Genie figured that part out by itself. That means, potentially, future versions could be trained on as much applicable video as there is online.
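To make the idea concrete, here is a minimal, hypothetical sketch (not DeepMind's implementation; frames are reduced to 1-D positions and "learning" to a quantization rule) of how discrete actions can be inferred from unlabeled video and then reused to condition frame-by-frame prediction:

```python
# Toy illustration of the "latent action" idea: infer a discrete action
# between consecutive frames with no labels, then use that action space
# to roll a world forward one frame at a time.

def infer_latent_action(prev_frame, next_frame):
    """Quantize the change between two frames into a small discrete action set.

    A frame here is just a 1-D character position; the real model works on
    pixels and learns its action codebook from data.
    """
    dx = next_frame - prev_frame
    if dx > 0:
        return "right"
    if dx < 0:
        return "left"
    return "idle"

def predict_next_frame(frame, action):
    """Action-conditioned next-frame 'model' (here a trivial lookup)."""
    step = {"right": 1, "left": -1, "idle": 0}[action]
    return frame + step

# Unlabeled "training video": character positions over time, no action labels.
video = [0, 1, 2, 2, 1]
actions = [infer_latent_action(a, b) for a, b in zip(video, video[1:])]
# actions == ["right", "right", "idle", "left"]

# At play time, a player's commands map onto the inferred action space
# and the world is generated frame by frame.
frame = 0
for cmd in ["right", "right", "left"]:
    frame = predict_next_frame(frame, cmd)
# frame == 1
```

The point of the sketch is only the pipeline shape: actions are recovered from the video itself, so any gameplay footage online becomes usable training data without annotation.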

The AI is an impressive proof of concept, but it's still very early in development, and DeepMind isn't planning to make the model public yet.

The games themselves are pixelated worlds streaming by at a plodding one frame per second. By comparison, contemporary video games can hit 60 or 120 frames per second. Also, like all generative algorithms, Genie generates strange or inconsistent visual artifacts. It's also prone to "hallucinating unrealistic futures," the team wrote in their paper describing the AI.

That said, there are a few reasons to believe Genie will improve from here.

Because the AI can learn from unlabeled online videos and is still a modest size, just 11 billion parameters, there's ample opportunity to scale up. Bigger models trained on more information tend to improve dramatically. And with a growing industry focused on inference, the process by which a trained AI performs tasks like generating images or text, it's likely to get faster.

DeepMind says Genie could help people, like professional developers, make video games. But like OpenAI, which believes Sora is about more than videos, the team is thinking bigger. The approach could go well beyond video games.

One example: AI that can control robots. The team trained a separate model on video of robotic arms completing various tasks. The model learned to manipulate the robots and handle a variety of objects.

DeepMind also said Genie-generated video game environments could be used to train AI agents. Its not a new strategy. In a 2021 paper, another DeepMind team outlined a video game called XLand that was populated by AI agents and an AI overlord generating tasks and games to challenge them. The idea that the next big step in AI will require algorithms that can train one another or generate synthetic training data is gaining traction.

All this is the latest salvo in an intense competition between OpenAI and Google to show progress in AI. While others in the field, like Anthropic, are advancing multimodal models akin to GPT-4, Google and OpenAI also seem focused on algorithms that simulate the world. Such algorithms may be better at planning and interaction, crucial skills for the AI agents both organizations seem intent on producing.

"Genie can be prompted with images it has never seen before, such as real world photographs or sketches, enabling people to interact with their imagined virtual worlds, essentially acting as a foundation world model," the researchers wrote in the Genie blog post. "We focus on videos of 2D platformer games and robotics, but our method is general and should work for any type of domain, and is scalable to ever larger internet datasets."

Similarly, when OpenAI previewed Sora last month, researchers suggested it might herald something more foundational: a world simulator. That is, both teams seem to view the enormous cache of online video as a way to train AI not only to generate its own video, but also to more effectively understand and operate out in the world, online or off.

Whether this pays dividends, or is sustainable long term, is an open question. The human brain operates on a light bulb's worth of power; generative AI uses up whole data centers. But it's best not to underestimate the forces at play right now, in terms of talent, tech, brains, and cash, aiming to not only improve AI but make it more efficient.

We've seen impressive progress in text, images, audio, and all three together. Videos are the next ingredient being thrown in the pot, and they may make for an even more potent brew.

Image Credit: Google DeepMind
