AlphaFold, GPT-3 and How to Augment Intelligence with AI (Pt. 2) – A16Z Future

Posted: March 8, 2022 at 11:06 pm

As we saw in Part 1, it's possible to get started on a number of important problems by building augmentation infrastructure around the strengths of an artificial intelligence model. Does the model generate text? Build around text. Can it accurately predict 3D structures? Build around 3D structures. But taking an artificial intelligence system completely at face value comes with its own limitations.

Douglas Engelbart used the term co-evolution to describe the way in which humanity's tools and its processes for using those tools adapt and evolve together. Models like GPT-3 and DALL-E represent a large step in the evolution of tools, but it's only one half of the equation. When you build around the model, without also building new tools and processes for the model, you're stuck with what you get. The model's weaknesses become your weaknesses. If you don't like the result, it's up to you to fix it. And since training any of the large, complex AI systems we've discussed so far requires massive data and computation, you likely don't have the resources to change the model all that much.

This is a bit of a conundrum: On the one hand, we don't have the resources to change the model significantly. On the other hand, we need to change the model, or at least come up with better ways of working with it, to solve for our specific use case. For prompt-based models like GPT-3 and DALL-E, the two easiest ways to tackle this fixed-model conundrum are prompt-hacking (iteratively rewording and reworking your inputs until the model produces what you want) and fine-tuning (retraining the model on a small, task-specific dataset), neither of which is particularly efficient.

The goal of augmented intelligence is to make manual processes like these more efficient so humans can spend more time on the things they are good at, like reasoning and strategizing. The inefficiency of prompt-hacking and fine-tuning shows that the time is ripe for a reciprocal step in process evolution. So, in this section, we'll explore some examples of a new theme, building for the model, and the role it plays in creating more effective augmentation tools.

As a working example, let's say you're an up-and-coming game developer working on the next online gaming franchise. You've seen how games like Call of Duty and Fortnite have created massively successful (and lucrative) marketplaces for custom skins and in-game assets, but you're a resource-constrained startup. So, instead of developing these assets yourself, you offload content generation to DALL-E, which can generate any number of skins and asset styles for a fraction of the cost. This is a great start, but prompt-hacking your way to a fully stocked asset store is inefficient.

To make things less manual, you can turn prompting over to a text generation model like GPT-3. The key to the virality of a game like Fortnite is the combination of a number of key game assets (weapons, vehicles, armor) with a variety of unique styles and references, such as eye-catching patterns and colors, superheroes, and the latest pop culture trends. When you seed GPT-3 with your asset types, it can generate any number of these combinations as ready-made prompts. Pass a prompt over to DALL-E, and out comes your skin design.
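
To make that handoff concrete, here is a minimal sketch using the OpenAI Python client for the GPT-3 half; the model name is an assumption, and the DALL-E half is a hypothetical placeholder you would swap for whatever image-generation endpoint you actually have access to:

```python
import openai  # pip install openai; assumes an API key is configured in your environment

ASSET_TYPES = ["weapon", "vehicle", "armor"]

def generate_skin_prompt(asset_type: str) -> str:
    """Ask GPT-3 to combine an asset type with an eye-catching visual theme."""
    seed = (
        f"Write a short, vivid image-generation prompt for a {asset_type} skin "
        "in an online game. Combine the asset with an eye-catching pattern, "
        "color scheme, or pop-culture reference."
    )
    response = openai.Completion.create(
        engine="text-davinci-002",  # model name is an assumption
        prompt=seed,
        max_tokens=60,
        temperature=0.9,            # run hot for more variety between skins
    )
    return response.choices[0].text.strip()

for asset in ASSET_TYPES:
    prompt = generate_skin_prompt(asset)
    print(f"{asset}: {prompt}")
    # image = dalle_generate(prompt)  # hypothetical DALL-E call -- plug in your image endpoint here
```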

This GPT-3 to DALL-E handoff sounds great, but it only really works if it produces stimulating, high-quality skin designs for your users. Combing through each of the design candidates manually is not an option, especially at scale. The key here is to build tools that let the marketplace do the work for you. Users flock to good content and have no patience for bad content; apps like TikTok are based entirely on this concept. User engagement will therefore be a strong signal for which DALL-E prompts are working (i.e., leading to interesting skin designs) and which are not.

To let your users do the work for you, you'll want to build a recursive loop that cross-references user activity with each prompt and translates user engagement metrics into a ranking of your active content prompts. Once you have that, normal A/B testing will automatically surface prompt insights, and you can prioritize good prompts, remove bad prompts, and even compare the similarity of newly generated prompts to those you have tested before.
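
A rough sketch of that ranking loop might look like the following, assuming a hypothetical engagement log in which every event is tagged with the prompt that generated the skin; the metric names and weights are placeholders you would tune to your own marketplace:

```python
from collections import defaultdict

# Hypothetical engagement log: each event records which generating prompt
# produced the skin the user interacted with.
events = [
    {"prompt_id": "p1", "metric": "equips", "value": 3},
    {"prompt_id": "p2", "metric": "equips", "value": 1},
    {"prompt_id": "p1", "metric": "purchases", "value": 1},
]

# Metric weights are an assumption -- tune them to the signals your marketplace cares about.
WEIGHTS = {"equips": 1.0, "purchases": 5.0}

def rank_prompts(events):
    """Aggregate weighted engagement per prompt and return prompts sorted best-first."""
    scores = defaultdict(float)
    for e in events:
        scores[e["prompt_id"]] += WEIGHTS.get(e["metric"], 0.0) * e["value"]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_prompts(events)
keep = [pid for pid, _ in ranking[:10]]      # prioritize the winners
retire = [pid for pid, _ in ranking[-10:]]   # rotate out the losers
```

In practice you would compute these scores over a rolling window and A/B test newly generated prompts against the current leaders before promoting them.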

But that's not all: the same user engagement signal can also be used for fine-tuning.

Let's move one more step backward and focus on GPT-3's performance. As long as you keep track of the inputs you are giving to GPT-3 (asset types + candidate themes), you can join that data with the quality rankings you have just gotten from further down in your content pipeline to create a dataset of successful and unsuccessful input-output pairs. This dataset can be used to fine-tune GPT-3 on game-design-focused prompt generation, making it even better at generating prompts for your application.
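
As a sketch, assuming you have joined your GPT-3 inputs with the scores from the ranking step above, the dataset could be written out as prompt/completion pairs in JSONL, the format OpenAI's fine-tuning tooling used at the time of writing; the records, threshold, and file name below are all illustrative:

```python
import json

# Hypothetical join of GPT-3 inputs (asset type + theme) with the downstream
# quality score derived from the engagement ranking above.
records = [
    {"asset": "armor", "theme": "retro neon",
     "generated_prompt": "A suit of power armor covered in glowing neon circuit patterns",
     "score": 0.92},
    {"asset": "vehicle", "theme": "deep sea",
     "generated_prompt": "A hovercraft with a murky, washed-out grey texture",
     "score": 0.11},
]

SCORE_THRESHOLD = 0.5  # assumption: only keep prompts the marketplace actually validated

def write_finetune_file(records, path="prompt_finetune.jsonl"):
    """Write successful input -> prompt pairs as JSONL prompt/completion examples."""
    with open(path, "w") as f:
        for r in records:
            if r["score"] < SCORE_THRESHOLD:
                continue
            example = {
                "prompt": f"Asset: {r['asset']}\nTheme: {r['theme']}\nImage prompt:",
                "completion": " " + r["generated_prompt"],
            }
            f.write(json.dumps(example) + "\n")

write_finetune_file(records)
# The file can then be handed to a fine-tuning job, e.g. via the CLI:
#   openai api fine_tunes.create -t prompt_finetune.jsonl -m davinci
```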

This user-driven cyclical pipeline helps DALL-E generate better content for your users by surfacing the best prompts, and helps GPT-3 generate better prompts by fine-tuning on examples generated from your own user activity. Without having to worry about prompt-hacking and fine-tuning, you are free to work on bigger-ticket items, like which assets are next in the pipeline, and which new content themes might lead to even more interesting skins down the road.

There also exists a huge opportunity to build middleware connecting creative industries and creative, personalized content-generating models. AI models and the services they enable (e.g., Copilot) could help with use cases that require novel content creation. This, again, requires using our understanding of the AI system and how it works to think of ways in which we can modify its behavior ever so slightly to create new and better experiences.

Imagine you are building a service for learning to code that uses Copilot under the hood to generate programming exercises. Out of the box, Copilot will generate anywhere from a single line of code to a whole function, depending on the docstring it's given as input. This is great: you can construct a bunch of exercises really quickly!

To make this educational experience more engaging, though, you'll probably want to tailor the exercises generated by Copilot to the needs and interests of your users, personalizing across dimensions such as the learner's age, skill level, and personal interests.

Generating docstrings yourself is tedious and manual, so personalizing Copilot's outputs should be as automated as possible. Well, we know of another AI system, GPT-3, that is great at generating virtually any type of text, so maybe we can offload the docstring creation to GPT-3.

This can be done in one of two ways. One approach is to ask GPT-3 to generate generic docstrings that correspond to a particular skill or concept (e.g., looping, recursion, etc.). With one prompt, you can generate any number of boilerplate docstrings. Then, using a curated list of target themes and keywords (a slight manual effort), you can swap placeholders in the boilerplate for terms that resonate with your target audience. Alternatively, you can try feeding both target skills/concepts and themes to GPT-3 at the same time and letting GPT-3 tailor the docstrings to your themes automatically.
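
Here is a minimal sketch of both approaches, again assuming the OpenAI Python client; the model name, placeholder convention, and theme list are all illustrative assumptions rather than a prescribed recipe:

```python
import openai  # same client as before; the model name below is an assumption

CONCEPTS = ["looping", "recursion"]
THEMES = {"space": "rockets", "sports": "soccer scores"}  # the small, curated manual effort

def generic_docstring(concept: str) -> str:
    """Approach 1, step 1: ask GPT-3 for a boilerplate exercise docstring with a placeholder."""
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=(
            f"Write a one-paragraph docstring for a beginner Python exercise that "
            f"practices {concept}. Refer to the objects being processed as ITEMS."
        ),
        max_tokens=80,
    )
    return response.choices[0].text.strip()

def themed_docstring(concept: str, theme: str) -> str:
    """Approach 1, step 2: swap the placeholder for a theme the learner cares about."""
    return generic_docstring(concept).replace("ITEMS", THEMES[theme])

def themed_docstring_direct(concept: str, theme: str) -> str:
    """Approach 2: hand both the concept and the theme to GPT-3 and let it do the tailoring."""
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=(
            f"Write a one-paragraph docstring for a beginner Python exercise that "
            f"practices {concept}, themed around {THEMES[theme]}."
        ),
        max_tokens=80,
    )
    return response.choices[0].text.strip()

# Either docstring is then fed to Copilot (or any code-generation model) to produce
# the exercise scaffold or reference solution.
```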

The success of this idea, of course, comes down to the quality of GPT-3's content. For one, you'll want to make sure the exercises generated by this GPT/Copilot combination are age-appropriate. Perhaps an aligned model like InstructGPT would be better here.

We are now over a decade into the latest AI summer. The flurry of activity in the AI community has led to incredible breakthroughs that will have significant impact across a number of industries and, possibly, on the trajectory of humanity as a whole. Augmented intelligence represents an opportunity to kickstart this progress, and all it takes is a slight reframing of our design principles for building AI systems. In addition to building models to solve problems, we can think of new ways to build infrastructure around models and for models, and even ways in which foundation models might work together (like GPT-3 + DALL-E, or GPT-3 + Copilot).

Maybe one day we will be able to offload all of the dirty work of life to some artificial general intelligence and live hakuna-matata style, but until that day comes we should think like Engelbart, focusing less on machines that replace human intelligence and more on those that are savvy enough to enhance it.
