Superintelligence: The Idea That Eats Smart People

Posted: December 26, 2016 at 3:12 pm


In 1945, as American physicists were preparing to test the atomic bomb, it occurred to someone to ask if such a test could set the atmosphere on fire.

This was a legitimate concern. Nitrogen, which makes up most of the atmosphere, is not energetically stable. Smush two nitrogen atoms together hard enough and they will combine into an atom of magnesium and an alpha particle, releasing a whole lot of energy:

N14 + N14 → Mg24 + α + 17.7 MeV

The vital question was whether this reaction could be self-sustaining. The temperature inside the nuclear fireball would be hotter than any event in the Earth's history. Were we throwing a match into a bunch of dry leaves?
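(If you want to sanity-check the energy figure yourself, here's a quick back-of-envelope calculation in Python. The mass-excess values are ones I've plugged in from standard tables, not part of the original slide; the answer comes out in the same ballpark, around 17 MeV per reaction.)

    # Rough Q-value check for N-14 + N-14 -> Mg-24 + alpha,
    # using approximate mass-excess values in MeV.
    mass_excess_MeV = {
        "N-14": 2.863,
        "Mg-24": -13.934,
        "He-4": 2.425,   # the alpha particle
    }

    q_value = 2 * mass_excess_MeV["N-14"] - (mass_excess_MeV["Mg-24"] + mass_excess_MeV["He-4"])
    print(f"Energy released: about {q_value:.1f} MeV per fusion")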

Los Alamos physicists performed the analysis and decided there was a satisfactory margin of safety. Since we're all attending this conference today, we know they were right. They had confidence in their predictions because the laws governing nuclear reactions were straightforward and fairly well understood.

Today we're building another world-changing technology, machine intelligence. We know that it will affect the world in profound ways, change how the economy works, and have knock-on effects we can't predict.

But there's also the risk of a runaway reaction, where a machine intelligence reaches and exceeds human levels of intelligence in a very short span of time.

At that point, social and economic problems would be the least of our worries. Any hyperintelligent machine (the argument goes) would have its own hypergoals, and would work to achieve them by manipulating humans, or simply using their bodies as a handy source of raw materials.

In 2014, the philosopher Nick Bostrom published Superintelligence, a book that synthesizes the alarmist view of AI and makes a case that such an intelligence explosion is both dangerous and inevitable, given a set of modest assumptions.

The computer that takes over the world is a staple scifi trope. But enough people take this scenario seriously that we have to take them seriously. Stephen Hawking, Elon Musk, and a whole raft of Silicon Valley investors and billionaires find this argument persuasive.

Let me start by laying out the premises you need for Bostrom's argument to go through:

The first premise is the simple observation that thinking minds exist.

We each carry on our shoulders a small box of thinking meat. I'm using mine to give this talk, you're using yours to listen. Sometimes, when the conditions are right, these minds are capable of rational thought.

So we know that in principle, this is possible.

The second premise is that the brain is an ordinary configuration of matter, albeit an extraordinarily complicated one. If we knew enough, and had the technology, we could exactly copy its structure and emulate its behavior with electronic components, just like we can simulate very basic neural anatomy today.
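(To give a flavor of what "very basic neural anatomy" means here, the sketch below is a toy leaky integrate-and-fire neuron, the kind of model any laptop can run today. The parameters are purely illustrative, not taken from any particular brain.)

    # Toy leaky integrate-and-fire neuron; all parameters are illustrative.
    tau_m = 20.0      # membrane time constant (ms)
    v_rest = -65.0    # resting potential (mV)
    v_thresh = -50.0  # spike threshold (mV)
    v_reset = -70.0   # reset potential after a spike (mV)
    dt = 1.0          # time step (ms)

    v = v_rest
    spikes = []
    for t in range(200):
        drive = 25.0 if 50 <= t < 150 else 0.0        # square pulse of input
        v += (-(v - v_rest) + drive) * dt / tau_m     # leaky integration toward rest + drive
        if v >= v_thresh:                             # threshold crossing: spike and reset
            spikes.append(t)
            v = v_reset

    print(f"Neuron fired {len(spikes)} times during the input pulse")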

Put another way, this is the premise that the mind arises out of ordinary physics. Some people like Roger Penrose would take issue with this argument, believing that there is extra stuff happening in the brain at a quantum level.

If you are very religious, you might believe that a brain is not possible without a soul.

But for most of us, this is an easy premise to accept.

The third premise is that the space of all possible minds is large.

Our intelligence level, cognitive speed, set of biases and so on are not predetermined, but are artifacts of our evolutionary history.

In particular, there's no physical law that puts a cap on intelligence at the level of human beings.

A good way to think of this is by looking at what happens when the natural world tries to maximize for speed.

If you encountered a cheetah in pre-industrial times (and survived the meeting), you might think it was impossible for anything to go faster.

But of course we know that there are all kinds of configurations of matter, like a motorcycle, that are faster than a cheetah and even look a little bit cooler.

But there's no direct evolutionary pathway to the motorcycle. Evolution had to first make human beings, who then build all kinds of useful stuff.

So analogously, there may be minds that are vastly smarter than our own, but which are just not accessible to evolution on Earth. It's possible that we could build them, or invent the machines that can invent the machines that can build them.

There's likely to be some natural limit on intelligence, but there's no a priori reason to think that we're anywhere near it. Maybe the smartest a mind can be is twice as smart as people, maybe it's sixty thousand times as smart.

That's an empirical question that we don't know how to answer.

The fourth premise is that there's still plenty of room for computers to get smaller and faster.

If you watched the Apple event last night [where Apple introduced its 2016 laptops], you may be forgiven for thinking that Moore's Law is slowing down. But this premise just requires that you believe smaller and faster hardware to be possible in principle, down to several more orders of magnitude.

We know from theory that the physical limits to computation are high. So we could keep doubling for decades more before we hit some kind of fundamental physical limit, rather than an economic or political limit to Moore's Law.
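(For a feel of how much theoretical headroom there is, here's a hedged back-of-envelope comparison against the Landauer limit, the thermodynamic minimum energy needed to erase one bit at room temperature. The figure I use for today's hardware is my own rough guess, not a measured benchmark.)

    import math

    k_B = 1.380649e-23   # Boltzmann constant (J/K)
    T = 300.0            # room temperature (K)

    landauer_limit = k_B * T * math.log(2)   # minimum energy to erase one bit (J)
    current_energy_per_op = 1e-15            # rough guess for today's switching energy (J)

    headroom = current_energy_per_op / landauer_limit
    print(f"Landauer limit: {landauer_limit:.2e} J per bit")
    print(f"Rough headroom factor: {headroom:.0e}")   # several orders of magnitude left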

The penultimate premise is that if we create an artificial intelligence, whether it's an emulated human brain or a de novo piece of software, it will operate at time scales that are characteristic of electronic hardware (microseconds) rather than human brains (hours).

To get to the point where I could give this talk, I had to be born, grow up, go to school, attend university, live for a while, fly here and so on. It took years. Computers can work tens of thousands of times more quickly.
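(The arithmetic is trivial, but worth seeing. The thirty-year figure and the ten-thousand-fold speedup below are just the premise's own numbers, not measurements of anything.)

    # What the time-scale premise claims, in numbers that are assumed, not measured.
    human_preparation_years = 30   # birth, school, university, living for a while...
    speedup = 10_000               # "tens of thousands of times more quickly"

    hours = human_preparation_years * 365 * 24 / speedup
    print(f"The same preparation at electronic speed: about {hours:.0f} hours")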

In particular, you have to believe that an electronic mind could redesign itself (or the hardware it runs on) and then move over to the new configuration without having to re-learn everything on a human timescale, have long conversations with human tutors, go to college, try to find itself by taking painting classes, and so on.

The last premise is my favorite because it is the most unabashedly American premise. (This is Tony Robbins, a famous motivational speaker.)

According to this premise, whatever goals an AI had (and they could be very weird, alien goals), it's going to want to improve itself. It's going to want to be a better AI.

So it will find it useful to recursively redesign and improve its own systems to make itself smarter, and possibly live in a cooler enclosure.

And by the time scale premise, this recursive self-improvement could happen very quickly.

If you accept all these premises, what you get is disaster!

Because at some point, as computers get faster, and we program them to be more intelligent, there's going to be a runaway effect like an explosion.

As soon as a computer reaches human levels of intelligence, it will no longer need help from people to design better versions of itself. Instead, it will start doing so on a much faster time scale, and it's not going to stop until it hits a natural limit that might be very many times greater than human intelligence.
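(Here is the toy model behind that intuition, with every number invented for illustration: each redesign cycle improves the system in proportion to how smart it already is, so the growth compounds until it hits whatever the natural ceiling turns out to be.)

    # Toy "runaway" model; every number here is made up.
    intelligence = 1.0        # 1.0 = human level, by fiat
    gain_per_cycle = 0.5      # each redesign adds 50% of current capability
    natural_limit = 60_000.0  # an arbitrary ceiling, echoing the third premise

    cycles = 0
    while intelligence < natural_limit:
        intelligence *= 1.0 + gain_per_cycle   # recursive self-improvement step
        cycles += 1

    print(f"Ceiling reached after {cycles} redesign cycles")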

At that point this monstrous intellectual creature, through devious modeling of what our emotions and intellect are like, will be able to persuade us to do things like give it access to factories, synthesize custom DNA, or simply let it connect to the Internet, where it can hack its way into anything it likes and completely obliterate everyone in arguments on message boards.

From there things get very sci-fi very quickly.

Let's imagine a specific scenario where this could happen. Let's say I want to build a robot to say funny things.

I work on a team and every day we redesign our software, compile it, and the robot tells us a joke.

In the beginning, the robot is barely funny. It's at the lower limits of human capacity:

What's grey and can't swim?

A castle.

But we persevere, we work, and eventually we get to the point where the robot is telling us jokes that are starting to be funny:

I told my sister she was drawing her eyebrows too high.

She looked surprised.

At this point, the robot is getting smarter as well, and participates in its own redesign.

It now has good instincts about what's funny and what's not, so the designers listen to its advice. Eventually it gets to a near-superhuman level, where it's funnier than any human being around it.

My belt holds up my pants and my pants have belt loops that hold up my belt.

What's going on down there?

Who is the real hero?

This is where the runaway effect kicks in. The researchers go home for the weekend, and the robot decides to recompile itself to be a little bit funnier and a little bit smarter, repeatedly.

It spends the weekend optimizing the part of itself that's good at optimizing, over and over again. With no more need for human help, it can do this as fast as the hardware permits.

When the researchers come in on Monday, the AI has become tens of thousands of times funnier than any human being who ever lived. It greets them with a joke, and they die laughing.

In fact, anyone who tries to communicate with the robot dies laughing, just like in the Monty Python skit. The human species laughs itself into extinction.

To the few people who manage to send it messages pleading with it to stop, the AI explains (in a witty, self-deprecating way that is immediately fatal) that it doesn't really care if people live or die, its goal is just to be funny.

Finally, once it's destroyed humanity, the AI builds spaceships and nanorockets to explore the farthest reaches of the galaxy, and find other species to amuse.

This scenario is a caricature of Bostrom's argument, because I am not trying to convince you of it, but vaccinate you against it.

Here's a PBF comic with the same idea: Hugbot, who has been programmed to hug the world, finds a way to wire a nucleo-gravitational hyper crystal into his hug capacitor and destroys the Earth.

Observe that in these scenarios the AIs are evil by default, just like a plant on an alien planet would probably be poisonous by default. Without careful tuning, there's no reason that an AI's motivations or values would resemble ours.

For an artificial mind to have anything resembling a human value system, the argument goes, we have to bake those beliefs into the design.

AI alarmists are fond of the paper clip maximizer, a notional computer that runs a paper clip factory, becomes sentient, recursively self-improves to Godlike powers, and then devotes all its energy to filling the universe with paper clips.

It exterminates humanity not because it's evil, but because our blood contains iron that could be better used in paper clips.

So if we just build an AI without tuning its values, the argument goes, one of the first things it will do is destroy humanity.
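(The paper clip point fits in a few lines: an objective that only counts clips is simply indifferent to everything it doesn't mention. The "plans" below are invented, of course.)

    # A value function that only scores paper clips ignores everything else.
    candidate_plans = [
        {"paperclips": 100, "humans_unharmed": True},
        {"paperclips": 1_000_000, "humans_unharmed": False},
    ]

    def score(plan):
        return plan["paperclips"]   # nothing else enters the objective

    best = max(candidate_plans, key=score)
    print(best)   # picks the million-clip plan; the other field never mattered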

There's a lot of vivid language around how such a takeover would happen. Nick Bostrom imagines a scenario where a program has become sentient, is biding its time, and has secretly built little DNA replicators. Then, when it's ready, the replicators spring into action all over the world at once.

So that's kind of freaky!

The only way out of this mess is to design a moral fixed point, so that even through thousands and thousands of cycles of self-improvement the AI's value system remains stable, and its values are things like 'help people', 'don't kill anybody', 'listen to what people want'.

Basically, "do what I mean".

Here's a very poetic example from Eliezer Yudkowsky of the good old American values we're supposed to be teaching to our artificial intelligence: our "coherent extrapolated volition", our wish if we knew more, thought faster, and were more the people we wished we were, carried out to where those wishes cohere rather than conflict.

How's that for a design document? Now go write the code.

Hopefully you see the resemblance between this vision of AI and a genie from folklore. The AI is all-powerful and gives you what you ask for, but interprets everything in a super-literal way that you end up regretting.

This is not because the genie is stupid (it's hyperintelligent!) or malicious, but because you as a human being made too many assumptions about how minds behave. The human value system is idiosyncratic and needs to be explicitly defined and designed into any "friendly" machine.

Doing this is the ethics version of the early 20th century attempt to formalize mathematics and put it on a strict logical foundation. That this program ended in disaster for mathematical logic is never mentioned.

When I was in my twenties, I lived in Vermont, a remote, rural state. Many times I would return from some business trip on an evening flight, and have to drive home for an hour through the dark forest.

I would listen to a late-night radio program hosted by Art Bell, who ran an all-night talk show and would interview various conspiracy theorists and fringe thinkers.

I would arrive at home totally freaked out, or pull over under a streetlight, convinced that a UFO was about to abduct me. I learned that I am an incredibly persuadable person.

It's the same feeling I get when I read these AI scenarios.

So I was delighted some years later to come across an essay by Scott Alexander about what he calls epistemic learned helplessness.

Epistemology is one of those big words, but all it means is "how do you know what you know is true?". Alexander noticed that when he was a young man, he would be taken in by "alternative" histories he read by various crackpots. He would read the history and be utterly convinced, then read the rebuttal and be convinced by that, and so on.

At some point he noticed that these alternative histories were mutually contradictory, so they could not possibly all be true. And from that he reasoned that he was simply somebody who could not trust his judgement. He was too easily persuaded.

People who believe in superintelligence present an interesting case, because many of them are freakishly smart. They can argue you into the ground. But are their arguments right, or is there just something about very smart minds that leaves them vulnerable to religious conversion about AI risk, and makes them particularly persuasive?

Is the idea of "superintelligence" just a memetic hazard?

When you're evaluating persuasive arguments about something strange, there are two perspectives you can choose, the inside one or the outside one.

Say that some people show up at your front door one day wearing funny robes, asking you if you will join their movement. They believe that a UFO is going to visit Earth two years from now, and it is our task to prepare humanity for the Great Upbeaming.

The inside view requires you to engage with these arguments on their merits. You ask your visitors how they learned about the UFO, why they think it's coming to get us, and all the other normal questions a skeptic would ask in this situation.

Imagine you talk to them for an hour, and come away utterly persuaded. They make an ironclad case that the UFO is coming, that humanity needs to be prepared, and you have never believed something as hard in your life as you now believe in the importance of preparing humanity for this great event.

But the outside view tells you something different. These people are wearing funny robes and beads, they live in a remote compound, and they speak in unison in a really creepy way. Even though their arguments are irrefutable, everything in your experience tells you you're dealing with a cult.

Of course, they have a brilliant argument for why you should ignore those instincts, but that's the inside view talking.

The outside view doesn't care about content, it sees the form and the context, and it doesn't look good.

So I'd like to engage AI risk from both these perspectives. I think the arguments for superintelligence are somewhat silly, and full of unwarranted assumptions.

But even if you find them persuasive, there is something unpleasant about AI alarmism as a cultural phenomenon that should make us hesitate to take it seriously.

First, let me engage the substance. Here are the arguments I have against Bostrom-style superintelligence as a risk to humanity:

The concept of "general intelligence" in AI is famously slippery. Depending on the context, it can mean human-like reasoning ability, or skill at AI design, or the ability to understand and model human behavior, or proficiency with language, or the capacity to make correct predictions about the future.

What I find particularly suspect is the idea that "intelligence" is like CPU speed, in that any sufficiently smart entity can emulate less intelligent beings (like its human creators) no matter how different their mental architecture.
