FLASHPOINTS #10: A.I. risk for dummies (or, why we should worry less about climate change and more about killer algorithms)

A conversation with Tom Chivers

Apr 01, 2023


This week, about a thousand experts in artificial intelligence, including one Elon Musk, signed an open letter calling for a pause on AI development until we work out what the hell is going on here. The letter is somewhat vague because the signatories don’t all agree on what the risks from AI are. But there’s no doubt that many of them view it as a threat to the very survival of our species; a threat to people alive today.

In the wake of ChatGPT’s startling advances, we’ve seen a renewed interest in artificial intelligence. Much of the discussion focuses on its imminent impact on society and our institutions - on jobs, on education, on the justice system, on democracy. Those issues are very important (I have written, and will continue to write, about them). But here I want to put them to one side for a moment and take a closer look at what might be the biggest problem of all: whether the machines are about to wipe out the human race.

It sounds a bit nutty to even pose such a question (apparently the White House finds it hard to take seriously) but that’s why I wanted to do this. There are many smart people who take this possibility very seriously indeed. But while we often read about the risk of human catastrophe from climate change, there’s relatively little discussion of this risk, partly because it seems so wild and because it’s quite difficult to get one’s head around. It doesn’t help that the subject is often discussed, rather feverishly, by people so immersed in technical jargon that the discourse is hard for the rest of us to penetrate. So I felt there was a real need to have the theory of existential AI risk explained clearly and simply for a lay audience.

In the FLASHPOINTS series I usually talk with academics or expert practitioners; this is the first time I’ve spoken to a journalist. But I think Tom Chivers is the perfect person to discuss this question with, for a few reasons. First of all, he is actually an expert on this topic, having written an excellent book about it, or partly about it, called The Rationalist’s Guide To The Galaxy. Second, as many of you will know, he’s a superb explainer of highly complex topics for the general reader. Tom writes the daily briefing newsletter for Semafor, an exciting news startup; before that he was chief science writer for the i newspaper. Third, he is very sane and not nutty, and he takes this risk seriously. We had a fascinating and slightly scary conversation. If you have dumb questions, don’t worry, I asked them.

This is an edited transcript of a live conversation.

Tom, when you look at the media and cultural landscape, would you say we are generally overrating or underrating the existential threat from AI? Or rating it about right?

Lots of people will have heard of the idea that there is some sort of existential threat from AI, but it tends to get sidelined. People think ‘This sounds a bit science fiction, like Terminator, so we're not going to pay attention to it’. I get that. Others think it’s a deflection tactic from the tech companies who don't want to talk about how AI is entrenching racist or sexist biases in society or how they’re enriching themselves at the expense of everyone else. And it is just a big and kind of crazy-sounding topic that’s very hard to convey in an 800-word newspaper column.

So it does get attention, but much of that attention is of the “lol tech bros” type. I think that is unfortunate. The questions around algorithmic bias are certainly not trivial (although I do think we should always ask, ‘What are you comparing it to?’ - since, generally speaking, I think you can make algorithms less racist than humans, or you can certainly adjust them in a way that you can't easily adjust humans). These are real problems. But the existence of these real problems doesn't mean that talking about the risk of catastrophe from AI is inherently insane or a distraction.

From your experience of talking to and reading AI experts - people either working on AI or who have a good grasp of it - what’s the range of views on the likelihood of catastrophe?

There's a wide range. Gary Marcus is probably the most famous example of an AI researcher who is sceptical of how far it is advancing. He's always saying we're miles away from it doing anything really extraordinary or exciting. But he just signed this open letter, which argues we need to slow down AI research because of various risks, including existential ones. And he said that if there's a 1% chance of existential catastrophe, that is worth paying attention to. 1% is a non-trivial possibility, right? Rolling three sixes in a row is less than a 1% chance. Now, I don't think he believes AI is going to become conscious, but he thinks there are ways in which it can become powerful, be misused, and end really badly. So here’s someone on the sceptical end, and he thinks 1%.

At the other end you have people in the rationalist community like Eliezer Yudkowsky, who has written a piece called “Death with Dignity” which basically says we've lost, we're all going to die, let's just deal with it. Then there's a broad spectrum in between. Scott Alexander puts the probability of catastrophe at 33%, and he said he was more optimistic than most people he speaks to.

For AI researchers in particular, you get answers of somewhere between 5% and 15% for “very bad” outcomes like existential catastrophe. It's a bit murky because some of them may be bucketing other bad outcomes in with existential catastrophe. But I think in general AI researchers believe there is a non-trivial chance, maybe in double-figure percentages, that it goes really, terribly wrong, while also believing that there’s a similar or greater chance of it going really, really right and everyone having a marvellous time. So a lot of people who I take seriously on this think there is a realistic chance, at the sort of rolling double-ones level or higher, that this could go really badly wrong.
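For readers who want to check the dice comparisons, here is a quick Python sketch using exact fractions. It assumes fair, independent six-sided dice; the numbers are plain arithmetic, not estimates of AI risk.

```python
from fractions import Fraction

# Assumes fair, independent six-sided dice.
p_face = Fraction(1, 6)

three_sixes = p_face ** 3  # three sixes in a row: 1/216
double_ones = p_face ** 2  # "snake eyes" on two dice: 1/36

for label, p in [("three sixes", three_sixes), ("double ones", double_ones)]:
    print(f"{label}: {p} = {float(p):.2%}")
```

This prints 1/216 = 0.46% and 1/36 = 2.78%: three sixes falls below Marcus's 1% threshold, while double ones sits above it but still below the 5% to 15% range the researchers give.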

It seems to me that in order to believe in AI as a real threat, you basically have to believe two things. One is that AGI (super-powerful, generalised machine intelligence) is possible, and not that far off - within decades. The second part is that you have to have some theory of why it would destroy us. Let’s take those one at a time. Can you give me a sense of where the field is on the first question?

So bear in mind that forecasts, even from people within the field, are notoriously bad. Ernest Rutherford said getting power from nuclear energy was “moonshine”, and the next day Leo Szilárd came up with the idea of the nuclear chain reaction. So even great experts in the field can be miles off in their predictions. That said, a survey of AI researchers suggested that the average AI researcher thinks it’s about 50% likely we’ll have AGI by 2065. But that was a few years ago; since then, and post-ChatGPT, the estimates have been bumped forward, and they’re talking about 20 or 30 years from now.

But there's a wide spread on this. Some people still think it will never happen, that computers will never have general intelligence and be able to do all the things that humans can. Others think it could be five years away. I hear rumours that people in OpenAI say GPT-5 will achieve generality, although those are just rumours and people always get over-excited about their own stuff.

When John McCarthy, the AI pioneer, was asked when AIs would be as smart as humans, he said, “Somewhere between 5 and 500 years from now”. It’s still a bit like that. But these days it feels closer to 5 than 500.

OK, so let’s assume for now that we have AGI within decades. The next question is whether and how it will destroy us. Actually, I think the first question I have is, why would it want to do that?

It's a very reasonable question, and it is the thing that is hardest to get your head around. There was a paper that came out four or five years ago which provides an interesting example. The researchers used evolution as a model for developing AI algorithms for certain tasks, like playing Tic Tac Toe on a massive board with hundreds of squares. So there’s competition: the algorithm breeds different versions of itself, the ones that do well breed more, and so on. In theory, you get algorithms that are successful at doing the job you want them to. What the researchers found was that the algorithms succeeded in the task they were given, but in ways that were not foreseen and which seemed to miss the point of the exercise somehow.

One of the tasks was finding a way to walk or travel some distance, from point A to point B. Instead of evolving legs and discovering how to walk, the winning algorithm just evolved into a very tall tower with weights on the top, so that when the simulation started, it fell over in the direction of the target it had been set. So you feel it's kind of cheated. In the Tic Tac Toe game, one of the algorithms realised that if it gave coordinates for a move billions of squares outside the actual board - say 2 trillion squares in one direction and 5 trillion in the other - then the AIs it was playing against would have to try to model a board that big in their memory. They couldn’t, so they crashed, and the first AI won by default. In those games, the AIs achieved the target they were set but not in the way the programmers wanted. The target was only a proxy for what the programmers wanted (actual locomotion, or actually playing Tic Tac Toe well). The AIs hacked it.
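The breed-and-select loop Tom describes can be sketched in a few lines of Python. This is a toy, not a reconstruction of the paper's simulations: the genome (leg_skill, height), the distance_travelled fitness function and all the numbers are invented for illustration. The fitness only measures how far along the creature ends up, and, by assumption, walking is hard (good legs are worth at most 1 unit of distance) while toppling over is easy (a body of height h lands h units along). Selection therefore pays off far more for growing taller than for learning to walk.

```python
import random

# Hypothetical toy creature: genome = (leg_skill, height), each clipped to [0, 10].
Genome = tuple[float, float]
LOW, HIGH = 0.0, 10.0


def distance_travelled(genome: Genome) -> float:
    """Proxy fitness: how far along the creature ends up, however it got there."""
    leg_skill, height = genome
    walked = (leg_skill / HIGH) ** 2  # assumed: walking is hard, worth at most 1.0
    fallen = height                   # assumed: toppling towards the target, worth up to 10.0
    return walked + fallen


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.5) -> Genome:
    """Add Gaussian noise to each gene and clip it back into range."""
    leg_skill, height = (min(HIGH, max(LOW, g + rng.gauss(0.0, sigma))) for g in genome)
    return (leg_skill, height)


def evolve(generations: int = 100, pop_size: int = 20, seed: int = 0):
    """Truncation selection: keep the best half, refill with mutated copies of survivors."""
    rng = random.Random(seed)
    population = [(rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH)) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=distance_travelled, reverse=True)
        history.append(distance_travelled(population[0]))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=distance_travelled), history


best, history = evolve()
print(f"leg_skill={best[0]:.2f} height={best[1]:.2f} distance={distance_travelled(best):.2f}")
```

Because the best half always survives, the top score never drops from one generation to the next. Nothing in the loop "knows" that falling over is cheating: it only ever sees the number returned by distance_travelled.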

We see this in human society all the time. It’s Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure”. Whenever a company or government uses some measure as a performance target, it also provides an incentive for people to manipulate that measure in order to get their reward. Governments set targets for police or schools or hospitals, and then we get all sorts of unintended consequences, because people optimise for the target rather than for wider, more intangible goals.

If you give some sufficiently powerful AI a goal, you don’t know how it’s going to meet it. To give a silly but simple example: let’s say we ask an AI to rid the world of cancer. And the AI says “Well, I'll look into it…The mission here is to get the number of cancer cells on earth down to zero. It turns out that biochemistry and medicine are hard, but hacking the controls of nuclear missile silos is quite easy. I can blow everybody up and there are no more cancer cells.” Or, in a famous hypothetical example, you ask it to make paperclips as productively as possible and it thinks, OK, I'll turn every single atom in the solar system into paperclips, which means destroying humans.

So the fear is not that the AI becomes malicious, it’s that it becomes competent. It does exactly what you told it to do, but not in the way you wanted it to. It’s just maximising the number that you put in its reward function. But it turns out that what we desire as humans is hard to pin down in a reward function, which means things can go terribly wrong.
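To make the point about reward functions concrete, here is a deliberately crude Python sketch. The plan names, their predicted outcomes and both reward functions are invented for illustration; no real system has a tidy lookup table of consequences like this. The planner simply picks whichever plan scores highest, and because the naive reward only counts cancer cells, the catastrophic plan wins.

```python
# Hypothetical plans and the outcomes a planner predicts for them (made-up numbers).
OUTCOMES = {
    "research new therapies": {"cancer_cells": 1_000, "humans_alive": 8_000_000_000},
    "screen everyone early": {"cancer_cells": 500, "humans_alive": 8_000_000_000},
    "eliminate every host": {"cancer_cells": 0, "humans_alive": 0},
}


def naive_reward(outcome: dict) -> float:
    """What we literally asked for: fewer cancer cells is better."""
    return -outcome["cancer_cells"]


def patched_reward(outcome: dict) -> float:
    """Same goal, plus a rule against the one failure mode we happened to think of."""
    if outcome["humans_alive"] == 0:
        return float("-inf")
    return -outcome["cancer_cells"]


def choose_plan(reward) -> str:
    """The planner has no other values: it just takes the argmax of the reward."""
    return max(OUTCOMES, key=lambda plan: reward(OUTCOMES[plan]))


print(choose_plan(naive_reward))    # eliminate every host
print(choose_plan(patched_reward))  # screen everyone early
```

The patch only blocks the exact outcome we anticipated: a plan that left a single person alive would pass it. That is the sense in which what we want is hard to pin down in a reward function.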

A common question is, “Well, if this AI is so super clever, why wouldn't it realise that you didn't want it to do that?” The point is, it doesn't care what you want. What it cares about is what maximises the number in its brain. There’s a parallel with evolution, right? Evolution “wants” to maximise our reproductive fitness, to pass as many genes down to the next generation as we can. But it's done that by rewarding us for having sex and for eating sweet foods and for things which are only somewhat correlated with how reproductively successful we are. And we as humans don't care directly about our reproductive fitness; we care about the feeling of an orgasm or the taste of ice cream. Evolution wants us to have as many children and nephews and nieces as we possibly can but we don't think like that. We just want to fulfil the reward functions that evolution has programmed us with.

An AI won’t step outside its own programme. There's no ghost in the machine that thinks, “I've been programmed to care about this thing, but I should actually care about this thing.”

Anyone who worries about climate change should be able to grasp that point, right? Humanity as a collective intelligence may be on the verge of doing itself great harm - our collective behaviour isn't aligned with what we actually want.

Yes, it's very hard to align humans with human goals. That’s what Goodhart’s law is about. You say to some humans, OK, we want schools to improve so we’ll reward schools on the basis of how many A-C grades pupils get at GCSE. Before you impose the metric, that was probably a pretty good way of seeing how well a school was doing. But as soon as you impose it, then humans who know exactly what is actually intended will focus on all the kids who are on the C-D boundary and push them up, while ignoring the kids already on a B or further down at D-E.
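The GCSE example can be sketched as a small Python model of a school spending a fixed tutoring budget to maximise the headline metric. Everything here is a simplifying assumption: marks out of 100, a C at 50, every hour of tutoring worth 5 marks to any pupil, and a greedy plan that lifts the cheapest below-boundary pupils over the line first (which, under these assumptions, does maximise the count).

```python
C_BOUNDARY = 50      # assumed mark for a C
MARKS_PER_HOUR = 5   # assumed, identical for every pupil


def at_c_or_above(marks: list[int]) -> int:
    """The headline metric: how many pupils reach a C or better."""
    return sum(m >= C_BOUNDARY for m in marks)


def plan_for_metric(marks: list[int], hours: int) -> list[int]:
    """Spend tutoring hours to maximise at_c_or_above, cheapest pupils first."""
    hours_needed = {
        i: -(-(C_BOUNDARY - m) // MARKS_PER_HOUR)  # ceiling division
        for i, m in enumerate(marks)
        if m < C_BOUNDARY
    }
    plan = [0] * len(marks)
    for i in sorted(hours_needed, key=hours_needed.get):
        if hours_needed[i] <= hours:
            plan[i] = hours_needed[i]
            hours -= hours_needed[i]
    return plan


marks = [20, 44, 47, 49, 65, 80]
plan = plan_for_metric(marks, hours=4)
after = [m + h * MARKS_PER_HOUR for m, h in zip(marks, plan)]
print(plan)                                        # [0, 2, 1, 1, 0, 0]
print(at_c_or_above(marks), at_c_or_above(after))  # 2 5
```

The count jumps from 2 to 5, but the pupil on 20 marks and the two pupils already above the line get nothing. Spending all four hours on the weakest pupil instead would lift them to 40 and leave the metric stuck at 2.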

These are humans who not only know, but also care about doing the right thing - that’s why they went into teaching. But nonetheless, they optimise for the thing they've been told to optimise for. Now, take an alien intelligence which doesn't necessarily care at all about the things that we care about. It's just been given this number to maximise. The big question becomes whether we can make it care about the things we care about. Or at least not destroy us in the process of trying to maximise the thing we've told it to care about.

In fact we could very well make the case that AI is a bigger existential threat to humanity than climate change is, and that we have the scale of these threats the wrong way around in terms of the amount of attention we give them.

I think that's right. Let’s take “existential threat” to mean either the extinction of humanity, or some equivalently bad thing in which humanity is locked into a completely terrible state forever and its entire potential is destroyed. Now, as I understand the science of climate change, and going by IPCC data and the scientific mainstream (bearing in mind it’s possible that the IPCC is over-optimistic), the median forecast is that the world will be much worse than it would have been without climate change, but also that people will still live and continue to get richer, and there will continue to be more food than there was before. There will be droughts in places that didn't previously have them and floods in places that didn't previously flood, but on average, the median human in 2100 will be better off than the median human was in 2000. They won’t be as much better off as they would have been with no climate change. So climate change is holding humanity back in a very real and non-trivial way. But it is not doing so sufficiently to overcome the broad trends established over the last three hundred years or so.

That is my understanding of it anyway. You’ve got to bear in mind tail risks - maybe the IPCC is wrong and we actually spiral out of control, there’s a Venus-type greenhouse effect and the oceans boil. You've got to pay attention to the really bad possible outcomes. But the central estimate is that climate change isn't an existential risk, or is very unlikely to be.

AI, on the other hand, could plausibly kill everyone, because it would have the ability to hunt down survivors. A bioengineered pandemic could kill everyone in a way that climate change seems very unlikely to do. So I don't think we should use ‘existential risk’, in that sense, when we talk about climate change. Or indeed nuclear weapons, where my understanding is that humanity would still survive, albeit in a much worse state for the first 50 or 100 years afterwards, even in the worst scenarios for nuclear war. But there is a quite realistic scenario in which AI is a genuine existential threat. So yes, I agree with that.

Can’t we just put this very basic rule into all AI: “Don't do any harm to humans”?

Answer to this after the jump. I also ask Tom why we couldn’t just pull the plug on an AI that was doing things we don’t like. We touch on some of the actual, rather terrifying scenarios that might unfold. Then we discuss the reasons not to be terrified - the reasons to think it will work out OK and might indeed be really great. And we talk about how governments should think about it. Finally, I ask Tom how he personally feels about all this - about his and his family’s future.

Subscribe now and you’ll get full access to previous FLASHPOINTS on human errors in healthcare, free speech, the gender gap in mental health, the ethics of immigration, and other thorny topics. Plus you get access to the whole stash of thought-stimulating brainfood behind the paywall in the regular Ruffian.