How to think Bayes
A personal take on one of the most influential British thinkers of the last three centuries (plus the usual jamboree of juicy links).
Thomas Bayes, born 1701 or thereabouts. Solid looking chap.
A Ruffian subscriber remarked to me this week that, unlike with most newsletters, he never knows what to expect from The Ruffian. He liked that; for myself I’m not sure if it’s a bug or a feature, but at least for now I’m maintaining the uncertainty by doing something new: one long piece, followed by the usual miscellany of juicy links.
BACK TO THE IF
Thomas Bayes may be the most influential British thinker of the twenty-first century, despite having died in 1761 after publishing nothing of consequence. Bayes was a Presbyterian minister and amateur mathematician. Following his death, his friend Richard Price (one of the great social connectors of eighteenth-century intellectual life) discovered among Bayes’ papers an essay about probability. Price found it so interesting that he found a way to get it published. The method of analysis it describes has in recent years undergone a big resurgence, or maybe just a surgence, and is widely used in machine learning, neuroscience, epidemiology, climate science, and many other fields in academia and industry.
At the heart of the paper is what’s now known as Bayes’ theorem, which you can read here. I can just about grasp it mathematically (it’s not complicated, it’s just that I’m not very good at maths) but here I want to treat it as an idea, a way of thinking, a lens through which to see the world. It has helped me to understand aspects of the pandemic, among other things. What follows is my non-mathy, intuitive take on what it means to think like a Bayesian. Since I’m on somewhat shaky ground here, epistemically speaking, I’m leaving comments open in case anyone with expertise wishes to correct me.
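For the mathematically curious, the theorem itself is compact. In standard modern notation - my gloss, not Bayes’ own:

P(H | E) = P(E | H) × P(H) / P(E)

where H is a hypothesis, E is some new evidence, P(H) is your prior belief in the hypothesis, and P(H | E) is your updated - ‘posterior’ - belief once the evidence is in.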
The essential insight of Thomas Bayes is that the likelihood we should assign to an event is always conditional on prior likelihoods. The reason that’s important is that we often forget to properly assess those “priors” before making a judgement on the question at hand. We look at what’s in front of us and draw conclusions which are too narrow. In our hurry to arrive at ‘thens’ we neglect to interrogate the ‘ifs’ (or the ‘given thats’) which precede them.
You’re at home at night, everyone’s asleep, and you hear a creaking sound. You (well, me) immediately conclude that a burglar is in the house. But the Bayesian in you says, if there hasn’t been a burglary in my street for ten years, what’s the likelihood that my house has been broken into? Very low. So let’s look at other explanations first, like old timber contracting, before we revise a pretty solid prior that the house is free of invaders. (In order to speak Bayes I’m afraid you have to get used to the deployment of ‘prior’ as a noun). Another way of putting this is that Bayesians are always thinking about the context of evidence, not just the evidence itself.
In his book The Drunkard’s Walk, the physicist Leonard Mlodinow offers a dramatic example of Bayesian reasoning from his own life. One day in 1989, when the HIV/AIDS epidemic was ravaging the US, Mlodinow took a call from his doctor - the kind of call we all have nightmares about. It was highly likely, he was told, that he was infected with HIV, and would die within ten years. The doctor was reporting the result of an HIV test that Mlodinow had taken after being refused life insurance; it had come back positive. The test was known to produce a false positive result in only 1 in 1000 samples.
After recovering from the shock, Mlodinow used his understanding of Bayes to conclude that his odds of being uninfected were a lot better than 1 in 1000. Here is his reasoning. Mlodinow was heterosexual, monogamous, and not a drug user. That meant the prior probability of him being infected with HIV, based on epidemiological data, was about 1 in 10,000. So now Mlodinow had one piece of information telling him he was probably infected. But he also had all this other information telling him he was incredibly unlikely to be infected. He was presented with two unlikely, mutually exclusive possibilities: 1) the test was wrong; 2) he had contracted HIV. Crucially, the second possibility was far less likely than the first.
In numbers, there was a 1 in 1000 chance of the test being wrong. But there was only a 1 in 10,000 chance of Mlodinow being infected in the first place. Run that through Bayes’ theorem and it implies only about a 1 in 10 chance Mlodinow was infected. That’s not nothing - the positive test definitely changed the likelihood Mlodinow had HIV - just not by nearly as much as the doctor thought. The doctor had failed to absorb the fact that Mlodinow was vanishingly unlikely to have contracted HIV in the first place, and that one blood test, even one with a small false positive rate, was not nearly enough to conclude he was carrying the virus. As it turned out, he wasn’t.
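If you’d like to check the arithmetic, here’s a minimal sketch in Python. The prior and false positive rate are the figures Mlodinow reports; the assumption that the test catches essentially every true infection is mine, made to keep the sketch simple.

```python
# Mlodinow's numbers, run through Bayes' theorem.
# Assumption (mine, for simplicity): the test misses no true infections.
prior = 1 / 10_000          # prior probability of infection, given his risk profile
false_positive = 1 / 1_000  # probability the test wrongly flags an uninfected person
sensitivity = 1.0           # assumed probability the test catches a true infection

# Overall probability of a positive result, infected or not.
p_positive = prior * sensitivity + (1 - prior) * false_positive

# Bayes' theorem: P(infected | positive result).
posterior = prior * sensitivity / p_positive
print(f"P(infected | positive) = {posterior:.3f}")  # 0.091 - roughly the 1 in 10 above
```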
One lesson I draw from Bayes is that when we judge whether something is true or not, we shouldn’t be overly swayed by new information - by whatever is currently before our eyes. First, we should place it in context. We should consider whether the event is likely or unlikely, and to what degree, then decide how much weight to give the new data. This is something we are always in danger of forgetting, especially in a world of information overload, with everyone screaming at you THIS CHANGES EVERYTHING.
Let’s say a new poll out tomorrow shows Joe Biden with a 90% approval rating. Many hot takes immediately get written about how he has united the country. The Bayesian sniffs and says, “Hmm, let’s wait. My prior is that it’s extremely unlikely any president will unite America’s polarised electorate, and it will take more than one poll to shift that belief.” She will be aware that she might be wrong, and she’ll pay close attention to the next few polls - which may well reveal that the 90% poll was a one-off and that all those hot takes were meaningless.
We live in a world where new information, new data, rains down on us constantly, much of it engineered to seize our attention. Spend a lot of time reading the news and you start to feel like the latest thing is the most important thing. Actually, most news shouldn’t change your priors about most things very much. Bayesians aspire to a certain sang-froid: they are reluctant to get too excited by whatever new fact has just popped up. They don’t ignore it; it may be used to tweak a prior, as we’ll see. But they seek to fuse the new with the already known, and to weight each appropriately.
This helps you decide what to pay attention to. If I believe something is very probably true, I won’t be that interested in new evidence that says it is, since I’ve already priced that in. I’ll also demand a lot of counter-evidence to be persuaded it’s not. The danger of this, of course, is that I dismiss all evidence that doesn’t fit my priors, and become a victim of confirmation bias. In Bayesian terms, then, I must ‘update’ my priors in the face of contradictory evidence. If, after the floorboard creaks, I hear someone crashing around in the living room, then I’ll need to do some hasty updating. When the new evidence is not strong, I’ll only update a tiny amount. The Bayesian is always asking, just how much do I need to update my belief about reality, given this new information? A little, a lot, not at all?
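To put toy numbers on ‘a little, a lot, not at all’ (the figures below are invented purely for illustration), the strength of a piece of evidence can be captured by its likelihood ratio: how much more probable the evidence is if the hypothesis is true than if it’s false. Here’s a minimal sketch:

```python
def update(prior, likelihood_ratio):
    """Posterior probability after seeing evidence with the given likelihood
    ratio, i.e. P(evidence | hypothesis) / P(evidence | not hypothesis)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

burglary = 0.001  # a quiet street, no break-ins in a decade: a low prior

# A creaking floorboard is only slightly more likely with a burglar present:
print(update(burglary, 2))    # ~0.002 - update a tiny amount
# Crashing around in the living room is much more likely with one:
print(update(burglary, 500))  # ~0.33 - time for some hasty updating
```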
Proper Bayesians put numbers on their guesses about reality - on their priors, updates, and predictions - but you don’t have to do that to find this approach helpful. Take an example from social life (another arena where we have more ‘news’ and data than ever, in the form of texts, ‘likes’, emails, and the rest). Say you message your friend about something and he doesn’t reply. It’s easy to get into a state of anxiety about why he hasn’t returned your message. Is he mad at me? Did I upset him? The Bayesian in you says chill. Yes, the unreturned text is a little bit of evidence that your friend is deliberately ignoring you. But you also have all this prior evidence - your friendship, your friend’s warm and dependable personality - telling you he’s probably just busy right now. So you should update your prior very slightly and wait for more data, while staying fairly confident you’re still friends. (Thanks to master Bayesian Julia Galef for this example.)
Intuitively, I think about updates as ‘votes’. That Biden poll would be like one vote in some internal forum in my head. It’s a vote for the “Biden has united the nation” belief. I won’t ignore it: I’ll put it on the record, as it were, and slightly reduce my confidence that Biden can’t unite the nation. But it’s just one vote, and I’ll need many more votes on that side of my internal debate before fully changing my mind.
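As it happens, the votes metaphor has a tidy mathematical counterpart (a standard way of rewriting the theorem, not something from my own head): expressed in log-odds, Bayes’ theorem says each new piece of evidence simply adds its weight to the running total -

log-odds(H | E) = log-odds(H) + log( P(E | H) / P(E | not H) )

- so strong evidence casts a big vote, weak evidence a small one, and the votes accumulate.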
All of the above might seem merely reasonable, but even experts in risk don’t think like this much of the time, and make mistakes as a consequence. Case in point: the absurd and dangerous reluctance of EU authorities to authorise the AZ/Oxford vaccine for over-65s. The reason given was that AZ’s Phase III trial provided very little data on efficacy among over-65s. True. But what’s the prior probability of a vaccine that’s effective in all other age groups suddenly becoming ineffective among older people? Very low indeed. Vaccines don’t work like that, by and large. Not to mention there was solid evidence from Phase II trials that the vaccine induces an immune response in that age group.
The regulators should have factored in that contextual evidence and concluded that the vaccine was very likely to work among over-65s, in the absence of strong disconfirming evidence from trials. (If you want a more extensive critique of how many regulators have behaved, read this post from a Marginal Revolution commenter. The author doesn’t use the word Bayesian but that’s his mode of analysis, from the opening line onwards).
There is something about getting too close to new information, about having one’s face pressed right up against the data, that makes very smart people, even those who might grasp these principles in the abstract, forget to be Bayesian. And I think that’s what makes regulators susceptible to mistakes: they become fixated on the evidence in front of them - in this case, clinical trial data - and fail to reason from first principles.
It’s commonplace to say that if there isn’t conclusive evidence either way on a question, we should reserve judgement. That’s correct insofar as it means you shouldn’t be certain of your answer, but if you think like a Bayesian, you can still have a definite point of view on the matter, a sketch map of the territory.
To offer a slightly controversial illustration of this principle: I think there’s a pretty good chance that the Covid-19 pandemic was caused by an accident in a Chinese laboratory. I don’t know what that chance is, but it’s a solid one (somewhere between 30% and 60%, maybe?). Some scientists have dismissed this theory on the basis that there’s no evidence for it, and who am I to argue with virologists? Nobody. I have zero claim to expertise here. But I think you only need to know a few simple facts to adopt a prior belief in its possibility.
The first fact is that the pandemic started in Wuhan, China. The second is that the only laboratory in that very large country licensed to experiment with deadly pathogens is in Wuhan. The third is that scientists in China - and elsewhere, including the US - have for decades experimented with deadly viruses, and lab accidents do happen. There is more on this - see here or here - but I think those brute facts are enough to establish a prior belief that the theory is at least plausible, and we don’t need direct biological evidence of a link to the lab to say so.
Of course, none of this means the theory is true either. Thinking like a Bayesian means getting comfortable with uncertainty - with admitting to yourself and others that on most matters you are making a guess, albeit one informed by evidence and reasoning. That’s hard, because we like to project confidence in our views on reality, and we’re often rewarded for doing so. It’s also hard because even when you profess uncertainty, it can be interpreted as veiled certainty. When you say “I think it’s plausible that…” people often assume you mean “I think it’s true that…” - and maybe you do, secretly. But Bayesians insist that the middle ground between believing and disbelieving is real, and indeed that it’s where all the interesting action takes place.
MISCELLANY
I’m delighted that Adam Grant has chosen CONFLICTED as one of his Spring picks.
LinkedIn isn’t a place one expects to find emotional depth but this is a really thoughtful self-reflection on the experience of being made redundant during lockdown, by ad executive Paul Wilson.
How to get someone to help you (in three tweets).
A brilliant thread on how to direct actors, full of insight into creative collaboration and leadership.
Magisterial interview with Howard University historian Daryl Michael Scott, questioning fashionable views on racism in US history, and urging academics to remember they’re meant to be in the business of truth. (Worth doing the free sign-up to get access).
Fact: America’s spending on Covid relief will total $5.5 trillion in 12 months. Adjusted for inflation, World War II cost the American government $4.8 trillion. (That sounds amazing, and it is, but it’s also a lot to do with how much bigger the US economy is now than it was then).
“On July 2, 1982, Larry Walters attached 43 balloons to his lawn chair, filled them with helium, put on a parachute, and strapped himself in…”
Perhaps I’ll do a longer post on why Amazon does not deserve its vilification by the left, but its leadership on the American minimum wage is obviously one reason. That is worth infinitely more than the airy ‘social purpose’ signalling of most big companies. (Also Jeff Bezos really values clear writing and knows how much time it takes, please recommend The Ruffian and buy my book).
Quote: “There are no solutions, only trade-offs.” Thomas Sowell.
I really like string quartet music, and I really like Come Together by The Beatles. Still, the idea of a string quartet version of Come Together sounds very bad. But oh WOW this is amazing. Kudos to Quatuor Ébène.
Please do spread the word about The Ruffian. Referrals and recommendations have dropped off in the last few weeks, possibly because I stopped asking you for them in order to promote CONFLICTED instead. Well, too bad, because now I’m asking you to do both: spread the word about this FREE newsletter (using this link) AND buy my book. If I had to choose…don’t make me choose.
How to buy CONFLICTED - here are links to your favourite booksellers (UK and US).
Seems to me Bayes is also evident in the OODA loop - observe, orient, decide, act - developed by the American Air Force colonel John Boyd. Orient - or context - is critical.