Probability in formal and non-formal situations

27 May 2023

Under what circumstances is it appropriate to think in terms of probabilities? I’m writing this in response to Vaughn Tan’s “How to think more clearly about risk” and the Interintellect discussion that followed it. Broadly speaking, I’m arguing against the idea that there are situations of uncertainty in which one shouldn’t think probabilistically.

As I understand it, Tan is drawing a distinction between situations where there is “formal risk”, in which it is possible to calculate probabilities, and those which are not formally risky and in which we can’t calculate probabilities. Tan doesn’t mention him, but it seems to me that he is drawing heavily on Frank Knight’s work. Knight made the distinction between risk and uncertainty: risk being that which is susceptible to measurement, and uncertainty being that which is unmeasurable. Knight’s point is that where the probabilities are measurable you can construct systems such as insurance that bundle up like cases so as to make the bad outcomes manageable. If the possibility of a bad outcome is unmeasurable then you can’t bundle in this way.

Tan’s argument is not just terminological though. More importantly, he thinks that using the wrong terminology encourages us to use the wrong tools:

In other words, when we call a situation “risky,” we almost always decide on how to act on it with some form of implicitly or explicitly quantitative risk analysis (whether we call it risk modeling, conformal prediction, cost-benefit analysis, expected outcomes theory, expected value theory etc). Quantitative risk analysis is almost the only approach we teach people to use in thinking about and acting on futures that aren’t completely certain.

This is a trap because quantitative risk analysis as a decisionmaking mindset only works as expected when the decisionmaking situation is actually one of formal risk (as in Example 1). In all other situations of not-knowing, quantitative risk analysis involves made-up numbers, the comfort of false certainty, and the real possibility of bad outcomes.

My argument then is that it is legitimate and appropriate to use quantitative thinking – assigning probabilities, calculating expected value – in all of these non-formal situations. Furthermore, I claim that probabilities are not objectively right or wrong. Rather they are an indication of your own personal uncertainty about an event.
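To make "calculating expected value" concrete, here is a minimal sketch of the arithmetic. The decision, the payoffs, and the 30% figure are all hypothetical, and the probability is a subjective estimate rather than a measured one.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over a list of (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical venture: I personally put 30% on success (net gain 400)
# and 70% on failure (net loss 100). Neither number is measured.
venture = [(0.3, 400), (0.7, -100)]
print(expected_value(venture))
```

The calculation is the same whether the 30% came from counting past cases or from a personal judgement; only the source of the number differs.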

Incidentally, I think Tan is too pessimistic and has missed an important example that he should count as formally risky by his own definitions:

Almost no real-life situations are formally risky other than flipping a fair coin or throwing fair dice. Close to 100% of the time, “risk” is used to describe several kinds of situations of not-knowing that aren’t formally risky.

The situation is actually not quite as bad as that, even on his own terms. There are situations where we don’t know the probabilities a priori but where we can make measurements: for example, the number of defective widgets made in a widget factory. The field of frequentist statistics exists almost entirely to make such situations formally risky.
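As a rough illustration of the widget case, the sketch below estimates a defect rate from an inspected sample and attaches a 95% interval using the normal approximation. The counts are invented, and the normal approximation is just one simple choice among several interval methods.

```python
import math

def defect_rate_interval(defective, inspected, z=1.96):
    """Return (rate, lower, upper) using the normal approximation to the binomial."""
    rate = defective / inspected
    standard_error = math.sqrt(rate * (1 - rate) / inspected)
    return rate, rate - z * standard_error, rate + z * standard_error

# Hypothetical inspection: 12 defective widgets found among 1000 inspected.
rate, lower, upper = defect_rate_interval(12, 1000)
print(f"defect rate {rate:.3f}, 95% interval {lower:.4f} to {upper:.4f}")
```

Nothing about the defect rate was known in advance here; it was measured, which is what turns the situation into one of formal risk in Tan's sense.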

My first claim then is simply that you can entirely legitimately put a numeric probability onto any proposition, regardless of how non-formal it is.

As evidence for this I will mention betting markets and Tetlock’s superforecasters. It turns out that both of those things produce well-calibrated probability estimates for all manner of complex, exceedingly non-formal events. The fact that these probabilities are well-calibrated – that they reflect the actual frequencies at which events occur – means that they are legitimate, and that you can reasonably proceed to use quantitative concepts such as expected value that work with that probability.
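Calibration can be checked with very little machinery: group forecasts by the probability that was stated and compare it with how often those events actually happened. The sketch below does that on a made-up track record of ten forecasts; the data is purely illustrative and is not taken from Tetlock or from any betting market.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each stated probability (rounded to 0.1) to (observed frequency, count)."""
    groups = defaultdict(list)
    for probability, happened in zip(forecasts, outcomes):
        groups[round(probability, 1)].append(happened)
    return {
        stated: (sum(results) / len(results), len(results))
        for stated, results in sorted(groups.items())
    }

# Hypothetical track record: ten forecasts and whether each event occurred.
forecasts = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
outcomes = [True, False, False, False, False, True, True, True, True, False]

table = calibration_table(forecasts, outcomes)
for stated, (observed, count) in table.items():
    print(f"stated {stated:.1f}  observed {observed:.1f}  (n={count})")
```

In this invented record the events given 20% happened one time in five and those given 80% happened four times in five, which is what perfect calibration looks like. A real check would need far more forecasts per group.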

Notably, we don’t know what mathematical processes are being used by superforecasters or the participants in a betting market. Not only are the events non-formal, but so is the prediction process.

So far as I can tell, the formal/non-formal distinction rests on something like the notion that there is a “real” probability for any event: For formal situations like coin tosses we know the real probability. For more complex situations we don’t know the real probability. And therefore we can’t work with probabilities in the latter cases because the numbers are “made up”.

And so my second claim is that these probabilities are instead an indication of your own personal uncertainty about the proposition. They are not in themselves right or wrong: There is no such thing as a real probability; there is no objective probability out there in the world that your statement can correctly or incorrectly refer to. Instead, the probability is a measure of your personal ignorance.

Consider this: If you were to observe a coin toss from a known starting position and could see the force put into the spin and where it was caught, then, knowing the weight of the coin and the drag from the air and so on, you could calculate how many times it had turned and thus whether it had landed heads or tails. It’s your lack of knowledge of those factors and the limits on your abilities to calculate that cause you to assign 50% to both heads and tails.

Having said that probability is a measure of your personal uncertainty, arguably there might be a correct initial prior and a correct way of updating such that two perfect Bayesian reasoners would always arrive at the same probability given the same sequence of evidence. I’m somewhat unsure about this, given that in order to make predictions you have to form models, and I don’t know whether there’s a perfect process of model formation. In any case, I’d expect that process to be computationally intractable, so you’re always going to be limited by trade-offs about the amount of thought that you’re willing to put into a question.
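A toy version of this point, under strong assumptions: suppose the only modelling choice is a Beta prior over the chance of a yes/no event, and updating is ordinary Bayesian conditioning. Then two reasoners who share a prior and see the same evidence must agree, while a reasoner with a different prior lands somewhere else. The names and priors below are hypothetical.

```python
from fractions import Fraction

def posterior_mean(prior_a, prior_b, evidence):
    """Posterior mean of a Beta(prior_a, prior_b) prior after 0/1 observations."""
    successes = sum(evidence)
    return Fraction(prior_a + successes, prior_a + prior_b + len(evidence))

# The same sequence of evidence is shown to everyone: 6 successes in 8 trials.
evidence = [1, 0, 1, 1, 0, 1, 1, 1]

alice = posterior_mean(1, 1, evidence)  # uniform prior
bob = posterior_mean(1, 1, evidence)    # same prior as Alice
carol = posterior_mean(5, 1, evidence)  # prior leaning towards success

print(alice, bob, carol)
```

Alice and Bob agree exactly, and Carol differs only because she started somewhere else. The sketch sidesteps the hard part raised above, which is where the model and the prior come from in the first place.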

How precise should one be when giving probabilities? Is there any value in saying that the probability of an event is 61.357% rather than saying that it’s around 60-ish? Mental work is work, and there is a cost to calculating an estimate with precision. Each additional digit of precision is additional work, usually with very rapidly diminishing returns. Furthermore, there is always going to be a lot of noise in subjective estimates. It doesn’t do you any good to know the fifth decimal place of my estimate when more significant digits than that are being swayed by whether I’m in a good mood that day.

Furthermore, there are situations in which you should expect your estimates to change drastically with only a little additional evidence, but without knowing in which direction. Take the covid-19 lab leak hypothesis for example. There is very little conclusive evidence to be had, so one day you might be swayed in one direction by a paper about raccoon dogs, and the next day you might be swayed in the opposite direction by a Twitter thread about a mahjong room. And subjectively it feels hard to put a precise number on these things because any number you give will change with the wind.

And how should we treat situations for which there is no precedent? For example, what’s the probability that there will be an alien invasion tomorrow?

I will argue that you do know something about alien invasions: you know at least that you haven’t yet experienced one. There may be no precedents for alien invasions, but there are many, many precedents for days without alien invasions. So you might use Laplace’s rule of succession in such a case, with the reckoning that there have so far been around 4.5 billion × 365 days of the Earth’s existence with no alien invasions. You could of course blend that with a model that says that aliens haven’t been seen through our telescopes either, so that might bring your probability down a little. And you might blend that again with a model that says that there’s some non-zero plausibility that some UFO sightings are in fact sightings of aliens, bringing the probability up somewhat. So you do have all sorts of evidence about aliens in one way or another, and you can figure out for yourself what credence you actually put in those different bits of evidence.
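The rule of succession estimate is easy to work out. With s occurrences in n trials, the rule gives (s + 1) / (n + 2) for the next trial. The sketch below plugs in the figures from the paragraph above, treating each day of the Earth's existence as one trial; that framing, and ignoring leap days, are simplifying assumptions.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: (successes + 1) / (trials + 2)."""
    return Fraction(successes + 1, trials + 2)

# Roughly 4.5 billion years of 365 days each, none of them with an alien invasion.
days_observed = 4_500_000_000 * 365
p_invasion_tomorrow = rule_of_succession(0, days_observed)

print(f"{float(p_invasion_tomorrow):.2e}")
```

That comes to roughly 6 in 10 trillion, which is only a starting point to be blended with the telescope and UFO considerations.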

It seems to me that there is a fear surrounding using probabilities because those probabilities might be wrong. Probabilities are numbers, and numbers are in the realm of mathematics, which means the realm of correct procedures and right and wrong answers. If you give the wrong number then you will get a bad mark from the teacher. Safer to throw up your hands and say that it’s unknowable.

But there is no teacher standing over you; no one to whom you can plead and whine that you don’t know the answer, and no one who will give you a bad mark except reality itself. Instead, you – yes, you! – can and must think things through for yourself. Probabilities are the tool we have for working sanely with our own ignorance.