Bayes Theorem: a probability within probability
If, like me, you can’t wrap your head around Bayes Theorem, give me five minutes and let me take a crack at it in 2 sections: What is Bayes Theorem and Why is it hard to understand.
This isn’t a quick and dirty crash course. This isn’t even a normal attempt at Bayes Theorem. It is rather a philosophical approach to Bayesian thinking. If you are looking for a quick formula to plug numbers into, then this article isn’t for you. But if you have some time to sit down and see how to view the world through a Bayesian lens, then you are in the right place. And eventually, the minor calculation details will unfold by themselves.
What is Bayes Theorem
Let me begin with a door example.
Imagine there are 2 doors in front of you. You are told that 50% of the people behind the blue door are engineers, as opposed to only 10% behind the red door. Now you are asked: behind which of these doors are you more likely to find John, an engineer? The obvious choice seems to be the blue door, because simple frequency statistics tell us to pick the door with the highest probability of the desired outcome, an engineer in this case. But Bayesian thinking says we need more information: the number of people behind each door.
Now you are told that there are 10 people behind the blue door and 1000 people behind the red door. Will this new piece of information change your mind? If you feel something isn’t right but you can’t explain it, then you are at the doorstep of Bayesian thinking. If we count the engineers behind each door, we get 5 engineers for the blue door and 100 engineers for the red door. There are more engineers behind the red door than the blue door, so what? The probability that a random person is an engineer is still higher for the blue door, right? Yes, it is. But we don’t care about that. We care about John: specifically, which door is more likely to have John behind it, not which door has the higher share of engineers. Forget for a second about everyone who isn’t an engineer behind each door. We can ignore them completely. If I tell you we gathered 105 engineers, including John, and randomly sent 5 to the blue door and the remaining 100 to the red door, intuitively you would say John is probably behind the red door, with probability 100/105 versus 5/105 for the blue door. This is Bayes Theorem.
The share of engineers behind the blue door, p(E|H) = 50%, multiplied by the population behind the blue door, 10, equals 5. The share of engineers behind the red door, p(E|~H) = 10%, multiplied by the population behind the red door, 1000, equals 100. Dividing each population by the total of 1010 people turns them into the priors p(H) = 10/1010 and p(~H) = 1000/1010, and the ratio stays the same: p(H|E) = 5/105 and p(~H|E) = 100/105. Therefore, the chance of finding John behind the red door is far higher.
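The door arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and the variable names are made up for illustration, and it assumes that John is equally likely to be any one of the 1010 people, so the prior for each door is simply its share of the total population.

```python
from fractions import Fraction


def posterior(prior_h, likelihood_h, likelihood_not_h):
    """Return p(H|E) for a two-hypothesis problem using Bayes Theorem."""
    # p(E) = p(E|H) p(H) + p(E|~H) p(~H)
    evidence = likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)
    return likelihood_h * prior_h / evidence


# Numbers from the door example.
blue_people, red_people = 10, 1000
total_people = blue_people + red_people

# Assumption: the prior for each door is its share of all the people.
prior_blue = Fraction(blue_people, total_people)   # p(H)
p_eng_given_blue = Fraction(1, 2)                  # p(E|H)  = 50%
p_eng_given_red = Fraction(1, 10)                  # p(E|~H) = 10%

p_blue_given_eng = posterior(prior_blue, p_eng_given_blue, p_eng_given_red)
p_red_given_eng = 1 - p_blue_given_eng

print(p_blue_given_eng, p_red_given_eng)  # 1/21 20/21
```

The fractions 1/21 and 20/21 are just 5/105 and 100/105 in lowest terms, so the formula agrees with the head count of engineers.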
Wait, that makes sense, but how come it doesn’t?
Why is it hard to understand
Bayes Theorem sounds so reasonable, so why is it so hard to fully grasp? I boil it down to 3 reasons: relativity error, dimensionality error, and sufficient/necessary error.
Relativity error
In psychology, there is a test that distorts people’s perception of size by changing the objects placed next to the object in focus.
Both orange circles on the left and right have the same size, but our perception has been altered by the size of the black circles around each of them. As a result, we perceive the left orange circle to be bigger than the right.
Let’s translate this image into statistical lingo: the left orange circle occupies 10% of the surface area, while the right orange circle occupies 70% of the surface area. Which circle is bigger?
We understand the world through a frame of reference made of two parts: the frame and the subject. However, the real world often requires more dimensions than frame and subject to give an accurate picture.
Dimensionality error
What happens when our frame of reference breaks down and we are faced with problems that require higher dimensional intuitions?
At first glance, there is no doubt that the left bar has more volume than the right bar. But Bayesian thinking requires additional information: the depth. The truth surfaces when we rotate these 2 bars clockwise into the third dimension. Only then do we realize that the right bar actually has a lot more volume. This new piece of information, like the number of people or the size of the circles, shakes the core of our prior conclusion.
The 3 spatial dimensions are just an analogy to help us understand Bayesian thinking. Beyond the spatial analogy, think of extra dimensions as labels for our variables or features. You do not have to think in the 4th dimension; the takeaway is that Bayesian thinking requires that extra information. I encourage you to draw the ideas out, because with so many moving parts, our brain can quickly lose track, and nothing confuses our brain faster than sufficient/necessary statements.
Sufficient/necessary error
Both the blue door and John share the attribute of engineers, but the engineer attribute alone is not enough to conclude that John is behind it.
Allow me to adjust the previous example a little to make a point. Let’s change the statement “50% of the people behind the blue door are engineers” to 100%, and we get the diagram below. Let’s also call “Blue Door” the sufficient statement and “Engineers” the necessary statement.
In plain English, the diagram above translates to “if it is the blue door, then everyone behind it is an engineer”. The sufficient statement, blue door, leads to the conclusion, engineers. The negation of the necessary condition, there is no engineer, also leads to the negation of the sufficient condition, the door is not blue (imagine that the blue doors inside the “Blue Door” circle are the only blue doors in this universe). However, stating that there are engineers is not enough to say the door is blue, because, as you can see, there is plenty of room to be an engineer inside the “Engineers” circle without being inside the “Blue Door” circle. In other words, “Engineers” is a bigger umbrella category that is not exclusive to “Blue Door”. John can, but does not have to, be inside the “Blue Door” circle. It is perfectly reasonable for John to end up outside the “Blue Door” circle as long as he is still inside the “Engineers” circle. However, the probability of John being inside the “Blue Door” circle increases as that circle grows inside the “Engineers” circle. Thus, the size of the “Blue Door” circle, the total population behind the blue door, is the critical bit of information that Bayes Theorem asks for.
This kind of argument is called “deductive reasoning”, where the probability of engineers behind the blue door is 100%, as opposed to “inductive reasoning”, where the probability is less than 100%. As you can see, even with deductive reasoning we still cannot conclude the sufficient statement from the necessary statement, let alone with inductive reasoning, where the probability is less than 100%.
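To see how the size of the “Blue Door” circle drives the answer, here is a rough sketch in Python. The helper `p_blue_given_engineer` and the figure of 105 engineers in the whole “Engineers” circle are hypothetical, chosen only to echo the earlier door example. It assumes everyone behind the blue door is an engineer, so p(E|H) = 1 and Bayes Theorem reduces to the ratio of the two circle sizes.

```python
from fractions import Fraction


def p_blue_given_engineer(blue_door_size, total_engineers):
    """p(Blue Door | Engineer) when everyone behind the blue door is an engineer."""
    if not 0 <= blue_door_size <= total_engineers:
        raise ValueError("the Blue Door circle must fit inside the Engineers circle")
    # With p(E|H) = 1, p(H|E) = p(H) / p(E) = blue_door_size / total_engineers.
    return Fraction(blue_door_size, total_engineers)


TOTAL_ENGINEERS = 105  # hypothetical size of the "Engineers" circle

# Grow the "Blue Door" circle and watch the probability rise.
results = {
    size: p_blue_given_engineer(size, TOTAL_ENGINEERS)
    for size in (5, 50, 105)
}

for size, p in results.items():
    print(f"blue door holds {size:>3} engineers -> p(blue | engineer) = {float(p):.2f}")
```

Only when the “Blue Door” circle fills the whole “Engineers” circle does knowing that John is an engineer guarantee that he is behind the blue door.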
Conclusion
Bayesian thinking reminds me of a famous parable dating from around 500 BCE: the blind men and an elephant.
It reminds us how quickly we can lose track of things when there are many layers, like a Russian doll. This 2500-year-old parable is still relevant today, connecting the past and the future, even though its implication is almost unrecognizable. Thank you for reading this. For additional sources, I find the link below the best at explaining Bayes Theorem.