# Bayes Theorem: a probability within probability

If, like me, you can’t wrap your head around Bayes Theorem, give me five minutes and let me take a crack at it in two sections: What is Bayes Theorem, and Why is it hard to understand.

This isn’t a quick-and-dirty crash course. It isn’t even a typical attempt at explaining Bayes Theorem. It is rather a philosophical approach to Bayesian thinking. If you are looking for a quick formula to plug numbers into, this article isn’t for you. But if you have some time to sit down and learn to view the world through a Bayesian lens, you are in the right place. Eventually, the minor calculation details will unfold themselves.

## What is Bayes Theorem

Let me begin with a door example.

Imagine there are two doors in front of you. You are told that 50% of the people behind the blue door are engineers, as opposed to only 10% behind the red door. Now you are asked: behind which of these doors are you most likely to find John, who is an engineer? The reasonable choice seems to be the blue door, because frequency statistics tells us to pick the door with the highest probability of the desired outcome, an engineer in this case. But Bayesian thinking says we need more information: the number of people behind each door.

Now you are told that there are 10 people behind the blue door and 1000 people behind the red door. Will this new piece of information change your mind? If you feel something isn’t right but can’t explain it, you are at the doorstep of Bayesian thinking. If we count the engineers behind each door, we get 5 engineers behind the blue door and 100 engineers behind the red door. There are more engineers behind the red door than the blue door, so what? The probability of finding an engineer is still higher for the blue door, right? Yes, it is. But we don’t care about finding an engineer. We care about John: specifically, which door is more likely to have John behind it. Forget for a second about the people behind each door who aren’t engineers; we can ignore them completely. If I tell you we gathered 105 engineers, including John, and randomly placed 5 of them behind the blue door and the remaining 100 behind the red door, you would intuitively say John is probably behind the red door, with probability 100/105 versus 5/105 for the blue door. This is **Bayes Theorem**.

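Write $H$ for “the person is behind the blue door” and $E$ for “the person is an engineer”. For the blue door, 5 engineers $= p(E \mid H) \times 10 = 50\% \times 10$; for the red door, 100 engineers $= p(E \mid \neg H) \times 1000 = 10\% \times 1000$. Dividing the head counts by all 1010 people turns them into the priors $p(H) = 10/1010$ and $p(\neg H) = 1000/1010$, and Bayes Theorem gives

$$p(H \mid E) = \frac{p(E \mid H)\,p(H)}{p(E \mid H)\,p(H) + p(E \mid \neg H)\,p(\neg H)} = \frac{0.5 \times 10}{0.5 \times 10 + 0.1 \times 1000} = \frac{5}{105},$$

where the common factor $1/1010$ cancels.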
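To check the door arithmetic, here is a minimal Python sketch. It turns head counts into priors and applies Bayes Theorem with exact fractions. The dictionary layout and the `posterior_door` name are my own illustrative choices, not a standard API.

```python
from fractions import Fraction

# Door example: head counts and engineer rates per door.
doors = {
    "blue": {"people": 10, "engineer_rate": Fraction(1, 2)},    # 50% engineers
    "red": {"people": 1000, "engineer_rate": Fraction(1, 10)},  # 10% engineers
}

def posterior_door(doors):
    """Return p(door | engineer) for each door via Bayes Theorem."""
    total_people = sum(d["people"] for d in doors.values())
    # Prior p(door): share of all people standing behind that door.
    priors = {name: Fraction(d["people"], total_people) for name, d in doors.items()}
    # Joint p(engineer and door) = p(engineer | door) * p(door).
    joints = {name: doors[name]["engineer_rate"] * priors[name] for name in doors}
    # Evidence p(engineer): total probability summed over both doors.
    evidence = sum(joints.values())
    return {name: joint / evidence for name, joint in joints.items()}

# Absolute engineer counts: rate times head count.
for name, d in doors.items():
    print(name, d["engineer_rate"] * d["people"], "engineers")  # blue 5, red 100

post = posterior_door(doors)
print(post["blue"], post["red"])  # 1/21 20/21, i.e. 5/105 and 100/105
```

The blue door wins on rate (1/2 vs 1/10), but the red door wins on the posterior, because the prior (the head count) is 100 times larger.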

Wait, that makes sense, so how come it doesn’t feel that way?

## Why is it hard to understand

Bayes Theorem sounds so reasonable, so why is it so hard to fully grasp? I sum it up in three reasons: the sufficient/necessary error, the relativity error, and the dimensionality error.

**Sufficient/necessary error**

Both the blue door and John have the engineer attribute, but that attribute alone is not enough to conclude that John is behind the blue door.

In plain English, the diagram (a “Blue Door” circle drawn inside a larger “Engineers” circle) translates to “if it is the blue door, then everyone behind it is an engineer”. This is a deliberately simplified, all-or-nothing version of our doors. The sufficient condition, blue door, leads to the conclusion, engineers. Negating the necessary condition (there is no engineer) also leads to negating the sufficient condition (the door is not blue). However, knowing that there are engineers is not enough to say the door is blue. This is because “Engineers” is a bigger umbrella category that is not exclusive to “Blue Door”. John can be, but does not have to be, inside the “Blue Door” circle. It is perfectly reasonable for John to end up outside the “Blue Door” circle as long as he is still inside the “Engineers” circle. However, the probability of John being inside the “Blue Door” circle increases as that circle grows inside the “Engineers” circle. Thus, the sizes of both circles are critical pieces of information that Bayes Theorem asks for.

This kind of argument is called “deductive reasoning”, as opposed to “inductive reasoning”, where a conclusion holds with probability less than 100%. As you can see, even with deductive reasoning we cannot conclude the sufficient condition from the necessary one, let alone with inductive reasoning.
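The circle argument can be sketched with Python sets. Below is a toy model, assuming people are integer IDs, the “Blue Door” circle is a subset of the “Engineers” circle, and John is a uniformly random engineer. The function name `check_venn` and the 105-engineer setup are illustrative choices that mirror the thought experiment from the first section.

```python
from fractions import Fraction

def check_venn(n_engineers, n_blue, n_others=10):
    """Toy Venn diagram: a Blue Door circle drawn inside an Engineers circle.

    People are integer IDs; John is assumed to be a uniformly random engineer.
    """
    engineers = set(range(n_engineers))            # the Engineers circle
    blue_door = set(range(n_blue))                 # Blue Door circle, inside it
    everyone = set(range(n_engineers + n_others))  # plus some non-engineers
    # Sufficient condition: blue door => engineer.
    assert blue_door <= engineers
    # Contrapositive: not an engineer => not behind the blue door.
    assert all(p not in blue_door for p in everyone - engineers)
    # "Engineer" does not imply "blue door": these engineers sit outside the blue circle.
    outside_blue = engineers - blue_door
    # p(John in Blue Door | John is an engineer) = |Blue Door| / |Engineers|.
    return len(outside_blue), Fraction(len(blue_door), len(engineers))

# Enlarging the Blue Door circle inside a fixed Engineers circle raises John's odds.
for n_blue in (5, 50, 100):
    outside, p_blue = check_venn(105, n_blue)
    print(n_blue, outside, p_blue)
```

With 5 blue-door engineers out of 105, John lands in the blue circle with probability 1/21. Growing the circle to 100 raises that to 20/21. The logic never changes; only the circle sizes do.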

**Relativity error**

Psychology has a classic test, the Ebbinghaus illusion, that confuses people’s perception of size by changing the objects placed around the focal object.

Both orange circles, left and right, are the same size, but our perception is altered by the size of the black circles around each of them. As a result, we perceive the left orange circle as bigger than the right one.

Let’s translate this image into statistical lingo: the left orange circle occupies 70% of its frame (the orange circle plus its ring of black circles), while the right orange circle occupies only 10% of its frame. Which circle is bigger?
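A share only becomes a size once you know the frame it is a share of. Here is a small Python sketch of that idea. The frame areas 10 and 70 are hypothetical numbers picked so the two orange circles come out equal, and `subject_size` is an illustrative helper, not a library function.

```python
from fractions import Fraction

def subject_size(share, frame_size):
    """Absolute size of a subject that fills `share` of a frame of `frame_size`."""
    return share * frame_size

# Circles (hypothetical frame areas): one orange circle fills 70% of a small frame,
# the other fills 10% of a frame seven times larger; the orange areas are equal.
print(subject_size(Fraction(7, 10), 10), subject_size(Fraction(1, 10), 70))  # 7 7

# Doors from the first section: the higher share is the smaller count.
print(subject_size(Fraction(1, 2), 10), subject_size(Fraction(1, 10), 1000))  # 5 100
```

The same multiplication, share times frame, is what turns 50% and 10% into 5 and 100 engineers.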

We understand the world through a frame of reference made of two parts: the frame and the subject. To get a fuller picture, however, the real world often requires more dimensions than just a frame and a subject.

**Dimensionality error**

What happens when our frame of reference breaks down and we face problems that require higher-dimensional intuition? Grab a pencil and paper (no, literally): this part makes a lot more sense if you draw the ideas out.

Looking at the two bars head-on, there is no doubt that the left bar has more volume than the right bar. But Bayesian thinking requires additional information: the depth. The truth surfaces when we rotate the two bars clockwise to reveal the third dimension. Only then do we realize that the right bar actually has far more volume. This new piece of information, like the number of people behind each door or the size of the circles, shakes the core of our prior conclusion.
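The rotation trick can be sketched in Python. The `Bar` class and its dimensions are hypothetical, chosen only so that the front view and the volume disagree.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    """A rectangular bar; only width and height are visible in the front view."""
    width: int
    height: int
    depth: int  # hidden until we rotate the bar

    def front_area(self) -> int:
        return self.width * self.height

    def volume(self) -> int:
        return self.width * self.height * self.depth

# Hypothetical dimensions: a big-but-shallow bar vs a small-but-deep one.
left = Bar(width=4, height=5, depth=1)
right = Bar(width=2, height=3, depth=10)

print(left.front_area(), right.front_area())  # 20 6  -> left looks bigger
print(left.volume(), right.volume())          # 20 60 -> right holds more
```

In the door example, the engineer rate plays the visible face and the head count plays the hidden depth: 50% × 10 = 5 versus 10% × 1000 = 100.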

The three-dimensional picture is just an abstraction to help us understand Bayesian thinking. Beyond the spatial analogy, think of extra dimensions as labels for our variables or features. You do not have to think in the fourth dimension; the takeaway is that Bayesian thinking requires that extra information. This is why I encouraged you to draw the ideas out: with so many moving parts, our brain can quickly lose track.

## Conclusion

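Bayesian thinking reminds me of a famous parable dating to around 500 BCE: the blind men and the elephant. In it, each man touches a different part of the elephant and confidently describes something completely different.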

It reminds us how quickly we can lose track of things when there are many layers, like a Russian doll. Even though this 2,500-year-old parable is still relevant today, its implications are almost unrecognizable at first glance; it connects the past and the future. Thank you for reading. For additional sources, I find the link below the best at explaining Bayes Theorem.