Bayes’ theorem


Bayes’ theorem is today considered one of the main theorems in statistics, and one of the most applied formulae in science.

Its importance grew steadily until the middle of the last century, and it is now considered essential in all statistics courses and is applied in almost every field of research, not least in bioinformatics, where it has been used extensively in the analysis of biological systems.

At first glance, Bayes’ theorem can seem confusing, counterintuitive, and hard to grasp. For many, statistics, like other areas of mathematics, is simply not intuitive.

However, if we analyse the thought processes leading Bayes to his theorem, we see that these are natural and logical ways of thinking.

History of Bayes’ theorem

Bayes’ theorem is named after the Reverend Thomas Bayes. Around 1740, Bayes performed a thought experiment: he imagined sitting with his back to a flat, square table and launching a ball onto it without knowing where it would land.

Then, he thought of launching a second ball, this time asking his assistant if the ball landed to the left, to the right, in front of, or behind the first ball. Using this system, he was able to narrow the position of the first ball with each new launch.

It is not possible to know exactly where the first ball landed using this system; but it created a method, or a way of thinking, whereby each new piece of evidence improved the estimate.

Bayes never published his idea. After his death in 1761, his friend Richard Price found his notes, re-edited them, extended them (at some points), and eventually published them.

Price contributed so much that, by modern standards of attribution, we might fairly call the result the Bayes-Price theorem. Despite Price’s work and the publication, the theorem remained largely unknown until it was rediscovered, reinterpreted, and brought to its modern formulation in 1774 by Pierre-Simon Laplace.

Laplace stated that “the probability of a cause (given an event) is proportional to the probability of the event (given its cause)”. In more modern words: Bayes’ theorem describes the likelihood of an event based on prior knowledge of conditions that might be related to the event.

Classical Representation and Examples

The most classical representation of the formula is the following:

P(A|B)=\frac{P(B|A)P(A)}{P(B)}

Where A and B are the two events that we want to analyse.

P(A) and P(B) are the marginal probabilities of event A and event B, each observed without reference to the other (prior probabilities).

P(A|B) (read: P of A given B) is the conditional probability: the probability of event A, given that event B has occurred (posterior probability).

Vice versa, P(B|A) is the probability of B given A.

Therefore, this formula relates the prior probabilities to the posterior probabilities, making it possible to integrate new observations into a model built on previous observations.
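As a quick illustration, the formula can be written as a one-line function. This is a minimal sketch with illustrative names, not part of any particular library:

```python
def bayes(p_a, p_b, p_b_given_a):
    """Posterior P(A|B) = P(B|A) * P(A) / P(B), by Bayes' theorem."""
    return p_b_given_a * p_a / p_b
```

The two examples that follow are just calls to this function with different priors and likelihoods.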

Let us now move on from this formal terminology and look at some real examples.

Example 1:

In this example, let us consider a school composed of 60% boys and 40% girls, in which all boys have a short haircut while the percentages of girls with long hair and girls with short hair are equal.

We meet a student with a short haircut. What is the probability that the student is a girl?

We can use the Bayes formula to answer this question. In order to do so, let us identify the elements of the formula.

In this case, four elements are involved: the gender of the students, boys (B) and girls (G); and the hairstyle, long (L) or short (S).

The hairstyle events depend on the gender of the student, so hairstyle is our dependent variable, while the gender of the student is the independent variable.

From the composition of the school, the probability that a student is a girl (event G) is P(G) = 40%. Conversely, the probability that a student is a boy (event B) is P(B) = 60%.

The probability that a student has a short haircut (event S), considering the entire school, is P(S): all boys plus half of the girls, 60\%+(40\%/2)=80\% of all students.

The probability that a student has a long haircut (event L) is P(L)=1-P(S)=20\%.

The conditional probability that a student has short hair, given that the student is a girl, P(S\mid G), is 50%, because the girls are equally divided between long and short haircuts.

The conditional probability that a student has short hair, given that the student is a boy, P(S\mid B), is 100%, since all boys have short hair.

Having established this, we can now apply the formula to compute the conditional probability:

P(G|S)=\frac{P(G)P(S|G)}{P(S)}=\frac{0.4\times 0.5}{0.8}=0.25\rightarrow 25\%

P(B|S)=\frac{P(B)P(S|B)}{P(S)}=\frac{0.6\times 1}{0.8}=0.75\rightarrow 75\%

In conclusion, the probability that the student with short hair is a girl is 25%.

Interestingly, we could have computed P(G\mid S) even without knowing P(S) by doing:

P(G|S)=\frac{P(G)P(S|G)}{P(G)P(S|G)+P(-G)P(S|-G)}=\frac{0.4\times 0.5}{0.4\times 0.5+0.6\times 1}=\frac{0.2}{0.8}=0.25\rightarrow 25\%
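The whole calculation can be checked in a few lines of Python (a sketch; the variable names are mine):

```python
# Priors from the school's composition
p_g, p_b = 0.4, 0.6            # P(G): girl, P(B): boy

# Conditional probabilities of a short haircut (event S)
p_s_given_g = 0.5              # half of the girls have short hair
p_s_given_b = 1.0              # all boys have short hair

# Law of total probability: P(S) = P(G)P(S|G) + P(B)P(S|B)
p_s = p_g * p_s_given_g + p_b * p_s_given_b    # 0.8

# Bayes' theorem for each gender
p_g_given_s = p_g * p_s_given_g / p_s          # 0.25 -> 25%
p_b_given_s = p_b * p_s_given_b / p_s          # 0.75 -> 75%
```

The denominator line is exactly the expansion of P(S) used in the second formulation above.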

Example 2:

Let us assume that a test for a disease is positive. We know that the test performed has an accuracy of 99%, and that the disease it tests for affects one person out of a thousand. What is the likelihood of having the disease?

One might say 99% because of the accuracy of the test. This is not correct: the accuracy of the test is the probability of testing positive (event T) given that you have the disease (event D), that is, P(T\mid D).

What we really want to know is the probability of having the disease, given that the result of the test is positive: P(D\mid T).

Since the disease affects one person in a thousand, P(D) = 0.001. We can now apply Bayes’ theorem, so that we have:

P(D\mid T)=\frac{P(T\mid D)P(D)}{P(T)}=\frac{P(T\mid D)P(D)}{P(T\mid D)P(D)+P(T\mid -D)P(-D)}=\frac{ 0.99 \times 0.001 }{ 0.99 \times 0.001+ 0.01 \times 0.999 } = 0.09 = 9\%

9% is less scary than 99%. But why is this value so low?

We know that the disease affects only one person in a thousand and that the test has 99% accuracy. Therefore, in a sample of one thousand, only one person would be a true positive, but about ten people would result as false positives. See the figure below:

Positive and false positive results of the test: In this picture, we see a sample of 1,000 people tested with a test which has 99% accuracy. A priori, we know that only one individual in 1,000 actually has the disease (in red), and all the others are negative. However, because the test is only 99% accurate, about 10 individuals will test positive even though they do not have the disease, and are thus false positives (in grey). Therefore, according to Bayesian theory, the probability of a true positive result for a 99% accuracy test at an incidence of 1 in 1,000 is not 99%, as one might think, but only 1 in 11 (out of all the people who tested positive, true and false combined), thus 9%.

Therefore, an individual testing positive has a 9% probability of being a true positive.

1/11\ = 0.09 \rightarrow 9\%
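The same arithmetic, again as a Python sketch with my own variable names:

```python
p_d = 0.001                 # prevalence: one person in a thousand
p_t_given_d = 0.99          # accuracy: P(positive | disease)
p_t_given_not_d = 0.01      # false-positive rate: P(positive | no disease)

# Total probability of a positive test
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Posterior probability of disease given a positive test
p_d_given_t = p_t_given_d * p_d / p_t    # ~0.09 -> 9%
```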

Now, if we repeat the test and the result is still positive, what is P(D|T)?

Again, we can use Bayes’ theorem, taking the previous result as our new prior, P(D) = 0.09. Therefore:

\frac{0.99 \times 0.09}{0.99 \times 0.09+ 0.01 \times 0.91}=0.9073\rightarrow 90.73\%

A third positive test would give 99.89%, and so on.
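This repeated-test update can be sketched as a loop, feeding each posterior back in as the next prior (illustrative code; it uses the exact posteriors, so the numbers differ slightly from the rounded values in the text):

```python
def update(prior, sensitivity=0.99, false_positive=0.01):
    """One Bayesian update of P(disease) after a positive test result."""
    evidence = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / evidence

p = 0.001                       # initial prior: 1 in 1,000
for _ in range(3):              # three consecutive positive tests
    p = update(p)               # ~0.09, then ~0.91, then ~0.999
```

Each positive result sharpens the estimate, exactly in the spirit of Bayes’ original ball-on-the-table experiment.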
