# What Is Bayes’ Theorem?

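Consider a set, S, with a subset A and a subset X. The overlap, OVL, of A by X equals the overlap of X by A; this is simply a statement of identity. In what follows, each symbol also stands for the number of elements in the corresponding set, so a ratio such as X/A is a ratio of counts. Consequently,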

(OVL/A) / (OVL/X) = X/A

Also,

(X/S) / (A/S) = X/A

Therefore,

(OVL/A) / (OVL/X) = (X/S) / (A/S)

Or

OVL/A = ((X/S) / (A/S)) x OVL/X

Rearranging,

OVL/A = (X/S) x ((OVL/X) / (A/S))    (Equation 1)
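The identity can be checked numerically on any concrete set. The following is a minimal sketch, not part of the original argument: the set of integers 1 to 60 and the choice of multiples of 3 and of 4 as the subsets are purely illustrative assumptions, as is the helper name `frac`.

```python
from fractions import Fraction

# Hypothetical toy set: the integers 1..60 stand in for the elements of S.
S = set(range(1, 61))
A = {n for n in S if n % 3 == 0}   # subset A: multiples of 3 (illustrative)
X = {n for n in S if n % 4 == 0}   # subset X: multiples of 4 (illustrative)
OVL = A & X                        # overlap: elements that are in both A and X


def frac(part, whole):
    """Exact fraction of `whole` that lies in `part`, by element count."""
    return Fraction(len(part), len(whole))


left = frac(OVL, A)                                # OVL/A
right = frac(X, S) * (frac(OVL, X) / frac(A, S))   # (X/S) x ((OVL/X) / (A/S))

print(left, right)  # both sides are the same exact fraction
```

Exact fractions are used instead of floating-point numbers so that the two sides can be compared for strict equality.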

We may verbally designate:

OVL/A as the fraction of A that is (also) X

OVL/X as the fraction of X that is (also) A

A/S as the fraction of S that is A

X/S as the fraction of S that is X

By definition, probability is the fractional concentration of an element in a logical set. Therefore,

A/S is the probability of A with respect to S

X/S is the probability of X with respect to S

OVL/A is the probability of X with respect to A

OVL/X is the probability of A with respect to X

If we drop the ‘with respect to S’, because S is the full set under consideration, Equation 1 reads, in words:

The probability of X with respect to A equals the probability of X times a ratio, namely, the probability of A with respect to X divided by the probability of A.

This is Bayes’ theorem.
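For readers who know the conventional notation, the correspondence can be written out as follows. The symbols P(·) and P(· | ·) do not appear in the text above; they are introduced here only as the usual shorthand for ‘probability of’ and ‘probability of ... with respect to ...’.

```latex
\[
P(X \mid A) \;=\; P(X)\,\frac{P(A \mid X)}{P(A)},
\]
\[
\text{where}\quad
P(X \mid A) = \frac{\mathrm{OVL}}{A},\quad
P(A \mid X) = \frac{\mathrm{OVL}}{X},\quad
P(X) = \frac{X}{S},\quad
P(A) = \frac{A}{S}.
\]
```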

However, the wording is typically changed when the equation is identified as Bayes’ theorem.

The calculated probability of X with respect to A, i.e. OVL/A, is said to be the probability of X posterior to the calculation. In jargon, it is said to be the probability of the ‘event’ of X given the ‘truth’ of ‘event’ A.

The probability, X/S, is said to be the probability of X prior to the calculation, or simply the prior. The calculation is based on another factor, other than the prior. This factor or ‘antecedent’ is the ratio (OVL/X)/(A/S), i.e. the probability of A with respect to X divided by the probability of A. In standard usage, the numerator, OVL/X, is called the likelihood, and the denominator, A/S, is called the evidence or marginal likelihood.

The jargon renders Equation 1 as: Given the prior probabilities of A and of X, and given the probability of A with respect to X, we can calculate the probability of the event X when A is true.

## An Example, Without the Jargon

Let the fraction of persons with Native ancestry as a fraction of a population, X/S, be known to be 2 per million.
Let the fraction of persons with high cheekbones as a fraction of the population, A/S, be known to be 3 per million.
Let the fraction of persons with Native ancestry who also have high cheekbones, OVL/X, be 90 per 100.
By Eq. 1 we can calculate the fraction of persons with high cheekbones that also are of Native ancestry, OVL/A.
It is:
OVL/A = (2/10^6) x (0.9/(3/10^6)) = 0.6 or 60%

In this population, the fraction of persons with high cheekbones, who are also of Native ancestry, is 60%. Fractional concentration is the definition of probability. Therefore, we may say that, for this population, the probability that a person with high cheekbones is of Native ancestry is 60%.
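The arithmetic of this example can be reproduced with a short script. This is a minimal sketch using the three fractions stated above; the function name `fraction_of_a_that_is_x` is a hypothetical label chosen for illustration.

```python
from fractions import Fraction


def fraction_of_a_that_is_x(x_over_s, a_over_s, ovl_over_x):
    """Equation 1: OVL/A = (X/S) x ((OVL/X) / (A/S))."""
    return x_over_s * (ovl_over_x / a_over_s)


x_over_s = Fraction(2, 10**6)    # Native ancestry: 2 per million
a_over_s = Fraction(3, 10**6)    # high cheekbones: 3 per million
ovl_over_x = Fraction(90, 100)   # Native ancestry with high cheekbones: 90 per 100

ovl_over_a = fraction_of_a_that_is_x(x_over_s, a_over_s, ovl_over_x)
print(ovl_over_a, float(ovl_over_a))  # prints: 3/5 0.6
```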

## The Conclusion in Jargon and Its Implication

Although OVL/A, as a fractional concentration, is thereby a probability, i.e. a fraction of a logical set, we can lose our way through the use of jargon.

In jargon, we may say that the calculation, OVL/A, represents the certitude of the ‘truth’ of X when we know A is a fact.

In such jargon, we might say that the statement, ‘Person A from this population, who has high cheekbones, is of Native ancestry’, is true with a certitude of 60%. We might think that this implies that we have assessed the truth of the statement, ‘Person A is of Native ancestry’.

Furthermore, we might infer that the truth of whether Person A is or is not of Native ancestry is determined by population data. Then, from that inference we might extrapolate that the determination of truth, in general, is based on population data, i.e. on the identification of subsets.

## Another Example

Let X/S, the fraction of canines that are coyotes in a rural county of New York State, be 0.05% and A/S, the fraction of canines which are gray, be 1%. Further, let OVL/X, the fraction of coyotes which are gray, be 100%. Then, OVL/A, the fraction of gray canines which are coyotes, is 5%.

OVL/A = (X/S) x ((OVL/X) / (A/S)) = (0.05%) x (100% / 1%) = 5%

Probability is the fractional concentration of an element in a logical set. Therefore, the probability of coyotes in the set of gray canines is OVL/A = 5%.
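The same result can be seen by counting heads in a concrete population. The sketch below assumes a hypothetical total of 200,000 canines, a figure that is not given in the example and is chosen only so that the percentages yield whole numbers; any other total gives the same fractions.

```python
from fractions import Fraction

S = 200_000            # hypothetical total number of canines (an assumption)
X = S * 5 // 10_000    # coyotes: 0.05% of S, i.e. 100
A = S * 1 // 100       # gray canines: 1% of S, i.e. 2,000
OVL = X                # every coyote is gray, so OVL/X = 100%

# Direct count: the fraction of gray canines that are coyotes.
direct = Fraction(OVL, A)

# The same quantity computed through Equation 1.
via_eq1 = Fraction(X, S) * (Fraction(OVL, X) / Fraction(A, S))

# The complementary fraction: gray canines that are not coyotes.
not_coyote = 1 - direct

print(direct, via_eq1, not_coyote)  # prints: 1/20 1/20 19/20
```

With the counts in hand, Equation 1 is seen to add nothing beyond the direct ratio OVL/A; it is useful when only the three fractions, and not the counts, are known.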

## The Popular Interpretation of Bayesian Inference

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is popularly said to be the determination of the truth of X given that A is a fact.

The popular expression of Bayesian inference implies that it assesses the truth of a ‘belief’, X, based on some known fact, A.

## Assessment of the Popular Interpretation

The popular expression spurns direct assessment of the ‘belief’. This is because it is not a belief at all. It is a second marker, X, by which some elements of a set, S, are identified in addition to another marker, A. The Bayesian variable calculated is simply the fraction of the set A, which possesses the marker, X, in addition to the marker, A.

In the coyote example, S is the set of canines, X is the subset of coyotes, A is the subset of gray canines, and OVL is the overlap of the subsets, X and A.

The popular interpretation would be: The numerical example does not support the ‘belief’ that ‘this canine, which is known to be gray, is a coyote’. That ‘belief’ would be ‘true’ in only 5% of instances of observing a gray canine.

## A Comparable Interpretation

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is the quantification of prejudice.

The calculation quantifies the validity of the prejudice that characteristic or marker, X, is possessed by an element because that element has been identified as possessing characteristic or marker, A. In the numerical example, based solely on the observation or knowledge that a canine was gray, we would be prejudiced against its being a coyote at a level of 95%.