Bayes’ theorem is a simple algebraic relationship among fractions of a set or population of elements. Judging from common expositions of it, one would think that it was complicated in itself and that its implications resolved some mystery.
The population of elements to which Bayes’ theorem applies may be viewed as a surface over which the population density varies. A Bayesian surface is partitioned by two separate criteria, which need not be statistically independent of each other. One criterion may be viewed as dividing the surface into two horizontal rows, while the other divides it into two vertical columns. The result is four quadrants, which differ in population because the population density is not uniform. Importantly, the four quadrants are mutually related: each may be expressed by the same algebraic formulation in its relationships to the other three.
The two rows formed by the horizontal partitioning may be distinguished as the horizontal top row, HT, and the horizontal bottom row, HB. The two columns formed by the vertical partitioning may be distinguished as the vertical left column, VL, and the vertical right column, VR. The two rows, HT and HB, add up to the total, T, as do the two columns, VL and VR. The quadrants are designated Q1 (top left), Q2 (top right), Q3 (bottom left) and Q4 (bottom right). Each row or column is the sum of two quadrants, e.g. HT = Q1 + Q2 and VL = Q1 + Q3.
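The bookkeeping of rows, columns and quadrants can be sketched in Python. The quadrant counts below are hypothetical, and `marginals` is an illustrative helper, not part of any library.

```python
# Quadrant layout, consistent with HT = Q1 + Q2 and VL = Q1 + Q3:
#            VL    VR
#   HT      Q1    Q2
#   HB      Q3    Q4

def marginals(q1, q2, q3, q4):
    """Return the row, column and total sums of a 2 x 2 population."""
    ht = q1 + q2  # horizontal top row
    hb = q3 + q4  # horizontal bottom row
    vl = q1 + q3  # vertical left column
    vr = q2 + q4  # vertical right column
    t = ht + hb   # total population
    assert t == vl + vr  # rows and columns partition the same total
    return {"HT": ht, "HB": hb, "VL": vl, "VR": vr, "T": t}

# Hypothetical counts for Q1..Q4.
print(marginals(28, 12, 22, 38))
```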
In the Tabulated Bayesian Population, the column VR has the role of non-VL. Thus, rather than being one column, VR may be any number of columns whose sum is the complement of VL. Analogously, the row HB has the role of non-HT. Consequently, Bayes’ theorem is applicable to any number of rows and any number of columns, where the additional rows and columns are treated in their sums as non-HT and non-VL, i.e. as HB and VR, respectively.
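A minimal sketch of this collapsing, assuming a hypothetical table of counts with three rows and four columns. The first row plays HT and the first column plays VL; every other row is summed into HB and every other column into VR. The helper name `collapse` is illustrative.

```python
def collapse(table):
    """Collapse a table of counts (a list of rows) to the four quadrants.

    Row 0 plays HT; the sum of all other rows plays HB (non-HT).
    Column 0 plays VL; the sum of all other columns plays VR (non-VL).
    """
    q1 = table[0][0]
    q2 = sum(table[0][1:])
    q3 = sum(row[0] for row in table[1:])
    q4 = sum(sum(row[1:]) for row in table[1:])
    return q1, q2, q3, q4

# Hypothetical counts.
table = [
    [5, 3, 2, 7],  # top row, HT
    [4, 6, 1, 2],  # this row and the next together play HB
    [1, 0, 8, 1],
]
print(collapse(table))
```

Nothing is lost in the collapse: the four quadrants still add up to the total of the original table.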
Bayes’ theorem, in its algebraic expression, which focuses on Q1, is:
Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1
Writing out the right-hand side as (Q1/HT) * (T/VL) * (HT/T), the two HT terms cancel, as do the two T terms. This leaves the identity Q1/VL ≡ Q1/VL, which proves the validity of Bayes’ theorem.
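Because the cancellation is exact, the identity can be checked with exact rational arithmetic rather than floating point. The sketch below uses Python's `fractions.Fraction` on randomly generated hypothetical quadrant counts; `eq1_rhs` is an illustrative name.

```python
import random
from fractions import Fraction

def eq1_rhs(q1, q2, q3, q4):
    """Right-hand side of Eq. 1: ((Q1/HT) / (VL/T)) * (HT/T)."""
    ht, vl = q1 + q2, q1 + q3
    t = q1 + q2 + q3 + q4
    return (Fraction(q1, ht) / Fraction(vl, t)) * Fraction(ht, t)

rng = random.Random(0)
for _ in range(1000):
    # Positive counts, so that no denominator is zero.
    q1, q2, q3, q4 = (rng.randint(1, 100) for _ in range(4))
    assert eq1_rhs(q1, q2, q3, q4) == Fraction(q1, q1 + q3)  # equals Q1/VL exactly
print("Eq. 1 holds exactly for 1000 random tables")
```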
In the application of Bayes’ theorem, the numerical values of the numerators and denominators of the fractions are not given. What are given are the numerical values of the three fractions on the right-hand side of Eq. 1, which permit the calculation of the numerical value of Q1/VL as a fraction.
Reciprocity of Various Expressions of Bayes’ Theorem
Eq. 1 expresses Bayes’ algebraic formulation by focusing on the top left quadrant, Q1. However, it must be remembered that the same algebraic formulation, relating a quadrant to the other three, applies to any quadrant. This can be seen by successively designating each of the other three quadrants as Q1, rotating the population surface in increments of 90 degrees.
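As a check on this symmetry, the sketch below applies the Eq. 1 pattern to each quadrant in turn, with that quadrant's own row and column playing the roles of HT and VL. The counts are hypothetical, and `check_all_quadrants` is an illustrative name.

```python
from fractions import Fraction

def check_all_quadrants(q1, q2, q3, q4):
    """Apply the Eq. 1 pattern to every quadrant of a 2 x 2 population.

    For a quadrant Q lying in row R and column C, the pattern is
    Q/C = ((Q/R) / (C/T)) * (R/T).
    """
    t = q1 + q2 + q3 + q4
    cells = {"Q1": q1, "Q2": q2, "Q3": q3, "Q4": q4}
    rows = {"Q1": q1 + q2, "Q2": q1 + q2, "Q3": q3 + q4, "Q4": q3 + q4}
    cols = {"Q1": q1 + q3, "Q2": q2 + q4, "Q3": q1 + q3, "Q4": q2 + q4}
    results = {}
    for name, q in cells.items():
        r, c = rows[name], cols[name]
        rhs = (Fraction(q, r) / Fraction(c, t)) * Fraction(r, t)
        assert rhs == Fraction(q, c)  # same formulation, any quadrant
        results[name] = rhs
    return results

print(check_all_quadrants(28, 12, 22, 38))
```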
In the application of Bayes’ theorem, Eq. 1 is viewed as representing Q1/VL as directly proportional to HT/T, where the constant of proportionality is (Q1/HT) / (VL/T). Because each of the fractions of Eq. 1 is a ratio of a subset to a set, each of the fractions is a probability. Expressing the direct proportionality of Eq. 1 using the word probability rather than the word fraction yields: The probability of quadrant Q1 with respect to the column VL is directly proportional to the probability of the row HT with respect to the total population, T.
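To make the reading of these fractions as probabilities concrete, the sketch below draws elements uniformly at random, with replacement, from a hypothetical population with quadrant counts 28, 12, 22 and 38 (assumed values for illustration). Among the draws that land in column VL, the relative frequency of Q1 settles near the fraction Q1/VL.

```python
import random

# Hypothetical population: each element is labelled with its quadrant.
population = ["Q1"] * 28 + ["Q2"] * 12 + ["Q3"] * 22 + ["Q4"] * 38

rng = random.Random(1)
draws = rng.choices(population, k=100_000)  # uniform draws with replacement

in_vl = [d for d in draws if d in ("Q1", "Q3")]  # draws that land in column VL
estimate = in_vl.count("Q1") / len(in_vl)        # relative frequency of Q1 within VL
exact = 28 / (28 + 22)                           # the fraction Q1/VL
print(round(estimate, 3), exact)
```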
Typically, the numerical value of the probability, HT/T, is given along with the numerical value of the constant of proportionality. The numerical value of the probability, Q1/VL, is calculated. Common jargon refers to the given probability, HT/T, as the prior probability and the calculated probability, Q1/VL, as the posterior or the final probability.
If the numerical value of Q1/VL were given along with the constant of proportionality, then the probability HT/T could be calculated. We would be viewing Eq. 1 in the form,
HT/T = ((VL/T) / (Q1/HT)) * (Q1/VL) Eq. 2
Common jargon would then label Q1/VL as the prior probability and HT/T as the posterior or final probability, i.e. the reverse of the jargon applied to Eq. 1.
Eq. 1 and Eq. 2 are fully equivalent. With respect to Eq. 1, common jargon, in determining the probability of a hypothesis, would claim that the prior probability of row HT with respect to the total was revised to the posterior or final probability of Q1 with respect to column VL.
With respect to Eq. 2, common jargon would claim that the prior probability of Q1 with respect to column VL was revised to the posterior or final probability of row HT with respect to the total.
What this apparently contradictory jargon means is (1) that, given the constant of proportionality and HT/T, Q1/VL can be calculated, while (2) given the constant of proportionality and Q1/VL, HT/T can be calculated. Both probabilities remain completely distinct. Neither replaces the other or is revised to equal the other.
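A minimal sketch of the two directions, assuming hypothetical given fractions Q1/HT = 3/10, VL/T = 3/5 and HT/T = 1/2; the function names `eq1` and `eq2` are illustrative only. Exact `Fraction` arithmetic keeps the round trip free of floating-point noise. Computing Q1/VL from HT/T, and then HT/T back from Q1/VL, leaves both values in place as two distinct numbers; neither is overwritten.

```python
from fractions import Fraction

def eq1(q1_over_ht, vl_over_t, ht_over_t):
    """Eq. 1: given HT/T and the constant's ingredients, return Q1/VL."""
    return (q1_over_ht / vl_over_t) * ht_over_t

def eq2(q1_over_ht, vl_over_t, q1_over_vl):
    """Eq. 2: given Q1/VL and the constant's ingredients, return HT/T."""
    return (vl_over_t / q1_over_ht) * q1_over_vl

# Hypothetical given fractions.
a, b = Fraction(3, 10), Fraction(3, 5)  # Q1/HT and VL/T
ht_t = Fraction(1, 2)                   # HT/T

q1_vl = eq1(a, b, ht_t)           # calculated Q1/VL
assert eq2(a, b, q1_vl) == ht_t   # Eq. 2 recovers the same HT/T
print(ht_t, q1_vl)                # both values remain, and they differ
```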
A numerical value that is given is prior in our knowledge to a numerical value that is calculated. But in no sense does one replace the other, or is one revised to be the other. To use the words replace and/or revise is to use misleading jargon.
Identifying one probability within Bayes’ equation as prior and one as posterior, where the posterior replaces or supersedes the prior, is a misleading mystification of simple algebra, where the two probabilities are distinct and do not change in their algebraic relationship to one another.
An Illustration of Bayes’ Theorem
Let us use an easily comprehended set of elements to illustrate Bayes’ theorem. That set is a bunch of playing cards: not a standard deck, but a bunch. The cards in the set, i.e. the bunch, are not of the customary thirteen ranks but of only two ranks, Kings and Queens. Nor are they of four suits, but of only two suits, Diamonds and Spades.
Let us view Bayes’ theorem as telling us that Q1/VL is directly proportional to HT/T. The constant of proportionality would then be (Q1/HT) / (VL/T).
Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1
In this example the elements of the set are cards. T is the total number of cards. HT is the total number of Kings. VL is the total number of Diamonds. Q1 is the number of cards that are both Kings and Diamonds.
The person, who formed the set of cards, tells us that 70% of the Kings are Diamonds; that 50% of the cards are Diamonds and that 40% of the cards are Kings. Referring to Eq. 1: (1) If 70% of the Kings are Diamonds, then Q1/HT = 0.7. (2) If 50% of the cards are Diamonds, then VL/T = 0.5. (3) If 40% of the cards are Kings, then HT/T = 0.4. The constant of proportionality, (Q1/HT) / (VL/T), equals 0.7/0.5 = 1.4.
The fraction of Diamonds that are Kings, Q1/VL is directly proportional to HT/T, the fraction of all cards that are Kings.
The fraction of Diamonds that are Kings = (.7/.5) * the fraction of all cards that are Kings.
Q1/VL = (.7/.5) * (HT/T)
The fraction of Diamonds that are Kings = (1.4) * 0.4 = 0.56 = 56%
Q1/VL = 56%
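The arithmetic can be cross-checked against an actual count. The bunch below is hypothetical: 100 cards chosen only so that the stated percentages hold (40 Kings, of which 28 are Diamonds, and 50 Diamonds in all). Eq. 1 applied to the three given fractions and a direct count of Kings among the Diamonds give the same 56%.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical bunch of 100 cards consistent with the stated percentages:
# 40 Kings (28 Diamonds, 12 Spades) and 60 Queens (22 Diamonds, 38 Spades).
bunch = (
    [("King", "Diamonds")] * 28 + [("King", "Spades")] * 12
    + [("Queen", "Diamonds")] * 22 + [("Queen", "Spades")] * 38
)

counts = Counter(bunch)
T = len(bunch)                                                        # all cards
HT = sum(n for (rank, _), n in counts.items() if rank == "King")      # Kings
VL = sum(n for (_, suit), n in counts.items() if suit == "Diamonds")  # Diamonds
Q1 = counts[("King", "Diamonds")]                                     # Kings of Diamonds

# The three given fractions, kept exact: 0.7, 0.5 and 0.4.
q1_ht, vl_t, ht_t = Fraction(Q1, HT), Fraction(VL, T), Fraction(HT, T)

constant = q1_ht / vl_t       # constant of proportionality, 1.4
via_eq1 = constant * ht_t     # Eq. 1
direct = Fraction(Q1, VL)     # fraction of Diamonds that are Kings, by counting
print(float(via_eq1), float(direct))  # prints 0.56 0.56
```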
Verbalization of Bayes’ Theorem
In the illustration, common jargon would state that the prior probability of a card’s being a King, HT/T or 40%, is revised to the posterior probability, namely the probability of a Diamond’s being a King, Q1/VL or 56%. However, if Q1/VL were the given and HT/T were calculated, then, based on the same equation, common jargon would have to state that the prior probability of a Diamond’s being a King, or 56%, was revised to the posterior probability, namely the probability of a card’s being a King, or 40%.
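The reverse calculation with the card numbers is equally short. This sketch takes Q1/VL = 56% as the given and uses Eq. 2 to obtain HT/T; the variable names are illustrative.

```python
from fractions import Fraction

constant = Fraction(7, 10) / Fraction(1, 2)  # (Q1/HT) / (VL/T) = 1.4
q1_vl = Fraction(14, 25)                     # given: fraction of Diamonds that are Kings, 0.56

ht_t = q1_vl / constant                      # Eq. 2: HT/T = ((VL/T) / (Q1/HT)) * (Q1/VL)
print(float(ht_t))                           # prints 0.4
```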
It is easy to fall into the rut of such jargon if HT/T is thought of as the probability of a generic card’s being a King, and Q1/VL as the probability that a card specified as being a Diamond is a King. It is as if the generic were being replaced by the specific. Such a nuanced inference is not warranted by the mathematics, because the reciprocal relation is equally valid: given the numerical value of the specific, the numerical value of the generic can be calculated.
The use of replace and revise in common jargon confuses a displacement based on inequality with a replacement based on equality. Such a displacement between unequal values does not elucidate Bayes’ theorem, which is the equality expressed by Eq. 1.
The criticism of common jargon in this essay does not preclude the successive iteration of an algorithm based on Bayes’ theorem, which could involve a displacement. In such a case, each succeeding iteration uses the specific probability of the prior iteration as its generic probability. The iteration calculates a new specific probability based on some added or omitted characteristic. It thereby calculates a partitioning, i.e. a probability, not of the prior population but of a newly limited sub-population.
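A minimal sketch of such an iterative algorithm, assuming a hypothetical third characteristic: some cards are "marked". The counts, the `marked` attribute and the function `iterate` are illustrative assumptions, not part of the example above. Each pass applies Eq. 1 within the current population; the specific probability it produces becomes the generic probability of the next pass, which works within the previous pass's column VL.

```python
from fractions import Fraction

def iterate(population, is_row, column_criteria):
    """Apply Eq. 1 repeatedly, narrowing the population at each pass.

    Assumes every row, column and population encountered is non-empty.
    Returns the generic probability HT/T of the first pass followed by
    the specific probability Q1/VL produced by each pass.
    """
    sub = population
    generic = Fraction(sum(1 for x in sub if is_row(x)), len(sub))  # HT/T
    history = [generic]
    for is_col in column_criteria:
        ht = [x for x in sub if is_row(x)]
        vl = [x for x in sub if is_col(x)]
        q1 = [x for x in ht if is_col(x)]
        # The carried-over value is exactly HT/T within the current sub-population.
        assert generic == Fraction(len(ht), len(sub))
        constant = Fraction(len(q1), len(ht)) / Fraction(len(vl), len(sub))
        specific = constant * generic  # Eq. 1 within the current sub-population
        history.append(specific)
        sub, generic = vl, specific    # next pass: VL becomes T, Q1/VL becomes HT/T
    return history

def cards(rank, suit, marked, n):
    """n identical hypothetical cards, each a (rank, suit, marked) tuple."""
    return [(rank, suit, marked)] * n

# Hypothetical bunch: same rank/suit counts as before, plus a "marked" flag.
bunch = (
    cards("King", "Diamonds", True, 20) + cards("King", "Diamonds", False, 8)
    + cards("King", "Spades", True, 3) + cards("King", "Spades", False, 9)
    + cards("Queen", "Diamonds", True, 5) + cards("Queen", "Diamonds", False, 17)
    + cards("Queen", "Spades", True, 10) + cards("Queen", "Spades", False, 28)
)

def is_king(card):
    return card[0] == "King"

def is_diamond(card):
    return card[1] == "Diamonds"

def is_marked(card):
    return card[2]

history = iterate(bunch, is_king, [is_diamond, is_marked])
print([str(p) for p in history])  # ['2/5', '14/25', '4/5']
```

The second pass asks, within the Diamonds only, what fraction of the marked cards are Kings. It is a probability of a newly limited sub-population, not a revision of the first pass, whose 14/25 still stands.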
It should be noted that it is inappropriate and misleading to identify as Bayes’ theorem an algorithm that iteratively employs Bayes’ theorem, just as it would be inappropriate and misleading to identify as the Pythagorean theorem an algorithm that iteratively employs the Pythagorean theorem.
Common jargon confuses Bayes’ theorem with its algorithmic iteration.