Bayes’ theorem is a simple algebraic relationship among fractions of a set or population of elements. Based on common expositions of it, one would think that it was complicated in itself and that it resolved a mystery through its implications.

The population of elements to which Bayes’ theorem applies may be viewed as a surface over which the population density varies. A Bayesian surface is partitioned by two independent criteria. One criterion may be viewed as dividing the surface into two horizontal rows, while the other divides it into two vertical columns. The result is four quadrants, which differ in population due to the non-uniformity of the population density. Importantly, the four quadrants are mutually related: each may be expressed by the same algebraic formulation in its relationships to the other three.

The two rows formed by the horizontal partitioning may be distinguished as the horizontal top row, HT, and the horizontal bottom row, HB. The two columns formed by the vertical partitioning may be distinguished as the vertical left column, VL, and the vertical right column, VR. The two rows, HT + HB, add up to the total, T, as do the two columns, VL and VR. The quadrants are designated as Q1 through Q4. Each row or column is the sum of two quadrants, e.g. HT = Q1 + Q2 and VL = Q1 + Q3.

Tabulation of a Bayesian Population

In the Tabulated Bayesian Population, the column VR has the role of non-VL. Thus, rather than being a single column, VR may be any number of columns whose sum is the complement of VL. Analogously, the row HB has the role of non-HT. Consequently, Bayes’ theorem is applicable to any number of rows and any number of columns, where the additional rows and columns may be treated in their sums, respectively, as non-HT and non-VL, i.e. as HB and VR.

Bayes’ theorem, in its algebraic expression, which focuses on Q1, is:

Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1

The two terms, HT, cancel out as do the two terms, T. This leaves the identity, Q1/VL ≡ Q1/VL, which proves the validity of Bayes’ theorem.

In the application of Bayes’ theorem, the numerical values of the numerators and denominators of the fractions are not given. What is given are the numerical values of the three fractions on the right hand side of Eq. 1, which permit the calculation of the numerical value of the fraction, Q1/VL, as a fraction.
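As a minimal sketch of this calculation (the function and variable names are illustrative, not from the source), the fraction Q1/VL can be computed from the three given fractions of Eq. 1:

```python
def bayes_q1_given_vl(q1_given_ht: float, vl_given_t: float, ht_given_t: float) -> float:
    """Compute Q1/VL from the three given fractions of Eq. 1.

    q1_given_ht = Q1/HT, vl_given_t = VL/T, ht_given_t = HT/T.
    """
    # Constant of proportionality, (Q1/HT) / (VL/T), times the given fraction HT/T.
    return (q1_given_ht / vl_given_t) * ht_given_t

# Sample values (assumed for illustration): Q1/HT = 0.7, VL/T = 0.5, HT/T = 0.4.
print(round(bayes_q1_given_vl(0.7, 0.5, 0.4), 2))  # → 0.56
```

Note that only the three fractions are needed; the underlying counts Q1, HT, VL and T never appear.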

Reciprocity of Various Expressions of Bayes’ Theorem

Eq. 1 expresses Bayes’ algebraic formulation by focusing on the top left quadrant, Q1. However, the same algebraic formulation of relationships among the four quadrants could be applied to any quadrant. This can be seen in that each of the other three quadrants can be successively designated as Q1 by rotating the population surface in increments of 90 degrees.

In the application of Bayes’ theorem, Eq. 1 is viewed as representing Q1/VL as directly proportional to HT/T, where the constant of proportionality is (Q1/HT) / (VL/T). Because each of the fractions of Eq. 1 is the ratio of a subset to a set, each of the fractions is a probability. Expressing the direct proportionality of Eq. 1 using the word, probability, rather than the word, fraction, yields: The probability of quadrant Q1 with respect to the column VL is directly proportional to the probability of the row HT with respect to the total population, T.

Typically, the numerical value of the probability, HT/T, is given along with the numerical value of the constant of proportionality. The numerical value of the probability, Q1/VL, is calculated. Common jargon refers to the given probability, HT/T, as the prior probability and the calculated probability, Q1/VL, as the posterior or the final probability.

If the numerical value of Q1/VL were given along with the constant of proportionality, then the probability HT/T could be calculated. We would be viewing Eq. 1 in the form,

HT/T = ((VL/T) / (Q1/HT)) * (Q1/VL) Eq. 2

Common jargon would then label Q1/VL as the prior probability and HT/T as the posterior or final probability, i.e. vice versa to the common jargon applied to Eq. 1.
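The reciprocity of Eq. 1 and Eq. 2 can be checked numerically. This sketch (function names and sample values are assumed for illustration) computes Q1/VL from HT/T via Eq. 1, then recovers HT/T via Eq. 2:

```python
def posterior_eq1(q1_ht: float, vl_t: float, ht_t: float) -> float:
    # Eq. 1: Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T)
    return (q1_ht / vl_t) * ht_t

def posterior_eq2(q1_ht: float, vl_t: float, q1_vl: float) -> float:
    # Eq. 2: HT/T = ((VL/T) / (Q1/HT)) * (Q1/VL)
    return (vl_t / q1_ht) * q1_vl

# With Q1/HT = 0.7 and VL/T = 0.5 (assumed values):
q1_vl = posterior_eq1(0.7, 0.5, 0.4)   # Q1/VL from a given HT/T of 0.4
ht_t = posterior_eq2(0.7, 0.5, q1_vl)  # recovers the original HT/T
print(round(q1_vl, 2), round(ht_t, 2))  # → 0.56 0.4
```

Neither direction of the calculation "revises" anything; each simply determines one fraction from the others.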

Eq. 1 and Eq. 2 are fully equivalent. With respect to Eq. 1, common jargon in determining the probability of a hypothesis would claim that the prior probability of row HT with respect to the total was revised to the posterior or final probability of Q1 with respect to column VL.

With respect to Eq. 2, common jargon would claim that the prior probability of Q1 with respect to column VL was revised to the posterior or final probability of row HT with respect to the total.

What this apparently contradictory jargon means is (1) that given the constant of proportionality and HT/T, then Q1/VL can be calculated, while (2) given the constant of proportionality, and Q1/VL, then HT/T can be calculated. Both probabilities remain completely distinct. Neither replaces the other or is revised to equal the other.

A numerical value that is given is prior in our knowledge to a numerical value that is calculated. But in no sense does one replace the other, nor is one revised to be the other. To use the words, replace and/or revise, is to use misleading jargon.

Identifying one probability within Bayes’ equation as prior and one as posterior, where the posterior replaces or supersedes the prior, is a misleading mystification of simple algebra, where the two probabilities are distinct and do not change in their algebraic relationship to one another.

An Illustration of Bayes’ Theorem

Let us use an easily comprehended set of elements to illustrate Bayes’ theorem. That set is a bunch of playing cards. Not a standard deck, a bunch. The cards in the set, i.e. the bunch, are not of the customary thirteen ranks, but of only two ranks, Kings and Queens. They are not of four suits, but of only two, Diamonds and Spades.

Let us view Bayes’ theorem as telling us that Q1/VL, is directly proportional to HT/T. The constant of proportionality would then be (Q1/HT) / (VL/T).

Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1

In this example the elements of the set are cards. T is the total number of cards. HT is the total number of Kings. VL is the total number of Diamonds. Q1 is the number of cards that are both Kings and Diamonds.

The person, who formed the set of cards, tells us that 70% of the Kings are Diamonds; that 50% of the cards are Diamonds and that 40% of the cards are Kings. Referring to Eq. 1: (1) If 70% of the Kings are Diamonds, then Q1/HT = 0.7. (2) If 50% of the cards are Diamonds, then VL/T = 0.5. (3) If 40% of the cards are Kings, then HT/T = 0.4. The constant of proportionality, (Q1/HT) / (VL/T), equals 0.7/0.5 = 1.4.

The fraction of Diamonds that are Kings, Q1/VL, is directly proportional to HT/T, the fraction of all cards that are Kings.

The fraction of Diamonds that are Kings = (.7/.5) * the fraction of all cards that are Kings.
Q1/VL = (.7/.5) * (HT/T)

The fraction of Diamonds that are Kings = (1.4) * 0.4 = 0.56 = 56%
Q1/VL = 56%
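Assuming a hypothetical bunch of 1000 cards consistent with the stated percentages (the counts are illustrative, not from the source), the two sides of Eq. 1 can be verified by direct counting:

```python
# 40% Kings (400), 50% Diamonds (500), 70% of Kings are Diamonds (280).
total = 1000
kings = 400            # HT
diamonds = 500         # VL
king_diamonds = 280    # Q1 = 0.7 * 400

# Right-hand side of Eq. 1:
rhs = ((king_diamonds / kings) / (diamonds / total)) * (kings / total)
# Left-hand side, computed directly from the counts:
lhs = king_diamonds / diamonds
print(round(lhs, 2), round(rhs, 2))  # both ≈ 0.56
```

The counts confirm the algebra: 280 of the 500 Diamonds are Kings, i.e. 56%.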

Verbalization of Bayes’ Theorem

In the illustration, common jargon would state that the prior probability of a card’s being a King, HT/T or 40%, is revised to the posterior probability, namely the probability of a Diamond’s being a King, Q1/VL or 56%. However, if Q1/VL were the given and HT/T were calculated, then, based on the same equation, common jargon would have to state that the prior probability of a Diamond’s being a King, or 56%, was revised to the posterior probability, namely the probability of a card’s being a King, or 40%.

It is easy to fall into the rut of such jargon if HT/T is thought of as the probability of a generic card’s being a King, and Q1/VL as the probability that a card specified as a Diamond is a King. It is as if the generic were being replaced by the specific. Such a nuanced inference is not warranted by the mathematics, because the reciprocal relation is equally valid: given the numerical value of the specific, the numerical value of the generic can be calculated.

Caution

The use of replace and revise in common jargon confuses a displacement based on inequality with a replacement based on equality. Such a displacement of inequality does not elucidate Bayes’ theorem, which is the equality expressed by, Eq. 1.

The criticism of common jargon in this essay does not preclude the successive iteration of an algorithm based on Bayes’ theorem, which could involve a displacement. In such a case, the succeeding iteration uses the specific probability of the prior iteration as its generic probability. The iteration of the algorithm calculates a new specific probability based on some added or omitted characteristic. It thereby calculates a partitioning, i.e. a probability, not of the prior population, but, of a newly limited sub-population.

It should be noted that it is inappropriate and misleading to identify as Bayes’ theorem an algorithm, which iteratively employs Bayes’ theorem, just as it would be inappropriate and misleading to identify as the Pythagorean theorem an algorithm, which iteratively employs the Pythagorean theorem.

Common jargon confuses Bayes’ theorem with its algorithmic iteration.

Bayes’ theorem is a fraction, expressed algebraically in terms of other fractions.

The theorem applies to a set of data that may be tabulated in a two by two format. The data set consists of two rows by two columns. Tabulated data, with more than two rows and/or more than two columns, may be reduced to the two by two format. All rows, but the top row may be combined to form a single, bottom row as the complement of the top row. Similarly, all columns, but the left column may be combined to form a single, right column, as the complement of the left column.

Let the rows be labeled X and non-X. Let the columns be labeled A and non-A. The table presents four quadrants of data. Let the upper left quadrant be identified as (X,A). Let the total of row X be labeled TX, the total of column A be labeled TA and the grand total of the data be labeled T.

The Algebraic Form: Fractions

Bayes’ theorem or Bayes’ equation is,

(X,A) / TA = ((TX / T) * ((X,A) / TX)) / (TA / T) Eq. 1

The validity of Bayes’ equation can easily be demonstrated in that both T and TX cancel out on the right hand side of the equation, leaving the identity, (X,A) / TA ≡ (X,A) / TA

In accord with the fact that (X,A) + (non-X,A) = TA, the denominator, TA / T, is often expressed as,

((TX / T) * ((X,A) / TX)) + ((Tnon-X / T) * ((non-X,A) / Tnon-X)) Eq. 2
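Using a hypothetical 2x2 table of counts (the numbers are assumed for illustration), the expansion of the denominator in Eq. 2 can be checked against TA / T directly:

```python
# Hypothetical counts:
#            Col A   Col non-A
# Row X        55        15      -> TX = 70
# Row non-X    10        20      -> Tnon-X = 30
XA, XnA = 55, 15
nXA, nXnA = 10, 20
TX, TnX = XA + XnA, nXA + nXnA
T = TX + TnX
TA = XA + nXA

# Eq. 2: TA / T expanded by summing over the rows of column A.
expanded = (TX / T) * (XA / TX) + (TnX / T) * (nXA / TnX)
print(round(expanded, 4), round(TA / T, 4))  # both 0.65
```

The expansion holds because (X,A) + (non-X,A) = TA, as stated above.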

The Verbal Form: Fractions

Verbalizing Eq. 1, we have,
Cell (X,A) as a fraction of Column A equals
(Row X as a fraction of the grand total, times Cell (X,A) as a fraction of row X) divided by column A as a fraction of the grand total.

Eq. 2, the denominator, i.e. column A as a fraction of the grand total, may be expressed as,
(Row X as a fraction of the grand total, times the Cell (X,A) as a fraction of row X) plus
(Row non-X as a fraction of the grand total, times Cell (non-X,A) as a fraction of row non-X)

Replacing the Row, Column and Element Labels

On page 50 of Proving History, Richard Carrier replaces the row, column and element labels. In place of the row labels, X and non-X, he uses ‘true’ and ‘isn’t true’. In place of the column label, A, he uses ‘our’. Instead of referring to the data elements of the table as elements, Carrier refers to them as explanations. The only data in a Bayesian analysis are the elements of the table. Consequently, the only evidence considered in a Bayesian analysis is the data. In Carrier’s terminology, the only data, thus the only ‘evidence’, are the ‘explanations’.

Carrier’s Terminology for the Fractions of Bayes’ Theorem

Probability is the fraction or ratio of a subset with respect to a set. Thus, probability is a synonym for those fractions, which are the ratio of a subset to a set. Each fraction in Bayes’ theorem is a probability, the ratio of a subset to a set.

Accordingly, Carrier uses the word, probability, for the lone fraction on the left hand side of Eq. 1. However, on the right hand side of the equation, he does not use the word, probability. He uses synonyms for it: he refers to a probability as ‘how typical’ the subset is with respect to the set, or as ‘how expected’ the subset is with respect to the set.

Probability and improbability are complements of one, just as the paired subsets in Bayes’ theorem are complements of the set. Thus, the probability of a subset with respect to a set may be referred to as the improbability of the complementary subset. Carrier does not use the expression, improbability. Instead of referring to the improbability of the complementary subset, he refers to ‘how atypical’ is the complementary subset.

Carrier’s Verbalization of Bayes’ Theorem

Left hand side Eq. 1

Adopting Carrier’s terminology, ‘Cell (X,A) as a fraction of Column A’ would be, ‘the probability of our true explanations with respect to our total explanations’. Carrier renders it, ‘the probability our explanation is true’. It is as if probability primarily referred to just one isolated explanation rather than a subset of explanations as a fraction of a set of explanations to which the subset belongs.

The Right Hand Side, Eq. 1, the Numerator

Adopting Carrier’s terminology, the first term of the numerator, ‘Row X as a fraction of the grand total’, would be ‘how typical all true explanations are with respect to total explanations’, i.e. the fraction is TX/T. Carrier renders it ‘how typical our explanation is’. Thus, Carrier would have it to be TA/T, rather than TX/T.

In Carrier’s terminology the second term of the numerator, ‘Cell (X,A) as a fraction of row X’ would be ‘how expected are our true explanations among the set of all true explanations’. Carrier renders it ‘how expected the evidence is, if our explanation is true’. The evidence, i.e. the data, that our explanations are true, is Cell (X,A). Carrier’s rendition is thus, ‘how expected are our true explanations among the set of our true explanations’. That would be the ratio, Cell (X,A) / Cell (X,A), and not Cell (X,A) / TX.

The Right Hand Side, Eq. 1, the Denominator as Eq. 2

The first two terms of Eq. 2 are the same as the numerator of Eq. 1. Thus, there are only two more terms to be considered, namely the two terms of Eq. 2, after the ‘plus’. The first is ‘Row non-X as a fraction of the grand total’. Adopting Carrier’s terminology, this would be, ‘how atypical true explanations are with respect to total explanations’, i.e. the fraction is (Tnon-X)/T, which is the improbability (i.e. the atypicality) of TX/T. Carrier renders it ‘how atypical our explanation is’. Carrier would have it to be (Tnon-A)/T, which is the improbability of TA/T, rather than the improbability of TX/T.

The other term is ‘Cell (non-X,A) as a fraction of row non-X’. Adopting Carrier’s terminology, this would be, ‘how expected are our non-true explanations among the set of all non-true explanations’. Carrier renders it, ‘how expected the evidence is, if our explanation isn’t true’. The evidence, i.e. the data, that our explanations aren’t true, is Cell (non-X,A). Carrier’s rendition is thus, ‘how expected are our non-true explanations among the set of our non-true explanations’. That would be the ratio, Cell (non-X,A) / Cell (non-X,A), and not Cell (non-X,A) / Tnon-X.

Valid, but Obscurant

Each fraction in Bayes’ theorem is a fraction, which may be expressed as a probability, but also as an improbability or an atypicality. For a Bayesian tabulation of explanations, where the top row is true and the left column is our, Bayes’ theorem is the probability of true explanations among our explanations. It is also the atypicality or the improbability of non-true explanations among our explanations. However, the words, atypicality and improbability can obscure rather than elucidate the meaning of Bayes’ theorem.

Conclusion

Bayes’ theorem can be verbalized using much of Carrier’s terminology including, probability, our, explanations, true, typical, expected and atypical. However, Carrier’s actual use of his terminology does not merely obscure, but totally obliterates the algebraic and intentional meaning of Bayes’ theorem.

On page 58 of Proving History, Richard C. Carrier states,

“So even if there was only a 1% chance that such a claim would turn out to be true, that is a prior probability of merely 0.01, the evidence in this case (e1983) would entail a final probability of at least 99.9% that this particular claim is nevertheless true. . . . Thus, even extremely low prior probabilities can be overcome with adequate evidence.”

The tabulated population data implied by Carrier’s numerical calculation, which uses Bayes’ theorem, is of the form:

            Col A         Col B         Sum
Row X       Cell(X,A)     Cell(X,B)     Sum Row X
Row non-X   Cell(nX,A)    Cell(nX,B)    Sum Row non-X
Sum         Sum Col A     Sum Col B     Total Sum

Bayes’ theorem permits the calculation of Cell(X,A) / Col A by the formula,

((Row X / Total Sum) * (Cell(X,A) / Row X)) / (Col A / Total Sum)

The numerical values, listed within the equations on page 58, imply,

Row X / Total Sum = 0.01, Cell(X,A) / Row X = 1, and Col A / Total Sum = 0.0100099.

From these, the remaining values of the table can be determined as,

            Col A         Col B         Sum
Row X       0.0100000     0.0000000     0.0100000
Row non-X   0.0000099     0.9899901     0.9900000
Sum         0.0100099     0.9899901     1.0000000

Carrier’s application of Bayes’ theorem, in calculating the final probability and in identifying the prior probability, is straightforward and without error.

How Error Slips In

In Bayesian jargon the ‘prior’ probability of X is the Sum of Row X divided by the Total Sum. It is 0.01 or 1%. The final probability, or more commonly the consequent or posterior probability, is the probability of X based solely on Column A, completely ignoring Column B. The probability of X, considering only Column A, is 0.01/0.0100099 or 99.9%. One may call this the final probability, the consequent probability, the posterior probability or anything else one pleases, but to pretend it is something other than a probability based on a scope that excludes Column B is foolishness. It is in no sense ‘the overcoming of a low prior probability with sufficient evidence’, unless one is willing to claim that the proverbial ostrich, by putting its head in the sand, gains a better view of its surroundings by restricting the scope of its view to the sand.

The way this foolishness comes about is this. The prior probability is defined as the probability that ‘this’ element is a member of the subpopulation X, simply because it is a member of the overall population. The consequent or posterior probability (or as Carrier says, the final probability) is the probability consequent or posterior to identifying the element, no longer as merely a generic member of the overall population, but now identifying it as an element of subpopulation A. The probability calculated by Bayes’ theorem is that of sub-subpopulation, Cell(X,A), as a fraction of subpopulation A, thereby having nothing directly to do with Column B or the total population. In Bayesian jargon we say the prior probability of X of 1% is revised to the probability of X of 99.9%, posterior to the observation that ‘this element’ is a member of the subpopulation A and not merely a generic member of the overall population.
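A sketch of the arithmetic behind the quoted figures (the cell values are those implied by the 0.01 and 0.0100099 given above; the variable names are illustrative):

```python
# Prior: P(X) = Row X / Total Sum.
prior = 0.01
# Column A total and the cell shared by Row X and Column A.
col_a = 0.0100099
cell_xa = 0.01

# "Final" probability: the probability of X within Column A only,
# ignoring Column B entirely.
posterior = cell_xa / col_a
print(round(posterior, 5))  # ≈ 0.99901, i.e. "at least 99.9%"
```

The jump from 1% to 99.9% comes entirely from restricting the scope from the Total Sum to Column A.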

Clarification of the Terminology

The terminology, ‘prior probability’ and ‘posterior probability’, refers to before and after the restriction of the scope of consideration from a population to a subpopulation. The population is one which is divided into subsets by two independent criteria. This classifies the population into subsets which may be displayed in a rectangular tabulation. One criterion identifies the rows. The second criterion identifies the columns of the table. Each member of the population belongs to one and only one of the cells of the tabulation, where a cell is a subset identified by a row and a column.

An Example

A good example of such a population would be the students of a high school. Let the first criterion, identify two rows, those who ate oatmeal for breakfast this morning and those who did not. The second criterion, which identifies the columns will be the four classes, freshmen, sophomores, juniors and seniors. Notice that the sum of the subsets of each criterion is the total population. In other words, the subsets of each criterion are complements forming the population.

In the high school example, the prior probability is the fraction of the students of the entire high school who ate oatmeal for breakfast. The prior is the scope of consideration before we restrict that scope to one of the subsets of the second criterion. Let that subset of the second criterion be the sophomore class. We restrict our scope from the entire high school down to the sophomore class. The posterior probability is the fraction of sophomores who ate oatmeal for breakfast. Notice the posterior probability eliminates from consideration the freshmen, junior and senior classes. They are irrelevant to the posterior fraction.

In Bayesian Jargon, prior refers to the full scope of the population prior to restricting the scope. Posterior refers to after restricting the scope. The posterior renders anything outside of the restricted scope irrelevant.

In Carrier’s example, the full scope covers all years, prior to restricting that scope to the year, 1983, thereby ignoring all other years. This is parallel to the high school example, where the full scope covers all class years, prior to restricting that scope to the class year, sophomores, thereby ignoring all other class years.

By some quirk let it be that 75% of the sophomore class ate oatmeal for breakfast, but none of the students of the other three classes did so. Let the four class sizes be equal. We would then say, à la Carrier, “The low prior probability (18.75%) of the truth that a student ate oatmeal for breakfast was overcome with adequate evidence, so that the final probability of the truth that a sophomore student ate oatmeal for breakfast was 75%.” Note that this ‘adequate evidence’ consists in ignoring any evidence concerning the freshmen, juniors and seniors, which evidence was considered in determining the prior.
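The oatmeal figures can be verified with a short sketch (the class size of 100 is an assumption; the resulting percentages are independent of it):

```python
# Four equal classes; 75% of sophomores ate oatmeal, no one else did.
class_size = 100
oatmeal = {"freshman": 0, "sophomore": 75, "junior": 0, "senior": 0}

total_students = 4 * class_size
total_oatmeal = sum(oatmeal.values())

prior = total_oatmeal / total_students         # 75 / 400 = 0.1875
posterior = oatmeal["sophomore"] / class_size  # 75 / 100 = 0.75
print(prior, posterior)  # → 0.1875 0.75
```

The prior counts all four classes; the posterior simply drops three of them from the denominator.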

This conclusion of ‘adequate evidence’ contrasts a fraction based on a full scope of the population, ‘the prior’, to a fraction based on a restricted scope of the population, ‘the final’. The final does not consider further evidence. The final simply ignores everything about the population outside the restricted scope.

Prejudice as a Better Jargon

A more lucid conclusion, based on the restriction of scope, may be made in terms of prejudice. The following conclusion adopts the terminology of prejudice. It is based on the same data used in the discussion above.

Knowledge of the fraction of students in this high school, who ate oatmeal, serves as the basis for our prejudging ‘this’ high school student. We know the prior probability of the truth that ‘this’ student is ‘one of them’, i.e. those who ate oatmeal for breakfast, is 18.75%. Upon further review, in noting that ‘this’ student is a sophomore, we can hone our prejudice by restricting it in scope to the sophomore class. We can now restrict the scope upon which our original prejudice was based, by ignoring all of the other subsets of the population, but the sophomore class. We now know the final probability of the truth of our prejudice that ‘this’ student is ‘one of them’ is 75%, based on his belonging to the sophomore class.

This is what Carrier is doing. His prior is the prejudice, i.e. the probability based on all years of the population. His final is the prejudice, which ignores evidence from all years except 1983.

We can now see more clearly what Carrier means by adequate evidence. He means considering only knowledge labeled 1983 and ignoring knowledge from other years. Similarly, adequate evidence to increasing our prejudice that this student ate oatmeal, would mean considering only the knowledge that he is in the sophomore year and ignoring knowledge from other class years. It was the consideration of all years upon which our prior prejudice was based. Similarly it was all years, including 1983, upon which Carrier’s prior prejudice is based.

To form our prior prejudice, we consider the total tabulated count. We restrict the scope of our consideration of the tabulated count to a subset in order to form our final or posterior prejudice.

We refine our prejudice by restricting the scope of its application from the whole population to a named subpopulation. Is this what is conveyed by saying that even a low chance of a statement’s being true can be increased by evidence, or, that the low probability of its truth was overcome by adequate evidence? To me, that is not what is conveyed. From the appellations of truth and evidence, I would infer that more data were being introduced into the tabulation, or at least more of the tabulated data was being considered, rather than that much of the tabulated data was being ignored.

Conclusion

Carrier’s discussion of Bayes’ theorem gives the impression that the final probability of the 1983 data depends intrinsically upon the tabulated data from all the other years. In fact, the data from all the other years are completely extrinsic, i.e. irrelevant, to the final probability of the 1983 data. The ‘final’ probability is the ratio of one subset of the 1983 data divided by the set of 1983 data, ignoring all other data.

Probability is the ratio of a subpopulation of data to a population of data. In Carrier’s discussion, the population of his ‘prior’ is the entire data set. The population of his ‘final’ is solely the 1983 data, ignoring all else. He is not evaluating the 1983 data, or any sub-portion of it, in light of non-1983 evidence.

One can easily be misled by the jargon of the ‘prior probability’ of ‘the truth’, the ‘final probability’ of ‘the truth’ and ‘adequate evidence’.

In the previous essay, Bayes’ theorem was illustrated in the case of continuous sets. This essay focuses on sets of discrete elements in a tabulated format.

Let a set be visualized as a column of tiers, i.e. a vertical array of subsets. Let the column have the header or marker, A. Let the subsets or tiers have headers or markers X, Y, Z etc. Let the number of elements per subset or cell of the vertical array be Cell(i), where i = X, Y, Z etc. The total number of elements in this single linear dimension is Sum Column A. There are no overlapping subsets, so Bayes’ theorem is inapplicable.

In expanding the set by introducing more columns with markers or headers B, C, D etc., we would then have a two dimensional array of subsets or cells. The array would be one of multiple rows and multiple columns. Each i would be the header of a row. Each column would be identified by a header or marker, j. The number of elements in each subset or cell of the two dimensional array would be Cell(i,j). The total number of elements in any row, e.g. row D, would be designated Sum Row D. Geometrically, this two dimensional array of cells could be extended by only one more orthogonal category of IDs or markers. Algebraically, however, it can be extended to any number of independent categories of IDs or markers.

Each subset, identified by a specific i and j, is a cell, the overlap of a row and a column. Bayes’ theorem may be applied to such a two dimensional array because of the overlap of rows and columns which form cells each identified by two markers, i and j. For illustrative simplicity we will use a two by two tabulated array,

The Form and Formation of Bayes’ Theorem

Bayes’ theorem depends upon an identity of the following algebraic form.

(R/C) ≡ (R/C)

We can then multiply both sides of the identity by 1, thereby preserving the equality. We multiply the numerator and denominator of the left side by L/(RC) and the numerator and denominator of the right side by 1/T. This yields,

(L / C) / (L / R) = (R / T) / (C / T)

Multiplying both sides by L/R yields,

(L/C) = ((L / R) * (R / T)) / (C/T)

By replacing L with Cell(X,A); C with Sum Col A; R with Sum Row X and T with Total Sum, we have Bayes’ theorem as applied to our illustrative table.

(Cell(X,A) / Sum Col A) =
((Cell(X,A) / Sum Row X) * (Sum Row X / Total Sum)) / (Sum Col A / Total Sum)     Eq. 1

However, the denominator, Sum Col A / Total Sum, is usually modified to,

((Cell(X,A) / Sum Row X) * (Sum Row X / Total Sum)) +
((Cell(Y,A) / Sum Row Y) * (Sum Row Y / Total Sum))

Bayes’ theorem is used to calculate (Cell(X,A) / Sum Col A). However, if we had the data from the table, we would just use these two table values for the calculation; we would not use Bayes’ theorem. We do use Bayes’ theorem when the numerical information we have is limited to the fractions of the right hand side of the equation. For illustration let these numerical values be:

Cell(X,A) / Sum Row X = 0.7857
Sum Row X / Total = 0.7 (N.B. therefore Sum Row Y / Total = 0.3)
Cell(Y,A) / Sum Row Y = 0.3333

From these values, Bayes’ theorem, Eq. 1, with the denominator modified, is

(Cell(X,A) / Sum Col A) =
(0.7857 * 0.7) / ((0.7857 * 0.7) + (0.3333 * 0.3)) = 0.54999/0.64998 ≈ 55/65 ≈ 0.846

From this information we can construct the table as percentages of the Total Sum of 100%, beginning with the four values known from above (Sum Row X = 70, Sum Row Y = 30, Cell(X,A) ≈ 55, Cell(Y,A) ≈ 10) and 100%:

            Col A   Col B   Sum
Row X        55      15      70
Row Y        10      20      30
Sum          65      35     100
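The construction of the table from the three given fractions can be sketched as follows (variable names are illustrative):

```python
# Reconstruct the full 2x2 table, as percentages of a Total Sum of 100,
# from the three given fractions.
row_x = 0.7 * 100            # Sum Row X = 70
row_y = 100 - row_x          # Sum Row Y = 30
cell_xa = 0.7857 * row_x     # Cell(X,A) ≈ 55
cell_ya = 0.3333 * row_y     # Cell(Y,A) ≈ 10
col_a = cell_xa + cell_ya    # Sum Col A ≈ 65
cell_xb = row_x - cell_xa    # Cell(X,B) ≈ 15
cell_yb = row_y - cell_ya    # Cell(Y,B) ≈ 20

posterior = cell_xa / col_a  # Cell(X,A) / Sum Col A ≈ 55/65
print(round(cell_xa, 1), round(col_a, 1), round(posterior, 3))
```

Each remaining cell follows by complementation, since rows and columns must sum to their totals.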

Verbally, Eq. 1 is:
The fraction of column A, that is also identified by row marker X
equals
(the fraction of row X, that is also identified by column marker A)
times
(the fraction of the total set, that is identified by row marker, X)
with this product divided by
(the fraction of the total set, that is identified by column marker A)

Probability is the fractional concentration of an element in a logical set. Consequently, a verbal expression of Eq. 1 is:

The probability of both marker A and marker X with respect to the subset, marker A
equals
(the probability of both marker A and marker X with respect to the subset, marker X )
times
(the overall probability of the marker X)
with this product divided by
(the overall probability of marker A)

However, this is not how it is usually expressed.

In one instance of common jargon, Bayes’ theorem is expressed as:
Given the truth of A, the truth of belief X is credible to the degree, which can be calculated by Bayes’ theorem.

Another expression in common jargon is:
Bayes’ theorem expresses the probability of X, posterior to the observation of A, in contrast to the probability of X prior to the observation of A. In other words, the prior probability of X, which was 70%, is revised to 55/65, approximately 84.6%, due to the observation of A.

Another expression is: The Bayesian inference derives the posterior probability of approximately 84.6% as a consequence of two antecedents, the prior probability of 70% and the likelihood function, which numerically is 0.7857.

The Bayesian inference is also viewed as the likelihood of event X, given the observation of event A. The inference is based on three priors: the probability of event A given event X, 55/70 or 78.57%; the probability of event X, 70%; and the probability of event A, 65%.

Evaluation of Common Jargon

To label A an observed fact of evidence in support of the truth of belief X is gratuitous, because the meanings of evidence and belief imply extrapolation beyond the context of nominal markers. Philosophical conclusions, e.g. when labeled beliefs, are not nominal Bayesian markers. It is also gratuitous to label elements of sets ‘events’.

Probability is the fractional concentration of an element in a logical set. The IDs of the elements are purely nominal because none of the characteristics associated with the ID is relevant to probability. The only characteristic of an element that matters within the context of probability is its capacity to be counted.

A Proper View

From a valid Bayesian perspective, some markers of elements of sets are observed, in the example, A and B, while some markers are not observed, in the example, X and Y. Remember that the ID of each element is a pair of markers, one from a row and one from a column. The Bayesian inference provides quantification of the prejudice that an element has one of the unobserved markers, such as X, where the prejudice is based upon observing that this element has one of the observable markers, such as A.

Bayesian inference is the quantification of prejudice, not the provision of evidence of the truth of a verbal belief.

Such quantification of prejudice is useful in making prudential decisions, e.g. in industry, where the past performances (X, Y, etc.) of a variety of material processing methods (A, B, etc.) serve as the basis of predicting their future performance. There are a variety of other areas in which Bayesian analysis may be incorporated into algorithms for reaching decisions. Of course, prudence is not the determination of truth, but the selection of a good course of action to achieve a goal.

The Bayesian quantification of prejudice can be harmful in social and employment settings.

A Contrast of ‘Truth’ Jargon vs. ‘Prejudice’ Jargon

Using the table for a numerical example, focused on Cell(X,A):

In common jargon, the Bayesian inference is: Given the truth of observation, A, the probability of unobserved belief, X, is revised from the prior value for the probability of belief X, which was 70%, to the posterior probability that belief X is true, given the truth of A. The posterior probability is 55/65, approximately 84.6%.

To more clearly elucidate the Bayesian inference, I prefer the jargon: The observation of marker A prejudices the presence of the unobserved marker, X, at a quantitative level of approximately 84.6%. If the presence of A is not specified, the probability of marker X for the population as a whole = 70%.
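
The arithmetic can be checked with a short Python sketch using exact fractions. The counts below are those quoted in the text (overlap cell 55, row X total 70, column A total 65, population total 100); they are restated here solely for illustration.

```python
from fractions import Fraction

# Counts quoted in the text: Cell(X,A) = 55, row X = 70, column A = 65, total = 100.
cell_XA, row_X, col_A, total = 55, 70, 65, 100

prior = Fraction(row_X, total)             # P(X) = 70%
likelihood = Fraction(cell_XA, row_X)      # P(A|X) = 55/70, about 78.57%
evidence = Fraction(col_A, total)          # P(A) = 65%

posterior = prior * likelihood / evidence  # Bayes' theorem
assert posterior == Fraction(cell_XA, col_A)  # identical to Cell(X,A) / column A
print(float(posterior))                    # about 0.846
```

The assertion makes the essay’s point visible: the Bayesian result is nothing more than the overlap cell taken as a fraction of its column.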

A Further Critique of the Jargon of the Truth of Observation Leading to the Truth of Belief

Bayes’ theorem applies to a population of elements in which each element is identified by two markers, one from each of two categories of markers. The first category of markers consists of the IDs of the rows of a rectangular display of the elements of the population. The second category of markers consists of the IDs of the columns of the rectangular display. In such a rectangular or orthogonal display of a population, the distribution of the elements with respect to one category of markers is independent of the distribution of the elements with respect to the other category of markers.

Bayes’ theorem expresses the probability of one marker of category one, i.e. a row marker, with respect to the entire column of a given marker of category two, i.e. a column marker.

The probabilities summed, row plus row, within a column equal the probability of the marker of that column. In other words, as is typical of probability, the probabilities of the rows within a column are complements forming the probability of the column as a whole. Consequently, the marker of one row cannot be the antithesis of the marker of another row. Subsets which add to form a column must be compatible as parts of a whole.

Complements are of the forms, ‘Some are red’ and ‘Some are non-red’. Their sum is the whole. The row markers of a population to which Bayes’ theorem can be applied may be just two. However, the row markers of such a population may be any number, the sum of which within a column is the complement of that column. Just as the sum of the elements, row by row, within a column is the complement of the column, the same is true of the probabilities. Likewise, the sum of the column probabilities within a row equals the probability of the row.
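
These summation relationships are easy to verify with hypothetical counts. Using the quadrant notation of the tabulation (Q1 through Q4, rows HT and HB, columns VL and VR, total T), a sketch:

```python
from fractions import Fraction

# Hypothetical quadrant counts; any non-negative integers would serve as well.
Q1, Q2, Q3, Q4 = 55, 15, 10, 20
HT, HB = Q1 + Q2, Q3 + Q4        # the two rows
VL, VR = Q1 + Q3, Q2 + Q4        # the two columns
T = HT + HB                      # the total population

# Row probabilities within a column sum to the probability of the column...
assert Fraction(Q1, T) + Fraction(Q3, T) == Fraction(VL, T)
# ...and column probabilities within a row sum to the probability of the row.
assert Fraction(Q1, T) + Fraction(Q2, T) == Fraction(HT, T)
```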

No row marker in an orthogonally identified population to which Bayes’ theorem is applied, may be viewed as the antithesis of another row marker. They may be nominally antithetical as true and false. However, in the context of Bayes’ theorem, subsets labelled true and false must be compatible as complements.

The rows of an orthogonal population distribution within a column are complementary, as the parts of the column as a whole. The rows of an orthogonal population distribution overall are the complementary parts of the whole population.

From this perspective, identifying a column marker as true and thereby leading to a judgment of the degree of certitude of the truth of a row marker, as a belief, is misleading to say the least. It confounds mathematical probability with probability in the sense of human certitude.

Mathematical probability is the fractional concentration of an element in a logical set. Probability as human certitude is a quality characterizing one’s own subjective judgment even when one employs a quasi-quantitative value to express his subjectivity.

In contrast, for a given element of a population, a column marker may be said to be observed in the element and thereby a Bayesian calculation may be said to determine the degree of prejudice of the presence of an unobserved row marker in that element.

Note: What I have mistakenly called contradictories in this essay, are, in classical logic, called contraries (every vs. none). What I have called contraries, are classically called sub-contraries (some are vs. some are not).

Given a set, S, having a subset A and a subset X, then the overlap, OVL, of A by X equals the overlap of X by A. This is simply a statement of identity. Consequently,

(OVL/A) / (OVL/X) = X/A

Also,

(X/S) / (A/S) = X/A

Therefore,

(OVL/A) / (OVL/X) = (X/S) / (A/S)

Or

OVL/A = ((X/S) / (A/S)) x OVL/X

Rearranging,

OVL/A = (X/S) x ((OVL/X) / (A/S)) Equation 1

We may verbally designate:

OVL/A as the fraction of A that is (also) X

OVL/X as the fraction of X that is (also) A

A/S as the fraction of S that is A

X/S as the fraction of S that is X

By definition, probability is the fractional concentration of an element in a logical set. Therefore,

A/S is the probability of A with respect to S

X/S is the probability of X with respect to S

OVL/A is the probability of X with respect to A

OVL/X is the probability of A with respect to X

If we drop the ‘respect to S’, because S is the full set under consideration, Equation 1 is verbally,

The probability of X with respect to A equals the probability of X times a ratio, namely, the probability of A with respect to X divided by the probability of A.

This is Bayes’ theorem.
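
The derivation can be rehearsed with explicit sets; the memberships below are arbitrary, invented solely for illustration.

```python
from fractions import Fraction

S = set(range(20))                   # an arbitrary set of twenty elements
A = {0, 1, 2, 3, 4, 5, 6, 7}         # a subset A of S
X = {5, 6, 7, 8, 9, 10, 11, 12, 13}  # a subset X of S
OVL = A & X                          # the overlap of A and X

lhs = Fraction(len(OVL), len(A))     # OVL/A, the probability of X with respect to A
rhs = Fraction(len(X), len(S)) * (Fraction(len(OVL), len(X)) / Fraction(len(A), len(S)))
assert lhs == rhs                    # Equation 1 holds identically
```

Because Equation 1 is an identity, the assertion holds for any choice of S, A and X, not merely for these memberships.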

However, the verbiage is typically changed when it is noted to be Bayes’ theorem.

The calculated probability of X with respect to A, i.e. OVL/A, is said to be the probability of X posterior to the calculation. In jargon, it is said to be the probability of the ‘event’ of X given the ‘truth’ of ‘event’ A.

The probability, X/S, is said to be the probability of X prior to the calculation or simply the prior. The calculation is based on another factor, other than the prior. This factor or ‘antecedent’ is said to be the likelihood function, (OVL/X)/(A/S). This likelihood function is the probability of A with respect to X divided by the probability of A.

The jargon renders Equation 1 as: Given the prior probabilities of A and X, and given the probability of A with respect to X, then we can calculate the probability of the event X, when A is true.

An Example, Without the Jargon

Let the fraction of persons with Native ancestry as a fraction of a population, X/S, be known to be 2 per million.
Let the fraction of persons with high cheekbones as a fraction of the population, A/S, be known to be 3 per million.
Let the fraction of persons with Native ancestry that also have high cheekbones, OVL/X be 90 per 100.
By Eq. 1 we can calculate the fraction of persons with high cheekbones that also are of Native ancestry, OVL/A.
It is:
OVL/A = (2/10^6) x (0.9/(3/10^6)) = 0.6 or 60%

In this population, the fraction of persons with high cheekbones, who are also of Native ancestry, is 60%. Fractional concentration is the definition of probability. Therefore, we may say that, for this population, the probability that a person with high cheekbones is of Native ancestry is 60%.
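
Because the fractions involved are tiny, exact rational arithmetic is the safest way to check this example. A sketch using the figures just given:

```python
from fractions import Fraction

X_over_S = Fraction(2, 10**6)    # Native ancestry: 2 per million
A_over_S = Fraction(3, 10**6)    # high cheekbones: 3 per million
OVL_over_X = Fraction(90, 100)   # of those with Native ancestry, 90 per 100 have high cheekbones

OVL_over_A = X_over_S * (OVL_over_X / A_over_S)  # Equation 1
assert OVL_over_A == Fraction(60, 100)           # 0.6, or 60%
```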

The Conclusion in Jargon and Its Implication

Because OVL/A, as a fractional concentration, is thereby a probability, i.e. a fraction of a logical set, we can lose our way due to the use of jargon.

In jargon, we may say that the calculation, OVL/A, represents the certitude of the ‘truth’ of X when we know A is a fact.

In such jargon, we might make the statement, ‘Person A from this population, who has high cheekbones, is of Native ancestry’ is true with a certitude of 60%. We might think that this implies that we have assessed the truth of the statement, ‘Person A is of Native ancestry’.

Furthermore, we might infer that the truth of whether Person A is or is not of Native ancestry is determined by population data. Then, from that inference we might extrapolate that the determination of truth, in general, is based on population data, i.e. on the identification of subsets.

Another Example

Let X/S, the fraction of canines that are coyotes in a rural county of New York State, be 0.05% and A/S, the fraction of canines which are gray, be 1%. Further, let OVL/X, the fraction of coyotes which are gray, be 100%. Then, OVL/A, the fraction of gray canines which are coyotes, is 5%.

OVL/A = (X/S) x ((OVL/X) / (A/S)) = (0.05%) x (100% / 1%) = 5%

Probability is the fractional concentration of an element in a logical set. Therefore the probability of coyotes in the set of gray canines is OVL/A = 5%
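
A one-line check of these figures, again in exact fractions:

```python
from fractions import Fraction

X_over_S = Fraction(5, 10_000)   # coyotes are 0.05% of canines
A_over_S = Fraction(1, 100)      # gray canines are 1%
OVL_over_X = Fraction(1)         # 100% of coyotes are gray

OVL_over_A = X_over_S * (OVL_over_X / A_over_S)  # Equation 1
assert OVL_over_A == Fraction(5, 100)            # 5%
```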

The Popular Interpretation of Bayesian Inference

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is popularly said to be the determination of the truth of X given that A is a fact.

The popular expression of Bayesian inference implies that it assesses the truth of a ‘belief’, X, based on some known fact, A.

Assessment of the Popular Interpretation

The popular expression spurns direct assessment of the ‘belief’. This is because it is not a belief at all. It is a second marker, X, by which some elements of a set, S, are identified in addition to another marker, A. The Bayesian variable calculated is simply the fraction of the set A, which possesses the marker, X, in addition to the marker, A.

In the coyote example, S is the set of canines, X is the subset of coyotes, A is the subset of gray canines, and OVL is the overlap of the subsets, X and A.

The popular interpretation would be: The numerical example does not support the ‘belief’ that ‘this canine, which is known to be gray, is a coyote’. That ‘belief’ would be ‘true’ in only 5% of instances of observing a gray canine.

A Comparable Interpretation

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is the quantification of prejudice.

The calculation quantifies the validity of the prejudice that characteristic or marker, X, is possessed by an element because that element has been identified as possessing characteristic or marker, A. In the numerical example, based solely on the observation or knowledge that a canine was gray, we would be prejudiced against its being a coyote at a level of 95%.

Probability is defined in mathematics in the context of discrete elements in sets. It can also be transitioned into an analogous definition in continuous mathematics. Thirdly, it can be represented as a meld of discrete and continuous concepts, as a continuous probability function.

The Discrete Context

In discrete mathematics, probability is the fractional concentration of an element in a logical set. It is the ratio of the quantity of elements of the same ID to the total number of elements in the set. If the numerator is zero, the probability is zero. The probability is never negative because the numerator is never negative and the denominator is a minimum of one. Probability reaches a maximum of one, where the set of elements is homogeneous in ID. Probability can have any rational value from zero to one, because the denominator can be increased to any positive integer. Thus, probability in its discrete definition is itself a continuous variable, having a range of zero to one.
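
This discrete definition translates directly into code. The function below is a minimal sketch; the set contents are the arbitrary nominal IDs used elsewhere in this essay.

```python
from collections import Counter
from fractions import Fraction

def probability(elements, element_id):
    """Fractional concentration of an ID in a logical set of discrete elements."""
    return Fraction(Counter(elements)[element_id], len(elements))

s = ["elephant"] * 3 + ["saxophone"] * 2 + ["electron"]
assert probability(s, "elephant") == Fraction(1, 2)
assert probability(s, "unicorn") == 0              # a zero numerator gives probability zero
assert probability(["electron"], "electron") == 1  # a homogeneous set gives probability one
```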

A probability and its improbability are complements of one.

The Continuous Context

Probability is a fraction of a whole set of discrete elements. If, however, we define a whole as continuous, we can then define probability in a continuous context analogous to its definition in a discrete context. One example is had in identifying the area of a circle as a continuous whole and identifying segments of area using different IDs, such as in a pie chart. Another example is that in statistics where the whole is defined as the area under the normal curve. Probability is then a fraction of the area under the curve.

Melding the Discrete and Continuous as a Probability Function

The simplest set of discrete elements is that in which all the elements have the same ID. The next simplest is that in which the elements are either of two IDs, where the quantities of the elements of each ID are equal. Thus, the probability of each element is one-half and the improbability of each is one-half.

If we choose a continuous function which oscillates between two extremes, we could associate the one extreme with an ID and the other extreme with a second ID. We could view the first ID as having a probability of one at the first extreme and having a probability of zero at the other extreme. We would thus be viewing the function as a probability, which transitions continuously through the intermediate values as it cycles between a probability of one and a probability of zero, i.e. between the two IDs.

At this second extreme, the second ID has a probability of one, which is also the improbability of the first ID.

A visual example would be a rotating line segment oscillating at a constant angular velocity between a horizontal orientation as one ID and a vertical orientation as the second ID. The continuous equation for this would be the cos^2 (α). The function oscillates from the horizontal or from a probability of one at α = 0 degrees to the vertical or to a probability of zero at α = 90 degrees. At α = 180 degrees, it is horizontal again with a probability of one. The probability goes to zero at 270 degrees and back to one at α = 360 degrees. The intermediate values of the function are transient values of probability forming the cycle from one to zero to one and back again from one to zero to one.

The improbability of horizontal, namely the probability of vertical, would be the sin^2 (α). The probability of horizontal plus its improbability equals one. Thus, cos^2 (α) + sin^2 (α) = 1.
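
A few sampled angles confirm this complementarity numerically; a sketch, with α in degrees:

```python
import math

# Probability of 'horizontal' for the rotating segment, sampled at several angles.
for deg in (0, 30, 45, 90, 180, 270, 360):
    a = math.radians(deg)
    p_horizontal = math.cos(a) ** 2
    p_vertical = math.sin(a) ** 2        # the improbability of horizontal
    assert 0.0 <= p_horizontal <= 1.0
    assert abs(p_horizontal + p_vertical - 1.0) < 1e-12
```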

The Flip of a Coin

The score was tied at the end of regulation time in an NFC divisional playoff game in January 2016 between the Green Bay Packers and the Arizona Cardinals. This required a coin toss between heads and tails to determine which team would receive the ball to start the overtime. However, in the first toss, the coin didn’t rotate about its diameter. The coin didn’t flip. Therefore the coin was tossed a second time.

If, rather than visualizing a line segment, we envision a coin rotating at a constant angular velocity, we wouldn’t choose horizontal vs. vertical as the probability and improbability, because we wish to distinguish one horizontal orientation, heads, from its flipped horizontal orientation, tails.

A suitable continuous function of probability, P, oscillating between a value of one, or heads, at the horizontal α = 0 degrees and a value of zero, or tails, at the horizontal α = 180 degrees, would be
P = [(1/2) × cos (α)] + (1/2), where the angular velocity is constant.

The probability of tails is 1 – P = (1/2) – [(1/2) × cos (α)]

The probability of heads and the probability of tails are both one-half at α = 90 degrees and α = 270 degrees.

The probability of heads plus its improbability, which is the probability of tails, is one.
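
A minimal sketch of this coin-flip probability function, checked at the angles just discussed:

```python
import math

def p_heads(deg):
    """Probability of heads for the rotating coin at angle deg, in degrees."""
    return 0.5 * math.cos(math.radians(deg)) + 0.5

assert abs(p_heads(0) - 1.0) < 1e-12     # heads at 0 degrees
assert abs(p_heads(180) - 0.0) < 1e-12   # tails at 180 degrees
assert abs(p_heads(90) - 0.5) < 1e-12    # heads and tails equally at 90 degrees
# The probability of heads plus the probability of tails is one at every angle.
assert abs(p_heads(45) + (1 - p_heads(45)) - 1.0) < 1e-12
```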

Whether we visualize these functions as oscillating between horizontal and vertical or as oscillating between heads and tails, the functions are waves.

We are thus visualizing a probability function as a wave oscillating continuously between a probability of one and zero. The probability is the fraction of the maximum magnitude of the wave as a function of α.

An Unrelated Meaning of Probability

We use the word, probability, to designate our lack of certitude of the truth or falsity of a proposition. This meaning of probability reflects the quality of our human judgment, designating that judgment as personal opinion, rather than a judgment of certitude of the truth. This meaning of probability has nothing to do with mathematical probability, which is the fraction of an element in a logical set or, by extension, the fraction of a continuous whole.

Driven by our love of quantification, we frequently characterize our personal opinion as a fraction of certitude. This, however, itself is a personal or subjective judgment. A common error is to mistake this quantitative description of personal opinion to be the fractional concentration of an element in a mathematical set.

Errors Arising within Material Analogies of Probability

A common error is to identify material analogies or simulations of the mathematics of probability as characteristic of the material entities employed in the analogies. In the mathematics of probability and randomness the IDs of the elements are purely nominal, i.e. they are purely arbitrary. The probability relationships of a set of six elements consisting of three elephants, two saxophones and one electron are identical to those of a set of three watermelons, two paperclips and one marble. This is so because the IDs are purely nominal with respect to the relationships of probability.

In analogy, the purely logical concepts of random mutation and probability are not properties inherent in material entities such as watermelons and snowflakes. This is in contrast to measurable properties, which, as the subject of science, are inherent in material entities.

The jargon employed in analogies of the mathematical concepts also leads us to confuse logical relationships among mathematical concepts with the properties of material entities. In the roll of dice we say the probability of the outcome of boxcars is 1/36. We think of the result of the roll as a material event, which becomes a probability of one or zero after the roll, while it was a probability of 1/36 prior to the roll of the dice. In fact, the outcome of the roll had nothing to do with probability and everything to do with the forces to which the dice were subjected in being rolled. The analogy to mathematical probability is just that, a visual simulation of purely logical relationships.

We are also tempted to think of the probability 1/36 as the potential of boxcars to come into existence, which after the roll is now in existence at an actuality of one, or non-existent as a probability of zero. In this, we confuse thought with reality. Probability relationships are solely logical relationships among purely logical elements designated by nominal IDs. Material relationships are those among real entities, whose natures determine their properties as potential and as in act.

Quantum Mechanics

In quantum mechanics, it is useful to treat energy as continuous waves in some instances and as discrete quanta in others. It is useful to view the wave as a probability function and the detection or lack of detection of a quantum of energy as the probability function’s collapse into a probability of one or of zero, respectively.

An Illustration

As an aid to illustrate the relationship of a probability function as a wave and its outcome as one or zero in quantum mechanics, the physicist Stephen Barr proposed this analogy:

“This is where the problem begins. It is a paradoxical (but entirely logical) fact that a probability only makes sense if it is the probability of something definite. For example, to say that Jane has a 70% chance of passing the French exam only means something if at some point she takes the exam and gets a definite grade. At that point, the probability of her passing no longer remains 70%, but suddenly jumps to 100% (if she passes) or 0% (if she fails). In other words, probabilities of events that lie in between 0 and 100% must at some point jump to 0 or 100% or else they meant nothing in the first place.”

Problems with the Illustration

The illustration fails to distinguish the purely logical relationships of mathematical probability from the existential relationships among the measurable properties of material entities. The illustration identifies probabilities as being of events rather than identifying probabilities as logical relationships among purely logical entities designated by nominal IDs. It claims that probability must transition from potency to act or it is undefined. In contrast, probability is the fractional concentration of an element in a logical set. The definition has nothing to do with real entities, whose natures have potency and are expressed in act.

Another fault of the illustration is that it is not an illustration of mathematical probability, but an illustration of probability in the sense of personal opinion. Some unidentified individual is of the opinion that Jane will probably pass the French exam. The unidentified individual lacks human certitude of the truth of the proposition that Jane will pass and uses a tag of 70% to express his personal opinion in a more colorful manner.

It is a serious error to pick an example of personal opinion to illustrate a wave function, viewed as a probability function. A wave function, such as that associated with the flip of a coin oscillating between heads as a probability of one and tails as a probability of zero, would have served the purpose well.

Of course, a wave, viewed as a probability function, is not the probability of an event. It is the continuous variable, probability, whose value oscillates between one and zero, and as such assumes these and the intermediate values of probability transiently. The additional condition is that when the oscillation is arrested, the wave collapses to either of the discrete values, one and zero, the presence or absence of a quantum. The collapse is the transition of logical state from one of continuity to one of discreteness.

Synonym or Antonym

Algorithmic and systematic are synonyms.

We typically think of systematic and random as antonyms, namely as ordered and non-ordered. Consistently, we superficially think of random mutation as non-systematic selection.

Yet, in the mathematics of sets, the process of random mutation is algorithmic. The primary conclusion to be demonstrated in this essay is: In the mathematics of sets, random mutation is algorithmically defined, i.e. it is systematic / ordered selection.

A set may be defined by the complement of its probabilities. Probability is the fractional concentration of an element in a logical set. Two sets having the same complement of probabilities may differ from one another by an integral multiple of their elements.

The simplest set defined in terms of its complement of probabilities is a set of one unique element. The probability of that element is one. The next simplest set consists of two unique elements for which the complement of probabilities consists of one-half and one-half.

In defining new sets based on the probabilities of a source set, the probabilities of the source set are retained, but the sets differ in that the ‘elements’ of the derived set are actually subsets composed of the elements of the source set.

The derived set, in its complement of probabilities, is derived by an algorithm which identifies, by ordered mutation, the contents of its subsets, where its subsets are viewed as the elements of the derived set.

In illustration, this is all very simple.

Consider a source set of two unique elements, Heads and Tails. One set derived from the source set is a set consisting of subsets of three elements, which are ‘randomly selected’ from the source set. By this algorithm of ‘random mutation’ eight subsets are defined, if sequence is retained in the subsets. These eight subsets, which comprise the derived set, are: H,H,H; H,H,T; H,T,H; H,T,T; T,H,H; T,H,T; T,T,H and T,T,T. The complement of probabilities of the derived set consists of eight probabilities of 1/8 each.

If the algorithm of derivation does not retain sequence, then the derived set would contain only four unique subsets of three elements each. These would be one subset of three Heads, three subsets of two Heads and one Tail, three subsets of one Head and two Tails and one subset of three Tails. That is a total of eight subsets, of which four are unique, and these are deemed the elements of the derived set. In this derived set, the complement of probabilities would be 1/8, 3/8, 3/8, and 1/8, respectively.
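
Both derivations can be enumerated directly; a sketch using the standard library:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Sequence retained: eight ordered subsets, each with probability 1/8.
sequences = list(product("HT", repeat=3))
assert len(sequences) == 8

# Sequence not retained: classify each ordered outcome by its composition.
compositions = Counter("".join(sorted(seq)) for seq in sequences)
probs = {c: Fraction(n, len(sequences)) for c, n in compositions.items()}
# HHH: 1/8; HHT (two Heads, one Tail): 3/8; HTT (one Head, two Tails): 3/8; TTT: 1/8
assert probs["HHT"] == Fraction(3, 8)
```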

One Objection and Its Clarification

Suppose it is objected that a probability can be calculated on the basis of random selection from a source set without advertence to the composition of a derived set. For example, the probability of selecting H,T,H from the source set is (1/2) × (1/2) × (1/2) = 1/8. It would seem that the probability is independent of a derived set. However, by definition probability is the fractional concentration of an element in a logical set. Consequently, the probability of 1/8 has meaning only in the context of a set of eight elements derived from the source set. This objection is based on shrinking the focus of attention, on not seeing the big picture.

Note that in any derivation of a new set from a source set by random mutation, if subsets are not viewed as elements, but the elements of the derived set are identified exactly as they are in the source set, then the complement of probabilities of the derived set is identical to the complement of probabilities of the source set. From this perspective the derived set is simply an integral multiple of the source set. In the derived set illustrated, there are a total of 12 Heads and 12 Tails. The probabilities of Heads and of Tails in both the source set and the derived set are one-half and one-half.

Another Objection and its Clarification

But doesn’t viewing random selection or mutation as a fully systematic algorithm contradict our common conception of random as non-ordered? Not really. Rather, it requires us to be more cognizant of what we mean by random.

Consider the conventional sequence of the ten Arabic numerical symbols: 0,1,2,3,4,5,6,7,8,9. We would consider that sequence and Sequences I through III as non-random. Sequence I: 0,2,4,6,8,1,3,5,7,9. Sequence II: 0,9,8,7,6,5,4,3,2,1. Sequence III: 8,6,4,2,0,1,3,5,7,9. However, we consider these four sequences non-random only because it is easy for us to impute a pattern to them. There are 10! = 3,628,800 different sequences of ten symbols. Our intellectual capacity is such that we cannot be equally familiar with each sequence to see each of them on a par with the others. It is only familiarity with a convention that renders the sequences of this paragraph ‘ordered’ and the roughly 3.6 million others ‘random’. In other words, our labelling anything as ‘random’ does not designate that which is labelled as non-ordered. The label, ‘random’, designates a limitation in our knowledge. It designates our ignorance. The lack of order or randomness is in our knowledge of the subject, not in the subject itself.

Each of the 3,628,800 different sequences of ten arbitrary symbols is an ordered sequence. It is only by defining an arbitrary conventional sequence that one or more of the sequences can be considered ordered and the others as non-ordered. In imposing such a convention, the symbols are no longer arbitrary, but are defined in relation to one another.

Other Sources of Confusion

The fact that probability and randomness are purely logical concepts is befogged by the jargon employed in material emulations. We identify the probability as 1/4 for the top card’s being a diamond after shuffling a deck of cards. Also, the probability of the outcome of a diamond in two successive trials is 1/16. It appears as if probability characterizes an event or an outcome. We think that this probability has nothing to do with deriving a new set from a source set. We are oblivious to the fact that two diamonds, or two D’s, is one out of sixteen subsets of a set derived from a source set of four elements, such as Spades, Hearts, Diamonds, Clubs or S,H,D,C. The sixteen subsets or elements of the derived set are: SS, SH, SD, SC; HS, HH, HD, HC; DS, DH, DD, DC; CS, CH, CD, CC. The fractional concentration of DD in this set is 1/16. Absent this logical set of sixteen elements, to say that the probability of the outcome of two diamonds in succession is 1/16 would be meaningless. Probability is the fractional concentration of an element in a logical set. It is not an event or outcome.
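
The sixteen-element derived set can be generated mechanically, and the fractional concentration of DD read off:

```python
from itertools import product
from fractions import Fraction

suits = "SHDC"   # Spades, Hearts, Diamonds, Clubs
derived = ["".join(pair) for pair in product(suits, repeat=2)]  # SS, SH, ..., CC
assert len(derived) == 16
assert Fraction(derived.count("DD"), len(derived)) == Fraction(1, 16)
```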

In material emulations of mathematical probability, a material outcome is truly an event, all by itself, in its materiality. The confusion arises because we tend to characterize the material event by the numerical value of probability, when probability is meaningful only in reference to a logical set and not to anything material. The analog, which is the basis for the material emulation of the logical concept, equates randomness with the absence of human knowledge of the material causes of the material outcome. For example, we are ignorant of the forces which determine the outcome of a coin flip and label the outcome random.

The fact that probability is meaningful only in reference to a fully defined logical set prompts the error of thinking that in material analogies the material set must exist materially to validate the analogy. For the material analogy, where the sequence of a deck of cards after shuffling represents a probability of one in 52! ≈ 8.066 × 10^67, that many decks of playing cards cannot possibly have material existence. This error, of transferring a logical requirement into a material requirement to validate a material analogy, leads to the proposed existence of a multiverse to explain, e.g. the probability of the physical constants of the universe.
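
The magnitude cited is easy to confirm:

```python
import math

decks = math.factorial(52)        # distinct orderings of a 52-card deck
assert 8.06e67 < decks < 8.07e67  # about 8.066 x 10^67, as cited
```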

The fog, which diminishes our ability to see the concept of mathematical probability clearly, is compounded by the fact that the word, probability, has a meaning entirely apart from mathematics. In this other meaning, probability is qualitative. This other meaning of probability is the certitude with which one judges a proposition to be true. Human certitude of the truth of a proposition has nothing to do with the fractional concentrations of elements in logical sets.

Conclusion

In mathematics, probability is the fractional concentration of an element in a logical set. Therefore, random mutation is systematic selection in spite of (1) jargon, which deflects or shrinks the focus of our attention and (2) the non-mathematical use of the word, probability, which refers to the quality of human certitude.

Logic requires a fully defined logical set in order to specify a numerical value of probability as a fraction of that set. In a material analogy, this does not necessitate the material existence of a set, but only its logical definition, in order to specify a numerical value of probability. Such analogies are emulations of the logical concepts.

In analogy, the purely logical concepts of random mutation and probability are not properties inherent in material entities such as watermelons and snowflakes. This is in contrast to measurable properties, which, as the subject of science, are inherent in material entities.