# Critique

The simplest population or set of elements, to which Bayes’ Theorem applies, is one in which the set is divided into two subsets by each of two independent criteria or markers. One marker may be viewed as dividing the set into two rows, while the other marker divides the set into two columns. This results in the set of elements being divided into four quadrants or cells, where each cell is identified by a row marker and a column marker. Each cell is a subset of the total set.

In addition to the four subsets identified by row and column, another four subsets of the total set are identifiable. Two of these subsets are the row sums; the other two are the column sums.

In Bayes’ theorem, the row marker is viewed as identifying one row as the Row, and the other row as non-Row. Similarly, the column marker is viewed as identifying one column as the Column, and the other column as non-Column. Each cell is specified by its row and its column, as in the following table. One specific example is the cell, Cell(Row, non-Column) or Cell(true, non-our).

|                    | Column (our)          | non-Column (non-our)      |
|--------------------|-----------------------|---------------------------|
| Row (true)         | Cell(Row, Column)     | Cell(Row, non-Column)     |
| non-Row (non-true) | Cell(non-Row, Column) | Cell(non-Row, non-Column) |

Probability is the ratio of a subset to a set. Consequently, due to the Bayesian divisions of the set of elements, sixteen probabilities may be identified:

- Four probabilities are the ratios of each of the four cells as a subset to the total set.
- Four probabilities are the ratios of each of the four cells as a subset to its row sum as a set.
- Four probabilities are the ratios of each of the four cells as a subset to its column sum as a set.
- Two probabilities are the ratios of each row sum as a subset to the total set.
- Two probabilities are the ratios of each column sum as a subset to the total set.

Bayes’ theorem concerns only four of these ratios or probabilities. Bayes’ theorem is an equation for one of these probabilities as equaling an algebraic expression involving three of the other probabilities.

The subset for which Bayes’ theorem is the focus is Cell(Row, Column). The set of focus is Column Sum. The probability of focus is their ratio of subset to set, namely

Cell(Row, Column) / Column Sum.

Let us call this Probability One. Bayes’ theorem permits the numerical calculation of Probability One, if the numerical values of three other probabilities are known. The other three probabilities are:
Probability Two: Row Sum / Total
Probability Three: Cell(Row, Column) / Row Sum
Probability Four: Column Sum / Total

Bayes’ equation is:

Probability One = (Probability Two * Probability Three) / Probability Four
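Bayes’ equation can be checked numerically. The following is a minimal sketch using hypothetical, illustrative cell counts (the values 30, 20, 10, 40 are not from the text):

```python
# Hypothetical cell counts for the four quadrants (illustrative only)
a = 30   # Cell(Row, Column)
b = 20   # Cell(Row, non-Column)
c = 10   # Cell(non-Row, Column)
d = 40   # Cell(non-Row, non-Column)

total = a + b + c + d      # 100
row_sum = a + b            # Row Sum
col_sum = a + c            # Column Sum

p_one = a / col_sum        # Cell(Row, Column) / Column Sum
p_two = row_sum / total    # Row Sum / Total
p_three = a / row_sum      # Cell(Row, Column) / Row Sum
p_four = col_sum / total   # Column Sum / Total

# Probability One = (Probability Two * Probability Three) / Probability Four
assert abs(p_one - (p_two * p_three) / p_four) < 1e-12
print(p_one)  # 0.75
```

Any non-negative cell counts with non-zero row and column sums satisfy the same identity, since it is purely algebraic.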

On page 50 of Proving History, Richard Carrier considers a Bayesian population of explanations. The Row is true. The non-Row is non-true. The Column is our. The non-Column is non-our. The four probabilities of Bayes’ theorem in terms of this particular Bayesian population of elements are:
One: The probability of a true explanation of ours with respect to our total explanations
Two: The probability of a true explanation with respect to total explanations
Three: The probability of a true explanation of ours with respect to total true explanations
Four: The probability of an explanation of ours with respect to total explanations.

Four is, of course, Column Sum / Total. However,

Column Sum = Cell(Row, Column) + Cell(non-Row, Column)

Therefore, Probability Four may be expressed as:

Cell(Row, Column) / Total + Cell(non-Row, Column) / Total             Eq. 1

We can multiply and divide the first ratio of Eq. 1 by Row Sum without changing the ratio. This yields:

(Row Sum / Total) * (Cell(Row, Column) / Row Sum)

This is Probability Two times Probability Three.

We can multiply and divide the second ratio of Eq. 1 by non-Row Sum without changing the ratio. This yields:

(non-Row Sum / Total) * (Cell(non-Row, Column) / non-Row Sum).
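Combining the two substitutions gives a single identity for Probability Four. This is the standard law of total probability, rendered here in the document’s Row/Column terms:

```latex
\frac{\text{Column Sum}}{\text{Total}}
  = \frac{\text{Row Sum}}{\text{Total}} \cdot \frac{\text{Cell(Row, Column)}}{\text{Row Sum}}
  + \frac{\text{non-Row Sum}}{\text{Total}} \cdot \frac{\text{Cell(non-Row, Column)}}{\text{non-Row Sum}}
```

The first product is Probability Two times Probability Three; the second is the corresponding non-Row product.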

We now have all of the terminology to verbalize Bayes’ theorem in the manner of Carrier, in his example of page 50 of Proving History.

A lucid verbalization of Bayes’ Theorem for the set of explanations defined by Carrier is:

The probability of a true explanation of ours with respect to our total explanations
EQUALS
[the probability of a true explanation with respect to total explanations]
TIMES
[the probability of a true explanation of ours with respect to total true explanations]
DIVIDED BY
{[the probability of a true explanation with respect to total explanations]
TIMES
[the probability of a true explanation of ours with respect to total true explanations]
PLUS
[the probability of a non-true explanation with respect to total explanations]
TIMES
[the probability of a non-true explanation of ours with respect to total non-true explanations]}

In contrast, Carrier’s verbalization of Bayes’ Theorem from page 50 of Proving History is obscure. Its five verbal expressions are quoted and examined individually below.

Probability is the ratio of a subset to a set. When a set is composed of discrete elements such as explanations, the probability is the ratio of the number of elements in the subset to the number of elements in the set.

In the lucid version of Bayes’ theorem, each of the five verbal probabilities is of the form: ‘the probability of an element of a subset with respect to the elements of a set’. This has the same meaning as the form ‘The ratio of the elements of a subset to the elements of a set’. For each of the five probabilities, the subset numerator and the set denominator of the ratio are clearly identified in the lucid verbalization of Bayes’ theorem.

Carrier’s verbalization uses the words, typical and expected, as synonyms for probability and the word, atypical, as a synonym for improbability. Also, the only evidence is the data, the elements of the Bayesian data set. Carrier has identified the elements of the set as explanations. Therefore, Carrier’s verbalization uses the words, evidence and explanations, as synonyms.

In Carrier’s verbalization of Bayes’ theorem, if probability, typical, expected and atypical are replaced with ratio, it is difficult, but not impossible to determine the numerator and the denominator of each ratio specified. However, these ratios are not valid expressions of the probabilities of Bayes’ theorem. Thus, Carrier’s verbalization is invalid.

Consider Carrier’s verbal expressions of the five ratios, each purportedly a probability specified by Bayes’ theorem.

First: “The probability our explanation is true”.
This literally designates the ratio of our total explanations to total true explanations. However, neither our total explanations nor total true explanations is a subset of the other. The ratio verbalized is not a probability.
Also, “The probability our explanation is true” cannot refer to or be a test of a particular explanation of ours, because probability refers to sets of elements, not to a particular element. If the numerical value of Bayes’ equation were 1, then each particular explanation of ours would be true, because all are true. However, this is an inference from a calculated result, not the implication of Bayes’ theorem in its algebraic expression, which accommodates any numerical value from 0 to 1.

Second: “How typical our explanation is”.
This literally designates the ratio of our total explanations to total explanations. Our total explanations is a subset of total explanations. That ratio is Probability Four above. However, Probability Four is the denominator of Bayes’ theorem, not an expression in its numerator as it is in Carrier’s verbalization.

Third: “How expected the evidence is, if our explanation is true”.
As noted, the only evidence is the elements of the data set. The first phrase of this quotation, “How expected the evidence is”, would then read, ‘The probability of an element’. Every probability is the probability of an element of a subset. Thus, “How expected the evidence is” can be replaced with ‘the probability’. We then have ‘the probability, if our explanation is true’. This is of the form, ‘the probability if A is B’. ‘A is B’ means A is a subset of B, which alludes to the probability of A with respect to B. The entire expression, “How expected the evidence is, if our explanation is true”, is thus reduced to ‘the probability our explanation is true’, which is identical to the expression of the First consideration. The ratio verbalized is not a probability.

Fourth: “How atypical our explanation is”.
This literally designates the ratio of non-our total explanations to total explanations. This ratio verbalized is a probability.

Fifth: “How expected the evidence is, if our explanation isn’t true”.
In line with the rationale above in the Third and First considerations, this can be reduced to ‘the probability of our total explanations with respect to total non-true explanations’. However, neither our total explanations nor total non-true explanations is a subset of the other. The ratio verbalized is not a probability.

Another thing to note is that in the lucid verbalization, only the second of the probabilities in the numerator of Bayes’ theorem identifies a subset as ‘our’. Yet, in Carrier’s verbalization, each of the two expressions for probability (one referred to as a typicality, the other as an expectation) in the numerator of Bayes’ theorem identifies a subset as ‘our’.

There is a marked contrast between the clarity of specification of the five ratios of probability in the lucid verbalization of Bayes’ theorem and the obscurity of the ratios in Carrier’s verbalization. Three out of Carrier’s five verbal ratios are not probabilities.

Bayes’ theorem is not the equation for ‘The probability our explanation is true’, in Carrier’s words. Rather, Bayes’ theorem is the equation for ‘Among our explanations, the probability that an explanation is true’. It is the ratio of our true explanations to our total explanations.

Just as I have previously concluded from a different perspective of his verbalization: “Carrier’s actual use of his terminology does not merely obscure, but totally obliterates the algebraic and intentional meaning of Bayes’ theorem.”

Equation 1 expresses relationships among ratios based on the following table, in which a, b, c and d are the four cell counts and T = a + b + c + d is the total.

|            | Column One | Column Two | Row Sum           |
|------------|------------|------------|-------------------|
| Row One    | a          | b          | a + b             |
| Row Two    | c          | d          | c + d             |
| Column Sum | a + c      | b + d      | T = a + b + c + d |

a / (a + c) = [a / (a + b)] * [(T) / (a + c)] * [(a + b) / T]             Eq. 1

Equation 1 is Bayes’ theorem, where a, b, c and d individually are positive or zero. The equation would obviously have no utility, if we knew the numerical values of a and c. Its utility lies in the ability to calculate the value of the ratio a/(a + c), without knowing the values of ‘a’ and ‘c’, in the event that we do know the numerical values of the ratios: a/(a + b), T/(a + c), and (a + b)/T.

In an attempt to employ Bayes’ theorem in a ‘Case Study, the Death of Herod Agrippa’, Raphael Lataster proposed a mathematical problem where (a+b)/T may be considered to be virtually zero because T is much larger than (a+b). This implies that ‘a’ is virtually zero with respect to T. Based on this implication and allegedly on Bayes’ theorem, Lataster argues that if ‘a’ is virtually zero when compared to the size of T, then the ratio, a/(a + c), is also virtually zero. This would be true, if the ratio, (a + b)/T, were an essential factor in the algebraic expression of a/(a + c), i.e. in Eq. 1, Bayes’ Theorem.

Lataster’s argument views Eq. 1 as having the form,

a / (a + c) = Factor1 * Factor2 * [(a + b) / T]                          Eq. 2

The argument is that a/(a+c) is virtually zero, because it is the product of factors, one of which is virtually zero, namely (a + b)/T. This would be a valid argument, if T were not a factor within the product, Factor1*Factor2. However, T is such a factor.

Factor1 * Factor2 = {a /[(a + b) * (a + c)]} * (T)                     Eq. 3

Consequently Eq. 1, which is Bayes’ theorem, can be expressed as Eq. 4, in which a/(a+c) is not a function of T and is not a product of which (a+b)/T is a factor.

a / (a + c) = [a / (a + b)] * [1 / (a + c)] * (a + b)                      Eq. 4
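The independence of a/(a+c) from T can be checked numerically. A minimal sketch with hypothetical counts, holding ‘a’ and ‘c’ fixed while ‘b’ and ‘d’ (and hence T) grow without bound:

```python
# Hypothetical counts: a and c fixed; b and d (and hence T) varied
a, c = 2, 4   # Column One: our true / our non-true explanations

for b, d in [(3, 5), (300, 500), (3_000_000, 5_000_000)]:
    T = a + b + c + d
    lhs = a / (a + c)                                    # target probability
    rhs = (a / (a + b)) * (T / (a + c)) * ((a + b) / T)  # Eq. 1, Bayes' theorem
    assert abs(lhs - rhs) < 1e-12
    # a/(a+c) remains 1/3 however large T grows, even as (a+b)/T approaches 0
    print(T, (a + b) / T, lhs)
```

However small (a+b)/T becomes, the left-hand side is unchanged, because T cancels out of the right-hand side, as Eq. 4 shows.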

Also, from an examination of the table, it is apparent that the values of ‘a’ and ‘c’ in column One are independent of the values of ‘b’ and ‘d’ in column Two. Therefore, the value of a/(a+c) does not depend on the value of T. That a/T is virtually zero, is irrelevant to the value of a/(a+c).

Consequently, in his case study of the death of Herod Agrippa, Lataster’s conclusion, that a/(a+c) is virtually zero, is not only a non sequitur from his premise that a/T is virtually zero; his argument is also not based on Bayes’ theorem.

Notes:
1) In Lataster’s nomenclature, P(h/b) is the ratio, (a+b)/T, in Eq. 1 of this essay.

2) For a critique of Lataster’s expression of Bayes’ theorem, which is that of Richard Carrier, see the essays, “Verbalizing Bayes Theorem” and “Carrier’s Explanation of Bayes’ Theorem is False”.

3) Adopting Lataster’s nomenclature in his verbal expression of Bayes’ Theorem: The table is a table of explanations; Row One is true explanations; Row Two is non-true explanations; Column One is Our explanations and Column Two is other than our explanations.
a / (a + c) = [a / (a + b)] * [(T) / (a + c)] * [(a + b) / T]             Eq. 1
Using Lataster’s nomenclature and verbalizing Eq. 1, i.e. Bayes’ theorem, yields:
Among all of our explanations, ‘(a+c)’, the probability that any one, ‘a’, is true
EQUALS
[among all true explanations, ‘(a+b)’, the probability that any one, ‘a’, is ours]
TIMES
[one over the probability that among all explanations, ‘T’, any one, ‘(a+c)’, is ours]
TIMES
[among all explanations, ‘T’, the probability that any one, ‘(a+b)’, is true]

4) Probability is the ratio of a subset to a set. Bayes’ theorem permits the numerical calculation of the fraction of all our explanations, which are true, or synonymously, the probability that among our explanations any one is true. This probability concerns only ‘our explanations’, namely ‘(a+c)’ and ‘our true explanations’, namely ‘a’. Consequently, this probability is independent of all other explanations in the total set, because the quantities, ‘a’ and ‘c’, are independent of all other explanations in the set. In contrast, Lataster claims that he is determining the validity of our explanations on the basis of other explanations in the set, which claim is a contradiction of Bayes’ theorem. See Bayes, Baseball and Bowling, which illustrates the parallel error of claiming that the validity of bowling scores can be determined on the basis of baseball scores using Bayes’ theorem.

In the Dawkins-Pell debate of April, 2012, Cardinal Pell asked, “Could you explain what non-random means?” Richard Dawkins replied, “Yes, of course, I could. It’s my life’s work. There is random genetic variation and non-random survival and non-random reproduction. . . . That is quintessentially non-random. . . . Darwinian evolution is a non-random process. . . . It is the opposite of a random process.”

Obviously, Dawkins did not explain the meaning of non-random or random. He merely gave examples of algorithmic processes labeled as one or the other. In the Darwinian scheme, a selection ratio, when applied to generation, is labeled random. That same numerical selection ratio, when applied to survival, is labeled non-random. When a mathematical algorithm or selection ratio is labeled random or non-random, as in the Darwinian scheme of random genetic variation and non-random survival, what do random and non-random mean?

#### The Meaning of (Non-) Random

In the context of mathematics, the terms random and non-random refer to the members of a set, in those instances where the set is defined as composed of equivalent members. (Note: a member may be a subset or a single element). When a member is identified with respect to its generic equivalency to the other members of the set, it is designated as random. Each member, when identified in its specificity, is designated as non-random. The mathematical distinction of random and non-random is the distinction between the genus and the species of a member of a set.

Mutation or selection in this context uses a variable to represent a member of the set. Mutation refers to a succession of variations of the variable. If the algorithm of variation identifies the members generically, the algorithmic process of variation is designated as random mutation or random selection. If the algorithm of variation identifies the members by their specificity, the algorithmic process is designated as non-random mutation or selection. Both algorithms are systematic, i.e. ordered.

#### The Definitions

In a set of generically equivalent members, random identifies a member in its generic membership in the set. In a set of generically equivalent members, non-random identifies a member in its specific identity.

#### Intuitive Assessment of the Definitions

Consider in illustration, the set defined as the set of subsets composed of each possible sequence of any five symbols of a set of ten ordered symbols, e.g. the ten digits, 0 through 9. Three of the one hundred thousand subsets are: 12345, 21212 and 40719. Each of these sequences, when viewed solely as a subset, i.e. identified in its generic equivalency, is random. Each of these sequences, when identified as a specific sequence, is non-random.

These identifications of random and non-random don’t sit well with our intuition.

Our natural prejudice would prompt us to classify 12345 as non-random, because we recognize in it the order of convention. Similarly, we would want to classify 21212 as non-random, because in that sequence we readily recognize an order of symmetry. We would be averse to considering these two sequences random in any circumstance. In the sequence 40719, we do not easily recognize any order, so that our prejudice prompts us readily to accept the sequence, 40719, as random according to the mathematical definition. Notice, however, that if 40719 were the sequence of symbols on your auto license plate, you would be inclined not to consider it random. It wouldn’t overwhelm your capacity for the familiarity of specific order, as do most of the one hundred thousand specific sequences.

We thus have a hint from whence our prejudice arises to deny the mathematical definitions.

We intuitively identify order or pattern as non-random. This is in accord with philosophy, which not only identifies order in the quantifiable through measurement as non-random, but identifies all of material reality as intelligible and, therefore, ordered and non-random. It would seem then that philosophy leaves no room for anything to be random. Indeed, it does leave no room at the level of reality. However, it does leave room in the logic of mathematics.

Our knowledge of reality is limited and our intelligence is limited in itself. We are sometimes incapable of perceiving order in that which is within our purview. Three grains of sand and fifty grains of sand may be considered within our purview. We readily appreciate the order of the three, but are overwhelmed by quantity to see the order of the fifty. Also, things may not be fully within our purview. We know that the outcome of rolling a pair of dice is due to the forces to which the dice are subjected. Yet, we accept the outcome as random by convention, because those forces, in detail, are not within our purview. In such cases we equate human ignorance with randomness and apply the mathematics of randomness to compensate for our ignorance.

Similarly, the causes of genetic variation are micro or smaller, while some of those of death are readily observable. We are inclined to treat genetic variation as random and survival as non-random as did Charles Darwin in the 19th century. Seven years after the publication of the Origin of Species, Gregor Mendel presented his work in which he employed the mathematics of randomness. He did not infer that genetic inheritance was random. Rather, his results established what would be expected at the micro level, namely that genetic inheritance, based as it is on two sexes, involves binary division and recombination.

#### The Mathematics

The mathematics of randomness is algorithmic and thereby fully ordered. Material application of the mathematics is by analogy. It is by way of illustration only. The mathematics is not inferred from measurement as are the mathematical formulae of the physical sciences.

Thus, within the context of the logic of mathematics, we have fully proper definitions of random and non-random. Random designates the generic identity of a member of a set. Non-random designates the specific identity of a member of a set.

#### Probability

Probability is the fractional concentration of an element in a logical set. Probability is the ratio of a subset to a set. Probability is the ratio of the specific to the generic within a set. Probability is the ratio of the non-random to the random within a set.

In contrast to the four different expressions of the singular definition above, the common definition of probability is the likelihood or chance of an event. However, such is not a definition. It is a list of synonyms. One could just as well define chance or likelihood as the probability of an event.

#### Illustration of Random and Non-random Mutation/Selection

Consider the set used by Richard Dawkins in illustration. It consists of the sequences of three digits, each of which varies over the range, 1 through 6. The set consists of the 216 specific sequences from 111 to 666.

A three-digit variable would generically represent the specific members of this set. If successive mutations or selections may be any of the 216, the variation is generic and identified as random mutation or random selection. If the variation is constrained by an algorithm of specific order, it is identified as non-random mutation or non-random selection. The result of variation, whether random or non-random, may be viewed as a pool of mutants or variants.

The pool of mutants formed by random mutation would also be random, unless it was constrained by some ordering algorithm, such as being limited to one copy each of the 216 mutations.

If a pool is subjected to an algorithm, which culls mutants based on specific identity, then the culling is non-random mutation/selection.

In his illustration (minute 7:50), Dawkins indicates that the maximum number of variants in a pool, needed to ensure that one copy of the specific variant 651 is present, is 216. This means that, although the sequence of generation of its elements may have been random, the resultant pool is non-random. It is constrained to be of the exact same composition as the set from which it was formed by mutation. When this generated pool is subjected to non-random selection of variant 651, the probability of success is 100%.

In contrast, if the pool size was 216 and if the pool was random, the probability of its containing at least one copy of the specific element 651, would be 63.3%. We can readily calculate the probability of not selecting the specific element 651 in 216 random mutations. It is (215/216)^216. The probability of selecting at least one copy of 651 would be 1 minus this value. P = 1 – (215/216)^216 = 0.633
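The 63.3% figure can be reproduced directly; a short sketch of the calculation:

```python
# Probability that a random pool contains at least one copy of variant 651
n = 216      # distinct three-digit variants, 111 through 666
pool = 216   # size of the randomly generated pool

p_miss = ((n - 1) / n) ** pool   # no copy of 651 appears in the pool
p_hit = 1 - p_miss               # at least one copy of 651 appears

print(round(p_hit, 3))  # 0.633
```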

When this generically generated pool of mutants is subjected to the specific selection filter for 651, it would have a success rate of 63.3%. The random generation of pools of size 216 for each pool, defines a population of such pools, where the size of the population is 216^216 = 1.75 x 10^504. Of this population of defined pools, 63.3% contain at least one copy of the specific element 651.

Richard Dawkins is correct that Darwinian evolution consists of two algorithms. The first is labeled random. It is random number generation, a generic process. The second is labeled non-random. It is the culling of the pool of generated numbers through a specific number filter. The filtering process has a probability of less than one. Consequently, Darwinian evolution must be characterized as random, as Dawkins has done on page 121 of The God Delusion. He identifies the result of each sub-stage of Darwinian evolution as “slightly improbable, but not prohibitively so”. Dawkins was right in 2006 in his characterization of Darwinian evolution as random in The God Delusion and wrong in his statement in the Dawkins-Pell debate of 2012, “Darwinian evolution is a non-random process.”

#### Summary

The algorithmic processing of members of a set, based on the member’s specific identity, is non-random. The algorithmic processing of members of a set, based on the member’s generic identity as a member of the set, is random. Colloquially, generic means ‘one’s as good as another’. In both instances, the algorithm, as an algorithm, is necessarily systematic, i.e. orderly.

Richard Dawkins’ argument for ‘Why there almost certainly is no God’ (Chap 4, The God Delusion), is mathematical. He proposes that there is no mathematical solution to ‘the problem of improbability’ of God, whereas there is a mathematical solution to ‘the problem of improbability’ of a large stage of Darwinian evolution.

The problem occurs when an improbability is prohibitively improbable. According to Dawkins, the solution to a problem of improbability is to replace its complementary probability with a series of the factors of its complementary probability. Each improbability, complementary to the probability of a factor, is ‘slightly improbable, but not prohibitively so’ (p. 121, The God Delusion).

The problem of improbability of the success of natural selection for a large stage of Darwinian evolution is solved by replacing the large stage with a series of smaller sub-stages. The series of sub-stages represents the staged evolution of the ultimate mutation. In this gradualism, for each sub-stage there is a mutation, which survives the test of natural selection for that sub-stage. In contrast, for the overall, large stage, all mutations are subjected to but one, the ultimate, test of natural selection. This test of natural selection represents a much larger and prohibitive improbability than that of the test of natural selection within each sub-stage in the series.

In order for there to be a solution to the improbability of God, God would have to come into being by a series of sub-stages, where each sub-stage improbability is not prohibitively large as is the single stage improbability of the existence of God. Obviously, God would not be God, if he came into existence gradually. Therefore, there is no solution to the improbability of God.

#### Dawkins’ Elucidation of the Role of Gradualism Is Self-Criticism

In delineating the role of gradualism in Darwinian evolution, Dawkins demonstrated that it has no effect on the probability of the evolutionary success of natural selection. He showed that the role of gradualism is to increase the efficiency of mutation. Thus, he disproved that gradualism is the solution to the mathematical problem of improbability. In criticizing his own solution to the problem of improbability, Dawkins disproved his rationale for why there almost certainly is no God.

To illustrate the role of gradualism in Darwinian evolution, Dawkins chose an example of three mutation sites of six mutations each. He accordingly noted that this defines 6x6x6 = 216 different mutations. If the one mutation of these 216, which is capable of surviving natural selection, is unknown, then a minimum of one copy of each would have to be generated non-randomly to ensure 100% evolutionary success of natural selection.

He compared this large stage of non-random mutation and natural selection with a series of three sub-stages, each affecting one of the three mutation sites. Each sub-stage would require the generation of six non-random mutations to ensure 100% evolutionary success. That would be a total of 18 non-random mutations for 100% overall success of natural selection for the series of sub-stages.

The difference between the single, overall stage and the series of sub-stages is not in the success of natural selection. For both, the success of natural selection is 100%. The difference is in the total number of non-random mutations to ensure 100% success. The difference is that of 216 and 18 total mutations. The series is mutationally more efficient by a factor of 216/18 = 12, at 100% probability of success of natural selection.

If the pools of mutations subjected to natural selection in the illustration are generated by random mutation, rather than non-random mutation, a similar efficiency in the number of mutations is achieved, without any change in the probability of success of natural selection.

A pool of 19 randomly generated mutations in each of three sub-cycles would yield a probability of success of natural selection of 96.9% for each cycle and an overall probability of success of natural selection of 90.9%. For a single cycle of random mutation involving all three mutation sites, a pool of 516 random mutations would be required to yield a probability of success of natural selection of 90.9%. The efficiency factor in random mutations would be 516/57 = 9 due to gradualism with no change in the probability of 90.9% success of natural selection.

The probability, P, of at least one copy of the mutation surviving natural selection in a pool of x randomly generated mutations with a base of n different mutations is: P = 1 – ((n -1)/n)^x.
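With this formula, the figures above (96.9% per sub-stage, 90.9% overall, and the pool of roughly 516 for the single stage) can be reproduced; a minimal sketch:

```python
def p_success(n, x):
    """Probability that at least one copy of the single favored mutation
    appears among x random mutations, each drawn from n possibilities."""
    return 1 - ((n - 1) / n) ** x

# Gradualism: three sub-stages of 6 possibilities, 19 random mutations each
per_stage = p_success(6, 19)
overall = per_stage ** 3
print(round(per_stage, 3), round(overall, 3))  # 0.969 0.909

# Single large stage: all 216 possibilities, ~516 random mutations needed
print(round(p_success(216, 516), 3))  # 0.909
```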

Thus, as his own critic, Dawkins has disproved his claim that the problem of improbability of success of a large stage of Darwinian evolution is solved by replacing it with a series of sub-stages.

#### Dawkins’ Criticism of ‘The Problem of Improbability’

Dawkins has also demonstrated that there is no such problem as ‘the problem of improbability’. He has labeled those who propose such a problem as persons of a ‘discontinuous mind’.

Dawkins noted that some variables, which are defined over a range of 0 to 1, are essentially fractions of a whole of some thing or property. The range is 0% to 100%. Implicitly, any two values of such a variable differ from one another by degree, not by kind. Dawkins claims that only a person with a ‘discontinuous mind’ would propose that there is some value in the range which distinguishes two kinds of the variable. One kind would be 0% to the arbitrary point marking the discontinuity from the second kind, defined from the point of discontinuity to 100%.

Of course, wearing his mathematician’s hat, Dawkins is correct. In being correct he has demonstrated that there is no valid definition of a ‘problem of improbability’. Defining the problem requires an arbitrary point demarcating a discontinuity in the range of improbability of 0% to 100%, thereby forming two kinds of improbability, non-prohibitive and prohibitive. It is these two kinds of improbability, which form the basis of his discussion of ‘the problem of improbability’ of Darwinian evolution and of why there almost certainly is no God.

#### Conclusion

Dawkins has proved
1. Gradualism does not solve ‘the problem of improbability’ of the success of natural selection in a large stage of Darwinian evolution. It has no effect on probability. It merely increases the efficiency of mutation.
2. ‘The problem of improbability’ is a self-contradiction. It proposes a distinction of kind between two subsets, within the defined range of a continuous variable whose values vary by degree, not kind.

Joe Average bowled in a recreational league on Tuesday nights from April through August. On Wednesday mornings, relying on his memory, Joe entered his three game scores in a log before breakfast. After breakfast it was his habit to read the box scores of those American League baseball games played on Tuesdays and reported in the Wednesday morning edition of USA Today. That typically consisted in seven games with six data each, namely the runs, hits and errors of the two teams. That was a total of forty-two baseball data compared to the three data, which were Joe’s bowling scores.

(E,J) is the number of Joe’s logged bowling scores that are erroneous.

E is the sum of Joe’s erroneous scores plus reported erroneous AL box scores.

J is the total of Joe’s logged scores, erroneous plus correct.

T is the grand total of scores, Joe’s logged bowling scores plus the reported AL box scores.

Given:

(E,J) / E = X = 2/3

E/T = Y = 1/300

J/T = Z = 1/15

What is the probability that a bowling score in Joe’s log is erroneous?

Employing Bayes’ Theorem,

(E,J) / J = (X * Y) / Z

(E,J) / J = ((2/3) * (1/300)) / (1/15)

(E,J) / J = 1/30

The fraction of erroneous bowling scores in Joe’s log was 0.0333

If the time period consisted in twenty weeks, Joe would have recorded 60 scores, of which a fraction of 0.0333, i.e. 2 scores, were in error. If over the same time period, USA Today reported 42 * 20 = 840 AL box score data, then the total data in the population would be 60 + 840 = 900, of which a fraction of 0.00333 were in error, i.e. 900 * 0.00333 = 3 data. Thus, there was one typo in the AL box scores reported by USA Today over the same time period.
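The arithmetic of the example can be laid out as a short sketch, using the ratios and twenty-week counts given above:

```python
# The given ratios of the example
X = 2 / 3    # (E,J)/E : fraction of all erroneous scores that are Joe's
Y = 1 / 300  # E/T     : fraction of all scores that are erroneous
Z = 1 / 15   # J/T     : fraction of all scores that are Joe's

p = (X * Y) / Z              # Bayes' theorem: (E,J)/J
print(round(p, 4))           # 0.0333, i.e. 1/30

# Twenty weeks of data: 60 logged bowling scores, 840 box-score data
J, T = 60, 900
assert J / T == Z            # consistency with the given ratio
print(round(J * p))          # 2 erroneous bowling scores
print(round(T * Y - J * p))  # 1 erroneous box-score datum
```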

By Bayes’ theorem, the probability of error in Joe’s logging of his bowling scores was calculated to be 3.333%.
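The arithmetic above can be checked with a short script. This is a minimal sketch using exact fractions; the given ratios X, Y, Z and the twenty-week counts are taken directly from the example.

```python
from fractions import Fraction

# Given ratios from the example
X = Fraction(2, 3)    # (E,J)/E: fraction of all errors that are Joe's
Y = Fraction(1, 300)  # E/T: fraction of all data that are erroneous
Z = Fraction(1, 15)   # J/T: fraction of all data that are Joe's

# Bayes' theorem: (E,J)/J = (X * Y) / Z
p_error_given_joe = (X * Y) / Z
print(p_error_given_joe)  # 1/30, i.e. about 0.0333

# Over twenty weeks: 60 bowling scores and 840 AL box score data
J = 60
T = J + 42 * 20                        # 900 total data
joes_errors = J * p_error_given_joe    # 2 erroneous bowling scores
total_errors = T * Y                   # 3 erroneous data in all
al_typos = total_errors - joes_errors  # 1 typo in the AL box scores
print(joes_errors, total_errors, al_typos)  # 2 3 1
```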

Could we conclude that the box score data of the American League determined the probability of Joe’s making an error in his bowling log?

What would be the standard of comparison for determining the correctness or error of a datum in Joe’s bowling log? Could the standard of comparison inherently be the data of American League baseball box scores?

To apply Bayes’ theorem a population of data must be partitioned by two independent criteria. In the above example, one criterion partitioned the population into Joe’s data and non-Joe’s data. The other criterion partitioned the population into erroneous data and non-erroneous data.

What is often lost sight of in applying Bayes’ theorem is that the theorem does not treat subsets as antithetical to one another. Rather, it deals with subsets as compatible, as complementary in forming a whole. In the illustration, the baseball scores are not treated as baseball data, but as non-Joe’s data, the complement of Joe’s data.

In Proving History, page 50 ff, Richard Carrier partitions a population of data into historical reports from Source A and reports from non-Source A. Carrier’s other criterion partitions the population of reports into true reports and non-true reports. He then employs Bayes’ theorem to calculate the probability of true reports among all the reports of Source A. That is not what he indicates he has done. He indicates that what he has done is to evaluate the truth of a Source A report where the evaluation is based on the content of non-Source A reports. That would be comparable to claiming that a datum, in Joe’s bowling log, could be determined to be correct or erroneous based on the content of American League baseball box scores as reported in USA Today by employing Bayes’ theorem.

Both Bayes’ theorem and the reports of the American League box scores are pertinent to calculating the probability of errors in Joe’s bowling log. That probability is the fraction of his logged scores which are erroneous. The pertinence is due to the fact that both Bayes’ theorem and probability deal with complementary subsets. In this instance, the complementary subsets are: Some of Joe’s logged scores are erroneous. Some are non-erroneous.

Neither Bayes’ theorem nor the reports of the American League box scores are pertinent to determining whether any particular score in Joe’s log is erroneous or correct. That distinction is between antithetical propositions: This score is erroneous. This score is not erroneous.

Subsets subject to Bayes’ theorem may be nominally antithetical, such as true and non-true, and, in that sense, incompatible. Yet, relevant to Bayes’ theorem, such subsets are merely complementary and in that sense compatible. Their sum equals the entire set. It is their compatibility as complementary which renders the subsets subject to Bayes’ theorem.

Carrier in Proving History, p 50 ff, by conflating antithetical with different, while ignoring the complementarity of subsets, completely misrepresents Bayes’ theorem and its utility.

For an algebraic validation of Bayes’ theorem see the first five paragraphs of the essay.

On page 50 of Proving History, Richard Carrier states,

Notice that the bottom expression (the denominator) represents the sum total of all possibilities, and the top expression (the numerator) represents your theory (or whatever theory you are testing the merit of), so we have a standard calculation of odds: your theory in ratio to all theories.

Carrier is proposing that Bayes’ theorem can be used to determine the truth of your theory which is one among many theories. Carrier implicitly claims that Bayes’ theorem can be used to determine the truth of your theory according to the numerical value of the probability of your theory with respect to all theories, i.e. ‘your theory in ratio to all theories’.

If there are n theories of which yours is one, then the probability of your theory is 1/n, but so too the probability of every other theory in the set of all theories is 1/n. Consequently, such a probability is no indication of the truth or non-truth of your theory. If Carrier’s statement of what is calculated by Bayes’ theorem were true, then Bayes’ theorem would have no relevance to determining the truth of your theory.

What Probabilities of Your Theory(s) are Determinable by Bayes’ Theorem?

Probability is the ratio of a subset to a set. Thus, what we are asking is what ratios, within the context of Bayes’ theorem, have your theory(s) alone in the numerator and your theory(s) plus other theories in the denominator.

The population of elements to which Bayes’ theorem applies may be viewed as a surface over which the population density varies. A Bayesian population is divided into two portions by each of two independent criteria. One criterion may be viewed as dividing the population into two horizontal portions, while the other criterion divides it into two vertical portions. The result is the formation of four quadrants, which differ in population due to the non-uniformity of the population density.

The two portions formed by the horizontal division may be distinguished as the horizontal top row, HT, and the horizontal bottom row, HB. The two portions formed by the vertical division may be distinguished as the vertical left column, VL, and the vertical right column, VR. The two portions, HT and HB, add up to the total, T, as do the two portions, VL and VR. The quadrants are designated as Q1 through Q4. Each of the portions is the sum of two quadrants, e.g. HT = Q1 + Q2 and VL = Q1 + Q3.

Tabulation of a Bayesian Population

            VL              VR              Row Sum
  HT        Q1              Q2              HT = Q1 + Q2
  HB        Q3              Q4              HB = Q3 + Q4
  Col Sum   VL = Q1 + Q3    VR = Q2 + Q4    T

In the illustrated Bayesian population, the column VR has the role of non-VL. Thus, rather than being one column, VR may be any number of columns, whose sum is the complement of VL. Analogously, the row, HB, has the role of non-HT. Consequently, Bayes’ theorem is applicable to any number of rows and any number of columns, where the additional rows and columns may be treated in their sum, respectively as non-HT and non-VL, i.e. as HB and VR, respectively.

Bayes’ theorem, in its algebraic expression, which focuses on Q1, is:

Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1

The two terms, HT, cancel out as do the two terms, T. This leaves the identity, Q1/VL ≡ Q1/VL, which proves the validity of Bayes’ theorem. In the application of Bayes’ theorem the numerical values of the numerators and the denominators of the fractions are not given. What is given are the numerical values of the three fractions on the right hand side of the equation, which permits the calculation of the numerical value of the fraction, Q1/VL, as a fraction.
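The cancellation argument can also be checked numerically. The sketch below fills the four quadrants with arbitrary, hypothetical counts and confirms that the right-hand side of Eq. 1 reproduces Q1/VL exactly.

```python
from fractions import Fraction

# Hypothetical quadrant counts (any positive values would do)
Q1, Q2, Q3, Q4 = 12, 18, 20, 50
HT = Q1 + Q2   # top row sum
HB = Q3 + Q4   # bottom row sum
VL = Q1 + Q3   # left column sum
T = HT + HB    # total population

# Eq. 1: Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T)
lhs = Fraction(Q1, VL)
rhs = (Fraction(Q1, HT) / Fraction(VL, T)) * Fraction(HT, T)
assert lhs == rhs  # the identity holds for any counts
print(lhs)  # 3/8
```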

In the context of the quotation of Carrier: HT are true theories and HB are non-true theories; VL are your theories and VR are non-your or others’ theories. Thus Q1/VL, which is calculated by Bayes’ theorem is the probability of your true theories in ratio to all of your theories. This is what Carrier falsely states is ‘your theory in ratio to all theories’. (I will substantiate that Carrier is referring to Q1/VL later in this essay.)

Let me first list the other probabilities of your theory(s) calculable using Bayes’ theorem, Eq. 1. We can solve Eq. 1 for three other probabilities of your true theories, and of your theories, besides Q1/VL. They are Q1/HT, VL/T and Q1/T.

Q1/HT = ((Q1/VL) * (VL/T)) / (HT/T) Eq. 2

VL/T = ((Q1/HT) / (Q1/VL)) * (HT/T) Eq. 3

Q1/T = (Q1/VL) * (VL/T) Eq. 4
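Each of these rearrangements can be verified with the same kind of hypothetical quadrant counts; the sketch below checks Eqs. 2 through 4 using exact fractions.

```python
from fractions import Fraction

Q1, Q2, Q3, Q4 = 12, 18, 20, 50  # hypothetical quadrant counts
HT, HB = Q1 + Q2, Q3 + Q4
VL = Q1 + Q3
T = HT + HB

# Eq. 2: Q1/HT = ((Q1/VL) * (VL/T)) / (HT/T)
assert Fraction(Q1, HT) == (Fraction(Q1, VL) * Fraction(VL, T)) / Fraction(HT, T)

# Eq. 3: VL/T = ((Q1/HT) / (Q1/VL)) * (HT/T)
assert Fraction(VL, T) == (Fraction(Q1, HT) / Fraction(Q1, VL)) * Fraction(HT, T)

# Eq. 4: Q1/T = (Q1/VL) * (VL/T)
assert Fraction(Q1, T) == Fraction(Q1, VL) * Fraction(VL, T)
```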

To What Bayesian Ratio is Carrier Referring as ‘your theory in ratio to all theories’?

In Eq. 2, Q1/HT is the probability of your true theory(s) in ratio to all true theories. This probability is restricted to true theories. If this probability were what Carrier is referring to by ‘your theory in ratio to all theories’, he would be granting that your theory is true, and is not a ‘theory you are testing the merit of’.

In Eq. 3, VL/T is the probability of all of your theories, true and non-true, in ratio to all theories. This probability lumps both your true theories and your non-true theories together, so it could not be a test of the merit of your theory(s). For example, if you have ten theories, whether true or non-true, the fact that there are five or a million other theories has no relevance to the merit of your theory(s).

In Eq. 4, Q1/T is the probability of your true theory(s) in ratio to all theories. This ratio, which acknowledges the truth of your true theory, cannot be a test of the merit of your theory. Nevertheless, Q1/T appears close to ‘your theory in ratio to all theories’. It lacks the word, true, after the word, your. However, as shown below, Carrier cannot be referring to Q1/T, but must be referring to Q1/VL.

Carrier in the Quote is Referring to Q1/VL

The common expression of Bayes’ theorem is Eq. 1, which calculates Q1/VL.

Q1/VL is the probability of your true theories in ratio to all of your theories. It is this which Carrier falsely labels ‘your theory in ratio to all theories’. Admittedly, Carrier’s expression, ‘your theory’ can be understood as your true theory(s), but it is obvious that by the words, ‘all theories’, Carrier means all theories and does not mean only all of your theories.

We must ask whether Carrier could have been referring to Q1/T, expressed as Eq. 4, rather than Q1/VL, expressed in Eq. 1. The reason that it is Q1/VL becomes apparent from his verbal presentation of Bayes’ theorem.

Typically, Bayes’ theorem is expressed as Eq. 1. In Eq. 1, the denominator is VL/T. However, VL/T is often expressed as the sum,

VL/T = (Q1/HT) * (HT/T) + (Q3/HB) * (HB/T) Eq. 5

The denominator of Carrier’s verbalized version of the Bayesian equation is undeniably an attempt to express this sum.

The validity of Eq. 5 is apparent in that,

VL/T = Q1/T + Q3/T = (Q1 + Q3)/T, where Q1 + Q3 = VL
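Eq. 5 is simply the expansion of VL/T over the two rows; a quick numerical check with hypothetical quadrant counts:

```python
from fractions import Fraction

Q1, Q2, Q3, Q4 = 12, 18, 20, 50  # hypothetical quadrant counts
HT, HB = Q1 + Q2, Q3 + Q4
VL, T = Q1 + Q3, Q1 + Q2 + Q3 + Q4

# Eq. 5: VL/T = (Q1/HT) * (HT/T) + (Q3/HB) * (HB/T)
expanded = Fraction(Q1, HT) * Fraction(HT, T) + Fraction(Q3, HB) * Fraction(HB, T)
assert expanded == Fraction(VL, T)
print(expanded)  # 8/25, i.e. 32/100
```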

Because Carrier is attempting to verbalize the standard expression of Bayes’ theorem, i.e. Eq. 1, the denominator is VL/T. VL/T is the ratio of all your theories to all theories. It cannot be in any way construed to be simply ‘all theories’ as Carrier claims. VL/T is obviously a ratio, in which T is all theories.

If Carrier had meant his verbalization to express Q1/T, as in Eq. 4, then the term VL/T, expressed as a sum, would be a direct factor, as it is in Eq. 4. VL/T would not be in the denominator, i.e. an inverse factor, as it is in Carrier’s verbalization and as it is in Eq. 1.

There is another reason that it is apparent that Carrier’s verbalization is expressing Q1/VL as in Eq. 1. The numerator of Eq. 1 is (Q1/HT) * (HT/T). This is the first term of VL/T when VL/T is expressed as a sum as in Eq. 5. In his verbalization, Carrier acknowledges that the first term of the sum of his denominator is his numerator. Thus, Carrier’s verbalization is meant to express Eq. 1, where the denominator, VL/T, is not ‘all theories’, as Carrier claims. VL/T is the ratio of all your theories to all theories.

Also, it should be noted that the numerator of Bayes’ theorem, Eq. 1, which is (Q1/HT) * (HT/T), is Q1/T. Thus, the numerator of Bayes’ theorem is the probability of your true theories over all theories, and is not as Carrier claims simply ‘your theory’.

Conclusion

Carrier’s explanation of Bayes’ theorem on page 50, of Proving History as ‘your theory in ratio to all theories’ is completely erroneous.

Bayes’ theorem is a simple algebraic relationship among fractions of a set or population of elements. Based on common expositions of it, one would think that it was complicated in itself and that it resolved a mystery through its implications.

The population of elements to which Bayes’ theorem applies may be viewed as a surface over which the population density varies. A Bayesian surface is partitioned by two independent criteria. One criterion may be viewed as dividing the surface into two horizontal rows, while the other criterion divides it into two vertical columns. The result is the formation of four quadrants, which differ in population due to the non-uniformity of the population density. One important thing is that the four quadrants are mutually related. Each may be expressed by the same algebraic formulation in its relationships to the other three.

The two rows formed by the horizontal partitioning may be distinguished as the horizontal top row, HT, and the horizontal bottom row, HB. The two columns formed by the vertical partitioning may be distinguished as the vertical left column, VL, and the vertical right column, VR. The two rows, HT + HB, add up to the total, T, as do the two columns, VL and VR. The quadrants are designated as Q1 through Q4. Each row or column is the sum of two quadrants, e.g. HT = Q1 + Q2 and VL = Q1 + Q3.

Tabulation of a Bayesian Population

            VL              VR              Row Sum
  HT        Q1              Q2              HT = Q1 + Q2
  HB        Q3              Q4              HB = Q3 + Q4
  Col Sum   VL = Q1 + Q3    VR = Q2 + Q4    T

In the Tabulated Bayesian Population, the column VR has the role of non-VL. Thus, rather than being one column, VR may be any number of columns, whose sum is the complement of VL. Analogously, the row, HB, has the role of non-HT. Consequently, Bayes’ theorem is applicable to any number of rows and any number of columns, where the additional rows and columns may be treated in their sum, respectively as non-HT and non-VL, i.e. as HB and VR, respectively.

Bayes’ theorem, in its algebraic expression, which focuses on Q1, is:

Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1

The two terms, HT, cancel out as do the two terms, T. This leaves the identity, Q1/VL ≡ Q1/VL, which proves the validity of Bayes’ theorem.

In the application of Bayes’ theorem the numerical values of the numerators and the denominators of the fractions are not given. What is given are the numerical values of the three fractions on the right hand side of Eq. 1, which permits the calculation of the numerical value of the fraction, Q1/VL, as a fraction.

Reciprocity of Various Expressions of Bayes’ Theorem

Eq. 1 expresses Bayes’ algebraic formulation by focusing on the top-left quadrant, Q1. However, it must be remembered that the same algebraic formulation of relationships with the other three quadrants could be applied to any quadrant. This can be seen in that each of the other three quadrants can be successively designated as quadrant Q1 by rotating the population surface in increments of 90 degrees.

In the application of Bayes’ theorem, Eq. 1 is viewed as representing Q1/VL as directly proportional to HT/T, where the constant of proportionality is (Q1/HT) / (VL/T). Because each of the fractions of Eq. 1 is the ratio of a subset to a set, each of the fractions is a probability. Expressing the direct proportionality of Eq. 1 using the word, probability, rather than the word, fraction, yields: The probability of quadrant Q1 with respect to the column VL is directly proportional to the probability of the row HT with respect to the total population, T.

Typically, the numerical value of the probability, HT/T, is given along with the numerical value of the constant of proportionality. The numerical value of the probability, Q1/VL, is calculated. Common jargon refers to the given probability, HT/T, as the prior probability and the calculated probability, Q1/VL, as the posterior or the final probability.

If the numerical value of Q1/VL were given along with the constant of proportionality, then the probability HT/T could be calculated. We would be viewing Eq. 1 in the form,

HT/T = ((VL/T) / (Q1/HT)) * (Q1/VL) Eq. 2

Common jargon would then label Q1/VL as the prior probability and HT/T as the posterior or final probability, i.e. vice versa to the common jargon applied to Eq. 1.

Eq. 1 and Eq. 2 are fully equivalent. With respect to Eq. 1, common jargon in determining the probability of a hypothesis would claim that the prior probability of row HT with respect to the total was revised to the posterior or final probability of Q1 with respect to column VL.

With respect to Eq. 2, common jargon would claim that the prior probability of Q1 with respect to column VL was revised to the posterior or final probability of row HT with respect to the total.

What this apparently contradictory jargon means is (1) that given the constant of proportionality and HT/T, then Q1/VL can be calculated, while (2) given the constant of proportionality, and Q1/VL, then HT/T can be calculated. Both probabilities remain completely distinct. Neither replaces the other or is revised to equal the other.
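The two-way calculation described above can be made concrete: once the constant of proportionality is fixed, either probability determines the other, and neither is revised. A minimal sketch, with hypothetical fractions:

```python
from fractions import Fraction

# Hypothetical given fractions
q1_ht = Fraction(3, 5)  # Q1/HT
vl_t = Fraction(1, 4)   # VL/T
k = q1_ht / vl_t        # constant of proportionality

# Eq. 1: given HT/T, calculate Q1/VL
ht_t = Fraction(1, 5)
q1_vl = k * ht_t

# Eq. 2: given Q1/VL, calculate HT/T (the reciprocal direction)
ht_t_back = q1_vl / k
assert ht_t_back == ht_t  # the two probabilities remain distinct; neither replaces the other
print(q1_vl)  # 12/25
```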

A numerical value, which is given, is prior in our knowledge to a numerical value, which is calculated. But in no sense does one replace the other or is one revised to be the other. To use the words, replace and/or revise is to use misleading jargon.

Identifying one probability within Bayes’ equation as prior and one as posterior, where the posterior replaces or supersedes the prior, is a misleading mystification of simple algebra, where the two probabilities are distinct and do not change in their algebraic relationship to one another.

An Illustration of Bayes’ Theorem

Let us use an easily comprehended set of elements to illustrate Bayes’ theorem. That set is a bunch of playing cards. Not a standard deck, a bunch. All of the cards in the set, i.e. the bunch, are not of the customary thirteen ranks, but of only two ranks, Kings and Queens. All of the cards in the set are not of four, but of only two suits, Diamonds and Spades.

Let us view Bayes’ theorem as telling us that Q1/VL, is directly proportional to HT/T. The constant of proportionality would then be (Q1/HT) / (VL/T).

Q1/VL = ((Q1/HT) / (VL/T)) * (HT/T) Eq. 1

In this example the elements of the set are cards. T is the total number of cards. HT is the total number of Kings. VL is the total number of Diamonds. Q1 is the number of cards that are both Kings and Diamonds.

The person, who formed the set of cards, tells us that 70% of the Kings are Diamonds; that 50% of the cards are Diamonds and that 40% of the cards are Kings. Referring to Eq. 1: (1) If 70% of the Kings are Diamonds, then Q1/HT = 0.7. (2) If 50% of the cards are Diamonds, then VL/T = 0.5. (3) If 40% of the cards are Kings, then HT/T = 0.4. The constant of proportionality, (Q1/HT) / (VL/T), equals 0.7/0.5 = 1.4.

The fraction of Diamonds that are Kings, Q1/VL is directly proportional to HT/T, the fraction of all cards that are Kings.

The fraction of Diamonds that are Kings = (.7/.5) * the fraction of all cards that are Kings.
Q1/VL = (.7/.5) * (HT/T)

The fraction of Diamonds that are Kings = (1.4) * 0.4 = 0.56 = 56%
Q1/VL = 56%
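The card illustration reduces to the same arithmetic; a short script using the percentages given above:

```python
from fractions import Fraction

q1_ht = Fraction(7, 10)  # 70% of Kings are Diamonds
vl_t = Fraction(1, 2)    # 50% of cards are Diamonds
ht_t = Fraction(2, 5)    # 40% of cards are Kings

# Eq. 1: the fraction of Diamonds that are Kings
q1_vl = (q1_ht / vl_t) * ht_t
print(float(q1_vl))  # 0.56, i.e. 56%
```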

Verbalization of Bayes’ Theorem

In the illustration, common jargon would state that the prior probability of a card’s being a King, HT/T or 40%, is revised to the posterior probability, namely the probability of a Diamond’s being a King, Q1/VL or 56%. However, if Q1/VL were the given and HT/T were calculated, then, based on the same equation, common jargon would have to state that the prior probability of a Diamond’s being a King, or 56%, was revised to the posterior probability, namely the probability of a card’s being a King, or 40%.

It is easy to fall into the rut of such jargon if HT/T is thought of as the probability of a generic card’s being a King, and Q1/VL as the probability that a card specified as being a Diamond is a King. It is as if the generic were being replaced by the specific. Such a nuanced inference is not warranted by the mathematics, because the reciprocal relation is equally valid: given the numerical value of the specific probability, the numerical value of the generic probability can be calculated.

Caution

The use of ‘replace’ and ‘revise’ in common jargon confuses a displacement based on inequality with a replacement based on equality. Such a displacement of inequality does not elucidate Bayes’ theorem, which is the equality expressed by Eq. 1.

The criticism of common jargon in this essay does not preclude the successive iteration of an algorithm based on Bayes’ theorem, which could involve a displacement. In such a case, the succeeding iteration uses the specific probability of the prior iteration as its generic probability. The iteration of the algorithm calculates a new specific probability based on some added or omitted characteristic. It thereby calculates a partitioning, i.e. a probability, not of the prior population, but, of a newly limited sub-population.
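The distinction between the theorem and its iterated algorithm can be sketched as follows. The numbers and the second marker are hypothetical; the point is that each iteration is a fresh application of Eq. 1 to a newly limited sub-population, not a revision of the first result in place.

```python
from fractions import Fraction

def bayes(q1_ht, vl_t, ht_t):
    """One application of Bayes' theorem, Eq. 1: returns Q1/VL."""
    return (q1_ht / vl_t) * ht_t

# First application: a hypothetical marker on the full population
p1 = bayes(Fraction(7, 10), Fraction(1, 2), Fraction(2, 5))  # 14/25

# Iterated algorithm: the probability calculated in the first step
# serves as the given probability (HT/T) for a second, hypothetical
# marker applied to the limited sub-population
p2 = bayes(Fraction(4, 5), Fraction(3, 5), p1)

print(p1, p2)  # 14/25 56/75
```

Calling the whole loop "Bayes' theorem" would be the confusion the text describes; each call to bayes() is the theorem, and the loop is the algorithm.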

It should be noted that it is inappropriate and misleading to identify as Bayes’ theorem an algorithm, which iteratively employs Bayes’ theorem, just as it would be inappropriate and misleading to identify as the Pythagorean theorem an algorithm, which iteratively employs the Pythagorean theorem.

Common jargon confuses Bayes’ theorem with its algorithmic iteration.