On page 58 of Proving History, Richard C. Carrier states,
“So even if there was only a 1% chance that such a claim would turn out to be true, that is a prior probability of merely 0.01, the evidence in this case (e1983) would entail a final probability of at least 99.9% that this particular claim is nevertheless true. . . . Thus, even extremely low prior probabilities can be overcome with adequate evidence.”
The tabulated population data implied by Carrier’s numerical calculation, which uses Bayes’ theorem, is of the form:
Bayes’ theorem permits the calculation of Cell(X,A) / Col A by the formula,
((Row X / Total Sum) * (Cell(X,A) / Row X)) / (Col A / Total Sum)
The numerical values, listed within the equations on page 58, imply,
From these, the remaining values of the table can be determined as,
Carrier’s application of Bayes’ theorem in calculating the final probability, and his identification of the prior probability, are straightforward and without error.
How Error Slips In
In Bayesian jargon, the ‘prior’ probability of X is the Sum of Row X divided by the Total Sum. It is 0.01 or 1%. The final probability, more commonly called the consequent or posterior probability, is the probability of X based solely on Column A, completely ignoring Column B. The probability of X, considering only Column A, is 0.01/0.0100099, or about 99.9%. One may call this the final probability, the consequent probability, the posterior probability or anything else one pleases, but to pretend it is based on anything other than a scope that excludes Column B is foolishness. It is in no sense ‘the overcoming of a low prior probability with sufficient evidence’, unless one is willing to claim that the proverbial ostrich, by putting its head in the sand, gains a better view of its surroundings by restricting the scope of its view to the sand.
The way this foolishness comes about is this. The prior probability is defined as the probability that ‘this’ element is a member of the subpopulation X, simply because it is a member of the overall population. The consequent or posterior probability (or as Carrier says, the final probability) is the probability consequent or posterior to identifying the element, no longer as merely a generic member of the overall population, but now identifying it as an element of subpopulation A. The probability calculated by Bayes’ theorem is that of sub-subpopulation, Cell(X,A), as a fraction of subpopulation A, thereby having nothing directly to do with Column B or the total population. In Bayesian jargon we say the prior probability of X of 1% is revised to the probability of X of 99.9%, posterior to the observation that ‘this element’ is a member of the subpopulation A and not merely a generic member of the overall population.
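The arithmetic can be checked with a short Python sketch. It assumes the table is normalized so that the Total Sum is 1, and it takes the two fractions from the ratio 0.01/0.0100099 quoted above. The variable and function names are illustrative only and do not come from Carrier’s text.

```python
# Fractions of the total population, with the Total Sum normalized to 1.
# Assumption: both values are read off the ratio 0.01/0.0100099 quoted in the text.
row_x = 0.01         # Row X / Total Sum, i.e. the prior probability of X
cell_xa = 0.01       # Cell(X,A) / Total Sum
col_a = 0.0100099    # Col A / Total Sum


def bayes_posterior(row_x, cell_xa, col_a):
    """Posterior of X given A, written in the form of Bayes' theorem."""
    likelihood = cell_xa / row_x          # Cell(X,A) / Row X
    return (row_x * likelihood) / col_a   # equals Cell(X,A) / Col A


posterior = bayes_posterior(row_x, cell_xa, col_a)
direct_ratio = cell_xa / col_a            # uses Column A only

print(f"prior     = {row_x:.4f}")
print(f"posterior = {posterior:.5f}")
print(f"direct    = {direct_ratio:.5f}")
```

The Bayes’ theorem form and the direct ratio Cell(X,A) / Col A give the same number, because the Row X terms cancel. Nothing from Column B enters either calculation.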
Clarification of the Terminology
The terminology, ‘prior probability’ and ‘posterior probability’, refers to before and after the restriction of the scope of consideration from a population to a subpopulation. The population is one which is divided into subsets by two independent criteria. This classifies the population into subsets which may be displayed in a rectangular tabulation. One criterion identifies the rows. The second criterion identifies the columns of the table. Each member of the population belongs to one and only one of the cells of the tabulation, where a cell is a subset identified by a row and a column.
A good example of such a population would be the students of a high school. Let the first criterion identify two rows: those who ate oatmeal for breakfast this morning and those who did not. The second criterion, which identifies the columns, will be the four classes: freshmen, sophomores, juniors and seniors. Notice that the sum of the subsets of each criterion is the total population. In other words, the subsets of each criterion are complements forming the population.
In the high school example, the prior probability is the fraction of the students of the entire high school who ate oatmeal for breakfast. The prior is the scope of consideration before we restrict that scope to one of the subsets of the second criterion. Let that subset of the second criterion be the sophomore class. We restrict our scope from the entire high school down to the sophomore class. The posterior probability is the fraction of sophomores who ate oatmeal for breakfast. Notice the posterior probability eliminates from consideration the freshman, junior and senior classes. They are irrelevant to the posterior fraction.
In Bayesian jargon, prior refers to the full scope of the population before that scope is restricted. Posterior refers to the scope after it has been restricted. The posterior renders anything outside of the restricted scope irrelevant.
In Carrier’s example, the full scope covers all years, prior to restricting that scope to the year, 1983, thereby ignoring all other years. This is parallel to the high school example, where the full scope covers all class years, prior to restricting that scope to the class year, sophomores, thereby ignoring all other class years.
By some quirk, let it be that 75% of the sophomore class ate oatmeal for breakfast, but none of the students of the other three classes did so. Let the four class sizes be equal. We would then say, à la Carrier, “The low prior probability (18.75%) of the truth that a student ate oatmeal for breakfast, was overcome with adequate evidence, so that the final probability of the truth that a sophomore student ate oatmeal for breakfast was 75%.” Note that this ‘adequate evidence’ consists in ignoring any evidence concerning the freshmen, juniors and seniors, which evidence was considered in determining the prior.
This conclusion of ‘adequate evidence’ contrasts a fraction based on a full scope of the population, ‘the prior’, to a fraction based on a restricted scope of the population, ‘the final’. The final does not consider further evidence. The final simply ignores everything about the population outside the restricted scope.
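The oatmeal example can be put in a short Python sketch. The class size of 100 is a hypothetical figure, since the text requires only that the four classes be equal in size, and the function names are illustrative. The second table, in which half of the freshmen ate oatmeal, is likewise an invented variation used only to show which fraction responds to data outside the sophomore class.

```python
CLASS_SIZE = 100  # hypothetical; the text only requires equal class sizes


def prior(oatmeal_counts, class_size=CLASS_SIZE):
    """Fraction of the whole school that ate oatmeal (full scope)."""
    total_students = class_size * len(oatmeal_counts)
    return sum(oatmeal_counts.values()) / total_students


def posterior(oatmeal_counts, class_name, class_size=CLASS_SIZE):
    """Fraction of one class that ate oatmeal (restricted scope)."""
    return oatmeal_counts[class_name] / class_size


# The example from the text: 75% of sophomores, nobody else.
school = {"freshmen": 0, "sophomores": 75, "juniors": 0, "seniors": 0}

# Invented variation: the data outside the sophomore class changes.
varied = dict(school, freshmen=50)

print(prior(school), posterior(school, "sophomores"))
print(prior(varied), posterior(varied, "sophomores"))
```

With the figures from the text, the prior is 0.1875 and the posterior is 0.75. In the varied table the prior rises, while the sophomore posterior is unchanged, since it never looks beyond the sophomore column.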
Prejudice as a Better Jargon
A more lucid conclusion, based on the restriction of scope, may be made in terms of prejudice. The following conclusion adopts the terminology of prejudice. It is based on the same data used in the discussion above.
Knowledge of the fraction of students in this high school who ate oatmeal serves as the basis for our prejudging ‘this’ high school student. We know the prior probability of the truth that ‘this’ student is ‘one of them’, i.e. those who ate oatmeal for breakfast, is 18.75%. Upon further review, in noting that ‘this’ student is a sophomore, we can hone our prejudice by restricting it in scope to the sophomore class. That is, we restrict the scope upon which our original prejudice was based by ignoring every subset of the population except the sophomore class. We now know the final probability of the truth of our prejudice that ‘this’ student is ‘one of them’ is 75%, based on his belonging to the sophomore class.
This is what Carrier is doing. His prior is the prejudice, i.e. the probability based on all years of the population. His final is the prejudice, which ignores evidence from all years except 1983.
We can now see more clearly what Carrier means by adequate evidence. He means considering only knowledge labeled 1983 and ignoring knowledge from other years. Similarly, adequate evidence for increasing our prejudice that this student ate oatmeal would mean considering only the knowledge that he is in the sophomore year and ignoring knowledge from other class years. It was the consideration of all class years upon which our prior prejudice was based. Similarly, it is all years, including 1983, upon which Carrier’s prior prejudice is based.
To form our prior prejudice, we consider the total tabulated count. We restrict the scope of our consideration of the tabulated count to a subset in order to form our final or posterior prejudice.
We refine our prejudice by restricting the scope of its application from the whole population to a named subpopulation. Is this what is conveyed by saying that even a low chance of a statement’s being true can be increased by evidence, or, that the low probability of its truth was overcome by adequate evidence? To me, that is not what is conveyed. From the appellations of truth and evidence, I would infer that more data were being introduced into the tabulation, or at least more of the tabulated data was being considered, rather than that much of the tabulated data was being ignored.
Carrier’s discussion of Bayes’ theorem gives the impression that the final probability of the 1983 data depends intrinsically upon the tabulated data from all the other years. In fact, the data from all the other years is completely extrinsic, i.e. irrelevant, to the final probability of the 1983 data. The ‘final’ probability is the ratio of one subset of the 1983 data divided by the set of 1983 data, ignoring all other data.
Probability is the ratio of a subpopulation of data to a population of data. In Carrier’s discussion, the population of his ‘prior’ is the entire data set. The population of his ‘final’ is solely the 1983 data, ignoring all else. He is not evaluating the 1983 data, or any sub-portion of it, in light of non-1983 evidence.
One can easily be misled by the jargon of the ‘prior probability’ of ‘the truth’, the ‘final probability’ of ‘the truth’ and ‘adequate evidence’.