In the previous essay, Bayes’ theorem was illustrated in the case of continuous sets. This essay focuses on sets of discrete elements in a tabulated format.
Let a set be visualized as a column of tiers, i.e. a vertical array of subsets. Let the column have the header or marker, A. Let the subsets or tiers have headers or markers X, Y, Z etc. Let the number of elements per subset or cell of the vertical array be Cell(i), where i = X, Y, Z etc. The total number of elements in the singular linear dimension is Sum Column A. There are no overlapping subsets, so Bayes’ theorem is inapplicable.
In expanding the set by introducing more columns with markers or headers B, C, D etc., we would then have a two dimensional array of subsets or cells. The array would be one of multiple rows and multiple columns. Each i would be the header of a row. Each column would be identified by a header or marker, j. The number of elements in each subset or cell of the two dimensional array would be Cell(i,j). The total number of elements in any row, e.g. row X, would be designated Sum Row X. Geometrically, this two dimensional array of cells could be extended by only one more orthogonal category of IDs or markers. Algebraically, however, it can be extended to any number of independent categories of IDs or markers.
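As a minimal sketch of this layout, the array can be held as a mapping from (row marker, column marker) pairs to counts. The counts below are hypothetical, chosen only for illustration, and the helper names sum_row and sum_col are assumptions rather than anything defined in the essay.

```python
# Hypothetical counts for a table with rows X, Y, Z and columns A, B.
# Each key is the pair of markers (i, j) that identifies one cell.
cells = {
    ("X", "A"): 4, ("X", "B"): 6,
    ("Y", "A"): 3, ("Y", "B"): 7,
    ("Z", "A"): 5, ("Z", "B"): 5,
}

def sum_row(i):
    """Total number of elements in row i (Sum Row i)."""
    return sum(n for (row, _), n in cells.items() if row == i)

def sum_col(j):
    """Total number of elements in column j (Sum Col j)."""
    return sum(n for (_, col), n in cells.items() if col == j)

total_sum = sum(cells.values())

print(sum_row("X"), sum_col("A"), total_sum)
```

Every element is counted exactly once in one row and once in one column, which is why the row sums and the column sums each add up to the same total.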
Each subset, identified by a specific i and j, is a cell, the overlap of a row and a column. Bayes’ theorem may be applied to such a two dimensional array because of the overlap of rows and columns, which form cells each identified by two markers, i and j. For illustrative simplicity we will use a two by two tabulated array, with row markers X and Y and column markers A and B.
The Form and Formation of Bayes’ Theorem
Bayes’ theorem depends upon an identity of the following algebraic form.
(R/C) ≡ (R/C)
We can then multiply both sides of the identity by 1, thereby preserving the equality. We multiply the left side numerator and denominator by L/(RC) and the right side numerator and denominator by 1/T. This yields,
(L / C) / (L / R) = (R / T) / (C / T)
Multiplying both sides by L/R yields,
(L/C) = ((L / R) * (R / T)) / (C/T)
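The algebra above can be checked with exact rational arithmetic. This is only a sketch: the function name bayes_identity_holds and the sample integers are illustrative assumptions, and any positive integers would serve.

```python
from fractions import Fraction

def bayes_identity_holds(L, R, C, T):
    """Check (L/C) == ((L/R) * (R/T)) / (C/T) using exact fractions."""
    lhs = Fraction(L, C)
    rhs = (Fraction(L, R) * Fraction(R, T)) / Fraction(C, T)
    return lhs == rhs

# Arbitrary positive integers, chosen only to exercise the identity.
print(bayes_identity_holds(3, 7, 11, 13))
```

Because R and T cancel on the right hand side, the check succeeds for any nonzero values, which is the sense in which the theorem rests on an identity.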
By replacing L with Cell(X,A); C with Sum Col A; R with Sum Row X and T with Total Sum, we have Bayes’ theorem as applied to our illustrative table.
(Cell(X,A) / Sum Col A) =
((Cell(X,A) / Sum Row X) * (Sum Row X / Total Sum)) / (Sum Col A / Total Sum)     (Eq. 1)
However, the denominator, Sum Col A / Total Sum, is usually modified to,
((Cell(X,A) / Sum Row X) * (Sum Row X / Total Sum)) +
((Cell(Y,A) / Sum Row Y) * (Sum Row Y / Total Sum))
Bayes’ theorem is used to calculate (Cell(X,A) / Sum Col A). However, if we had the data from the table, we would just use these two table values for the calculation; we would not use Bayes’ theorem. We do use Bayes’ theorem when the numerical information we have is limited to the fractions of the right hand side of the equation. For illustration let these numerical values be:
Cell(X,A) / Sum Row X = 0.7857
Sum Row X / Total Sum = 0.7 (N.B. therefore Sum Row Y / Total Sum = 0.3)
Cell(Y,A) / Sum Row Y = 0.3333
From these values, Bayes’ theorem, Eq. 1, with the denominator modified, is
(Cell(X,A) / Sum Col A) =
(0.7857 * 0.7) / ((0.7857 * 0.7) + (0.3333 * 0.3)) = 0.54999 / 0.64998 ≈ 55/65 ≈ 0.846
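The same arithmetic can be written as a short function. This is a sketch under the essay's assumptions of exactly two rows, X and Y; the function and parameter names are illustrative, not part of the essay.

```python
def posterior(frac_a_in_x, frac_x, frac_a_in_y):
    """Cell(X,A) / Sum Col A from the three right-hand-side fractions.

    frac_a_in_x: Cell(X,A) / Sum Row X
    frac_x:      Sum Row X / Total Sum
    frac_a_in_y: Cell(Y,A) / Sum Row Y
    Assumes only two rows, so Sum Row Y / Total Sum = 1 - frac_x.
    """
    frac_y = 1.0 - frac_x
    numerator = frac_a_in_x * frac_x
    # Expanded denominator: Sum Col A / Total Sum as a sum over the rows.
    denominator = numerator + frac_a_in_y * frac_y
    return numerator / denominator

p = posterior(0.7857, 0.7, 0.3333)
print(round(p, 3))
```

The result differs from 55/65 only through the rounding of the inputs 0.7857 and 0.3333.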
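From this information we can construct the table as percentages of the Total Sum of 100%, beginning with the values known from above: Sum Row X = 70%, Sum Row Y = 30%, Cell(X,A) = 0.7857 * 70% = 55%, Cell(Y,A) = 0.3333 * 30% = 10%, and the Total Sum of 100%. The remaining entries follow by addition and subtraction: Sum Col A = 65%, Cell(X,B) = 15%, Cell(Y,B) = 20% and Sum Col B = 35%.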
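The construction of the table can be sketched in code as well. The function build_table and its parameter names are assumptions made for illustration; the inputs are the three fractions given above, and the column marker B is taken to be the only column other than A.

```python
def build_table(frac_a_in_x, frac_x, frac_a_in_y):
    """Rebuild the two by two table as whole percentages of the Total Sum."""
    frac_y = 1.0 - frac_x
    cell_xa = frac_a_in_x * frac_x   # Cell(X,A) / Total Sum
    cell_ya = frac_a_in_y * frac_y   # Cell(Y,A) / Total Sum
    fractions = {
        ("X", "A"): cell_xa, ("X", "B"): frac_x - cell_xa,
        ("Y", "A"): cell_ya, ("Y", "B"): frac_y - cell_ya,
    }
    # Round to whole percentages, since the inputs are themselves rounded.
    return {key: round(100 * value) for key, value in fractions.items()}

table = build_table(0.7857, 0.7, 0.3333)
for row in ("X", "Y"):
    print(row, table[row, "A"], table[row, "B"])
```

Each row of the output lists the column A entry and then the column B entry for that row marker.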
Verbally, Eq. 1 is:
The fraction of column A that is also identified by row marker X
equals
(the fraction of row X that is also identified by column marker A)
times
(the fraction of the total set that is identified by row marker X)
with this product divided by
(the fraction of the total set that is identified by column marker A)
Probability is the fractional concentration of an element in a logical set. Consequently, a verbal expression of Eq. 1 is:
The probability of both marker A and marker X with respect to the subset, marker A
equals
(the probability of both marker A and marker X with respect to the subset, marker X)
times
(the overall probability of marker X)
with this product divided by
(the overall probability of marker A)
However, this is not how it is usually expressed.
Misleading Common Jargon
In one instance of common jargon, Bayes’ theorem is expressed as:
Given the truth of A, the truth of belief X is credible to the degree, which can be calculated by Bayes’ theorem.
Another expression in common jargon is:
Bayes’ theorem expresses the probability of X, posterior to the observation of A, in contrast to the probability of X prior to the observation of A. In other words, the prior probability of X, which was 70%, is revised to 55/65, or about 84.6%, due to the observation of A.
Another expression is: The Bayesian inference derives the posterior probability of 55/65, about 84.6%, as a consequence of two antecedents, the prior probability of 70% and the likelihood function, which numerically is 0.7857.
The Bayesian inference is also viewed as the likelihood of event X, given the observation of event A. The inference is based on three priors. The priors are the probability of event A given event X, 55/70 as 78.57%, the probability of event X, 70%, and the probability of event A, 65%.
Evaluation of Common Jargon
To label A an observed fact of evidence in support of the truth of belief X is gratuitous, because the meanings of evidence and belief imply extrapolation beyond the context of nominal markers. Philosophical conclusions, e.g. when labeled beliefs, are not nominal Bayesian markers. It is also gratuitous to label elements of sets ‘events’.
Probability is the fractional concentration of an element in a logical set. The IDs of the elements are purely nominal because none of the characteristics associated with the ID is relevant to probability. The only characteristic of an element that matters within the context of probability is its capacity to be counted.
A Proper View
From a valid Bayesian perspective, some markers of elements of sets are observed, in the example, A and B, while some markers are not observed, in the example, X and Y. Remember that the ID of each element is a pair of markers, one from a row and one from a column. The Bayesian inference provides quantification of the prejudice that an element has one of the unobserved markers, such as X, where the prejudice is based upon observing that this element has one of the observable markers, such as A.
Bayesian inference is the quantification of prejudice, not the provision of evidence of the truth of a verbal belief.
Such quantification of prejudice is useful in making prudential decisions, e.g. in industry, where the past performances (X, Y etc.) of a variety of material processing methods (A, B etc.) serve as the basis of predicting their future performance. There are a variety of other areas in which Bayesian analysis may be incorporated into algorithms for reaching decisions. Of course, prudence is not the determination of truth, but the selection of a good course of action to achieve a goal.
The Bayesian quantification of prejudice can be harmful in social and employment settings.
A Contrast of ‘Truth’ Jargon vs. ‘Prejudice’ Jargon
Using the table for a numerical example, focused on Cell(X,A):
In common jargon, the Bayesian inference is: Given the truth of observation, A, the probability of unobserved belief, X, is revised from the prior value for the probability of belief X, which was 70%, to the posterior probability that belief X is true, given the truth of A. The posterior probability is 55/65, or about 84.6%.
To more clearly elucidate the Bayesian inference, I prefer the jargon: The observation of marker A prejudices the presence of the unobserved marker, X, at a quantitative level of 55/65, or about 84.6%. If the presence of A is not specified, the probability of marker X for the population as a whole = 70%.
A Further Critique of the Jargon of the Truth of Observation Leading to the Truth of Belief
Bayes’ theorem applies to a population of elements in which each element is identified by two markers, one from each of two categories of markers. The first category of markers comprises the IDs of the rows of a rectangular display of the elements of the population. The second category of markers comprises the IDs of the columns of the rectangular display. In such a rectangular or orthogonal distribution of a population, the distribution of the elements with respect to the one category of markers is independent of the distribution of the elements with respect to the other category of markers.
Bayes’ theorem expresses the probability of one marker of category one, i.e. a row marker, with respect to the entire column of a given marker of category two, i.e. a column marker.
The probabilities summed, row plus row, within a column equal the probability of the marker of that column. In other words, as is typical of probability, the probabilities of the rows within a column are supplements forming the probability of the column as a whole. Consequently, the marker of one row cannot be the antithesis of the marker of another row. Subsets which add to form a column must be compatible as parts of a whole.
Complements are of the forms, ‘Some are red’ and ‘Some are non-red’. Their sum is the whole. The row markers of a population to which Bayes’ theorem can be applied may be just two. However, the row markers of such a population may be any number, the subsets of which within a column sum to that column as a whole. Just as the counts of the elements, summed row by row within a column, equal the count of the column, the same is true of the probabilities. Likewise, the sum of the column probabilities within a row equals the probability of the row.
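The additivity described above can be checked directly. The sketch below assumes the two by two table of percentages (55, 15, 10, 20) that follows from the three fractions used earlier in this essay; the variable names are illustrative.

```python
from fractions import Fraction

rows, cols = ("X", "Y"), ("A", "B")

# Each cell as a fraction of the Total Sum of 100.
joint = {
    ("X", "A"): Fraction(55, 100), ("X", "B"): Fraction(15, 100),
    ("Y", "A"): Fraction(10, 100), ("Y", "B"): Fraction(20, 100),
}

# Rows summed within a column give the probability of that column.
col_prob = {j: sum(joint[i, j] for i in rows) for j in cols}
# Columns summed within a row give the probability of that row.
row_prob = {i: sum(joint[i, j] for j in cols) for i in rows}

# As fractions of their own column, the rows of a column sum to the whole, 1.
within_col = {j: sum(joint[i, j] / col_prob[j] for i in rows) for j in cols}

print(col_prob["A"], row_prob["X"], within_col["A"])
```

Exact fractions are used so that the sums come out as whole values rather than floating point approximations.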
No row marker in an orthogonally identified population to which Bayes’ theorem is applied may be viewed as the antithesis of another row marker. They may be nominally antithetical as true and false. However, in the context of Bayes’ theorem, subsets labelled true and false must be compatible as complements.
The rows of an orthogonal population distribution within a column are complementary, as the parts of the column as a whole. The rows of an orthogonal population distribution overall are the complementary parts of the whole population.
From this perspective, identifying a column marker as true and thereby leading to a judgment of the degree of certitude of the truth of a row marker, as a belief, is misleading to say the least. It confounds mathematical probability with probability in the sense of human certitude.
Mathematical probability is the fractional concentration of an element in a logical set. Probability as human certitude is a quality characterizing one’s own subjective judgment, even when one employs a quasi-quantitative value to express that subjectivity.
In contrast, for a given element of a population, a column marker may be said to be observed in the element and thereby a Bayesian calculation may be said to determine the degree of prejudice of the presence of an unobserved row marker in that element.
Note: What I have mistakenly called contradictories in this essay, are, in classical logic, called contraries (every vs. none). What I have called contraries, are classically called sub-contraries (some are vs. some are not).