Monthly Archives: November 2021

The significance of ‘prior’ in the jargon of Bayes’ theorem is that of priority in acquiring knowledge. It is the distinction between an algebraic term that is given, i.e., a ‘prior’, and an algebraic term that is to be calculated. This renders the distinction trivial, because Bayes’ theorem consists of four algebraic terms, any one of which can be calculated as a function of the other three, which are then the givens, i.e., the ‘priors’.

Bayes’ theorem is an algebraic equation of the form

Y = X × (U/V)                      Bayes’ theorem equation

This is, of course, the classic equation of a straight line through the origin, where Y is a straight-line function of X and (U/V) is the slope of the line. Implicitly, X is similarly a straight-line function of Y, with (V/U) as the slope of the line. Depending on context, expressing Y as a straight-line function of X may be apropos, expressing X as a straight-line function of Y may be apropos, or neither may be appropriate.

Bayes’ theorem is of the form in which Y is expressed as a straight-line function of X. In Bayesian jargon, X is said to be the ‘prior’ of Y. In other words, X, which is the probability with respect to the general population, is known, and Y, which is the specific probability with respect to a subset of the population, is calculated. Of course, the ratio (U/V) of the two probabilities, U and V, must also be known. Three probabilities must be known to calculate the fourth. Alternatively, it suffices to know the ‘prior’ plus the ratio U/V, without knowing U and V individually.

Let us call any population whose probability relationships are those to which Bayes’ theorem applies a Bayesian population. If, for such a population, we knew the value of (U/V) and the probability for a specific subset, Y, rather than the probability for the whole population, X, we could calculate the value of X. Even though the calculation is not technically that of Bayes’ theorem, we could employ Bayesian jargon and call the known value of the probability for the specific subpopulation the ‘prior’. This illustrates the triviality of the term ‘prior’ in Bayesian jargon. One non-Bayesian equation applicable to the Bayesian population is

X = Y × (V/U)                      One non-theorem equation of a Bayesian population

The difference between the equation of Bayes’ theorem and the equation of this non-theorem is whether one knows (is given) as ‘prior’ the value of the probability of the general population, X, or the probability of a specific subset, Y, of the population. Indeed, depending on the particular population, one might more likely know the probability of the subset of the population rather than the probability of the general population.
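
The symmetry between the two equations can be checked numerically. The following is a minimal Python sketch, not part of the original argument; the function name `solve_for` and the sample values are hypothetical, chosen only for illustration. Given any three of X, Y, U and V, it returns the fourth, so whichever terms are supplied play the role of the ‘prior’.

```python
def solve_for(unknown, **known):
    """Solve Y = X * (U / V) for the one term named by `unknown`.

    `known` must supply the other three terms by name (X, Y, U, V).
    The function name and interface are illustrative assumptions.
    """
    names = {"X", "Y", "U", "V"}
    if unknown not in names or set(known) != names - {unknown}:
        raise ValueError("supply exactly the three terms other than the unknown")
    if unknown == "Y":
        return known["X"] * known["U"] / known["V"]
    if unknown == "X":
        return known["Y"] * known["V"] / known["U"]
    if unknown == "U":
        return known["Y"] * known["V"] / known["X"]
    return known["X"] * known["U"] / known["Y"]  # unknown == "V"


# Hypothetical values: X = 0.2, U = 0.3, V = 0.25.
y = solve_for("Y", X=0.2, U=0.3, V=0.25)  # theorem form: X is given
x = solve_for("X", Y=y, U=0.3, V=0.25)    # non-theorem form: Y is given
```

Feeding the calculated Y back into the non-theorem form should recover the original X, up to floating-point rounding.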

An Illustration

In illustration, consider the population of students in a high school, where the property whose probability is of concern is blue eyes, and the particular subset is freshmen. This is a Bayesian population. It is a population divided into two subsets by each of two properties: blue eyes and non-blue eyes, and freshmen and non-freshmen. The two properties are defined independently of one another. For both equations:

X is the probability of blue eyes in the general student population.

Y is the probability of blue eyes in a specific subset of the population, the subset composed of all freshmen.

U is the probability of freshmen in the subset of all blue-eyed students.

V is the probability of freshmen in the general student population.
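
In conventional conditional-probability notation, the four definitions above can be written as follows. The event names B (a student has blue eyes) and F (a student is a freshman) are labels introduced here for illustration only.

```latex
\[
X = P(B), \qquad Y = P(B \mid F), \qquad U = P(F \mid B), \qquad V = P(F)
\]
\[
Y = X \times \frac{U}{V}
\quad\Longleftrightarrow\quad
P(B \mid F) = P(B) \times \frac{P(F \mid B)}{P(F)}
\]
```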

For the equation of Bayes’ theorem, X is the ‘prior’, the ‘known’ as given, while Y is the unknown to be calculated. Thus, knowledge starts with the probability of blue eyes for the general population, i.e. the entire student body. The probability of blue eyes is then calculated for the specific subset, namely, the subset of freshmen.

For the non-theorem equation of the Bayesian population, Y is the ‘prior’, the ‘known’ as given, while X is the unknown to be calculated. Thus, knowledge starts with the probability of blue eyes for the specific population, the subset of freshmen, while the probability of blue eyes for the general student population is calculated.

For both equations, the ratio of U to V is given. In the equation of the theorem, it appears as U/V and in the non-theorem equation as V/U. U is the probability of freshmen among all blue-eyed students. V is the probability of freshmen among all students.
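
To make the two directions of calculation concrete, here is a small Python sketch using head counts. The counts are hypothetical, invented purely for illustration, and exact fractions are used so that the two equations can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical head counts, invented purely for illustration.
students = 1000            # all students
freshmen = 250             # all freshmen
blue_eyed = 200            # all blue-eyed students
blue_eyed_freshmen = 60    # students who are both blue-eyed and freshmen

X = Fraction(blue_eyed, students)            # blue eyes among all students
Y = Fraction(blue_eyed_freshmen, freshmen)   # blue eyes among freshmen
U = Fraction(blue_eyed_freshmen, blue_eyed)  # freshmen among blue-eyed students
V = Fraction(freshmen, students)             # freshmen among all students

# Theorem form: X is given, Y is calculated.
y_from_x = X * (U / V)
# Non-theorem form: Y is given, X is calculated.
x_from_y = Y * (V / U)

# Each calculated value should match the one obtained directly from counts.
print(y_from_x == Y, x_from_y == X)
```

Either equation reproduces the directly counted value, which is the sense in which the choice of ‘prior’ is only a choice of which term is taken as given.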

A Non-application to a Bayesian Population

No one would apply Bayes’ theorem to the freshman-blue eyes illustration because of the obscurity of what must be known in order to apply it. Ranking X, Y, U and V from least to most obscure yields:

V = the fraction of students, who are freshmen.

Y = Of freshmen, the fraction that have blue eyes.

X = Of all students, the fraction that have blue eyes.

U = Of students with blue eyes, the fraction who are freshmen.

In this example, Bayes’ theorem calculates Y, the next to the least obscure information in the list. Therefore, one would not use Bayes’ theorem. One would simply survey the freshman class to determine the fraction of freshmen with blue eyes. That would be much easier to ascertain than (1) X, the fraction of all students with blue eyes, plus (2) U, the fraction of freshmen in the set of all students with blue eyes. Such obscure information is required by Bayes’ theorem to calculate that which is less obscure.

An Application of Bayes’ Theorem

Let there be an inexpensive dermatological test for TB, which has no false negatives, but some false positives. The test would identify as positive all those with TB plus some without TB. Those few who test positive could be given more expensive chest X-rays that definitively distinguish between those who have TB and those who do not.

Suppose you test positive with the dermatological test. While waiting for the chest X-ray and its result, you would like to know the probability of your test’s being a true positive. You could employ Bayes’ theorem to calculate Y, if you knew the values of X, U, and V, where

Y = the fraction of those who test positive, who have TB

X = the fraction of the general population, who have TB

U = the fraction of those with TB, who test positive.

V = the fraction of the general population, who test positive.

A numerical example of this would be a population of which:

  • 2% of the population has TB (X), all of whom test positive (U = 1). There are no false negatives.
  • 4% of the population tests positive (V).

Employing Bayes’ theorem,

Y = X × (U/V) = 0.02 × (1/0.04) = 0.5. In this example, the probability of having TB, given a positive dermatological test, would be 50%.
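
The arithmetic can be cross-checked with a short Python sketch. The values of X, U and V are those of the numerical example above; the population size of 10,000 is a hypothetical figure added here only so that the same result can be read off from head counts.

```python
# Values taken from the numerical example above.
X = 0.02   # fraction of the general population who have TB
U = 1.0    # fraction of those with TB who test positive (no false negatives)
V = 0.04   # fraction of the general population who test positive

Y = X * (U / V)   # fraction of those who test positive who have TB

# The same result from head counts in a hypothetical population of 10,000.
population = 10_000
with_tb = 200              # 2% of 10,000
test_positive = 400        # 4% of 10,000
true_positive = with_tb    # everyone with TB tests positive
false_positive = test_positive - true_positive

Y_from_counts = true_positive / test_positive
print(Y, Y_from_counts, false_positive)
```

In the head-count version, half of those who test positive are false positives, which is another way of stating the 50% figure.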

Another Non-Application of Bayes’ Theorem

The jargon of Bayes’ theorem emphasizes the knowledge of prior probabilities. Contrast this with a valid conjecture about the general probability of blue eyes among the students of two high schools, made prior to any possible application of Bayes’ theorem to their freshman classes. One high school is in Sweden, the other in Kenya. Most likely the general probability of blue-eyed students for the Swedish school would be high and that for the Kenyan school would be low.

Would this be an example of Bayesian reasoning with respect to ‘priors’? Definitely not, for several reasons. One of these is that the comparison in the example is between two different populations, whereas Bayes’ theorem considers a population and a subset of that one population.

Conclusion

Bayes’ Theorem is clearly defined, but its jargon often renders its apparent application ambiguous and often erroneous. Also, Bayes’ theorem is often not useful because it requires more obscure information in order to calculate the less obscure.