Given a set, S, having a subset A and a subset X, then the overlap, OVL, of A by X equals the overlap of X by A. This is simply a statement of identity. Consequently,

(OVL/A) / (OVL/X) = X/A

Also,

(X/S) / (A/S) = X/A

Therefore,

(OVL/A) / (OVL/X) = (X/S) / (A/S)

Or

OVL/A = ((X/S) / (A/S)) x OVL/X

Rearranging,

OVL/A = (X/S) x ((OVL/X) / (A/S)) Equation 1

We may verbally designate:

OVL/A as the fraction of A that is (also) X

OVL/X as the fraction of X that is (also) A

A/S as the fraction of S that is A

X/S as the fraction of S that is X

By definition, probability is the fractional concentration of an element in a logical set. Therefore,

A/S is the probability of A with respect to S

X/S is the probability of X with respect to S

OVL/A is the probability of X with respect to A

OVL/X is the probability of A with respect to X

If we drop the ‘respect to S’, because S is the full set under consideration, Equation 1 is verbally,

The probability of X with respect to A equals the probability of X times a ratio, namely, the probability of A with respect to X divided by the probability of A.

This is Bayes’ theorem.
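Equation 1 can be checked numerically with hypothetical counts for S, A, X and their overlap (a minimal sketch; the specific counts below are assumptions chosen only for illustration):

```python
from fractions import Fraction

# Hypothetical counts: a set S with subsets A and X and their overlap OVL.
S, A, X, OVL = 1000, 300, 200, 60

ovl_over_A = Fraction(OVL, A)  # fraction of A that is also X
ovl_over_X = Fraction(OVL, X)  # fraction of X that is also A
a_over_S = Fraction(A, S)      # fraction of S that is A
x_over_S = Fraction(X, S)      # fraction of S that is X

# Equation 1: OVL/A = (X/S) x ((OVL/X) / (A/S))
assert ovl_over_A == x_over_S * (ovl_over_X / a_over_S)
print(ovl_over_A)  # 1/5
```

Exact fractions make the identity hold with no rounding, whatever counts are chosen.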

However, the verbiage is typically changed when it is noted to be Bayes’ theorem.

The calculated probability of X with respect to A, i.e. OVL/A, is said to be the probability of X posterior to the calculation. In jargon, it is said to be the probability of the ‘event’ of X given the ‘truth’ of ‘event’ A.

The probability, X/S, is said to be the probability of X prior to the calculation or simply the prior. The calculation is based on another factor, other than the prior. This factor or ‘antecedent’ is said to be the likelihood function, (OVL/X)/(A/S). This likelihood function is the probability of A with respect to X divided by the probability of A.

The jargon renders Equation 1 as: given the prior probabilities of A and of X, and given the probability of A with respect to X, we can calculate the probability of the event X when A is true.

An Example, Without the Jargon

Let the fraction of persons with Native ancestry as a fraction of a population, X/S, be known to be 2 per million.
Let the fraction of persons with high cheekbones as a fraction of the population, A/S, be known to be 3 per million.
Let the fraction of persons with Native ancestry that also have high cheekbones, OVL/X, be 90 per 100.
By Eq. 1 we can calculate the fraction of persons with high cheekbones that also are of Native ancestry, OVL/A.
It is:
OVL/A = (2/10^6) x (0.9/(3/10^6)) = 0.6 or 60%

In this population, the fraction of persons with high cheekbones, who are also of Native ancestry, is 60%. Fractional concentration is the definition of probability. Therefore, we may say that, for this population, the probability that a person with high cheekbones is of Native ancestry is 60%.
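The ancestry calculation can be sketched directly from Equation 1 (variable names are illustrative):

```python
# Equation 1 applied to the ancestry example.
x_over_s = 2 / 1_000_000   # fraction of the population with Native ancestry
a_over_s = 3 / 1_000_000   # fraction of the population with high cheekbones
ovl_over_x = 0.9           # fraction of those with Native ancestry who have high cheekbones

ovl_over_a = x_over_s * (ovl_over_x / a_over_s)
print(round(ovl_over_a, 6))  # 0.6
```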

The Conclusion in Jargon and Its Implication

Although OVL/A, as a fractional concentration, is thereby a probability, i.e. a fraction of a logical set, we can lose our way due to the use of jargon.

In jargon, we may say that the calculation, OVL/A, represents the certitude of the ‘truth’ of X when we know A is a fact.

In such jargon, we might make the statement, ‘Person A from this population, who has high cheekbones, is of Native ancestry’ is true with a certitude of 60%. We might think that this implies that we have assessed the truth of the statement, ‘Person A is of Native ancestry’.

Furthermore, we might infer that the truth of whether Person A is or is not of Native ancestry is determined by population data. Then, from that inference we might extrapolate that the determination of truth, in general, is based on population data, i.e. on the identification of subsets.

Another example

Let X/S, the fraction of canines that are coyotes in a rural county of New York State, be 0.05% and A/S, the fraction of canines which are gray, be 1%. Further, let OVL/X, the fraction of coyotes which are gray, be 100%. Then, OVL/A, the fraction of gray canines which are coyotes, is 5%.

OVL/A = (X/S) x ((OVL/X) / (A/S)) = (0.05%) x (100% / 1%) = 5%

Probability is the fractional concentration of an element in a logical set. Therefore the probability of coyotes in the set of gray canines is OVL/A = 5%.

The Popular Interpretation of Bayesian Inference

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is popularly said to be the determination of the truth of X given that A is a fact.

The popular expression of Bayesian inference implies that it assesses the truth of a ‘belief’, X, based on some known fact, A.

Assessment of the Popular Interpretation

The popular expression spurns direct assessment of the ‘belief’. This is because it is not a belief at all. It is a second marker, X, by which some elements of a set, S, are identified in addition to another marker, A. The Bayesian variable calculated is simply the fraction of the set A, which possesses the marker, X, in addition to the marker, A.

In the coyote example, S is the set of canines, X is the subset of coyotes, A is the subset of gray canines, and OVL is the overlap of the subsets, X and A.

The popular interpretation would be: The numerical example does not support the ‘belief’ that ‘this canine, which is known to be gray, is a coyote’. That ‘belief’ would be ‘true’ in only 5% of instances of observing a gray canine.

A Comparable Interpretation

The calculation of the fractional overlap of a subset X onto a subset A by Bayes’ theorem is the quantification of prejudice.

The calculation quantifies the validity of the prejudice that characteristic or marker, X, is possessed by an element because that element has been identified as possessing characteristic or marker, A. In the numerical example, based solely on the observation or knowledge that a canine was gray, we would be prejudiced against its being a coyote at a level of 95%.

Probability is defined in mathematics in the context of discrete elements in sets. It can, however, be given an analogous definition in continuous mathematics. Third, it can be represented as a meld of discrete and continuous concepts, namely as a continuous probability function.

The Discrete Context

In discrete mathematics, probability is the fractional concentration of an element in a logical set. It is the ratio of the quantity of elements of the same ID to the total number of elements in the set. If the numerator is zero, the probability is zero. The probability is never negative because the numerator is never negative and the denominator is a minimum of one. Probability reaches a maximum of one when the set of elements is homogeneous in ID. Probability can have any value from zero to one, because the denominator can be increased to any positive integer. Thus, probability in its discrete definition is itself a continuous variable, having a range of zero to one.

A probability and its improbability are complements of one.
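As a minimal sketch of this definition, consider, for instance, a six-element set of three elephants, two saxophones and one electron:

```python
from collections import Counter
from fractions import Fraction

# Fractional concentration of each ID in a six-element set.
s = ['elephant'] * 3 + ['saxophone'] * 2 + ['electron']
probs = {eid: Fraction(n, len(s)) for eid, n in Counter(s).items()}

print(probs['elephant'])  # 1/2
print(probs['electron'])  # 1/6
assert sum(probs.values()) == 1  # the probabilities of a set sum to one
```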

The Continuous Context

Probability is a fraction of a whole set of discrete elements. If, however, we define a whole as continuous, we can then define probability in a continuous context analogous to its definition in a discrete context. One example is identifying the area of a circle as a continuous whole and identifying segments of area using different IDs, such as in a pie chart. Another example, from statistics, is defining the whole as the area under the normal curve. Probability is then a fraction of the area under the curve.

Melding the Discrete and Continuous as a Probability Function

The simplest set of discrete elements is that in which all the elements have the same ID. The next simplest is that in which the elements are either of two IDs, where the quantities of the elements of each ID are equal. Thus, the probability of each element is one-half and the improbability of each is one-half.

If we choose a continuous function which oscillates between two extremes, we could associate the one extreme with an ID and the other extreme with a second ID. We could view the first ID as having a probability of one at the first extreme and having a probability of zero at the other extreme. We would thus be viewing the function as a probability, which transitions continuously through the intermediate values as it cycles between a probability of one and a probability of zero, i.e. between the two IDs.

At this second extreme, the second ID has a probability of one, which is also the improbability of the first ID.

A visual example would be a rotating line segment oscillating at a constant angular velocity between a horizontal orientation as one ID and a vertical orientation as the second ID. The continuous function for this would be cos^2(α). The function oscillates from the horizontal, or from a probability of one, at α = 0 degrees to the vertical, or to a probability of zero, at α = 90 degrees. At α = 180 degrees, it is horizontal again with a probability of one. The probability goes to zero at α = 270 degrees and back to one at α = 360 degrees. The intermediate values of the function are transient values of probability as it cycles from one to zero and back to one.

The improbability of horizontal, namely the probability of vertical, would be sin^2(α). The probability of horizontal plus its improbability equals one. Thus, cos^2(α) + sin^2(α) = 1.

The Flip of a Coin

The score was tied at the end of regulation in the NFC Divisional playoff game in January 2016 between the Green Bay Packers and the Arizona Cardinals. This required a coin toss between heads and tails to determine which team would receive the ball to start the overtime. However, in the first toss, the coin didn’t rotate about its diameter. The coin didn’t flip. Therefore the coin was tossed a second time.

If, rather than visualizing a line segment, we envision a coin rotating at a constant angular velocity, we wouldn’t choose horizontal vs. vertical as the probability and improbability, because we wish to distinguish one horizontal orientation, heads, from its flipped horizontal orientation, tails.

A suitable continuous function of probability, P, oscillating between a value of one, or heads, at the horizontal α = 0 degrees and a value of zero, or tails, at the horizontal α = 180 degrees, would be
P = [(1/2) × cos(α)] + (1/2), where the angular velocity is constant.

The probability of tails is 1 – P = (1/2) – [(1/2) × cos (α)]

The probability of heads and the probability of tails are both one-half at α = 90 degrees and α = 270 degrees.

The probability of heads plus its improbability, which is the probability of tails, is one.

Whether we visualize these functions as oscillating between horizontal and vertical or as oscillating between heads and tails, the functions are waves.

We are thus visualizing a probability function as a wave oscillating continuously between a probability of one and zero. The probability is the fraction of the maximum magnitude of the wave as a function of α.
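The coin's probability function can be tabulated as a short sketch (the function name is illustrative):

```python
import math

def p_heads(alpha_deg):
    # P = [(1/2) x cos(alpha)] + (1/2), the probability of heads
    return 0.5 * math.cos(math.radians(alpha_deg)) + 0.5

for alpha in (0, 90, 180, 270, 360):
    p = p_heads(alpha)
    # probability of heads, probability of tails; the two always sum to one
    print(alpha, round(p, 6), round(1 - p, 6))
```

The printout traces the wave: heads has probability one at 0 and 360 degrees, zero at 180, and one-half at 90 and 270.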

An Unrelated Meaning of Probability

We use the word, probability, to designate our lack of certitude of the truth or falsity of a proposition. This meaning of probability reflects the quality of our human judgment, designating that judgment as personal opinion rather than a judgment of certitude of the truth. This meaning of probability has nothing to do with mathematical probability, which is the fraction of an element in a logical set or, by extension, the fraction of a continuous whole.

Driven by our love of quantification, we frequently characterize our personal opinion as a fraction of certitude. This, however, itself is a personal or subjective judgment. A common error is to mistake this quantitative description of personal opinion to be the fractional concentration of an element in a mathematical set.

Errors Arising within Material Analogies of Probability

A common error is to identify material analogies or simulations of the mathematics of probability as characteristic of the material entities employed in the analogies. In the mathematics of probability and randomness the IDs of the elements are purely nominal, i.e. they are purely arbitrary. The probability relationships of a set of six elements consisting of three elephants, two saxophones and one electron are identical to those of a set of three watermelons, two paperclips and one marble. This is so because the IDs are purely nominal with respect to the relationships of probability.

In analogy, the purely logical concepts of random mutation and probability are not properties inherent in material entities such as watermelons and snowflakes. This is in contrast to measurable properties, which, as the subject of science, are inherent in material entities.

The jargon employed in analogies of the mathematical concepts also leads us to confuse logical relationships among mathematical concepts with the properties of material entities. In the roll of dice we say the probability of the outcome of boxcars is 1/36. We think of the result of the roll as a material event, which becomes a probability of one or zero after the roll, while it was a probability of 1/36 prior to the roll of the dice. In fact, the outcome of the roll had nothing to do with probability and everything to do with the forces to which the dice were subjected in being rolled. The analogy to mathematical probability is just that, a visual simulation of purely logical relationships.

We are also tempted to think of the probability 1/36 as the potential of boxcars to come into existence, which after the roll is now in existence at an actuality of one, or non-existent as a probability of zero. In this, we confuse thought with reality. Probability relationships are solely logical relationships among purely logical elements designated by nominal IDs. Material relationships are those among real entities, whose natures determine their properties as potential and as in act.

Quantum Mechanics

In quantum mechanics, it is useful to treat energy as continuous waves in some instances and as discrete quanta in others. It is useful to view the wave as a probability function and the detection or lack of detection of a quantum of energy as the probability function’s collapse into a probability of one or of zero, respectively.

An Illustration

As an aid to illustrate the relationship of a probability function as a wave and its outcome as one or zero in quantum mechanics, the physicist Stephen Barr proposed the following analogy:

“This is where the problem begins. It is a paradoxical (but entirely logical) fact that a probability only makes sense if it is the probability of something definite. For example, to say that Jane has a 70% chance of passing the French exam only means something if at some point she takes the exam and gets a definite grade. At that point, the probability of her passing no longer remains 70%, but suddenly jumps to 100% (if she passes) or 0% (if she fails). In other words, probabilities of events that lie in between 0 and 100% must at some point jump to 0 or 100% or else they meant nothing in the first place.”

Problems with the Illustration

The illustration fails to distinguish the purely logical relationships of mathematical probability from the existential relationships among the measurable properties of material entities. The illustration identifies probabilities as being of events rather than identifying probabilities as logical relationships among purely logical entities designated by nominal IDs. It claims that probability must transition from potency to act or it is undefined. In contrast, probability is the fractional concentration of an element in a logical set. The definition has nothing to do with real entities, whose natures have potency and are expressed in act.

Another fault of the illustration is that it is not an illustration of mathematical probability, but an illustration of probability in the sense of personal opinion. Some unidentified individual is of the opinion that Jane will probably pass the French exam. The unidentified individual lacks human certitude of the truth of the proposition that Jane will pass and uses a tag of 70% to express his personal opinion in a more colorful manner.

It is a serious error to pick an example of personal opinion to illustrate a wave function, viewed as a probability function. A wave function, such as that associated with the flip of a coin oscillating between heads as a probability of one and tails as a probability of zero, would have served the purpose well.

Of course, a wave, viewed as a probability function, is not the probability of an event. It is the continuous variable, probability, whose value oscillates between one and zero, and as such assumes these and the intermediate values of probability transiently. The additional condition is that when the oscillation is arrested, the wave collapses to either of the discrete values, one and zero, the presence or absence of a quantum. The collapse is the transition of logical state from one of continuity to one of discreteness.

Synonym or Antonym

Algorithmic and systematic are synonyms.

We typically think of systematic and random as antonyms, namely as ordered and non-ordered. Consistently, we superficially think of random selection as non-systematic mutation.

Yet, in the mathematics of sets, defining the process of random mutation is algorithmic. The primary conclusion to be demonstrated in this essay is: In the mathematics of sets, random mutation is algorithmically defined, i.e. it is systematic / ordered selection.

A set may be defined by the complement of its probabilities. Probability is the fractional concentration of an element in a logical set. Two sets having the same complement of probabilities may differ from one another by an integral multiple of their elements.

The simplest set defined in terms of its complement of probabilities is a set of one unique element. The probability of that element is one. The next simplest set consists of two unique elements for which the complement of probabilities consists of one-half and one-half.

In defining new sets based on the probabilities of a source set, the probabilities of the source set are retained, but the sets differ in that the ‘elements’ of the derived set are actually subsets composed of the elements of the source set.

The derived set, in its complement of probabilities, is derived by an algorithm, which identifies by ordered mutation, the contents of its subsets, where its subsets are viewed as the elements of the derived set.

In illustration, this is all very simple

Consider a source set of two unique elements, Heads and Tails. One set derived from the source set is a set consisting of subsets of three elements, which are ‘randomly selected’ from the source set. By this algorithm of ‘random mutation’ eight subsets are defined, if sequence is retained in the subsets. These eight subsets, which comprise the derived set, are: H,H,H; H,H,T; H,T,H; H,T,T; T,H,H; T,H,T; T,T,H and T,T,T. The complement of probabilities of the derived set is eight probabilities of 1/8 each.

If the algorithm of derivation does not retain sequence, then the derived set would contain only four unique subsets of three elements each. These would be one subset of three Heads, three subsets of two Heads and one Tail, three subsets of one Head and two Tails and one subset of three Tails. That is a total of eight subsets, which are deemed the elements of the derived set. In this derived set, the complement of probabilities would be 1/8, 3/8, 3/8, and 1/8, respectively.
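Both derived sets can be enumerated with the standard library (a sketch; variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Derived set retaining sequence: all ordered triples drawn from {H, T}.
sequences = [''.join(t) for t in product('HT', repeat=3)]
print(sequences)  # 8 sequences, each with probability 1/8

# Derived set ignoring sequence: collapse each triple to its number of Heads.
counts = Counter(seq.count('H') for seq in sequences)
for heads, n in sorted(counts.items(), reverse=True):
    print(heads, 'Heads:', Fraction(n, len(sequences)))
```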

One Objection and Its Clarification

Suppose it is objected that a probability can be calculated on the basis of random selection from a source set without advertence to the composition of a derived set. For example, the probability of selecting H,T,H from the source set is (1/2) × (1/2) ×(1/2) = 1/8. It would seem that the probability is independent of a derived set. However, by definition probability is the fractional concentration of an element in a logical set. Consequently, the probability of 1/8 has meaning only in the context of a set of eight elements derived from the source set. This objection is based on shrinking the focus of attention, on not seeing the big picture.

Note that in any derivation of a new set from a source set by random mutation, if subsets are not viewed as elements, but the elements of the derived set are identified exactly as they are in the source set, then the complement of probabilities of the derived set is identical to the complement of probabilities of the source set. From this perspective the derived set is simply an integral multiple of the source set. In the derived set illustrated, there are a total of 12 Heads and 12 Tails. The probabilities of Heads and of Tails in both the source set and the derived set are one-half and one-half.

Another Objection and its Clarification

But doesn’t viewing random selection or mutation as a fully systematic algorithm contradict our common conception of random as non-ordered? Not really. Rather, it requires us to be more cognizant of what we mean by random.

Consider the conventional sequence of the ten Arabic numerical symbols: 0,1,2,3,4,5,6,7,8,9. We would consider that sequence and Sequences I through III as non-random. Sequence I: 0,2,4,6,8,1,3,5,7,9. Sequence II: 0,9,8,7,6,5,4,3,2,1. Sequence III: 8,6,4,2,0,1,3,5,7,9. However, we consider these four sequences non-random only because it is easy for us to impute a pattern to them. There are 10! = 3,628,800 different sequences of ten symbols. Our intellectual capacity is such that we cannot be equally familiar with each sequence, so as to see each of them on a par with the others. It is only familiarity with a convention that renders the sequences of this paragraph ‘ordered’ and the roughly 3.6 million others ‘random’. In other words, our labelling anything as ‘random’ does not designate that which is labelled as non-ordered. The label, ‘random’, designates a limitation in our knowledge. It designates our ignorance. The lack of order or randomness is in our knowledge of the subject, not in the subject itself.

Each of the 3,628,800 different sequences of ten arbitrary symbols is an ordered sequence. It is only by defining an arbitrary conventional sequence that one or more of the sequences can be considered ordered and the others non-ordered. In imposing such a convention, the symbols are no longer arbitrary, but are defined in relation to one another.

Other Sources of Confusion

In the mathematics of probability and randomness the IDs of the elements are purely nominal, i.e. they are purely arbitrary. The probability relationships of a set of six elements consisting of three elephants, two saxophones and one electron are identical to those of a set of three watermelons, two paperclips and one marble.

The fact that probability and randomness are purely logical concepts is befogged by the jargon employed in material emulations. We identify the probability as 1/4 for the top card’s being a diamond after shuffling a deck of cards. Also, the probability of the outcome of a diamond in two successive trials is 1/16. It appears as if probability characterizes an event or an outcome. We think that this probability has nothing to do with deriving a new set from a source set. We are oblivious to the fact that two diamonds, or two D’s, is one of sixteen subsets of a set derived from a source set of four elements, such as Spades, Hearts, Diamonds, Clubs or S,H,D,C. The sixteen subsets or elements of the derived set are: SS, SH, SD, SC; HS, HH, HD, HC; DS, DH, DD, DC; CS, CH, CD, CC. The fractional concentration of DD in this set is 1/16. Absent this logical set of sixteen elements, to say that the probability of the outcome of two diamonds in succession is 1/16 would be meaningless. Probability is the fractional concentration of an element in a logical set. It is not an event or outcome.
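The sixteen-element derived set can be enumerated directly (a sketch; names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Derived set of all ordered pairs from the four suit IDs.
suits = ['S', 'H', 'D', 'C']
pairs = [a + b for a, b in product(suits, repeat=2)]

print(len(pairs))                                # 16
print(Fraction(pairs.count('DD'), len(pairs)))   # 1/16
```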

In material emulations of mathematical probability, a material outcome is truly an event, all by itself, in its materiality. The confusion arises because we tend to characterize the material event by the numerical value of probability, when probability is meaningful only in reference to a logical set and not to anything material. The analog, which is the basis for the material emulation of the logical concept, equates randomness with the absence of human knowledge of the material causes of the material outcome. For example, we are ignorant of the forces which determine the outcome of a coin flip and label the outcome random.

The fact that probability is meaningful only in reference to a fully defined logical set prompts the error of thinking that in material analogies the material set must exist materially to validate the analogy. For the material analogy, where the sequence of a deck of cards after shuffling represents a probability of one in 52! = 8.06 × 10^67, that many decks of playing cards cannot possibly have material existence. This error, of transferring a logical requirement into a material requirement to validate a material analogy, leads to the proposed existence of a multiverse to explain, e.g., the probability of the physical constants of the universe.

The fog, which diminishes our ability to see the concept of mathematical probability clearly, is compounded by the fact that the word, probability, has a meaning entirely apart from mathematics. In this other meaning, probability is qualitative. This other meaning of probability is the certitude with which one judges a proposition to be true. Human certitude of the truth of a proposition has nothing to do with the fractional concentrations of elements in logical sets.

Conclusion

In mathematics, probability is the fractional concentration of an element in a logical set. Therefore, random mutation is systematic selection in spite of (1) jargon, which deflects or shrinks the focus of our attention and (2) the non-mathematical use of the word, probability, which refers to the quality of human certitude.

Logic requires a fully defined logical set in order to specify a numerical value of probability as a fraction of that set. In a material analogy, this does not necessitate the material existence of a set, but only its logical definition, in order to specify a numerical value of probability. Such analogies are emulations of the logical concepts.

In analogy, the purely logical concepts of random mutation and probability are not properties inherent in material entities such as watermelons and snowflakes. This is in contrast to measurable properties, which, as the subject of science, are inherent in material entities.

In an exchange of comments with Phil Rimmer on the website, StrangeNotions.com, I attempted to explain the distinction between probability and efficiency. The topic deserves this fuller exposition.

I have argued that Richard Dawkins does not understand Darwinian evolution because he claims that replacing a single stage of random mutation and natural selection with a series of sub-stages increases the probability of evolutionary success. In The God Delusion (p. 121) he titles this ‘solving the problem of improbability’, i.e. the problem of low probability. My claim is that replacing the single stage with the series of sub-stages increases the efficiency of mutation while having no effect upon the probability of success.

Using Dawkins’ example of three mutation sites of six mutations each, I have illustrated the efficiency at a probability level of 89.15%, where the series requires only 54 random mutations, while the single stage requires 478.

It may be noted that at a given number of mutations, the probability of success is greater for the series than for the single stage. A numerical example would be at 54 total mutations. For the series the probability of success is 89.15%, whereas at 54 total mutations, the probability for the single stage is only 22.17%. The series has the greater probability of success at a total of 54 mutations.

This would appear to be a mortal blow to my argument. It would seem that Richard Dawkins correctly identifies the role of the series of sub-stages as increasing the probability of success, while not denying its role of increasing the efficiency of mutation. It would seem that Bob Drury errs, not in identifying the role of the series as increasing the efficiency of mutation, but in denying its role in increasing the probability of evolutionary success.

Hereby, I address this apparently valid criticism of my position.

The Two Equations of Probability as a Function of Random Mutations

The probability of evolutionary success for the single stage, PSS, as a function of the total number of random mutations, MSS, is:
PSS = 1 – (215/216)^MSS

The probability of evolutionary success for the series of three sub-stages, PSR, as a function of the total number of random mutations per sub-stage, MSUB, is:
PSR = (1 – (5/6)^MSUB)^3.

For the series, the total number of mutations is 3 x MSUB.
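The two equations can be coded as follows (a sketch; function names are illustrative):

```python
def p_single(m):
    # PSS = 1 - (215/216)^MSS
    return 1 - (215 / 216) ** m

def p_series(m_sub):
    # PSR = (1 - (5/6)^MSUB)^3, with 3 * m_sub total mutations
    return (1 - (5 / 6) ** m_sub) ** 3

print(round(p_single(1) * 100, 2))    # 0.46
print(round(p_series(1) * 100, 2))    # 0.46
print(round(p_single(478) * 100, 2))  # about 89.1
print(round(p_series(18) * 100, 2))   # about 89.15, at 54 total mutations
```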

Comparison of Probability at the Initial Condition

At zero mutations, both probabilities are zero. Initially, the probability of both processes, namely the single stage and the series of sub-stages, is the same.

For the single stage at one random mutation, which is the minimum for a positive value of probability, the probability of success is 1/216 = 0.46%.

For the series of three stages, at one random mutation per stage, which is the minimum for a positive value of probability, the probability of success is (1/6)^3 = 1/216 = 0.46%. At this level of probability, the single stage has the greater mutational efficiency. It takes the series three random mutations to achieve the same probability of success as the single stage achieves in one random mutation.

Comparison of the Limit of Probability

For both the single stage and for the series of three stages, the limit of probability with the increasing number of mutations is the asymptotic value of 100% probability.

Comparison of the Method of Increasing Probability

For both the single stage and for the series of three stages, the method of increasing the probability is the same, namely increasing the number of random mutations. For both, probability is a function of the number of random mutations.

Comparison of the Intermediate Values between the Initial Condition and the Limit

For both the single stage and for the series of three stages, probability varies, but continually increases from the initial condition toward the limit.

Excepting for values of total mutations less than six, i.e. two per sub-stage, at every level of probability, the series requires fewer mutations than does the single stage. Correspondingly, at any number of mutations greater than six, the series has a higher value of probability than the single stage. Thus, if the comparison is at a constant value of probability, the series requires fewer mutations. If the comparison is at a constant value of mutations, the series has a higher value of probability.
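This crossover near six total mutations can be checked numerically, reusing the two formulas above (a sketch; function names are illustrative):

```python
def p_single(m):
    return 1 - (215 / 216) ** m

def p_series(m_sub):
    return (1 - (5 / 6) ** m_sub) ** 3

# Below six total mutations the single stage leads; from six on, the series leads.
for total in (3, 6, 9, 54):
    print(total, round(p_single(total), 4), round(p_series(total // 3), 4))
```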

Apparent Conclusion

Richard Dawkins is right in that the series increases the probability of success, without denying that it also increases the efficiency of mutation. Bob Drury is wrong in denying the increase in probability.

The Apparent Conclusion Is False, in Consideration of the Concept of Efficiency

Both the single stage and the series of sub-stages are able to achieve any value of probability over the range from zero toward the asymptotic limit.

Efficiency is the ratio of output to input. One system or process is more efficient than another if its efficiency is numerically greater. There is no difficulty in comparing two processes where the efficiency of both systems is constant. In such a case, output starts at zero when input equals zero. Output is a linear function of input, having a constant positive slope. The process with the higher positive slope is more efficient than the other. However, in cases where the efficiencies vary, the comparison of efficiencies must be made at the same value of the numerator of the ratio of efficiency, i.e. the output, or at the same value of the denominator, the input.

In this comparison of the single stage vs. the series of sub-stages, the output is probability and the input is the number of random mutations. Remember that both processes increase probability by the same means, namely by increasing the number of random mutations. That is, output increases with increasing input. Also, remember that the two processes do not differ in their limit: both approach the same limit of probability asymptotically.

Dawkins’ comparison of replacing the single stage with a series of sub-stages is the comparison of two processes.

In the numerical examples above we can calculate and compare the efficiencies of the two processes at a constant output, e.g. 89.15% probability, and at a constant input, e.g. 54 mutations.

At the constant output of 89.15%, the efficiency for the single stage is 89.15/478 ≈ 0.19. For the series of sub-stages the efficiency is 89.15/54 ≈ 1.65. The mutational efficiency is greater for the series than for the single stage at the constant output of 89.15% for both processes.

At the constant input of 54 mutations, the probability for the single stage is P = 1 – (215/216)^54 = 22.17%. Therefore, the efficiency is 22.17/54 ≈ 0.41. At this constant input, the efficiency for the series is 89.15/54 ≈ 1.65. The mutational efficiency is greater for the series than for the single stage at the constant input of 54 mutations for both processes.
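These calculations can be checked with a short script. It uses the probability formulas stated in the Note below, P = 1 – ((n – 1)/n)^x for a single stage and the product of sub-stage probabilities for the series; the function names here are illustrative, not part of the original text.

```python
# Probability of success for the single, one-off stage defining n = 216
# possible mutations, after x random mutations: P = 1 - (215/216)^x.
def p_single(x, n=216):
    return 1 - ((n - 1) / n) ** x

# Probability for the series of three sub-stages of 6 mutations each,
# with x total random mutations split evenly (x/3 per sub-stage).
def p_series(x, sites=3, n=6):
    return (1 - ((n - 1) / n) ** (x / sites)) ** sites

# Constant output: both processes at roughly the same probability.
out = p_series(54)                    # ~0.8915
print(out / 478, out / 54)            # single vs. series efficiency
# Constant input: both processes at 54 mutations.
print(p_single(54))                   # ~0.2217
```

The single stage needs 478 mutations (rounded down from 478.6) to match the series' 89.15% at 54 mutations, which is the source of the efficiency comparison.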

At the 89.15% probability level, the series is greater in mutational efficiency than the single stage by a factor of 478/54 ≈ 8.85.

Further evidence that Dawkins is illustrating an increase in efficiency and not an increase in probability is that he compares the temporal efficiencies of two computer programs. For both programs, the input of the number of random mutations is equated with the time of operation from initiation to termination. Termination is upon the random inclusion of one specific mutation. The sub-stage-based program typically wins the race against the single-stage-based program. This demonstrates the greater mutational efficiency of the series, not the greater probability of success.

In the numerical example of three sites of six mutations each, the specific mutation would be one of 216. Let us modify the computer program races slightly. This will give us a greater insight into the meaning of probability and the meaning of efficiency.

Let each program be terminated after 54 and 478 mutations for the series and the single stage, respectively. If the comparison is performed 10,000 times, one would anticipate that on average both programs would contain at least one copy of the specific mutation in roughly 8,915 of the trials and no copies in roughly 1,085 of the trials. The series program would be more efficient because it took only 54 mutations, or units of time, compared to 478 mutations, or units of time, for the single stage program to achieve a probability of 89.15%.
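The modified races can be simulated directly. This is a minimal sketch assuming the numerical setup above: three sites of six mutations each, with the series terminated at 54 mutations (18 per sub-stage) and the single stage at 478. The specific mutation is arbitrarily taken to be outcome 0.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible
TRIALS = 10_000

def series_has_copy():
    # Three sub-stages of 6 possible mutations each, 18 random mutations
    # per sub-stage; success requires every site to receive its specific
    # mutation at least once.
    return all(
        any(random.randrange(6) == 0 for _ in range(18))
        for _ in range(3)
    )

def single_has_copy():
    # One stage of 216 possible mutations, 478 random mutations; success
    # is at least one copy of the one specific mutation.
    return any(random.randrange(216) == 0 for _ in range(478))

series_wins = sum(series_has_copy() for _ in range(TRIALS))
single_wins = sum(single_has_copy() for _ in range(TRIALS))
print(series_wins, single_wins)  # each near 8,900 of 10,000
```

Both counts come out near 8,900 of 10,000, i.e. both processes reach about the same probability; the series merely gets there with roughly one ninth the mutations.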

For the numerical illustration of three mutation sites of six mutations each, both the single stage and the series of sub-stages have the same initial probability of success greater than zero, namely 0.46%. Both can achieve any value of probability short of the asymptotic value of 100%. They do not differ in the probability of success attainable.

Whether we compare the relative efficiencies of the series vs. the single stage at a constant output or at a constant input, the series has the greater mutational efficiency for total mutations greater than six.

For the numerical illustration of three mutation sites of six mutations each, at a probability of 89.15%, the series is greater in mutational efficiency by a factor of 8.85. At 90% probability, the factor of efficiency is 8.9 in favor of the series. At a probability of 99.9999%, the factor of efficiency is 12.1 in favor of the series.
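The efficiency factor at any probability level can be computed by inverting the probability formula with logarithms. This sketch uses fractional (un-rounded) mutation counts, so its factors may differ slightly from figures obtained by rounding down to whole mutations; the function names are illustrative.

```python
from math import log

def single_mutations(p, n=216):
    # Invert p = 1 - ((n-1)/n)^x  =>  x = ln(1 - p) / ln((n-1)/n).
    return log(1 - p) / log((n - 1) / n)

def series_mutations(p, sites=3, n=6):
    # Each sub-stage must reach the cube root of p; the total is
    # three times the per-sub-stage mutation count.
    per_stage = log(1 - p ** (1 / sites)) / log((n - 1) / n)
    return sites * per_stage

def efficiency_factor(p):
    # Ratio of single-stage mutations to series mutations at equal p.
    return single_mutations(p) / series_mutations(p)

for p in (0.8915, 0.90, 0.999999):
    print(p, round(efficiency_factor(p), 1))
```

The factor grows with the target probability level, which is the trend the surrounding text describes.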

Analogy to a Different Set of Two Equations

Let the distance traveled by two autos be plotted as a function of fuel consumption. Distance increases with the amount of fuel consumed. Let the distance traveled at every value of fuel consumption be greater for auto two than auto one. Similarly, at every value of distance traveled, auto two would have used less fuel than auto one. My understanding would be inadequate and lacking comprehension, if I said that replacing auto one with auto two increases the distance traveled. It would be equally inane to say that auto two solves the problem of too low a distance. My understanding would be complete and lucid, if I said that replacing auto one with auto two increases fuel efficiency.

There is a distinction between distance and fuel efficiency. Understanding the comparison between the two autos is recognizing it as a comparison of fuel efficiency. Believing it to be a comparison of distances is a failure to understand the comparison.

For both the single stage and the series of sub-stages of evolution, probability increases with the number of random mutations. Except for the minimum number for the sub-series, at every greater number of random mutations, the probability is greater for the series of sub-stages than for the single stage of evolution. Similarly, except for the minimum positive value, at every value of probability, the series requires fewer random mutations. My understanding would be inadequate and lacking comprehension, if I said that replacing the single stage with the series increases the probability attained. It would be equally inane to say that the series solves the problem of too low a probability. My understanding would be complete and lucid, if I said that replacing the single stage with the series increases mutational efficiency.

The role of a series of sub-stages in replacing a single stage of random mutation and natural selection is to increase the efficiency of random mutation while having no effect on the probability of evolutionary success. This is evident by comparing the equations of probability for the series and for the single stage as functions of the number of random mutations. This is the very comparison proposed by Richard Dawkins for the sake of understanding evolution. He misunderstood it as “a solution to the problem of improbability” (The God Delusion, page 121), i.e. as solving the problem of too low a probability.

There is a distinction between probability and mutational efficiency. Understanding the comparison between the series of sub-stages and the single stage is recognizing it as a comparison of mutational efficiency. Believing it to be a comparison of probabilities is a failure to understand the comparison.

In order to understand Darwinian evolution, every high school student must know the basic arithmetic involved. The following example illustrates the mathematical explanation of Darwinian evolution by evolutionary biologist, Richard Dawkins (The God Delusion, page 121).

The Example

The improbability of the result of the simultaneous flip of a coin, the roll of a die and the random selection of a card from a deck is 99.84%, i.e. 1 – (0.5 × 0.1667 × 0.0192). Sequentially flipping a coin, rolling a die and randomly selecting a card breaks up the improbability of 99.84% into three smaller pieces, namely, 50%, which is 1 – 0.5; 83.33%, which is 1 – 0.1667; and 98.08%, which is 1 – 0.0192. This is how natural selection works.
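The arithmetic of the example, using the exact fractions 1/2, 1/6 and 1/52:

```python
# One-off event: simultaneous coin flip, die roll and card draw.
p_coin, p_die, p_card = 1 / 2, 1 / 6, 1 / 52

one_off_improbability = 1 - p_coin * p_die * p_card
print(round(one_off_improbability * 100, 2))  # 99.84

# Sequential "pieces": the improbability of each separate event.
pieces = [1 - p for p in (p_coin, p_die, p_card)]
print([round(p * 100, 2) for p in pieces])  # [50.0, 83.33, 98.08]
```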

The Explanation

“(N)atural selection is a cumulative process which breaks the problem of improbability up into small pieces. Each of the small pieces is slightly improbable, but not prohibitively so. When large numbers of these slightly improbable events are stacked up in series, the end product is very very improbable indeed, improbable enough to be far beyond the reach of chance. It is these end products that form the subjects of the creationist’s wearisomely recycled argument. The creationist completely misses the point, because he (women should once not mind being excluded by the pronoun) insists on treating the genesis of statistical improbability as a single, one-off event. He doesn’t understand the power of accumulation.”

Dawkins illustrated this with his own numerical example of three mutation sites of six mutations each. Taking random mutation as a one-off event affecting all three sites simultaneously, the improbability of the outcome is 99.54% for one random mutation, i.e. 1 – (1/216). Subjecting each site individually to random mutation, the improbability of each of the three stages is 83.33%, i.e. 1 – (1/6), for one random mutation in each sub-stage. In biological evolution a sub-stage is terminated by natural selection. Thereby, natural selection breaks the improbability of 99.54% into three smaller pieces each of which is 83.33%.

The creationist would claim that there is no change in the improbability. The overall improbability of the series equals the improbability of the single one-off stage. The improbability would be 99.54% in both the single stage affecting all three sites and in the series of three sub-stages, each affecting only one site. The creationist mistakenly thinks that the probabilities of a series multiply, thereby yielding the same improbability for the one-off event and for the series. The creationist doesn’t understand the power of accumulation, which applied to a single, one-off event, breaks up the improbability into smaller pieces of improbability.

Coordinately, the power of accumulation must break up the small piece of probability of the single, one-off event into three larger pieces of probability forming the series. The small piece of probability of the single, one-off event is 0.46%, i.e. 1/216, while each of the three larger pieces of probability, into which the small piece is broken, is 16.67%, i.e. 1/6.

According to Dawkins, the creationist thinks the probability of the single, one-off event is equal to the product, not the sum, of a sub-series of probabilities. Thereby, according to Dawkins, the creationist is oblivious to the power of accumulation, i.e. to the power of addition in arithmetic.

Evaluation

In fact it is Dawkins who has the arithmetic wrong. The probabilities of a series are the factors, whose multiplication product is the overall probability of the series. The overall probability of a series is not the sum of the probabilities of a series, nor is the overall improbability of a series the sum of the improbabilities of the series. The overall improbability is not broken up into smaller pieces, which are united by accumulation, i.e. by summation. Neither is the overall probability broken up into larger pieces, which are united by accumulation, i.e. by summation.
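The arithmetical claim of this evaluation is easy to verify: the overall probability of a series is the product of the stage probabilities, and neither probabilities nor improbabilities accumulate by summation.

```python
stage_p = 1 / 6            # probability of success for each sub-stage
series_p = stage_p ** 3    # overall probability: the PRODUCT of stages

# The product equals the single, one-off probability of 1/216 ...
assert abs(series_p - 1 / 216) < 1e-12

# ... while summing the improbabilities gives a meaningless result
# (it even exceeds 1, which no probability can do).
summed_improbability = 3 * (1 - stage_p)
print(series_p, summed_improbability)  # ~0.00463 and 2.5
```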

Dawkins’ confusion of the arithmetical operations of addition and multiplication leads him to the false belief that sub-staging in Darwinian evolution increases the probability of evolutionary success. It also blinds Dawkins to what natural selection truly accomplishes through sub-stages. Through sub-staging, natural selection increases the efficiency of random mutation.

In the case of one random mutation per sub-stage the probability of evolutionary success per sub-stage is 16.67%. Overall the probability of success is 0.46%, while the total number of mutations for the three stages is three. To achieve this same probability of success, i.e. 0.46%, the single stage requires only one mutation. At the 0.46% level of success, the single stage is more efficient in the number of mutations by a factor of three. However, at higher levels of evolutionary success, this quickly changes, resulting in greater mutational efficiency for the series of sub-stages.

At two random mutations per sub-stage for a total of six, the probability of success overall is 2.85%, while for that level of success, the single stage also requires six mutations by rounding down from a calculated 6.23 mutations. At three random mutations per sub-stage for a total of nine, the probability of success overall is 7.48%, while for that level of success, the single stage requires sixteen random mutations by rounding down from a calculated 16.75 mutations.

Mutational efficiency in favor of the sub-stages increases with the level of evolutionary success. Eighteen random mutations per sub-stage, for a total of fifty-four random mutations, yield an overall probability of evolutionary success of 89.15%. To achieve the 89.15% level of probability of success, the single, one-off stage requires 478 random mutations.

For levels of 0.46%, 2.85%, 7.48% and 89.15% of the probability of Darwinian evolutionary success, the efficiency factor for random mutations in favor of the series of sub-stages goes from less than one, namely 1/3, to 6/6 to 16/9 to 478/54. That last ratio is an efficiency factor of 8.85.

By confusing multiplication and addition, Dawkins fails to understand the role of sub-stages in Darwinian evolution. Sub-staging has no effect on the probability of evolutionary success. Rather, it increases the efficiency of random mutation in Darwinian evolution.

Note

The probability, P, of Darwinian evolutionary success for a stage of random mutation and natural selection is a function of n, the total number of different mutations defined by a stage, and of x, the number of random mutations occurring in that stage. P = 1 – ((n – 1)/n)^x. The probability of a series is the product of the probabilities of the stages in the series.
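The Note's two formulas, rendered directly as code (a minimal sketch; the function names are illustrative):

```python
def stage_probability(n, x):
    # P = 1 - ((n - 1)/n)^x for one stage defining n different
    # mutations, after x random mutations in that stage.
    return 1 - ((n - 1) / n) ** x

def series_probability(stages):
    # The probability of a series is the product of the probabilities
    # of its stages; `stages` is a list of (n, x) pairs, one per stage.
    p = 1.0
    for n, x in stages:
        p *= stage_probability(n, x)
    return p

# Single one-off stage vs. series of three sub-stages, one mutation each:
print(stage_probability(216, 1))         # 1/216, i.e. ~0.46%
print(series_probability([(6, 1)] * 3))  # (1/6)^3 = 1/216 as well
```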

This essay is presented from the perspective of the philosophy of probability.

It is of the nature of a horse to have two eyes. Such is the ancient and naive ‘explanation’. However, that is not an explanation or even an answer. At best, it is simply a statement of the observation, which prompts the question. At worst, it is a copout, which evades the question.

More recent observations are that predatory animals have two eyes in a plane enabling binocular vision. In contrast, animals, like the horse, which are prey to others, have eye placement approximately in two planes, which are the two sides of their heads. Such placement affords nearly a complete view of the sphere from which attack by a predator may occur.

In prey, the placement of the eyes approximately forms the axis of a globe of visual sensation, over which the eyes are uniformly distributed. In the case of fewest eyes, namely two, the eyes are at the axis of the globe. The axis may be viewed as the diameter of one longitudinal circumference.

The more eyes distributed uniformly over the globe of visual sensation, the more complete and uniform would be the monitoring of the sphere of predatory attack. Let the algorithm of adding uniformly distributed eyes to the globe of visual sensation be by adding equally spaced longitudinal circumferences, where each circumference has a number of eyes equaling twice the number of circumferences, while maintaining the location of the two original eyes at the axis of the globe.

By this algorithm, the relationship between the number of equally spaced eyes, N, and the number of longitudinal circumferences, n, would be: N = (n^2 – (n – 1)) × 2. For n = 1, 2, 3 and 4, the number of uniformly distributed eyes, N, is 2, 6, 14 and 26. The practical upper limit would be determined by the size of the eyes and the size of a horse’s head. Also, the practical number of eyes would be diminished by eliminating those positions on the virtual globe of visual sensation blocked by the neck of the horse.
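The eye-count algorithm can be checked numerically (the function name is illustrative):

```python
def eye_count(n):
    # Number of uniformly distributed eyes, N, for n equally spaced
    # longitudinal circumferences: N = (n^2 - (n - 1)) * 2.
    return (n * n - (n - 1)) * 2

print([eye_count(n) for n in range(1, 5)])  # [2, 6, 14, 26]
```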

Why do horses have two eyes? Horses have two eyes as a uniform distribution of visual sensors forming a global base for monitoring an attack from any point of the sphere of predation.

It should be apparent that the fact that animals of prey in the scope of human observation, such as the horse, have just two eyes is one possibility among several. It happens that in our universe the number is two. However, the fact that our observation is limited to our earth in our universe voids the rationale, which claims that it is of some fundamental character of a horse that it has two eyes. Indeed, it has two eyes in our universe, but horses may have any of a range of numbers of eyes, notably greater than two, in other regions of the multiverse. It is the multiverse which explains the fact that we observe within our universe just one of many possibilities, where in accord with probability, the number of eyes of earth horses is two.

It is evident that the existence of the multiverse in cosmology is not a consequence solely of the science of physics and the numerical values of the physical constants in our universe. The multiverse is harmonious with biology as well and with what seems like a simple question, ‘Why do horses have two eyes?’

This essay is presented by its author on the supposition of his virtual assignment to the debate side expressed by the title. It is prompted by the impression that published views, on the con side of the debate, typically dismiss the pro side as intellectually and philosophically trivial. Consequently, the con side has not adequately addressed the issue of debate.

The issue or thesis is that human knowledge of material reality is the inference of mathematical probability. Hahn and Wiker (Answering the New Atheism, p 10) accuse Dawkins of an irrational faith in chance, when Dawkins has explicitly denied chance as a solution (The God Delusion, p 119-120). Feser (The Last Superstition) does not even discuss mathematical probability, although identifying Dawkins as his main philosophical opponent. In a few instances Feser uses the word ‘probability’, but in the sense of human certitude, not in the mathematical sense.

The Historical Issues

There were two dichotomies with which the ancient Greek philosophers wrestled. One was the discrete and the continuous. The other was the particular and the universal.

The Discrete and the Continuous

Zeno of Elea was a proponent of the discrete to the denial of the continuous. This took the form of a discrete analysis of motion. Any linear local motion takes a finite time to proceed halfway, leaving the remainder of the motion in the same situation. If local motion were real, it would take an infinite number of finite increments of time and also of distance to complete the motion. Therefore, motion is an illusion. From this perspective, it is assumed that the discrete is real. When subjected to discrete analysis, motion, which is continuous, is seen to be untenable.

Heraclitus of Ephesus took the opposite view. Everything is always changing. It is change, which is real. Things as entities, i.e. as implicitly stable, are mental constructs. They are purely logical. It is continuous fluidity which is reality.

The Particular and the Universal

It was apparent to both Plato and his student, Aristotle, that the object of sense knowledge was particular, completely specified. In contrast, intellectual concepts were universal, not characterized by particularity, but compatible with a multitude of incompatible particulars. Plato proposed that sense knowledge of the particular was a prompt to intellectual knowledge, recalling a memory when the human soul, apart from the body, had known the universals.

Aristotle proposed that material entities or substances were composites of two principles. One was intelligible and universal, the substantial form. The other was the principle of individuation or matter, which enabled the expression of that universal form in a complete set of particulars. The human soul had the power to abstract the universal form from a phantasm presented to it by sense knowledge of the individual material entity in its particularities.

From this binary division into the two principles of substantial form and matter arose the concept of formal causality. The form of an entity made an entity to be what it was. It was the formal cause, whereas the particular material substance, as a composite of form and matter, was the effect. Thus, cause and effect were binary variables. The cause is absent, 0, or present, 1, and its effect was correspondingly binary as absent, 0, or present, 1. Thereby, the philosophy of formal causality was tied to the discrete mathematics of binary arithmetic.

The Modern Assessment of Form

This discrete and binary view of formal causality was subtly undermined in the 19th century. What led to its demise was the study of variation in biological forms. Darwin proposed that the modification of biological forms was due to the generation of variants by random mutation and their differential survival due to natural selection.

Superficially this appeared to be consonant with the distinction of one substantial form, or identity of one species, as discretely distinct from another. However, it was soon realized that the spectrum of seemingly discrete and graduated forms was, in its limit, continuous variation. One species in an evolutionary line did not represent a discretely distinct substantial form from the next substance in the spectrum. Rather, they were related by continuous degree (http://www.richarddawkins.net/news_articles/2013/1/28/the-tyranny-of-the-discontinuous-mind#). The distinction of one biological form from another, as substantial, was an imposition of the human mind on biological reality. To save at least the jargon of Aristotelian philosophy, it could be said that the evolutionary and graduated differences among living things were accidental differences among individuals of one substantial form, namely the substantial form, living thing.

The Resultant Modern Assessment of Efficient Causality

Apart from formal causality, Aristotle also identified efficient causality, namely the transition of potency to act. This would include all change, both substantial change and local motion. In keeping with the limitations of binary arithmetic, efficient causality and its effect were identified as absent, 0, and present, 1. However, concomitant to the implication of the random mutation of forms, which renders the substantial form of living things a continuum, is the implication of mathematical probability as the outcome of an event. Just as the realization that the mutation of forms defined a continuous spectrum for formal causality, probability defines a continuous spectrum from 0 to 1, for efficient causality. Efficient causality is the probability of an outcome, the probability of an event. The outcome or event as the effect is within a continuous spectrum and proportional to its continuous efficient cause, which is mathematical probability. Thus, the inference of mathematical probability as the mode of human knowledge of material reality, frees efficient causality and its effect from the restrictions of binary arithmetic.

Causality was no longer discrete and binary. Causality was the amplitude from 0 to 1 of the continuous variable, probability. Causality had now the nuance of degree, made possible by the rejection of discrete, binary arithmetic in favor of continuity. The magnitude of the effect was directly proportional to the amplitude of the cause. The simplicity of discrete, binary arithmetic, which is so satisfying to the human mind, was replaced by what we see in nature, namely degree.

A Clearer Understanding of Chance

Hume had rejected the idea of efficient causality. He claimed that what we propose as cause and effect is simply a habit of association of a sequence of events. In this view, we label as an effect the next in a series of events according to what we anticipate due to our habit of association. The understanding of probability as causality having amplitude restores cause and effect, negating Hume’s denial.

Mathematical probability is the fractional concentration of an element, x, of quantity, n, in a logical set of N elements. This fraction, n/N, has a lower limit of 0 as n → 0. The limit, 0, is a non-fraction. The upper limit of the fraction, probability, n/N, as n → N is 1, a non-fraction. These non-fractional limits represent the old, binary conception of causality. Properly understood, these limits demarcate the continuum of probability, the continuum of efficient causality.

The binary definition of chance was an effect of 1, where the cause was 0. In recognizing probability as efficient causality, this does not change. No one offers chance as an explanation (The God Delusion, p 119-120). In the context of probability, however, the binary concept of chance yields to a properly nuanced understanding. Chance is directional within the continuum of probability. Causality tends toward chance as the probability tends toward 0. This is mathematically the same as improbability increasing toward 1. Consequently, Dawkins notes that a decrease in probability is moving away from chance by degree, “I want to continue demonstrating the problem which any theory of life must solve: how to escape from chance.” (The God Delusion, p120). This escape from chance by degree is explicit, “The answer is that natural selection is a cumulative process, which breaks the problem of improbability up into small pieces. Each of the small pieces is slightly improbable, but not prohibitively so.” (The God Delusion, p121)

Often in common parlance, chance and probability are synonyms: The chance or probability of heads in flipping a coin is one-half. In recognizing probability as the spectrum of efficient causality they are not synonyms. Chance is properly understood as directional movement toward the lower end of the spectrum of probability.

Mathematical Probability and Human Certitude Merge

The recognition of efficient causality as the continuum of probability introduces a distinction between mathematical chance as directional and mathematical probability as spectrum. On the other hand, this recognition merges the meaning of mathematical probability and probability in the sense of an individual’s certitude of the truth of a proposition.

In the Aristotelian discrete binary view of efficient causality, an individual’s certitude of the truth of a proposition, though commonly labeled ‘probability’, was strictly qualitative and subjective. One could, of course, describe his certitude on a numerical scale, but this was simply a subjective accommodation. For example, stating a numerical degree of one’s certitude was just for the fun of it within a discussion of politics by TV pundits. In spite of adopting an arbitrary scale such as zero to ten, to express a pundit’s certitude, human certitude was still recognized as qualitative.

The recognition of efficient causality as the continuum of mathematical probability, implies that human knowledge is the inference of mathematical probability and, indeed, a matter of degree. There is no distinction between the probability of efficient causality and the degree of certitude of human knowledge. Human certitude, which was thought to be qualitative, is quantitative because human knowledge is the inference of mathematical probability.

Final Causality

Final causality or purpose is characteristic of human artifacts. However enticing as it may be, it is simply anthropomorphic to extrapolate purpose from human artifacts to material reality (The God Delusion, p 157). In the binary context of form and matter, it was quite easy to give in to the temptation. Once binary arithmetic was discarded with respect to formal and efficient causality, the temptation vanished. The continuity of probability not only erased the discrete distinctions among forms, but melded formal causality and efficient causality into the one continuous variable of probability. Final causality is identifiable in human artifacts and in a philosophy based on binary arithmetic. It serves no purpose in a philosophy based on the continuity arising from the inference of mathematical probability from material reality.

Conclusion: Regarding the Existence of God

Binary arithmetic was Aristotle’s basis for the distinction of substantial form and matter in solving the problem of the particular and the universal. The form was the intelligible principle which explained the composite, the particular substance. The composite was identified as the nature of the individual material entity. However, this implied a discrete distinction between the nature of the individual substance and its existence. One binary solution led to another binary problem: How do you explain the existence of the individual when its form, in association with matter, merely explains its nature? The Aristotelian solution lay in claiming there must be a being, outside of human experience in which there was no discrete distinction between nature and existence. That being would be perfectly integral in itself. Thereby, it would be its own formal, efficient and final causes. Its integrity would be the fix needed to amend the dichotomy of the nature and existence of the entities within our experience.

Both the problem and its solution arise out of the mindset of binary arithmetic. The problem is to explain a real, discrete distinction between nature and existence in material entities. Its solution is God, an integral whole. In contrast, the problem does not arise in the philosophy of probability, which expands philosophical understanding to permit the concept of mathematical continuity. That philosophy allows the human epistemological inference of mathematical probability. Probability and its inference from material reality, do not require a dichotomy between formal and efficient causality. In that inference, expressed as amplitude, both form and existence are integral. There is no need of a God, an external source, to bind into a whole that which is already integral in itself.

In Aristotelian philosophy, it is said that there is only a logical distinction between God’s nature and God’s existence, whereas there is a real distinction of nature and existence in created entities. The philosophy of probability avoids the dichotomies arising out of Aristotelian binary arithmetic. In the philosophy of probability there is only a logical distinction between formal and efficient causality in material things. There is no real dichotomy for a God to resolve.