Kevin's Math Page # Analysis of the Probability of the Last Digit of the Next Prime Kevin Gong March 17th, 2016

There was a recent article making the rounds concerning the distribution of prime numbers: Mathematicians Discovered Something Super Freaky About Prime Numbers. The article says that, in the first hundred million primes, a prime ending in 1 is followed by another ending in 1 just 18.5 percent of the time, and that:

 "That shouldn't happen if they were truly random - we should expect to see this happen 25 percent of the time (keep in mind that primes can only end in 1, 3, 7, or 9)."

It immediately struck me that the 25% probability was wrong. In fact, the probability would be less. It is true that if you took two completely random different positive integers ending in 1, 3, 7, or 9, that the probability the would both end in the same digit is 25%. However, that's not what we're measuring here. There are two important differences:

1. The numbers are bounded, in this case, no bigger than the 100 millionth prime.
2. There are more than two numbers.

First, let's see how boundedness changes the probability. Suppose that instead of the numbers being unbounded, you picked two random different positive integers ending in 1, 3, 7, or 9, but they both had to be less than 10. What's the probability they both end in the same digit? Zero, since it's impossible. Ok...what if the numbers had to be less then 20? Then there are 8 such numbers (1, 3, 7, 9, 11, 13, 17, 19). Once you pick the first number, there are 7 remaining numbers, only one of which has the same last digit. So the probability is 1/7 = 14.3% (not even close to 25%).

One brief analogy: In Texas Hold'em poker, the probability of getting dealt a pocket pair is not 1 in 13, as the authors of the prime article might have you believe. It's 1 in 17. Suppose you are dealt one ace, Then there are 3 aces out of the remaining 51 cards (3/51 = 1/17).

Now, of course as you raise the upper bound of the numbers, the probability will get closer and closer to 25%. However, as we combine this with the second issue, we'll see it doesn't get close to 25% very fast.

The second issue has to do with the density of the primes. Suppose, instead of picking just 2 numbers, I asked you to pick 6 random different positive integers ending in 1, 3, 7, or 9, and they all have to be less than 20. What's the probability that any two consecutive numbers end in the same digit? Zero. The possible numbers are (1, 3, 7, 9, 11, 13, 17, 19), so it isn't hard to prove that, when we pick 6 of them, it's impossible that any two consecutive numbers end in the same digit. What if we pick 5 numbers? Then it can only happen in a few ways: (1, 11, 13, 17,19), (1, 3, 13, 17, 19), (1, 3, 7, 17, 19), and (1, 3, 7, 9, 19). That's 4 ways, and there are (8 choose 6) = 28 ways of picking the numbers. So in 24 of the cases, none of the 4 pairs of consecutive numbers would have the same last digit. In the other 4 cases, one out of each of the 4 pairs would have the same last digit. So that's 4 / 112 = 1/28, or 3.6%. Not even close to 25%. Again, the probability will get closer to 25% as the upper bound increases and the density decreases (density = number chosen divided by the upper bound).

To find the true expected probabilities, we need to know the density of primes. You can find this information on many web sites. Of course you can calculate the smaller values by hand. For example, there are 25 primes less than 100 (density = 25%). Let's start with this example. There may be a complicated mathematical analysis to determine the expected probability in this case, but I found it easier to do a Monte Carlo simulation using a computer. My program picked 25 random numbers ending in 1, 3, 7, and 9 less than 100, then calculated which consecutive numbers had the same last digit. Over millions of simulations, this only happened about 3.1% of the time (not even close to 25%). When you think about it, there are only 40 numbers less than 100 that end in 1, 3, 7, or 9, so 25/40 is pretty dense - so it's not that surprising that the probability is only 3.1% in this case.

I then increased the size of the upper bound (adjusting the density appropriately) and found this:

Max Prime
Number of Primes
Prime Density
Probability of Same Digit
Actual Pct
100
25
25.0%
3.1%
0.0%
1000
168
16.8%
9.2%
10.2%
10000
1229
12.3%
13.3%
11.8%
100000
9592
9.6%
15.8%
13.9%
1000000
78498
7.8%
17.5%
15.4%
10000000
664579
6.6%
18.6%
16.3%
100000000
5761455
5.7%
19.5%
17.2%
1000000000
50847534
5.1%
20.2%
17.9%
2038074743
100000000
4.9%
20.3%
18.1%

Note: to be truly accurate, the model should account for the fact that the primes are more dense for lower numbers. I did this adjustment and it didn't actually make much difference: For example, adjusting the value for primes under 1000 based on this fact would change the probability to 9.0% instead of 9.2%. The effect gets smaller and smaller with larger numbers, so I didn't bother including it in the above table.

In any case, as you can see, if primes were truly random, the probability of two consecutive prime numbers (within the first hundred million primes) ending in the same digit should be about 20.3%. So the 18.5% listed in the article for 1's isn't too far off (though a bit unexpected).

Having not seen the entire original research, I can't say if the authors understood all this. I am not saying that the probabilities they found are not surprising. They are, just not in the way you would expect. I found the throwaway line about 25% a bit disturbing. It showed me that people sometimes aren't very good at probability, or are careless about it.

Also, Gizmodo incorrectly states "In terms of the back-to-back distribution of the other numbers, primes ending in 3 and 7 appeared 30 percent of the time, and consecutive 9s appears about 22 of the time." That's simply not true. It appears to be sloppy reporting by Gizmodo, as the New Scientist article referenced by the Gizmodo article says "Primes ending in 3 and 7 take up the slack, each following a 1 in 30 per cent of primes, while a 9 follows a 1 in around 22 per cent of occurrences." and that is true. So while Gizmodo is saying that a 3 is followed by a 3 30% of the time, it's actually 1 that is followed by a 3 30% of the time, and so forth. This is what I found, up to the hundred millionth prime:

Last Digit of Prime
Last Digit of Next Prime = 1
Last Digit of Next Prime = 3
Last Digit of Next Prime = 7
Last Digit of Next Prime = 9
1
18.5%
29.7%
30.0%
21.8%
3
24.0%
17.8%
28.2%
30.0%
7
25.5%
27.0%
17.8%
29.7%
9
32.0%
25.5%
24.1%
18.5%

Here are the expected probabilities (if the primes were randomly distributed integers ending in 1, 3, 7, or 9):

Last Digit of Prime
Last Digit of Next Prime = 1
Last Digit of Next Prime = 3
Last Digit of Next Prime = 7
Last Digit of Next Prime = 9
1
20.3%
30.1%
26.4%
23.2%
3
23.2%
20.3%
30.1%
26.4%
7
26.4%
23.2%
20.3%
30.1%
9
30.1%
26.4%
23.2%
20.3%

As you can see, the expected and actual frequencies are not very far off in most cases. The case that shows the most difference is the 7,3 case (23.2% expected, 27.0% actual).

One more thing. Here's how density would affect the expected probabilities (up to 2038074743, which is the hundred millionth prime. This table shows the expected probability of the last digit of the next prime after a prime with last digit 1, if primes had the specified density:

Density of Primes
Last Digit of Next Prime = 1
Last Digit of Next Prime = 3
Last Digit of Next Prime = 7
Last Digit of Next Prime = 9
10.0%
15.4%
36.6%
27.4%
20.6%
5.0%
20.3%
30.2%
26.4%
23.1%
3.0%
22.1%
28.0%
25.9%
24.0%
1.0%
24.0%
26.0%
25.3%
24.7%
0.1%
24.9%
25.1%
25.0%
25.0%

As you would expect, the probabilities get closer to 25% as the densities decrease.

Kevin's Math Page