# Miller-Rabin primality test¶

A prime number is an integer, $n > 1$, which is divisible only by $1$ and itself. Primes have a ton of interesting properties and their use in modern day cryptography is essential to the digital society. Testing for primality (to property of being prime) is an important algorithmic effort. One of the more interesting types of primality tests are so called probabilistic tests. The Miller-Rabin primality test (or actually compositiveness test) is a probabilistic test that is also quite easy to implement. Here, I implement a version of the test and compare it to a naïve divisor test.

## Licenses¶

The article text and code samples are published here under different licenses. Briefly, you are allowed to use the code part of this article as you please, but the article text can only be shared unaltered within non-comercial context as long as this article posting is referred to as the source and the author is attributed by full name.

The text for this post is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

### MIT License for the code¶

Copyright 2017 Juuso Korhonen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

```
import sys
import random
import math
import time
import numpy as np
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
```

## Import a list of primes¶

For testing, I first import a list of 10 000 first primes. I have the primes in a text file called *list_of_primes.txt*.

```
def read_primes(filename):
primes = []
with open(filename, "r", encoding="utf-8") as f:
for line in f:
primes += line.split()
return [int(x) for x in primes]
primes = read_primes("list_of_primes.txt")
n_primes = len(primes)
fmt_str = "Read in {} primes: " + "{}, "*10 + "..."
print(fmt_str.format(n_primes, *primes[:10]))
```

## Naïve method¶

The comparison algorithm is a naïve method that tests if $p$ divides $n$ for $p < \sqrt{n}$. I call the algorithm naïve here, because it is usully the first method for testing that almost anyone can figure out on their own.

```
def is_prime_naive(n):
if n <= 0 or n != int(n):
raise AttributeError("n should be an integer and n > 0.")
if n == 1: # 1 is not prime
return False
if n == 2:
return True
if n % 2 == 0:
return False
p = int(math.sqrt(n))
while p > 1:
if n % p == 0:
return False
p -= 1
return True
```

```
t_start = time.time()
primes_found = []
x = 0
while len(primes_found) < n_primes: # Test first 10000 primes
x += 1
if is_prime_naive(x):
primes_found.append(x)
t_end = time.time()
t_run = (t_end - t_start) * 1000.0 # in ms
if primes == primes_found:
print("Found all primes in the test list.")
else:
print("Algorithm did not find all the primes in the test list")
print("Found {} primes in {:.2f} ms: ".format(len(primes_found), t_run))
```

## Miller-Rabin implementation¶

Miller-Rabin finds composite numbers, ie. those that are not prime. If the algorithm says that a number is composite, it is so with 100% certainty. However, whenever it does not find a composite number, the nubmer is a prime with a probability $\alpha$. The probability can be made small by iterating the algorihm (roughly $\alpha^k$) and a small number of iterations is required for small primes, such as the ones we are checking here.

The details of the Miller-Rabin algorighm can be read, for example, from Wikipedia.

```
def miller_rabin(n, k=10):
"""
Returns True if n is probably prime.
Returns False if n is composite.
"""
if n <= 0 or n != int(n):
raise AttributeError("n should be an integer and n > 0.")
if n == 1:
return False
if n == 2:
return True
if n % 2 == 0:
return False
# Start state, find s,d that satisfy 2**s*d = n-1, where d is odd
d = 0
s = n-1
while s % 2 == 0:
d += 1
s >>= 1
# Repeat k times (set as function parameter)
for _ in range(k):
# Select 1 < a < n-1
a = random.randint(2, n-1)
# Start checking
x = pow(a, s, n) # x = a**d % n
if x == 1:
continue
for _ in range(d-1):
if x == n-1:
break
x = pow(x, 2, n) # x = x**2 % n
if x != n-1:
return False
return True
```

```
t_start = time.time()
primes_found = []
x = 0
while len(primes_found) < n_primes: # Test first 10000 primes
x += 1
if miller_rabin(x, k=4):
primes_found.append(x)
t_end = time.time()
t_run = (t_end - t_start) * 1000.0 # in ms
if primes == primes_found:
print("Found all primes in the test list.")
else:
print("Algorithm did not find all the primes in the test list")
print("Found {} primes in {:.2f} ms: ".format(len(primes_found), t_run))
```

## Comparison of the algorithms¶

For small primes, $n < 10000$, there was a noticable speed advantage (about 50 %) in favor of the Miller-Rabin method. This advantage should get larger with larger primes.

```
n_max = 1e5
def time_func(prime_func, n_max):
primes_found = 0
t_start = time.time()
x = 0
while primes_found < n_max:
x += 1
if prime_func(x):
primes_found += 1
t_end = time.time()
t_run = (t_end - t_start) * 1000.0 # in ms
return primes_found, t_run, x
n_naive, t_naive, x_naive = time_func(is_prime_naive, n_max)
print("Naïve method found {} primes within {} integers in {:.2f} ms.".
format(n_naive, x_naive, t_naive))
n_miller_rabin, t_miller_rabin, x_miller_rabin = time_func(miller_rabin, n_max)
print("Miller-Rabin method found {} primes within {} integers in {:.2f} ms.".
format(n_miller_rabin, x_miller_rabin, t_miller_rabin))
t_diff = (t_naive - t_miller_rabin)/t_naive
print("Miller-Rabin method is {:.2f}% faster than the naïve method.".format(t_diff*100.0))
```

## Visualizing the scaling¶

The timing can be visualized in a chart. First we need to populate a few data points.

```
ns = [1e1, 5e1, 1e2, 5e2, 1e3, 5e3, 1e4, 5e4, 1e5, 5e5]
ns_naive, ts_naive, xs_naive = [], [], []
ns_miller_rabin, ts_miller_rabin, xs_miller_rabin = [], [], []
t_start = time.time()
for n_max in ns:
n, t, x = time_func(is_prime_naive, n_max)
ns_naive.append(n)
ts_naive.append(t)
xs_naive.append(x)
n, t, x = time_func(miller_rabin, n_max)
ns_miller_rabin.append(n)
ts_miller_rabin.append(t)
xs_miller_rabin.append(x)
t_end = time.time()
t_run = (t_end - t_start)*1000.0
print("Process finished in {:.2f} ms.".format(t_run))
```

## Use linear regression to estimate the slope¶

I use the `scipy.stats.linregress`

method to create a linear regression from the logarithmically transformed $n$ and $t$.

```
log_ns = np.log(ns)
log_ts_naive = np.log(ts_naive)
log_ts_miller_rabin = np.log(ts_miller_rabin)
linregr_naive = stats.linregress(log_ns, log_ts_naive)
linefit_naive = np.exp(linregr_naive.intercept) * ns**linregr_naive.slope
linregr_miller_rabin = stats.linregress(log_ns, log_ts_miller_rabin)
linefit_miller_rabin = np.exp(linregr_miller_rabin.intercept) * ns**linregr_miller_rabin.slope
print("Results of linear regression")
print()
print("Naïve method:")
print(" Slope : {:.3f}".format(linregr_naive.slope))
print(" Intercept : {:.3f}".format(linregr_naive.intercept))
print(" r-value : {:.3f}".format(linregr_naive.rvalue))
print(" p-value : {:.3f}".format(linregr_naive.pvalue))
print(" Std.err. : {:.3f}".format(linregr_naive.stderr))
print(" Model : t = {:.6f} . n^{:.3f}".format(
np.exp(linregr_naive.intercept), linregr_naive.slope))
print()
print("Miler-Rabin method:")
print(" Slope : {:.3f}".format(linregr_miller_rabin.slope))
print(" Intercept : {:.3f}".format(linregr_miller_rabin.intercept))
print(" r-value : {:.3f}".format(linregr_miller_rabin.rvalue))
print(" p-value : {:.3f}".format(linregr_miller_rabin.pvalue))
print(" Std.err. : {:.3f}".format(linregr_miller_rabin.stderr))
print(" Model : t = {:.6f} . n^{:.3f}".format(
np.exp(linregr_miller_rabin.intercept), linregr_miller_rabin.slope))
```

### Create a plot and show it¶

I use the stardard matplotlib library for creating the plots.

```
fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(5e0, 1e6)
# Plot linear regression model
ax.plot(ns, linefit_naive, 'b--', lw=1, alpha=0.6)
# Plot data points
ax.plot(ns, ts_naive,
'bo', markersize=10, mfc='w', alpha=0.8,
label=r"Naïve method, $t \sim n^{1.595}$")
# Plot linear regression model
ax.plot(ns, linefit_miller_rabin, 'r--', lw=1, alpha=0.6)
# Plot data points
ax.plot(ns, ts_miller_rabin,
'ro', markersize=10, mfc='w', alpha=0.8,
label=r"Miller-Rabin method, $t \sim n^{1.102}$")
ax.set_yticks([])
ax.set_ylabel("Run time (a.u., log)")
ax.set_xlabel("Number of found primes")
ax.legend(fontsize="large");
```

## Conclusion¶

Miller-Rabin method runs considerably faster then the naïve divisor method for larger primes, starting roughly from the thousand prime. It scales roughly as $t \sim n^{1.1}$, while the naïve method scales rougly as $t \sim n^{1.6}$. Note, that an algorithm that would run in a fixed time for each prime, would run this test as $t \sim n^{1.0}$.