I'm intimidated by modern mathematical algorithms, like those used for float parsing and formatting, or famously the inverse square root.
I do love math, but these algorithms often feel unapproachable to casual understanding. That is, building real intuition for them requires a lot of prerequisite reading of logic- and notation-dense papers.
I don't necessarily have the time or mathematical background to dig into the existing literature for a lot of these algorithms. But recently, I've been interested in algorithms for multiplication, after hearing that some bigint libraries rely on the fast Fourier transform to perform multiplication in some scenarios.
What follows is what I hope to be the most comprehensive overview of modern multiplication algorithms, with explanations written in plain English.
Integers
Looping
Fundamentally, multiplication describes repeated addition. If we say `x * 2`, we can expand this as `x + x`. Similarly, `x * 4` is the same as `x + x + x + x`.
The easiest way to implement this in code would be a `for` loop:
```rust
fn multiply(multiplicand: u32, multiplier: u32) -> u32 {
    let mut product = 0;
    for _ in 0..multiplier {
        product += multiplicand;
    }
    product
}
```
We loop `multiplier` times, adding `multiplicand` to the total product each time. For now we're ignoring complexities like overflow.
For small multipliers, this doesn't seem so bad. However, as our multiplier grows, the number of additions we have to perform grows with it. In the worst case, if our multiplier is `u32::MAX`, we'd be performing over 4 billion additions -- this is pretty slow.
Ideally our algorithm wouldn't scale with the value of our multiplier, but rather with its size. That is, our algorithm would get slower relative to the number of digits used to represent the number, not the actual value it contains.
Long Multiplication
Long multiplication solves the problem of scaling relative to the number of digits.
It's also likely the way you were taught in school to multiply two numbers. If we look at a simple base-10 example:
```
  25
 * 7
----
```

We multiply each digit, scaled by its place value, by the multiplier, and then sum the results:

```
20 * 7 + 5 * 7 = 140 + 35 = 175
```
If our multiplier has two digits:
```
   25
 * 34
---------
   4 * 5
+  4 * 20
+ 30 * 5
+ 30 * 20
---------
   20
+  80
+ 150
+ 600
---------
  850
```
We end up performing four multiplications, one per pair of digits, and then summing the four partial products.
Long multiplication has a time complexity of O($$n^2$$), so it isn't really suitable for extremely large numbers. You may also be familiar with other methods for doing multiplication by hand; in general, these are all O($$n^2$$) as well.
Although this algorithm is quadratic, it's quite simple to implement and has a very low constant factor. This makes it suitable -- and even optimal -- for small (say, <=64-bit) integers. It also forms the basis of most hardware implementations of multiplication, as we'll discuss later in this post.
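As a sketch of how this looks in code, here's digit-by-digit long multiplication in Rust. The representation and names are my own: numbers are little-endian vectors of base-10 digits, so 25 is `[5, 2]`.

```rust
// Long multiplication over little-endian base-10 digit vectors.
// O(n^2): every digit of x is multiplied by every digit of y.
fn long_multiply(x: &[u8], y: &[u8]) -> Vec<u8> {
    // The product of an m-digit and an n-digit number has at most m + n digits.
    let mut result = vec![0u8; x.len() + y.len()];
    for (i, &a) in x.iter().enumerate() {
        let mut carry = 0u32;
        for (j, &b) in y.iter().enumerate() {
            let v = result[i + j] as u32 + a as u32 * b as u32 + carry;
            result[i + j] = (v % 10) as u8;
            carry = v / 10;
        }
        // Propagate any leftover carry into higher positions.
        let mut k = i + y.len();
        while carry > 0 {
            let v = result[k] as u32 + carry;
            result[k] = (v % 10) as u8;
            carry = v / 10;
            k += 1;
        }
    }
    // Trim leading zeros (stored at the end in little-endian order).
    while result.len() > 1 && result.last() == Some(&0) {
        result.pop();
    }
    result
}
```

For example, `long_multiply(&[5, 2], &[4, 3])` computes 25 * 34 and returns `[0, 5, 8]`, the little-endian digits of 850.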
Karatsuba (1962)
Anatoly Karatsuba's algorithm for multiplication was the first to break the O($$n^2$$) complexity barrier. Previously it had been conjectured that no algorithm could do better than O($$n^2$$).
Karatsuba is a recursive divide and conquer algorithm.
If we split our multiplication up into multiple separate multiplications, we can re-use some of the intermediate results.
The algorithm starts by splitting our multiplicand and multiplier in half, getting four numbers total. For example, we would split the number `23` into `20 + 3`, or the number `123456` into `123000 + 456`.
More formally: given a number of `n` digits, we split it into two numbers `a` and `b` such that our original number is equal to $$a \times 10^{\frac{n}{2}} + b$$.
Let's look at a multiplication of two numbers, `x` and `y`. Using the splitting idea we just mentioned,
$$ x = a \times 10^{\frac{n}{2}} + b $$
$$ y = c \times 10^{\frac{n}{2}} + d $$
This is the same equation used above, but for our multiplier `y` we use the variables `c` and `d`. If we substitute these values into our $$x \times y$$ equation, we get
$$ x \times y = (a \times 10^{\frac{n}{2}} + b)(c \times 10^{\frac{n}{2}} + d) $$
If we distribute everything out:
$$ x \times y = (a \times 10^{\frac{n}{2}}) \times (c \times 10^{\frac{n}{2}}) + (ad \times 10^{\frac{n}{2}}) + (bc \times 10^{\frac{n}{2}}) + bd $$
Simplifying:
$$ x \times y = ac \times 10^{n} + (ad + bc) \times 10^{\frac{n}{2}} + bd $$
So far nothing is saved: this still takes four multiplications ($$ac$$, $$ad$$, $$bc$$, and $$bd$$). Karatsuba's insight is that the middle term can be derived from the outer two, since $$ad + bc = (a + b)(c + d) - ac - bd$$. That means each level of recursion needs only three multiplications -- $$ac$$, $$bd$$, and $$(a + b)(c + d)$$ -- instead of four, which yields a time complexity of O($$n^{\log_2 3}$$), roughly O($$n^{1.585}$$).
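Here's a minimal recursive sketch in Rust, using the identity $$ad + bc = (a + b)(c + d) - ac - bd$$ to get away with three recursive multiplications. The fixed-width `u128` representation and the `num_digits` helper are my own simplifications; a real bignum Karatsuba recurses over digit arrays.

```rust
// Count base-10 digits; only called with n >= 10 below.
fn num_digits(mut n: u128) -> u32 {
    let mut d = 0;
    while n > 0 {
        d += 1;
        n /= 10;
    }
    d
}

fn karatsuba(x: u128, y: u128) -> u128 {
    // Base case: single-digit operands are multiplied directly.
    if x < 10 || y < 10 {
        return x * y;
    }
    let half = num_digits(x.max(y)) / 2;
    let pow = 10u128.pow(half);
    let (a, b) = (x / pow, x % pow); // x = a * 10^half + b
    let (c, d) = (y / pow, y % pow); // y = c * 10^half + d
    let ac = karatsuba(a, c);
    let bd = karatsuba(b, d);
    // The key trick: ad + bc = (a + b)(c + d) - ac - bd,
    // so only three recursive multiplications are needed.
    let mid = karatsuba(a + b, c + d) - ac - bd;
    ac * pow * pow + mid * pow + bd
}
```

For example, `karatsuba(25, 34)` splits into `a = 2, b = 5, c = 3, d = 4` and returns 850.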
Toom–Cook
The Toom–Cook algorithm is a generalization of Karatsuba multiplication. Where Karatsuba multiplication splits a number into 2 constituent parts, Toom–Cook splits the number into an arbitrary number of parts, though most commonly you'll see 3 parts (Toom 3) or 4 parts (Toom 4). Karatsuba, in this way, can be thought of as Toom 2, and regular long multiplication as Toom 1.
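To sketch how Toom 3 works (the base $$B$$ and symbols here are my own notation): each operand of $$n$$ digits is split into three parts using $$B = 10^{\lceil n/3 \rceil}$$,

$$ x = x_2 B^2 + x_1 B + x_0 $$
$$ y = y_2 B^2 + y_1 B + y_0 $$

Both sides are treated as degree-2 polynomials in $$B$$. Their product is a degree-4 polynomial, which is fully determined by its values at 5 points, so Toom 3 evaluates both operands at 5 points, multiplies pointwise (5 smaller multiplications instead of the naive 9), and interpolates to recover the product's coefficients. This gives a time complexity of O($$n^{\log_3 5}$$), roughly O($$n^{1.465}$$).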
Schönhage–Strassen (1971)
Schönhage–Strassen is also sometimes called "FFT multiplication," as the core optimization of this algorithm is that it uses the fast Fourier transform to achieve O($$n \log n \log \log n$$) runtime.
Schönhage–Strassen works by splitting the multiplier and multiplicand, similar to Karatsuba, and constructing two polynomials with the decomposed integers as coefficients. It can then make use of existing algorithms to multiply two polynomials in O($$n \log n$$) time.
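To make that structure concrete, here's a sketch in Rust (all names are my own). An important hedge: real Schönhage–Strassen performs the transform over a modular integer ring so every step is exact, while the `f64`-based FFT below only illustrates the pipeline -- transform both digit sequences, multiply pointwise, inverse-transform, then propagate carries.

```rust
// Recursive radix-2 FFT over complex numbers stored as (re, im) pairs.
// `invert` selects the inverse transform (conjugate roots of unity).
fn fft(a: Vec<(f64, f64)>, invert: bool) -> Vec<(f64, f64)> {
    let n = a.len();
    if n == 1 {
        return a;
    }
    let even: Vec<_> = a.iter().step_by(2).copied().collect();
    let odd: Vec<_> = a.iter().skip(1).step_by(2).copied().collect();
    let fe = fft(even, invert);
    let fo = fft(odd, invert);
    let mut out = vec![(0.0, 0.0); n];
    let sign = if invert { 1.0 } else { -1.0 };
    for k in 0..n / 2 {
        let ang = sign * 2.0 * std::f64::consts::PI * k as f64 / n as f64;
        let (wr, wi) = (ang.cos(), ang.sin());
        // t = w^k * fo[k] (complex multiply)
        let t = (wr * fo[k].0 - wi * fo[k].1, wr * fo[k].1 + wi * fo[k].0);
        out[k] = (fe[k].0 + t.0, fe[k].1 + t.1);
        out[k + n / 2] = (fe[k].0 - t.0, fe[k].1 - t.1);
    }
    out
}

// Multiply two little-endian base-10 digit vectors via FFT convolution.
fn fft_multiply(x: &[u8], y: &[u8]) -> Vec<u8> {
    // Pad to a power of two large enough to hold the full product.
    let mut n = 1;
    while n < x.len() + y.len() {
        n <<= 1;
    }
    let to_complex = |digits: &[u8]| {
        let mut v: Vec<(f64, f64)> = digits.iter().map(|&d| (d as f64, 0.0)).collect();
        v.resize(n, (0.0, 0.0));
        v
    };
    let fx = fft(to_complex(x), false);
    let fy = fft(to_complex(y), false);
    // Pointwise product in the frequency domain = convolution of digits.
    let pointwise: Vec<(f64, f64)> = fx
        .iter()
        .zip(fy.iter())
        .map(|(a, b)| (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0))
        .collect();
    let inv = fft(pointwise, true);
    // Scale by 1/n, round to integers, and propagate base-10 carries.
    let mut carry = 0u64;
    let mut out = Vec::new();
    for (re, _) in inv {
        let v = (re / n as f64).round() as u64 + carry;
        out.push((v % 10) as u8);
        carry = v / 10;
    }
    while out.len() > 1 && out.last() == Some(&0) {
        out.pop();
    }
    out
}
```

The floating-point rounding here limits how large the inputs can get before precision errors creep in, which is exactly why the real algorithm uses an exact number-theoretic transform instead.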
To understand this algorithm more concretely, we first have to understand Fourier transforms, the fast Fourier transform, and some abstract algebra.
Fürer (2007)
- paper: https://ivv5hpp.uni-muenster.de/u/cl/WS2007-8/mult.pdf
- https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=7368&context=etd
De–Kurur–Saha–Saptharishi (2008)
- https://arxiv.org/abs/0801.1416
Harvey–van der Hoeven–Lecerf (2014)
- https://arxiv.org/abs/1407.3361
Covanov–Thomé (2015)
- https://arxiv.org/abs/1502.02800
Harvey–van der Hoeven (2019)
- https://annals.math.princeton.edu/2021/193-2/p04
- https://hal.science/hal-02070778v2/document
Reals
Floats
Fixed point
Hardware Algorithms
One of the simpler and faster ways to multiply two numbers on modern computers is a single instruction. In x86-64 assembly:

```asm
mov  rax, 25
imul rax, rax, 34  ; rax = 850
```

(The one-operand `mul` instruction implicitly multiplies by `rax`; the three-operand `imul` form shown here takes an explicit destination, source, and immediate.)
Shift + Add
Modified Baugh-Wooley
Booth (1950)
Wallace Trees (1964)
Dadda Multiplier (1965)
Modular Multiplication
Kochanski
Montgomery
Floats
- https://github.com/microsoft/referencesource/blob/master/System.Numerics/System/Numerics/BigIntegerBuilder.cs
- https://github.com/rust-num/num-bigint/blob/f09eee83f174619ac9c2489e3feec62544984bc5/src/biguint/multiplication.rs#L95