Computing Fibonacci Numbers

Comparing Our Solutions

Python is not a very fast programming language, and it makes it hard to write efficient code, so I've written all of the algorithms here in Rust to compare them against each other. This repository contains all of my implementations. They're by no means the most efficient possible implementations of each one, but they're good enough to provide a basis for comparison.

Opposite we have all of the algorithms and how well they did. Let's go through them one by one:

The naïve approach (no tail-call optimization) is slow as molasses.
The memoized recursive approach and dynamic programming (DP) are essentially the same algorithm, but dynamic programming saves having to constantly check whether we've already computed a given Fibonacci number and uses less space. That makes it more efficient, but as you can see they both still scale poorly.
Matrix exponentiation was our first algorithm that could compute $F(n)$ in less than $n$ operations—it's where we really got cooking. Note how this and all of the future algorithms have a characteristic hooked shape. In terms of the number of multiplications, all of these algorithms are using on the order of $\log n$ operations, which looks like the tapering-off curve before around $10^3$ . The problem is that, as we get larger and larger, the time to multiply numbers grows too. Once we stop being able to use the very efficient hardware instructions to multiply numbers on a CPU, and once the memory usage of reading in these large numbers becomes significant, each multiplication becomes more expensive than the last and the runtimes start to increase exponentially.
Binet's formula (the version using arbitrary-precision floating point arithmetic) is the slowest of these faster algorithms due to the extra costs associated with computing $\sqrt{5}$ and the need to carry all of the digits through the entire computation. Another important slowdown that this shares with matrix exponentiation is that, once you've computed the powers, you need to go through and multiply together the correct powers of 2 to get the answer: it requires 1 square for every bit of $n$ and 1 additional multiplication for every bit that's a 1.
The version of Binet that uses integer arithmetic requires three multiplications for every bit of $n$ , using a trick to save the fourth multiplication that's similar to the rewriting we did of the matrix powers. This makes it the fastest of the previous approaches, but to get faster we have to cut out the extra work associated with exponentiation by squaring.
The first version to do that is what I've called Efficient Matrix Exponentiation. This is the approach that uses three squares per bit of $n$ that we derived by rewriting $M^{2n}$ , but before the last optimization that only needs two squares to do the same. It's therefore slightly slower than the next two.
GMP Algorithm Port applies that last optimization, so it only uses two squares per bit of $n$ .
Lastly, GMP directly calls GMP's Fibonacci function. It's a little faster than my implementation because it stores a table for the early values (as you can see by how fast it is for the starting numbers compared to the port) and saves a multiplication on the last step, but as you can see the performance is near-identical.

Comparing Our Solutions

Opposite we have all of the algorithms and how well they did. Let's go through them one by one:

The naïve approach (no tail-call optimization) is slow as molasses.
The memoized recursive approach and dynamic programming (DP) are essentially the same algorithm, but dynamic programming saves having to constantly check whether we've already computed a given Fibonacci number and uses less space. That makes it more efficient, but as you can see they both still scale poorly.
Matrix exponentiation was our first algorithm that could compute $F(n)$ in less than $n$ operations—it's where we really got cooking. Note how this and all of the future algorithms have a characteristic hooked shape. In terms of the number of multiplications, all of these algorithms are using on the order of $\log n$ operations, which looks like the tapering-off curve before around $10^3$ . The problem is that, as we get larger and larger, the time to multiply numbers grows too. Once we stop being able to use the very efficient hardware instructions to multiply numbers on a CPU, and once the memory usage of reading in these large numbers becomes significant, each multiplication becomes more expensive than the last and the runtimes start to increase exponentially.
Binet's formula (the version using arbitrary-precision floating point arithmetic) is the slowest of these faster algorithms due to the extra costs associated with computing $\sqrt{5}$ and the need to carry all of the digits through the entire computation. Another important slowdown that this shares with matrix exponentiation is that, once you've computed the powers, you need to go through and multiply together the correct powers of 2 to get the answer: it requires 1 square for every bit of $n$ and 1 additional multiplication for every bit that's a 1.
The version of Binet that uses integer arithmetic requires three multiplications for every bit of $n$ , using a trick to save the fourth multiplication that's similar to the rewriting we did of the matrix powers. This makes it the fastest of the previous approaches, but to get faster we have to cut out the extra work associated with exponentiation by squaring.
The first version to do that is what I've called Efficient Matrix Exponentiation. This is the approach that uses three squares per bit of $n$ that we derived by rewriting $M^{2n}$ , but before the last optimization that only needs two squares to do the same. It's therefore slightly slower than the next two.
GMP Algorithm Port applies that last optimization, so it only uses two squares per bit of $n$ .
Lastly, GMP directly calls GMP's Fibonacci function. It's a little faster than my implementation because it stores a table for the early values (as you can see by how fast it is for the starting numbers compared to the port) and saves a multiplication on the last step, but as you can see the performance is near-identical.