Going in Circles By Speeding Up Squares
Unfortunately, this method isn't performing any better than the other two. The problem is that expanding out requires four multiplications: each of those multiplicands are a little smaller than the numbers themselves, but not by much, and so we're still basically trading a single floating-point multiplication for several integer multiplications. The overhead of computing or small inefficiencies are pretty small compared to the raw cost of doing big multiplications.
We've basically hit the limit of what powerful, abstract techniques can achieve. Closed forms can be found for many recurrences, and matrix exponentiation pops up all over the place. Techniques to compute these quickly can miss specific structure inherent to our problem. To get the fastest possible function, we need to hand-tune a specific implementation to really shine for our specific task.
Going in Circles By Speeding Up Squares
Unfortunately, this method isn't performing any better than the other two. The problem is that expanding out requires four multiplications: each of those multiplicands are a little smaller than the numbers themselves, but not by much, and so we're still basically trading a single floating-point multiplication for several integer multiplications. The overhead of computing or small inefficiencies are pretty small compared to the raw cost of doing big multiplications.
We've basically hit the limit of what powerful, abstract techniques can achieve. Closed forms can be found for many recurrences, and matrix exponentiation pops up all over the place. Techniques to compute these quickly can miss specific structure inherent to our problem. To get the fastest possible function, we need to hand-tune a specific implementation to really shine for our specific task.