Skip to content
Tech News
← Back to articles

Even faster asin() was staring right at me

read original get ASIN Lookup Tool → more articles
Why This Matters

This article highlights how optimizing the implementation of the asin() approximation using Estrin's Scheme can significantly improve performance in computational tasks. Such enhancements are crucial for the tech industry, especially in graphics, simulations, and real-time applications where speed and efficiency directly impact user experience and system capabilities.

Key Takeaways

I don't normally do follow-ups and never this quick. After posting that last article, it was fun to read the comments on Reddit and Hacker News as they rolled in. I even found other discussions. I couldn't help wonder, "Could I have made it even more performant?". After heading home I decided to take another look.

There was.

Gotta Go Fast

Look at the implementation of the Cg asin() approximation;

double asin_cg(const double x) { // Original Minimax coefficients constexpr double a0 = 1.5707288; constexpr double a1 = -0.2121144; constexpr double a2 = 0.0742610; constexpr double a3 = -0.0187293; // Strip sign const double abs_x = abs(x); // Evaluate polynomial using Horner's method double p = a3 * abs_x + a2; p = p * abs_x + a1; p = p * abs_x + a0; // Apply sqrt term and pi/2 offset const auto x_diff = sqrt(1.0 - abs_x); const double result = (Pi / 2.0) - (x_diff * p); // Restore sign natively return copysign(result, x); }

Does something about how p is computed look a little curious? It can be rewritten to be fully const . While it's not a requirement, it's generally something you should strive for. This was missed in the first pass.

const double p = ((a3 * abs_x + a2) * abs_x + a1) * abs_x + a0;

From here we can do a polynomial expansion and factoring. This is where the magic happens. Showing the work step by step:

p = ((a3 * abs_x + a2) * abs_x + a1) * abs_x + a0 p = (a3 * abs_x * abs_x + a2 * abs_x + a1) * abs_x + a0 p = (a3 * abs_x^2 + a2 * abs_x + a1) * abs_x + a0 p = a3 * abs_x^3 + a2 * abs_x^2 + a1 * abs_x + a0 p = (a3 * abs_x^3 + a2 * abs_x^2) + (a1 * abs_x + a0) p = (a3 * abs_x + a2) * abs_x^2 + (a1 * abs_x + a0)

Taking that last term for p , we have this in code:

... continue reading