One story every computing enthusiast should hear is the lesson of how loops and tail-recursion are equivalent. We like recursive functions because they’re amenable to induction, and we can derive them in a way that is in direct correspondence with the definition of the datatype over which they recur. We like loops because they’re fast and make intuitive sense as long as variables don’t change in too tricky a way.
In general, recursive functions are slower than loops because they push stack frames: the performance of most programs today is dominated by memory reads and writes. The data we touch the most lives in the cache, and we do not want to evict a ton of stuff from the cache if we can avoid it. In a direct-style implementation of a recursive function, the recursive call has to push a stack frame to remember what to do once the function returns:
;; Racket
(define (sum l)
  (if (empty? l)
      0
      (+ (first l) (sum (rest l)))))

// C
int sum(int *l, int length) {
  if (length == 0)
    return 0;
  else
    return l[0] + sum(l + 1, length - 1);
}
When we get to (+ (first l) (sum (rest l))), we first evaluate (first l), which returns the first element. While we're making that call, we have to remember to come back and do (sum (rest l)): to be fully precise, we evaluate (rest l), then take its result x and call (sum x), remembering to come back, take that result, and finally add it to the result of (first l). The reason we have to do this is that we need to keep those partial results around (in this case, the result of (first l)): they have to be stored somewhere, and since each recursive call saves its own copy, we need O(n) stack space for a list of size n.
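One way to see the saved partial result is to name the temporaries explicitly in the C version. This is a sketch, not the compiler's actual output; the point is that `head` must survive in the stack frame across the recursive call:

```c
/* Direct-style sum with the intermediate values named explicitly.
   head must live in this frame until the recursive call returns,
   so a list of length n needs n live frames. */
int sum(const int *l, int length) {
    if (length == 0)
        return 0;
    int head = l[0];                    /* (first l), saved across the call */
    int rest = sum(l + 1, length - 1);  /* (sum (rest l)), must return here */
    return head + rest;                 /* the deferred addition */
}
```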
Of course, if we use iteration this all goes away:
;; Racket
(define (sum l)
  (define x 0)
  (for ([elt l])
    (set! x (+ x elt)))
  x)

// C
int sum(const int *l, int length) {
  int x = 0;
  for (int i = 0; i < length; i++) {
    x += l[i];
  }
  return x;
}
We all have an intuitive sense of what the loop is doing: once we hit the end of the loop body, we do not make a recursive call (we never issue a call instruction in assembly); we simply jump back to the beginning of the loop. The key is that x is being used as an accumulator, growing a partial result in a bottom-up fashion as the computation proceeds and holding the final value at the end. Instead of keeping partial results on the stack, the loop takes constant space (and still linear time).
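The accumulator idea can also be written recursively: thread the accumulator through the call instead of mutating a variable. The helper name `sum_acc` is our own; the essential property is that nothing remains to be done after the recursive call (a sketch in C, where the optimization discussed below is up to the compiler):

```c
/* Accumulator-passing version: each "iteration" is a recursive call,
   but the partial work is folded into acc BEFORE the call, so no
   partial result needs to survive on the stack. */
static int sum_acc(const int *l, int length, int acc) {
    if (length == 0)
        return acc;                                /* acc holds the answer */
    return sum_acc(l + 1, length - 1, acc + l[0]); /* tail call */
}

int sum(const int *l, int length) {
    return sum_acc(l, length, 0);
}
```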
In a tail-recursive implementation, the rule is that every recursive call must be a tail call. Intuitively, a tail call is a call whose result is "immediately returned." More formally, a subexpression of an expression is in tail position if its value becomes the value of the whole expression, with no further work in between. For example, in (if guard e-t e-f), both e-t and e-f are in tail position, but the guard is not: after we decide which branch to take, we're committed:
(define (foo ...)
  ...
  (if guard
      (f x y ...)
      (g z ...)))
Once we finish executing guard, it would be useless (but correct) to (a) push a stack frame, (b) wait on the result of the subordinate call, and (c) merely return that same result: all we'd be doing is copying the return value from the callee and propagating it back as the return value of the caller. Being a tail call is a syntactic property of a call site: we (and the compiler) can look at a piece of code and cheaply decide whether a call is a tail call or not.
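What a tail-call-optimizing compiler does at such a call site can be imitated by hand: instead of calling, overwrite the arguments and jump back to the top. This is a sketch of the transformation (the name `sum_tco` is ours), not any particular compiler's output:

```c
/* The tail-recursive sum with its tail call eliminated by hand:
   the "call" becomes parameter reassignment plus a jump to the top
   of the loop, so the stack never grows. */
int sum_tco(const int *l, int length) {
    int acc = 0;
    for (;;) {
        if (length == 0)
            return acc;  /* base case: nothing left to unwind */
        acc += l[0];     /* the new accumulator argument */
        l += 1;          /* the new list argument */
        length -= 1;
        /* "recursive call": just continue the loop */
    }
}
```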