Recursion: A Broader View

Written by Karan Sikka

Recursive functions are often deemed impractical, since they can be slower and more memory-intensive than their iterative counterparts. But a closer look at DFS shows that switching to iteration can actually worsen performance if you’re not careful. Before you switch from a recursive function to iteration, it is important to understand the recursion rigorously.


First off, there’s an important distinction between recursive algorithms and recursive functions:

A recursive algorithm is one that references its own description.
A recursive function is a function that calls itself.
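
To make the distinction concrete, here is a small sketch (factorial is my example, not one from the text): the recursive algorithm n! = n * (n-1)! can be realized either as a recursive function or as an iterative one.

```python
# The recursive *algorithm*: n! = n * (n-1)!, with 0! = 1.

# Realized as a recursive *function* -- the function calls itself:
def factorial_recursive(n):
    return 1 if n <= 0 else n * factorial_recursive(n - 1)

# The same recursive algorithm realized iteratively -- no self-call:
def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```

Both implement the same recursive algorithm; only one of them is a recursive function.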

Recursion is everywhere, even where you’d least expect it: the for loop.

The For Loop

ForLoop(n) =
    0. Let i be 1.
    1. If i > n, done.
    2. Execute some code.
    3. Increment i.
    4. Go to step 1.

This is recursive! Step 4 invokes part of ForLoop’s own description from within ForLoop, even though the implementation is just a lightweight goto instruction.

ForLoop, and many other recursive algorithms, can be implemented iteratively. Some developers trivialize the task by saying, “Just use a stack!” But is it really that straightforward?

Textbook iterative DFS is the prime example of a recursive algorithm that can be implemented easily using a stack. But, confusingly, the textbook iterative implementation is a slightly different algorithm, one with worse memory usage than true recursive DFS.

DFS

DFS (depth first search) is a recursive algorithm:

DFS(v) =
    1. Execute some code on v.
    2. Mark v as visited.
    3. For each neighbor of v, DFS(neighbor) unless neighbor is visited.
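
Rendered as a recursive function in Python, the three steps map almost line for line (the adjacency-dict graph representation and the `visit` callback are choices made for this sketch):

```python
def dfs(graph, v, visited, visit):
    visit(v)                         # Step 1: execute some code on v.
    visited.add(v)                   # Step 2: mark v as visited.
    for neighbor in graph[v]:        # Step 3: recurse on each neighbor,
        if neighbor not in visited:  # unless it is already visited.
            dfs(graph, neighbor, visited, visit)
```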

Here’s the code for “textbook iterative DFS”:

while stack:
    node = stack.pop()
    if not visited[node]:
        # Execute some code on node.
        visit(node)
        visited[node] = True

        for neighbor in node.neighbors():
            if not visited[neighbor]:
                stack.append(neighbor)

Notice that we push all unvisited neighbors of a node onto the stack. In a graph where each vertex has many neighbors (e.g. a complete graph), stack memory usage grows quadratically with the number of vertices: first n-1 nodes get pushed onto the stack, then n-2 (since the previous node is now visited), and so on.
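
You can watch this growth directly by tracking the peak stack size of the textbook implementation on a complete graph (the node count and adjacency representation below are arbitrary choices for this sketch):

```python
def textbook_dfs_max_stack(n):
    # Complete graph on n vertices: every vertex neighbors every other.
    neighbors = {v: [u for u in range(n) if u != v] for v in range(n)}
    visited = [False] * n
    stack = [0]
    peak = 1
    while stack:
        node = stack.pop()
        if not visited[node]:
            visited[node] = True
            for nbr in neighbors[node]:
                if not visited[nbr]:
                    stack.append(nbr)
        peak = max(peak, len(stack))
    return peak
```

For n = 40, the peak is in the hundreds, on the order of n²/2, while a proper recursive DFS never holds more than n frames on its call stack.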

We can do better if we avoid storing all neighbors on the stack upfront, and instead store them when needed. The “Just use a stack!” general strategy of turning a recursive algorithm into an iterative algorithm achieves this for DFS, reducing the memory usage from quadratic to linear. We’ll dive into that next.

DFS: Emulate the stack

The core idea is to observe the call stack of the recursive DFS function, and then emulate it using iterative constructs.

The call stack of a DFS at any moment is the search path up to and including the node we’re currently visiting. For example, if we were doing a DFS traversal of Best Buy’s product tree, we may witness a call stack like this at some point in time:

Home -> Electronics -> TV -> LG -> LG500

Say LG500 has no unvisited neighbors. At this point the computer should pop the LG500 stack frame.

Home -> Electronics -> TV -> LG

Then the computer should push the next unvisited neighbor recorded in the LG stack frame. Let’s call it LG600.

Home -> Electronics -> TV -> LG -> LG600

And maybe LG600 is the last product in the LG product line, which is the last TV brand, so its frame is popped, then LG’s:

Home -> Electronics -> TV -> LG
Home -> Electronics -> TV

Now some iterative code takes shape: when we examine the head of the stack, we advance the neighbor index stored in the frame, and if it points to a valid unvisited neighbor, we create a stack frame for that neighbor and push it onto the stack.

On top of this logic sits a while loop: while the stack is not empty, examine the head of the stack and take some action. When do we pop a stack frame? When it has no more unvisited neighbors to push onto the stack.

Here’s the corresponding Python for the above iterative implementation of the recursive DFS algorithm:

# Syntactic Convenience: Stack Frames
NODE = 0
LAST_NBR_INDEX = 1

# Syntactic Convenience: Graph Nodes
LABEL = 0
NBRS = 1

def dfs(n):
    # A frame's LAST_NBR_INDEX is the index of the last neighbor we
    # pushed; -1 means this node hasn't been visited yet.
    stack = [[n, -1]]
    visited = set()

    while stack:
        # DEBUG
        print([stackFrame[NODE][LABEL] for stackFrame in stack])

        headStackFrame = stack[-1]

        node, nbrIndex = headStackFrame
        nbrs = node[NBRS]

        if nbrIndex == -1:
            # First time this node is on top of the stack.
            # Execute arbitrary visit code here.
            visited.add(node)

        # Scan for the next unvisited neighbor, resuming just past
        # the last one we pushed.
        i = nbrIndex + 1
        while i < len(nbrs) and nbrs[i] in visited:
            i += 1

        if i < len(nbrs):
            # Save our execution state so we can resume here once the
            # frame below is popped off.
            headStackFrame[LAST_NBR_INDEX] = i

            # Cause the loop to examine the neighbor next.
            stack.append([nbrs[i], -1])
        else:
            stack.pop()

When run on this test case:

a = ('LG500', ())
b = ('LG600', ())
c = ('LG', (a, b))
d = ('TV', (c,))
e = ('Home', (d,))

dfs(e)

It outputs:

['Home']
['Home', 'TV']
['Home', 'TV', 'LG']
['Home', 'TV', 'LG', 'LG500']
['Home', 'TV', 'LG']
['Home', 'TV', 'LG', 'LG600']
['Home', 'TV', 'LG']
['Home', 'TV']
['Home']

You’ll notice this DFS code is more complex than the textbook DFS implementation. But we avoid storing all of a node’s neighbors on the stack at once. Instead, we add them lazily: we pause the neighbor iteration, perform the recursive step on one neighbor, and then resume the iteration.

For a more detailed analysis of space complexity, see http://wcipeg.com/wiki/Depth-first_search#Space

The iterative DFS above does not improve performance over recursive function DFS, since all we did was emulate the call stack without recursive function calls.

How do we make recursive functions faster/more efficient anyway? There’s one optimization that helps in certain special cases: Tail Call Optimization.

Performance: Tail Call Optimization

If you were to implement ForLoop as a recursive function, like this:

def for_loop(n, fn):
    if n <= 0:
        return
    fn()
    for_loop(n - 1, fn)

Your call stack might look like this for a call to for_loop(4):

4 -> 3 -> 2 -> 1 -> 0

Sadly, it uses linear space.

We can turn this into constant space usage by destroying the caller’s stack frame before creating the callee’s frame.

We can do this because, after the recursive call, we don’t need the caller’s stack frame anymore: the recursive call is the last step of the function.

Some compilers can perform this optimization automatically if they detect that the recursive call is the last step of the function. This is called Tail Call Optimization (TCO), and it is commonly implemented in functional programming languages, and even in some JavaScript engines.

In practice, TCO amounts to reusing the caller’s stack frame for the callee’s function call, rather than destroying and creating stack frames as I described.
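
Python does not perform TCO itself, but we can sketch what a tail-call-optimizing compiler effectively turns the recursive for_loop into: the tail call becomes a jump back to the top of the function, reusing a single frame.

```python
def for_loop_tco(n, fn):
    # The recursive tail call `for_loop(n - 1, fn)` becomes a jump:
    # update the argument in place and loop, reusing one stack frame.
    while n > 0:
        fn()
        n -= 1
```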

We can’t use this technique to improve the performance of DFS, but it’s important to know it exists. Why, then, wouldn’t you use recursion for your DFS? Maybe your language limits the maximum stack size. Maybe you want more control over the stack. Maybe you want to be able to switch to BFS by replacing the stack with a queue. Otherwise, you’re probably safe using recursion.
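
To illustrate that last point, here’s a sketch (the graph representation is my own choice) in which the only difference between DFS and BFS is which end of a deque we pop from:

```python
from collections import deque

def traverse(graph, start, breadth_first=False):
    frontier = deque([start])
    visited = set()
    order = []
    while frontier:
        # Pop from the left for a queue (BFS), from the right for a stack (DFS).
        node = frontier.popleft() if breadth_first else frontier.pop()
        if node not in visited:
            visited.add(node)
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in visited:
                    frontier.append(neighbor)
    return order
```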


Recursion isn’t simple, but switching to iteration isn’t a panacea either. Recursive algorithms are going to be recursive no matter what. It’s the implementation you’re changing, and if you want to see performance gains, you had better understand what you’re actually changing under the hood.