20111109

Recurrence Relations



We have seen how to compute the Big-O complexity of programs that involve loops, sequences of operations, and if statements.  Now we need to look at recursive functions.

Consider this recursive function:

def rec_fac(n):
    if n == 1:
        return 1
    else:
        return n*rec_fac(n-1)


It clearly makes sense to talk about the number of steps this function executes - there is no randomness or unpredictability involved.   Executing a line like

x = rec_fac(3)

will always take the same number of steps, and

y = rec_fac(7)

will obviously take more steps than

x = rec_fac(6)


so there is a relationship between the size of the input to the function and the number of steps that are executed ... but what is the relationship?

Let's define some notation.  We will use T(n) to represent the number of steps rec_fac(n) executes.  By looking at the definition of rec_fac we can identify two cases:

1) if n = 1, rec_fac executes a constant number of steps - call it c1.  This lets us state

T(1) = c1

2) if n > 1, rec_fac executes a constant number of steps - call it c2 - followed by a recursive call: rec_fac(n-1).  But this recursive call must take T(n-1) steps  (however many that is), so we can state

T(n) = c2 + T(n-1)         for all n > 1


Putting these two things together gives us something called a recurrence relation:

        T(1) = c1
        T(n) = c2 + T(n-1)         for all n > 1

Note the strong similarity between the form of the recurrence relation, the form of the recursive function, and the form of an inductive proof.  They each have a base case, and a recursive/inductive part that uses the result for smaller numbers to obtain the result for larger numbers.  Understanding induction, recursion and recurrence relations - they all use similar thought patterns - they are all part of learning to think like a computer scientist.

All well and good, but we need to establish the Big-O complexity of rec_fac, and that recurrence relation doesn't look anything like the functions we have dealt with before.

We deal with this by transforming the recurrence relation into a closed-form formula: one which does not involve any self-reference.  There are many ways to achieve this.  We will use one of the most popular, which goes by the name of expansion or substitution.  The basic idea is this: we replace the recursive reference to T(n-1) by a different expression that has equal value ... but what?

Consider rewriting the recursive part of the recurrence relation as T(x) = c2 + T(x-1).  This is clearly still valid - we just changed the n to x.   Now let x = n-1, and substitute this into the equation for T(x) - we get
T(n-1) = c2 + T(n-1-1)
ie
T(n-1) = c2 +T(n-2)

So we substitute this into T(n) = c2 + T(n-1) and get

T(n) = c2 + c2 + T(n-2)

Now we expand T(n-2) to c2 + T(n-3), and substitute that into the line just above this one, and we get

T(n) = c2 + c2 + c2 + T(n-3)

The next expansion gives

T(n) = c2 + c2 + c2 + c2 + T(n-4)

Eventually we will get to

T(n) = c2 + ... + c2 + T(1)

ie

T(n) = c2 + ... + c2 + c1

We have successfully eliminated the recursive reference to T( ).  The question is, how many c2's are there in this expression?

It's not hard to see that at each stage of the expansion, the number of c2's plus the number inside the T( ) recursive reference always equals n.  (For example, the first line has 1*c2 + T(n-1).  The next line has 2*c2 + T(n-2).  Then we get 3*c2 + T(n-3) etc.)  So in the line

T(n) = c2 + ... + c2 + T(1)

there must be n-1 c2's, so

T(n) = c2*(n-1) + c1

But that's something we can easily find the Big O complexity of.  Applying what we have already learned, we get

T(n) is in O(n)
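As a sanity check (not part of the derivation - just an illustration), we can instrument rec_fac to count its calls.  Each call does a constant amount of work, so the call count stands in for T(n), and it should grow linearly:

```python
def rec_fac_counted(n, counter):
    # counter[0] tracks the number of calls; each call does a constant
    # amount of work, so the call count is a stand-in for T(n)
    counter[0] += 1
    if n == 1:
        return 1
    return n * rec_fac_counted(n - 1, counter)

for n in (1, 5, 10, 20):
    counter = [0]
    rec_fac_counted(n, counter)
    print(n, counter[0])   # the call count is exactly n - linear growth
```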

We will look at several recurrence relations and for each one we will determine its Big O complexity.  You should learn these.  Here again is the one we have just seen:

Recurrence Relation                      Big O Complexity
T(1) = c1
T(n) = c2 + T(n-1)                       O(n)


Now consider this recursive function (it doesn't do anything except print a lot of numbers, but it is easy to understand)

def rec_fun(n):
    if n == 1:
        return
    else:
        for i in range(n):
            print(i)
        rec_fun(n-1)

The recurrence relation for rec_fun looks like this

T(1) = c1
T(n) = c2 + c3*n + T(n-1)

Make sure you understand how this recurrence relation is derived from the definition of the function.
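One way to convince yourself (an illustration, with the print replaced by a counter): the loop does c3*n work in the call for n, c3*(n-1) in the call for n-1, and so on, so the total loop work is n + (n-1) + ... + 2.

```python
def rec_fun_counted(n, counter):
    if n == 1:
        return
    for i in range(n):
        counter[0] += 1      # stands in for print(i): the c3*n work per call
    rec_fun_counted(n - 1, counter)

n = 10
counter = [0]
rec_fun_counted(n, counter)
# total loop work is n + (n-1) + ... + 2 = (n+1)*n/2 - 1
print(counter[0])
```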

We solve this the same way as before.  Note that T(n-1) = c2 + c3*(n-1) + T(n-2)

T(n) = c2 + c3*n + c2 + c3*(n-1) + T(n-2)

T(n) = c2 + c3*n + c2 + c3*(n-1) + c2 + c3*(n-2) + T(n-3)

regrouping, we get

T(n) = c2+c2+c2  +  c3*(n  +  n-1  +  n-2)  +   T(n-3)

When we expand this all the way, we get

T(n) =  c2 + ... + c2  +   c3*(n  +  n-1  +  n-2  +  ...)   +  T(1)

As before, we need to work out how many c2's are in this sum, and now we also have to work out the value of the expression that is multiplied by c3.

The number of c2's is easy: as with our previous recurrence relation, the number of c2's, added to the number in the T( ) recursive reference, is always n

The value of the expression that is multiplied by c3 is a little harder to calculate, but we can do it.  First we observe that the last element in the sum inside the parentheses is always 1 more than the number in the T( ) recursive reference.  (For example, in the line containing T(n-3), the last element of the sum inside the parentheses is n-2.)  So when the recursive reference is T(1), the expression inside the parentheses is

n + n-1 + n-2 + ..... + 2

This is a lot like n + n-1 + n-2 + ... + 1, and we have a formula for that:  we know n + n-1 + n-2 + ... + 1 =   (n+1)*n/2

But the left side of this formula has one extra term - the final 1 - that our sum doesn't have (we want n + n-1 + n-2 + ..... + 2), so we can subtract 1 from both sides of the formula to get

n + n-1 + n-2 + ... + 2 =   (n+1)*n/2 - 1
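This formula is easy to check empirically (just an illustration):

```python
# verify n + (n-1) + ... + 2 == (n+1)*n/2 - 1 for a range of values of n
for n in range(2, 50):
    assert sum(range(2, n + 1)) == (n + 1) * n // 2 - 1
```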

Now we can replace all the unknowns in our formula for T(n)

T(n) = c2*(n-1)  +   c3*( (n+1)*n/2 - 1 )   +   c1


Simplifying this is easy, applying our standard Big O analysis is even easier, and we end up with

T(n) is in O(n^2)


Now we have another standard pattern to add to our collection of recurrence relations:

Recurrence Relation                      Big O Complexity
T(1) = c1
T(n) = c2 + T(n-1)                       O(n)

T(1) = c1
T(n) = c2 + c3*n + T(n-1)                O(n^2)


Now consider the recursive binary search algorithm - it is well known so I won't repeat it here.

The recurrence relation for binary search is

T(1) = c1
T(n) = c2 + T(n/2)

DON'T JUST TAKE MY WORD FOR IT.  GO THROUGH THE CODE AND MAKE SURE YOU SEE WHY THIS IS TRUE.
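If you don't have the code handy, a typical recursive binary search looks something like this (a sketch so you can trace the recurrence yourself - your course version may differ slightly):

```python
def binary_search(data, target, low, high):
    # data must be sorted; returns an index of target in data, or -1
    if low > high:                  # base case: constant work (c1)
        return -1
    mid = (low + high) // 2         # constant work (part of c2)
    if data[mid] == target:
        return mid
    elif data[mid] < target:
        # recurse on the upper half: T(n/2)
        return binary_search(data, target, mid + 1, high)
    else:
        # recurse on the lower half: T(n/2)
        return binary_search(data, target, low, mid - 1)
```

Each call does a constant amount of work and then makes at most one recursive call on half the range - which is exactly T(n) = c2 + T(n/2).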

Applying our now standard expansion technique, we get

T(n) = c2 + c2 + T(n/4)

T(n) = c2 + c2 + c2 + T(n/8)

and if we expand this fully we get

T(n) = c2 + ... + c2 + T(1)

and as always, the question is "how many c2's are there"?  Also, you may be saying that this expansion can't be correct because most of the time n won't divide evenly by 2, 4, 8 etc.  Well, you are right, but it turns out not to matter.  If you do the recurrence relation with all of the precise details, accounting for odd and even values of n ... you get exactly the same result as we will get by assuming that all the divisions work exactly.  So I'm taking the easy route and ignoring those details.

So, back to the question of counting the c2's.  Here we need a bit of clever insight.  When the denominator inside the T( ) recursive reference is 2, there is 1 c2.  When the denominator is 4, there are 2 c2's.  When the denominator is 8, there are 3 c2's ..... not much of a pattern unless you notice that 2 = 2^1, 4 = 2^2, and 8 = 2^3  .... the number of c2's is equal to the exponent when the denominator is written as a power of 2.

So when we write

T(n) = c2 + ... + c2 + T(1)

we need to ask what is the denominator in the T( ) recursive reference.  

In other words, what is x, when n/(2^x) = 1 ?

This is the same as solving for x in n = 2^x, and the solution is simply   x = log n         (the log is to the base 2, but we don't have to worry about that detail either)
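To see this concretely (assuming n is a power of 2, per the simplification above), count the halvings directly:

```python
n = 1024
halvings = 0
while n > 1:
    n //= 2          # each halving corresponds to one c2
    halvings += 1
print(halvings)      # 10, which is log (base 2) of 1024
```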

Thus the number of c2's is log n, and we get

T(n) = c2* log n + c1

This doesn't look like the type of function we are used to dealing with in our Big-O complexity analysis, but we can handle it exactly the same way.  Log n is just a function of n, and it comfortably fills the role of g(n) in the definition of Big-O complexity.

So we conclude that T(n) is in O(log n) .... and now we have another pattern to add to our table.

Recurrence Relation                      Big O Complexity
T(1) = c1
T(n) = c2 + T(n-1)                       O(n)

T(1) = c1
T(n) = c2 + c3*n + T(n-1)                O(n^2)

T(1) = c1
T(n) = c2 + T(n/2)                       O(log n)


One more - recursive merge sort.  You may want to look up an implementation of this, from which you can confirm that the recurrence relation for this recursive function is

T(1) = c1
T(n) = c2 + c3*n + 2*T(n/2)

We expand this in the usual way

T(n) = c2 + c3*n + 2*(c2 + c3*n/2  +  2*T(n/4))

T(n) = c2*(1+2)  +  c3*(n+n)   +  4*T(n/4)

Expand again ...

T(n) = c2*(1+2)  +  c3*(n+n)   + 4*(c2 + c3*n/4  + 2*T(n/8))

T(n) = c2*(1+2+4) +  c3*(n+n+n)   +  8*T(n/8)

The final expansion will give us

T(n) =  c2*(1+2+4+...)   +   c3*(n+.....+n)   + x*T(1)

Now we have to solve
            (1+2+4+...)
    and   (n+....+n)
    and    x

Inspection shows us that at every stage of the expansion, the term inside the T( ) recursive reference is of the form n/(2^i), and the coefficient of T( ) is also 2^i.  As before, we see that when the term inside T( ) is 1, we must have n/(2^i) = 1, which gives us n = 2^i, which gives us i = log n

We also see that the number of n's that are multiplied by c3 is the same i, so in the final expansion the number of n's is log n.  And the coefficient x of T(1) is 2^i = 2^(log n), which is just n.

This gives

T(n) = c2*(1+2+4+...) + c3*n*log n   +   n*T(1)

That just leaves the (1+2+4+...).  These are powers of 2, and the last one is always 1/2 the coefficient of the T( ) recursive reference.  So when we get to the end of the expansion, we have
(1 + 2 + 4 + .... + n/2)

This is another sum for which we have a simple formula:    1 + 2 + 4 + ... + n/2 = n -1
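Again, this is easy to check for powers of 2 (illustration only):

```python
n = 64
powers = []
p = 1
while p <= n // 2:   # collect the powers of 2 from 1 up to n/2
    powers.append(p)
    p *= 2
print(sum(powers))   # 1 + 2 + 4 + ... + 32, which equals n - 1
```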

Putting this in the equation, and replacing T(1) with c1, we finally get

T(n) = c2*(n-1) + c3*n*log n  + c1*n

In this function, the n*log n term grows the fastest (its growth rate lies between n and n^2), so our standard Big-O analysis gives

T(n) is in O(n*log n)

And now we can add the final row to our table of patterns

Recurrence Relation                      Big O Complexity
T(1) = c1
T(n) = c2 + T(n-1)                       O(n)

T(1) = c1
T(n) = c2 + c3*n + T(n-1)                O(n^2)

T(1) = c1
T(n) = c2 + T(n/2)                       O(log n)

T(1) = c1
T(n) = c2 + c3*n + 2*T(n/2)              O(n*log n)

There are infinitely many possible recurrence relations, but these four will cover the vast majority of recursive functions that you will encounter in the real world.  You should be able to analyse a recursive function and derive its recurrence relation.  If the recurrence relation is one of these four you should be able to state the Big O complexity.