Fibonacci Sequence in PHP

Posted on by Chris Warburton

Exponential Definitions

The Fibonacci sequence is a well-known series of numbers which follow a pattern: it starts <? 1, 1 and each subsequent number is the sum of the two preceding it. We can calculate the <? $nth element of the sequence like this in PHP (using handy functions from PHP Prelude):

<?
defun('mkFib1', function($n) {
                  return ($n < 2)? 1  // Base case
                                 : mkFib1($n - 1) + mkFib1($n - 2);
                });

This is an incredibly naive approach, since each call to <? mkFib1 can involve 2 more calls to <? mkFib1. This recursion is well-founded, since it will always head towards the base case where <? $n < 2, but there are two problems:

mkFib1 is exponentially slow

What's more, a function is a rather clumsy way of representing a sequence, since the structure isn't immediately available and must be inferred by our users via some sort of counter. This violates the Once and Only Once rule, and is liable to cause off-by-one errors and such.

So what's the alternative? The trick is to realise that we don't need to provide every element, since our users will only use a finite number. The difficulty is that we don't know how many they'll need!

One simple approach that may spring to mind is asking the user up front how many elements they need:

<?
defun('mkFibs2', compose(map('mkFib1'), 'upto'));

However, this doesn't do what we want. The requirement to know how many elements are needed is too strong; many useful functions like filter and fold (AKA "reduce") require an unknown number of elements in general; for example, filters don't know how many elements they'll need to discard before finding a match. This forces us to generate longer and longer sequences, slowing us down by a logarithmic factor compared to <? mkFib1:

<?
// Note: Since there are infinitely many fibs, we can't fold all of them.
//       Instead, $f returns a pair [$stop, $result], where $stop tells us
//       when to cut-off the fold and $result is treated in the usual way.

// Fold a sequence generated by mkFib1
defun('fold1', function($f) {
                 // Fold over the Naturals $n = 0, 1, 2, ...
                 return loop(function($x, $n) use ($f) {
                               return $f($x, mkFib1($n));
                             });
               });

// Prevents us applying $f to too many fibs
defun('shortcircuit', function($f, $x, $fib) {
                        return $x[0]? $x  // If $x says stop, return it as-is
                                    : $f($x[1], $fib);  // Else call $f
                      });

// Fold a sequence generated by mkFib2
defun('fold2', function($f) {
                 // Fold over the Naturals $n = 0, 1, 2, ...
                 return loop(function($x, $n) use ($f) {
                               // Fold over 2^$n fibs
                               list($s, $y) = array_reduce(mkFibs2(pow(2, $n)),
                                                           shortcircuit($f),
                                                           [false, $x]);
                               return [$s, $s? $y    // If $s(top), return $y
                                             : $x];  // Else restart with $x
                             });
               });
Fibs generated (ignoring recursion in mkFib1)

In this blog post I'll show how we can improve all these things, ending up with a structured representation that, for any unknown N, can fold N elements in O(1) memory, O(N) time and O(1) stack frames.

Inductive Definitions

The problem with <? mkFib2 is that the size of a PHP array must be specified in advance, but we don't know how many elements we'll need. One simple alternative to the array is a list. Rather than storing all elements in a single place, a list stores them separately, then links them all together. There are many different kinds of list, but the simplest is called tail recursive (or "singly-linked") and can be implemented very easily in PHP by nesting arrays:

<?
// $x is our current element, $n is when to stop
defun('mkFibs3_', function($x, $n) {
                    return ($x < $n)? [mkFib1($x), mkFibs3_($x+1, $n)]
                                    : [];
                  });
defun('mkFibs3',  mkFibs3_(0));  // Always start at 0

The results look like this:


array(2) {
  [0]=>
  int(1)
  [1]=>
  array(2) {
    [0]=>
    int(1)
    [1]=>
    array(2) {
      [0]=>
      int(2)
      [1]=>
      array(2) {
        [0]=>
        int(3)
        [1]=>
        array(2) {
          [0]=>
          int(5)
          [1]=>
          array(0) {
          }
        }
      }
    }
  }
}

The results of <? mkFibs3 still have a finite size, but we're no longer at the mercy of PHP's implementation details. Each array has a known size (<? 0 or <? 2), no matter how many elements we ask for. Like <? mkFib1, <? mkFibs3 isn't tail recursive, so it may overflow the stack if we ask for too many elements. Folding lists is a little awkward in PHP, since there's no built-in equivalent to <? array_reduce:

<?
defun('fold3', function($f, $x) {
                 return trampoline(y(
                   function($g, $n, $fibs, $acc, $_) use ($f, $x) {
                     // If we've run out of $fibs, start again with more
                     if ($fibs === [])
                       return [false, $g($n+1, mkFibs3(pow(2, $n+1)), $x)];

                     // Apply $f to the next $fib
                     list($stop, $acc) = $f($acc, $fibs[0]);
                     return [$stop, $stop? $acc
                                         : $g($n, $fibs[1], $acc)];
                   }, 0, [], null));
               });

We can now do away with the exponentially-inefficient <? mkFib1. If we log the arguments we're sending to <? mkFib1, we can see that <? mkFibs3 is asking for the same values over and over:

mkFibs3(...) mkFib1(...)
0
1 0
2 0, 1
3 0, 1, 2, 1, 0
4 0, 1, 2, 1, 0, 3, 2, 1, 0, 1
5 0, 1, 2, 1, 0, 3, 2, 1, 0, 1, 4, 3, 2, 1, 0, 1, 2, 1, 0
6 0, 1, 2, 1, 0, 3, 2, 1, 0, 1, 4, 3, 2, 1, 0, 1, 2, 1, 0, 5, 4, 3, 2, 1, 0, 1, 2, 1, 0, 3, 2, 1, 0, 1

We can actually remove all of these calls, since <? mkFibs3 has all of the data it needs to construct each Fibonacci number itself. If we pass this data along the recursive calls, our runtime becomes linear:

<?
defun('mkFibs4_', function($fib_2, $fib_1, $x, $n) {
                    if ($x >= $n) return [];  // Stopping condition
                    $fib = ($x < 2)? 1 : ($fib_2 + $fib_1);
                    return [$fib, mkFibs4_($fib_1, $fib, $x+1, $n)];
                  });
defun('mkFibs4',  mkFibs4_(null, null, 0));
mkFibs3(N) vs mkFibs4(N)

Coinductive Definitions

Now we can deal with the finiteness problem. Every time PHP sees one of our lists, it does the following:

Our problem is that we're forced to construct the whole tail before we know how long it needs to be. The solution is to delay the tail, which we can do using a thunk: a function which returns a constant value. PHP will construct functions without running them, which gives us time to figure out how many elements we need.

<?
defun('mkFibs5_', function($fib_2, $fib_1, $x, $n) {
                    if ($x >= $n) return [];  // Stopping condition
                    $fib = ($x < 2)? 1 : ($fib_2 + $fib_1);
                    return [$fib, function($_) use ($fib_1, $fib, $x, $n) {
                                    return mkFibs5_($fib_1, $fib, $x+1, $n);
                                  }];
                  });
defun('mkFibs5',  mkFibs5_(null, null, 0));

Delaying computation this way is known as lazy evaluation, and this method of generating a bit of data and delaying the rest is called co-induction. Hence this kind of structure is known as a co-data structure.

The nice thing about codata is that we can define never-ending chains of values, wrapping each link in a thunk so that it only gets evaluated when needed (known as forcing the value). This is exactly what we need for our Fibonacci sequence, and all we need to do is throw away the stopping condition!

Notice that I'm giving these thunks a parameter <? $_ which is ignored. As far as PHP's concerned, I could have defined them as nullary functions, ie. taking no arguments, but that makes them harder to reason about and prevents some later simplifications:

<?
defun('fibsFrom6', function($fib_2, $fib_1) {
                     $fib = $fib_2 + $fib_1;
                     return [$fib, function($_) use ($fib_1, $fib) {
                                     return fibsFrom6($fib_1, $fib);
                                   }];
                   });
defun('fibs6',     function($_) {
                     return [1, function($_) {
                                  return [1, function($_) {
                                               return fibsFrom6(1, 1);
                                             }];
                                }];
                   });

This is the first of our definitions that's both infinite, like a function, and structured, like an array. This kind of codata structure is called a stream. Here's how to fold a stream, using a trampoline to optimise tail-calls:

<?
defun('fold6',  function($f, $acc, $stream) {
                  return trampoline(y(function($y, $acc, $s, $_) use ($f) {
                                        list($h,    $t)    = $s(null);
                                        list($stop, $acc)  = $f($acc, $h);
                                        return [$stop, $stop? $acc
                                                            : $y($acc, $t)];
                                      }, $acc, $stream));
                });

Refactoring

Now that we've created our infinite Fibonacci sequence, we can refactor the code to be a little less naff. I debated whether to define the <? fibsFrom function inline, but I think it's nice to keep two separate functions since they correspond exactly to the two rules defining the Fibonacci sequence: <? fibsFrom implements "sum the previous two", <? fibs implements "start with <? 1, <? 1".

The first refactoring we can do is to notice that we have functions returning functions, which is a manual form of currying. We can remove this separation, since our functions are curried automatically by <? defun (we couldn't do this if our thunks were nullary):

<?
defun('fibsFrom7', function($l, $m, $_) {
                     $n = $l + $m;
                     return [$n, fibsFrom7($m, $n)];
                   });
defun('fibs7',     function($_) {
                     return [1, function($_) {
                                  return [1, fibsFrom7(1, 1)];
                                }];
                   });

Next we can remove the redundant 1s in fibs7. This redundancy is due to fibsFrom not including its arguments in the stream it returns, but of course if we naively change fibsFrom to include its arguments, we'd just be shifting around the redundancy, not eliminating it.

The solution is to extrapolate the sequence backwards a couple of steps. In other words, we need to switch the initial <? 1, 1 to some other values <? $x and <? $y such that we get a sequence <? $x, $y, 1, 1, 2, 3, 5, 8, .... We can derive these values straight from the definition of the Fibonacci sequence:

<?
$y + 1 == 1      // Since $y, 1, 1 is in the sequence
$y     == 1 - 1
$y     == 0

$x + $y == 1  // Since $x, $y, 1 is in the sequence
$x + 0  == 1  // From the value of $y above
$x      == 1

Now we can pass <? $x and <? $y as arguments to <? fibsFrom and get back a stream of all fibs /after/ <? $x and <? $y, which is what we want. This simplifies our definition considerably:

<?
defun('fibs8', fibsFrom7(1, 0));

Finally, we can collapse <? fibsFrom into a one-liner, since PHP evaluates array elements in-order:

<?
defun('fibsFrom9', function($l, $m, $_) {
                     return [$n = $l + $m, fibsFrom9($m, $n)];
                   });
defun('fibs9',     fibsFrom9(1, 0));

Results

This is quite elegant, so let's leave it there and see how well our fold performs. Note that we can reuse <? fold6, since our interface hasn't changed:

mkFibs9(N) vs for-loop

Note that the memory usage is constant, and it turns out we don't pay any memory penalty for using streams compared to a hard-coded for-loop. We do pay a time penalty, most likely from all of the function calls involved, but it's only a constant factor so, as the log scale shows, the scaling behaviour isn't affected.

So there we have it, a PHP implementation of the Fibonacci sequence which:

<?
defun('fibsFrom', function($l, $m, $_) {
                    return [$n = $l + $m, fibsFrom($m, $n)];
                  });
defun('fibs',     fibsFrom(1, 0));