Fibonacci Sequence in PHP
Exponential Definitions
The Fibonacci sequence is a well-known series of numbers which follow
a pattern: it starts 1, 1
and each subsequent number is the sum of the two preceding it. We can
calculate the $n
th element of
the sequence like this in PHP (using handy functions from PHP Prelude):
'mkFib1', function($n) {
defun(return ($n < 2)? 1 // Base case
: mkFib1($n - 1) + mkFib1($n - 2);
; })
This is an incredibly naive approach, since each call to mkFib1
can involve 2 more calls to mkFib1
. This recursion is
well-founded, since it will always head towards the base
case where $n < 2
,
but there are two problems:
- It’s not tail-recursive, so it requires a linear number of stack frames, leading to stack overflows.
- It requires an exponential number of recursive calls to reach a base case, which gives us an exponential running time:
What’s more, a function is a rather clumsy way of representing a sequence, since the structure isn’t immediately available and must be inferred by our users via some sort of counter. This violates the Once and Only Once rule, and is liable to cause off-by-one errors and such.
So what’s the alternative? The trick is to realise that we don’t need to provide every element, since our users will only use a finite number. The difficulty is that we don’t know how many they’ll need!
One simple approach that may spring to mind is asking the user up front how many elements they need:
'mkFibs2', compose(map('mkFib1'), 'upto')); defun(
However, this doesn’t do what we want. The requirement to know how
many elements are needed is too strong; many useful functions
like filter
and fold
(AKA “reduce”) require an unknown number of elements in general; for
example, filters don’t know how many elements they’ll need to discard
before finding a match. This forces us to generate longer and longer
sequences, slowing us down by a logarithmic factor compared to mkFib1
:
// Note: Since there are infinitely many fibs, we can't fold all of them.
// Instead, $f returns a pair [$stop, $result], where $stop tells us
// when to cut-off the fold and $result is treated in the usual way.
// Fold a sequence generated by mkFib1
'fold1', function($f) {
defun(// Fold over the Naturals $n = 0, 1, 2, ...
return loop(function($x, $n) use ($f) {
return $f($x, mkFib1($n));
;
});
})
// Prevents us applying $f to too many fibs
'shortcircuit', function($f, $x, $fib) {
defun(return $x[0]? $x // If $x says stop, return it as-is
: $f($x[1], $fib); // Else call $f
;
})
// Fold a sequence generated by mkFib2
'fold2', function($f) {
defun(// Fold over the Naturals $n = 0, 1, 2, ...
return loop(function($x, $n) use ($f) {
// Fold over 2^$n fibs
list($s, $y) = array_reduce(mkFibs2(pow(2, $n)),
$f),
shortcircuit(false, $x]);
[return [$s, $s? $y // If $s(top), return $y
: $x]; // Else restart with $x
;
}); })
In this blog post I’ll show how we can improve all these things, ending up with a structured representation that, for any unknown N, can fold N elements in O(1) memory, O(N) time and O(1) stack frames.
Inductive Definitions
The problem with mkFib2
is that
the size of a PHP array must be specified in advance, but we don’t know
how many elements we’ll need. One simple alternative to the array is a
list. Rather than storing all elements in a single place, a
list stores them separately, then links them all together. There are
many different kinds of list, but the simplest is called tail
recursive (or “singly-linked”) and can be implemented very easily
in PHP by nesting arrays:
// $x is our current element, $n is when to stop
'mkFibs3_', function($x, $n) {
defun(return ($x < $n)? [mkFib1($x), mkFibs3_($x+1, $n)]
: [];
;
})'mkFibs3', mkFibs3_(0)); // Always start at 0 defun(
The results look like this:
array(2) {
[0]=>
int(1)
[1]=>
array(2) {
[0]=>
int(1)
[1]=>
array(2) {
[0]=>
int(2)
[1]=>
array(2) {
[0]=>
int(3)
[1]=>
array(2) {
[0]=>
int(5)
[1]=>
array(0) {
}
}
}
}
}
}
The results of mkFibs3
still have
a finite size, but we’re no longer at the mercy of PHP’s implementation
details. Each array has a known size (0
or 2
), no matter how
many elements we ask for. Like mkFib1
, mkFibs3
isn’t tail recursive, so it may
overflow the stack if we ask for too many elements. Folding lists is a
little awkward in PHP, since there’s no built-in equivalent to array_reduce
:
'fold3', function($f, $x) {
defun(return trampoline(y(
function($g, $n, $fibs, $acc, $_) use ($f, $x) {
// If we've run out of $fibs, start again with more
if ($fibs === [])
return [false, $g($n+1, mkFibs3(pow(2, $n+1)), $x)];
// Apply $f to the next $fib
list($stop, $acc) = $f($acc, $fibs[0]);
return [$stop, $stop? $acc
: $g($n, $fibs[1], $acc)];
, 0, [], null));
}; })
We can now do away with the exponentially-inefficient mkFib1
. If we log the arguments we’re
sending to mkFib1
, we can see that
mkFibs3
is asking for the same
values over and over:
mkFibs3(…) | mkFib1(…) |
---|---|
0 | |
1 | 0 |
2 | 0, 1 |
3 | 0, 1, 2, 1, 0 |
4 | 0, 1, 2, 1, 0, 3, 2, 1, 0, 1 |
5 | 0, 1, 2, 1, 0, 3, 2, 1, 0, 1, 4, 3, 2, 1, 0, 1, 2, 1, 0 |
6 | 0, 1, 2, 1, 0, 3, 2, 1, 0, 1, 4, 3, 2, 1, 0, 1, 2, 1, 0, 5, 4, 3, 2, 1, 0, 1, 2, 1, 0, 3, 2, 1, 0, 1 |
We can actually remove all of these calls, since mkFibs3
has all of the data it needs to
construct each Fibonacci number itself. If we pass this data along the
recursive calls, our runtime becomes linear:
'mkFibs4_', function($fib_2, $fib_1, $x, $n) {
defun(if ($x >= $n) return []; // Stopping condition
$fib = ($x < 2)? 1 : ($fib_2 + $fib_1);
return [$fib, mkFibs4_($fib_1, $fib, $x+1, $n)];
;
})'mkFibs4', mkFibs4_(null, null, 0)); defun(
Coinductive Definitions
Now we can deal with the finiteness problem. Every time PHP sees one of our lists, it does the following:
- Construct the first element (the Fibonacci number, known as the car or head)
- Construct the second element (the rest of the list, known as the cdr or tail)
- Construct the array containing them
Our problem is that we’re forced to construct the whole tail before we know how long it needs to be. The solution is to delay the tail, which we can do using a thunk: a function which returns a constant value. PHP will construct functions without running them, which gives us time to figure out how many elements we need.
'mkFibs5_', function($fib_2, $fib_1, $x, $n) {
defun(if ($x >= $n) return []; // Stopping condition
$fib = ($x < 2)? 1 : ($fib_2 + $fib_1);
return [$fib, function($_) use ($fib_1, $fib, $x, $n) {
return mkFibs5_($fib_1, $fib, $x+1, $n);
;
}];
})'mkFibs5', mkFibs5_(null, null, 0)); defun(
Delaying computation this way is known as lazy evaluation, and this method of generating a bit of data and delaying the rest is called co-induction. Hence this kind of structure is known as a co-data structure.
The nice thing about codata is that we can define never-ending chains of values, wrapping each link in a thunk so that it only gets evaluated when needed (known as forcing the value). This is exactly what we need for our Fibonacci sequence, and all we need to do is throw away the stopping condition!
Notice that I’m giving these thunks a parameter $_
which is
ignored. As far as PHP’s concerned, I could have defined them as
nullary functions, ie. taking no arguments, but that makes them
harder to reason about and prevents some later simplifications:
'fibsFrom6', function($fib_2, $fib_1) {
defun($fib = $fib_2 + $fib_1;
return [$fib, function($_) use ($fib_1, $fib) {
return fibsFrom6($fib_1, $fib);
;
}];
})'fibs6', function($_) {
defun(return [1, function($_) {
return [1, function($_) {
return fibsFrom6(1, 1);
;
}];
}]; })
This is the first of our definitions that’s both infinite, like a function, and structured, like an array. This kind of codata structure is called a stream. Here’s how to fold a stream, using a trampoline to optimise tail-calls:
'fold6', function($f, $acc, $stream) {
defun(return trampoline(y(function($y, $acc, $s, $_) use ($f) {
list($h, $t) = $s(null);
list($stop, $acc) = $f($acc, $h);
return [$stop, $stop? $acc
: $y($acc, $t)];
, $acc, $stream));
}; })
Refactoring
Now that we’ve created our infinite Fibonacci sequence, we can
refactor the code to be a little less naff. I debated whether to define
the fibsFrom
function inline, but I
think it’s nice to keep two separate functions since they correspond
exactly to the two rules defining the Fibonacci sequence: fibsFrom
implements “sum the previous
two”, fibs
implements “start with
1
, 1
”.
The first refactoring we can do is to notice that we have functions
returning functions, which is a manual form of currying. We can remove
this separation, since our functions are curried automatically by defun
(we couldn’t do this if our thunks
were nullary):
'fibsFrom7', function($l, $m, $_) {
defun($n = $l + $m;
return [$n, fibsFrom7($m, $n)];
;
})'fibs7', function($_) {
defun(return [1, function($_) {
return [1, fibsFrom7(1, 1)];
;
}]; })
Next we can remove the redundant 1
s in fibs7
. This redundancy is due to fibsFrom
not including its arguments in
the stream it returns, but of course if we naively change fibsFrom
to include its arguments, we’d
just be shifting around the redundancy, not eliminating it.
The solution is to extrapolate the sequence backwards a couple of
steps. In other words, we need to switch the initial 1, 1
to some other values $x
and $y
such that we
get a sequence $x, $y, 1, 1, 2, 3, 5, 8, ...
.
We can derive these values straight from the definition of the Fibonacci
sequence:
$y + 1 == 1 // Since $y, 1, 1 is in the sequence
$y == 1 - 1
$y == 0
$x + $y == 1 // Since $x, $y, 1 is in the sequence
$x + 0 == 1 // From the value of $y above
$x == 1
Now we can pass $x
and $y
as arguments to
fibsFrom
and get back a stream of
all fibs after $x
and $y
, which is what
we want. This simplifies our definition considerably:
'fibs8', fibsFrom7(1, 0)); defun(
Finally, we can collapse fibsFrom
into a one-liner, since PHP evaluates array elements in-order:
'fibsFrom9', function($l, $m, $_) {
defun(return [$n = $l + $m, fibsFrom9($m, $n)];
;
})'fibs9', fibsFrom9(1, 0)); defun(
Results
This is quite elegant, so let’s leave it there and see how well our
fold performs. Note that we can reuse fold6
, since our interface hasn’t
changed:
Note that the memory usage is constant, and it turns out we don’t pay any memory penalty for using streams compared to a hard-coded for-loop. We do pay a time penalty, most likely from all of the function calls involved, but it’s only a constant factor so, as the log scale shows, the scaling behaviour isn’t affected.
So there we have it, a PHP implementation of the Fibonacci sequence which:
- Has a sequential structure, like the sequence itself
- Is never-ending, like the sequence itself (thanks to co-induction)
- Uses constant stack-space (thanks to tail-call optimisation, manually implemented with trampolines)
- Uses constant memory (thanks to lexical closures)
- Uses linear time (again, thanks to lexical closures)
- Closely follows the two defining rules of the sequence
- Has a self-contained, abstract, composable, functional interface
- Only requires 4 lines to define, not counting the generic library
functions from
prelude.php
'fibsFrom', function($l, $m, $_) {
defun(return [$n = $l + $m, fibsFrom($m, $n)];
;
})'fibs', fibsFrom(1, 0)); defun(