Premature Optimisation
premature optimization is the root of all evil (or at least most of it) in programming – Donald Knuth
I’ve been throwing together a bunch of experiments recently, for manipulating syntax trees (s-expressions). Naturally, this is well-suited for a Lisp dialect, so I’m using Racket.
My Scheme is a little rusty, and I’m throwing away a lot of code as I investigate the efficacy of different approaches, and cobble together test suites to exercise them. Hence, I’m writing a whole lot of redundant, inefficient code.
Background
One transformation I want to perform is removing redundant function definitions. For example, let’s say we’re given some s-expressions like these:
(define-fun-rec
(par (a)append ((x (list a))
(list a)))
(y (list a)
(
(match xcase nil y)
(case (cons z xs) (cons z (append xs y)))))))
(
(define-fun-rec
(par (a)list a))
(join-together ((start (list a)))
(end (list a)
(
(match startcase nil end)
(case (cons element rest) (cons element (join-together rest end)))))))
(
(define-fun
(par (a)list a))
(join-backwards ((x (list a)))
(y (list a)
(reverse y)))))
(join-together x (
(define-fun
(par (a)list a)))
(repeat ((x (list a)
(append x x)))) (
The append
and join-together
functions are
clearly alpha-equivalent (i.e. they’re the same except for their choice
of local variable names), so we can throw one of them away. However, the
join-backwards
function is using join-together
and the repeat
function is using append
, so we
need to make sure these references are updated. Let’s say we keep
append
(because it appears first) and throw away
join-together
. The code we want to end up with is:
(define-fun-rec
(par (a)append ((x (list a))
(list a)))
(y (list a)
(
(match xcase nil y)
(case (cons z xs) (cons z (append xs y)))))))
(
(define-fun
(par (a)list a))
(join-backwards ((x (list a)))
(y (list a)
(append x (reverse y)))))
(
(define-fun
(par (a)list a)))
(repeat ((x (list a)
(append x x)))) (
So, how do we figure this out automatically? Normally I would solve this by using de Bruijn indices everywhere, but transforming into that form is tricky, as it requires keeping track of the context. Instead, since each function definition appears in its entirety, we can use a very simple bottom-up renaming strategy:
- We pick a standard format for variable names, which doesn’t overlap
with any existing variables. For example
"normalise-variable-<N>"
where<N>
denotes a natural number. - To transform an expression which introduces a variable, for example
the expression
(par (a) foo)
:- We transform the scoped expression, in this case
foo
, to getfoo2
. - We find the largest
N
in the variable names used infoo2
, and call itM
. - We replace all occurrences of
a
infoo2
with"normalise-variable-<M+1>"
, to getfoo3
. - The result will be
(par ("normalise-variable-<M+1>") foo3)
.
- We transform the scoped expression, in this case
- To transform an expression which doesn’t introduce a variable, for
example a function call
(f x y)
, we just transform each component individually. - Finally, replace the name of the definition to use a standard
format, e.g.
"defining-function-<N>"
, along with any recursive references it makes.
In the above examples, we will get:
append
:
(define-fun-rec
(par (normalise-var-5)list normalise-var-5))
(defining-function-1 ((normalise-var-4 (list normalise-var-5)))
(normalise-var-3 (list normalise-var-5)
(
(match normalise-var-4case nil normalise-var-3)
(case (cons normalise-var-2 normalise-var-1) (cons normalise-var-2 (defining-function-1 normalise-var-1 normalise-var-3))))))) (
join-together
:
(define-fun-rec
(par (normalise-var-5)list normalise-var-5))
(defining-function-1 ((normalise-var-4 (list normalise-var-5)))
(normalise-var-3 (list normalise-var-5)
(
(match normalise-var-4case nil normalise-var-3)
(case (cons normalise-var-2 normalise-var-1) (cons normalise-var-2 (defining-function-1 normalise-var-1 normalise-var-3))))))) (
join-backwards
:
(define-fun
(par (normalise-var-3)list normalise-var-3))
(defining-function-1 ((normalise-var-2 (list normalise-var-3)))
(normalise-var-1 (list normalise-var-3)
(reverse normalise-var-1))))) (join-together normalise-var-2 (
repeat
:
(define-fun
(par (normalise-var-2)list normalise-var-2)))
(defining-function-1 ((normalise-var-1 (list normalise-var-2)
(append normalise-var-1 normalise-var-1)))) (
As you can see, append
and join-together
give the same result when normalised in this way, and hence they are
alpha-equivalent and we can replace one with the other. They do
not give the same result as join-backwards
or
repeat
, which are also distinct from each other, so there
are no more alpha-equivalent definitions to discard from this set.
The Problem
Although this works, unfortunately my naive Racket implementation was
pretty slow. Measuring its runtime on one particular file gave
2m30.439s
. Since I have a bunch of tests around these
scripts, which try various different edge-cases, that was tediously
slow!
Solution 1
I noticed that one of my loops had the following pattern:
while read -r LINE
do
NORMAL_FORM=$(echo "$LINE" | ./getNormalForm.rkt)
if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
then
EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | ./getDefinedNames.rkt)
THESE_NAMES=$(echo "$LINE" | ./getDefinedNames.rkt)
NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
else
PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
fi
done
This loop goes through each definition (they’ve previously been collapsed down to one line each) and builds up two variables, each containing two tab-separated columns:
PREVIOUSLY_SEEN
matches definitions to their normalised form.NAME_REPLACEMENTS
matches names we’re discarding to their replacements which we’re keeping (in the above example, we’d have a linejoin-together\tappend
)
For each definition, if we’ve already seen its normalised form, we
match all of its defined names to those of the previously-seen
definition (a single definition might define multiple names, if they’re
mutually recursive) and append them to NAME_REPLACEMENTS
.
If not, we append this new definition on to
PREVIOUSLY_SEEN
.
The problem which struck me is that the first column of
PREVIOUSLY_SEEN
is the entire definition, which we must
repeatedly extract the names from in the then
branch. Why
not extract the names once, and store those (space-delimited) instead of
the whole definition?
That results in the following pattern, using tr
to
convert between spaces and newlines:
while read -r LINE
do
NORMAL_FORM=$(echo "$LINE" | ./getNormalForm.rkt)
THESE_NAMES=$(echo "$LINE" | ./getDefinedNames.rkt)
if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
then
EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | tr '\n' ' ')
NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
else
SPACE_NAMES=$(echo "$THESE_NAMES" | tr '\n' ' ')
PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
fi
done
We’ve replaced two uses of ./getDefinedNames.rkt
with
one, and it will only be called at most once per line, rather than being
called again and again if there are many duplicates. How does this
affect the timing?
2m46.696s
Uh oh, we’re taking an extra 16 seconds, which is a 10% increase!
The reason is that the old version would run
./getDefinedNames.rkt
twice per duplicate pair. If
there were multiple duplicates, it would run multiple times, but if
there were no duplicates, it wouldn’t run at all! The new
version runs once per definition regardless of whether there
are any duplicates, or how many there are.
Since, as I said at the beginning, this is very cobbled-together, I decided to stick with the new version rather than bother reverting things, since it’s still the same order of magnitude. So, what else can we try?
Solution 2
It’s actually wasteful to keep calling these scripts using
./
, since they’re using nix-shell shebangs to
ensure their dependencies (i.e. Racket) are available. If we make sure
Racket is available in the outer script (again, using a nix-shell
shebang) then we can bypass the shebangs of the inner scripts by
invoking them with racket
.
That makes our code look like this:
while read -r LINE
do
NORMAL_FORM=$(echo "$LINE" | racket getNormalForm.rkt)
THESE_NAMES=$(echo "$LINE" | racket getDefinedNames.rkt)
if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
then
EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | tr '\n' ' ')
NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
else
SPACE_NAMES=$(echo "$THESE_NAMES" | tr '\n' ' ')
PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
fi
done
How does this impact performance?
2m37.432s
Not much of an increase, but at least it’s offset some of the penalty incurred earlier!
Solution 3
Even without the overhead of nix-shell on each invocation, it still
seems wasteful to be firing up racket
twice per loop
iteration. If we rewrite the loop in Racket, we can call the code used
by getNormalForm.rkt
and getDefinedNames.rkt
without having to set up and tear down the interpreter each time. Plus,
Racket is a much saner language for programming than Bash in any
case.
Here’s a relevant snippet of the port:
;; Read stdin into a list of (non-empty) strings
define given-lines
(filter (lambda (x) (not (equal? 0 (string-length x))))
(current-input-port))))
(port->lines (
;; Equivalent to the loop body
define (mk-output line so-far name-replacements)
(let* ([expr (read-from-string line)] ;; Read s-expressions from each line
([norm-line (norm expr)] ;; The function used by getNormalForm.rkt
[names (names-in expr)] ;; The function used by getDefinedNames.rkt
[existing (filter (lambda (x)
equal? norm-line (second x)))
(])
so-far)if (empty? existing)
(list (cons (list names norm-line) so-far)
(
name-replacements)list so-far
(append (zip names (first (car existing)))
(
name-replacements)))))
;; Equivalent to the loop
define (remove-redundancies lines so-far name-replacements)
(if (empty? lines)
(list so-far name-replacements)
(let* ([result (mk-output (car lines) so-far name-replacements)]
([new-sf (first result)]
[new-nr (second result)])
cdr lines)
(remove-redundancies (
new-sf
new-nr))))
define output
( (remove-redundancies given-lines null null))
There’s plenty of opportunity to replace the surrounding Bash code as well, but for now I just replaced this one loop, while maintaining the string-based input and output behaviour.
So, has my porting effort paid off?
2m10.083s
A 17% improvement is quite nice, but doesn’t seem large enough to justify porting any more of the bash across considering the effort involved. Since we’re still above 2 minutes for processing this single file, and I have around 400 files to process (not to mention all of the repeated calls during tests!), it seemed worth digging a little more to try and find more substantial optimisations.
Solution 4
In my Racket code I’ve been doing some pretty braindead things, in the name of cranking out the code (since I still didn’t know if these manipulations would work for my data set). Were there any low-hanging optimisation opportunities to be found on the Racket side?
How about the following, which is clearly redundant:
define (max-name pre expr)
(
(match expr[(cons a b) (if (< (max-name pre a) (max-name pre b))
(max-name pre b)]
(max-name pre a))[x (name-num pre x)]))
This finds the largest name in expr
, which begins with
pre
. For example,
(max-name "normalise-variable-" '(lambda (normalise-variable-4) (foo normalise-variable-4 normalise-variable-10)))
will return 10
.
The base case, (name-num pre x)
, just returns
pre
suffixed with 0
. The recursive case for
cons
is clearly performing redundant work, since it’s
recursing on each part to check which has the bigger number, then
recursing on that part again to return the number. These
expressions are all pretty tiny, so it can’t be slowing us down much.
Still, I thought I might as well do the obvious improvement and use
let
to store the results, so I can re-use them rather than
recalculating:
define (max-name pre expr)
(
(match expr[(cons a b) (let ([ma (max-name pre a)]
[mb (max-name pre b)])
if (< ma mb) mb ma))]
([x (name-num pre x)]))
How does this affect performance?
2m9.580s
Hmm, very underwhelming. Is there anything else lurking in the Racket code which we can optimise?
Solution 5
Looking further through the Racket code, I came across the following:
define (max-var expr)
(if (is-var? expr)
(
expr
(match expr[(cons a b) (if (var-lt? (max-var a) (max-var b))
(max-var b)]
(max-var a))[_ (string->symbol (string-append var-prefix "0"))])))
Look familiar? This performs pretty much the same job as
max-name
, but is specific to variable names. This was
already working when I decided to add support for normalising datatypes,
constructors and destructors, so rather than refactoring I just wrote
max-name
from scratch to be more generic. Variables are
still being handled by max-var
, so we might as well do the
same optimisation as we did for max-name
:
define (max-var expr)
(if (is-var? expr)
(
expr
(match expr[(cons a b) (let ([ma (max-var a)]
[mb (max-var b)])
if (var-lt? ma mb) mb ma))]
([_ (string->symbol (string-append var-prefix "0"))])))
Since the max-name
improvement was rather meagre, we
shouldn’t expect much from this either; still, it’s trivial enough to
perform so we might as well see what it does:
0m23.532s
Wow, that’s shaved 82% off the runtime! Not bad for a few minutes work.
There are still plenty more optimisation opportunities in the code, for example lots of boundaries where data is formatted into strings, only to be immediately parsed back somewhere else. For now though, I think 24 seconds is pretty workable, and if these numbers are representative, I’ve cut down my 400 file workload from 16 hours to 2 hours.
I suppose the lesson is to make those trivial improvements when you spot them; even if they appear miniscule, their aggregate effect may be substantial. Alternatively, the lesson might be to avoid being too braindead, even when you expect to be throwing away the code!