Premature Optimisation

Posted on 2016-06-03 by Chris Warburton

premature optimization is the root of all evil (or at least most of it) in programming – Donald Knuth

I’ve been throwing together a bunch of experiments recently, for manipulating syntax trees (s-expressions). Naturally, this is well-suited for a Lisp dialect, so I’m using Racket.

My Scheme is a little rusty, and I’m throwing away a lot of code as I investigate the efficacy of different approaches, and cobble together test suites to exercise them. Hence, I’m writing a whole lot of redundant, inefficient code.

Background

One transformation I want to perform is removing redundant function definitions. For example, let’s say we’re given some s-expressions like these:

(define-fun-rec
  (par (a)
    (append ((x (list a))
             (y (list a)))
            (list a)
      (match x
        (case nil         y)
        (case (cons z xs) (cons z (append xs y)))))))

(define-fun-rec
  (par (a)
    (join-together ((start (list a))
                    (end (list a)))
                   (list a)
      (match start
        (case nil                 end)
        (case (cons element rest) (cons element (join-together rest end)))))))

(define-fun
  (par (a)
    (join-backwards ((x (list a))
                     (y (list a)))
                    (list a)
      (join-together x (reverse y)))))

(define-fun
  (par (a)
    (repeat ((x (list a)))
            (list a)
      (append x x))))

The append and join-together functions are clearly alpha-equivalent (i.e. they’re the same except for their choice of local variable names), so we can throw one of them away. However, the join-backwards function is using join-together and the repeat function is using append, so we need to make sure these references are updated. Let’s say we keep append (because it appears first) and throw away join-together. The code we want to end up with is:

(define-fun-rec
  (par (a)
    (append ((x (list a))
             (y (list a)))
            (list a)
      (match x
        (case nil         y)
        (case (cons z xs) (cons z (append xs y)))))))

(define-fun
  (par (a)
    (join-backwards ((x (list a))
                     (y (list a)))
                    (list a)
      (append x (reverse y)))))

(define-fun
  (par (a)
    (repeat ((x (list a)))
            (list a)
      (append x x))))

So, how do we figure this out automatically? Normally I would solve this by using de Bruijn indices everywhere, but transforming into that form is tricky, as it requires keeping track of the context. Instead, since each function definition appears in its entirety, we can use a very simple bottom-up renaming strategy:

We pick a standard format for variable names, which doesn’t overlap with any existing variables. For example "normalise-variable-<N>" where <N> denotes a natural number.
To transform an expression which introduces a variable, for example the expression (par (a) foo):
- We transform the scoped expression, in this case foo, to get foo2.
- We find the largest N in the variable names used in foo2, and call it M.
- We replace all occurrences of a in foo2 with "normalise-variable-<M+1>", to get foo3.
- The result will be (par ("normalise-variable-<M+1>") foo3).
To transform an expression which doesn’t introduce a variable, for example a function call (f x y), we just transform each component individually.
Finally, replace the name of the definition to use a standard format, e.g. "defining-function-<N>", along with any recursive references it makes.

In the above examples, we will get:

append:

(define-fun-rec
  (par (normalise-var-5)
    (defining-function-1 ((normalise-var-4 (list normalise-var-5))
                          (normalise-var-3 (list normalise-var-5)))
                         (list normalise-var-5)
      (match normalise-var-4
        (case nil normalise-var-3)
        (case (cons normalise-var-2 normalise-var-1) (cons normalise-var-2 (defining-function-1 normalise-var-1 normalise-var-3)))))))

join-together:

(define-fun-rec
  (par (normalise-var-5)
    (defining-function-1 ((normalise-var-4 (list normalise-var-5))
                          (normalise-var-3 (list normalise-var-5)))
                         (list normalise-var-5)
      (match normalise-var-4
        (case nil normalise-var-3)
        (case (cons normalise-var-2 normalise-var-1) (cons normalise-var-2 (defining-function-1 normalise-var-1 normalise-var-3)))))))

join-backwards:

(define-fun
  (par (normalise-var-3)
    (defining-function-1 ((normalise-var-2 (list normalise-var-3))
                          (normalise-var-1 (list normalise-var-3)))
                         (list normalise-var-3)
      (join-together normalise-var-2 (reverse normalise-var-1)))))

repeat:

(define-fun
  (par (normalise-var-2)
    (defining-function-1 ((normalise-var-1 (list normalise-var-2)))
                         (list normalise-var-2)
      (append normalise-var-1 normalise-var-1))))

As you can see, append and join-together give the same result when normalised in this way, and hence they are alpha-equivalent and we can replace one with the other. They do not give the same result as join-backwards or repeat, which are also distinct from each other, so there are no more alpha-equivalent definitions to discard from this set.

The Problem

Although this works, unfortunately my naïve Racket implementation was pretty slow. Measuring its runtime on one particular file gave 2m30.439s. Since I have a bunch of tests around these scripts, which try various different edge-cases, that was tediously slow!

Solution 1

I noticed that one of my loops had the following pattern:

while read -r LINE
do
  NORMAL_FORM=$(echo "$LINE" | ./getNormalForm.rkt)

  if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
  then
    EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | ./getDefinedNames.rkt)
       THESE_NAMES=$(echo "$LINE"               | ./getDefinedNames.rkt)

    NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
                        paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
  else
    PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
  fi
done

This loop goes through each definition (they’ve previously been collapsed down to one line each) and builds up two variables, each containing two tab-separated columns:

PREVIOUSLY_SEEN matches definitions to their normalised form.
NAME_REPLACEMENTS matches names we’re discarding to their replacements which we’re keeping (in the above example, we’d have a line join-together\tappend)

For each definition, if we’ve already seen its normalised form, we match all of its defined names to those of the previously-seen definition (a single definition might define multiple names, if they’re mutually recursive) and append them to NAME_REPLACEMENTS. If not, we append this new definition on to PREVIOUSLY_SEEN.

The problem which struck me is that the first column of PREVIOUSLY_SEEN is the entire definition, which we must repeatedly extract the names from in the then branch. Why not extract the names once, and store those (space-delimited) instead of the whole definition?

That results in the following pattern, using tr to convert between spaces and newlines:

while read -r LINE
do
  NORMAL_FORM=$(echo "$LINE" | ./getNormalForm.rkt)
  THESE_NAMES=$(echo "$LINE" | ./getDefinedNames.rkt)

  if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
  then
    EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | tr '\n' ' ')

    NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
                        paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
  else
    SPACE_NAMES=$(echo "$THESE_NAMES" | tr '\n' ' ')
    PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
  fi
done

We’ve replaced two uses of ./getDefinedNames.rkt with one, and it will only be called at most once per line, rather than being called again and again if there are many duplicates. How does this affect the timing?

2m46.696s

Uh oh, we’re taking an extra 16 seconds, which is a 10% increase!

The reason is that the old version would run ./getDefinedNames.rkt twice per duplicate pair. If there were multiple duplicates, it would run multiple times, but if there were no duplicates, it wouldn’t run at all! The new version runs once per definition regardless of whether there are any duplicates, or how many there are.

Since, as I said at the beginning, this is very cobbled-together, I decided to stick with the new version rather than bother reverting things, since it’s still the same order of magnitude. So, what else can we try?

Solution 2

It’s actually wasteful to keep calling these scripts using ./, since they’re using nix-shell shebangs to ensure their dependencies (i.e. Racket) are available. If we make sure Racket is available in the outer script (again, using a nix-shell shebang) then we can bypass the shebangs of the inner scripts by invoking them with racket.

That makes our code look like this:

while read -r LINE
do
  NORMAL_FORM=$(echo "$LINE" | racket getNormalForm.rkt)
  THESE_NAMES=$(echo "$LINE" | racket getDefinedNames.rkt)

  if EXISTING=$(echo "$PREVIOUSLY_SEEN" | grep -F "$NORMAL_FORM")
  then
    EXISTING_NAMES=$(echo "$EXISTING" | cut -f1 | tr '\n' ' ')

    NAME_REPLACEMENTS=$(echo "$NAME_REPLACEMENTS";
                        paste <(echo "$THESE_NAMES") <(echo "$EXISTING_NAMES"))
  else
    SPACE_NAMES=$(echo "$THESE_NAMES" | tr '\n' ' ')
    PREVIOUSLY_SEEN=$(echo -e "$PREVIOUSLY_SEEN\n$LINE\t$NORMAL_FORM")
  fi
done

How does this impact performance?

2m37.432s

Not much of an increase, but at least it’s offset some of the penalty incurred earlier!

Solution 3

Even without the overhead of nix-shell on each invocation, it still seems wasteful to be firing up racket twice per loop iteration. If we rewrite the loop in Racket, we can call the code used by getNormalForm.rkt and getDefinedNames.rkt without having to set up and tear down the interpreter each time. Plus, Racket is a much saner language for programming than Bash in any case.

Here’s a relevant snippet of the port:

;; Read stdin into a list of (non-empty) strings
(define given-lines
  (filter (lambda (x) (not (equal? 0 (string-length x))))
          (port->lines (current-input-port))))

;; Equivalent to the loop body
(define (mk-output line so-far name-replacements)
  (let* ([expr      (read-from-string line)] ;; Read s-expressions from each line
         [norm-line (norm     expr)]         ;; The function used by getNormalForm.rkt
         [names     (names-in expr)]         ;; The function used by getDefinedNames.rkt
         [existing  (filter (lambda (x)
                              (equal? norm-line (second x)))
                            so-far)])
    (if (empty? existing)
        (list (cons (list names norm-line) so-far)
              name-replacements)
        (list so-far
              (append (zip names (first (car existing)))
                      name-replacements)))))

;; Equivalent to the loop
(define (remove-redundancies lines so-far name-replacements)
  (if (empty? lines)
      (list so-far name-replacements)
      (let* ([result (mk-output (car lines) so-far name-replacements)]
             [new-sf (first  result)]
             [new-nr (second result)])
        (remove-redundancies (cdr lines)
                             new-sf
                             new-nr))))

(define output
  (remove-redundancies given-lines null null))

There’s plenty of opportunity to replace the surrounding Bash code as well, but for now I just replaced this one loop, while maintaining the string-based input and output behaviour.

So, has my porting effort paid off?

2m10.083s

A 17% improvement is quite nice, but doesn’t seem large enough to justify porting any more of the bash across considering the effort involved. Since we’re still above 2 minutes for processing this single file, and I have around 400 files to process (not to mention all of the repeated calls during tests!), it seemed worth digging a little more to try and find more substantial optimisations.

Solution 4

In my Racket code I’ve been doing some pretty braindead things, in the name of cranking out the code (since I still didn’t know if these manipulations would work for my data set). Were there any low-hanging optimisation opportunities to be found on the Racket side?

How about the following, which is clearly redundant:

(define (max-name pre expr)
  (match expr
    [(cons a b) (if (< (max-name pre a) (max-name pre b))
                    (max-name pre b)
                    (max-name pre a))]
    [x          (name-num pre x)]))

This finds the largest name in expr, which begins with pre. For example, (max-name "normalise-variable-" '(lambda (normalise-variable-4) (foo normalise-variable-4 normalise-variable-10))) will return 10.

The base case, (name-num pre x), just returns pre suffixed with 0. The recursive case for cons is clearly performing redundant work, since it’s recursing on each part to check which has the bigger number, then recursing on that part again to return the number. These expressions are all pretty tiny, so it can’t be slowing us down much. Still, I thought I might as well do the obvious improvement and use let to store the results, so I can re-use them rather than recalculating:

(define (max-name pre expr)
  (match expr
    [(cons a b) (let ([ma (max-name pre a)]
                      [mb (max-name pre b)])
                  (if (< ma mb) mb ma))]
    [x          (name-num pre x)]))

How does this affect performance?

2m9.580s

Hmm, very underwhelming. Is there anything else lurking in the Racket code which we can optimise?

Solution 5

Looking further through the Racket code, I came across the following:

(define (max-var expr)
  (if (is-var? expr)
      expr
      (match expr
        [(cons a b) (if (var-lt? (max-var a) (max-var b))
                        (max-var b)
                        (max-var a))]
        [_          (string->symbol (string-append var-prefix "0"))])))

Look familiar? This performs pretty much the same job as max-name, but is specific to variable names. This was already working when I decided to add support for normalising datatypes, constructors and destructors, so rather than refactoring I just wrote max-name from scratch to be more generic. Variables are still being handled by max-var, so we might as well do the same optimisation as we did for max-name:

(define (max-var expr)
  (if (is-var? expr)
      expr
      (match expr
        [(cons a b) (let ([ma (max-var a)]
                          [mb (max-var b)])
                      (if (var-lt? ma mb) mb ma))]
        [_          (string->symbol (string-append var-prefix "0"))])))

Since the max-name improvement was rather meagre, we shouldn’t expect much from this either; still, it’s trivial enough to perform so we might as well see what it does:

0m23.532s

Wow, that’s shaved 82% off the runtime! Not bad for a few minutes work.

There are still plenty more optimisation opportunities in the code, for example lots of boundaries where data is formatted into strings, only to be immediately parsed back somewhere else. For now though, I think 24 seconds is pretty workable, and if these numbers are representative, I’ve cut down my 400 file workload from 16 hours to 2 hours.

I suppose the lesson is to make those trivial improvements when you spot them; even if they appear miniscule, their aggregate effect may be substantial. Alternatively, the lesson might be to avoid being too braindead, even when you expect to be throwing away the code!