Debugging random Haskell
I was asked to look at some Haskell code the other day, from some kind of online roleplaying game. The code is given, along with the briefing that it should find a valid password amongst a file of about 35,000. Its creator supposedly fired, and managed to mangle it before he was kicked out. Our task is to find the correct password.
Rather than look at the file, let’s dive straight into the code. I found this particularly enjoyable, since it’s essentially an exercise in debugging and refactoring, which is how I tend to write code. I’ll write a naive version which is generally very inefficient, over-complicated, repetitive, hard-coded and fragile. I’ll then repeatedly refactor it until it becomes efficient, simple, terse, generic and robust (I hope!). I assume no knowledge of Haskell, but knowing a language with first-class functions like Javascript may be useful.
import ParseLib
Import an unspecified parsing library; this isn’t too significant, but some Googling will find it if you really care.
runner :: [Char] -> [a]
This is a type annotation. It’s only a sanity check, since Haskell infers types. Thus there’s no point debugging it, so we remove it. I’ll ignore type annotations from now on, as a mangled annotation will just confuse us.
| (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False =
runner a
mzero| otherwise =
result [papply defB a]
This is an invalid function definition. Let’s clean up the line breaks:
| (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False = mzero
runner a | otherwise = result [papply defB a]
Each line is a different ‘clause’ of the function; they have the
following form (|
means
‘where’):
| myCondition = iWillRunIfMyConditionIsTrue
myFunction myArgument | otherwise = iWillRunIfMyConditionIsFalse
This syntax can simplify complicated function definitions, but in our case I think it’s clearer to use an equivalent if condition:
= if (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False
runner a then mzero
else result [papply defB a]
There’s a rookie error at the end of the if-condition, common from
many languages: it uses =
(definition)
in the if condition, instead of ==
(comparison). Let’s swap it:
= if (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) == False
runner a then mzero
else result [papply defB a]
Notice that we’re comparing something to False
; we can
get rid of the == False
if we swap the then
and else
branches
around:
= if length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])
runner a then result [papply defB a]
else mzero
The right-hand-side of the condition doesn’t contain a
, so it must be constant (due to
‘referential transparency’: the output of a function is completely
determined by its input, and here there are no inputs). Let’s calculate
it:
sum [product (3..5), product (6..9), 1]
There’s no such thing as (3..5)
or (6..9)
.
It could mean 3.5
and 6.9
, but that
wouldn’t type check, since product
wants
a list of numbers. This makes it more likely to be [3..5]
and [6..9]
which is shorthand for enumFromTo 3 5
and enumFromTo 6 9
,
which are, respectively, [3, 4, 5]
and [6, 7, 8, 9]
:
sum [product [3, 4, 5], product [6, 7, 8, 9], 1]
The products are easy to calculate:
sum [60, 3024, 1]
As is the sum:
3084
If we put this simplified condition back into our original function we get:
= if length (test (a ++ "z")) == 3084
runner a then result [papply defB a]
else mzero
++
is
a built-in function, defined like this (where []
is an empty list and x : y
is a
singly-linked list starting with element x
and followed by list y
):
++ list2 = list2 -- Concatenating an empty list does nothing
[] elem : list1) ++ list2 = elem : (list1 ++ list2) -- Recurse (
What about test
?
= init x test x
The function test
, of
argument x
, is defined as init
applied
to x
. Clearly:
= init test
That transformation is known as eta-reduction. The init
function
is built in to Haskell’s standard library too. It looks like the
following:
init (elem : []) = [] -- init of a one-element list is an empty list
init (elem : xs) = elem : init xs -- Recurse
init
chops an element off the end of a list. Since we’re concatenating with a
non-empty list, init
will
recurse over a
and end up at
"z"
.
Let’s simplify:
length (init (a ++ "z"))
length (a ++ init "z")
length (a ++ init ('z' : [])) -- "z" is a one-element list of Chars
length (a ++ [])
length a
This gives us a much simpler version of the original function:
= if length a == 3084
runner a then result [papply defB a]
else mzero
Now what’s result
?
= "Nope"
result [] :ys) = x result ((x,xs)
(foo, bar)
is a pair, so
result
is extracting the first
element of a pair, which itself is the first element of a list. We know
that this element will be a String
, since
it has to be of the same type as "Nope"
.
We can split this into two separate element extractions, using the
built-in functions fst
(get the
first element of a pair) and head
(get the
first element of a list):
= "Nope"
result [] = fst (head xs) result xs
Notice that runner
calls
result
on a one-element list
[papply defB x]
, therefore:
result [papply defB x]: []) -- [foo] is sugar for foo:[]
result ((papply defB x) fst (head ((papply defB x) : [])) -- Bring 'result' in-line
fst (papply defB x) -- 'head (x:y)' is just 'x'
This gives us a simplified runner
:
= if length x == 3084
runner x then fst (papply defB x)
else mzero
mzero
is funny; it’s a
‘method’ of a ‘type class’ called MonadPlus
. A
type class is basically an interface, and types can be instances of type
classes in the same way that Object Oriented classes can implement OO
interfaces (confusing re-use of existing terminology, I know!).
To know which implementation of mzero
we’re dealing with, we need to
know it’s type. This must be the same as the then
branch,
which we just discovered must be a String
(otherwise we couldn’t have put "Nope"
as a possible return value). As we saw above, String
is just
a synonym for a list of Chars
(ie.
[Char]
),
and indeed lists are an instance of the type class MonadPlus
.
Their implementation of mzero
is
the empty list []
, which (since
we’re dealing with [Char]
AKA
String
)
we can also write as ""
:
= if length x == 3084
runner x then fst (papply defB x)
else ""
That’s as simple as we can get with runner
. Although we can rearrange it
back to the two-clause version:
| length x == 3084 = fst (papply defB x)
runner x | otherwise = ""
Next we look at the parsers:
= do char 'X'
defB <- digit
baz 'X'
char 'W'
char <- defC
foo <- defD
bar 'a'
char <- many (do char 'f'
fooo
digit)'Y'
char 'Z'
char return (foo ++ bar)
This is ‘do
notation’,
which is nice syntactic sugar for Haskell’s famously scary feature,
Monads. A Monad
is just
a way to chain functions together. Here we’re chaining together simple
parsers into more complex ones. We just write do
followed by
a bunch of parsers and the Monad
will do
all of the plumbing, composition, backtracking, shortcutting, etc. for
us.
A few points to mention:
- Haskell’s layout rule lets us do without braces and semicolons in favour of indenting and dedenting; the indentation above needs straightening out.
- By default the intermediate return-values are ignored. To keep them
we use an arrow, eg.
foo <- bar
will store the result ofbar
infoo
. - Many of the results above are ignored, so there’s not point using
the arrows (eg.
baz
andfooo
are ignored, butfoo
andbar
are used at the end). - A
do
block is just a regular function, hence the nested argument tomany
- The return value of a
do
block is the return value of the last function in the chain; to override this, we can use thereturn
function which doesn’t perform any action (in our case any parsing) but does provide a return value.
= do char 'X'
defB
digit'X'
char 'W'
char <- defC
foo <- defD
bar 'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return (foo ++ bar)
It turns out that defD
is
only used once in the whole program, so we may as well insert its
definition straight in here (suitably indented):
= do char 'X'
defB
digit'X'
char 'W'
char <- defC
foo <- do foo <- defC
bar do baar <- defC
digit<- many (do digit)
fooo <- defE
q return (q : "d")
'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return (foo ++ bar)
Note that nested chains can be flattened; do A; (do B; C)
is the same as do A; B; C
,
as long as we keep the correct return values assigned to any
intermediate results. Likewise do digit
is
the same as digit
, since we’re
not chaining anything on to it:
= do char 'X'
defB
digit'X'
char 'W'
char <- defC
foo
defC
defC
digit
many digit<- defE
q 'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return (foo ++ q ++ "d")
Likewise, defE
is never used
anywhere else. We may as well inline it too:
= do char 'X'
defB
digit'X'
char 'W'
char <- defC
foo
defC
defC
digit
many digit<- do foo <- defC
q do fooo <- defC
do foooo <- defC
return "i"
'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return (foo ++ q ++ "d")
Simplifying out (we now know the value of q
is "i"
):
= do char 'X'
defB
digit'X'
char 'W'
char <- defC
foo
defC
defC
digit
many digit
defC
defC
defC'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return (foo ++ "id")
Now let’s look at defC
. We
keep this separate since it’s used many times:
= do char '!'
defC <- digit
baz 's'
char do s <- many (do char 's')
return "Val"
+++ do char '@'
<- char 's'
baz
digitdo s <- many (do digit)
return 'Val'
This is actually in two parts, combined using +++
. I’ll
split them into separate functions to make it clearer:
= do char '!'
defCA <- digit
baz 's'
char do s <- many (do char 's')
return "Val"
= do char '@'
defCB <- char 's'
baz
digitdo s <- many (do digit)
return "Val"
= defCA +++ defCB defC
According to ParseLib, +++
is a
choice, so defC
will try to
match defCA
first. If it fails,
the input will be wound back and defCB
will be tried. defC
will only fail to match if both
defCA
and defCB
fail to match. Let’s clean it
up:
= do char '!'
defCA
digit's'
char 's')
many (char return "Val"
= do char '@'
defCB 's'
char
digit
many digitreturn "Val"
= defCA +++ defCB defC
We can see that both branches will return "Val"
if successful, so we no longer need to use the foo
variable in defB
, we can just use "Val"
directly:
= do char 'X'
defB
digit'X'
char 'W'
char
defC
defC
defC
digit
many digit
defC
defC
defC'a'
char do char 'f'
many (
digit)'Y'
char 'Z'
char return "Valid"
= do char '!'
defCA
digit's'
char 's')
many (char
= do char '@'
defCB 's'
char
digit many digit
That’s about all the ‘simplifying’ we can do, but with a little more work we can make it more succinct, if less simple.
First we can collect together individual characters into strings:
:cs) = do char c
string (c string cs
This reduces defB
and defCB
to:
= do char 'X'
defB
digit"XW"
string
defC
defC
defC
digit
many digit
defC
defC
defC'a'
char do char 'f'
many (
digit)"YZ"
string return "Valid"
= do string "@s"
defCB
digit many digit
We can also special-case the foo; many foo
pattern (+
in
regular-expression syntax). This works by recursing on the left until
the pattern p
fails to match,
backtracking once, then matching p
without recursing:
= (do p
plus p +++ p plus p)
This lets us write defB
,
3A
and 3B
like this:
= do char 'X'
defB
digit"XW"
string
defC
defC
defC
plus digit
defC
defC
defC'a'
char do char 'f'
many (
digit)"YZ"
string return "Valid"
= do char '!'
defCA
digit's')
plus (char = do string "@s"
defCB plus digit
Notice that defC
is always
used three times in a row? We can hard-code that, to get:
= do defC
defC3
defC
defC= do char 'X'
defB
digit"XW"
string
defC3
plus digit
defC3'a'
char do char 'f'
many (
digit)"YZ"
string return "Valid"
We can make this a bit more compact if we use explicit bind functions
>>
rather than do notation in a few places (do a; b
is
sugar for a >> b
),
and sprinkle some standard Haskell functions around to reduce
redundancy. This gives the final, complete program as:
| length x == 3084 = ""
runner x | otherwise = fst (papply defB x)
= do string "X"
defB
digit"XW"
string
defC3
plus digit
defC3"a"
string "f" >> digit)
many (string "YZ"
string return "Valid"
= join . (map char)
string = (p >> plus p) +++ p
plus p = defC >> defC >> defC
defC3 = (string "!" >> digit >> plus (string "s")) +++
defC "@s" >> plus digit) (string
It’s straightforward enough to see what this is doing even without running it. If you do want to run it, say as a shell command, you can add this:
= do stdInput <- getContents
main let inputLines = lines stdInput
= filter (("Valid" ==) . runner) inputLines
successes = unlines successes
formatted putStr formatted
Pipe passwords in to stdin and you’ll get valid ones out of stdout. Note that Haskell is non-strict, meaning your program will handle pipes properly, and not sit there buffering.