GADT Casting
The Setup
I recently hit a problem in Haskell, where I had a GADT of the following form:
data Command m a where
Foo :: Command m Bool
Bar :: String -> Command m Bool
Baz :: Int -> String -> Command m Bool
(In fact I’m using newtype
to avoid “stringly typed
programming”, but that’s not relevant to this post)
This GADT takes two type parameters:
m
is unconstrained. We can unify it with anything, and all of the constructors will work.a
is alwaysBool
. We can state the typeCommand m a
for any choice ofa
we like; but there are no values of that type unlessa
happens to beBool
.
I ended up with this while implementing an effect using the excellent
polysemy
library. In polysemy
we can define
“effects” at the type level as entities with two parameters (like
Command
). Their constructors define the “actions” that
result in such effects (like Foo
, Bar x
and
Baz x y
). The m
parameter appears to be
boilerplate (I assume it’s some underlying Monad
?) whilst
the a
parameter is the type of an action’s return value (we
can use ()
for an uninformative “null” value, or
Void
to indicate that it never returns anything (e.g. it
crashes or loops forever); see my
post on those types). In this example Foo
,
Bar x
and Baz x y
all use Command
effects to produce a Bool
.
Elsewhere we need to define at least one interpreter for
this effect, which will turn actions using Command
into
some other type of action: usually something more general (but more
dangerous) like IO
; or some pure, in-memory calculation. We
often have two interpreters: one “real” interpreter for writing
programs, and one in-memory interpreter for testing.
In my case I want to implement these Command
actions
using IO
, by invoking certain shell commands. I don’t care
about the stdio of those commands, but I do care whether or not
they exit successfully: that’s why they all yield a
Bool
.
My interpreter looks something like this:
commandToIO1 :: Member (Embed IO) r => Sem (Command ': r) a -> Sem r a
= interpret (embed . command)
commandToIO1 where command :: Command m a -> IO a
= case c of
command c Foo -> run ["foo" ]
Bar x -> run ["bar", x ]
Baz x y -> run ["baz", show x, y]
run :: [String] -> IO Bool
= do
run args let cmd = proc "myCommand" args
<- readCreateProcessWithExitCode cmd ""
(code, _, _) pure $ case code of
ExitFailure _ -> False
ExitSuccess -> True
The type of commandToIO1
tells us that it turns actions
with access to the Command
and IO
effects
(among others) into actions which no longer involve Command
(the Embed
and embed
boilerplate is due to
IO
not taking an m
parameter). We achieve this
by using the command
function to replace any
Command m a
action with an equivalent IO a
action; in this case invoking a shell command via run
.
The Problem
The above interpreter works fine, but I wanted to write some tests, to check that we’re generating plausible arguments for each action. That seems easy enough, we just need to pull the argument selection out of the interpreter:
commandArgs :: Command m a -> [String]
= case c of
commandArgs c Foo -> ["foo" ]
Bar x -> ["bar", x ]
Baz x y -> ["baz", show x, y]
commandToIO2 :: Member (Embed IO) r => Sem (Command ': r) a -> Sem r a
= interpret (embed . command)
commandToIO2 where command :: Command m a -> IO a
= run (commandArgs c)
command c
run :: [String] -> IO Bool
= do
run args let cmd = proc "myCommand" args
<- readCreateProcessWithExitCode cmd ""
(code, _, _) pure $ case code of
ExitFailure _ -> False
ExitSuccess -> True
Yet commandToIO2
fails to type check! Why?
The problem is that the return value of command
should
have type IO a
, where a
comes from the type of
its argument Command m a
; but we’re giving back the return
value of run
, which is an IO Bool
. GHC
correctly infers that a
needs to equal Bool
,
but it can’t prove that it does, so it gives up.
So why did the previous version work, when that also gave
back the return value of run
? The difference is
context: each time commandToIO1
returns
run [...]
, it’s in a specific branch of a case
statement: one for each constructor. Since the constructors are all
specialised to Bool
, matching one of those patterns gives
us some extra information: that a
is equal to
Bool
, and hence it’s OK to return an
IO Bool
.
Failed Solutions
Failure 1
How can we avoid this problem, whilst still selecting the arguments
with a separate, easily testable function? Maybe we should move the call
to run
back into the case
branches? We can
keep run
encapsulated inside commandToIO3
by
passing it in as a parameter:
commandArgs2 :: ([String] -> t a) -> Command m a -> t a
= case c of
commandArgs2 f c Foo -> f ["foo" ]
Bar x -> f ["bar", x ]
Baz x y -> f ["baz", show x, y]
commandToIO3 :: Member (Embed IO) r => Sem (Command ': r) a -> Sem r a
= interpret (embed . command)
commandToIO3 where command :: Command m a -> IO a
= commandArgs2 run c
command c
run :: [String] -> IO Bool
= do
run args let cmd = proc "myCommand" args
<- readCreateProcessWithExitCode cmd ""
(code, _, _) pure $ case code of
ExitFailure _ -> False
ExitSuccess -> True
Unfortunately this is not enough: commandToArgs2
is
happy to unify a
with Bool
, due to its
run
argument, since (once again) we’re only calling that
from the case
branches where the context contains proof
that it’s valid.
However, command
is certainly not happy that
(yet again) it’s being given an IO Bool
when it wants an
IO a
: the evidence that a
equals
Bool
was discovered during the type checking of
commandArgs2 run c
but thrown away once the type checker
left those case
branches.
Failure 2
Incidentally, you might wonder why we don’t just change the type of
command
to return an IO Bool
: if we try that,
we get the same problem about not being able to unify a
with Bool
; except this time from polysemy
’s
interpret
rather than our command
function.
Failure 3
Perhaps we could generalise the return type of run
, so
it gives an IO a
rather than an IO Bool
?
That’s a non-starter, since we’re explicitly returning True
or False
on the last two lines (remember, returning a value
of type a
doesn’t need to work for some type
a
; it must work for all types a
(subject to any constraints, but we don’t have any here)).
Failure 4
One way to avoid type errors is to re-introduce the pattern-matching
that gave command
the context it needed to spot that
a
is always Bool
. It’s easy enough to do
this:
commandArgs :: Command m a -> [String]
= case c of
commandArgs c Foo -> ["foo" ]
Bar x -> ["bar", x ]
Baz x y -> ["baz", show x, y]
commandToIO4 :: Member (Embed IO) r => Sem (Command ': r) a -> Sem r a
= interpret (embed . command)
commandToIO4 where command :: Command m a -> IO a
= let result = run (commandArgs c)
command c in case c of
Foo -> result
Bar _ -> result
Baz _ _ -> result
run :: [String] -> IO Bool
= do
run args let cmd = proc "myCommand" args
<- readCreateProcessWithExitCode cmd ""
(code, _, _) pure $ case code of
ExitFailure _ -> False
ExitSuccess -> True
This works, but is pretty horrible: we’ve introduced a whole load of
boilerplate into our interpreter, for no reason other than appeasing the
type checker. This boilerplate also has to be kept up to date when
changes are made to the Command
effect (e.g. changing the
constructors or their arguments). Another reason I count such
boilerplate as a failure is that it isn’t constant: every time we want
to solve this problem, we have to do it all again (it’s also worth
noting that my actual GADT currently has nine constructors, rather than
three!).
A Solution: Casting
The solution I came up with still involves pattern-matching
boilerplate, but we abstract it out into a reusable function (as any
good programmer should!). We can think of our function as providing the
evidence that the a
parameter is equal to
Bool
, and it takes the form of a cast:
castCommand :: Command m a -> Bool -> a
= case c of
castCommand c b Foo -> b
Bar _ -> b
Baz _ _ -> b
This type-checks, for the same reason we could use run
in each case
branch of command
: the
pattern-matching gives us the context to show that a
is
Bool
in every case. We can call this function whenever we
need a value of type a
but only have a
Bool
:
commandToIO :: Member (Embed IO) r => Sem (Command ': r) a -> Sem r a
= interpret (embed . command)
commandToIO where command :: Command m a -> IO a
= castCommand c <$> run (commandArgs c)
command c
run :: [String] -> IO Bool
= do
run args let cmd = proc "myCommand" args
<- readCreateProcessWithExitCode cmd ""
(code, _, _) pure $ case code of
ExitFailure _ -> False
ExitSuccess -> True