Safety through types

Posted on by Chris Warburton

Hoare said that our software should obviously have no bugs, rather than merely having no obvious bugs.

Let's say our program needs to finds the square root of the combined lengths of two lists. This sounds pretty simple, and in a language like Javascript we could write the following:

var sqrtOfLengths = function(x, y) {
    return Math.sqrt(x.length + y.length);
};

There's a lot of syntactic noise in there, but there's no obvious bugs. However, there is a huge potential for this code to contribute to a non-obvious bug, since it's making an awful lot of assumptions which could be completely wrong:

// We assume that x and y are defined, but they might not be
sqrtOfLengths(); // x and y are undefined
sqrtOfLengths('abc'); // y is undefined
sqrtOfLengths({}[0], 'abc'); // x is undefined

// We also assume that x and y have a length
sqrtOfLengths(1, 2); // x and y don't have a length
sqrtOfLengths(1, 'abc'); // x doesn't have a length
sqrtfLengths('abc', 1); // y doesn't have a length

// We assume that the lengths can be added
sqrtOfLengths({length: function(){}}, {length: function(){}});

// We assume that the sum of the lengths can be passed to Math.sqrt
sqrtOfLengths({length: -10}, 'abc'); // Math.sqrt rejects -7

Now, the usual response to such examples will be that they're 'obviously wrong'. But in Javascript there's no such thing; interfaces are things which just-so-happen to be, they cannot be relied upon. Also, all non-syntactic errors only get picked up at runtime, if a branch happens to get called (this is why test suites' code coverage matters). This means no informative messages for the programmer, who may be able to fix it; just a big crash for the user, who has no idea what's just happened.

We can improve this by annotating our programs. Javascript doesn't let us do this, so I'll switch to Haskell. This Haskell code won't be the prettiest, but it will be the most straightforward:

sqrtOfLengths x y = sqrt (fromIntegral (length x + length y))

This looks pretty much identical to our Javascript version, except we have fromIntegral in there and don't have Javascript's boilerplate var, function and return keywords. How is this any different? Well, the chaps who wrote the length function gave it a type:

length :: [a] -> Int

This means "length is a thing which turns lists of some type into Ints". Now, this type isn't particularly great (since Ints can be negative, but lengths can't) but it's better than our assumption-filled Javascript.

By taking the length of x and y, Haskell infers that they must be lists without us ever having to say it explicitly.

Addition has the type:

(+) :: Num a => a -> a -> a

This means that it takes two values of the same type and spits out a value of that type, but only if the type implements the `Num interface. Int implements Num, so Haskell finds no problem with adding two lengths.

The sqrt function is a bit crap, since it has the following type:

sqrt :: Floating a => a -> a

This says that sqrt takes a value which implements the Floating interface and spits out another value of that type. This means that sqrt won't find the square root of an Int for us, since Int isn't Floating. We can perform an explicit conversion by passing our Int to fromIntegral, which turns anything of Integral type (like Int) into some other kind of Num:

fromIntegral :: (Integral a, Num b) => a -> b

In this case, Haskell chooses to give out a Floating type, so that sqrt is happy.

After all of this checking, Haskell determines that our function has the type:

sqrtOfLengths :: Floating c => [a] -> [b] -> c

This means it takes two lists (of potentially different things) and produces a Floating result.

This is much better than the Javascript version, since many assumptions we are making are enforced by the language:

There are still some weaknesses here though:

The last point could be solved in a couple of ways; we could extend sqrt to give back Complex numbers, but in this case that's unnecessary. We know that the length of a list will be positive, even though its type doesn't say so. What we want is something like the following:

data Natural = Zero | OnePlus Natural
class Positive
instance Positive Natural

length :: [a] -> Natural
sqrt :: (Positive a, Positive b) => a -> b -- for simplicity we only care about the positive root

The new types are quite naive and inefficient here, but it will do for demonstration.

Now what about the infinite list problem? We could force Haskell to evaluate our lists before sending them to length (making length's argument 'strict'). That would fix length, but the problem would just be shifted up the hierarchy to sqrtOfLengths, and if we make that strict it would just shift the problem to whoever calls us. We're heading in the right direction, since it puts the burden of finiteness on the programmer who's creating the list, but this may not be the person using our sqrtOfLengths function; we need something which will permeate the whole call stack, all the way back to wherever the problem originates. For this, of course, we need a new type.

By putting a finiteness condition on our argument, it means that anyone calling us can either take care of the finiteness condition themselves, or they can pass the buck and put finiteness in their own type. Here we use a 'type class' to identify finite lists; this is a contract which other types can fulfil (a bit like an interface, in languages like Java). A type FiniteList l => l x means a finite list (l) which contains elements of type x.

-- For a type "l" to be a FiniteList it must have a "toList"
-- function, which turns an "l" full of "t"s into a list full
-- of "t"s
class FiniteList l where
  toList :: l t -> [t]

-- We can get the length of a FiniteList by turning it into a
-- regular list, then finding the length of that
finiteLength = length . toList -- "a . b" is a(b(...))

sqrtOfLengths xs ys = sqrt (finiteLength xs + finiteLength ys)

Our code will now refuse to compile if we don't prove that everything we give to sqrtOfLengths is finite.

How do we prove this? We use a common Haskell trick; we take a concept from the world of data and lift it to the world of types. The following types implement type-level singly-linked/tail-recursive lists:

-- Empty lists are known as 'nil'
-- The left hand side has a type variable "t", which is the type of
-- elements we can add to this list (Integer, String, etc.)
-- The right hand side is the constant we will represent empty lists
-- with ("t" will be inferred automatically)
data Nil t = Nil

-- Prepending an element on to a list is known as a 'cons'
-- The left hand side has a type variable "t" for the elements, just
-- like Nil does, but it also has another, "l", which is the list
-- we're prepending to; remember, our lists are themselves types!
-- The right hand side tells us how to make a "Cons l t", we need to
-- use the constructor function "Cons", give it something of type "t"
-- (ie. an element to prepend) and something of type "l t", which is
-- the list we're prepending to. Note that by passing "t" through to
-- "l" like this, we force all elements in the list to have the same
-- (polymorphic) type.
data Cons l t = Cons t (l t)

-- Empty lists are finite
instance FiniteList Nil where
    -- Any Nil value is equivalent to an empty Haskell list
    toList Nil = []

-- An element prepended on to a finite list is finite
instance (FiniteList l) => FiniteList (Cons l) where
    -- A Cons value is equivalent to Haskell's ":"
    toList (Cons x xs) = x : toList xs

The trick here is that the FiniteList condition for Cons gets passed along to the list we're prepending to. The only thing that's finite unconditionally, and can therefore end the chain, is Nil. Knowing this at compile time is a sufficient and neccessary condition for using length safely. Of course, we could get the compiler stuck in an infinite loop while it's trying to prove the finiteness of an infinite list, but compile-time errors like this are much better than run-time errors.

While I was writing this I wanted to see if there were any generally-known solutions to the finite list problem, and came across the question How to tell if a list is infinite? on StackOverflow. All of the answers showed this to be an instance of the Halting Problem, which is technically correct but misses the point. We don't need to know if a list is infinite; we need to know if it's finite, and ignore anything which fails (which includes all infinite lists and some finite ones). I think a rant about Constructivism is in order! My response is here :)

UPDATE:

I fixed some typos and added the quick-n-dirty Natural implementation. While adding that, it of course occured to me that we can define provably-finite naturals:

class FiniteNat n where
  toInteger :: n -> Integer

data Zero = Zero
data OnePlus n = OnePlus n

instance FiniteNat Zero where
  toInteger Zero = 0

instance (FiniteNat n) => FiniteNat (OnePlus n) where
  toInteger (OnePlus n) = 1 + toInteger n

This is exactly the same as the lists, except we don't have to care about containing elements. Of course, that naive implementation of toInteger for OnePlus can easily cause a stack overflow, so we should do something more like:

  toInteger (OnePlus n) = foldr (+) 1 [0..n]

I'm currently throwing a small library together for provably-finite and provably-infinite types.

PS: The similarities between Naturals and Lists are extremely apparent when we play around with Ornaments :)