loop
Many of the programs we have seen so far, for example those in here, suffer from indexitis. This is a term coined by Richard Bird which describes a tendency to perform low-level manipulations to iterate over the indices into a collection, opening the door to various off-by-one errors. Such errors can be eliminated by instead programming at a higher level, using a wholemeal approach where the emphasis is on using aggregate operations, like map
, fold
and reduce
.
Wholemeal programming is no panacea as it still requires us to take care when operating on different collections; if these collections are incompatible, e.g. have the wrong dimensions, then we end up with a fate worse than a crash, a possibly meaningless result. Fortunately, LiquidHaskell can help. Lets see how we can use measures to specify dimensions and create a dimension-aware API for lists which can be used to implement wholemeal dimension-safe APIs.1
Indexitis begone! As an example of wholemeal programming, lets write a small library that represents vectors as lists and matrices as nested vectors:
The Dot Product of two Vector
s can be easily computed using a fold:
Matrix Multiplication can similarly be expressed in a high-level, wholemeal fashion, by eschewing low level index manipulations in favor of a high-level iterator over the Matrix
elements:
The Iteration embodied by the for
combinator, is simply a map
over the elements of the vector.
for :: Vector a -> (a -> b) -> Vector b
for (V n xs) f = V n (map f xs)
Wholemeal programming frees us from having to fret about low-level index range manipulation, but is hardly a panacea. Instead, we must now think carefully about the compatibility of the various aggregates. For example,
dotProd
is only sensible on vectors of the same dimension; if one vector is shorter than another (i.e. has fewer elements) then we will won’t get a run-time crash but instead will get some gibberish result that will be dreadfully hard to debug.
matProd
is only well defined on matrices of compatible dimensions; the number of columns of mx
must equal the number of rows of my
. Otherwise, again, rather than an error, we will get the wrong output.2
In order to start reasoning about dimensions, we need a way to represent the dimension of a list inside the refinement logic. 3
Measures are ideal for this task. Previously we saw how we could lift Haskell functions up to the refinement logic. Lets write a measure to describe the length of a list: 4
As with refined data definitions, the measures are translated into strengthened types for the type’s constructors. For example, the size
measure is translated into:
data [a] where
[] :: {v: [a] | size v = 0}
(:) :: a -> xs:[a] -> {v:[a]|size v = 1 + size xs}
Multiple Measures may be defined for the same data type. For example, in addition to the size
measure, we can define a notEmpty
measure for the list type:
simply by conjoining the refinements in the strengthened constructors. For example, the two measures for lists end up yielding the constructors:
data [a] where
[] :: {v: [a] | not (notEmpty v) && size v = 0}
(:) :: a
-> xs:[a]
-> {v:[a]| notEmpty v && size v = 1 + size xs}
This is a very significant advantage of using measures instead of indices as in DML or Agda, as decouples property from structure, which crucially enables the use of the same structure for many different purposes. That is, we need not know a priori what indices to bake into the structure, but can define a generic structure and refine it a posteriori as needed with new measures.
We are almost ready to begin creating a dimension aware API for lists; one last thing that is useful is a couple of aliases for describing lists of a given dimension.
To make signatures symmetric lets define an alias for plain old (unrefined) lists:
A ListN is a list with exactly N
elements, and a ListX
is a list whose size is the same as another list X
. Note that when defining refinement type aliases, we use uppercase variables like N
and X
to distinguish value parameters from the lowercase type parameters like a
.
With the types and aliases firmly in our pockets, let us write dimension-aware variants of the usual list functions. The implementations are the same as in the standard library i.e. Data.List
, but the specifications are enriched with dimension information.
Exercise: (Map):
map yields a list with the same size as the input. Fix the specification of map
so that the prop_map
is verified.
Exercise: (Reverse): We can reverse
the elements of a list as shown below, using the tail recursive function go
. Fix the signature for go
so that LiquidHaskell can prove the specification for reverse
.
Hint: How big is the list returned by go
?
zipWith requires both lists to have the same size, and produces a list with that same size. 5
unsafeZip The signature for zipWith
is quite severe – it rules out the case where the zipping occurs only up to the shorter input. Here’s a function that actually allows for that case, where the output type is the shorter of the two inputs:
The output type uses the predicate Tinier Xs Ys Zs
which defines the length of Xs
to be the smaller of that of Ys
and Zs
.6
Exercise: (Zip Unless Empty): In my experience, zip
as shown above is far too permissive and lets all sorts of bugs into my code. As middle ground, consider zipOrNull
below. Write a specification for zipOrNull
such that the code below is verified by LiquidHaskell.
Hint: Yes, the type is rather gross; it uses a bunch of disjunctions ||
, conjunctions &&
and implications =>
.
Next, lets look at some functions that truncate lists, in one way or another.
Take lets us grab the first k
elements from a list:
The alias ListGE a n
denotes lists whose length is at least n
:
Exercise: (Drop): Drop
is the yang to take
’s yin: it returns the remainder after extracting the first k
elements. Write a suitable specification for it so that the below typechecks.
Exercise: (Take it easy): The version take'
above is too restrictive; it insists that the list actually have at least n
elements. Modify the signature for the real take
function so that the code below is accepted by LiquidHaskell.
The Partition function breaks a list into two sub-lists of elements that either satisfy or fail a user supplied predicate.
We would like to specify that the sum of the output tuple’s dimensions equal the input list’s dimension. Lets write measures to access the elements of the output:
{-@ measure fst @-}
fst (x, _) = x
{-@ measure snd @-}
snd (_, y) = y
We can now refine the type of partition
as:
where Sum2 V N
holds for a pair of lists dimensions add to N
:
Exercise: (QuickSort): Use partition
to implement quickSort
.
We can use the dimension aware lists to create a safe vector API.
Legal Vectors are those whose vDim
field actually equals the size of the underlying list:
When vDim
is used a selector function, it returns the vDim
field of x
.
The refined data type prevents the creation of illegal vectors:
As usual, it will be handy to have a few aliases.
To Create a Vector
safely, we can start with the empty vector vEmp
and then add elements one-by-one with vCons
:
To Access vectors at a low-level, we can use equivalents of head and tail, which only work on non-empty Vector
s:
To Iterate over a vector we can use the for
combinator:
Binary Pointwise Operations should only be applied to compatible vectors, i.e. vectors with equal dimensions. We can write a generic binary pointwise operator:
The Dot Product of two Vector
s can be now implemented in a wholemeal and dimension safe manner, as:
Exercise: (Vector Constructor): Complete the specification and implementation of vecFromList
which creates a Vector
from a plain list.
Exercise: (Flatten): Write a function to flatten
a nested Vector
.
The Cross Product of two vectors can now be computed in a nice wholemeal style, by a nested iteration followed by a flatten
.
The same methods let us create a dimension safe Matrix API which ensures that only legal matrices are created and that operations are performed on compatible matrices.
Legal Matrices are those where the dimension of the outer vector equals the number of rows mRow
and the dimension of each inner vector is mCol
. We can specify legality in a refined data definition:
Notice that we avoid disallow degenerate matrices by requiring the dimensions to be positive.
It is convenient to have an alias for matrices of a given size:
For example, we can use the above to write type:
Exercise: (Legal Matrix): Modify the definitions of bad1
and bad2
so that they are legal matrices accepted by LiquidHaskell.
Exercise: (Matrix Constructor): Write a function to construct a Matrix
from a nested list.
Exercise: (Refined Matrix Constructor): Refine the specification for matFromList
so that the following is accepted by LiquidHaskell.
Hint: It is easy to specify the number of rows from xss
. How will you figure out the number of columns? A measure may be useful.
Matrix Multiplication Finally, lets implement matrix multiplication. You’d think we did it already, but in fact the implementation at the top of this chapter is all wrong (run it and see!) We cannot just multiply any two matrices: the number of columns of the first must equal to the rows of the second – after which point the result comprises the dotProduct
of the rows of the first matrix with the columns of the second.
To iterate over the columns of the matrix my
we just transpose
it so the columns become rows.
Exercise: (Matrix Transpose): Use the Vector
API to complete the implementation of txgo
. For inspiration, you might look at the implementation of Data.List.transpose
from the prelude. Better still, don’t.
Hint: As shown by ok23
and ok32
, transpose
works by stripping out the head
s of the input rows, to create the corresponding output rows.
In this chapter, we saw how to use measures to describe numeric properties of structures like lists (Vector
) and nested lists (Matrix
).
Measures are structurally recursive functions, with a single equation per data constructor,
Measures can be used to create refined data definitions that prevent the creation of illegal values,
Measures can then be used to enable safe wholemeal programming, via dimension-aware APIs that ensure that operators only apply to compatible values.
We can use numeric measures to encode various other properties of data structures. We will see examples ranging from high-level AVL trees, to low-level safe pointer arithmetic.
In a later chapter we will use this API to implement K-means clustering.↩︎
In fact, while the implementation of matProd
breezes past GHC it is quite wrong!↩︎
We could just use vDim
, but that is a cheat as there is no guarantee that the field’s value actually equals the size of the list!↩︎
Recall that these must be inductively defined functions, with a single equation per data-constructor↩︎
As made explicit by the call to die
, the input type rules out the case where one list is empty and the other is not, as in that case the former’s length is zero while the latter’s is not, and hence, different.↩︎
In logic, if p then q else r
is the same as p => q && not p => r
.↩︎