loop
Many of the programs we have seen so far, for example those in here, suffer from indexitis. This is a
term coined by Richard
Bird which describes a tendency to perform low-level manipulations
to iterate over the indices into a collection, opening the door to
various off-by-one errors. Such errors can be eliminated by instead
programming at a higher level, using a wholemeal
approach where the emphasis is on using aggregate operations, like
map
, fold
and reduce
.
Wholemeal programming is no panacea as it still
requires us to take care when operating on different
collections; if these collections are incompatible, e.g. have
the wrong dimensions, then we end up with a fate worse than a crash, a
possibly meaningless result. Fortunately, LiquidHaskell can help. Lets
see how we can use measures to specify dimensions and create a
dimension-aware API for lists which can be used to implement wholemeal
dimension-safe APIs.1
Indexitis begone! As an example of wholemeal programming, let’s write a small library that represents vectors as lists and matrices as nested vectors:
The Dot Product of two Vector
s can be
easily computed using a fold:
Matrix Multiplication can similarly be expressed in a
high-level, wholemeal fashion, by eschewing low level index
manipulations in favor of a high-level iterator over the
Matrix
elements:
The Iteration embodied by the for
combinator, is simply a map
over the elements of the
vector.
for :: Vector a -> (a -> b) -> Vector b
for (V n xs) f = V n (map f xs)
Wholemeal programming frees us from having to fret
about low-level index range manipulation, but is hardly a panacea.
Instead, we must now think carefully about the compatibility of
the various aggregates. For example,
dotProd
is only sensible on vectors of the same
dimension; if one vector is shorter than another (i.e. has fewer
elements) then we will won’t get a run-time crash but instead will get
some gibberish result that will be dreadfully hard to debug.
matProd
is only well defined on matrices of
compatible dimensions; the number of columns of mx
must
equal the number of rows of my
. Otherwise, again, rather
than an error, we will get the wrong output.2
In order to start reasoning about dimensions, we need a way to represent the dimension of a list inside the refinement logic. 3
Measures are ideal for this task. Previously we saw how we could lift Haskell
functions up to the refinement logic. Lets write a measure to describe
the length of a list: 4
As with refined data definitions, the
measures are translated into strengthened types for the type’s
constructors. For example, the size
measure is translated
into:
data [a] where
[] :: {v: [a] | size v = 0}
(:) :: a -> xs:[a] -> {v:[a]|size v = 1 + size xs}
Multiple Measures may be defined for the same data
type. For example, in addition to the size
measure, we can
define a notEmpty
measure for the list type:
simply by conjoining the refinements in the strengthened constructors. For example, the two measures for lists end up yielding the constructors:
data [a] where
[] :: {v: [a] | not (notEmpty v) && size v = 0}
(:) :: a
-> xs:[a]
-> {v:[a]| notEmpty v && size v = 1 + size xs}
This is a very significant advantage of using measures instead of indices as in DML or Agda, as decouples property from structure, which crucially enables the use of the same structure for many different purposes. That is, we need not know a priori what indices to bake into the structure, but can define a generic structure and refine it a posteriori as needed with new measures.
We are almost ready to begin creating a dimension aware API for lists; one last thing that is useful is a couple of aliases for describing lists of a given dimension.
To make signatures symmetric let’s define an alias for
plain old (unrefined) lists:
A ListN is a list with exactly N
elements,
and a ListX
is a list whose size is the same as another
list X
. Note that when defining refinement type aliases, we
use uppercase variables like N
and X
to
distinguish value parameters from the lowercase type
parameters like a
.
With the types and aliases firmly in our pockets, let us write
dimension-aware variants of the usual list functions. The
implementations are the same as in the standard library i.e. Data.List
,
but the specifications are enriched with dimension information.
Exercise: (Map):
map yields a list with the same size as the input. Fix
the specification of map
so that the prop_map
is verified.
Exercise: (Reverse): We can reverse
the
elements of a list as shown below, using the tail recursive function
go
. Fix the signature for go
so that
LiquidHaskell can prove the specification for reverse
.
Hint: How big is the list returned by
go
?
zipWith requires both lists to have the same
size, and produces a list with that same size. 5
unsafeZip The signature for zipWith
is
quite severe – it rules out the case where the zipping occurs only up to
the shorter input. Here’s a function that actually allows for that case,
where the output type is the shorter of the two inputs:
The output type uses the predicate Tinier Xs Ys Zs
which
defines the length of Xs
to be the smaller of that of
Ys
and Zs
.6
Exercise: (Zip Unless Empty): In my experience,
zip
as shown above is far too permissive and lets all sorts
of bugs into my code. As middle ground, consider zipOrNull
below. Write a specification for zipOrNull
such that the
code below is verified by LiquidHaskell.
Hint: Yes, the type is rather gross; it uses a bunch
of disjunctions ||
, conjunctions &&
and implications =>
.
Next, let’s look at some functions that truncate lists, in one way or another.
Take lets us grab the first k
elements
from a list:
The alias ListGE a n
denotes lists whose length is at
least n
:
Exercise: (Drop): Drop
is the yang to
take
’s yin: it returns the remainder after extracting the
first k
elements. Write a suitable specification for it so
that the below typechecks.
Exercise: (Take it easy): The version
take'
above is too restrictive; it insists that the list
actually have at least n
elements. Modify the signature for
the real take
function so that the code below is
accepted by LiquidHaskell.
The Partition function breaks a list into two sub-lists
of elements that either satisfy or fail a user supplied predicate.
We would like to specify that the sum of the output tuple’s dimensions equal the input list’s dimension. Lets write measures to access the elements of the output:
{-@ measure fst @-}
fst (x, _) = x
{-@ measure snd @-}
snd (_, y) = y
We can now refine the type of partition
as:
where Sum2 V N
holds for a pair of lists dimensions add
to N
:
Exercise: (QuickSort): Use partition
to
implement quickSort
.
We can use the dimension aware lists to create a safe vector API.
Legal Vectors are those whose vDim
field
actually equals the size of the underlying list:
When vDim
is used a selector function, it returns the
vDim
field of x
.
The refined data type prevents the creation of illegal vectors:
As usual, it will be handy to have a few aliases.
To Create a Vector
safely, we can start
with the empty vector vEmp
and then add elements one-by-one
with vCons
:
To Access vectors at a low-level, we can use
equivalents of head and tail, which only work on
non-empty Vector
s:
To Iterate over a vector we can use the
for
combinator:
Binary Pointwise Operations should only be applied to
compatible vectors, i.e. vectors with equal dimensions. We can
write a generic binary pointwise operator:
The Dot Product of two Vector
s can be now
implemented in a wholemeal and dimension safe manner, as:
Exercise: (Vector Constructor): Complete the
specification and implementation of
vecFromList
which creates a Vector
from a plain list.
Exercise: (Flatten): Write a function to
flatten
a nested Vector
.
The Cross Product of two vectors can now be computed in
a nice wholemeal style, by a nested iteration followed by a
flatten
.
The same methods let us create a dimension safe Matrix API which ensures that only legal matrices are created and that operations are performed on compatible matrices.
Legal Matrices are those where the dimension of the
outer vector equals the number of rows mRow
and the
dimension of each inner vector is mCol
. We can specify
legality in a refined data definition:
Notice that we avoid disallow degenerate matrices by requiring the dimensions to be positive.
It is convenient to have an alias for matrices of a given size:
For example, we can use the above to write type:
Exercise: (Legal Matrix): Modify the definitions of
bad1
and bad2
so that they are legal matrices
accepted by LiquidHaskell.
Exercise: (Matrix Constructor): Write a function to
construct a Matrix
from a nested list.
Exercise: (Refined Matrix Constructor): Refine the
specification for matFromList
so that the following is
accepted by LiquidHaskell.
Hint: It is easy to specify the number of rows from
xss
. How will you figure out the number of columns? A
measure may be useful.
Matrix Multiplication Finally, let’s implement matrix
multiplication. You’d think we did it already, but in fact the
implementation at the top of this chapter is all wrong (run it and see!)
We cannot just multiply any two matrices: the number of columns
of the first must equal to the rows of the second – after which
point the result comprises the dotProduct
of the rows of
the first matrix with the columns of the second.
To iterate over the columns of the matrix my
we
just transpose
it so the columns become rows.
Exercise: (Matrix Transpose): Use the
Vector
API to complete the implementation of
txgo
. For inspiration, you might look at the implementation
of Data.List.transpose
from the prelude.
Better still, don’t.
Hint: As shown by ok23
and
ok32
, transpose
works by stripping out the
head
s of the input rows, to create the corresponding output
rows.
In this chapter, we saw how to use measures to describe numeric
properties of structures like lists (Vector
) and nested
lists (Matrix
).
Measures are structurally recursive functions, with a single equation per data constructor,
Measures can be used to create refined data definitions that prevent the creation of illegal values,
Measures can then be used to enable safe wholemeal programming, via dimension-aware APIs that ensure that operators only apply to compatible values.
We can use numeric measures to encode various other properties of data structures. We will see examples ranging from high-level AVL trees, to low-level safe pointer arithmetic.
In a later chapter we will use this API to implement K-means clustering.↩︎
In fact, while the implementation of
matProd
breezes past GHC it is quite wrong!↩︎
We could just use vDim
, but that is a cheat
as there is no guarantee that the field’s value actually equals the size
of the list!↩︎
Recall that these must be inductively defined functions, with a single equation per data-constructor↩︎
As made explicit by the call to die
, the
input type rules out the case where one list is empty and the
other is not, as in that case the former’s length is zero while the
latter’s is not, and hence, different.↩︎
In logic, if p then q else r
is the same as
p => q && not p => r
.↩︎