loop
Recall the following from the introduction:
ghci> :m +Data.Map
ghci> let m = fromList [ ("haskell" , "lazy")
, ("javascript", "eager")]
ghci> m ! "haskell"
"lazy"
ghci> m ! "python"
"*** Exception: key is not in the map
The problem illustrated above is quite a pervasive one; associative
maps pop up everywhere. Failed lookups are the equivalent of
NullPointerDereference
exceptions in languages like
Haskell. It is rather difficult to use Haskell’s type system to
precisely characterize the behavior of associative map APIs as
ultimately, this requires tracking the dynamic set of keys in
the map.
In this case study, we’ll see how to combine two techniques, measures and refined
data types, to analyze programs that implement and
use associative maps (e.g. Data.Map
or
Data.HashMap
).
Lets start by defining a refined API for Associative Maps that tracks the set of keys stored in the map, in order to statically ensure the safety of lookups.
Types First, we need a type for Map
s. As
usual, let’s parameterize the type with k
for the type of
keys and v
for the type of values:
data Map k v -- Currently left abstract
Keys To talk about the set of keys in a map, we will
use a measure
measure keys :: Map k v -> Set k
that associates each Map
to the Set
of its
defined keys. Next, we use the above measure, and the usual
Set
operators to refine the types of the functions that
create, add and lookup key-value bindings, in
order to precisely track, within the type system, the keys
that are dynamically defined within each Map
. 1
Empty Map
s have no keys in them. Hence, we
type the empty Map
as:
emp :: {m:Map k v | Empty (keys m)}
Add The function set
takes a key \(k\) a value \(v\) and a map m
and returns
the new map obtained by extending m
with the binding \({k \mapsto v}\). Thus, the set of
keys
of the output Map
includes those of the
input plus the singleton \(k\), that
is:
set :: k:k -> v -> m:Map k v -> {n: Map k v| AddKey k m n}
predicate AddKey K M N = keys N = Set_cup (Set_sng K) (keys M)
Query Finally, queries will only succeed for keys that
are defined a given Map
. Thus, we define an alias:
predicate HasKey K M = In K (keys M)
and use it to type mem
which checks if a key is
defined in the Map
and get
which actually
returns the value associated with a given key.
-- | Check if key is defined
mem :: k:k -> m:Map k v -> {v:Bool | v <=> HasKey k m}
-- | Lookup key's value
get :: k:k -> {m:Map k v | HasKey k m} -> v
Rather than jumping into the implementation of the above
Map
API, let’s write a client that uses
Map
s to implement an interpreter for a tiny language. In
particular, we will use maps as an environment containing the
values of bound variables, and we will use the refined API to
ensure that lookups never fail, and hence, that well-scoped
programs always reduce to a value.
Expressions Lets work with a simple language with
integer constants, variables, binding and arithmetic operators:2
Values We can use refinements to formally describe
values as a subset of Expr
allowing us to reuse a
bunch of code. To this end, we simply define a (measure
)
predicate characterizing values:
and then we can use the lifted measure
to define an
alias for Val
denoting values:
we can use the above to write simple operators on
Val
, for example:
Environments let us save values for the local”
i.e. let-bound variables; when evaluating an expression
Var x
we simply look up the value of x
in the
environment. This is why Map
s were invented! Lets define
our environments as Map
s from Var
iables to
Val
ues:
The above definition essentially specifies, inside the types, an
eager evaluation strategy: LiquidHaskell will prevent us from
sticking unevaluated Expr
s inside the environments.
Evaluation proceeds via a straightforward recursion
over the structure of the expression. When we hit a Var
we
simply query its value from the environment. When we hit a
Let
we compute the bound expression and tuck its value into
the environment before proceeding within.
The above eval
seems rather unsafe; whats the guarantee
that get x g
will succeed? For example, surely trying:
ghci> eval emp (Var "x")
will lead to some unpleasant crash. Shouldn’t we check if
the variables is present and if not, fail with some sort of
Variable Not Bound
error? We could, but we can do better:
we can prove at compile time, that such errors will not occur.
Free Variables are those whose values are not
bound within an expression, that is, the set of variables that
appear in the expression, but are not bound by a
dominating Let
. We can formalize this notion as a (lifted)
function:
An Expression is Closed with respect to an environment
G
if all the free variables in the expression
appear in G
, i.e. the environment contains bindings for all
the variables in the expression that are not bound within the
expression. As we’ve seen repeatedly, often a whole pile of informal
hand-waving, can be succinctly captured by a type definition that says
the free
variables in the Expr
must be
contained in the keys
of the environment
G
:
Closed Evaluation never goes wrong, i.e. we can ensure
that eval
will not crash with unbound variables, as long as
it is invoked with suitable environments:
We can be sure an Expr
is well-scoped if it has
no free variables.Lets use that to write a “top-level”
evaluator:
Exercise: (Wellformedness Check): Complete the
definition of the below function which checks if an
Expr
is well formed before eval
uating it:
Proof is all well and good, in the end, you need a few sanity tests to kick the tires. So:
Exercise: (Closures): Extend the language above to
include functions. That is, extend Expr
as below, (and
eval
and free
respectively.)
data Expr = ... | Fun Var Expr | App Expr Expr
Just focus on ensuring the safety of variable lookups; ensuring full type-safety (i.e. every application is to a function) is rather more complicated and beyond the scope of what we’ve seen so far.
We just saw how easy it is to use the Associative Map API to ensure the safety of lookups, even though the
Map
has a “dynamically” generated set of keys. Next, let’s
see how we can implement a Map
library that
respects the API using Binary Search
Trees
Data Type First, let’s provide an implementation of the
hitherto abstract data type for Map
. We shall use Binary
Search Trees, wherein, at each Node
, the left
(resp. right
) subtree has keys that are less than (resp.
greater than) the root key
.
Recall that the above refined data definition yields strengthened data constructors that statically ensure that only legal, binary-search ordered trees are created in the program.
Defined Keys Next, we must provide an implementation of
the notion of the keys
that are defined for a given
Map
. This is achieved via the lifted measure function:
Armed with the basic type and measure definition, we can start to
fill in the operations for Maps
.
Exercise: (Empty Maps): To make sure you are following,
fill in the definition for an emp
ty Map:
Exercise: (Insert): To add a key k'
to a
Map
we recursively traverse the Map
zigging
left
or right
depending on the result of
comparisons with the keys along the path. Unfortunately, the version
below has an (all too common!) bug, and hence, is rejected by
LiquidHaskell. Find and fix the bug so that the function is
verified.
Lookup Next, let’s write the mem
function
that returns the value associated with a key k'
. To do so
we just compare k'
with the root key, if they are equal, we
return the binding, and otherwise we go down the left
(resp. right
) subtree if sought for key is less (resp.
greater) than the root key
. Crucially, we want to check
that lookup never fails, and hence, we implement the
Tip
(i.e. empty) case with die
gets
LiquidHaskell to prove that that case is indeed dead code, i.e. never
happens at run-time.
Unfortunately the function above is rejected
by LiquidHaskell. This is a puzzler (and a bummer!) because in fact it
is correct. So what gives? Well, let’s look at the error for
the call get' k' l
src/07-case-study-associative-maps.lhs:411:25: Error: Liquid Type Mismatch
Inferred type
VV : Map a b | VV == l
not a subtype of Required type
VV : Map a b | Set_mem k' (keys VV)
In Context
VV : Map a b | VV == l
k : a
l : Map a b
k' : a
LiquidHaskell is unable to deduce that the key
k'
definitely belongs in the left
subtree
l
. Well, let’s ask ourselves: why must
k'
belong in the left subtree? From the input, we know
HasKey k' m
i.e. that k'
is somewhere
in m
. That is one of the following holds:
k' == k
or,HasKey k' l
or,HasKey k' r
.As the preceding guard k' == k
fails, we (and
LiquidHaskell) can rule out case (1). Now, what about the
Map
tells us that case (2) must hold, i.e. that case (3)
cannot hold? The BST invariant, all keys in r
exceed k
which itself exceeds k'
. That is, all
nodes in r
are disequal to k'
and
hence k'
cannot be in r
, ruling out case (3).
Formally, we need the fact that: \[\forall\
\vkey,\ \vt. \vt :: {\vMap\ \reft{\vkey'}{k}{\vkey' \not =
\vkey}\ v}
\ \Rightarrow\
\lnot (\vHasKey\ \vkey\ \vt)\]
Conversion Lemmas Unfortunately, LiquidHaskell
cannot automatically deduce facts like the above, as they
relate refinements of a container’s type parameters (here:
\(\vkey' \not = \vkey\), which
refines the Map
s first type parameter) with properties of
the entire container (here: \(\vHasKey\ \vkey\
\vt\)). Fortunately, it is easy to state, prove
and use facts like the above, via lemmas which are
just functions. 3
Defining Lemmas To state a lemma, we need only convert
it into a type by viewing universal
quantifiers as function parameters, and implications as function
types:
Proving Lemmas Note how the signature for
lemma_notMem
corresponds exactly to the missing fact from
above. The “output” type is a Bool
refined with the
proposition that we desire. We prove the lemma simply by
traversing the tree which lets LiquidHaskell build up a proof
for the output fact by inductively combining the proofs from the
subtrees.
Using Lemmas To use a lemma, we need to
instantiate it to the particular keys and trees we care about,
by “calling” the lemma function, and forcing its result to be in the
environment used to typecheck the expression where we want to
use the lemma. Say what? Here’s how to use lemmas to verify
get
:
By calling lemma_notMem
we create a dummy
Bool
refined with the fact not (HasKey k' r)
(resp. not (HasKey k' l)
). We force the calls to
get k' l
(resp. get k' r
) to be typechecked
using the materialized refinement by wrapping the calls in
assert
:
Ghost Values This technique of materializing auxiliary
facts via ghost values is a well known idea in program
verification. Usually, one has to take care to ensure that ghost
computations do not interfere with the regular computations. If we had
to actually execute lemma_notMem
it would wreck
the efficient logarithmic lookup time, assuming we kept the trees balanced, as we would traverse the entire
tree instead of just the short path to a node. 4
Laziness comes to our rescue: as the ghost value is
(trivially) not needed, it is never computed. In fact, it is
straightforward to entirely erase the call in the compiled
code, which lets us freely assert
such lemma
s
to carry out proofs, without paying any runtime penalty. In an eager
language we would have to do a bit of work to specifically mark the
computation as a ghost or irrelevant but
in the lazy setting we get this for free.
Exercise: (Membership Test): Capisce? Fix the
definition of mem
so that it verifiably implements the
given signature.
Exercise: (Fresh): To make sure you really understand
this business of ghosts values and proofs, complete the implementation
of the following function which returns a fresh
integer
that is distinct from all the values in its input list:
To refresh your memory, here are the definitions for
Elem
we saw earlier
In this chapter we saw how to combine several of the techniques from previous chapters in a case study. We learned how to:
Define an API for associative maps that used refinements
to track the set of keys
stored in a map, in order
to prevent lookup failures, the NullPointerDereference
errors of the functional world,
Use the API to implement a small interpreter that is
guaranteed to never fail with UnboundVariable
errors, as
long as the input expressions were closed,
Implement the API using Binary Search Trees; in
particular, using ghost lemmas to assert
facts
that LiquidHaskell is otherwise unable to deduce automatically.
Recall that Empty
, Union
,
In
and the other Set
operators are described
here.↩︎
Feel free to embellish the language with fancier features like functions, tuples etc.↩︎
Why does LiquidHaskell not automatically deduce this
information? This is tricky to describe. Intuitively, because there is
no way of automatically connecting the traversal corresponding
to keys
with the type variable k
. I wish I had
a better way to explain this rather subtle point; suggestions welcome!↩︎
Which is what makes dynamic contract checking inefficient for such invariants.↩︎