loop
Recall the following from the introduction:
ghci> :m +Data.Map
ghci> let m = fromList [ ("haskell" , "lazy")
, ("javascript", "eager")]
ghci> m ! "haskell"
"lazy"
ghci> m ! "python"
"*** Exception: key is not in the map
The problem illustrated above is quite a pervasive one; associative maps pop up everywhere. Failed lookups are the equivalent of NullPointerDereference
exceptions in languages like Haskell. It is rather difficult to use Haskell’s type system to precisely characterize the behavior of associative map APIs as ultimately, this requires tracking the dynamic set of keys in the map.
In this case study, we’ll see how to combine two techniques, measures and refined data types, to analyze programs that implement and use associative maps (e.g. Data.Map
or Data.HashMap
).
Lets start by defining a refined API for Associative Maps that tracks the set of keys stored in the map, in order to statically ensure the safety of lookups.
Types First, we need a type for Map
s. As usual, lets parameterize the type with k
for the type of keys and v
for the type of values:
data Map k v -- Currently left abstract
Keys To talk about the set of keys in a map, we will use a measure
measure keys :: Map k v -> Set k
that associates each Map
to the Set
of its defined keys. Next, we use the above measure, and the usual Set
operators to refine the types of the functions that create, add and lookup key-value bindings, in order to precisely track, within the type system, the keys
that are dynamically defined within each Map
. 1
Empty Map
s have no keys in them. Hence, we type the empty Map
as:
emp :: {m:Map k v | Empty (keys m)}
Add The function set
takes a key \(k\) a value \(v\) and a map m
and returns the new map obtained by extending m
with the binding \({k \mapsto v}\). Thus, the set of keys
of the output Map
includes those of the input plus the singleton \(k\), that is:
set :: k:k -> v -> m:Map k v -> {n: Map k v| AddKey k m n}
predicate AddKey K M N = keys N = Set_cup (Set_sng K) (keys M)
Query Finally, queries will only succeed for keys that are defined a given Map
. Thus, we define an alias:
predicate HasKey K M = In K (keys M)
and use it to type mem
which checks if a key is defined in the Map
and get
which actually returns the value associated with a given key.
-- | Check if key is defined
mem :: k:k -> m:Map k v -> {v:Bool | v <=> HasKey k m}
-- | Lookup key's value
get :: k:k -> {m:Map k v | HasKey k m} -> v
Rather than jumping into the implementation of the above Map
API, lets write a client that uses Map
s to implement an interpreter for a tiny language. In particular, we will use maps as an environment containing the values of bound variables, and we will use the refined API to ensure that lookups never fail, and hence, that well-scoped programs always reduce to a value.
Expressions Lets work with a simple language with integer constants, variables, binding and arithmetic operators:2
Values We can use refinements to formally describe values as a subset of Expr
allowing us to reuse a bunch of code. To this end, we simply define a (measure
) predicate characterizing values:
and then we can use the lifted measure
to define an alias for Val
denoting values:
we can use the above to write simple operators on Val
, for example:
Environments let us save values for the local" i.e. let-bound variables; when evaluating an expression Var x
we simply look up the value of x
in the environment. This is why Map
s were invented! Lets define our environments as Map
s from Var
iables to Val
ues:
The above definition essentially specifies, inside the types, an eager evaluation strategy: LiquidHaskell will prevent us from sticking unevaluated Expr
s inside the environments.
Evaluation proceeds via a straightforward recursion over the structure of the expression. When we hit a Var
we simply query its value from the environment. When we hit a Let
we compute the bound expression and tuck its value into the environment before proceeding within.
The above eval
seems rather unsafe; whats the guarantee that get x g
will succeed? For example, surely trying:
ghci> eval emp (Var "x")
will lead to some unpleasant crash. Shouldn’t we check if the variables is present and if not, fail with some sort of Variable Not Bound
error? We could, but we can do better: we can prove at compile time, that such errors will not occur.
Free Variables are those whose values are not bound within an expression, that is, the set of variables that appear in the expression, but are not bound by a dominating Let
. We can formalize this notion as a (lifted) function:
An Expression is Closed with respect to an environment G
if all the free variables in the expression appear in G
, i.e. the environment contains bindings for all the variables in the expression that are not bound within the expression. As we’ve seen repeatedly, often a whole pile of informal hand-waving, can be succinctly captured by a type definition that says the free
variables in the Expr
must be contained in the keys
of the environment G
:
Closed Evaluation never goes wrong, i.e. we can ensure that eval
will not crash with unbound variables, as long as it is invoked with suitable environments:
We can be sure an Expr
is well-scoped if it has no free variables.Lets use that to write a “top-level” evaluator:
Exercise: (Wellformedness Check): Complete the definition of the below function which checks if an Expr
is well formed before eval
uating it:
Proof is all well and good, in the end, you need a few sanity tests to kick the tires. So:
Exercise: (Closures): Extend the language above to include functions. That is, extend Expr
as below, (and eval
and free
respectively.)
data Expr = ... | Fun Var Expr | App Expr Expr
Just focus on ensuring the safety of variable lookups; ensuring full type-safety (i.e. every application is to a function) is rather more complicated and beyond the scope of what we’ve seen so far.
We just saw how easy it is to use the Associative Map API to ensure the safety of lookups, even though the Map
has a “dynamically” generated set of keys. Next, lets see how we can implement a Map
library that respects the API using Binary Search Trees
Data Type First, lets provide an implementation of the hitherto abstract data type for Map
. We shall use Binary Search Trees, wherein, at each Node
, the left
(resp. right
) subtree has keys that are less than (resp. greater than) the root key
.
Recall that the above refined data definition yields strengthened data constructors that statically ensure that only legal, binary-search ordered trees are created in the program.
Defined Keys Next, we must provide an implementation of the notion of the keys
that are defined for a given Map
. This is achieved via the lifted measure function:
Armed with the basic type and measure definition, we can start to fill in the operations for Maps
.
Exercise: (Empty Maps): To make sure you are following, fill in the definition for an emp
ty Map:
Exercise: (Insert): To add a key k'
to a Map
we recursively traverse the Map
zigging left
or right
depending on the result of comparisons with the keys along the path. Unfortunately, the version below has an (all too common!) bug, and hence, is rejected by LiquidHaskell. Find and fix the bug so that the function is verified.
Lookup Next, lets write the mem
function that returns the value associated with a key k'
. To do so we just compare k'
with the root key, if they are equal, we return the binding, and otherwise we go down the left
(resp. right
) subtree if sought for key is less (resp. greater) than the root key
. Crucially, we want to check that lookup never fails, and hence, we implement the Tip
(i.e. empty) case with die
gets LiquidHaskell to prove that that case is indeed dead code, i.e. never happens at run-time.
Unfortunately the function above is rejected by LiquidHaskell. This is a puzzler (and a bummer!) because in fact it is correct. So what gives? Well, lets look at the error for the call get' k' l
src/07-case-study-associative-maps.lhs:411:25: Error: Liquid Type Mismatch
Inferred type
VV : Map a b | VV == l
not a subtype of Required type
VV : Map a b | Set_mem k' (keys VV)
In Context
VV : Map a b | VV == l
k : a
l : Map a b
k' : a
LiquidHaskell is unable to deduce that the key k'
definitely belongs in the left
subtree l
. Well, lets ask ourselves: why must k'
belong in the left subtree? From the input, we know HasKey k' m
i.e. that k'
is somewhere in m
. That is one of the following holds:
k' == k
or,HasKey k' l
or,HasKey k' r
.As the preceding guard k' == k
fails, we (and LiquidHaskell) can rule out case (1). Now, what about the Map
tells us that case (2) must hold, i.e. that case (3) cannot hold? The BST invariant, all keys in r
exceed k
which itself exceeds k'
. That is, all nodes in r
are disequal to k'
and hence k'
cannot be in r
, ruling out case (3). Formally, we need the fact that: \[\forall\ \vkey,\ \vt. \vt :: {\vMap\ \reft{\vkey'}{k}{\vkey' \not = \vkey}\ v}
\ \Rightarrow\
\lnot (\vHasKey\ \vkey\ \vt)\]
Conversion Lemmas Unfortunately, LiquidHaskell cannot automatically deduce facts like the above, as they relate refinements of a container’s type parameters (here: \(\vkey' \not = \vkey\), which refines the Map
s first type parameter) with properties of the entire container (here: \(\vHasKey\ \vkey\ \vt\)). Fortunately, it is easy to state, prove and use facts like the above, via lemmas which are just functions. 3
Defining Lemmas To state a lemma, we need only convert it into a type by viewing universal quantifiers as function parameters, and implications as function types:
Proving Lemmas Note how the signature for lemma_notMem
corresponds exactly to the missing fact from above. The “output” type is a Bool
refined with the proposition that we desire. We prove the lemma simply by traversing the tree which lets LiquidHaskell build up a proof for the output fact by inductively combining the proofs from the subtrees.
Using Lemmas To use a lemma, we need to instantiate it to the particular keys and trees we care about, by “calling” the lemma function, and forcing its result to be in the environment used to typecheck the expression where we want to use the lemma. Say what? Here’s how to use lemmas to verify get
:
By calling lemma_notMem
we create a dummy Bool
refined with the fact not (HasKey k' r)
(resp. not (HasKey k' l)
). We force the calls to get k' l
(resp. get k' r
) to be typechecked using the materialized refinement by wrapping the calls in assert
:
Ghost Values This technique of materializing auxiliary facts via ghost values is a well known idea in program verification. Usually, one has to take care to ensure that ghost computations do not interfere with the regular computations. If we had to actually execute lemma_notMem
it would wreck the efficient logarithmic lookup time, assuming we kept the trees balanced, as we would traverse the entire tree instead of just the short path to a node. 4
Laziness comes to our rescue: as the ghost value is (trivially) not needed, it is never computed. In fact, it is straightforward to entirely erase the call in the compiled code, which lets us freely assert
such lemma
s to carry out proofs, without paying any runtime penalty. In an eager language we would have to do a bit of work to specifically mark the computation as a ghost or irrelevant but in the lazy setting we get this for free.
Exercise: (Membership Test): Capisce? Fix the definition of mem
so that it verifiably implements the given signature.
Exercise: (Fresh): To make sure you really understand this business of ghosts values and proofs, complete the implementation of the following function which returns a fresh
integer that is distinct from all the values in its input list:
To refresh your memory, here are the definitions for Elem
we saw earlier
In this chapter we saw how to combine several of the techniques from previous chapters in a case study. We learned how to:
Define an API for associative maps that used refinements to track the set of keys
stored in a map, in order to prevent lookup failures, the NullPointerDereference
errors of the functional world,
Use the API to implement a small interpreter that is guaranteed to never fail with UnboundVariable
errors, as long as the input expressions were closed,
Implement the API using Binary Search Trees; in particular, using ghost lemmas to assert
facts that LiquidHaskell is otherwise unable to deduce automatically.
Recall that Empty
, Union
, In
and the other Set
operators are described here.↩︎
Feel free to embellish the language with fancier features like functions, tuples etc.↩︎
Why does LiquidHaskell not automatically deduce this information? This is tricky to describe. Intuitively, because there is no way of automatically connecting the traversal corresponding to keys
with the type variable k
. I wish I had a better way to explain this rather subtle point; suggestions welcome!↩︎
Which is what makes dynamic contract checking inefficient for such invariants.↩︎