**1.**Introduction {#intro}**1.1.**Well-Typed Programs Do Go Wrong {#gowrong}**1.2.**Refinement Types**1.3.**Audience**1.4.**Getting Started**2.**Logic & SMT**2.1.**Syntax**2.2.**Semantics {#semantics}**2.3.**Verification Conditions**2.4.**Examples: Propositions**2.5.**Examples: Arithmetic**2.6.**Examples: Uninterpreted Function**2.7.**Recap**3.**Refinement Types**3.1.**Defining Types {#definetype}**3.2.**Errors**3.3.**Subtyping**3.4.**Writing Specifications**3.5.**Refining Function Types: Pre-conditions**3.6.**Refining Function Types: Post-conditions**3.7.**Testing Values: Booleans and Propositions {#propositions}**3.8.**Putting It All Together**3.9.**Recap**4.**Polymorphism {#polymorphism}**4.1.**Specification: Vector Bounds {#vectorbounds}**4.2.**Verification: Vector Lookup**4.3.**Inference: Our First Recursive Function**4.4.**Higher-Order Functions: Bottling Recursion in a`loop`

**4.5.**Refinements and Polymorphism {#sparsetype}**4.6.**Recap**5.**Refined Datatypes {#refineddatatypes}**5.1.**Sparse Vectors Revisited {#autosmart}**5.2.**Ordered Lists {#orderedlists}**5.3.**Ordered Trees {#binarysearchtree}**5.4.**Recap**6.**Boolean Measures {#boolmeasures}**6.1.**Partial Functions**6.2.**Lifting Functions to Measures {#usingmeasures}**6.3.**A Safe List API**6.4.**Recap**7.**Numeric Measures {#numericmeasure}**7.1.**Wholemeal Programming**7.2.**Specifying List Dimensions**7.3.**Lists: Size Preserving API**7.4.**Lists: Size Reducing API {#listreducing}**7.5.**Dimension Safe Vector API**7.6.**Dimension Safe Matrix API**7.7.**Recap**8.**Elemental Measures {#setmeasure}**8.1.**Talking about Sets**8.2.**Proving QuickCheck Style Properties {#quickcheck}**8.3.**Content-Aware List API {#listelems}**8.4.**Permutations**8.5.**Uniqueness**8.6.**Unique Zippers**8.7.**Recap**9.**Case Study: Okasaki's Lazy Queues {#lazyqueue}**10.**Case Study: Associative Maps**10.1.**Specifying Maps {#mapapi}**10.2.**Using Maps: Well Scoped Expressions**10.3.**Implementing Maps: Binary Search Trees {#lemmas}**10.4.**Recap**11.**Case Study: Pointers & Bytes {#case-study-pointers}**11.1.**HeartBleeds in Haskell**11.2.**Low-level Pointer API**11.3.**A Refined Pointer API**11.4.**Assumptions vs Guarantees**11.5.**ByteString API**11.6.**Application API**11.7.**Nested ByteStrings**11.8.**Recap: Types Against Overflows**12.**Case Study: AVL Trees {#case-study-avltree}

{-@ LIQUID "--no-termination" @-}
module Tutorial_10_Case_Study_Associative_Maps where
import Data.Set hiding (elems)
-- | Boilerplate
{-@ die :: {v:_ | false} -> a @-}
die x = error x
-- | Haskell Definitions
data Map k v = Node { key :: k
, value :: v
, left :: Map k v
, right :: Map k v }
| Tip
lemma_notMem :: k -> Map k v -> Bool
set :: (Ord k) => k -> v -> Map k v -> Map k v
get :: (Ord k) => k -> Map k v -> v
get' :: (Ord k) => k -> Map k v -> v
mem :: (Ord k) => k -> Map k v -> Bool
emp :: Map k v
elems :: (Ord a) => [a] -> Set a
fresh :: [Int] -> Int
-- | Set Interface
{-@ predicate In X Xs = Set_mem X Xs @-}
{-@ predicate Subset X Y = Set_sub X Y @-}
{-@ predicate Empty X = Set_emp X @-}
{-@ predicate Union X Y Z = X = Set_cup Y Z @-}
{-@ predicate Union1 X Y Z = Union X (Set_sng Y) Z @-}
-- | Predicate Aliases
{-@ predicate NoKeys M = Empty (keys M) @-}
{-@ predicate HasKey K M = In K (keys M) @-}
{-@ predicate AddKey K M N = Union1 (keys N) K (keys M) @-}
-- {-@ ignore tests @-}
-- {-@ ignore set @-}
-- {-@ ignore get @-}
-- {-@ ignore get' @-}
-- {-@ ignore mem @-}

Recall the following from the introduction:

```
ghci> :m +Data.Map
ghci> let m = fromList [ ("haskell" , "lazy")
, ("javascript", "eager")]
ghci> m ! "haskell"
"lazy"
ghci> m ! "python"
"*** Exception: key is not in the map
```

The problem illustrated above is quite a pervasive one; associative
maps pop up everywhere. Failed lookups are the equivalent of
`NullPointerDereference`

exceptions in languages like
Haskell. It is rather difficult to use Haskell’s type system to
precisely characterize the behavior of associative map APIs as
ultimately, this requires tracking the *dynamic set of keys* in
the map.

In this case study, we’ll see how to combine two techniques, measures and refined
data types, to analyze programs that *implement* and
*use* associative maps (e.g. `Data.Map`

or
`Data.HashMap`

).

Lets start by defining a *refined API* for Associative Maps
that tracks the set of keys stored in the map, in order to statically
ensure the safety of lookups.

**Types** First, we need a type for `Map`

s. As
usual, let’s parameterize the type with `k`

for the type of
keys and `v`

for the type of values:

`data Map k v -- Currently left abstract`

**Keys** To talk about the set of keys in a map, we will
use a *measure*

`measure keys :: Map k v -> Set k`

that associates each `Map`

to the `Set`

of its
defined keys. Next, we use the above measure, and the usual
`Set`

operators to refine the types of the functions that
*create*, *add* and *lookup* key-value bindings, in
order to precisely track, within the type system, the `keys`

that are dynamically defined within each `Map`

. ^{1}

**Empty** `Map`

s have no keys in them. Hence, we
type the empty `Map`

as:

`emp :: {m:Map k v | Empty (keys m)}`

**Add** The function `set`

takes a key \(k\) a value \(v\) and a map `m`

and returns
the new map obtained by extending `m`

with the binding \({k \mapsto v}\). Thus, the set of
`keys`

of the output `Map`

includes those of the
input plus the singleton \(k\), that
is:

```
set :: k:k -> v -> m:Map k v -> {n: Map k v| AddKey k m n}
predicate AddKey K M N = keys N = Set_cup (Set_sng K) (keys M)
```

**Query** Finally, queries will only succeed for keys that
are defined a given `Map`

. Thus, we define an alias:

`predicate HasKey K M = In K (keys M)`

and use it to type `mem`

which *checks* if a key is
defined in the `Map`

and `get`

which actually
returns the value associated with a given key.

```
-- | Check if key is defined
mem :: k:k -> m:Map k v -> {v:Bool | v <=> HasKey k m}
-- | Lookup key's value
get :: k:k -> {m:Map k v | HasKey k m} -> v
```

Rather than jumping into the *implementation* of the above
`Map`

API, let’s write a *client* that uses
`Map`

s to implement an interpreter for a tiny language. In
particular, we will use maps as an *environment* containing the
values of *bound variables*, and we will use the refined API to
ensure that *lookups never fail*, and hence, that well-scoped
programs always reduce to a value.

**Expressions** Lets work with a simple language with
integer constants, variables, binding and arithmetic operators:^{2}

type Var = String
data Expr = Const Int
| Var Var
| Plus Expr Expr
| Let Var Expr Expr

**Values** We can use refinements to formally describe
*values* as a subset of `Expr`

allowing us to reuse a
bunch of code. To this end, we simply define a (`measure`

)
predicate characterizing values:

{-@ measure val @-}
val :: Expr -> Bool
val (Const _) = True
val (Var _) = False
val (Plus _ _) = False
val (Let _ _ _ ) = False

and then we can use the lifted `measure`

to define an
alias for `Val`

denoting values:

{-@ type Val = {v:Expr | val v} @-}

we can use the above to write simple *operators* on
`Val`

, for example:

{-@ plus :: Val -> Val -> Val @-}
plus (Const i) (Const j) = Const (i+j)
plus _ _ = die "Bad call to plus"

**Environments** let us save values for the local”
i.e. *let-bound* variables; when evaluating an expression
`Var x`

we simply look up the value of `x`

in the
environment. This is why `Map`

s were invented! Lets define
our environments as `Map`

s from `Var`

iables to
`Val`

ues:

{-@ type Env = Map Var Val @-}

The above definition essentially specifies, inside the types, an
*eager* evaluation strategy: LiquidHaskell will prevent us from
sticking unevaluated `Expr`

s inside the environments.

**Evaluation** proceeds via a straightforward recursion
over the structure of the expression. When we hit a `Var`

we
simply query its value from the environment. When we hit a
`Let`

we compute the bound expression and tuck its value into
the environment before proceeding within.

eval _ i@(Const _) = i
eval g (Var x) = get x g
eval g (Plus e1 e2) = plus (eval g e1) (eval g e2)
eval g (Let x e1 e2) = eval g' e2
where
g' = set x v1 g
v1 = eval g e1

The above `eval`

seems rather unsafe; whats the guarantee
that `get x g`

will succeed? For example, surely trying:

`ghci> eval emp (Var "x")`

will lead to some unpleasant crash. Shouldn’t we *check* if
the variables is present and if not, fail with some sort of
`Variable Not Bound`

error? We could, but we can do better:
we can prove at compile time, that such errors will not occur.

**Free Variables** are those whose values are *not*
bound within an expression, that is, the set of variables that
*appear* in the expression, but are not *bound* by a
dominating `Let`

. We can formalize this notion as a (lifted)
function:

{-@ measure free @-}
free :: Expr -> (Set Var)
free (Const _) = empty
free (Var x) = singleton x
free (Plus e1 e2) = xs1 `union` xs2
where
xs1 = free e1
xs2 = free e2
free (Let x e1 e2) = xs1 `union` (xs2 `difference` xs)
where
xs1 = free e1
xs2 = free e2
xs = singleton x

**An Expression is Closed** with respect to an environment
`G`

if all the *free* variables in the expression
appear in `G`

, i.e. the environment contains bindings for all
the variables in the expression that are *not* bound within the
expression. As we’ve seen repeatedly, often a whole pile of informal
hand-waving, can be succinctly captured by a type definition that says
the `free`

variables in the `Expr`

must be
contained in the `keys`

of the environment
`G`

:

{-@ type ClosedExpr G = {v:Expr | Subset (free v) (keys G)} @-}

**Closed Evaluation** never goes wrong, i.e. we can ensure
that `eval`

will not crash with unbound variables, as long as
it is invoked with suitable environments:

{-@ eval :: g:Env -> ClosedExpr g -> Val @-}

We can be sure an `Expr`

is well-scoped if it has
*no* free variables.Lets use that to write a “top-level”
evaluator:

{-@ topEval :: {v:Expr | Empty (free v)} -> Val @-}
topEval = eval emp

**Exercise: (Wellformedness Check): **Complete the
definition of the below function which *checks* if an
`Expr`

is well formed before `eval`

uating it:

{-@ evalAny :: Env -> Expr -> Maybe Val @-}
evalAny g e
| ok = Just $ eval g e
| otherwise = Nothing
where
ok = undefined

Proof is all well and good, in the end, you need a few sanity tests to kick the tires. So:

tests = [v1, v2]
where
v1 = topEval e1 -- Rejected by LH
v2 = topEval e2 -- Accepted by LH
e1 = (Var x) `Plus` c1
e2 = Let x c10 e1
x = "x"
c1 = Const 1
c10 = Const 10

**Exercise: (Closures): **Extend the language above to
include functions. That is, extend `Expr`

as below, (and
`eval`

and `free`

respectively.)

`data Expr = ... | Fun Var Expr | App Expr Expr`

Just focus on ensuring the safety of variable lookups; ensuring full type-safety (i.e. every application is to a function) is rather more complicated and beyond the scope of what we’ve seen so far.

We just saw how easy it is to *use* the Associative Map API to ensure the safety of lookups, even though the
`Map`

has a “dynamically” generated set of keys. Next, let’s
see how we can *implement* a `Map`

library that
respects the API using Binary Search
Trees

**Data Type** First, let’s provide an implementation of the
hitherto abstract data type for `Map`

. We shall use Binary
Search Trees, wherein, at each `Node`

, the `left`

(resp. `right`

) subtree has keys that are less than (resp.
greater than) the root `key`

.

{-@ data Map k v = Node { key :: k
, value :: v
, left :: Map {v:k | v < key} v
, right :: Map {v:k | key < v} v }
| Tip
@-}

Recall that the above refined data
definition yields strengthened data constructors that statically ensure
that only legal, *binary-search ordered* trees are created in the
program.

**Defined Keys** Next, we must provide an implementation of
the notion of the `keys`

that are defined for a given
`Map`

. This is achieved via the lifted measure function:

{-@ measure keys @-}
keys :: (Ord k) => Map k v -> Set k
keys Tip = empty
keys (Node k _ l r) = ks `union` kl `union` kr
where
kl = keys l
kr = keys r
ks = singleton k

Armed with the basic type and measure definition, we can start to
fill in the operations for `Maps`

.

**Exercise: (Empty Maps): **To make sure you are following,
fill in the definition for an `emp`

ty Map:

{-@ emp :: {m:Map k v | Empty (keys m)} @-}
emp = undefined

**Exercise: (Insert): **To add a key `k'`

to a
`Map`

we recursively traverse the `Map`

zigging
`left`

or `right`

depending on the result of
comparisons with the keys along the path. Unfortunately, the version
below has an (all too common!) bug, and hence, is *rejected* by
LiquidHaskell. Find and fix the bug so that the function is
verified.

{-@ set :: (Ord k) => k:k -> v -> m:Map k v
-> {n: Map k v | AddKey k m n} @-}
set k' v' (Node k v l r)
| k' == k = Node k v' l r
| k' < k = set k' v l
| otherwise = set k' v r
set k' v' Tip = Node k' v' Tip Tip

**Lookup** Next, let’s write the `mem`

function
that returns the value associated with a key `k'`

. To do so
we just compare `k'`

with the root key, if they are equal, we
return the binding, and otherwise we go down the `left`

(resp. `right`

) subtree if sought for key is less (resp.
greater) than the root `key`

. Crucially, we want to check
that lookup *never fails*, and hence, we implement the
`Tip`

(i.e. empty) case with `die`

gets
LiquidHaskell to prove that that case is indeed dead code, i.e. never
happens at run-time.

{-@ get' :: (Ord k) => k:k -> m:{Map k v| HasKey k m} -> v @-}
get' k' m@(Node k v l r)
| k' == k = v
| k' < k = get' k' l
| otherwise = get' k' r
get' _ Tip = die "Lookup Never Fails"

**Unfortunately** the function above is *rejected*
by LiquidHaskell. This is a puzzler (and a bummer!) because in fact it
*is* correct. So what gives? Well, let’s look at the error for
the call `get' k' l`

```
src/07-case-study-associative-maps.lhs:411:25: Error: Liquid Type Mismatch
Inferred type
VV : Map a b | VV == l
not a subtype of Required type
VV : Map a b | Set_mem k' (keys VV)
In Context
VV : Map a b | VV == l
k : a
l : Map a b
k' : a
```

LiquidHaskell is *unable* to deduce that the key
`k'`

definitely belongs in the `left`

subtree
`l`

. Well, let’s ask ourselves: *why* must
`k'`

belong in the left subtree? From the input, we know
`HasKey k' m`

i.e. that `k'`

is *somewhere*
in `m`

. That is *one of* the following holds:

`k' == k`

or,`HasKey k' l`

or,`HasKey k' r`

.

As the preceding guard `k' == k`

fails, we (and
LiquidHaskell) can rule out case (1). Now, what about the
`Map`

tells us that case (2) must hold, i.e. that case (3)
cannot hold? The *BST invariant*, all keys in `r`

exceed `k`

which itself exceeds `k'`

. That is, all
nodes in `r`

are *disequal* to `k'`

and
hence `k'`

cannot be in `r`

, ruling out case (3).
Formally, we need the fact that: \[\forall\
\vkey,\ \vt. \vt :: {\vMap\ \reft{\vkey'}{k}{\vkey' \not =
\vkey}\ v}
\ \Rightarrow\
\lnot (\vHasKey\ \vkey\ \vt)\]

**Conversion Lemmas** Unfortunately, LiquidHaskell
*cannot automatically* deduce facts like the above, as they
relate refinements of a container’s *type parameters* (here:
\(\vkey' \not = \vkey\), which
refines the `Map`

s first type parameter) with properties of
the entire container (here: \(\vHasKey\ \vkey\
\vt\)). Fortunately, it is easy to *state*, *prove*
and *use* facts like the above, via *lemmas* which are
just functions. ^{3}

**Defining Lemmas** To state a lemma, we need only convert
it into a type by viewing universal
quantifiers as function parameters, and implications as function
types:

{-@ lemma_notMem :: key:k
-> m:Map {k:k | k /= key} v
-> {v:Bool | not (HasKey key m)}
@-}
lemma_notMem _ Tip = True
lemma_notMem key (Node _ _ l r) = lemma_notMem key l &&
lemma_notMem key r

**Proving Lemmas** Note how the signature for
`lemma_notMem`

corresponds exactly to the missing fact from
above. The “output” type is a `Bool`

refined with the
proposition that we desire. We *prove* the lemma simply by
*traversing* the tree which lets LiquidHaskell build up a proof
for the output fact by inductively combining the proofs from the
subtrees.

**Using Lemmas** To use a lemma, we need to
*instantiate* it to the particular keys and trees we care about,
by “calling” the lemma function, and forcing its result to be in the
*environment* used to typecheck the expression where we want to
use the lemma. Say what? Here’s how to use lemmas to verify
`get`

:

{-@ get :: (Ord k) => k:k -> m:{Map k v | HasKey k m} -> v @-}
get k' (Node k v l r)
| k' == k = v
| k' < k = assert (lemma_notMem k' r) $
get k' l
| otherwise = assert (lemma_notMem k' l) $
get k' r
get _ Tip = die "Lookup failed? Impossible."

By calling `lemma_notMem`

we create a dummy
`Bool`

refined with the fact `not (HasKey k' r)`

(resp. `not (HasKey k' l)`

). We force the calls to
`get k' l`

(resp. `get k' r`

) to be typechecked
using the materialized refinement by wrapping the calls in
`assert`

:

assert _ x = x

**Ghost Values** This technique of materializing auxiliary
facts via *ghost values* is a well known idea in program
verification. Usually, one has to take care to ensure that ghost
computations do not interfere with the regular computations. If we had
to actually *execute* `lemma_notMem`

it would wreck
the efficient logarithmic lookup time, assuming we kept the trees balanced, as we would traverse the *entire*
tree instead of just the short path to a node. ^{4}

**Laziness** comes to our rescue: as the ghost value is
(trivially) not needed, it is never computed. In fact, it is
straightforward to entirely *erase* the call in the compiled
code, which lets us freely `assert`

such `lemma`

s
to carry out proofs, without paying any runtime penalty. In an eager
language we would have to do a bit of work to specifically mark the
computation as a ghost or irrelevant but
in the lazy setting we get this for free.

**Exercise: (Membership Test): **Capisce? Fix the
definition of `mem`

so that it verifiably implements the
given signature.

{-@ mem :: (Ord k) => k:k -> m:Map k v
-> {v:_ | v <=> HasKey k m} @-}
mem k' (Node k _ l r)
| k' == k = True
| k' < k = mem k' l
| otherwise = mem k' r
mem _ Tip = False

**Exercise: (Fresh): **To make sure you really understand
this business of ghosts values and proofs, complete the implementation
of the following function which returns a `fresh`

integer
that is *distinct* from all the values in its input list:

{-@ fresh :: xs:[Int] -> {v:Int | not (Elem v xs)} @-}
fresh = undefined

To refresh your memory, here are the definitions for
`Elem`

we saw earlier

{-@ predicate Elem X Ys = In X (elems Ys) @-}
{-@ measure elems @-}
elems [] = empty
elems (x:xs) = (singleton x) `union` (elems xs)

In this chapter we saw how to combine several of the techniques from previous chapters in a case study. We learned how to:

*Define*an API for associative maps that used refinements to track the*set*of`keys`

stored in a map, in order to prevent lookup failures, the`NullPointerDereference`

errors of the functional world,*Use*the API to implement a small interpreter that is guaranteed to never fail with`UnboundVariable`

errors, as long as the input expressions were closed,*Implement*the API using Binary Search Trees; in particular, using*ghost lemmas*to`assert`

facts that LiquidHaskell is otherwise unable to deduce automatically.

Recall that

`Empty`

,`Union`

,`In`

and the other`Set`

operators are described here.↩︎Feel free to embellish the language with fancier features like functions, tuples etc.↩︎

Why does LiquidHaskell not automatically deduce this information? This is tricky to describe. Intuitively, because there is no way of automatically connecting the

*traversal*corresponding to`keys`

with the type variable`k`

. I wish I had a better way to explain this rather subtle point; suggestions welcome!↩︎Which is what makes dynamic contract checking inefficient for such invariants.↩︎