Skip to content

CSV Tables

Most demonstrations for verification techniques involve programs with complicated invariants and properties. However, these methods can often be rather useful for describing simple but important aspects of APIs or programs with more humble goals. I saw a rather nice example of using Scala's Shapeless library for preventing off-by-one errors in CSV processing code. Here's the same, short, example rephrased with LiquidHaskell.

23: module CSV where
24: 
25: -- | Using LiquidHaskell for CSV lists
26: -- c.f. http://www.reddit.com/r/scala/comments/1nhzi2/using_shapelesss_sized_type_to_eliminate_real/

The Type

Suppose you wanted to represent tables as a list of comma-separated values.

For example, here's a table listing the articles and prices at the coffee shop I'm sitting in right now:

Item Price
Espresso 2.25
Macchiato 2.75
Cappucino 3.35
Americano 2.25

You might represent this with a simple Haskell data type:

64: 
65: data CSV = Csv { (CSV.CSV) -> [[(GHC.Types.Char)]]headers :: [String]
66:                , (CSV.CSV) -> [[[(GHC.Types.Char)]]]rows    :: [[String]]
67:                }

and now, the above table is just:

73: (CSV.CSV)zumbarMenu = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Item"     , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Price"]
74:                  [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Espresso" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.25" ]  
75:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Macchiato", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.75" ]
76:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Cappucino", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"3.35" ]
77:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Americano", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.25" ]
78:                  ]

The Errors

Our CSV type supports tables with an arbitrary number of headers and rows but of course, we'd like to ensure that each row has data for each header, that is, we don't end up with tables like this one

89: -- Eeks, we missed the header name!
90: 
91: (CSV.CSV)csvBad1 = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Date" {- ??? -} ] 
92:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Mon", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1"]
93:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Tue", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2"]
94:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Wed", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"3"] 
95:               ]
96: 

or this one,

102: -- Blergh! we missed a column.
103: 
104: (CSV.CSV)csvBad2 = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Age"  ] 
105:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Alice", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"32"   ]
106:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Bob"  {- ??? -}]
107:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Cris" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"29"   ] 
108:               ]

Alas, both the above are valid inhabitants of the Haskell CSV type, and so, sneak past GHC.

The Invariant

Thus, we want to refine the CSV type to specify that the number of elements in each row is exactly the same as the number of headers.

To do so, we merely write a refined data type definition:

123: {-@ data CSV = Csv { headers :: [String]
124:                    , rows    :: [{v:[String] | (len v) = (len headers)}]
125:                    }
126:   @-}

Here len is a measure denoting the length of a list. Thus, (len headers) is the number of headers in the table, and the refinement on the rows field states that each row is a list of Strings, with exactly the same number of elements as the number of headers.

We can now have our arbitrary-arity tables, but LiquidHaskell will make sure that we don't miss entries here or there.

138: -- All is well! 
139: 
140: (CSV.CSV)csvGood = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Id", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Days"]
141:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Jan", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
142:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Feb", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"28"]
143:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"3", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Mar", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
144:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"4", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Apr", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"30"] 
145:               ]

Bonus Points

How would you modify the specification to prevent table with degenerate entries like this one?

155: (CSV.CSV)deg = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Id", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Days"]
156:           [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Jan" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
157:           , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Feb" , {VV : [{VV : (GHC.Types.Char) | false}]<\_ VV -> false> | (((null VV)) <=> true) && ((len VV) == 0) && ((len VV) >= 0)}""]
158:           ]
Ranjit Jhala
2013-10-10 16:12