Branches and Binary Operators

BOA: Branches and Binary Operators

Next, lets add

  • Branches (if-expressions)
  • Binary Operators (+, -, etc.)

In the process of doing so, we will learn about

  • Intermediate Forms
  • Normalization

Branches

Lets start first with branches (conditionals).

We will stick to our recipe of:

  1. Build intuition with examples,
  2. Model problem with types,
  3. Implement with type-transforming-functions,
  4. Validate with tests.

Examples

First, lets look at some examples of what we mean by branches.

  • For now, lets treat 0 as “false” and non-zero as “true”

Example: If1

  • Since 10 is not 0 we evaluate the “then” case to get 22

Example: If2

  • Since sub(1) is 0 we evaluate the “else” case to get -1

QUIZ: If3

if-else is also an expression so we can nest them:

What should the following evaluate to?

  • A. 999
  • B. 0
  • C. 1
  • D. 1000
  • E. -1

Control Flow in Assembly

To compile branches, we will use:

  • labels of the form

are “landmarks” from which execution (control-flow) can be started, or to which it can be diverted,

  • comparisons of the form
  • Perform a (numeric) comparison between the values a1 and a2, and
  • Store the result in a special processor flag,

  • Jump operations of the form

  • Use the result of the flag set by the most recent cmp
  • To continue execution from the given LABEL

QUIZ

Which of the following is a valid x86 encoding of

QUIZ: Compiling if-else
QUIZ: Compiling if-else

Strategy

To compile an expression of the form

We will:

  1. Compile eCond
  2. Compare the result (in eax) against 0
  3. Jump if the result is zero to a special "IfFalse" label
    • At which we will evaluate eElse,
    • Ending with a special "IfExit" label.
  4. (Otherwise) continue to evaluate eTrue
    • And then jump (unconditionally) to the "IfExit" label.

Example: If-Expressions to Asm

Lets see how our strategy works by example:

Example: if1

Example: if1
Example: if1

Example: if2

Example: if2
Example: if2

Example: if3

Example: if3
Example: if3

Oops, cannot reuse labels across if-expressions!

  • Can’t use same label in two places (invalid assembly)
Example: if3 wrong
Example: if3 wrong

Oops, need distinct labels for each branch!

  • Require distinct tags for each if-else expression
Example: if3 tagged
Example: if3 tagged

Types: Source

Lets modify the Source Expression

  • Add if-else expressions and
  • Add tags of type a for each sub-expression
    • Tags are polymorphic a so we can have different types of tags
    • e.g. Source-Position information for error messages

Lets define a name for Tag (just integers).

We will now use:

Types: Assembly

Now, lets extend the Assembly with labels, comparisons and jumps:

Transforms

We can’t expect programmer to put in tags (yuck.)

  • Lets squeeze in a tagging transform into our pipeline
Adding Tagging to the Compiler Pipeline
Adding Tagging to the Compiler Pipeline

Transforms: Parse

Just as before, but now puts a dummy () into each position

Transforms: Tag

The key work is done by doTag i e

  1. Recursively walk over the BareE named e starting tagging at counter i
  2. Return a pair (i', e') of updated counter i' and tagged expr e'

QUIZ

What expressions shall we fill in for _1 and _2 ?

(ProTip: Use mapAccumL)

We can now tag the whole program by

  • Calling doTag with the initial counter (e.g. 0),

  • Throwing away the final counter.

Transforms: CodeGen

Now that we have the tags we lets implement our compilation strategy

Recap: Branches

  • Tag each sub-expression,
  • Use tag to generate control-flow labels implementing branch.

Lesson: Tagged program representation simplifies compilation…

  • Next: another example of how intermediate representations help.

Binary Operations

You know the drill.

  1. Build intuition with examples,
  2. Model problem with types,
  3. Implement with type-transforming-functions,
  4. Validate with tests.

Compiling Binary Operations

Lets look at some expressions and figure out how they would get compiled.

  • Recall: We want the result to be in eax after the instructions finish.

QUIZ

What is the assembly corresponding to 33 - 10 ?

  • A. ?1 = sub, ?2 = 33, ?3 = mov, ?4 = 10

  • B. ?1 = mov, ?2 = 33, ?3 = sub, ?4 = 10

  • C. ?1 = sub, ?2 = 10, ?3 = mov, ?4 = 33

  • D. ?1 = mov, ?2 = 10, ?3 = sub, ?4 = 33

How to compile n1 * n2

Example: Bin1

Lets start with some easy ones. The source:

Example: Bin 1
Example: Bin 1

Strategy: Given n1 + n2

  • Move n1 into eax,
  • Add n2 to eax.

Example: Bin2

What if the first operand is a variable?

Example: Bin 2
Example: Bin 2

Simple, just copy the variable off the stack into eax

Strategy: Given x + n

  • Move x (from stack) into eax,
  • Add n to eax.

Example: Bin3

Same thing works if the second operand is a variable.

Example: Bin 3
Example: Bin 3

Strategy: Given x + n

  • Move x (from stack) into eax,
  • Add n to eax.

QUIZ

What is the assembly corresponding to (10 + 20) * 30 ?

  • A. ?1 = add, ?2 = 30, ?3 = mul, ?4 = 20

  • B. ?1 = mul, ?2 = 30, ?3 = add, ?4 = 20

  • C. ?1 = add, ?2 = 20, ?3 = mul, ?4 = 30

  • D. ?1 = mul, ?2 = 20, ?3 = add, ?4 = 30

Second Operand is Constant

In general, to compile e + n we can do

Example: Bin4

But what if we have nested expressions

  • Can compile 1 + 2 with result in eax
  • .. but then need to reuse eax for 3 + 4

Need to save 1 + 2 somewhere!

Idea How about use another register for 3 + 4?

  • But then what about (1 + 2) * (3 + 4) * (5 + 6) ?
  • In general, may need to save more sub-expressions than we have registers.

Idea: Immediate Expressions

Why were 1 + 2 and x + y so easy to compile but (1 + 2) * (3 + 4) not?

Because 1 and x are immediate expressions

Their values don’t require any computation!

  • Either a constant, or,
  • variable whose value is on the stack.

Idea: Administrative Normal Form (ANF)

An expression is in Administrative Normal Form (ANF)

if all primitive operations have immediate arguments.

Primitive Operations: Those whose values we need for computation to proceed.

  • v1 + v2
  • v1 - v2
  • v1 * v2

QUIZ

Is the following expression in ANF?

A. Yes, its ANF. B. Nope, its not, because of + C. Nope, its not, because of * D. Nope, its not, because of - E. Huh, WTF is ANF?

A. must be isImm B. meh ANF is fine!

Conversion to ANF

However, note the following variant is in ANF

How can we compile the above code?

Binary Operations: Strategy

We can convert any expression to ANF

  • By adding “temporary” variables for sub-expressions
Compiler Pipeline with ANF
Compiler Pipeline with ANF
  • Step 1: Compiling ANF into Assembly
  • Step 2: Converting Expressions into ANF

Types: Source

Lets add binary primitive operators

and use them to extend the source language:

So, for example, 2 + 3 would be parsed as:

Types: Assembly

Need to add X86 instructions for primitive arithmetic:

Types: ANF

We can define a separate type for ANF (try it!)

… but …

super tedious as it requires duplicating a bunch of code.

Instead, lets write a function that describes immediate expressions

We can now think of immediate expressions as:

The subset of Expr such that isImm returns True

QUIZ

Similarly, lets write a function that describes ANF expressions

What should we fill in for _1?

QUIZ

Similarly, lets write a function that describes ANF expressions

What should we fill in for _2?

We can now think of ANF expressions as:

The subset of Expr such that isAnf returns True

Use the above function to test our ANF conversion.

immArg :: Env -> ImmExpr -> Arg
immArg env (Number n _) = Const n
immArg env (Id x _)     = RegOffset EBP (lookupEnv env x)

compileImm :: Env -> ImmExpr Tag -> [Instruction]
compileImm env v  = [IMov (Reg EAX) (immArg env v) ]

compile :: Env -> AnfExpr Tag -> [Instruction]
compile env v@(Number {})  = compileImm env v
compile env v@(Id _ _)     = compileImm env v 
compile env (Prim1 op e)   = compile env e
                          ++ [ (prim1Asm op (Reg EAX) (Const 1)]
compile env (Prim2 op v1 v2) = [ compileImm env v1 
                               , prim2asm o (Reg EAX) (immArg env v2) 
                               ]

prim2Asm Add2 = IAdd
prim2Asm Sub2 = ISub
prim2Asm Mul2 = IMul

prim1Asm Add1 = IAdd
prim1Asm Sub1 = ISub


anf :: Int -> Expr a -> (Int, AnfExpr a)
anf i p@(Number {}) = (i, p)
anf i v@(Id {})     = (i, v)
anf i (Prim1 o e)   = (i', Prim1 o e') 
  where 
    (i', e')        = anf i e

anf i (Prim2 o e1 e2) = (i2, mkLet (bs1 ++ bs2) (Prim2 o v1 v2))
  where
     (i1, (bs1, v1))  = imm i  e1
     (i2, (bs2, v2))  = imm i1 e2

anf i (Let x  e1 e2)  = (i2, Let x e1' e2')
  where 
     (i1, e1')        = anf i e1
     (i2, e2')        = anf i1 e2

anf i (If  e1 e2 e3)  = (i3, If e1' e2' e3')
  where 
     (i1, e1')        = anf i e1
     (i2, e2')        = anf i1 e2
     (i3, e3')        = anf i2 e3

imm :: Int -> Expr a -> (Int, ([(Id , AnfExpr a)] , ImmExpr a))
imm i e@(Number {}) = (i, ([], e))
imm i e@(Id {})     = (i, ([], e))
imm i e@(Prim1 {})  = immExp i e
imm i e@(If {})     = immExp i e
imm i e@(Let {})    = immExp i e

imm i (Prim2 o e1 e2) = (i2+1, ((v, Prim2 o v1 v2) : bs2 ++ bs1, Id v))
  where
    (i1, (bs1, v1)) = imm i e1
    (i2, (bs2, v2)) = imm i1 e2
    v               = mkTmpVar i2

mkTmpVar i = "tmp" ++ show i

mkLet :: [(Id , AnfExpr a)] -> AnfExpr a -> AnfExpr a
mkLet []              e = e
mkLet ((x1, e1) : bs) e = Let x1 e1 (mkLet bs e)

immExp i e  = (i' + 1, ([(v, e')], v))
  where
    (i',e') = anf e
    v       = mkTmpVar i'

Types & Strategy

Writing the type aliases:

we get the overall pipeline:

Compiler Pipeline with ANF: Types
Compiler Pipeline with ANF: Types

Transforms: Compiling AnfTagE to Asm

Compiler Pipeline: ANF to ASM
Compiler Pipeline: ANF to ASM

The compilation from ANF is easy, lets recall our examples and strategy:

Strategy: Given v1 + v2 (where v1 and v2 are immediate expressions)

  • Move v1 into eax,
  • Add v2 to eax.

where we have a helper to find the Asm variant of a Prim2 operation

and another to convert an immediate expression to an x86 argument:

QUIZ

Which of the below are in ANF ?

  • A. 1, 2, 3, 4

  • B. 1, 2, 3

  • C. 2, 3, 4

  • D. 1, 2

  • E. 2, 3

Transforms: Compiling Bare to Anf

Next lets focus on A-Normalization i.e. transforming expressions into ANF

Compiler Pipeline: Bare to ANF
Compiler Pipeline: Bare to ANF

A-Normalization

We can fill in the base cases easily

Interesting cases are the binary operations

Example: Anf-1

Left operand is not immediate

Example: ANF 1
Example: ANF 1

Key Idea: Helper Function

imm e returns ([(t1, a1),...,(tn, an)], v) where

  • ti, ai are new temporary variables bound to ANF exprs,
  • v is an immediate value (either a constant or variable)

Such that e is equivalent to

Lets look at some more examples.

Example: Anf-2

Left operand is not internally immediate

Example: ANF 2
Example: ANF 2

Example: Anf-3

Both operands are not immediate

Example: ANF 3
Example: ANF 3

ANF: General Strategy

ANF Strategy
ANF Strategy
  1. Invoke imm on both the operands
  2. Concat the let bindings
  3. Apply the binop to the immediate values

ANF: Implementation

Lets implement the above strategy

Intuitively, lets stitches together a bunch of definitions as follows:

For Let just make sure we recursively anf the sub-expressions.

Same principle applies to If

  • use anf to recursively transform the branches.

ANF: Making Arguments Immediate via imm

The workhorse is the function

which creates temporary variables to crunch an arbitrary Bare into an immediate value.

No need to create an variables if the expression is already immediate:

The tricky case is when the expression has a primop:

Oh, what shall we do when:

Lets look at an example for inspiration.

Example: ANF 4
Example: ANF 4

That is, simply

  • anf the relevant expressions,
  • bind them to a fresh variable.

One last thing: Whats up with makeFreshVar ?

Wait a minute, what is this magic FRESH ?

How can we create distinct names out of thin air?

What’s that? Global variables? Increment a counter?

Get thee behind me Satan!
Get thee behind me Satan!

We will use a counter, but will have to pass its value around

Just like doTag

and

where now, the fresh function returns a new counter and a variable

Note this is super clunky. There is a really slick way to write the above code without the clutter of the i but thats too much of a digression, but feel free to look it up yourself

Recap and Summary

Just created Boa with

  • Branches (if-expressions)
  • Binary Operators (+, -, etc.)

In the process of doing so, we will learned about

  • Intermediate Forms
  • Normalization

Specifically,

Compiler Pipeline with ANF
Compiler Pipeline with ANF