Slide 1

Slide 1 text

Programs that Write Programs
Craig Stuntz
https://speakerdeck.com/craigstuntz
https://github.com/CraigStuntz/TinyLanguage

Slide 2

Slide 2 text

What Do Developers Do?

Slide 3

Slide 3 text

–Steve Yegge “You're actually surrounded by compilation problems. You run into them almost every day.” http://steve-yegge.blogspot.ca/2007/06/rich-programmer-food.html

Slide 4

Slide 4 text

–Greenspun’s Tenth Rule “Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.” https://commons.wikimedia.org/wiki/File:Philip_Greenspun_and_Alex_the_dog.jpg

Slide 5

Slide 5 text

The Hoover Dam

Slide 6

Slide 6 text

Generalize the Problem

Slide 7

Slide 7 text

–Eugene Wallingford “…compilers ultimately depend on a single big idea from the theory of computer science: that a certain kind of machine can simulate anything — including itself. As a result, this certain kind of machine, the Turing machine, is the very definition of computability.” http://www.cs.uni.edu/~wallingf/blog/archives/monthly/2015-09.html#e2015-09-03T15_26_47.htm

Slide 8

Slide 8 text

Compiler

Slide 9

Slide 9 text

Compiler Interpreter

Slide 10

Slide 10 text

code exe

Slide 11

Slide 11 text

code exe

Slide 12

Slide 12 text

Useful Bits
• Regular Expressions (lexing)
• Deserializers (parsing)
• Linters, static analysis (syntax, type checking)
• Solvers, theorem provers (optimization)
• Code migration tools (compilers!)

Slide 13

Slide 13 text

A → B

Slide 14

Slide 14 text

Source code → Program
JPEG file → Image on screen
Source code → Potential style error list
JSON → Object graph
Code with two-digit years → Y2K-compliant code
VB6 → C#
Object graph → User interface markup
Algorithm → Faster, equivalent algorithm

Slide 15

Slide 15 text

Designing with Formal Methods

Slide 16

Slide 16 text

#define D define #D Y return #D R for #D e while #D I printf #D l int #D W if #D C y=v+111;H(x,v)*y++= *x #D H(a,b)R(a=b+11;au){R(w=i=0;i<4;i++)w+=(m=v[h[i]])==f?300:m==q?-300:(t=v[ih[i]])==f?-50: t==q?50:0;Y w;}H(z,0){W(E(v,z,f,100)){c++;w= -S(d+1,n,q,0,-b,-j);W(w>j){g=bz=z; j=w;W(w#$b%&w#$8003)Y w;}}}W(!c){g=0;W(_){H(x,v)c+= *x==f?1:*x==3-f?-1:0;Y c>0? 8000+c:c-8000;}C;j= -S(d+1,n,q,1,-b,-j);)bz=g;Y d#$u-1?j+(c'(3):j;}main(){R(;t< 1600;t+=100)R(m=0;m<100;m++)V[t+m]=m<11%&m>88%&(m+1)%10<2?3:0;I("Level:");V[44] =V[55]=1;V[45]=V[54]=2;s(u);e(lv>0){Z do{I("You:");s(m);}e(!E(V,m,2,0))*m+,99); W(m+,99)lv--;W(lv<15)*u<10)u+=2;U("Wait\n");I("Value:%d\n",S(0,V,1,0,-9000,9000 ));I("move: %d\n",(lv-=E(V,bz,1,0),bz));}}E(v,z,f,o)l*v;{l*j,q=3-f,g=0,i,w,*k=v +z;W(*k==0)R(i=7;i#$0;i--){j=k+(w=r[i]);e(*j==q)j+=w;W(*j==f)*j-w+,k){W(!g){g=1 ;C;}e(j+,k)*((j-=w)+o)=f;}}Y g;}

Slide 17

Slide 17 text

Duff’s Device
There Are No Edge Cases In Programming Languages
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;   /* passes through the loop; assumes count > 0 */
    switch (count % 8) {            /* jump into the middle of the unrolled loop */
    case 0: do { *to = *from++;     /* to is a memory-mapped output register, so it is not incremented */
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

1 + 2 + 3 + … + 100 = 100 * 101 / 2 = 5050
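An optimizer may replace a computation with any cheaper one that always gives the same answer. A minimal F# sketch of the identity above (the function names are illustrative, not from TinyLanguage), assuming 32-bit `int` and small non-negative `n`:

```fsharp
// Naive version: add 1 + 2 + ... + n one term at a time (O(n)).
let sumLoop (n: int) =
    let mutable total = 0
    for i in 1 .. n do
        total <- total + i
    total

// Closed form (Gauss): n * (n + 1) / 2 in O(1).
// n * (n + 1) is always even, so the integer division is exact.
let sumClosedForm (n: int) = n * (n + 1) / 2

// Spot-check that the two agree on a small range of inputs.
let agree = [ 0 .. 1000 ] |> List.forall (fun n -> sumLoop n = sumClosedForm n)

printfn "%d %d %b" (sumLoop 100) (sumClosedForm 100) agree
```

The check is deliberately limited to small `n`: with 32-bit wraparound, `n * (n + 1)` overflows before the sum itself does (for example at `n = 65535`), so the closed form is only a safe substitute while that intermediate product fits. An optimizer has to prove equivalence under the language's real arithmetic, not just on the happy path.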

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Lexer → Regular Expressions
Parser → Context-Free Grammar
Optimizer → Algebra
Type Checker → Logical Inference Rules
Code Generator → Denotational Semantics

Slide 22

Slide 22 text

–Leslie Lamport “You don’t achieve simplicity by thinking in terms of complicated languages. Simplicity requires thinking abstractly before you start implementing.” http://www.heidelberg-laureate-forum.org/blog/video/lecture-monday-august-24-2015-leslie-lamport/ https://commons.wikimedia.org/wiki/File:Leslie_Lamport.jpg

Slide 23

Slide 23 text

A Few Important Concepts

Slide 24

Slide 24 text

Syntax
x = x + 1; alert(x);

Sequence
  Assign
    x
    add x 1
  Invoke
    alert
    x
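The tree above can be written down directly as data. A sketch in F#, using hypothetical `Expr`, `Stmt`, and `Program` types (not TinyLanguage's own), that builds the tree for `x = x + 1; alert(x);` and prints it in prefix form:

```fsharp
// Hypothetical AST types for the two-statement program.
type Expr =
    | Var of string
    | IntLit of int
    | Add of Expr * Expr

type Stmt =
    | Assign of name: string * value: Expr
    | Invoke of func: string * arg: Expr

type Program = Sequence of Stmt list

// x = x + 1; alert(x);
let program =
    Sequence
        [ Assign("x", Add(Var "x", IntLit 1))
          Invoke("alert", Var "x") ]

// Render each node in prefix form, mirroring "add x 1" on the slide.
let rec showExpr expr =
    match expr with
    | Var name -> name
    | IntLit n -> string n
    | Add (a, b) -> sprintf "(add %s %s)" (showExpr a) (showExpr b)

let showStmt stmt =
    match stmt with
    | Assign (name, value) -> sprintf "(assign %s %s)" name (showExpr value)
    | Invoke (f, arg) -> sprintf "(invoke %s %s)" f (showExpr arg)

let showProgram (Sequence stmts) =
    stmts |> List.map showStmt |> String.concat " " |> sprintf "(sequence %s)"

printfn "%s" (showProgram program)
```

The syntax (`=`, `;`, parentheses) is gone; only the structure remains, which is what every later stage works on.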

Slide 25

Slide 25 text

Semantics
Elixir:
name = "Nate"          # => "Nate"
String.upcase(name)    # => "NATE"
name                   # => "Nate"
Ruby:
name = "Nate"          # => "Nate"
name.upcase!           # => "NATE"
name                   # => "NATE"
http://www.natescottwest.com/elixir-for-rubyists-part-2/

Slide 26

Slide 26 text

Semantics
Imports System
Namespace Hello
    Class HelloWorld
        Overloads Shared Sub Main(ByVal args() As String)
            Dim name As String = "VB.NET"
            'See if argument passed
            If args.Length = 1 Then name = args(0)
            Console.WriteLine("Hello, " & name & "!")
        End Sub
    End Class
End Namespace

using System;
namespace Hello {
    public class HelloWorld {
        public static void Main(string[] args) {
            string name = "C#";
            // See if argument passed
            if (args.Length == 1) name = args[0];
            Console.WriteLine("Hello, " + name + "!");
        }
    }
}
http://www.harding.edu/fmccown/vbnet_csharp_comparison.html

Slide 27

Slide 27 text

Front End: Understand Language
Back End: Emit Code

Slide 28

Slide 28 text

Front end: Lexer → Parser → Binder → Type Checker → Optimizer
Back end: IL Generator → Optimizer → Object Code Generator

Slide 29

Slide 29 text

OK, so let’s compile something already!
module Compiler

let compile =
    Lexer.lex
    >> Parser.parse
    >> Binder.bind
    >> OptimizeBinding.optimize
    >> IlGenerator.codegen
    >> Railway.map OptimizeIl.optimize
    >> Railway.map Il.toAssemblyBuilder
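`compile` is just the stages glued together with F#'s forward-composition operator `>>`. The sketch below shows the same plumbing with toy stages; `lex`, `parse`, `optimize`, and `codegen` here are hypothetical stand-ins, and it assumes `Railway.map` behaves like the standard `Result.map` (transform a success, pass an error through unchanged):

```fsharp
// Hypothetical stand-in stages, just enough to show the plumbing.
let lex (source: string) : string list =
    source.Replace("(", " ( ").Replace(")", " ) ")
        .Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> List.ofArray

// A stage that can fail returns Result instead of throwing.
let parse (tokens: string list) : Result<string * int, string> =
    match tokens with
    | [ "("; name; arg; ")" ] ->
        match System.Int32.TryParse arg with
        | true, n -> Ok(name, n)
        | false, _ -> Error(sprintf "Expected integer; found '%s'." arg)
    | _ -> Error "Expected '(' identifier argument ')'."

let optimize (name: string, arg: int) : Result<int, string> =
    match name with
    | "inc" -> Ok(arg + 1)
    | other -> Error(sprintf "Undefined function '%s'." other)

let codegen (value: int) : string list = [ sprintf "Ldc.i4 %d" value ]

// Result.bind chains a stage that can itself fail; Result.map lifts a
// stage that cannot. Once any stage returns Error, later stages are skipped.
let compile =
    lex
    >> parse
    >> Result.bind optimize
    >> Result.map codegen

printfn "%A" (compile "(inc -1)")
printfn "%A" (compile "(dec 1)")
```

The payoff of this "railway" style is that error handling lives in the composition, not in every stage.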

Slide 30

Slide 30 text

(inc -1)

Slide 31

Slide 31 text

(inc -1)
Ldc.i4 -1
Ldc.i4 1
Add

Slide 32

Slide 32 text

(inc -1)
Ldc.i4 -1
Ldc.i4 1
Add
Ldc.i4.0

Slide 33

Slide 33 text

(inc -1)
Lex: LeftParen, Identifier(inc), Number(-1), RightParen
Parse: Apply “inc” to -1
Type check: “inc” exists and takes an int argument, and -1 is an int. Great!
Optimize: -1 + 1 = 0, so just emit int 0!
IL generate: Ldc.i4 0
Optimize: Ldc.i4 0 → Ldc.i4.0
Object code: Produce an assembly with an entry point that contains the generated IL

Slide 34

Slide 34 text

(defun add-1 (int x) (inc x))
(defun main () (print (add-1 2)))

Slide 35

Slide 35 text

Lexer
What Problem Are We Solving? String → Sequence of tokens
Non-Compiler Example: Text search

Slide 36

Slide 36 text

Lexer Search “am” I am. You are.

Slide 37

Slide 37 text

Regular Expressions
leftParenthesis = ‘(’
rightParenthesis = ‘)’
letter = ‘A’ | ‘B’ | ‘C’ | …
digit = ‘0’ | ‘1’ | ‘2’ | …
number = (‘+’ digit | ‘-’ digit | digit) digit*
alphanumeric = letter | digit
…
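These definitions translate almost directly into .NET regular expressions. A sketch using `System.Text.RegularExpressions.Regex`; the `identifierPattern` rule (a letter followed by letters, digits, or hyphens, so that names like `add-1` are allowed) is an assumption, since the slide does not define identifiers:

```fsharp
open System.Text.RegularExpressions

// number = ('+' digit | '-' digit | digit) digit*
let numberPattern = Regex(@"^[+-]?[0-9]+$")

// Assumption: identifier = letter (letter | digit | '-')*
let identifierPattern = Regex(@"^[A-Za-z][A-Za-z0-9-]*$")

// Classify one already-separated chunk of source text.
let classify (text: string) =
    if text = "(" then "leftParenthesis"
    elif text = ")" then "rightParenthesis"
    elif numberPattern.IsMatch text then "number"
    elif identifierPattern.IsMatch text then "identifier"
    else "unrecognized"

for chunk in [ "("; "inc"; "-1"; ")"; "add-1"; "?" ] do
    printfn "%s -> %s" chunk (classify chunk)
```

Checking `number` before `identifier` matters: order of rules is part of a lexer's specification.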

Slide 38

Slide 38 text

Lexer

Slide 39

Slide 39 text

Lexer
type Lexeme =
    | LeftParenthesis
    | RightParenthesis
    | Identifier of string
    | LiteralInt of int
    | LiteralString of string
    | Unrecognized of char

Slide 41

Slide 41 text

Lexer
let private prettyPrint (lexeme: Lexeme) =
    match lexeme with
    | LeftParenthesis -> "("
    | RightParenthesis -> ")"
    | Identifier identifier -> identifier
    | LiteralInt num -> num.ToString()
    | LiteralString str -> str
    | Unrecognized ch -> ch.ToString()

Slide 42

Slide 42 text

Lexer (inc -1)

Slide 43

Slide 43 text

Lexer
(inc -1)
“(” → LeftParenthesis
“inc” → Identifier(“inc”)
“-1” → LiteralInt(-1)
“)” → RightParenthesis

Slide 44

Slide 44 text

Lexer ( -1)

Slide 45

Slide 45 text

Lexer
let rec private lexChars (source: char list) : Lexeme list =
    match source with
    | '(' :: rest -> LeftParenthesis :: lexChars rest
    | ')' :: rest -> RightParenthesis :: lexChars rest
    | '"' :: rest -> lexString(rest, "")
    | c :: rest when isIdentifierStart c -> lexName (source, "")
    | d :: rest when System.Char.IsDigit d -> lexNumber(source, "")
    | [] -> []
    | w :: rest when System.Char.IsWhiteSpace w -> lexChars rest
    | c :: rest -> Unrecognized c :: lexChars rest
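To experiment with `lexChars` on its own, the helpers it calls need bodies. The sketch below is self-contained and runs in F# Interactive; `isIdentifierStart`, `isDelimiter`, `lexName`, `lexNumber`, and `lexString` are hypothetical implementations, not the ones in the TinyLanguage repository. In particular, treating `-` and `+` as identifier starts and turning names that parse as integers into `LiteralInt` is an assumption made so that `-1` lexes as shown earlier:

```fsharp
type Lexeme =
    | LeftParenthesis
    | RightParenthesis
    | Identifier of string
    | LiteralInt of int
    | LiteralString of string
    | Unrecognized of char

// Hypothetical: letters and signs may start a name.
let isIdentifierStart (c: char) = System.Char.IsLetter c || c = '-' || c = '+'

let isDelimiter (c: char) =
    c = '(' || c = ')' || c = '"' || System.Char.IsWhiteSpace c

let rec lexChars (source: char list) : Lexeme list =
    match source with
    | '(' :: rest -> LeftParenthesis :: lexChars rest
    | ')' :: rest -> RightParenthesis :: lexChars rest
    | '"' :: rest -> lexString (rest, "")
    | c :: _ when isIdentifierStart c -> lexName (source, "")
    | d :: _ when System.Char.IsDigit d -> lexNumber (source, "")
    | [] -> []
    | w :: rest when System.Char.IsWhiteSpace w -> lexChars rest
    | c :: rest -> Unrecognized c :: lexChars rest

// Accumulate up to the next delimiter; a name such as "-1" that parses
// as an integer becomes a LiteralInt (assumption).
and lexName (source: char list, acc: string) : Lexeme list =
    match source with
    | c :: rest when not (isDelimiter c) -> lexName (rest, acc + string c)
    | _ ->
        match System.Int32.TryParse acc with
        | true, n -> LiteralInt n :: lexChars source
        | false, _ -> Identifier acc :: lexChars source

and lexNumber (source: char list, acc: string) : Lexeme list =
    match source with
    | d :: rest when System.Char.IsDigit d -> lexNumber (rest, acc + string d)
    | _ -> LiteralInt(int acc) :: lexChars source

// Read up to the closing quote; an unterminated string is reported as
// Unrecognized '"' (assumption).
and lexString (source: char list, acc: string) : Lexeme list =
    match source with
    | '"' :: rest -> LiteralString acc :: lexChars rest
    | c :: rest -> lexString (rest, acc + string c)
    | [] -> [ Unrecognized '"' ]

let lex (text: string) = text |> List.ofSeq |> lexChars

printfn "%A" (lex "(inc -1)")
```

Each helper consumes characters and hands the rest back to `lexChars`, so the lexer is one mutually recursive walk over the input.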

Slide 46

Slide 46 text

Lexer http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Slide 47

Slide 47 text

Lexer http://www.regular-expressions.info/email.html “So even when following official standards, there are still trade-offs to be made. Don't blindly copy regular expressions from online libraries or discussion forums.” -Jan Goyvaerts, regular-expressions.info

Slide 48

Slide 48 text

Parser
What Problem Are We Solving? Sequence of tokens → Syntax tree
Non-Compiler Example: Deserialization

Slide 49

Slide 49 text

PEMDAS
1 + 2 * 3 → 1 + (2 * 3)
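A common way to make a parser respect precedence is recursive descent with one function per precedence level. A minimal sketch in F# (the `Arith` type and the function names are illustrative); `*` is handled one level below `+`, so it binds tighter, and both operators are left-associative:

```fsharp
type Arith =
    | Num of int
    | Add of Arith * Arith
    | Mul of Arith * Arith

// Each parser takes the remaining tokens and returns (tree, remaining tokens).
// expr := term ('+' term)*
let rec parseExpr (tokens: string list) : Arith * string list =
    let left, rest = parseTerm tokens
    parseAddTail left rest

and parseAddTail left tokens =
    match tokens with
    | "+" :: rest ->
        let right, rest' = parseTerm rest
        parseAddTail (Add(left, right)) rest'
    | _ -> left, tokens

// term := factor ('*' factor)*
and parseTerm tokens =
    let left, rest = parseFactor tokens
    parseMulTail left rest

and parseMulTail left tokens =
    match tokens with
    | "*" :: rest ->
        let right, rest' = parseFactor rest
        parseMulTail (Mul(left, right)) rest'
    | _ -> left, tokens

// factor := number | '(' expr ')'
and parseFactor tokens =
    match tokens with
    | "(" :: rest ->
        match parseExpr rest with
        | inner, ")" :: rest' -> inner, rest'
        | _ -> failwith "Expected ')'."
    | n :: rest -> Num(int n), rest
    | [] -> failwith "Unexpected end of input."

let tokenize (s: string) =
    s.Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries) |> List.ofArray

let rec show tree =
    match tree with
    | Num n -> string n
    | Add (a, b) -> sprintf "(%s + %s)" (show a) (show b)
    | Mul (a, b) -> sprintf "(%s * %s)" (show a) (show b)

printfn "%s" (show (fst (parseExpr (tokenize "1 + 2 * 3"))))
```

The grammar, not a precedence table, decides that `2 * 3` groups first: by the time `parseAddTail` sees a `+`, `parseTerm` has already consumed every `*`.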

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Grammar := | := | := “(defun” identifier “)” := number | string | := “(” identifier “)”

Slide 52

Slide 52 text

Grammar
type Expression =
    | IntExpr of int
    | StringExpr of string
    | DefunExpr of name: string * argument: ArgumentExp
    | InvokeExpr of name: string * argument: Expression
    | IdentifierExpr of string
    | ErrorExpr of string
    | EmptyListExpr

Slide 53

Slide 53 text

Parser LeftParenthesis Identifier(“inc”) LiteralInt(-1) RightParenthesis

Slide 54

Slide 54 text

Parser LeftParenthesis Identifier(“inc”) LiteralInt(-1) RightParenthesis Invoke “inc” -1

Slide 55

Slide 55 text

Parser LeftParenthesis Identifier(“inc”) LiteralInt(-1) LiteralInt(-1)

Slide 56

Slide 56 text

Parser LeftParenthesis Identifier(“inc”) LiteralInt(-1) LiteralInt(-1) “Expected ‘)’”

Slide 57

Slide 57 text

Parser
let rec private parseExpression (state : ParseState): ParseState =
    match state.Remaining with
    | LeftParenthesis :: Identifier "defun" :: Identifier name :: rest ->
        let defun = parseDefun (name, { state with Remaining = rest })
        match defun.Expressions, defun.Remaining with
        | [ ErrorExpr _ ], _ -> defun
        | _, RightParenthesis :: remaining -> { defun with Remaining = remaining }
        | _, [] -> error ("Expected ')'.")
        | _, wrong :: _ -> error (sprintf "Expected ')'; found %A." wrong)
    | LeftParenthesis :: Identifier name :: argumentsAndBody ->
        let invoke = parseInvoke (name, { state with Remaining = argumentsAndBody })
        match invoke.Remaining with
        | RightParenthesis :: remaining -> { invoke with Remaining = remaining }
        | [] -> error ("Expected ')'.")
        | wrong :: _ -> error (sprintf "Expected ')'; found %A." wrong)
    | LeftParenthesis :: wrong -> error (sprintf "%A cannot follow '('." wrong)
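The same idea in miniature: a sketch that parses `( identifier argument )` into an `InvokeExpr` or reports a missing `)`. It trims the slide's types to a few cases and returns an `(expression, remaining tokens)` pair instead of the real `ParseState` record, so treat it as an illustration rather than TinyLanguage's parser:

```fsharp
type Lexeme =
    | LeftParenthesis
    | RightParenthesis
    | Identifier of string
    | LiteralInt of int

type Expression =
    | IntExpr of int
    | InvokeExpr of name: string * argument: Expression
    | ErrorExpr of string

// Returns the parsed expression plus whatever tokens remain.
let rec parseExpression (tokens: Lexeme list) : Expression * Lexeme list =
    match tokens with
    | LiteralInt n :: rest -> IntExpr n, rest
    | LeftParenthesis :: Identifier name :: argumentAndRest ->
        match parseExpression argumentAndRest with
        | (ErrorExpr _ as err), rest -> err, rest   // pass inner errors through
        | argument, RightParenthesis :: rest -> InvokeExpr(name, argument), rest
        | _, [] -> ErrorExpr "Expected ')'.", []
        | _, wrong :: rest -> ErrorExpr(sprintf "Expected ')'; found %A." wrong), rest
    | LeftParenthesis :: wrong :: rest ->
        ErrorExpr(sprintf "%A cannot follow '('." wrong), rest
    | [] -> ErrorExpr "Unexpected end of input.", []
    | wrong :: rest -> ErrorExpr(sprintf "Unexpected %A." wrong), rest

// (inc -1) parses; (inc -1 -1) is missing its ')'.
printfn "%A" (fst (parseExpression [ LeftParenthesis; Identifier "inc"; LiteralInt(-1); RightParenthesis ]))
printfn "%A" (fst (parseExpression [ LeftParenthesis; Identifier "inc"; LiteralInt(-1); LiteralInt(-1) ]))
```

Returning an `ErrorExpr` instead of throwing lets the parser keep the error in the tree, where later stages can report it alongside other problems.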

Slide 58

Slide 58 text

Parser

Slide 59

Slide 59 text

–Guy Steele “If it's worth telling another programmer, it's worth telling the compiler, I think.” https://joshvarty.wordpress.com/2015/08/03/learn-roslyn-now-part-11-introduction-to-code-fixes/

Slide 60

Slide 60 text

Parser https://joshvarty.wordpress.com/2015/08/03/learn-roslyn-now-part-11-introduction-to-code-fixes/

Slide 61

Slide 61 text

Scope
What Problem Are We Solving? What does “x” mean right now?
Non-Compiler Example: Bounded Context in Domain-Driven Design

Slide 62

Slide 62 text

Scope https://msujaws.wordpress.com/2011/05/03/static-vs-dynamic-scoping/

Slide 63

Slide 63 text

Binding InvokeExpr “inc” -1

Slide 64

Slide 64 text

Binding
InvokeExpr “inc” -1
→ InvokeBinding { FunctionName = "inc"; Function = Inc; Argument = IntBinding -1 }

Slide 65

Slide 65 text

Binding
InvokeExpr { Name = "not-a-function"; Argument = StringExpr "" }

Slide 66

Slide 66 text

Binding
InvokeExpr { Name = "not-a-function"; Argument = StringExpr "" }
→ “Undefined function ‘not-a-function’.”
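Binding is mostly a lookup: the environment answers "what does this name mean right now?" A sketch in F#, assuming a simplified environment of type `Map<string, Function>` and a tuple-shaped `InvokeBinding` (the real code uses a record and richer types):

```fsharp
type Expression =
    | IntExpr of int
    | StringExpr of string
    | InvokeExpr of name: string * argument: Expression

// Hypothetical: the only built-in function in this sketch.
type Function =
    | Inc

type Binding =
    | IntBinding of int
    | StringBinding of string
    | InvokeBinding of functionName: string * func: Function * argument: Binding
    | ErrorBinding of string

// The environment maps names in scope to what they refer to.
let environment : Map<string, Function> = Map.ofList [ "inc", Inc ]

let rec bind (env: Map<string, Function>) (expression: Expression) : Binding =
    match expression with
    | IntExpr n -> IntBinding n
    | StringExpr s -> StringBinding s
    | InvokeExpr (name, argument) ->
        match env.TryFind name with
        | Some func -> InvokeBinding(name, func, bind env argument)
        | None -> ErrorBinding(sprintf "Undefined function '%s'." name)

printfn "%A" (bind environment (InvokeExpr("inc", IntExpr(-1))))
printfn "%A" (bind environment (InvokeExpr("not-a-function", StringExpr "")))
```

Nested scopes fall out naturally: an inner scope is `env.Add(name, value)`, which shadows the outer entry without mutating it.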

Slide 67

Slide 67 text

https://msdn.microsoft.com/en-us/library/ms228296.aspx?f=255&MSPPError=-2147217396

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

About Those Errors
[]
member this.``should return error for unbound invocation``() =
    let source = "(bad-method 2)"
    let expected = ErrorBinding ("Undefined function 'bad-method'.", EmptyBinding)
    let actual = bind source
    actual |> should equal expected

Slide 70

Slide 70 text

About Those Errors http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488?pgno=2

Slide 71

Slide 71 text

About Those Errors
• Die in a fire
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488?pgno=2

Slide 72

Slide 72 text

About Those Errors
• Die in a fire
• Guess what I meant, not what I said
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488?pgno=2

Slide 73

Slide 73 text

About Those Errors
• Die in a fire
• Guess what I meant, not what I said
• Poisoning
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488?pgno=2
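Poisoning means that once an expression has produced an error, anything built on top of it is marked as already broken and stays silent, so one mistake yields one message instead of a cascade. A sketch with a hypothetical `Expr` type and checker:

```fsharp
// Result of checking one expression.
type Checked =
    | Typed of string   // the expression's type name
    | Poisoned          // an error was already reported for this subtree

// Hypothetical expression type with just enough cases to show the idea.
type Expr =
    | IntLit of int
    | StrLit of string
    | Undefined of string   // a reference to an unknown name
    | Inc of Expr           // (inc e): e must be an int

let check (expr: Expr) : Checked * string list =
    let errors = ResizeArray<string>()
    let rec go expr =
        match expr with
        | IntLit _ -> Typed "int"
        | StrLit _ -> Typed "string"
        | Undefined name ->
            errors.Add(sprintf "Undefined name '%s'." name)
            Poisoned
        | Inc inner ->
            match go inner with
            | Poisoned -> Poisoned   // already reported below; stay quiet
            | Typed "int" -> Typed "int"
            | Typed other ->
                errors.Add(sprintf "Expected integer; found %s." other)
                Poisoned
    let result = go expr
    result, List.ofSeq errors

// (inc (inc x)) with x undefined reports one error, not three.
printfn "%A" (check (Inc(Inc(Undefined "x"))))
```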

Slide 74

Slide 74 text

Type Checking
What Problem Are We Solving? AST → Boolean: “Is it valid?”
Non-Compiler Example: Linter

Slide 75

Slide 75 text

Type Checking
ldstr "Hi"
ldstr "Hi"
div
This is bad. Don’t do this.

Slide 76

Slide 76 text

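Type Inference Rules
$$\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \times B}
\qquad\qquad
\frac{\Gamma \vdash v_1 : \mathrm{Int} \qquad \Gamma \vdash v_2 : \mathrm{Int}}{\Gamma \vdash v_1 + v_2 : \mathrm{Int}}$$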
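Each inference rule maps onto one case of a recursive checker: check the premises (the recursive calls), then produce the conclusion. A sketch in F# of the `Int` addition rule, modeling Γ as a `Map` from names to types; the `Ty` and `Expr` types, and the literal and variable cases, are assumptions added to make it runnable:

```fsharp
type Ty =
    | IntType
    | StringType

type Expr =
    | IntLit of int
    | StrLit of string
    | Var of string
    | Plus of Expr * Expr

// Γ (the typing context) is modeled as a Map from variable names to types.
let rec typeOf (gamma: Map<string, Ty>) (expr: Expr) : Result<Ty, string> =
    match expr with
    | IntLit _ -> Ok IntType
    | StrLit _ -> Ok StringType
    | Var name ->
        match gamma.TryFind name with
        | Some t -> Ok t
        | None -> Error(sprintf "Unbound variable '%s'." name)
    | Plus (v1, v2) ->
        // Premises: Γ ⊢ v1 : Int and Γ ⊢ v2 : Int. Conclusion: Γ ⊢ v1 + v2 : Int.
        match typeOf gamma v1, typeOf gamma v2 with
        | Ok IntType, Ok IntType -> Ok IntType
        | Error e, _
        | _, Error e -> Error e
        | Ok t1, Ok t2 -> Error(sprintf "Expected Int + Int; found %A + %A." t1 t2)

printfn "%A" (typeOf (Map.ofList [ "x", IntType ]) (Plus(Var "x", IntLit 1)))
printfn "%A" (typeOf Map.empty (Plus(IntLit 1, StrLit "Hi")))
```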

Slide 77

Slide 77 text

Type Checking
• Statically typed
• Unityped (“dynamic language”)
• Untyped

Slide 78

Slide 78 text

Type Checking
let rec private toBinding (environment: Map) (expression: Expression) =
    match expression with
    | IntExpr n -> IntBinding n
    | StringExpr str -> StringBinding str

Slide 79

Slide 79 text

Type Checking
    | InvokeExpr (name, argument) ->
        match environment.TryFind name with
        | Some (FunctionBinding func) ->
            let argumentBinding = toInvokedArgumentBinding environment argument
            match argumentTypeError argumentBinding func with
            | None ->
                InvokeBinding
                    { FunctionName = name
                      Function = func
                      Argument = argumentBinding }
            | Some argumentTypeErrorMessage ->
                ErrorBinding (argumentTypeErrorMessage, EmptyBinding)
        | Some bindingType ->
            ErrorBinding (sprintf "Expected function; found %A" bindingType, EmptyBinding)
        | None ->
            ErrorBinding (sprintf "Undefined function '%s'." name, EmptyBinding)

Slide 80

Slide 80 text

Type Checking
InvokeExpr { Name = "inc"; Argument = StringExpr "Oops!" }

Slide 81

Slide 81 text

Type Checking
InvokeExpr { Name = "inc"; Argument = StringExpr "Oops!" }
→ “Expected integer; found ‘Oops!’.”

Slide 82

Slide 82 text

Optimizers
What Problem Are We Solving? Program → Faster, but equivalent program
Non-Compiler Example: Theorem prover

Slide 83

Slide 83 text

Optimization (I) InvokeBinding “inc” -1

Slide 84

Slide 84 text

Optimization (I)
InvokeBinding “inc” -1 → IntBinding 0

Slide 85

Slide 85 text

Optimization (I) Invoke “some-method” -1

Slide 86

Slide 86 text

Optimization (I)
Invoke “some-method” -1 → Invoke “some-method” -1 (unchanged)

Slide 87

Slide 87 text

Optimization (I)
let private optimizeInc (binding: Binding) : Binding =
    match binding with
    | IncBinding (IntBinding number) -> IntBinding (number + 1)
    // Every other case is listed explicitly instead of using a wildcard,
    // so adding a new Binding case produces an incomplete-match warning here.
    | IncBinding _
    | BoolBinding _
    | IntBinding _
    | StringBinding _
    | VariableBinding _
    | FunctionBinding _
    | InvokeBinding _
    | DefBinding _
    | ErrorBinding _
    | EmptyBinding -> binding
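`optimizeInc` folds one level of `inc`. A slightly more general sketch, using a cut-down hypothetical `Binding` type, folds bottom-up so nested calls such as `(inc (inc 1))` also collapse, while calls to unknown functions are left alone because the optimizer cannot know what they do:

```fsharp
type Binding =
    | IntBinding of int
    | IncBinding of Binding
    | InvokeBinding of name: string * argument: Binding

// Fold bottom-up: optimize children first, then try to simplify the node.
let rec optimize (binding: Binding) : Binding =
    match binding with
    | IncBinding argument ->
        match optimize argument with
        | IntBinding n -> IntBinding(n + 1)   // (inc -1) becomes 0
        | other -> IncBinding other           // not a constant; keep the call
    | InvokeBinding (name, argument) ->
        // Nothing is known about an arbitrary function (it may have side
        // effects), so only its argument is simplified.
        InvokeBinding(name, optimize argument)
    | IntBinding _ -> binding

printfn "%A" (optimize (IncBinding(IntBinding(-1))))
printfn "%A" (optimize (InvokeBinding("some-method", IntBinding(-1))))
```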

Slide 88

Slide 88 text

IL Generation IntBinding 0

Slide 89

Slide 89 text

IL Generation IntBinding 0 Ldc.i4 0

Slide 90

Slide 90 text

IL Generation IntBinding 0 Ldc.i4 0 Ldc.i4.0

Slide 91

Slide 91 text

IL Generation
let rec private codegenBinding (binding : Binding) =
    match binding with
    | BoolBinding b ->
        match b with
        | true -> [Ldc_I4_1]
        | false -> [Ldc_I4_0]
    | IntBinding n -> [Ldc_I4 n]
    | StringBinding s -> [Ldstr s]
    // …

Slide 92

Slide 92 text

IL Generation
let private writeLineMethod =
    typeof.GetMethod("WriteLine", [| typeof |])

let private codegenOper = function
    | IncInt ->
        [ Instruction.Ldc_I4_1
          Instruction.Add ]
    | WriteLine -> [ Instruction.Call writeLineMethod ]
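To see generated IL run without building an assembly on disk, .NET's `System.Reflection.Emit.DynamicMethod` accepts IL directly. This is a swapped-in technique (TinyLanguage emits an assembly via `Il.toAssemblyBuilder`), and `compileToFunc` is a hypothetical helper name; the sketch emits the unoptimized and optimized IL for `(inc -1)` and runs both:

```fsharp
open System
open System.Reflection.Emit

// Build a parameterless method returning int from a caller-supplied IL body.
let compileToFunc (name: string) (emitBody: ILGenerator -> unit) : Func<int> =
    let dm = DynamicMethod(name, typeof<int>, Type.EmptyTypes)
    let il = dm.GetILGenerator()
    emitBody il
    il.Emit(OpCodes.Ret)   // return the int left on the evaluation stack
    dm.CreateDelegate(typeof<Func<int>>) :?> Func<int>

// Unoptimized: Ldc.i4 -1, Ldc.i4 1, Add
let unoptimized =
    compileToFunc "IncMinusOne" (fun il ->
        il.Emit(OpCodes.Ldc_I4, -1)
        il.Emit(OpCodes.Ldc_I4_1)
        il.Emit(OpCodes.Add))

// After constant folding and short encoding: Ldc.i4.0
let optimized = compileToFunc "Zero" (fun il -> il.Emit(OpCodes.Ldc_I4_0))

printfn "%d %d" (unoptimized.Invoke()) (optimized.Invoke())
```

Both delegates return the same value; the optimized body is simply one short instruction instead of three.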

Slide 93

Slide 93 text

Optimization (II) Ldc.i4 0

Slide 94

Slide 94 text

Optimization (II) Ldc.i4 0 Ldc.i4.0

Slide 95

Slide 95 text

Optimization (II)
let private optimalShortEncodingFor = function
    | Ldc_I4 0 -> Ldc_I4_0
    | Ldc_I4 1 -> Ldc_I4_1
    | Ldc_I4 2 -> Ldc_I4_2
    | Ldc_I4 3 -> Ldc_I4_3
    | Ldc_I4 4 -> Ldc_I4_4
    | Ldc_I4 5 -> Ldc_I4_5
    | Ldc_I4 6 -> Ldc_I4_6
    | Ldc_I4 7 -> Ldc_I4_7
    | Ldc_I4 8 -> Ldc_I4_8
    | Ldloc 0 -> Ldloc_0
    | Ldloc 1 -> Ldloc_1
    | Ldloc 2 -> Ldloc_2
    | Ldloc 3 -> Ldloc_3
    | Ldloc i when i <= maxByte -> Ldloc_S(Convert.ToByte(i))
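A peephole pass like this is a pure function from one instruction to a cheaper equivalent. A self-contained sketch with a hypothetical `Instruction` type; the byte counts in the comments are the standard CIL encodings, and the `ldc.i4.s` and `ldc.i4.m1` cases go slightly beyond what the slide shows:

```fsharp
type Instruction =
    | Ldc_I4 of int        // ldc.i4 <int32>: 5 bytes
    | Ldc_I4_S of sbyte    // ldc.i4.s <int8>: 2 bytes
    | Ldc_I4_M1            // ldc.i4.m1: 1 byte
    | Ldc_I4_0 | Ldc_I4_1 | Ldc_I4_2 | Ldc_I4_3 | Ldc_I4_4
    | Ldc_I4_5 | Ldc_I4_6 | Ldc_I4_7 | Ldc_I4_8   // 1 byte each
    | Add

// Pick the shortest encoding that pushes the same constant.
let shortEncodingFor instruction =
    match instruction with
    | Ldc_I4 (-1) -> Ldc_I4_M1
    | Ldc_I4 0 -> Ldc_I4_0
    | Ldc_I4 1 -> Ldc_I4_1
    | Ldc_I4 2 -> Ldc_I4_2
    | Ldc_I4 3 -> Ldc_I4_3
    | Ldc_I4 4 -> Ldc_I4_4
    | Ldc_I4 5 -> Ldc_I4_5
    | Ldc_I4 6 -> Ldc_I4_6
    | Ldc_I4 7 -> Ldc_I4_7
    | Ldc_I4 8 -> Ldc_I4_8
    | Ldc_I4 n when n >= -128 && n <= 127 -> Ldc_I4_S(sbyte n)
    | other -> other   // already short, or no shorter form exists

let optimizeIl (instructions: Instruction list) = List.map shortEncodingFor instructions

printfn "%A" (optimizeIl [ Ldc_I4(-1); Ldc_I4 1; Add ])
```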

Slide 96

Slide 96 text

Special Tools!

Slide 97

Slide 97 text

Compare!

Slide 98

Slide 98 text

https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf

Slide 99

Slide 99 text

Trusting Trust
Compiler Executable + Compiler Source Code → Compiler Executable

Slide 100

Slide 100 text

Trusting Trust
Compiler Executable + Compiler Source Code + Trojan Code → Trojaned Compiler Executable

Slide 101

Slide 101 text

Trusting Trust
Trojaned Compiler Executable + Benign App Source Code → Trojaned App Executable

Slide 102

Slide 102 text

Trusting Trust
Trojaned Compiler Executable + (Benign!) Compiler Source Code → Trojaned Compiler Executable

Slide 103

Slide 103 text

Conclusion

Slide 104

Slide 104 text

Further Reading

Slide 105

Slide 105 text

Further Reading
• Programming Language Concepts, by Peter Sestoft
• Modern Compiler Implementation in ML, by Andrew W. Appel
• miniml (608-line implementation of an ML subset), by Andrej Bauer
• Coursera Compilers Course, by Alex Aiken

Slide 106

Slide 106 text

Craig Stuntz
@craigstuntz
https://www.craigstuntz.com
https://www.meetup.com/Papers-We-Love-Columbus/
https://speakerdeck.com/craigstuntz
https://github.com/CraigStuntz/TinyLanguage