Slide 1

Slide 1 text

YOUR FLYING CAR IS READY: AMAZING PROGRAMMING TOOLS FROM THE FUTURE, TODAY! CRAIG STUNTZ, 2015.01.08 https://www.flickr.com/photos/ellenm1/7847402208

Slide 2

Slide 2 text

SLIDES https://speakerdeck.com/craigstuntz/ Slides are already online. Also, call me on jargon. Ask “who cares?”

Slide 3

Slide 3 text

“THE FUTURE IS ALREADY HERE — IT’S JUST NOT VERY EVENLY DISTRIBUTED.” WILLIAM GIBSON
I have a lot of quotes on slides. I’ll give you some time to read them, but I won’t read them aloud. Spoiler alert! Here’s the whole talk:
1. The software we use today is mostly broken; it solves the wrong problem incorrectly.
2. Software is broken for a reason: we verify the wrong things.
3. It’s possible to produce software which solves harder problems and isn’t broken.
4. The way to do better exists and is used to produce software you use every day.

Slide 4

Slide 4 text

“A LANGUAGE THAT DOESN'T AFFECT THE WAY YOU THINK ABOUT PROGRAMMING, IS NOT WORTH KNOWING.” ALAN PERLIS EPIGRAMS IN PROGRAMMING https://www.flickr.com/photos/randyread/2385812579 Will talk about tools most people don’t know exist, when it makes sense to use them. When not. “All languages are the same, just syntax.” I disagree.

Slide 5

Slide 5 text

Like Testing, But Better http://fsharpforfunandprofit.com/posts/low-risk-ways-to-use-fsharp-at-work-3/ Why is this so? Consider unit tests. You’ve all written unit tests… right? Devil’s advocate: Why are unit tests useful? They’re just “more code.” Doesn’t that make maintenance worse? The value is in approaching the problem differently: tests value verification of correctness over a working algorithm. New languages do this even better!

Slide 6

Slide 6 text

“You theorized a machine that could solve any problem. It didn’t just do one thing; it did everything.” (fictional) Joan Clarke to (fictional) Alan Turing The Imitation Game (2014) http://theimitationgamemovie.com/#blog/104786411214 Misunderstanding Turing Completeness Here’s a quote from a Hollywood movie. This is fiction. The real Joan Clarke was way too smart to say this. Turing never claimed his machines could solve any problem. To the contrary, his purpose was to prove that problems existed which they could not solve! More important for this talk: A Turing complete language can express any computable algorithm, but it cannot help you find that algorithm! Languages are more powerful when they help you think differently. (Quine example.)

Slide 7

Slide 7 text

“SOMETIMES WE DON’T PROGRAM TO SHIP; WE PROGRAM TO UNDERSTAND PROGRAMMING.” NADA AMIN PROGRAMMING SHOULD EAT ITSELF Do programming languages exist to produce programs? You can create a program without a PL, though it’s harder. We program not for its own sake (mostly) but to solve business problems. PLs and compilers produce executable code, yes, but they also find syntax errors and semantic errors, and are “tools of thought.” My real goal here: To expand the set of problems you think you can solve with programming. To do that, you need new ways of approaching a language, not just tooling.

Slide 8

Slide 8 text

Note Taking in F# When I approach a very difficult problem, I take notes in F#. This particular example (fine if it’s unreadable; details insignificant) concerns business rules inferred from GBs of XML files. This is living data, XML changing under my feet, so executing the F# tells me if my assumptions are still valid in “current” XML. The F# code exists not to produce a result but to validate my thinking about the problem. Nice side effect: It turns out other people also want the ability to run arbitrary queries on this data!

Slide 9

Slide 9 text

What Is the Upper Limit of Software Quality? function three() { return 1 + 2; } Want to show you a function I wrote. I’ll apologize in advance; I’ve been warned to avoid using examples involving math. What’s interesting? It’s perfect! This is the only defect-free JS I’ll be showing you today. The notion of “perfect” code is controversial. But it’s clearly possible! How much quality are we willing to pay for? Does it depend on the application?

Slide 10

Slide 10 text

“LIFE WAS SIMPLE BEFORE WORLD WAR II. AFTER THAT, WE HAD SYSTEMS.” GRACE HOPPER Perfect code is trivial. Perfect programs, systems harder. Composing code harder than writing code in most cases. Why? This is essential! Not enough to write correct functions unless they’re all total. There are always external factors. That’s fine.

Slide 11

Slide 11 text

“BEWARE OF BUGS IN THE ABOVE CODE; I HAVE ONLY PROVED IT CORRECT, NOT TRIED IT.” DONALD KNUTH NOTES ON THE VAN EMDE BOAS CONSTRUCTION OF PRIORITY DEQUES: AN INSTRUCTIVE USE OF RECURSION https://www.flickr.com/photos/gem66/38298868 Dangerous ideas! I'll be showing a lot of languages which are “still in the lab.” You may find some of this useful in your work tomorrow, but not all experiments succeed. I’ll ask you to make a choice, though: Which side are you on? Do you believe software must be forever buggy? Or do we attempt to come closer to correct software?

Slide 12

Slide 12 text

In particular the lab is Microsoft Research. Many people know Kinect, WorldWide Telescope, F#, Entity Framework, Pex.

Slide 13

Slide 13 text

“IF YOU’RE GOING TO USE CUTTING EDGE TECHNOLOGY, DON’T EXPECT NICE BLOG POSTS THAT TELL YOU IT’S EASY.” JOE ARMSTRONG CHICAGO ERLANG PRESENTATION https://www.flickr.com/photos/vanchett/3180276972 I have a rule of thumb for application architecture. Consider the tech you want to be using in 5 years, because it takes time to… Try to see into the future. This is hard! Never specify tech the Hacker News Hipsters tell you that you should be using today. However, every tool I discuss is real and is used in production software, including software you might use every day.

Slide 14

Slide 14 text

“YOUNG MAN, IN MATHEMATICS YOU DON'T UNDERSTAND THINGS. YOU JUST GET USED TO THEM.” JOHN VON NEUMANN LETTER TO FELIX T. SMITH https://www.flickr.com/photos/36621927@N00/8378574271 These languages operate very differently than those you probably use in your day to day work. Don't worry if you don't follow every bit of syntax. To be quite honest, I don't fully understand all of this stuff myself. The important thing is to know what is available, and to think about problems in new ways.

Slide 15

Slide 15 text

Some Specialized Languages Assembly SQL C F# C#? I’ll be talking about specialized tools. We’re biased towards general purpose languages so we can learn only one. But we happily use SQL when needed. We abandon GP languages when inessential complexity too high. We grow domain-specific languages to GP when necessary. So when you see a language which doesn’t produce EXEs, don’t dismiss it out of hand.

Slide 16

Slide 16 text

“IF DEBUGGING IS THE PROCESS OF REMOVING SOFTWARE BUGS, THEN PROGRAMMING MUST BE THE PROCESS OF PUTTING THEM IN.” ATTRIBUTED TO EDSGER DIJKSTRA Sounds like Dijkstra! Let’s talk about bugs. Broken code should be obvious. Cognitive overhead from inessential complexity turns out to be surprisingly high. Let’s examine some buggy code.

Slide 17

Slide 17 text

JavaScript function add(a, b) { return a + b; } In a study of buggy code, it makes sense to start with JavaScript. In contrast to the earlier example, this is completely broken. No error. Anyone know what it returns? It’s JS, so we never specified the return type. A type checker would find it. A test might find the bug. This is not a good part.

Slide 18

Slide 18 text

http://blog.erratasec.com/2014/09/bash-bug-as-big-as-heartbleed.html#.VCN_7StdWwE PLs are usually specified in EBNF. Machine-verifiable specs are easy; bash doesn’t use them and has evolved to the point where it can’t be specified. So there are edge cases…. Syntax which should not be allowed. This causes issues. Also Ruby. There’s a file in the Ruby source, parse.y, which is impenetrable. It’s half the size of all of Lua, for the parser alone. Other implementations probably differ from MRI. Formal PL grammars keep parsers maintainable.

Slide 19

Slide 19 text

Goto Fail

    static OSStatus
    SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                     uint8_t *signature, UInt16 signatureLen)
    {
        OSStatus err;
        ...
        if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
            goto fail;
        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
            goto fail;
            goto fail;
        if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
            goto fail;
        ...
    fail:
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;
    }

Besides the obvious (the second, unconditional goto fail skips the final check while err is still 0, so the function reports success)… “fail” is not always fail, and “err” is not always err. Crappy code, maybe, but real production code, and not the worst I’ve ever seen.

Slide 20

Slide 20 text

"Quality software costs money — Heartbleed was free." Paul-Henning Kamp DOI:10.1145/2631095 Not a TLS spec bug (though those can happen). Possibly creeping featuritis in spec. Also, strings are kind of broken. That’s true for most languages, not just C.

Slide 21

Slide 21 text

C#

    static Type GetType<T>() where T : new()
    {
        T t = new T();
        return t.GetType(); // *
    }

    static void Main(string[] args)
    {
        // The type argument was lost in extraction; int? is an assumption
        // (a nullable value type reproduces the throw described below).
        Console.WriteLine(GetType<int?>().ToString());
    }

Appears contrived, but useful, because the type system is broken and it fits on a slide. The * line throws. Deeply weird that you can new up something and can’t ask for its type. Type systems help, but are not 100% safe.

Slide 22

Slide 22 text

F#

    let average (someList: int list) =
        (List.sum someList) / (List.length someList)

Type system is restricted. There is no type for a non-empty list. Could check for empty in code, but that requires code + test. Shouldn’t have to examine the result to determine if the call made sense in the first place. A better type system could do it for us. Empty list is to list as null is to reference type, in at least some, but not all, cases!
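A rough Ruby sketch of the same failure, and of the "non-empty list" idea. Ruby has no static type checker, so the `NonEmptyList` class below (a hypothetical name, not from the talk) only moves the check to construction time; a richer static type system would reject the empty case before the program ever runs.

```ruby
# Integer average, mirroring the F# one-liner on the slide.
# For an empty list this computes 0 / 0 and raises ZeroDivisionError.
def average(list)
  list.sum / list.length
end

# Hypothetical wrapper: a list that cannot be constructed empty,
# because the first element is a required argument.
class NonEmptyList
  attr_reader :items

  def initialize(head, *tail)
    @items = [head, *tail].freeze
  end

  # items.length is at least 1 by construction, so no division by zero here.
  def average
    items.sum / items.length
  end
end
```

With this shape, `NonEmptyList.new` with no arguments fails at construction (ArgumentError), so callers of `average` never need to inspect the result to learn whether the call made sense.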

Slide 23

Slide 23 text

CAN WE DO BETTER? OR MUST WE DO BETTER? https://www.flickr.com/photos/jurvetson/5872448596 How? Remember: Which side are you on? Do you believe software must be forever buggy?

Slide 24

Slide 24 text

“Attempting to prove any nontrivial theorem about your program will expose lots of bugs. The particular choice of theorem makes little difference! Typechecking is good because it proves lots and lots of little theorems about your program.” –Benjamin C. Pierce http://www.cis.upenn.edu/~bcpierce/papers/harmful-mfps.pdf
Define theorem. Quote is interesting. Reminds me of what people say about testing. Use strong types! Simple example: In F#, nullable and non-nullable references are separate types, and this eliminates null reference errors in pure F# code. There is a deep relationship between programs and mathematical proofs. Talk to me after, but types good! Strong types (especially really strong types) are awesome for refactoring. Slash + burn. Don’t fix the bug. Change the data types to make the state which caused it impossible. (Paul Phillips) Powerful idea! C# types default to invalid state; it’s lots of work to only allow state which is correct by construction.

Slide 25

Slide 25 text

Hoare Verification {P} C {Q} Partial vs. total correctness Considered to be high-effort; 100/100 Tony Hoare (anyone?), Algol (who has heard of Algol?) Anyone ever seen a null reference error? One can totally specify software. Precondition->Command->Postcondition
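A small Ruby sketch of the {P} C {Q} shape, written as runtime checks. The function name `checked_isqrt` and the choice of integer square root as the command are assumptions for illustration. Note the difference from Hoare verification proper: these checks only examine the calls that actually happen, whereas a proof of the triple covers every state satisfying P. Partial correctness says that if C terminates from a state satisfying P, then Q holds; total correctness additionally requires that C terminates.

```ruby
# {P} C {Q}: if P holds before command C runs, Q holds afterwards.
def checked_isqrt(x)
  # P: x is a non-negative integer.
  raise ArgumentError, "precondition failed: x >= 0" unless x.is_a?(Integer) && x >= 0

  # C: the command itself.
  r = Integer.sqrt(x)

  # Q: r is the largest integer whose square does not exceed x.
  raise "postcondition failed" unless r * r <= x && x < (r + 1) * (r + 1)

  r
end
```

Calling `checked_isqrt(-1)` fails the precondition before the command runs, which is the runtime analogue of a caller not being allowed to invoke C outside P.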

Slide 26

Slide 26 text

“PROGRAM TESTING CAN BE USED TO SHOW THE PRESENCE OF BUGS, BUT NEVER THEIR ABSENCE.” EDSGER DIJKSTRA STRUCTURED PROGRAMMING http://en.wikipedia.org/wiki/Edsger_W._Dijkstra#mediaviewer/File:Edsger_Dijkstra_1994.jpg Tests are ∃, strong types are ∀. Tests are a weak form of static typing. Useful when static typing is too hard (fixable) or when static typing can’t deal with an imprecise spec. “Morally well-typed.” "Type errors are not just red flags: in a sufficiently well-specified theory, all errors are type errors." Evan Jenkins. Testing is great; property-based testing (QuickCheck, etc.) is even better. Testing is evidence, not a proof. Let’s expand on this….
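To make the ∃ versus ∀ contrast concrete, here is a hand-rolled Ruby sketch. It is not the QuickCheck API; the helper names and the property chosen (a non-negative integer and its digit sum leave the same remainder modulo 9) are assumptions for illustration. The example test speaks about one input; the property check speaks about many sampled inputs, which is stronger evidence but still not a proof.

```ruby
def digit_sum(n)
  n.digits.sum
end

# Example-based test: a claim about exactly one input.
def example_test_passes?
  digit_sum(123) == 6
end

# Property-style test: the same kind of claim, checked on many pseudo-random inputs.
# A fixed seed keeps the run reproducible.
def property_holds?(samples: 1_000, seed: 42)
  rng = Random.new(seed)
  samples.times.all? do
    n = rng.rand(0..1_000_000)
    digit_sum(n) % 9 == n % 9
  end
end
```

A type or a proof would cover all inputs at once; the sampled check only ever covers the inputs it happened to draw.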

Slide 27

Slide 27 text

Effort vs. Reward I have a slightly different definition of test coverage than most. I don’t care which lines of code are executed nearly as much as I care about covering the possible states of the system. Covering dead code is pointless. (click) For simple theorems, like ‘are the arguments to this method all non-null,’ many people don’t bother testing, because there are so many, which is why we have so many null reference errors. Inferred types in a rich type system cover this for free. (click) OTOH, total state space coverage for complex theorems (“I am following SSL protocol to the letter”) with types can be difficult, although the effort pays off when needed. Tests might be “good enough.” Types always cover all states, but complexity starts very low and extends to very high. Tests are in middle on coverage, effort, and complexity.

Slide 28

Slide 28 text

Effort vs. Reward Tests Types

Slide 29

Slide 29 text

Effort vs. Reward Tests Types Simple Theorems Low effort per theorem, but there are lots of simple theorems! Covers a few states Very low effort, especially if types inferred Covers all states

Slide 30

Slide 30 text

Effort vs. Reward

                   Tests                                Types
Simple Theorems    Low effort per theorem, but there    Very low effort, especially if
                   are lots of simple theorems!         types inferred.
                   Covers a few states.                 Covers all states.
Complex Theorems   Medium effort.                       High effort.
                   Covers a few, important states.      Covers all states.

Slide 31

Slide 31 text

“WHAT’S TRUE OF EVERY BUG FOUND IN THE FIELD? IT HAS PASSED THE TYPE CHECKER… AND ALL OF THE TESTS.” RICH HICKEY SIMPLE MADE EASY https://www.flickr.com/photos/grouperkun/5351080866 Simplicity, conceptual clarity, is the silver bullet, not languages. "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” Brian Kernighan When you diverge from essential complexity, you’re creating a maintenance problem.

Slide 32

Slide 32 text

Z3 http://rise4fun.com/Z3 http://z3.codeplex.com/ https://www.flickr.com/photos/laughingsquid/102654032 So here’s your flying car! Z3 is a theorem prover. Sounds like math, but stay with me. When you hear “theorem prover,” think really strong types. Equivalent. Specification = always true about a system. “Formal specification” = verifiable by a machine. SMT solver. Take some specs, simplify them algebraically, and efficiently prove the spec satisfiable or not, with examples. Z3/SMT-LIB is ASM of solvers. Usually not used directly. Will see examples of systems which use it.

Slide 33

Slide 33 text

A Simple Problem Write a Ruby program that determines the smallest three digit number such that when said number is divided by the sum of its digits the answer is 20. Example: Number = 123. Sum of digits = 6. 123/6 = 20.5, so not a solution. Hate it when speakers read slides out loud, but… Picked this because it’s a very simple problem.
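The problem statement can also be worked by hand, which is one way to check that it has been fully understood. The derivation below is a sketch that is not on the slides; the digit names a, b, c are introduced here for illustration.

```latex
\begin{align*}
100a + 10b + c &= 20\,(a + b + c), \qquad a \in \{1,\dots,9\},\; b, c \in \{0,\dots,9\} \\
80a &= 10b + 19c \\
10 \mid 19c &\;\Rightarrow\; c = 0 \quad (\text{since } \gcd(19, 10) = 1 \text{ and } 0 \le c \le 9) \\
b = 8a \le 9 &\;\Rightarrow\; a = 1,\; b = 8 \\
n &= 180
\end{align*}
```

So 180 is not only the smallest three digit solution, it is the only one: 180 / (1 + 8 + 0) = 20.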

Slide 34

Slide 34 text

Ruby Most people would try something like… Seems reasonable, but it’s brute force solution. Have we fully understood the problem? Also, this is wrong. You probably have to be fairly good with Ruby to figure out why. Ruby is complicated; just try and parse it! Cognitive overhead for even a simple problem is very high!
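The slide's Ruby code is not part of this transcript. As a sketch of what a brute-force attempt might look like, and of one plausible way for it to be wrong, the version below assumes the pitfall is Ruby's truncating integer division; that assumption, and the method names, are mine rather than the talk's.

```ruby
def digit_sum(n)
  n.digits.sum
end

# Naive brute force. Integer#/ truncates, so 104 / 5 evaluates to 20
# even though 104 / 5 is really 20.8.
def naive_answer
  (100..999).find { |n| n / digit_sum(n) == 20 }
end

# Exact brute force: compare by multiplication, so no division is needed.
def exact_answer
  (100..999).find { |n| n == 20 * digit_sum(n) }
end
```

Here `naive_answer` returns 104 and `exact_answer` returns 180. Both loops read as "reasonable" at a glance, which is the point about cognitive overhead.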

Slide 35

Slide 35 text

Ruby Looks efficient! Is this right? Remember, I like to put buggy code on my slides! Is it the best solution? Do you want this in the code you maintain?

Slide 36

Slide 36 text

SMT-LIB I know, (). SMT-LIB language used to compare/benchmark solvers. You don’t typically use this for production. Minimal interface to Z3. Did live in rehearsal. Awkward! See me in person for demos. “Formal” spec for most of problem. Machine checkable. Omitted one part.

Slide 37

Slide 37 text

SMT-LIB Variable?

Slide 38

Slide 38 text

SMT-LIB Test-only programming. Does forall make sense?

Slide 39

Slide 39 text

SMT-LIB

Slide 40

Slide 40 text

http://rise4fun.com/Z3/7VZh Had to write digit-sum

Slide 41

Slide 41 text

Note that the model is valid SMT-LIB code. Optimized! Really important. Complex definitions tend to be wrong when first written out. They can also be complete nonsense!

Slide 42

Slide 42 text

Add one clause

Slide 43

Slide 43 text

Not “can’t find.” Doesn’t exist.

Slide 44

Slide 44 text

Who uses this? Hyper-V hypervisor. If this is wrong the world ends. 100000 lines C, 5000 lines x64 ASM Complex implementation, fairly simple spec. Around 1.5 person years, incl learn VCC. 18 hours execution. Xen flaw to be disclosed Wednesday. Also Dafny, F*, etc.

Slide 45

Slide 45 text

DAFNY http://rise4fun.com/Dafny http://research.microsoft.com/dafny https://www.flickr.com/photos/marcusjb/440973101/

Slide 46

Slide 46 text

Dafny Useful for education. Correctness more important than executability. Looks like code contracts but proven at compile time! Imperative code. We often ask what might go wrong with our code. Instead we should ask what must go right? Solver proves that mathematical and imperative definitions equivalent. Important, especially for optimization. Similar to 180 example.

Slide 47

Slide 47 text

Who uses Dafny? Rice University “Reasoning about algorithms”

Slide 48

Slide 48 text

https://www.flickr.com/photos/ayman/21226117 F* based on F#. Subsumes F7 and other MSR projects.

Slide 49

Slide 49 text

Append function for length-indexed list. Heavy effort, heavy return. Remember C# variance annotations: Useful, even if you don’t write them!

Slide 50

Slide 50 text

Interesting use of F7/F* is miTLS, a fully verified implementation of TLS. Verifies both the TLS specification itself and the miTLS implementation.

Slide 51

Slide 51 text

F7 source for miTLS. This will be verified, then dependent types “compiled away.” Result is…

Slide 52

Slide 52 text

Compiles to (correct) F#.

Slide 53

Slide 53 text

Funny thing about formally verifying specs. Sounds awesome that the code meets spec.

Slide 54

Slide 54 text

TLA+ http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html "Writing is nature’s way of letting you know how sloppy your thinking is." Richard Guindon "Mathematics is nature’s way of letting you know how sloppy your writing is.... Formal mathematics is nature’s way of letting you know how sloppy your mathematics is." Leslie Lamport "Specification is not an end in itself; it is just a tool that an engineer should be able to use when appropriate." p. 76 "TLA+ is particularly effective at revealing concurrency errors—ones that arise through the interaction of asynchronous components." TLA book, p. 76.

Slide 55

Slide 55 text

http://lorinhochstein.wordpress.com/2014/06/04/crossing-the-river-with-tla/ TLA+ exhaustively, if not always efficiently, checks all possible states of your system. Unlike QuickCheck, it doesn’t shrink failing cases. Also unlike QC, it forces you to specify your system in probably the simplest possible form.
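To give a feel for what "checks all possible states" means, here is a Ruby sketch of exhaustive state exploration. It is neither TLA+ nor its model checker; it assumes the puzzle is the classic farmer, wolf, goat, and cabbage river crossing, and every name in it is hypothetical. The state space is tiny, so breadth-first search can visit every reachable state and check the safety invariant on each one.

```ruby
ITEMS = %i[farmer wolf goat cabbage].freeze

# A state is the sorted list of who is on the left bank; everyone else is on the right.
START = ITEMS.sort.freeze
GOAL  = [].freeze

# Invariant: the goat is never left with the wolf or the cabbage unless the farmer is there.
def safe?(left)
  [left, ITEMS - left].none? do |bank|
    !bank.include?(:farmer) && bank.include?(:goat) &&
      (bank.include?(:wolf) || bank.include?(:cabbage))
  end
end

# Every safe state reachable in one crossing: the farmer crosses alone or with one passenger.
def next_states(left)
  farmer_on_left = left.include?(:farmer)
  same_bank = farmer_on_left ? left : ITEMS - left
  ([nil] + (same_bank - [:farmer])).map do |passenger|
    moving = [:farmer, passenger].compact
    (farmer_on_left ? left - moving : left + moving).sort
  end.select { |state| safe?(state) }
end

# Breadth-first exploration of every reachable safe state, remembering each state's parent.
def explore
  parents = { START => nil }
  queue = [START]
  until queue.empty?
    state = queue.shift
    next_states(state).each do |nxt|
      next if parents.key?(nxt)
      parents[nxt] = state
      queue << nxt
    end
  end
  parents
end

# Walk parent links back from the goal to recover one shortest sequence of states.
def path_to(goal, parents)
  return nil unless parents.key?(goal)
  path = [goal]
  path.unshift(parents[path.first]) while parents[path.first]
  path
end
```

`path_to(GOAL, explore)` returns the left-bank contents after each crossing, ending with the empty bank. If the goal were unreachable, the same exploration would show that by exhausting the state space rather than by giving up.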

Slide 56

Slide 56 text

Nice feature: Can produce very pretty output via LaTeX. Equivalent of previous ASCII.

Slide 57

Slide 57 text

http://somethingofthatilk.com/index.php?id=135

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

Who Uses TLA+? http://research.microsoft.com/en-us/um/people/lamport/tla/formal-methods-amazon.pdf Back to the real world: Anything on AWS: Netflix, Heroku

Slide 62

Slide 62 text

WHAT HAVE WE LEARNED? Thinking about specification, and formal specs keep you honest! Force you to consider whole problem. Make a spec which is internally consistent. Double entry check vs. code. Useful when problem domain too large (AWS) or too complex (Ruby) to test. Proves optimized code equivalent to readable code.

Slide 63

Slide 63 text

ARE FLYING CARS A BAD IDEA? https://www.flickr.com/photos/bobjagendorf/4934950194/ Tooling is an issue. Proving production code matches spec can be challenging. __agl verify ECC C code Impractical for complex systems. Good when it makes you simplify! Exhaustive testing, when possible, can give you similar return for less effort. Not always possible.
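As a sketch of the exhaustive-testing alternative: when the input domain is small and finite, a readable definition and a rewritten one can be compared on every input. This is checking, not proving, and it only covers the range given. The two digit-sum functions below are illustrative assumptions, not code from the talk.

```ruby
# Readable definition: spell out the decimal digits and add them.
def digit_sum_readable(n)
  n.to_s.chars.map(&:to_i).sum
end

# Arithmetic-only variant: peel off the last digit until nothing is left.
def digit_sum_arithmetic(n)
  total = 0
  while n > 0
    total += n % 10
    n /= 10
  end
  total
end

# Returns the first input on which the two definitions disagree, or nil if there is none.
def first_mismatch(range)
  range.find { |n| digit_sum_readable(n) != digit_sum_arithmetic(n) }
end
```

`first_mismatch(0..999)` returning nil means the two definitions agree on every number of up to three digits; it says nothing about inputs outside that range, which is where a solver-backed proof would differ.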

Slide 64

Slide 64 text

Gratitude
The people of Microsoft Research
Others I’ve learned from:
SMT-LIB: Laurentiu Nicola (blog comment)
Dafny: Swarat Chaudhuri’s articles
TLA+: Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, and Michael Deardeuff, and Lorin Hochstein’s blog
Photographers (credited on each slide where used)
My family, employer, and coworkers, for putting up with me spending time on this stuff

Slide 65

Slide 65 text

CRAIG STUNTZ @CraigStuntz http://blogs.teamb.com/craigstuntz http://www.meetup.com/Papers-We-Love-Columbus/ Least interesting part, but…. Questions?