An Analysis of the Redesign of the CoffeeScript Compiler

Michael Ficarra

November 30, 2012

Transcript

  1. Michael Ficarra: An Analysis of the Redesign of the CoffeeScript Compiler
  2. CoffeeScript

    http://coffeescript.org
    "a little language that compiles into JavaScript"
    I prefer "an alternate syntax for idiomatic JS"
  3. Michael Ficarra

    /michaelficarra
    • CoffeeScript maintainer
      ◦ worked on jashkenas/coffee-script for years
      ◦ influential in the language's development
    • contribute to many ECMAScript projects
      ◦ constellation/escodegen
      ◦ constellation/esmangle
      ◦ documentcloud/underscore
      ◦ kriskowal/es5-shim
    • and plenty of my own -- check them out
  4. http://www.kickstarter.com/projects/michaelficarra/make-a-better-coffeescript-

  5. Project Goals

    • separation of concerns
      ◦ modularity
      ◦ use and expose standardised IRs
    • bug fixes
      ◦ especially two-pass symbol generation
    • source maps
    • better error reporting
    • mild extensibility
      ◦ support multiple (similar) compilation targets
      ◦ syntax extension is out of scope
  6. Where do we start?

    • Definitions: define the language
      ◦ jashkenas/coffee-script is overly permissive
        ▪ loosely defines the language as whatever passes through the compiler without an error
        ▪ these need to be disallowed
          $ coffee -bep 'a is b and c = d'
          var c;
          a === b && (c = d);
      ◦ jashkenas/coffee-script is sometimes too restrictive
        ▪ mostly due to parser failings
        ▪ these need to be allowed
          $ coffee -bep 'fn ->, ->'
          Error: Parse error on line 1: Unexpected ','
  7. Where do we start?

    • Definitions: define the language with
      ◦ consistent syntactic rules
      ◦ consistent semantics to go with them
      ◦ an AST format that can represent CoffeeScript programs
    • Process
      ◦ break down compilation into individual components
      ◦ provide an interface for composition
  8. Independent Components

    • Preprocessor: CS → context-free CS
    • Parser: context-free CS → CS AST
    • Compiler: CS AST → JS AST
    • Code Generator: JS AST → JS + source map
  9. Independent Components

    • CS Code Generator: CS AST → CS
    • Optimiser: CS AST → CS AST
    • Analysis (Predicate): CS AST → Yes / No
  10. Compositions

    • jashkenas/coffee-script (CS → JS)
      ◦ preprocessor
      ◦ parser
      ◦ compiler
      ◦ JS code generator
      ◦ discard the source map
    • Syntax Formatter (CS → CS)
      ◦ preprocessor
      ◦ parser
      ◦ CS code generator
  11. CLI: Composition and I/O

    CS → context-free CS → CS AST → JS AST → JS + source map

    • input source: --input, --cli (defaults to stdin)
    • output destination: --output
    • preprocessed: N/A (not standardised)
    • parsed: --parse
    • compiled: --compile
    • JavaScript: --js
    • source map: --source-map
    • CoffeeScript: --cscodegen
  17. Parsing

    • Chose to generate the parser from a parsing expression grammar (PEG)
    • Upsides of PEGs
      ◦ operates in time linear to input length
      ◦ better error reporting
        ▪ can enumerate all valid inputs following read position
      ◦ good JS tooling support available at the time
      ◦ fully describe the syntax of the language in one place
        ▪ no separate lexer
  23. Parsing

    • Chose to generate the parser from a parsing expression grammar (PEG)
    • Downsides of PEGs
      ◦ not runtime extensible like parser combinators
        ▪ parser combinators build parsers from other parsers
        ▪ those parsers are built at runtime, so they may be overridden or extended
      ◦ can only accept context-free languages
        ▪ parser for context-sensitive languages needs an additional stack
        ▪ PDA accepts context-free languages
        ▪ LBA is needed to accept context-sensitive languages
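
The runtime extensibility of parser combinators mentioned on this slide can be illustrated with a minimal sketch. Everything here (`lit`, `seq`, `alt`, `ref`, the `grammar` table and its rule names) is made up for the illustration and is not taken from the compiler or from any library; the point is only that rules looked up at parse time can be replaced after the parser has been built.

```javascript
// A parser is a function (input, pos) => { value, pos }, or null on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// Sequence: every parser must succeed, one after the other.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const result = p(input, pos);
    if (result === null) return null;
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// Ordered choice: the first parser that succeeds wins.
const alt = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const result = p(input, pos);
    if (result !== null) return result;
  }
  return null;
};

// Rules live in a mutable table and are looked up when parsing happens.
const grammar = {};
const ref = (name) => (input, pos) => grammar[name](input, pos);

grammar.keyword = alt(lit("if"), lit("unless"));
grammar.statement = seq(ref("keyword"), lit(" "), lit("x"));

const before = grammar.statement("while x", 0); // null: "while" is not a keyword yet

// Extend the keyword rule at runtime; `statement` picks it up without a rebuild.
grammar.keyword = alt(grammar.keyword, lit("while"));
const after = grammar.statement("while x", 0);  // now succeeds
```

A parser generated ahead of time from a PEG has no equivalent of the last assignment: changing a rule means regenerating the parser.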
  24. Preprocessing

    • one really simple job
      ◦ keep stack of context tokens as input is read
      ◦ insert context boundary markers
    • additional benefits
      ◦ assures pairing chars are paired before parsing
      ◦ enforces consistent indentation style
    • context boundaries:
      ◦ (INDENT) (DEDENT)
      ◦ " "
      ◦ """ """
      ◦ { }
      ◦ ` `
      ◦ ' '
      ◦ ''' '''
      ◦ ( )
      ◦ #{ }
      ◦ / /
      ◦ /// ///
      ◦ [ ]
      ◦ # (line terminator)
      ◦ ### ###
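
To make the "stack of context" idea concrete, here is a minimal sketch that handles only the indentation context. The marker strings, the function name `markIndentation`, and the error message are assumptions made for the illustration; the real preprocessor also tracks strings, comments, regexps and brackets, which this sketch ignores.

```javascript
// Hypothetical marker strings; the actual markers are not shown in the slides.
const INDENT = "(INDENT)";
const DEDENT = "(DEDENT)";

function markIndentation(source) {
  const stack = [0]; // indentation widths of the blocks that are currently open
  const out = [];
  for (const line of source.split("\n")) {
    if (line.trim() === "") { // blank lines do not open or close a block
      out.push("");
      continue;
    }
    const text = line.trimStart();
    const width = line.length - text.length;
    let prefix = "";
    if (width > stack[stack.length - 1]) {
      stack.push(width); // deeper than the current block: open a new one
      prefix = INDENT;
    } else {
      while (width < stack[stack.length - 1]) {
        stack.pop(); // shallower: close blocks until the widths line up
        prefix += DEDENT;
      }
      if (width !== stack[stack.length - 1]) {
        throw new Error("inconsistent indentation: " + JSON.stringify(line));
      }
    }
    out.push(prefix + text);
  }
  // Close whatever is still open at the end of the input.
  let tail = "";
  while (stack.length > 1) {
    stack.pop();
    tail += DEDENT;
  }
  if (tail !== "") out.push(tail);
  return out.join("\n");
}

const marked = markIndentation("a ->\n  b\n  c\nd");
// marked === "a ->\n(INDENT)b\nc\n(DEDENT)d"
```

Once the boundaries are explicit markers in the text, indentation no longer has to be handled by the grammar, and a dedent to a width that was never opened is rejected before parsing starts.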
  26. https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API

  27. Spidermonkey AST Example

    ariya/esprima input:

      { block: statement }

    ariya/esprima output:

      { type: 'Program',
        body: [
          { type: 'BlockStatement',
            body: [
              { type: 'LabeledStatement',
                label: { type: 'Identifier', name: 'block' },
                body: {
                  type: 'ExpressionStatement',
                  expression: { type: 'Identifier', name: 'statement' } } } ] } ] }

    ariya/esprima input:

      ({object: expression})

    ariya/esprima output:

      { type: 'Program',
        body: [
          { type: 'ExpressionStatement',
            expression: {
              type: 'ObjectExpression',
              properties: [
                { type: 'Property',
                  key: { type: 'Identifier', name: 'object' },
                  value: { type: 'Identifier', name: 'expression' },
                  kind: 'init' } ] } } ] }
  28. Spidermonkey AST Tools

    • ariya/esprima (JS → JS AST)
      ◦ ECMAScript 5 parser
      ◦ extremely true to spec.
        ▪ aside from some minor restrictions around early errors
      ◦ harmony branch
    • yahoo/istanbul (JS AST → instrumented JS AST)
      ◦ instruments Spidermonkey AST for code coverage
      ◦ instrumented code produces standardised report (LCOV)
  29. Spidermonkey AST Tools

    • constellation/escodegen (input: JS AST)
      ◦ JS code generator
      ◦ configurable formatting with minification defaults
      ◦ guarantees parse(gen(tree)) == tree
    • mozilla/sweet.js (inputs: JS using macros, JS macro defs.; output: JS AST)
      ◦ result of Tim Disney's Mozilla internship
      ◦ creates augmented parser using user-provided macro definitions
  30. Spidermonkey AST Tools

    • constellation/esmangle (JS AST → JS AST)
      ◦ generates semantically equivalent, syntactically minimal AST
      ◦ more difficult (and fun) than it sounds
      ◦ name mangling
      ◦ constant folding
      ◦ fixed-point evaluation of set of declarative rules
      ◦ 2 phases
        ▪ AST simplification rules
          !!!a => !a
        ▪ syntactic simplification (AST expansion) rules
          a.Infinity => a[1/0]
          true => !0
      ◦ declarative rule specification is extensible and modular
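
The "fixed-point evaluation of a set of declarative rules" idea can be sketched as follows. This is not esmangle's code or API: the helper names (`not`, `isNot`, `rewriteOnce`, `toFixedPoint`) and the shape of a rule (a function returning a replacement node or `null`) are assumptions for the illustration, and the two phases from the slide are folded into a single rule list.

```javascript
// Helpers for building and recognising `!x` nodes in Spidermonkey AST form.
const not = (argument) =>
  ({ type: "UnaryExpression", operator: "!", prefix: true, argument });
const isNot = (n) =>
  n !== null && typeof n === "object" &&
  n.type === "UnaryExpression" && n.operator === "!";

// Each rule returns a replacement node, or null when it does not apply.
const rules = [
  // !!!a => !a
  (n) => (isNot(n) && isNot(n.argument) && isNot(n.argument.argument)
    ? not(n.argument.argument.argument)
    : null),
  // true => !0, false => !1
  (n) => (n.type === "Literal" && typeof n.value === "boolean"
    ? not({ type: "Literal", value: n.value ? 0 : 1 })
    : null),
];

// One bottom-up pass: rewrite the children, then try the rules on the node.
function rewriteOnce(node, state) {
  if (Array.isArray(node)) return node.map((child) => rewriteOnce(child, state));
  if (node === null || typeof node !== "object" || typeof node.type !== "string") {
    return node; // primitives and non-node objects are left alone
  }
  const copy = {};
  for (const key of Object.keys(node)) copy[key] = rewriteOnce(node[key], state);
  for (const rule of rules) {
    const replacement = rule(copy);
    if (replacement !== null) {
      state.changed = true;
      return replacement;
    }
  }
  return copy;
}

// Repeat passes until a whole pass changes nothing.
function toFixedPoint(ast) {
  let current = ast;
  for (;;) {
    const state = { changed: false };
    current = rewriteOnce(current, state);
    if (!state.changed) return current;
  }
}

const a = { type: "Identifier", name: "a" };
const simplified = toFixedPoint(not(not(not(a)))); // the tree for `!a`
```

Because the driver knows nothing about individual rules, adding a rule is just appending to the list, which is the extensibility the slide refers to. The rule set has to be written so that it terminates, for example by making every rule shrink the tree or move it towards a form no rule matches.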
  32. Spidermonkey AST Tools

    • constellation/estraverse
      ◦ extracted from esmangle project
      ◦ escodegen also uses it
      ◦ provides AST traversal functions
      ◦ implements simple visitor pattern on Spidermonkey AST
    • pufuwozu/brushtail
      ◦ tail call elimination on Spidermonkey ASTs
      ◦ uses estraverse and escope
    • constellation/escope
      ◦ extracted from esmangle project
      ◦ provides static scope analysis
      ◦ predicates such as
        ▪ isStatic (detects global, with, presence of direct eval)
        ▪ isArgumentsMaterialized
      ◦ you probably don't know catch variables are block scoped in JS
        ▪ escope does
        ▪ (and CoffeeScript fixes this for you anyway)
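
As an illustration of what a "simple visitor pattern on Spidermonkey AST" looks like, here is a self-contained sketch. It is not estraverse itself: the `traverse` function below is a hypothetical stand-in written for this example, and it treats any object with a string `type` property as a node.

```javascript
// Call visitor.enter before visiting a node's children and visitor.leave after.
function traverse(node, visitor, parent = null) {
  if (visitor.enter) visitor.enter(node, parent);
  for (const key of Object.keys(node)) {
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child !== null && typeof child === "object" && typeof child.type === "string") {
        traverse(child, visitor, node);
      }
    }
  }
  if (visitor.leave) visitor.leave(node, parent);
}

// The tree from the `{ block: statement }` example on slide 27.
const program = {
  type: "Program",
  body: [{
    type: "BlockStatement",
    body: [{
      type: "LabeledStatement",
      label: { type: "Identifier", name: "block" },
      body: {
        type: "ExpressionStatement",
        expression: { type: "Identifier", name: "statement" },
      },
    }],
  }],
};

// Collect every identifier name in source order.
const names = [];
traverse(program, {
  enter(node) {
    if (node.type === "Identifier") names.push(node.name);
  },
});
// names is ["block", "statement"]
```

Because the traversal depends only on the node shape and not on where the tree came from, the same visitor works on output from any parser that emits this format.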
  33. Spidermonkey AST

    • not perfect
      ◦ some trees represent impossible syntactic constructs

          { type: 'IfStatement',
            test: ...,
            consequent: {
              type: 'IfStatement',
              test: ...,
              consequent: ...,
              alternate: null },
            alternate: ... }

      ◦ no way to represent directive statements
    • still better than alternatives
      ◦ adoption has hit critical mass
      ◦ interop with those tools is too valuable
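
The IfStatement tree on this slide is the classic dangling else: printed naively it becomes `if (a) if (b) c; else d;`, which parses back with the `else` attached to the inner `if`. The sketch below shows one common workaround, wrapping the consequent in a block while generating code. It is a toy generator written for this example (the names `gen` and `endsWithElselessIf` are made up, and only four node types are supported), not the behaviour of any particular tool.

```javascript
// True when printing `node` would end with an `if` that has no `else`,
// so that a following `else` would bind to it.
function endsWithElselessIf(node) {
  if (node.type !== "IfStatement") return false;
  return node.alternate === null ? true : endsWithElselessIf(node.alternate);
}

function gen(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "ExpressionStatement":
      return gen(node.expression) + ";";
    case "BlockStatement":
      return "{ " + node.body.map(gen).join(" ") + " }";
    case "IfStatement": {
      let consequent = node.consequent;
      // Protect our own `else` from being captured by a nested else-less `if`.
      if (node.alternate !== null && endsWithElselessIf(consequent)) {
        consequent = { type: "BlockStatement", body: [consequent] };
      }
      let code = "if (" + gen(node.test) + ") " + gen(consequent);
      if (node.alternate !== null) code += " else " + gen(node.alternate);
      return code;
    }
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

const id = (name) => ({ type: "Identifier", name });
const stmt = (name) => ({ type: "ExpressionStatement", expression: id(name) });

const tree = {
  type: "IfStatement",
  test: id("a"),
  consequent: { type: "IfStatement", test: id("b"), consequent: stmt("c"), alternate: null },
  alternate: stmt("d"),
};

const code = gen(tree); // "if (a) { if (b) c; } else d;"
```

The output preserves the meaning of the tree, but parsing it yields a tree with an extra BlockStatement, so the original tree itself cannot be round-tripped through source text.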
  34. Use Standardised IRs!

    • take advantage of other open source projects
    • your users can extract parts of your project
    • in case of jashkenas/coffee-script
      ◦ compiler and parser/rewriter are highly coupled
      ◦ code generation is intermixed with compilation
      ◦ code gen bugs are common
      ◦ code gen logic is strewn throughout the compiler
      ◦ no consistent concept of target's syntax
        ▪ statement vs. expression ({} is different in different positions)
        ▪ operator precedence
        ▪ special syntactic constructs (esp. surrounding `new` operator)
        ▪ significant whitespace
  37. Doing it Right

    • esprima • acorn • estraverse • escope • escodegen • esmangle • brushtail • Sweet.js • istanbul • ibrik • code painter • LLJS • RumCoke • JSX
  38. Calling You Out

    • TypeScript • ClojureScript • UglifyJS • UglifyJS2 (sigh) • Dart • Google Closure Compiler • Roy (soon!) • LiveScript (soon!) • jashkenas/coffee-script
  39. Optimisation / Compilation

    • declarative rule specification
      ◦ inherently extensible
    • optimiser: fixpoint evaluation strategy
    • Optimiser: CS AST → CS AST
    • Compiler: CS AST → JS AST
  44. Symbol Generation

    • long-running problem with jashkenas/coffee-script
    • common issue for our users
    • very difficult to fix with the current compiler design

      $ coffee -bep '_this = 0; fn = => this'
      var fn, _this = this;
      _this = 0;
      fn = function() { return _this; };
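
In the output above, the compiler-generated `_this` collides with the user's own `_this` variable. A two-pass approach avoids this by first collecting every name the program uses and only then choosing generated names. The sketch below shows the idea on Spidermonkey-style trees; the function names and the `_name`, `_name1`, `_name2` numbering scheme are assumptions for the illustration, not the scheme used by either compiler.

```javascript
// Pass 1: collect every identifier name that appears anywhere in the tree.
function collectNames(node, names = new Set()) {
  if (Array.isArray(node)) {
    for (const child of node) collectNames(child, names);
  } else if (node !== null && typeof node === "object") {
    if (node.type === "Identifier") names.add(node.name);
    for (const key of Object.keys(node)) collectNames(node[key], names);
  }
  return names;
}

// Pass 2: hand out names that are guaranteed not to be in the collected set.
function makeGensym(usedNames) {
  const taken = new Set(usedNames);
  return function gensym(base) {
    let candidate = "_" + base;
    for (let i = 1; taken.has(candidate); i++) candidate = "_" + base + i;
    taken.add(candidate); // never hand out the same name twice
    return candidate;
  };
}

// A tree standing in for the user's program `_this = 0; fn = ...`.
const userProgram = {
  type: "Program",
  body: [
    { type: "Identifier", name: "_this" },
    { type: "Identifier", name: "fn" },
  ],
};

const gensym = makeGensym(collectNames(userProgram));
const thisRef = gensym("this"); // "_this1", because "_this" is taken by the user
const tempRef = gensym("ref");  // "_ref"
```

A single-pass compiler has to pick `_this` before it has seen the rest of the program, which is why the collision is hard to fix without restructuring.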
  51. Symbol Generation

    • did you catch my hypocrisy?
    • that IR is neither standardised nor exposed
    • don't want to force this to be two operations
      ◦ steps can be interleaved for performance
      ◦ but the IR might actually be useful; it's a tradeoff
    • Compiler (in reality): CS AST → JS AST + gensyms → JS AST
  52. Source Maps

    • set of mappings from section of JavaScript to section of source text directly responsible for producing it
    • supported in Chrome
    • Firefox support coming soon
      ◦ see bugzilla #771597
    • Debug as if the source text is actually running in your JS interpreter
  54. Source Maps

    1. preserve source info in parser
    2. preserve source info through transformations
      ◦ optimiser
      ◦ compiler
    3. modify escodegen to create a CST instead of a string
    4. use mozilla/source-map to generate source map and flatten CST to JS
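
Steps 3 and 4 can be sketched without any library. Here a CST node is assumed to be an object with an optional `loc` (the original source position) and a `children` array of strings and nested nodes; that shape and the `flatten` function are assumptions for this example, not the format escodegen or mozilla/source-map actually use. Flattening concatenates the strings while tracking the generated position, and records one mapping per node that carries a `loc`.

```javascript
function flatten(cst) {
  let code = "";
  let line = 1;   // generated line, 1-based
  let column = 0; // generated column, 0-based
  const mappings = [];

  function visit(node) {
    if (typeof node === "string") {
      code += node;
      for (let i = 0; i < node.length; i++) {
        if (node[i] === "\n") {
          line++;
          column = 0;
        } else {
          column++;
        }
      }
      return;
    }
    if (node.loc) {
      // This node's output starts at the current generated position.
      mappings.push({ generated: { line, column }, original: node.loc });
    }
    node.children.forEach(visit);
  }

  visit(cst);
  return { code, mappings };
}

// A hand-built CST for two generated statements.
const cst = {
  loc: { line: 1, column: 0 },
  children: [
    "var x = ",
    { loc: { line: 1, column: 4 }, children: ["1"] },
    ";\n",
    { loc: { line: 2, column: 0 }, children: ["f", "(", ")"] },
    ";",
  ],
};

const result = flatten(cst);
// result.code === "var x = 1;\nf();"
// result.mappings has three entries, with generated positions (1,0), (1,8), (2,0)
```

Keeping the generator's output structured until the last moment is what makes the mappings cheap: the positions fall out of the same walk that produces the text. Records of this kind could then be handed to a source map encoder.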
  61. Image by Ryan Florence

  63. Current Status

    • fixed over 50 open bugs
    • implemented 20 accepted enhancements
    • fairly stable interfaces
    • 98% feature complete
    • extensible design
    • source map generation + esmangle integration
    • great parser and runtime error reporting
    • being integrated with a popular IDE

    People are using it and contributing!
  65. http://michaelficarra.github.com/CoffeeScriptRedux/

  66. Future Work

    • minor bug fixes
    • loosen some whitespace restrictions
    • more complete test suite
    • rewrite parser actions in CoffeeScript
    • remove some accidental mutation in compiler and optimiser rules
    • update text editor plugins
    • consider performance
    • release 2.0, replace jashkenas/coffee-script
    • fork and make it my own
  69. Summary

    • carefully choose your IRs
      ◦ use standards whenever possible
      ◦ expose them
      ◦ take advantage of others' tools that operate on your IRs
      ◦ for structured JS representation, use Mozilla's Spidermonkey API
      ◦ JS code gen in JS from this representation is a solved problem; use escodegen or equivalent
    • declarative behaviour specification is inherently extensible
    • this compiler is a huge improvement over what we had before
      ◦ start using it right now
      ◦ report bugs and tell me what to work on next