Your Program as a Transpiler: Applying Compiler Design to Everyday Programming

Many languages “transpile” into other languages, but compilers are still often seen as arcane pieces of software that only a master of the dark arts could write. At the end of the day, though, both are programs that translate code from one programming language to another.

So what makes a transpiler simple and a compiler hard? What can we learn from these complex pieces of software? And are they really that complex?

The lessons we can learn from language implementation design patterns are within everyone's reach: not only do they apply to daily programming problems, they are also key to understanding the basis of exciting new technologies such as the GraalVM project and the Quarkus stack. In our experience on the Drools and jBPM projects, we have come across many opportunities to apply programming language development techniques to a broader context. In this talk, we will see some of these examples.

Edoardo Vacchi

May 07, 2019

Transcript

  1. Your Program
    as a Transpiler
    Applying Compiler Design
    to Everyday Programming

  2. About Me
    • Edoardo Vacchi @evacchi
    • Research @ University of Milan
    • Research @ UniCredit R&D
    • Drools and jBPM Team @ Red Hat

  3. Motivation

  4. Motivation
    • My first task at Red Hat: marshalling backend for jBPM
    • Data model mapping
    • From XML tree model to graph representation
    • Apparently boring, but challenging in a way

  5. Motivation
    • Language implementation is often seen as a dark art
    • But some design patterns are simple at their core
    • Best practices can be applied to everyday programming

  6. Motivation (cont'd)
    • Learning about language implementation will give you a
    different angle to deal with many problems
    • It will lead you to a better understanding of how GraalVM
    and Quarkus do their magic

  7. Goals
    • Programs often have a pre-processing phase where you
    prepare for execution
    • Then, there's the actual processing phase
    • Learn to recognize and structure the pre-processing phase

  8. Transpilers

  9. Transpilers vs. Compilers
    • Compiler: translates code written in a language (source
    code) into code written in a target language (object code).
    The target language may be at a lower level of abstraction
    • Transpiler: translates code written in a language into
    code written in another language at the same level of
    abstraction (Source-to-Source Translator).

  10. Are transpilers simpler than compilers?
    • “Lower-level languages are complex”
      • They are not: if anything, they're simple
    • “Syntactic sugar is not a higher level of abstraction”
      • It is: a concise construct is expanded at compile-time
    • “Proper compilers do low-level optimizations”
      • You are thinking of optimizing compilers
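
Java's enhanced for loop is a handy illustration of a concise construct being expanded at compile-time. The sketch below is only an illustration: the class and method names are made up, and the second method is roughly (not literally) what the compiler produces for an Iterable.

```java
import java.util.Iterator;
import java.util.List;

final class ForEachDesugaring {

    // The concise construct: an enhanced for loop.
    static int sumWithSugar(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly what it is expanded to: an explicit Iterator
    // driven by hasNext()/next().
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next();
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(sumWithSugar(values)); // 6
        System.out.println(sumDesugared(values)); // 6
    }
}
```

Both methods do the same work; the first one simply says it at a higher level of abstraction.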

  11. The distinction is moot
    • It is pretty easy to write a crappy compiler, call it a
    transpiler and feel at peace with yourself
    • Writing a good transpiler is no different or harder than
    writing a good compiler
    • So, how do you write a good compiler?

  12. Your Program
    as a Compiler
    Applying Compiler Design
    to Everyday Programming

  13. Compiler-like workflows
    • At least two classes of problems can be solved with
    compiler-like workflows
    • Boot time optimization problems
    • Data transformation problems

  15. Running Example
    Function Orchestration

  16. Function Orchestration
    • You are building an immutable Dockerized serverless
    function
    (Diagram: functions f and g)

  17. Function Orchestration
    • Problem
    • No standard* way to describe function orchestration yet
    * Yes, I know about https://github.com/cncf/wg-serverless
    (Diagram: functions f and g)

  18. Solution: Roll your own YAML format
    process:
      elements:
        - start: &_1
            name: Start
        - function: &_2
            name: Hello
        - end: &_3
            name: End
        - edge:
            source: *_1
            target: *_2
        - edge:
            source: *_2
            target: *_3
    (Diagram: Start → Hello → End)
    Congratulations!
    Enjoy attending conferences worldwide

  19. Alternate Solution
    • You are describing a workflow
    • There is a perfectly fine standard: BPMN
    • Business Process Model and Notation
    (Diagram: Task 1 → Task 2)

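  20. (Listing: BPMN XML for the same process; the markup was lost in extraction and only the script task body remains)
    System.out.println("Hello World");
    https://github.com/evacchi/ypaat
    (Diagram: Start → Hello → End)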

  21. (Diagram: Start → Hello → End)
    Downside: Nobody will invite you to
    their conference to talk about BPM.

  22. (Diagram: Start → Hello → End)
    Downside: Nobody will invite you to
    their conference to talk about BPM.
    Unless you trick them.

  23. Bonuses for choosing BPMN
    • Standard XML-based serialization format
      • that's not the bonus
    • There is standard tooling to validate and parse
      • that is a bonus
    • Moreover:
      • Different types of nodes included in the main spec
      • Optional spec for laying out nodes on a diagram
    (Diagram: Start → Hello → End)

  24. Goals
    • Read a BPMN workflow
    • Execute that workflow
    • Visualize that workflow
    (Diagram: Start → Hello → End)

  25. Step 1
    Recognize your compilation phase

  26. What's a compilation phase?
    • It's your setup phase.
    • You do it only once before the actual processing begins

  27. Configuring the application
    • Problem. Use config values from a file/env vars/etc
    • Do you validate config values each time you read them?
    • Compile-time:
      • Read config values into a validated data structure
    • Run-time:
      • Use validated config values
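
A minimal sketch of that split, assuming a made-up ServerConfig type with a host and a port (none of these names come from the talk): validation happens once in load(), and the rest of the program only reads fields that are already known to be good.

```java
import java.util.Map;

// Hypothetical validated configuration; all names are illustrative.
final class ServerConfig {
    final String host;
    final int port;

    private ServerConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // "Compile-time": runs once at startup and fails fast on bad input.
    static ServerConfig load(Map<String, String> raw) {
        String host = raw.getOrDefault("host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("port", "8080").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new ServerConfig(host, port);
    }

    public static void main(String[] args) {
        ServerConfig config =
            ServerConfig.load(Map.of("host", " localhost ", "port", "9000"));
        // "Run-time": plain field reads, no re-validation.
        System.out.println(config.host + ":" + config.port); // localhost:9000
    }
}
```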

  28. Data Transformation Pipeline
    • Problem. Manipulate data to produce analytics
    • Compile-time:
      • Define transformations (e.g. map, filter, etc. operations)
      • Decide the execution plan (local, distributed, etc.)
    • Run-time:
      • Evaluate the execution plan
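
The same idea can be sketched with plain java.util.stream, under the assumption that the "plan" is just a composed function and that execution is local only. PipelinePlan and its methods are hypothetical names used for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelinePlan {

    // "Compile-time": describe the transformations and fold them
    // into a single plan. No data is touched here.
    static Function<Stream<Integer>, Stream<Integer>> prepare() {
        Function<Stream<Integer>, Stream<Integer>> keepEven =
            s -> s.filter(n -> n % 2 == 0);
        Function<Stream<Integer>, Stream<Integer>> square =
            s -> s.map(n -> n * n);
        return keepEven.andThen(square);
    }

    // "Run-time": evaluate the prepared plan against actual data.
    static List<Integer> run(Function<Stream<Integer>, Stream<Integer>> plan,
                             List<Integer> data) {
        return plan.apply(data.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var plan = prepare();                            // built once
        System.out.println(run(plan, List.of(1, 2, 3, 4))); // [4, 16]
        System.out.println(run(plan, List.of(5, 6)));       // [36]
    }
}
```

The plan is built once and reused for every batch of data, which is the part worth moving out of the hot path.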

  29. Example: BPMN Execution
    • Problem. Execute a workflow description.
    • Compile-time:
      • Read BPMN into a visitable structure (StartEvent)
    • Run-time:
      • Visit the structure
      • For each node, execute tasks
    (Diagram: Start → Hello → End)

  30. Example: BPMN Visualization
    • Problem. Visualize a workflow diagram.
    • Compile-time:
      • Read BPMN into a graph
    • Run-time:
      • For each node and edge, draw on a canvas
    (Diagram: Start → Hello → End)

  31. Read BPMN into a Data Structure
    • Full XML Schema Definition* is automatically mapped
    onto Java classes, validated against schema constraints
        TDefinitions tdefs = JAXB.unmarshal(
            resource,
            TDefinitions.class);
    * Yes kids, we have working schemas

  32. BPMN: From Tree to Graph
    • No ordering imposed on the description
    Forward References
    (Listing: BPMN XML; the markup was lost in extraction and only the script task body remains)
    System.out.println("Hello World");

  33. Separate Layout Definition
    (Listing: BPMN XML; the markup was lost in extraction and only the script task body remains)
    System.out.println("Hello World");
    https://github.com/evacchi/ypaat

  35. Step 2
    Work like a compiler

  36. Compiling a programming language
    • You start from a text representation of a program
    • The text representation is fed to a parser
    • The parser returns a parse tree
    • The parse tree is refined into an abstract syntax tree (AST)
    • The AST is further refined through intermediate representations (IRs)
    • And so on, until the final representation is returned

  38. What makes a compiler a proper compiler
    • Not optimization
    • Compilation Phases
    • You can have as many as you like

  39. Example. A Configuration File
    1. Read file from (class)path
    2. Unmarshal file into a typed object
    3. Sanitize values
    4. Validate values
    5. Coerce to typed values

  40. Example. Produce a Report
    1. Fetch data from different sources
    2. Discard invalid values
    3. Merge into single data stream
    4. Compute aggregates (sums, avgs, etc.)
    5. Generate synthesis data structure

  41. Example. A Workflow Engine
    1. Read BPMN file
    2. Collect nodes
    3. Collect edges
    4. Prepare for visit/layout
    (Diagram: Start → Hello → End)

  42. Compilation Phases
    • Better separation of concerns
    • Better testability
    • You can test each intermediate result
    • You can choose when and where each phase gets evaluated
    • More Requirements = More Phases!

  43. Phase vs Pass
    • Many phases do not necessarily mean as many passes
    • You could do several phases in one pass
    • Logically phases are still distinct

  44. One Pass vs. Multi-Pass
    One pass:
        for value in config:
            sanitized = sanitize(value)
            validated = validate(sanitized)
            coerced = coerce(validated)
    Multi-pass:
        for value in config:
            sanitized += sanitize(value)
        for value in sanitized:
            validated += validate(value)
        for value in validated:
            coerced += coerce(value)
    Myth: one pass doing many things is better than doing many passes, each doing one thing

  45. It is not: Complexity
    One pass:
        for value in config:
            sanitized = sanitize(value)
            validated = validate(sanitized)
            coerced = coerce(validated)
      n times:
        sanitize = 1 op
        validate = 1 op
        coerce = 1 op
      (1 op + 1 op + 1 op) × n = 3n
    Multi-pass:
        for value in config:
            sanitized += sanitize(value)
        for value in sanitized:
            validated += validate(value)
        for value in validated:
            coerced += coerce(value)
      n times: sanitize = n ops
      n times: validate = n ops
      n times: coerce = n ops
      (n + n + n) = 3n
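
The two shapes can be written out in Java as a rough sketch. The sanitize, validate and coerce bodies are placeholders invented for this example; the point is only that both variants call each phase once per value and return the same result.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfigPasses {

    // Three hypothetical phases, each doing one thing.
    static String sanitize(String raw) {
        return raw.trim();
    }

    static String validate(String value) {
        if (value.isEmpty()) {
            throw new IllegalArgumentException("empty value");
        }
        return value;
    }

    static int coerce(String value) {
        return Integer.parseInt(value);
    }

    // One pass: all three phases inside a single loop (3 calls per value).
    static List<Integer> onePass(List<String> config) {
        List<Integer> coerced = new ArrayList<>();
        for (String value : config) {
            coerced.add(coerce(validate(sanitize(value))));
        }
        return coerced;
    }

    // Multi-pass: one loop per phase (still 3 calls per value overall),
    // and every intermediate list can be inspected or tested on its own.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = new ArrayList<>();
        for (String value : config) sanitized.add(sanitize(value));

        List<String> validated = new ArrayList<>();
        for (String value : sanitized) validated.add(validate(value));

        List<Integer> coerced = new ArrayList<>();
        for (String value : validated) coerced.add(coerce(value));
        return coerced;
    }

    public static void main(String[] args) {
        List<String> config = List.of(" 1", "2 ");
        System.out.println(onePass(config));   // [1, 2]
        System.out.println(multiPass(config)); // [1, 2]
    }
}
```

The multi-pass version does allocate intermediate lists, so the two are equal in operation count, not necessarily in memory use.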

  46. Single-pass is not always possible
    However, doing everything in one
    pass may be cumbersome or plain
    impossible
    Forward References
    (Listing: BPMN XML; the markup was lost in extraction and only the script task body remains)
    System.out.println("Hello World");
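
A small sketch of why forward references push you towards two passes. The flat Element description here is an assumption made for the example (it is not the BPMN model): an edge may mention a node that is only declared later, so the first pass collects every node and the second pass resolves edges against the complete set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ForwardReferences {

    // Hypothetical flat description: either a node or an edge.
    static final class Element {
        final String kind;
        final String id;      // set for nodes
        final String source;  // set for edges
        final String target;  // set for edges

        private Element(String kind, String id, String source, String target) {
            this.kind = kind;
            this.id = id;
            this.source = source;
            this.target = target;
        }

        static Element node(String id) {
            return new Element("node", id, null, null);
        }

        static Element edge(String source, String target) {
            return new Element("edge", null, source, target);
        }
    }

    static List<String> resolve(List<Element> elements) {
        // Pass 1: collect every node id, wherever it is declared.
        Set<String> nodes = new LinkedHashSet<>();
        for (Element e : elements) {
            if (e.kind.equals("node")) nodes.add(e.id);
        }
        // Pass 2: resolve edges against the complete node set.
        List<String> edges = new ArrayList<>();
        for (Element e : elements) {
            if (!e.kind.equals("edge")) continue;
            if (!nodes.contains(e.source) || !nodes.contains(e.target)) {
                throw new IllegalStateException(
                    "dangling edge: " + e.source + " -> " + e.target);
            }
            edges.add(e.source + " -> " + e.target);
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Element> elements = List.of(
            Element.node("start"),
            Element.edge("start", "end"), // "end" is not declared yet
            Element.node("end"));
        System.out.println(resolve(elements)); // [start -> end]
    }
}
```

A single pass would hit the edge before "end" exists and would need back-patching to cope with it.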

  47. Workflow Phases: Evaluation
    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);
    var graphBuilder = new GraphBuilder();
    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());
    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());
    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);
    // “interpret” the graph
    var engine = new Engine(engineGraph);
    engine.eval();
    https://github.com/evacchi/ypaat

  48. Workflow Phases: Layout
    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);
    var graphBuilder = new GraphBuilder();
    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());
    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());
    // extract layout information
    var extractor = new LayoutExtractor();
    extractor.visit(tdefs);
    var index = extractor.index();
    // “compile” into buffered image
    var canvas = new Canvas(graphBuilder, index);
    var bufferedImage = canvas.eval();
    https://github.com/evacchi/ypaat

  49. Visitors

  50. Data Structures
    TFlowElement
    |
    +---- StartEventNode
    |
    +---- EndEventNode
    |
    `---- ScriptTask

  51. Pattern Matching
    nodeCollector.visit(node)
    def visit(node: TFlowElement) = {
      node match {
        case StartEventNode(...) =>
          ...
        case EndEventNode(...) =>
          ...
        case ScriptTask(...) =>
          ...
      }
    }

  52. The Poor Man's Alternatives
    // Alternative 1: the Visitor pattern
    interface Visitor {
        void visit(TFlowElement el);
        void visit(TStartEventNode start);
        void visit(TEndEventNode end);
        void visit(TScriptTask task);
    }
    interface Visitable {
        void accept(Visitor v);
    }
    // Alternative 2: a chain of instanceof checks and casts
    if (node instanceof StartEventNode) {
        StartEventNode evt = (StartEventNode) node;
        ...
    } else if (node instanceof EndEventNode) {
        EndEventNode evt = (EndEventNode) node;
        ...
    } else if (node instanceof ScriptTask) {
        ScriptTask evt = (ScriptTask) node;
        ...
    }
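
What the Visitor and Visitable interfaces leave implicit is how accept is implemented. A minimal, self-contained sketch follows, with simplified node classes and a NameCollector visitor that are invented for illustration and are not the classes from the talk's repository: each node calls visit(this), so the overload is chosen by the node's own type (double dispatch).

```java
import java.util.ArrayList;
import java.util.List;

interface Visitor {
    void visit(StartEventNode start);
    void visit(EndEventNode end);
    void visit(ScriptTask task);
}

interface Visitable {
    void accept(Visitor v);
}

// Inside each class `this` has a precise static type, so the
// compiler picks the matching visit(...) overload.
final class StartEventNode implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}

final class EndEventNode implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}

final class ScriptTask implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}

// A hypothetical visitor that just records what it saw.
final class NameCollector implements Visitor {
    final List<String> seen = new ArrayList<>();

    public void visit(StartEventNode start) { seen.add("start"); }
    public void visit(EndEventNode end) { seen.add("end"); }
    public void visit(ScriptTask task) { seen.add("script"); }
}

final class VisitorDemo {
    static List<String> collect(List<Visitable> nodes) {
        NameCollector collector = new NameCollector();
        for (Visitable node : nodes) {
            node.accept(collector); // no instanceof, no casts
        }
        return collector.seen;
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of(
            new StartEventNode(), new ScriptTask(), new EndEventNode())));
        // [start, script, end]
    }
}
```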

  53. Visitor Pattern
    class NodeCollector implements Visitor {
        void visit(TStartEventNode evt) {
            graphBuilder.add(
                new StartEventNode(evt.getId(), evt));
        }
        void visit(TEndEvent evt) {
            graphBuilder.add(
                new EndEventNode(evt.getId(), evt));
        }
        void visit(TScriptTask task) {
            graphBuilder.add(
                new ScriptTaskNode(task.getId(), task));
        }
    }
    class EdgeCollector implements Visitor {
        void visit(TSequenceFlow seq) {
            graphBuilder.addEdge(
                seq.getId(),
                seq.getSourceRef(),
                seq.getTargetRef());
        }
    }
    https://github.com/evacchi/ypaat

  54. Step 3
    Choose a run-time representation

  55. Workflow Evaluation
    • Choose a representation suitable for
    evaluation
    • In our case, for each node, we need to get
    the outgoing edges with the next node to
    visit
    • The most convenient representation of
    the graph is adjacency lists
    • adj( p ) = { q | ( p, q ) ∈ edges }
    var graphBuilder = new GraphBuilder();
    ...
    // prepare graph for visit
    var engineGraph =
    EngineGraph.of(graphBuilder);
    // decorate with an evaluator
    var engine =
    new Engine(engineGraph);
    // evaluate the graph by visiting once more
    engine.eval();
    Map> outgoing;

  56. Workflow Evaluation
    • The most convenient representation of the graph is adjacency lists
    • adj( p ) ↦ { q | ( p, q ) ∈ edges }
    • Map> outgoing
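
A sketch of the adjacency-list idea, assuming nodes are identified by plain strings and edges are [source, target] pairs. Both choices are assumptions made for this example (the type parameters of the map on the slide did not survive extraction), and the walk assumes the graph has no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AdjacencyLists {

    // Builds adj(p) = { q | (p, q) ∈ edges } from [source, target] pairs.
    static Map<String, List<String>> outgoing(List<String[]> edges) {
        Map<String, List<String>> outgoing = new LinkedHashMap<>();
        for (String[] edge : edges) {
            outgoing.computeIfAbsent(edge[0], k -> new ArrayList<>())
                    .add(edge[1]);
        }
        return outgoing;
    }

    // Depth-first walk from a start node, following outgoing edges.
    // Assumes an acyclic graph; a cycle would recurse forever.
    static List<String> walk(Map<String, List<String>> outgoing, String start) {
        List<String> visited = new ArrayList<>();
        walk(outgoing, start, visited);
        return visited;
    }

    private static void walk(Map<String, List<String>> outgoing,
                             String node, List<String> visited) {
        visited.add(node);
        for (String next : outgoing.getOrDefault(node, List.of())) {
            walk(outgoing, next, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = outgoing(List.of(
            new String[] {"Start", "Hello"},
            new String[] {"Hello", "End"}));
        System.out.println(walk(graph, "Start")); // [Start, Hello, End]
    }
}
```

With this representation, "what comes next?" is a single map lookup, which is the only question the evaluator needs to ask at each node.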

  57. Evaluation
    class Engine implements GraphVisitor {
        void visit(StartEventNode node) {
            logger.info("Process '{}' started.", graph.name());
            graph.outgoing(node).forEach(this::visit);
        }
        void visit(EndEventNode node) {
            logger.info("Process ended.");
            // no outgoing edges
        }
        void visit(ScriptTaskNode node) {
            logger.info("Evaluating script task: {}", node.element().getScript().getContent());
            graph.outgoing(node).forEach(this::visit);
        }
        ...
    }
    https://github.com/evacchi/ypaat

  58. Workflow Layout
    • In this case, for each node and edge,
    we need to get the shape and position
    • No particular ordering is required
    • e.g. first render edges and then shapes
    var canvas = new Canvas(graph, index);
    var bufferedImage = canvas.eval();
    void eval() {
        graph.edges().forEach(this::draw);
        graph.nodes().forEach(this::visit);
    }
    https://github.com/evacchi/ypaat

  59. Layout
    class Canvas implements GraphVisitor {
        void draw(Edge edge) {
            var pts = index.edge(edge.id());
            setStroke(Color.BLACK);
            var left = pts.get(0);
            for (int i = 1; i < pts.size(); i++) {
                var right = pts.get(i);
                drawLine(left.x, left.y, right.x, right.y);
                left = right;
            }
        }
        void visit(StartEventNode node) {
            var shape = shapeOf(node);
            setStroke(Color.BLACK);
            setFill(Color.GREEN);
            drawEllipse(shape.x, shape.y, shape.width, shape.height);
            drawLabel(node.element().getName());
        }
        ...
    }
    (Diagram: Start → Hello → End)

  60. Bonus Step 4
    Generate code at compile-time

  61. The Killer App
    • Move pre-processing out of program run-time
    • Generate code
    • Run-time effectively consists only of pure processing
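
As a toy illustration of "generate code", here is a sketch that turns an already-ordered list of step names into Java source text at build time. ProcessCodegen and the shape of the generated class are invented for this example, and step names are assumed to contain no quotes or backslashes (no escaping is done).

```java
import java.util.List;

final class ProcessCodegen {

    // Build time: emit source for a class whose run() method performs
    // the steps directly. The generated class needs no XML parsing and
    // no graph building when it finally runs.
    static String generate(String className, List<String> steps) {
        StringBuilder src = new StringBuilder();
        src.append("final class ").append(className).append(" {\n");
        src.append("    static void run() {\n");
        for (String step : steps) {
            src.append("        System.out.println(\"")
               .append(step)
               .append("\");\n");
        }
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("HelloProcess",
            List.of("Start", "Hello", "End")));
    }
}
```

The output would then be compiled together with the application, so that all the pre-processing has already happened before the program starts.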

  63. AI and Automation Platform
    • Drools rule engine
    • jBPM workflow platform
    • OptaPlanner constraint solver

  64. The Submarine Initiative
    “The question of whether a computer can
    think is no more interesting than the
    question of whether a submarine can
    swim.”
    Edsger W. Dijkstra

  65. GraalVM: “One VM to Rule Them All”
    • Polyglot VM with cross-language JIT
    • Java Bytecode and JVM Languages
    • Dynamic Languages (Truffle API)
    • Native binary compilation (SubstrateVM)

  67. Native Image: Restrictions
    • Native binary compilation
    • Restriction: “closed-world assumption”
    • No dynamic code loading
    • You must declare classes you want to reflect upon

  68. Quarkus

  69. Drools and jBPM
    rule R1 when // constraints
        $r : Result()
        $p : Person( age >= 18 )
    then // consequence
        $r.setValue( $p.getName() + " can drink");
    end
    Drools
    jBPM

  70. Drools DRL
    rule R1 when // constraints
        $r : Result()
        $p : Person( age >= 18 )
    then // consequence
        $r.setValue( $p.getName() + " can drink");
    end
    var r = declarationOf(Result.class, "$r");
    var p = declarationOf(Person.class, "$p");
    var rule =
        rule("com.example", "R1").build(
            pattern(r),
            pattern(p)
                .expr("e", p -> p.getAge() >= 18,
                    alphaIndexedBy(
                        int.class,
                        GREATER_OR_EQUAL,
                        1, this::getAge, 18),
                    reactOn("age")),
            on(p, r).execute(
                ($p, $r) ->
                    $r.setValue(
                        $p.getName() + " can drink")));

  71. jBPM
    RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
    factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
    factory.variable("item", new ObjectDataType("java.lang.String"));
    factory.name("orderItems");
    factory.packageName("com.myspace.demo");
    factory.dynamic(false);
    factory.version("1.0");
    factory.visibility("Private");
    factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
    org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
    startNode1.name("Start");
    startNode1.done();
    org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
    actionNode2.name("Show order details");
    actionNode2.action(kcontext -> {

  72. Startup Time

  73. Conclusion

  74. Takeaways
    • Process in phases
    • Do more in the pre-processing phase (compile-time)
    • Do less during the processing phase (run-time)
    • In other words, separate what you can do once from what you
    have to do repeatedly
    • Move all or some of your phases to compile-time
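
An everyday instance of "separate what you can do once from what you have to do repeatedly" is a regular expression: compiling the pattern is the pre-processing phase, matching is the processing phase. The class and method names below are made up for this sketch.

```java
import java.util.regex.Pattern;

final class CompileOnce {

    // Done once, when the class is initialized.
    private static final Pattern ISO_DATE =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Done repeatedly: only the matching is left on the hot path.
    static boolean isIsoDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    // For contrast: String.matches compiles the pattern on every call.
    static boolean isIsoDateSlow(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2019-05-07"));   // true
        System.out.println(isIsoDate("May 07, 2019")); // false
    }
}
```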

  75. Resources
    • Full Source Code https://github.com/evacchi/ypaat
    • Your Program as a Transpiler (part I):
      Improving Application Performance by Applying Compiler Design
      http://bit.ly/ypaat-performance
    • Other resources
      • Submarine https://github.com/kiegroup/submarine-examples
      • Drools Blog http://blog.athico.com
      • Crafting Interpreters http://craftinginterpreters.com
      • GraalVM.org
      • Quarkus.io
    Edoardo Vacchi @evacchi

  76. Q&A
