Your Program as a Transpiler

Devoxx 2019 edition

Edoardo Vacchi

November 06, 2019

Transcript

  1. Your Program
    as a Transpiler
    Applying Compiler Design
    to Everyday Programming

  2. About Me
    • Edoardo Vacchi @evacchi
    • Research @ University of Milan
    • Research @ UniCredit R&D
    • Drools, jBPM, Kogito @ Red Hat

  3. Motivation
    • My first task in Red Hat: marshalling backend for jBPM
    • Data model mapping
    • From XML tree model to graph representation
    • Apparently boring, but challenging in a way

  4. Motivation
    • Language implementation is often seen as a dark art
    • But some design patterns are simple at their core
    • Best practices can be applied to everyday programming

  5. Motivation (cont'd)
    • Learning about language implementation will give you a
    different angle to deal with many problems
    • It will lead you to a better understanding of how
    Quarkus and GraalVM AoT do their magic

  6. Goals
    • Programs often have a pre-processing phase where you
    prepare for execution
    • Then, there's the actual processing phase
    • Learn to recognize and structure the pre-processing phase

  7. Transpilers vs. Compilers
    • Compiler: translates code written in a language (source
    code) into code written in a target language (object code).
    The target language may be at a lower level of abstraction
    • Transpiler: translates code written in a language into
    code written in another language at the same level of
    abstraction (Source-to-Source Translator).

  8. Are transpilers simpler than compilers?
    • Lower-level languages are complex
      • They are not: if anything, they're simple
    • Syntactic sugar is not a higher level of abstraction
      • It is: a concise construct is expanded at compile-time
    • Proper compilers do low-level optimizations
      • You are thinking of optimizing compilers.

  9. The distinction is moot
    • It is pretty easy to write a crappy compiler, call it a
    transpiler and feel at peace with yourself
    • Writing a good transpiler is no different or harder than
    writing a good compiler
    • So, how do you write a good compiler?

  10. Your Program
    as a Compiler
    Applying Compiler Design
    to Everyday Programming

  11. Compiler-like workflows
    • At least two classes of problems can be solved with
    compiler-like workflows
    • Boot time optimization problems
    • Data transformation problems

  13. Running Example
    Function Orchestration

  14. Function Orchestration
    • You are building an immutable Dockerized serverless
    function
    [Diagram: functions f and g]

  15. Function Orchestration
    • Problem
    • No standard* way to describe function orchestration yet
    * Yes, I know about https://github.com/cncf/wg-serverless
    [Diagram: functions f and g]

  16. Solution: Roll your own YAML format
    process:
      elements:
        - start: &_1
            name: Start
        - function: &_2
            name: Hello
        - end: &_3
            name: End
        - edge:
            source: *_1
            target: *_2
        - edge:
            source: *_2
            target: *_3
    [Diagram: Start → Hello → End]
    Congratulations!
    Enjoy attending conferences worldwide

  17. Alternate Solution
    • You are describing a workflow
    • There is a perfectly fine standard: BPMN
    • Business Process Model and Notation
    [Diagram: Task 1 → Task 2]

  18. System.out.println("Hello World");
    https://github.com/evacchi/ypaat
    [Diagram: Start → Hello → End]

  19. Downside: Nobody will invite you to
    their conference to talk about BPM.
    [Diagram: Start → Hello → End]

  20. Downside: Nobody will invite you to
    their conference to talk about BPM.
    Unless you trick them.
    [Diagram: Start → Hello → End]

  21. Bonuses for choosing BPMN
    • Standard XML-based serialization format
      • that's not the bonus
    • There is standard tooling to validate and parse
      • that is a bonus
    • Moreover:
      • Different types of nodes included in the main spec
      • Optional spec for laying out nodes on a diagram
    [Diagram: Start → Hello → End]

  22. Goals
    • Read a BPMN workflow
    • Execute that workflow
    • Visualize that workflow
    [Diagram: Start → Hello → End]

  23. Step 1
    Recognize your compilation phase

  24. What's a compilation phase?
    • It's your setup phase.
    • You do it only once before the actual processing begins

  25. Configuring the application
    • Problem. Use config values from a file/env vars/etc
    • Do you validate config values each time you read them?
    • Compile-time:
      • Read config values into a validated data structure
    • Run-time:
      • Use validated config values
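
    A minimal sketch of that split, assuming a hypothetical ServerConfig type with "host" and "port" keys (none of these names come from the talk): raw values are checked once in parse, and everything after that works on the already validated object.

```java
import java.util.Map;

// Hypothetical config type: all names here are illustrative assumptions.
final class ServerConfig {
    final String host;
    final int port;

    private ServerConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // "Compile-time": read raw values once and fail fast if they are invalid.
    static ServerConfig parse(Map<String, String> raw) {
        String host = raw.getOrDefault("host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("port", "").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new ServerConfig(host, port);
    }

    // "Run-time": callers use the fields directly, with no further checks.
    String address() {
        return host + ":" + port;
    }
}
```

    The private constructor means the only way to obtain a ServerConfig is through parse, so run-time code cannot see an unvalidated value.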

  26. Data Transformation Pipeline
    • Problem. Manipulate data to produce analytics
    • Compile-time:
      • Define transformations (e.g. map, filter, etc. operations)
      • Decide the execution plan (local, distributed, etc.)
    • Run-time:
      • Evaluate the execution plan
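
    One way to picture this, as a sketch only: a hypothetical Pipeline class (not from the talk or from any library) records map/filter steps without touching data, and run evaluates the recorded plan later. A real engine would also decide where the plan executes; this sketch only runs locally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical mini pipeline: steps are recorded first and evaluated later.
final class Pipeline {
    private final List<Function<Stream<Integer>, Stream<Integer>>> steps = new ArrayList<>();

    // "Compile-time": describing a transformation does not touch any data.
    Pipeline map(Function<Integer, Integer> f) {
        steps.add(s -> s.map(f));
        return this;
    }

    Pipeline filter(Predicate<Integer> p) {
        steps.add(s -> s.filter(p));
        return this;
    }

    // "Run-time": the recorded plan is applied to actual data.
    List<Integer> run(List<Integer> data) {
        Stream<Integer> stream = data.stream();
        for (Function<Stream<Integer>, Stream<Integer>> step : steps) {
            stream = step.apply(stream);
        }
        return stream.collect(Collectors.toList());
    }
}
```

    The same plan object can be built once and run against many inputs.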

  27. Example: BPMN Execution
    • Problem. Execute a workflow description.
    • Compile-time:
      • Read BPMN into a visitable structure (StartEvent)
    • Run-time:
      • Visit the structure
      • For each node, execute tasks
    [Diagram: Start → Hello → End]

  28. Example: BPMN Visualization
    • Problem. Visualize a workflow diagram.
    • Compile-time:
      • Read BPMN into a graph
    • Run-time:
      • For each node and edge, draw on a canvas
    [Diagram: Start → Hello → End]

  29. Read BPMN into a Data Structure
    • Full XML Schema Definition* is automatically mapped
    onto Java classes, validated against schema constraints
    TDefinitions tdefs = JAXB.unmarshal(
    resource,
    TDefinitions.class);
    * Yes kids, we have working schemas

  30. BPMN: From Tree to Graph
    • No ordering imposed
    on the description
    System.out.println("Hello World");
    Forward References

  31. System.out.println("Hello World");
    https://github.com/evacchi/ypaat
    Separate Layout Definition

  33. Step 2
    Work like a compiler

  34. Compiling a programming language
    • You start from a text representation of a program
    • The text representation is fed to a parser
    • The parser returns a parse tree
    • The parse tree is refined into an abstract syntax tree (AST)
    • The AST is further refined through intermediate representations (IRs)
    • Up until the final representation is returned
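
    To make the chain of representations concrete, here is a toy compiler for sums such as "1 + 2 + 3". All class and method names (ToyCompiler, Num, Add) are illustrative assumptions, and the PUSH/ADD instructions target a made-up stack machine.

```java
import java.util.ArrayList;
import java.util.List;

// Toy AST for sums; every name in this sketch is illustrative.
abstract class Expr {}

final class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
}

final class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class ToyCompiler {
    // Phase 1: text -> tokens.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.trim().split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }

    // Phase 2: tokens -> AST (left-associative additions).
    static Expr parse(List<String> tokens) {
        Expr result = new Num(Integer.parseInt(tokens.get(0)));
        for (int i = 1; i < tokens.size(); i += 2) {
            if (!tokens.get(i).equals("+")) {
                throw new IllegalArgumentException("expected '+' but got " + tokens.get(i));
            }
            result = new Add(result, new Num(Integer.parseInt(tokens.get(i + 1))));
        }
        return result;
    }

    // Phase 3: AST -> postfix instructions for a stack machine.
    static void emit(Expr expr, List<String> out) {
        if (expr instanceof Num) {
            out.add("PUSH " + ((Num) expr).value);
        } else {
            Add add = (Add) expr;
            emit(add.left, out);
            emit(add.right, out);
            out.add("ADD");
        }
    }

    static List<String> compile(String source) {
        List<String> code = new ArrayList<>();
        emit(parse(tokenize(source)), code);
        return code;
    }
}
```

    Each phase takes the previous representation and returns the next one, so any of them can be exercised in isolation.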

  36. What makes a compiler a proper compiler
    • Not optimization
    • Compilation Phases
    • You can have as many as you like

  37. Example. A Configuration File
    1. Read file from (class)path
    2. Unmarshal file into a typed object
    3. Sanitize values
    4. Validate values
    5. Coerce to typed values

  38. Example. Produce a Report
    1. Fetch data from different sources
    2. Discard invalid values
    3. Merge into single data stream
    4. Compute aggregates (sums, avgs, etc.)
    5. Generate synthesis data structure

  39. Example. A Workflow Engine
    1. Read BPMN file
    2. Collect nodes
    3. Collect edges
    4. Prepare for visit/layout
    [Diagram: Start → Hello → End]

  40. Compilation Phases
    • Better separation of concerns
    • Better testability
    • You can test each intermediate result
    • You can choose when and where each phase gets evaluated
    • More Requirements = More Phases !

  41. Phase vs Pass
    • Many phases do not necessarily mean as many passes
    • You could do several phases in one pass
    • Logically phases are still distinct

  42. One Pass vs. Multi-Pass
    One pass:
    for value in config:
        sanitized = sanitize(value)
        validated = validate(sanitized)
        coerced = coerce(validated)
    Multi-pass:
    for value in config:
        sanitized += sanitize(value)
    for value in sanitized:
        validated += validate(value)
    for value in validated:
        coerced += coerce(value)
    Myth: one pass doing many things is better than doing many passes, each doing one thing

  43. It is not: Complexity
    One pass:
    for value in config:
        sanitized = sanitize(value)
        validated = validate(sanitized)
        coerced = coerce(validated)
    n times:
        sanitize = 1 op
        validate = 1 op
        coerce = 1 op
    (1 op + 1 op + 1 op) × n = 3n
    Multi-pass:
    for value in config:
        sanitized += sanitize(value)
    for value in sanitized:
        validated += validate(value)
    for value in validated:
        coerced += coerce(value)
    n times: sanitize = n op
    n times: validate = n op
    n times: coerce = n op
    (n + n + n) = 3n
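
    The two shapes can be compared side by side in Java. This is a sketch: sanitize, validate and coerce take their names from the slide, but their bodies are placeholder assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical phases; the bodies are placeholders chosen for illustration.
final class Phases {
    static String sanitize(String raw) { return raw.trim(); }

    static String validate(String value) {
        if (!value.matches("-?\\d+")) {
            throw new IllegalArgumentException("not a number: " + value);
        }
        return value;
    }

    static int coerce(String value) { return Integer.parseInt(value); }

    // One pass: every phase is applied to a value before moving to the next value.
    static List<Integer> onePass(List<String> config) {
        return config.stream()
                .map(v -> coerce(validate(sanitize(v))))
                .collect(Collectors.toList());
    }

    // Three passes: each phase runs over the whole list and yields an
    // intermediate result that can be inspected or tested on its own.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = config.stream().map(Phases::sanitize).collect(Collectors.toList());
        List<String> validated = sanitized.stream().map(Phases::validate).collect(Collectors.toList());
        return validated.stream().map(Phases::coerce).collect(Collectors.toList());
    }
}
```

    Both methods call each phase function exactly once per value, which is the 3n count from the slide; the multi-pass form additionally keeps the intermediate lists.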

  44. Single-pass is not always possible
    However, a single pass may be
    cumbersome or plain
    impossible
    System.out.println("Hello World");
    Forward References
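
    A small sketch of why forward references push you towards two passes. The line-based input format ("node <id>", "edge <source> <target>") is a hypothetical stand-in for the BPMN XML, and TwoPassReader is not part of ypaat.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical input format: "node <id>" or "edge <source> <target>", in any order.
final class TwoPassReader {
    // Pass 1: collect every declared node id.
    static Set<String> collectNodes(List<String> lines) {
        Set<String> nodes = new LinkedHashSet<>();
        for (String line : lines) {
            String[] parts = line.split(" ");
            if (parts[0].equals("node")) {
                nodes.add(parts[1]);
            }
        }
        return nodes;
    }

    // Pass 2: resolve edges; by now every node is known, wherever it was declared.
    static List<String[]> collectEdges(List<String> lines, Set<String> nodes) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(" ");
            if (parts[0].equals("edge")) {
                if (!nodes.contains(parts[1]) || !nodes.contains(parts[2])) {
                    throw new IllegalStateException("unknown node in: " + line);
                }
                edges.add(new String[] { parts[1], parts[2] });
            }
        }
        return edges;
    }
}
```

    An edge that appears before the nodes it connects is resolved without special handling, because the second pass only starts once all nodes are collected.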

  45. Workflow Phases: Evaluation
    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);
    var graphBuilder = new GraphBuilder();
    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());
    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());
    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);
    // “interpret” the graph
    var engine = new Engine(engineGraph);
    engine.eval();
    https://github.com/evacchi/ypaat

  46. Workflow Phases: Layout
    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);
    var graphBuilder = new GraphBuilder();
    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());
    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());
    // extract layout information
    var extractor = new LayoutExtractor();
    extractor.visit(tdefs);
    var index = extractor.index();
    // “compile” into buffered image
    var canvas = new Canvas(graphBuilder, index);
    var bufferedImage = canvas.eval();
    https://github.com/evacchi/ypaat

  47. Data Structures
    TFlowElement
    |
    +---- StartEventNode
    |
    +---- EndEventNode
    |
    `---- ScriptTask

  48. Pattern Matching
    nodeCollector.visit(node)
    def visit(node: TFlowElement) = {
      node match {
        case StartEventNode(...) =>
          ...
        case EndEventNode(...) =>
          ...
        case ScriptTask(...) =>
          ...
      }
    }

  49. Pattern Matching
    nodeCollector.visit(node)
    void visit(TFlowElement node) {
        switch (node) {
            case StartEventNode(...) ->
                ...
            case EndEventNode(...) ->
                ...
            case ScriptTask(...) ->
                ...
        }
    }

  50. The Poor Man's Alternatives
    interface Visitor {
        void visit(TFlowElement el);
        void visit(TStartEventNode start);
        void visit(TEndEventNode end);
        void visit(TScriptTask task);
    }
    interface Visitable {
        void accept(Visitor v);
    }
    if (node instanceof StartEventNode) {
        StartEventNode evt = (StartEventNode) node;
        ...
    } else if (node instanceof EndEventNode) {
        EndEventNode evt = (EndEventNode) node;
        ...
    } else if (node instanceof ScriptTask) {
        ScriptTask evt = (ScriptTask) node;
        ...
    }
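
    For reference, a self-contained version of the visitor alternative could look like the following. The node classes are simplified stand-ins (assumptions), not the JAXB-generated BPMN types.

```java
// Classic visitor with double dispatch; class names are illustrative only.
interface NodeVisitor<R> {
    R visit(StartNode node);
    R visit(TaskNode node);
    R visit(EndNode node);
}

interface Node {
    <R> R accept(NodeVisitor<R> visitor);
}

final class StartNode implements Node {
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.visit(this); }
}

final class TaskNode implements Node {
    final String script;
    TaskNode(String script) { this.script = script; }
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.visit(this); }
}

final class EndNode implements Node {
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.visit(this); }
}

// One "phase" is one visitor; adding a phase does not touch the node classes.
final class Describer implements NodeVisitor<String> {
    public String visit(StartNode node) { return "start"; }
    public String visit(TaskNode node) { return "task: " + node.script; }
    public String visit(EndNode node) { return "end"; }
}
```

    Each accept call picks the right visit overload through the static type of this, which is what replaces the instanceof chain.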

  51. Step 3
    Choose a run-time representation

  52. Workflow Evaluation
    • Choose a representation suitable for
    evaluation
    • In our case, for each node, we need to get
    the outgoing edges with the next node to
    visit
    • The most convenient representation of
    the graph is adjacency lists
    • adj( p ) = { q | ( p, q ) ∈ edges }
    var graphBuilder = new GraphBuilder();
    ...
    // prepare graph for visit
    var engineGraph =
    EngineGraph.of(graphBuilder);
    // decorate with an evaluator
    var engine =
    new Engine(engineGraph);
    // evaluate the graph by visiting once more
    engine.eval();
    Map<Node, List<Node>> outgoing;

  53. Workflow Evaluation
    • The most convenient representation of the graph is adjacency lists
    • adj( p ) ↦ { q | ( p, q ) ∈ edges }
    • Map<Node, List<Node>> outgoing
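
    A sketch of such an adjacency-list structure, assuming nodes are identified by plain String ids (the actual ypaat classes may differ):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of an adjacency-list graph keyed by node id (String ids are an assumption).
final class AdjacencyGraph {
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    void addEdge(String source, String target) {
        outgoing.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
    }

    // adj(p): the nodes reachable from p through one edge.
    List<String> outgoing(String node) {
        return outgoing.getOrDefault(node, List.of());
    }

    // Depth-first walk from a start node; it assumes the graph has no cycles.
    List<String> walk(String start) {
        List<String> visited = new ArrayList<>();
        visit(start, visited);
        return visited;
    }

    private void visit(String node, List<String> visited) {
        visited.add(node);
        for (String next : outgoing(node)) {
            visit(next, visited);
        }
    }
}
```

    Looking up the next nodes is a single map access, which is the operation the evaluation loop performs at every step.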

  54. Evaluation
    class Engine implements GraphVisitor {
        void visit(StartEventNode node) {
            logger.info("Process '{}' started.", graph.name());
            graph.outgoing(node).forEach(this::visit);
        }
        void visit(EndEventNode node) {
            logger.info("Process ended.");
            // no outgoing edges
        }
        void visit(ScriptTaskNode node) {
            logger.info("Evaluating script task: {}", node.element().getScript().getContent());
            graph.outgoing(node).forEach(this::visit);
        }
        ...
    }
    https://github.com/evacchi/ypaat

  55. Workflow Layout
    • In this case, for each node and edge,
    we need to get the shape and position
    • No particular ordering is required
    • e.g. first render edges and then shapes
    var canvas = new Canvas(graph, index);
    var bufferedImage = canvas.eval();
    void eval() {
        graph.edges().forEach(this::draw);
        graph.nodes().forEach(this::visit);
    }
    https://github.com/evacchi/ypaat

  56. Layout
    class Canvas implements GraphVisitor {
        void draw(Edge edge) {
            var pts = index.edge(edge.id());
            setStroke(Color.BLACK);
            var left = pts.get(0);
            for (int i = 1; i < pts.size(); i++) {
                var right = pts.get(i);
                drawLine(left.x, left.y, right.x, right.y);
                left = right;
            }
        }
        void visit(StartEventNode node) {
            var shape = shapeOf(node);
            setStroke(Color.BLACK);
            setFill(Color.GREEN);
            drawEllipse(shape.x, shape.y, shape.width, shape.height);
            drawLabel(element.getName());
        }
        ...
    }
    [Diagram: Start → Hello → End]

  57. Bonus Step 4
    Generate code at compile-time

  58. The Killer App
    • Move pre-processing out of program run-time
    • Generate code
    • Run-time effectively consists only in pure processing

  59. Compiler-like workflows
    • At least two classes of problems can be solved with
    compiler-like workflows
    • Boot time optimization problems
    • Data transformation problems

  60. Example: Application Wiring
    • You are building an immutable Dockerized microservice
    • Do you really need all that Runtime Reflection?
    • Do you really need Runtime Dependency Injection?
    public class Example {
    private final Animal animal;
    @Inject public Example(Animal animal) { this.animal = animal; }
    public Animal animal() { return animal; }
    }
    public interface Animal {}
    @InjectCandidate public class Dog implements Animal {}

  61. All these things make your startup slow!
    • But it's done only once!
      • Never is better than once
    • But it's flexible
      • Ask yourself when was the last time you changed
      dependencies/startup config/classpath at runtime
      • If it's recent, ask yourself the price you pay for that flexibility

  62. Example: A Quick DI Framework
    https://github.com/evacchi/reflection-vs-codegen
    public class Example {
    private final Animal animal;
    @Inject public Example(Animal animal) { this.animal = animal; }
    public Animal animal() { return animal; }
    }
    public interface Animal {}
    @InjectCandidate public class Dog implements Animal {}

  63. Binder binder = new Binder();
    binder.scan();
    Example ex =
    binder.createInstance(Example.class);
    Animal animal = ex.animal();
    Objects.requireNonNull(animal);
    assert animal instanceof Dog;
    https://github.com/evacchi/reflection-vs-codegen

  64. public class Binder {
        public Binder scan() {
            Reflections reflections = new Reflections();
            reflections.getTypesAnnotatedWith(InjectCandidate.class)
                .forEach(t -> bindings.put(interfaceOf(t), constructorOf(t)));
            return this;
        }
        public <T> T createInstance(Class<? extends T> t) {
            return (T) Arrays.stream(t.getDeclaredConstructors())
                .filter(c -> c.getAnnotation(Inject.class) != null)
                .peek(c -> c.setAccessible(true))
                .map(this::createInstance)
                .findFirst().get();
        }
        ...
    }
    https://github.com/evacchi/reflection-vs-codegen
    At run-time, complexity is easy to miss
    This loop gets executed at each instance creation

  65. public void scan() {
        Reflections reflections = new Reflections();
        // resolve injection candidates
        reflections.getTypesAnnotatedWith(InjectCandidate.class);
        // resolve injected constructors
        reflections.getConstructorsAnnotatedWith(Inject.class);
        // collect candidates
        reflections.forEach(this::collect);
        // resolve mappings
        resolveMappings();
    }

  66. DI: Annotation Processor
    The processor is triggered by the Java compiler for claimed annotations.
    Bindings bindings = processInjectionCandidates(
        env.getElementsAnnotatedWith(InjectCandidate.class));
    processInjectionSites(
        env.getElementsAnnotatedWith(Inject.class),
        bindings);
    generateJavaSources(bindings);
    https://github.com/evacchi/reflection-vs-codegen
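
    To show what moving this work to compile-time buys, the sketch below contrasts reflective construction with the kind of source a generator might emit. ReflectiveFactory and GeneratedFactory are hypothetical names, the @Inject annotations are left out to keep the example self-contained, and the generated class is an assumption about the shape of the output, not the actual output of the reflection-vs-codegen project.

```java
import java.lang.reflect.Constructor;

interface Animal {}

final class Dog implements Animal {}

final class Example {
    private final Animal animal;
    Example(Animal animal) { this.animal = animal; }
    Animal animal() { return animal; }
}

// Run-time wiring: the constructor is looked up reflectively on every call.
final class ReflectiveFactory {
    static Example create() {
        try {
            Constructor<Example> ctor = Example.class.getDeclaredConstructor(Animal.class);
            ctor.setAccessible(true);
            return ctor.newInstance(new Dog());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// What a code generator could emit instead (hypothetical output):
// the binding was resolved at build time, so only a plain constructor call remains.
final class GeneratedFactory {
    static Example create() {
        return new Example(new Dog());
    }
}
```

    Both factories return an equivalent object; the difference is that the generated one has no lookup left to do when the program starts.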

  67. Example
    % time java io.github.evacchi.Reflective
    6.94s user 0.29s system 259% cpu 2.785 total
    % time java io.github.evacchi.Codegen
    0.08s user 0.01s system 111% cpu 0.087 total

  68. Example
    % time java io.github.evacchi.Reflective
    6.94s user 0.29s system 259% cpu 2.785 total
    % time java io.github.evacchi.Codegen
    0.08s user 0.01s system 111% cpu 0.087 total
    % time ./io.github.evacchi.codegen
    ./io.github.evacchi.codegen 0.00s user 0.00s system 86% cpu 0.003 total

  69. Kogito
    ergo cloud.

  70. AI and Automation Platform
    • Drools rule engine
    • jBPM workflow platform
    • OptaPlanner constraint solver

  71. code.quarkus.io

  72. Drools and jBPM
    rule R1 when // constraints
    $r : Result()
    $p : Person( age >= 18 )
    then // consequence
    $r.setValue( $p.getName() + " can drink");
    end
    Drools
    jBPM

  73. Drools DRL
    rule R1 when // constraints
    $r : Result()
    $p : Person( age >= 18 )
    then // consequence
    $r.setValue( $p.getName() + " can drink");
    end
    var r = declarationOf(Result.class, "$r");
    var p = declarationOf(Person.class, "$p");
    var rule =
        rule("com.example", "R1").build(
            pattern(r),
            pattern(p)
                .expr("e", p -> p.getAge() >= 18,
                    alphaIndexedBy(
                        int.class,
                        GREATER_OR_EQUAL,
                        1, this::getAge, 18),
                    reactOn("age")),
            on(p, r).execute(
                ($p, $r) ->
                    $r.setValue(
                        $p.getName() + " can drink")));

  74. jBPM
    RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
    factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
    factory.variable("item", new ObjectDataType("java.lang.String"));
    factory.name("orderItems");
    factory.packageName("com.myspace.demo");
    factory.dynamic(false);
    factory.version("1.0");
    factory.visibility("Private");
    factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
    org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
    startNode1.name("Start");
    startNode1.done();
    org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
    actionNode2.name("Show order details");
    actionNode2.action(kcontext -> {

  75. Startup Time

  76. Take Aways
    • Process in phases
    • Do more in the pre-processing phase (compile-time)
    • Do less during the processing phase (run-time)
    • In other words, separate what you can do once from what you
    have to do repeatedly
    • Move all or some of your phases to compile-time
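
    An everyday instance of this split in plain Java is regular expression handling: Pattern.compile is the pre-processing phase, matching is the processing phase. The IdFilter class and its pattern are made up for illustration.

```java
import java.util.regex.Pattern;

final class IdFilter {
    // Done once: the regular expression is compiled when the class is initialized.
    private static final Pattern ID = Pattern.compile("[a-z]+-\\d+");

    // Done repeatedly: only matching happens per call.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    // For contrast: String.matches compiles the pattern again on every call.
    static boolean isIdSlow(String s) {
        return s.matches("[a-z]+-\\d+");
    }
}
```

    Both methods give the same answers; the first one simply moved the expensive step out of the repeated path.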

  77. Resources
    • Full Source Code https://github.com/evacchi/ypaat
    • Your Program as a Transpiler (part I)
    • Improving Application Performance by Applying Compiler Design
    http://bit.ly/ypaat-performance
    https://github.com/evacchi/reflection-vs-codegen
    • Other resources
    • Kogito https://github.com/kiegroup/kogito-examples
    • Drools Blog http://blog.athico.com
    • Crafting Interpreters http://craftinginterpreters.com
    • Quarkus.io
    Edoardo Vacchi @evacchi
