Your Program as a Transpiler

Devoxx 2019 edition

Edoardo Vacchi

November 06, 2019

Transcript

  1. About Me
    • Edoardo Vacchi @evacchi
    • Research @ University of Milan
    • Research @ UniCredit R&D
    • Drools, jBPM, Kogito @ Red Hat
  2. Motivation
    • My first task at Red Hat: a marshalling backend for jBPM
    • Data model mapping
    • From an XML tree model to a graph representation
    • Apparently boring, but challenging in its own way
  3. Motivation
    • Language implementation is often seen as a dark art
    • But some of its design patterns are simple at their core
    • Their best practices can be applied to everyday programming
  4. Motivation (cont'd)
    • Learning about language implementation gives you a different angle on many problems
    • It will lead you to a better understanding of how Quarkus and GraalVM AoT do their magic
  5. Goals
    • Programs often have a pre-processing phase, where you prepare for execution
    • Then comes the actual execution phase
    • Learn to recognize and structure the pre-processing phase
  6. Transpilers vs. Compilers
    • Compiler: translates code written in one language (source code) into code written in a target language (object code). The target language may be at a lower level of abstraction
    • Transpiler: translates code written in one language into code written in another language at the same level of abstraction (a source-to-source translator)
  7. Are transpilers simpler than compilers?
    • "Lower-level languages are complex."
      They are not: if anything, they are simple.
    • "Syntactic sugar is not a higher level of abstraction."
      It is: a concise construct is expanded at compile-time.
    • "Proper compilers do low-level optimizations."
      You are thinking of optimizing compilers.
  8. The distinction is moot
    • It is pretty easy to write a crappy compiler, call it a transpiler and feel at peace with yourself
    • Writing a good transpiler is no different from (and no harder than) writing a good compiler
    • So, how do you write a good compiler?
  9. Compiler-like workflows
    • At least two classes of problems can be solved with compiler-like workflows:
      • Boot-time optimization problems
      • Data transformation problems
  11. Function Orchestration
    • Problem: there is no standard* way to describe function orchestration yet
    * Yes, I know about https://github.com/cncf/wg-serverless
    [Diagram: functions f and g]
  12. Solution: Roll your own YAML format
      process:
        elements:
          - start: &_1
              name: Start
          - function: &_2
              name: Hello
          - end: &_3
              name: End
          - edge:
              source: *_1
              target: *_2
          - edge:
              source: *_2
              target: *_3
    [Diagram: Start → Hello → End]
    • Congratulations! Enjoy attending conferences worldwide
  13. Alternate Solution
    • You are describing a workflow
    • There is a perfectly fine standard: BPMN (Business Process Model and Notation)
    [Diagram: Task 1 → Task 2]
  14. The Minimal Process in BPMN (https://github.com/evacchi/ypaat)
      <process id="Minimal" name="Minimal Process">
        <startEvent id="_1" name="Start"/>
        <scriptTask id="_2" name="Hello">
          <script>System.out.println("Hello World");</script>
        </scriptTask>
        <endEvent id="_3" name="End">
          <terminateEventDefinition/>
        </endEvent>
        <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
        <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
      </process>
    [Diagram: Start → Hello → End]
  15. [Diagram: Start → Hello → End]
    • Unless you trick them. Downside: nobody will invite you to their conference to talk about BPM.
  16. Bonuses for choosing BPMN
    • A standard XML-based serialization format
      • that's not the bonus
    • There is standard tooling to validate and parse it
      • that is a bonus
    • Moreover:
      • Different types of nodes are included in the main spec
      • An optional spec for laying out nodes on a diagram
  17. Goals
    • Read a BPMN workflow
    • Execute that workflow
    • Visualize that workflow
  18. What's a compilation phase?
    • It's your setup phase
    • You do it only once, before the actual processing begins
  19. Configuring the application
    • Problem: use config values from a file, environment variables, etc.
    • Do you validate config values each time you read them?
    • Compile-time:
      • Read config values into a validated data structure
    • Run-time:
      • Use the validated config values
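The configuration split above can be sketched in Java as follows. This is a minimal illustration, assuming the raw values have already been read into a `Map<String, String>` (standing in for a file or environment variables); the `AppConfig` record, the `app.host`/`app.port` keys and the `load`/`endpoint` helpers are hypothetical names, not part of the talk's code.

```java
import java.util.Map;

public class ConfigPhases {
    // Typed, validated result of the "compile-time" phase (hypothetical fields).
    record AppConfig(String host, int port) {
        AppConfig {
            if (host == null || host.isBlank()) throw new IllegalArgumentException("host is required");
            if (port < 1 || port > 65535) throw new IllegalArgumentException("port out of range: " + port);
        }
    }

    // Compile-time: read the raw strings once, sanitize, coerce and validate them.
    // A non-numeric port throws NumberFormatException (a subclass of IllegalArgumentException).
    static AppConfig load(Map<String, String> raw) {
        String host = raw.getOrDefault("app.host", "").trim();
        int port = Integer.parseInt(raw.getOrDefault("app.port", "8080").trim());
        return new AppConfig(host, port);
    }

    // Run-time: only validated values reach this code, so nothing is re-checked here.
    static String endpoint(AppConfig cfg) {
        return "http://" + cfg.host() + ":" + cfg.port();
    }

    public static void main(String[] args) {
        AppConfig cfg = load(Map.of("app.host", " localhost ", "app.port", "9090"));
        System.out.println(endpoint(cfg)); // prints http://localhost:9090
    }
}
```

Because an `AppConfig` can only be constructed from valid values, run-time code such as `endpoint` has no reason to validate them again.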
  20. Data Transformation Pipeline
    • Problem: manipulate data to produce analytics
    • Compile-time:
      • Define transformations (e.g. map, filter, etc.)
      • Decide the execution plan (local, distributed, etc.)
    • Run-time:
      • Evaluate the execution plan
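A hedged sketch of the same compile-time/run-time split for a data pipeline, using plain `java.util.stream`. The `Plan` class is hypothetical: it only records transformations and an execution choice, and nothing runs until `run` is called. Sequential versus parallel streams stand in here for the slide's local versus distributed execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipelinePlan {
    // Hypothetical plan: a list of stream stages plus an execution choice.
    static final class Plan {
        private final List<Function<Stream<Integer>, Stream<Integer>>> stages = new ArrayList<>();
        private boolean parallel = false;

        // Compile-time: record the transformations without running them.
        Plan filter(Predicate<Integer> p) { stages.add(s -> s.filter(p)); return this; }
        Plan map(UnaryOperator<Integer> f) { stages.add(s -> s.map(f)); return this; }

        // Compile-time: decide how the plan will be executed.
        Plan parallel(boolean parallel) { this.parallel = parallel; return this; }

        // Run-time: evaluate the plan against actual data.
        List<Integer> run(List<Integer> data) {
            Stream<Integer> s = parallel ? data.parallelStream() : data.stream();
            for (Function<Stream<Integer>, Stream<Integer>> stage : stages) s = stage.apply(s);
            return s.collect(Collectors.toList()); // encounter order is preserved for list sources
        }
    }

    public static void main(String[] args) {
        Plan plan = new Plan().filter(x -> x % 2 == 0).map(x -> x * 10);
        System.out.println(plan.run(List.of(1, 2, 3, 4)));                // [20, 40]
        System.out.println(plan.parallel(true).run(List.of(1, 2, 3, 4))); // [20, 40]
    }
}
```

Changing the execution strategy touches only the plan, not the transformations themselves.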
  21. Example: BPMN Execution
    • Problem: execute a workflow description
    • Compile-time:
      • Read BPMN into a visitable structure (StartEvent)
    • Run-time:
      • Visit the structure
      • For each node, execute its tasks
  22. Example: BPMN Visualization
    • Problem: visualize a workflow diagram
    • Compile-time:
      • Read BPMN into a graph
    • Run-time:
      • For each node and edge, draw on a canvas
  23. Read BPMN into a Data Structure
    • The full XML Schema Definition* is automatically mapped onto Java classes, validated against the schema constraints
      TDefinitions tdefs = JAXB.unmarshal(resource, TDefinitions.class);
    * Yes kids, we have working schemas
  24. BPMN: From Tree to Graph
    • No ordering is imposed on the description: forward references
      <process id="Minimal" name="Minimal Process">
        <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
        <endEvent id="_3" name="End">
          <terminateEventDefinition/>
        </endEvent>
        <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
        <scriptTask id="_2" name="Hello">
          <script>System.out.println("Hello World");</script>
        </scriptTask>
        <startEvent id="_1" name="Start"/>
      </process>
  25. Separate Layout Definition (https://github.com/evacchi/ypaat)
      <definitions>
        <process id="Minimal" name="Minimal Process">
          <startEvent id="_1" name="Start"/>
          <scriptTask id="_2" name="Hello">
            <script>System.out.println("Hello World");</script>
          </scriptTask>
          <endEvent id="_3" name="End">
            <terminateEventDefinition/>
          </endEvent>
          <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
          <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
        </process>
        <bpmndi:BPMNDiagram>
          <bpmndi:BPMNPlane bpmnElement="SubProcess">
            <bpmndi:BPMNShape bpmnElement="_1">
              <dc:Bounds x="11" y="30" width="48" height="48"/>
            </bpmndi:BPMNShape>
            <bpmndi:BPMNShape bpmnElement="_2">
              <dc:Bounds x="193" y="30" width="80" height="48"/>
            </bpmndi:BPMNShape>
            <bpmndi:BPMNShape bpmnElement="_3">
              <dc:Bounds x="396" y="30" width="48" height="48"/>
            </bpmndi:BPMNShape>
            <bpmndi:BPMNEdge bpmnElement="_1-_2">
              <di:waypoint x="35" y="50"/>
              <di:waypoint x="229" y="50"/>
            </bpmndi:BPMNEdge>
            <bpmndi:BPMNEdge bpmnElement="_2-_3">
              <di:waypoint x="229" y="50"/>
              <di:waypoint x="441" y="50"/>
            </bpmndi:BPMNEdge>
          </bpmndi:BPMNPlane>
        </bpmndi:BPMNDiagram>
      </definitions>
  27. Compiling a programming language
    • You start from a text representation of a program
    • The text representation is fed to a parser
    • The parser returns a parse tree
    • The parse tree is refined into an abstract syntax tree (AST)
    • The AST is further refined through intermediate representations (IRs)
    • ...up until the final representation is returned
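To make these phases concrete, here is a deliberately tiny compiler for integer expressions with `+` and `*`. It is a sketch, not a real language implementation: for brevity the parser builds the AST directly (skipping a separate parse tree), the IR is a hypothetical list of stack-machine instructions, and there is no support for parentheses or error recovery.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {
    // AST node types
    interface Expr {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // Phase 1: text -> tokens
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Phase 2: tokens -> AST (recursive descent; '*' binds tighter than '+')
    static final class Parser {
        private final List<String> tokens;
        private int pos = 0;
        Parser(List<String> tokens) { this.tokens = tokens; }

        Expr parseSum() {
            Expr left = parseProduct();
            while (pos < tokens.size() && tokens.get(pos).equals("+")) { pos++; left = new Add(left, parseProduct()); }
            return left;
        }

        Expr parseProduct() {
            Expr left = new Num(Integer.parseInt(tokens.get(pos++)));
            while (pos < tokens.size() && tokens.get(pos).equals("*")) {
                pos++;
                left = new Mul(left, new Num(Integer.parseInt(tokens.get(pos++))));
            }
            return left;
        }
    }

    // Phase 3: AST -> IR (post-order walk emitting stack-machine instructions)
    static void lower(Expr e, List<String> ir) {
        if (e instanceof Num n) ir.add("PUSH " + n.value());
        else if (e instanceof Add a) { lower(a.left(), ir); lower(a.right(), ir); ir.add("ADD"); }
        else if (e instanceof Mul m) { lower(m.left(), ir); lower(m.right(), ir); ir.add("MUL"); }
    }

    static List<String> compile(String src) {
        Expr ast = new Parser(tokenize(src)).parseSum();
        List<String> ir = new ArrayList<>();
        lower(ast, ir);
        return ir;
    }

    // Run-time: execute the final representation on a stack
    static int run(List<String> ir) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        for (String ins : ir) {
            if (ins.startsWith("PUSH ")) {
                stack.push(Integer.parseInt(ins.substring(5)));
            } else {
                int b = stack.pop(), a = stack.pop();
                stack.push(ins.equals("ADD") ? a + b : a * b);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> ir = compile("1 + 2 * 3");
        System.out.println(ir);      // [PUSH 1, PUSH 2, PUSH 3, MUL, ADD]
        System.out.println(run(ir)); // 7
    }
}
```

Each phase has its own input and output type, so each one can be tested in isolation.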
  29. What makes a compiler a proper compiler
    • Not optimization
    • Compilation phases
      • You can have as many as you like
  30. Example: A Configuration File
      1. Read file from (class)path
      2. Unmarshal file into a typed object
      3. Sanitize values
      4. Validate values
      5. Coerce to typed values
  31. Example: Produce a Report
      1. Fetch data from different sources
      2. Discard invalid values
      3. Merge into a single data stream
      4. Compute aggregates (sums, averages, etc.)
      5. Generate a synthesis data structure
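The report phases above can be sketched as one small function per phase, composed in order. All types and data here (`Sale`, `Report`, the two hard-coded sources) are hypothetical. Because each phase is a plain function, its intermediate result can be tested on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReportPhases {
    record Sale(String region, Double amount) {}
    record Report(Map<String, Double> totalByRegion, double grandTotal) {}

    // 1. Fetch data from different sources (hard-coded, hypothetical data)
    static List<List<Sale>> fetch() {
        return List.of(
            List.of(new Sale("north", 10.0), new Sale("south", null)),
            List.of(new Sale("north", 5.0), new Sale("south", 7.5), new Sale(null, 3.0)));
    }

    // 2. Discard invalid values (missing region or amount, negative amount)
    static List<Sale> discardInvalid(List<Sale> sales) {
        return sales.stream()
            .filter(s -> s.region() != null && s.amount() != null && s.amount() >= 0)
            .collect(Collectors.toList());
    }

    // 3. Merge into a single data stream
    static List<Sale> merge(List<List<Sale>> sources) {
        return sources.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    // 4. Compute aggregates (here: total amount per region)
    static Map<String, Double> aggregate(List<Sale> sales) {
        return sales.stream().collect(
            Collectors.groupingBy(Sale::region, TreeMap::new, Collectors.summingDouble(Sale::amount)));
    }

    // 5. Generate the synthesis data structure
    static Report synthesize(Map<String, Double> totals) {
        double grand = totals.values().stream().mapToDouble(Double::doubleValue).sum();
        return new Report(totals, grand);
    }

    static Report build(List<List<Sale>> sources) {
        List<List<Sale>> valid = sources.stream().map(ReportPhases::discardInvalid).collect(Collectors.toList());
        return synthesize(aggregate(merge(valid)));
    }

    public static void main(String[] args) {
        Report r = build(fetch());
        System.out.println(r.totalByRegion() + " total=" + r.grandTotal()); // {north=15.0, south=7.5} total=22.5
    }
}
```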
  32. Example: A Workflow Engine
      1. Read BPMN file
      2. Collect nodes
      3. Collect edges
      4. Prepare for visit/layout
  33. Compilation Phases
    • Better separation of concerns
    • Better testability
      • You can test each intermediate result
    • You can choose when and where each phase gets evaluated
    • More requirements = more phases!
  34. Phase vs. Pass
    • Many phases do not necessarily mean as many passes
    • You could do several phases in one pass
    • Logically, the phases are still distinct
  35. One Pass vs. Multi-Pass
    • Myth: one pass doing many things is better than many passes, each doing one thing
    • One pass:
        for value in config:
          sanitized = sanitize(value)
          validated = validate(sanitized)
          coerced += coerce(validated)
    • Multi-pass:
        for value in config:
          sanitized += sanitize(value)
        for value in sanitized:
          validated += validate(value)
        for value in validated:
          coerced += coerce(value)
  36. It is not: Complexity
    • One pass: the loop body runs n times, and sanitize, validate and coerce cost 1 op each
      (1 op + 1 op + 1 op) × n = 3n
    • Multi-pass: each of the three loops runs n times, so sanitize, validate and coerce cost n ops each
      n + n + n = 3n
    • Both are linear in n: splitting the phases into separate passes does not change the asymptotic cost
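A minimal Java sketch of both loop shapes, assuming trivial `sanitize`/`validate`/`coerce` functions (hypothetical, for illustration only). A static counter records how many times the phase functions are called, so the 3n figure can be checked directly.

```java
import java.util.ArrayList;
import java.util.List;

public class PassCounting {
    static int ops = 0; // number of calls to any phase function

    static String sanitize(String v) { ops++; return v.trim(); }
    static String validate(String v) { ops++; if (v.isEmpty()) throw new IllegalArgumentException("empty value"); return v; }
    static int coerce(String v) { ops++; return Integer.parseInt(v); }

    // One pass: all three phases applied to each element in turn.
    static List<Integer> onePass(List<String> config) {
        List<Integer> coerced = new ArrayList<>();
        for (String value : config) {
            String sanitized = sanitize(value);
            String validated = validate(sanitized);
            coerced.add(coerce(validated));
        }
        return coerced;
    }

    // Multi-pass: one loop per phase; each intermediate list can be inspected and tested.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = new ArrayList<>();
        for (String value : config) sanitized.add(sanitize(value));
        List<String> validated = new ArrayList<>();
        for (String value : sanitized) validated.add(validate(value));
        List<Integer> coerced = new ArrayList<>();
        for (String value : validated) coerced.add(coerce(value));
        return coerced;
    }

    public static void main(String[] args) {
        List<String> config = List.of(" 1", "2 ", " 3 ", "4");
        ops = 0;
        System.out.println(onePass(config) + " ops=" + ops);   // [1, 2, 3, 4] ops=12
        ops = 0;
        System.out.println(multiPass(config) + " ops=" + ops); // [1, 2, 3, 4] ops=12
    }
}
```

Both versions perform the same 3n calls; the multi-pass version only adds memory for the intermediate lists, in exchange for results that can be checked between phases.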
  37. Single-pass is not always possible
    • However, doing everything in one pass may be cumbersome or plain impossible
    • Forward references: an element can be referenced before it is declared
      <process id="Minimal" name="Minimal Process">
        <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
        <endEvent id="_3" name="End">
          <terminateEventDefinition/>
        </endEvent>
        <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
        <scriptTask id="_2" name="Hello">
          <script>System.out.println("Hello World");</script>
        </scriptTask>
        <startEvent id="_1" name="Start"/>
      </process>
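One way to handle forward references is to split the work into two passes, much like the separate node and edge collectors in the workflow code of the talk. The sketch below assumes simplified, hypothetical record types (`Node`, `Flow`, `Edge`) in place of the JAXB-generated BPMN classes: the first pass indexes every node by id, the second resolves each sequence flow against that index, so declaration order no longer matters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForwardRefs {
    // Hypothetical stand-ins for BPMN flow elements.
    interface Element {}
    record Node(String id, String name) implements Element {}
    record Flow(String id, String sourceRef, String targetRef) implements Element {}
    record Edge(Node source, Node target) {}

    static List<Edge> resolve(List<Element> elements) {
        // Pass 1: index every node by id, wherever it appears in the document.
        Map<String, Node> nodes = new LinkedHashMap<>();
        for (Element e : elements) {
            if (e instanceof Node n) nodes.put(n.id(), n);
        }
        // Pass 2: resolve sequence flows against the complete index.
        List<Edge> edges = new ArrayList<>();
        for (Element e : elements) {
            if (e instanceof Flow f) {
                Node source = nodes.get(f.sourceRef());
                Node target = nodes.get(f.targetRef());
                if (source == null || target == null) throw new IllegalStateException("dangling reference in " + f.id());
                edges.add(new Edge(source, target));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Same order as the slide: flows appear before the nodes they reference.
        List<Element> process = List.of(
            new Flow("_1-_2", "_1", "_2"),
            new Node("_3", "End"),
            new Flow("_2-_3", "_2", "_3"),
            new Node("_2", "Hello"),
            new Node("_1", "Start"));
        for (Edge e : resolve(process)) System.out.println(e.source().name() + " -> " + e.target().name());
        // Start -> Hello
        // Hello -> End
    }
}
```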
  38. Workflow Phases: Evaluation (https://github.com/evacchi/ypaat)
      // read and unmarshal the BPMN file
      var resource = getResourceAsStream("/example.bpmn2");
      var tdefs = unmarshall(resource, TDefinitions.class);
      var graphBuilder = new GraphBuilder();
      // collect nodes on the builder
      var nodeCollector = new NodeCollector(graphBuilder);
      nodeCollector.visitFlowElements(tdefs.getFlowElements());
      // collect edges on the builder
      var edgeCollector = new EdgeCollector(graphBuilder);
      edgeCollector.visitFlowElements(tdefs.getFlowElements());
      // prepare graph for visit
      var engineGraph = EngineGraph.of(graphBuilder);
      // “interpret” the graph
      var engine = new Engine(engineGraph);
      engine.eval();
  39. Workflow Phases: Layout (https://github.com/evacchi/ypaat)
      <?xml version="1.0" encoding="UTF-8"?>
      <definitions ...>
        <process id="Minimal" name="Minimal Process">
          <startEvent id="_1" name="Start"/>
          ...
        </process>
        <bpmndi:BPMNDiagram>
          <bpmndi:BPMNPlane bpmnElement="SubProcess">
            <bpmndi:BPMNShape bpmnElement="_1">
              <dc:Bounds x="11" y="30" width="48" height="48"/>
              ...
        </bpmndi:BPMNDiagram>
      </definitions>

      // read and unmarshal the BPMN file
      var resource = getResourceAsStream("/example.bpmn2");
      var tdefs = unmarshall(resource, TDefinitions.class);
      var graphBuilder = new GraphBuilder();
      // collect nodes on the builder
      var nodeCollector = new NodeCollector(graphBuilder);
      nodeCollector.visitFlowElements(tdefs.getFlowElements());
      // collect edges on the builder
      var edgeCollector = new EdgeCollector(graphBuilder);
      edgeCollector.visitFlowElements(tdefs.getFlowElements());
      // extract layout information
      var extractor = new LayoutExtractor();
      extractor.visit(tdefs);
      var index = extractor.index();
      // “compile” into a buffered image
      var canvas = new Canvas(graphBuilder, index);
      var bufferedImage = canvas.eval();
  40. Pattern Matching
      nodeCollector.visit(node)

      def visit(node: TFlowElement) = {
        node match {
          case StartEventNode(...) => ...
          case EndEventNode(...)   => ...
          case ScriptTask(...)     => ...
        }
      }
  41. Pattern Matching
      nodeCollector.visit(node)

      void visit(TFlowElement node) {
        switch (node) {
          case StartEventNode(...) -> ...
          case EndEventNode(...)   -> ...
          case ScriptTask(...)     -> ...
        }
      }
  42. The Poor Man's Alternatives
    • The Visitor pattern:
      interface Visitor {
        void visit(TFlowElement el);
        void visit(TStartEventNode start);
        void visit(TEndEventNode end);
        void visit(TScriptTask task);
      }
      interface Visitable {
        void accept(Visitor v);
      }
    • A chain of instanceof checks:
      if (node instanceof StartEventNode) {
        StartEventNode evt = (StartEventNode) node;
        ...
      } else if (node instanceof EndEventNode) {
        EndEventNode evt = (EndEventNode) node;
        ...
      } else if (node instanceof ScriptTask) {
        ScriptTask evt = (ScriptTask) node;
        ...
      }
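For comparison with the `instanceof` chain, here is a self-contained version of the classic Visitor pattern with double dispatch. The node records and the `Describer` visitor are hypothetical stand-ins for the BPMN element classes. Each `accept` calls `v.visit(this)`, and since `this` has the record's own static type, the compiler selects the matching `visit` overload.

```java
import java.util.List;

public class VisitorSketch {
    interface Visitor<R> {
        R visit(StartEvent e);
        R visit(EndEvent e);
        R visit(ScriptTask t);
    }

    interface Visitable {
        <R> R accept(Visitor<R> v);
    }

    // Hypothetical node types; each one dispatches to its own overload.
    record StartEvent(String name) implements Visitable {
        public <R> R accept(Visitor<R> v) { return v.visit(this); }
    }
    record EndEvent(String name) implements Visitable {
        public <R> R accept(Visitor<R> v) { return v.visit(this); }
    }
    record ScriptTask(String name, String script) implements Visitable {
        public <R> R accept(Visitor<R> v) { return v.visit(this); }
    }

    // One visitor per concern: this one only describes each node.
    static final class Describer implements Visitor<String> {
        public String visit(StartEvent e) { return "start:" + e.name(); }
        public String visit(EndEvent e) { return "end:" + e.name(); }
        public String visit(ScriptTask t) { return "script:" + t.name() + " -> " + t.script(); }
    }

    public static void main(String[] args) {
        List<Visitable> nodes = List.of(
            new StartEvent("Start"),
            new ScriptTask("Hello", "System.out.println(\"Hello World\");"),
            new EndEvent("End"));
        Describer describer = new Describer();
        for (Visitable n : nodes) System.out.println(n.accept(describer));
    }
}
```

Unlike the `instanceof` chain, adding a node type means adding a method to `Visitor`, so every existing visitor stops compiling until it handles the new case.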
  43. Workflow Evaluation
    • Choose a representation suitable for evaluation
    • In our case, for each node we need its outgoing edges, to find the next node to visit
    • The most convenient representation of the graph is adjacency lists:
      $\mathrm{adj}(p) = \{\, q \mid (p, q) \in \mathit{edges} \,\}$
      Map<Node, List<Node>> outgoing;

      var graphBuilder = new GraphBuilder();
      ...
      // prepare graph for visit
      var engineGraph = EngineGraph.of(graphBuilder);
      // decorate with an evaluator
      var engine = new Engine(engineGraph);
      // evaluate the graph by visiting once more
      engine.eval();
  44. Workflow Evaluation
    • The most convenient representation of the graph is adjacency lists
      $\mathrm{adj}(p) = \{\, q \mid (p, q) \in \mathit{edges} \,\}$
      Map<Node, List<Node>> outgoing
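Building the adjacency lists is a separate phase that runs once, before evaluation. A minimal sketch, assuming node ids are plain strings and edges are `(source, target)` pairs (both hypothetical simplifications of the slide's `Node` type):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Adjacency {
    record Edge(String source, String target) {}

    // adj(p) = { q | (p, q) ∈ edges }, computed once before evaluation.
    static Map<String, List<String>> outgoing(List<String> nodes, List<Edge> edges) {
        Map<String, List<String>> adj = new LinkedHashMap<>();
        for (String n : nodes) adj.put(n, new ArrayList<>()); // nodes without successors get an empty list
        for (Edge e : edges) adj.get(e.source()).add(e.target()); // assumes every source is a known node
        return adj;
    }

    public static void main(String[] args) {
        var adj = outgoing(List.of("_1", "_2", "_3"), List.of(new Edge("_1", "_2"), new Edge("_2", "_3")));
        System.out.println(adj); // {_1=[_2], _2=[_3], _3=[]}
    }
}
```

With this map in place, finding the successors of a node during evaluation is a single lookup instead of a scan over all edges.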
  45. Evaluation (https://github.com/evacchi/ypaat)
      class Engine implements GraphVisitor {
        void visit(StartEventNode node) {
          logger.info("Process '{}' started.", graph.name());
          graph.outgoing(node).forEach(this::visit);
        }
        void visit(EndEventNode node) {
          logger.info("Process ended.");
          // no outgoing edges
        }
        void visit(ScriptTaskNode node) {
          logger.info("Evaluating script task: {}", node.element().getScript().getContent());
          graph.outgoing(node).forEach(this::visit);
        }
        ...
      }
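The evaluation loop of the `Engine` can be approximated in a self-contained way as below. This is a sketch under stated assumptions: the node records are hypothetical, log lines are collected in a list instead of going through a logger, and the traversal is a plain depth-first walk with no cycle detection, so it only suits acyclic workflows such as the minimal process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EngineSketch {
    // Hypothetical node kinds mirroring StartEventNode / ScriptTaskNode / EndEventNode.
    interface Node { String id(); }
    record StartEventNode(String id) implements Node {}
    record ScriptTaskNode(String id, String script) implements Node {}
    record EndEventNode(String id) implements Node {}

    private final Map<Node, List<Node>> outgoing; // adjacency lists, prepared before evaluation
    final List<String> log = new ArrayList<>();

    EngineSketch(Map<Node, List<Node>> outgoing) { this.outgoing = outgoing; }

    // Act on the node according to its kind, then follow its outgoing edges.
    void visit(Node node) {
        if (node instanceof StartEventNode) {
            log.add("Process started.");
        } else if (node instanceof ScriptTaskNode t) {
            log.add("Evaluating script task: " + t.script());
        } else if (node instanceof EndEventNode) {
            log.add("Process ended.");
            return; // no outgoing edges
        }
        outgoing.getOrDefault(node, List.of()).forEach(this::visit);
    }

    public static void main(String[] args) {
        var start = new StartEventNode("_1");
        var hello = new ScriptTaskNode("_2", "System.out.println(\"Hello World\");");
        var end = new EndEventNode("_3");
        var engine = new EngineSketch(Map.of(start, List.of(hello), hello, List.of(end)));
        engine.visit(start);
        engine.log.forEach(System.out::println);
    }
}
```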
  46. Workflow Layout (https://github.com/evacchi/ypaat)
    • In this case, for each node and edge, we need to get its shape and position
    • No particular ordering is required
      • e.g. first render edges, then shapes
      <?xml version="1.0" encoding="UTF-8"?>
      <definitions ...>
        <process id="Minimal" name="Minimal Process">
          <startEvent id="_1" name="Start"/>
          ...
        </process>
        <bpmndi:BPMNDiagram>
          <bpmndi:BPMNPlane bpmnElement="SubProcess">
            <bpmndi:BPMNShape bpmnElement="_1">
              <dc:Bounds x="11" y="30" width="48" height="48"/>
              ...
        </bpmndi:BPMNDiagram>
      </definitions>

      var canvas = new Canvas(graph, index);
      var bufferedImage = canvas.eval();

      void eval() {
        graph.edges().forEach(this::draw);
        graph.nodes().forEach(this::visit);
      }
  47. Layout
      class Canvas implements GraphVisitor {
        void draw(Edge edge) {
          var pts = index.edge(edge.id());
          setStroke(Color.BLACK);
          // draw one segment between each pair of consecutive waypoints
          var left = pts.get(0);
          for (int i = 1; i < pts.size(); i++) {
            var right = pts.get(i);
            drawLine(left.x, left.y, right.x, right.y);
            left = right;
          }
        }
        void visit(StartEventNode node) {
          var shape = shapeOf(node);
          setStroke(Color.BLACK);
          setFill(Color.GREEN);
          drawEllipse(shape.x, shape.y, shape.width, shape.height);
          drawLabel(node.element().getName());
        }
        ...
      }
    [Rendered diagram: Start → Hello → End]
  48. The Killer App
    • Move pre-processing out of program run-time
    • Generate code
    • Run-time then effectively consists only of pure processing
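As a minimal illustration of "generate code", the sketch below turns an already-resolved, linear sequence of script tasks into Java source text at build time. The `Task` record, the `generate` method and the `MinimalProcess` class name are hypothetical; a real generator would also have to handle branching, process variables and compiling the emitted source.

```java
import java.util.List;

public class WorkflowCodegen {
    record Task(String name, String script) {}

    // Build time: walk the ordered tasks once and emit straight-line Java source.
    static String generate(String className, List<Task> tasks) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static void main(String[] args) {\n");
        for (Task t : tasks) {
            src.append("        // ").append(t.name()).append('\n'); // keep the task name as a comment
            src.append("        ").append(t.script()).append('\n');  // inline the task body verbatim
        }
        src.append("    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("MinimalProcess",
            List.of(new Task("Hello", "System.out.println(\"Hello World\");"))));
    }
}
```

The emitted `MinimalProcess.main` just prints `Hello World`: reading the BPMN file, building the graph and visiting it have all moved to build time.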
  49. Compiler-like workflows
    • At least two classes of problems can be solved with compiler-like workflows:
      • Boot-time optimization problems
      • Data transformation problems
  50. Example: Application Wiring
    • You are building an immutable Dockerized microservice
    • Do you really need all that run-time reflection?
    • Do you really need run-time dependency injection?
      public class Example {
        private final Animal animal;
        @Inject
        public Example(Animal animal) { this.animal = animal; }
        public Animal animal() { return animal; }
      }
      public interface Animal {}
      @InjectCandidate
      public class Dog implements Animal {}
  51. All these things make your startup slow!
    • "But it's done only once!"
      • Never is better than once
    • "But it's flexible"
      • Ask yourself when you last changed dependencies, startup config or the classpath at run-time
      • If it was recently, ask yourself what price you pay for that flexibility
  52. Example: A Quick DI Framework (https://github.com/evacchi/reflection-vs-codegen)
      public class Example {
        private final Animal animal;
        @Inject
        public Example(Animal animal) { this.animal = animal; }
        public Animal animal() { return animal; }
      }
      public interface Animal {}
      @InjectCandidate
      public class Dog implements Animal {}
  53. Using the Binder (https://github.com/evacchi/reflection-vs-codegen)
      Binder binder = new Binder();
      binder.scan();
      Example ex = binder.createInstance(Example.class);
      Animal animal = ex.animal();
      Objects.requireNonNull(animal);
      assert animal instanceof Dog;
  54. A Reflective Binder (https://github.com/evacchi/reflection-vs-codegen)
      public class Binder {
        public Binder scan() {
          Reflections reflections = new Reflections();
          reflections.getTypesAnnotatedWith(InjectCandidate.class)
              .forEach(t -> bindings.put(interfaceOf(t), constructorOf(t)));
          return this;
        }
        public <T> T createInstance(Class<? extends T> t) {
          return (T) Arrays.stream(t.getDeclaredConstructors())
              .filter(c -> c.getAnnotation(Inject.class) != null)
              .peek(c -> c.setAccessible(true))
              .map(this::createInstance)
              .findFirst().get();
        }
        ...
      }
    • At run-time, complexity is easy to miss: this loop gets executed at each instance creation
  55. Binder.scan()
      public void scan() {
        Reflections reflections = new Reflections();
        // resolve injection candidates
        reflections.getTypesAnnotatedWith(InjectCandidate.class);
        // resolve injected constructors
        reflections.getConstructorsAnnotatedWith(Inject.class);
        // collect candidates
        reflections.forEach(this::collect);
        // resolve mappings
        resolveMappings();
      }
  56. DI: Annotation Processor (https://github.com/evacchi/reflection-vs-codegen)
    • The processor is triggered by the Java compiler for the annotations it claims
      Bindings bindings = processInjectionCandidates(
          env.getElementsAnnotatedWith(InjectCandidate.class));
      processInjectionSites(
          env.getElementsAnnotatedWith(Inject.class), bindings);
      generateJavaSources(bindings);
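To contrast the two approaches without external libraries, the sketch below wires the slides' `Example`/`Animal`/`Dog` classes in two ways. `ReflectiveBinder` is a simplified, hypothetical take on the reflective `Binder` (explicit `bind` calls replace classpath scanning, and a locally declared `Inject` annotation replaces the real one), and `GeneratedFactory` is hand-written to show the kind of code an annotation processor could emit; it is not the output of the processor in the linked repository.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class DiSketch {
    // Hypothetical annotation standing in for the one on the slides.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Inject {}

    interface Animal {}
    static class Dog implements Animal { public Dog() {} }
    static class Example {
        private final Animal animal;
        @Inject public Example(Animal animal) { this.animal = animal; }
        Animal animal() { return animal; }
    }

    // Run-time wiring: every createInstance call looks up bindings and inspects constructors.
    static final class ReflectiveBinder {
        private final Map<Class<?>, Class<?>> bindings = new HashMap<>();

        <T> ReflectiveBinder bind(Class<T> iface, Class<? extends T> impl) { bindings.put(iface, impl); return this; }

        <T> T createInstance(Class<T> type) throws ReflectiveOperationException {
            Class<?> concrete = bindings.getOrDefault(type, type);
            Constructor<?>[] ctors = concrete.getDeclaredConstructors();
            Constructor<?> ctor = ctors[0];
            for (Constructor<?> c : ctors) {
                if (c.isAnnotationPresent(Inject.class)) ctor = c; // prefer the @Inject constructor
            }
            Class<?>[] params = ctor.getParameterTypes();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++) args[i] = createInstance(params[i]); // resolve dependencies recursively
            return type.cast(ctor.newInstance(args));
        }
    }

    // What build-time generation might produce instead: plain constructor calls, no reflection.
    static final class GeneratedFactory {
        static Example createExample() { return new Example(new Dog()); }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Example viaReflection = new ReflectiveBinder().bind(Animal.class, Dog.class).createInstance(Example.class);
        Example viaCodegen = GeneratedFactory.createExample();
        System.out.println(viaReflection.animal().getClass().getSimpleName()); // Dog
        System.out.println(viaCodegen.animal().getClass().getSimpleName());    // Dog
    }
}
```

Both paths produce the same object graph; the generated one simply has no lookup or constructor inspection left to do at run-time.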
  58. Example
      % time java io.github.evacchi.Reflective
      6.94s user 0.29s system 259% cpu 2.785 total
      % time java io.github.evacchi.Codegen
      0.08s user 0.01s system 111% cpu 0.087 total
      % time ./io.github.evacchi.codegen
      ./io.github.evacchi.codegen  0.00s user 0.00s system 86% cpu 0.003 total
  59. AI and Automation Platform
    • Drools rule engine
    • jBPM workflow platform
    • OptaPlanner constraint solver
  60. Drools and jBPM
      rule R1 when
        // constraints
        $r : Result()
        $p : Person( age >= 18 )
      then
        // consequence
        $r.setValue( $p.getName() + " can drink");
      end
  61. Drools DRL
      rule R1 when
        // constraints
        $r : Result()
        $p : Person( age >= 18 )
      then
        // consequence
        $r.setValue( $p.getName() + " can drink");
      end
    • The same rule, written as Java code:
      var r = declarationOf(Result.class, "$r");
      var p = declarationOf(Person.class, "$p");
      var rule = rule("com.example", "R1").build(
          pattern(r),
          pattern(p)
              .expr("e", person -> person.getAge() >= 18,
                  alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1, Person::getAge, 18),
                  reactOn("age")),
          on(p, r).execute(
              ($p, $r) -> $r.setValue($p.getName() + " can drink")));
  62. jBPM
      RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
      factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
      factory.variable("item", new ObjectDataType("java.lang.String"));
      factory.name("orderItems");
      factory.packageName("com.myspace.demo");
      factory.dynamic(false);
      factory.version("1.0");
      factory.visibility("Private");
      factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
      org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
      startNode1.name("Start");
      startNode1.done();
      org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
      actionNode2.name("Show order details");
      actionNode2.action(kcontext -> {
  63. Take Aways
    • Process in phases
    • Do more in the pre-processing phase (compile-time)
    • Do less during the processing phase (run-time)
    • In other words, separate what you can do once from what you have to do repeatedly
    • Move all or some of your phases to compile-time
  64. Resources
    • Full source code: https://github.com/evacchi/ypaat
    • Your Program as a Transpiler (part I): Improving Application Performance by Applying Compiler Design
      • http://bit.ly/ypaat-performance
      • https://github.com/evacchi/reflection-vs-codegen
    • Other resources
      • Kogito: https://github.com/kiegroup/kogito-examples
      • Drools Blog: http://blog.athico.com
      • Crafting Interpreters: http://craftinginterpreters.com
      • Quarkus.io
    Edoardo Vacchi @evacchi
  65. Q&A