
Your Program as a Transpiler: Applying Compiler Design to Everyday Programming

Many languages “transpile” into other languages, but compilers are still often seen as arcane pieces of software that only a master of the dark arts could write. Yet at the end of the day, both are programs that translate code from one programming language to another.

So what makes a transpiler simple and a compiler hard? What can we learn from these complex pieces of software? And are they really that complex?

The lessons we can learn from language implementation design patterns are within everyone's reach: not only do they apply to daily programming problems, they are also key to understanding the foundations of exciting new technologies such as the GraalVM project and the Quarkus stack. In our experience on the Drools and jBPM projects, we have come across many opportunities to apply programming language development techniques in a broader context. In this talk, we will see some of these examples.

Edoardo Vacchi

May 07, 2019



Transcript

  1. About Me • Edoardo Vacchi @evacchi • Research @ University of Milan • Research @ UniCredit R&D • Drools and jBPM Team @ Red Hat
  2. Motivation • My first task at Red Hat: marshalling backend for jBPM • Data model mapping • From XML tree model to graph representation • Apparently boring, but challenging in its own way
  3. Motivation • Language implementation is often seen as a dark art • But some design patterns are simple at their core • Best practices can be applied to everyday programming
  4. Motivation (cont'd) • Learning about language implementation gives you a different angle on many problems • It will lead you to a better understanding of how GraalVM and Quarkus do their magic
  5. Goals • Programs often have a pre-processing phase where you prepare for execution • Then there is the actual execution phase • Learn to recognize and structure the pre-processing phase
  6. Transpilers vs. Compilers • Compiler: translates code written in a language (source code) into code written in a target language (object code). The target language may be at a lower level of abstraction • Transpiler: translates code written in a language into code written in another language at the same level of abstraction (source-to-source translator).
  7. Are transpilers simpler than compilers? • “Lower-level languages are complex” • They are not: if anything, they're simple • “Syntactic sugar is not a higher level of abstraction” • It is: a concise construct is expanded at compile time • “Proper compilers do low-level optimizations” • You are thinking of optimizing compilers.
  8. The distinction is moot • It is pretty easy to write a crappy compiler, call it a transpiler and feel at peace with yourself • Writing a good transpiler is no different from (and no harder than) writing a good compiler • So, how do you write a good compiler?
  9. Compiler-like workflows • At least two classes of problems can be solved with compiler-like workflows • Boot-time optimization problems • Data transformation problems
  11. Function Orchestration • Problem: there is no standard* way to describe function orchestration yet (* Yes, I know about https://github.com/cncf/wg-serverless)
  12. Solution: Roll your own YAML format

        process:
          elements:
            - start: &_1
                name: Start
            - function: &_2
                name: Hello
            - end: &_3
                name: End
            - edge:
                source: *_1
                target: *_2
            - edge:
                source: *_2
                target: *_3

    Congratulations! Enjoy attending conferences worldwide.
  13. Alternate Solution • You are describing a workflow • There is a perfectly fine standard: BPMN (Business Process Model and Notation)
  14. The same workflow in BPMN (https://github.com/evacchi/ypaat)

        <process id="Minimal" name="Minimal Process">
          <startEvent id="_1" name="Start"/>
          <scriptTask id="_2" name="Hello">
            <script>System.out.println("Hello World");</script>
          </scriptTask>
          <endEvent id="_3" name="End">
            <terminateEventDefinition/>
          </endEvent>
          <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
          <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
        </process>
  15. Unless you trick them. Downside: nobody will invite you to their conference to talk about BPM.
  16. Bonuses for choosing BPMN • Standard XML-based serialization format • that's not the bonus • There is standard tooling to validate and parse • that is a bonus • Moreover: • Different types of nodes included in the main spec • Optional spec for laying out nodes on a diagram
  17. Goals • Read a BPMN workflow • Execute that workflow • Visualize that workflow
  18. What's a compilation phase? • It's your setup phase • You do it only once, before the actual processing begins
  19. Configuring the application • Problem: use config values from a file, environment variables, etc. • Do you validate config values each time you read them? • Compile-time: • Read config values into a validated data structure • Run-time: • Use validated config values
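A minimal Java sketch of this split, assuming the raw values arrive as a `Map<String, String>` (for instance from `System.getenv()` or a parsed properties file). The `AppConfig` record and the `http.host`/`http.port` keys are hypothetical; the point is that sanitizing, validating and coercing happen once, in `AppConfig.from`, and run-time code only ever sees typed, already-checked fields.

```java
import java.util.Map;

// Hypothetical configuration record: built once, then read many times.
record AppConfig(String host, int port) {

    // "Compile-time": sanitize, validate and coerce raw strings exactly once.
    static AppConfig from(Map<String, String> raw) {
        String host = raw.getOrDefault("http.host", "localhost").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("http.host must not be blank");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("http.port", "8080").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("http.port must be an integer", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("http.port out of range: " + port);
        }
        return new AppConfig(host, port);
    }
}

class ConfigDemo {
    // "Run-time": no re-validation, the values are already typed and checked.
    static String endpoint(AppConfig config) {
        return "http://" + config.host() + ":" + config.port();
    }

    public static void main(String[] args) {
        AppConfig config = AppConfig.from(Map.of("http.host", " example.org ", "http.port", "9090"));
        System.out.println(endpoint(config)); // prints http://example.org:9090
    }
}
```

Invalid input fails fast at startup with an `IllegalArgumentException`, instead of surfacing later wherever the value happens to be read.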
  20. Data Transformation Pipeline • Problem: manipulate data to produce analytics • Compile-time: • Define transformations (e.g. map, filter, etc.) • Decide the execution plan (local, distributed, etc.) • Run-time: • Evaluate the execution plan
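The same split for a data pipeline, as a hypothetical sketch: `Plan`, `MapStep` and `FilterStep` are made-up names, not the API of any particular framework. Calling `map`/`filter` only records a description of the work; `evalLocal` is one possible evaluator, and a distributed evaluator (not shown) could consume the same list of steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// A hypothetical, minimal execution plan over integers.
final class Plan {
    // Each step is a description of work, not the work itself.
    interface Step {}
    record MapStep(IntUnaryOperator fn) implements Step {}
    record FilterStep(IntPredicate keep) implements Step {}

    private final List<Step> steps = new ArrayList<>();

    // Compile-time: define transformations; nothing is executed yet.
    Plan map(IntUnaryOperator fn) { steps.add(new MapStep(fn)); return this; }
    Plan filter(IntPredicate keep) { steps.add(new FilterStep(keep)); return this; }

    int size() { return steps.size(); }

    // Run-time: evaluate the plan locally, pushing each element through all steps.
    List<Integer> evalLocal(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        outer:
        for (int value : input) {
            for (Step step : steps) {
                if (step instanceof MapStep m) {
                    value = m.fn().applyAsInt(value);
                } else if (step instanceof FilterStep f && !f.keep().test(value)) {
                    continue outer; // dropped: skip the remaining steps
                }
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        Plan plan = new Plan().map(x -> x * 2).filter(x -> x > 4); // nothing runs here
        System.out.println(plan.evalLocal(List.of(1, 2, 3, 4)));    // prints [6, 8]
    }
}
```

Because the plan is plain data, it can be inspected, reordered or shipped elsewhere before anything runs.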
  21. Example: BPMN Execution • Problem: execute a workflow description • Compile-time: • Read BPMN into a visitable structure (StartEvent) • Run-time: • Visit the structure • For each node, execute tasks
  22. Example: BPMN Visualization • Problem: visualize a workflow diagram • Compile-time: • Read BPMN into a graph • Run-time: • For each node and edge, draw on a canvas
  23. Read BPMN into a Data Structure • The full XML Schema Definition* is automatically mapped onto Java classes, validated against schema constraints

        TDefinitions tdefs = JAXB.unmarshal(resource, TDefinitions.class);

    * Yes kids, we have working schemas
  24. BPMN: From Tree to Graph • No ordering is imposed on the description • Forward references:

        <process id="Minimal" name="Minimal Process">
          <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
          <endEvent id="_3" name="End">
            <terminateEventDefinition/>
          </endEvent>
          <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
          <scriptTask id="_2" name="Hello">
            <script>System.out.println("Hello World");</script>
          </scriptTask>
          <startEvent id="_1" name="Start"/>
        </process>
  25. Separate Layout Definition (https://github.com/evacchi/ypaat)

        <definitions>
          <process id="Minimal" name="Minimal Process">
            <startEvent id="_1" name="Start"/>
            <scriptTask id="_2" name="Hello">
              <script>System.out.println("Hello World");</script>
            </scriptTask>
            <endEvent id="_3" name="End">
              <terminateEventDefinition/>
            </endEvent>
            <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
            <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
          </process>
          <bpmndi:BPMNDiagram>
            <bpmndi:BPMNPlane bpmnElement="SubProcess">
              <bpmndi:BPMNShape bpmnElement="_1">
                <dc:Bounds x="11" y="30" width="48" height="48"/>
              </bpmndi:BPMNShape>
              <bpmndi:BPMNShape bpmnElement="_2">
                <dc:Bounds x="193" y="30" width="80" height="48"/>
              </bpmndi:BPMNShape>
              <bpmndi:BPMNShape bpmnElement="_3">
                <dc:Bounds x="396" y="30" width="48" height="48"/>
              </bpmndi:BPMNShape>
              <bpmndi:BPMNEdge bpmnElement="_1-_2">
                <di:waypoint x="35" y="50"/>
                <di:waypoint x="229" y="50"/>
              </bpmndi:BPMNEdge>
              <bpmndi:BPMNEdge bpmnElement="_2-_3">
                <di:waypoint x="229" y="50"/>
                <di:waypoint x="441" y="50"/>
              </bpmndi:BPMNEdge>
            </bpmndi:BPMNPlane>
          </bpmndi:BPMNDiagram>
        </definitions>
  27. Compiling a programming language • You start from a text representation of a program • The text representation is fed to a parser • The parser returns a parse tree • The parse tree is refined into an abstract syntax tree (AST) • The AST is further refined through intermediate representations (IRs) • ...until the final representation is produced
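To make the phases concrete, here is a toy pipeline for integer expressions such as `1 + 2 * 3`, written for illustration only. It collapses the parse-tree and AST steps into one (the recursive-descent parser builds the AST directly) and ends with an evaluator instead of code generation; each phase is a separate method whose output can be tested on its own.

```java
import java.util.ArrayList;
import java.util.List;

// A toy pipeline for expressions like "1 + 2 * 3" (non-negative integers, + and * only).
final class TinyCompiler {
    // AST nodes: the refined representation produced by the parser.
    interface Expr {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // Phase 1: text -> tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Phase 2: tokens -> AST (recursive descent; * binds tighter than +).
    private final List<String> tokens;
    private int pos = 0;

    private TinyCompiler(List<String> tokens) { this.tokens = tokens; }

    static Expr parse(List<String> tokens) {
        TinyCompiler parser = new TinyCompiler(tokens);
        Expr expr = parser.sum();
        if (parser.pos != tokens.size()) throw new IllegalArgumentException("Trailing tokens");
        return expr;
    }

    private Expr sum() {
        Expr left = product();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;
            left = new Add(left, product());
        }
        return left;
    }

    private Expr product() {
        Expr left = number();
        while (pos < tokens.size() && tokens.get(pos).equals("*")) {
            pos++;
            left = new Mul(left, number());
        }
        return left;
    }

    private Expr number() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Expected a number");
        // An operator in number position makes parseInt throw NumberFormatException.
        return new Num(Integer.parseInt(tokens.get(pos++)));
    }

    // Phase 3: walk the AST (a real compiler would emit code here instead).
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        throw new IllegalStateException("Unknown node: " + e);
    }

    public static void main(String[] args) {
        Expr ast = parse(tokenize("1 + 2 * 3"));
        System.out.println(eval(ast)); // prints 7
    }
}
```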
  29. What makes a compiler a proper compiler • Not optimization • Compilation phases • You can have as many as you like
  30. Example. A Configuration File • 1 Read file from (class)path • 2 Unmarshal file into a typed object • 3 Sanitize values • 4 Validate values • 5 Coerce to typed values
  31. Example. Produce a Report • 1 Fetch data from different sources • 2 Discard invalid values • 3 Merge into a single data stream • 4 Compute aggregates (sums, averages, etc.) • 5 Generate a synthesis data structure
  32. Example. A Workflow Engine • 1 Read BPMN file • 2 Collect nodes • 3 Collect edges • 4 Prepare for visit/layout
  33. Compilation Phases • Better separation of concerns • Better testability • You can test each intermediate result • You can choose when and where each phase gets evaluated • More requirements = more phases!
  34. Phase vs. Pass • Many phases do not necessarily mean as many passes • You could do several phases in one pass • Logically, the phases are still distinct
  35. One Pass vs. Multi-Pass

    One pass:

        for value in config:
            sanitized = sanitize(value)
            validated = validate(sanitized)
            coerced += coerce(validated)

    Multiple passes:

        for value in config:
            sanitized += sanitize(value)
        for value in sanitized:
            validated += validate(value)
        for value in validated:
            coerced += coerce(value)

    Myth: one pass doing many things is better than many passes, each doing one thing.
  36. It is not: Complexity • One pass: n iterations, each doing sanitize = 1 op, validate = 1 op, coerce = 1 op, so (1 op + 1 op + 1 op) × n = 3n • Multiple passes: sanitize n times = n ops, validate n times = n ops, coerce n times = n ops, so n + n + n = 3n • Same complexity either way
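A runnable version of this argument, with hypothetical `sanitize`, `validate` and `coerce` steps that each bump an operation counter. Both strategies perform 3n operations and produce the same result; the multi-pass version additionally keeps the intermediate lists in memory, which is the real trade-off to weigh.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanitize/validate/coerce phases over config strings, with an op counter.
final class Passes {
    static int ops = 0;

    static String sanitize(String v) { ops++; return v.trim(); }

    static String validate(String v) {
        ops++;
        if (!v.matches("-?\\d+")) throw new IllegalArgumentException("Not a number: " + v);
        return v;
    }

    static int coerce(String v) { ops++; return Integer.parseInt(v); }

    // One pass: all three phases applied to each value inside a single loop.
    static List<Integer> onePass(List<String> config) {
        List<Integer> coerced = new ArrayList<>();
        for (String value : config) {
            coerced.add(coerce(validate(sanitize(value))));
        }
        return coerced;
    }

    // Multi-pass: one loop per phase, each producing an intermediate list.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = new ArrayList<>();
        for (String value : config) sanitized.add(sanitize(value));
        List<String> validated = new ArrayList<>();
        for (String value : sanitized) validated.add(validate(value));
        List<Integer> coerced = new ArrayList<>();
        for (String value : validated) coerced.add(coerce(value));
        return coerced;
    }

    public static void main(String[] args) {
        List<String> config = List.of(" 1", "2 ", " 3 ");
        ops = 0;
        List<Integer> a = onePass(config);
        int onePassOps = ops;
        ops = 0;
        List<Integer> b = multiPass(config);
        int multiPassOps = ops;
        // prints [1, 2, 3] 9 / [1, 2, 3] 9  (3n operations with n = 3, both ways)
        System.out.println(a + " " + onePassOps + " / " + b + " " + multiPassOps);
    }
}
```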
  37. Single-pass is not always possible • However, doing one pass may be cumbersome or plain impossible, e.g. with forward references:

        <process id="Minimal" name="Minimal Process">
          <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
          <endEvent id="_3" name="End">
            <terminateEventDefinition/>
          </endEvent>
          <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
          <scriptTask id="_2" name="Hello">
            <script>System.out.println("Hello World");</script>
          </scriptTask>
          <startEvent id="_1" name="Start"/>
        </process>
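One way to handle these forward references is two passes over the same list: first index every node by `id`, then resolve each `sequenceFlow`'s `sourceRef`/`targetRef` against that index. The record types below are simplified, hypothetical stand-ins for the unmarshalled BPMN classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for unmarshalled BPMN flow elements.
interface FlowElement { String id(); }
record Node(String id, String name) implements FlowElement {}
record SequenceFlow(String id, String sourceRef, String targetRef) implements FlowElement {}
record Edge(Node source, Node target) {}

final class ForwardRefs {
    // Pass 1: collect every node by id, wherever it appears in the document.
    static Map<String, Node> collectNodes(List<FlowElement> elements) {
        Map<String, Node> nodes = new LinkedHashMap<>();
        for (FlowElement e : elements) {
            if (e instanceof Node n) nodes.put(n.id(), n);
        }
        return nodes;
    }

    // Pass 2: resolve edge references; every node id is known by now.
    static List<Edge> collectEdges(List<FlowElement> elements, Map<String, Node> nodes) {
        List<Edge> edges = new ArrayList<>();
        for (FlowElement e : elements) {
            if (e instanceof SequenceFlow f) {
                Node source = nodes.get(f.sourceRef());
                Node target = nodes.get(f.targetRef());
                if (source == null || target == null) {
                    throw new IllegalStateException("Dangling reference in " + f.id());
                }
                edges.add(new Edge(source, target));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Same order as the XML on this slide: edges appear before their nodes.
        List<FlowElement> elements = List.of(
                new SequenceFlow("_1-_2", "_1", "_2"),
                new Node("_3", "End"),
                new SequenceFlow("_2-_3", "_2", "_3"),
                new Node("_2", "Hello"),
                new Node("_1", "Start"));
        List<Edge> edges = collectEdges(elements, collectNodes(elements));
        // prints "Start -> Hello" then "Hello -> End"
        edges.forEach(e -> System.out.println(e.source().name() + " -> " + e.target().name()));
    }
}
```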
  38. Workflow Phases: Evaluation (https://github.com/evacchi/ypaat)

        // 1. read the file and unmarshal it
        var resource = getResourceAsStream("/example.bpmn2");
        var tdefs = unmarshall(resource, TDefinitions.class);
        var graphBuilder = new GraphBuilder();
        // 2. collect nodes on the builder
        var nodeCollector = new NodeCollector(graphBuilder);
        nodeCollector.visitFlowElements(tdefs.getFlowElements());
        // 3. collect edges on the builder
        var edgeCollector = new EdgeCollector(graphBuilder);
        edgeCollector.visitFlowElements(tdefs.getFlowElements());
        // 4. prepare graph for visit
        var engineGraph = EngineGraph.of(graphBuilder);
        // 5. "interpret" the graph
        var engine = new Engine(engineGraph);
        engine.eval();
  39. Workflow Phases: Layout (https://github.com/evacchi/ypaat)

        <?xml version="1.0" encoding="UTF-8"?>
        <definitions ...>
          <process id="Minimal" name="Minimal Process">
            <startEvent id="_1" name="Start"/>
            ...
          </process>
          <bpmndi:BPMNDiagram>
            <bpmndi:BPMNPlane bpmnElement="SubProcess">
              <bpmndi:BPMNShape bpmnElement="_1">
                <dc:Bounds x="11" y="30" width="48" height="48"/>
              ...
          </bpmndi:BPMNDiagram>
        </definitions>

        // 1. read the file and unmarshal it
        var resource = getResourceAsStream("/example.bpmn2");
        var tdefs = unmarshall(resource, TDefinitions.class);
        var graphBuilder = new GraphBuilder();
        // 2. collect nodes on the builder
        var nodeCollector = new NodeCollector(graphBuilder);
        nodeCollector.visitFlowElements(tdefs.getFlowElements());
        // 3. collect edges on the builder
        var edgeCollector = new EdgeCollector(graphBuilder);
        edgeCollector.visitFlowElements(tdefs.getFlowElements());
        // 4. extract layout information
        var extractor = new LayoutExtractor();
        extractor.visit(tdefs);
        var index = extractor.index();
        // 5. "compile" into a buffered image
        var canvas = new Canvas(graphBuilder, index);
        var bufferedImage = canvas.eval();
  40. Pattern Matching

        nodeCollector.visit(node)

        def visit(node: TFlowElement) = {
          node match {
            case StartEventNode(...) => ...
            case EndEventNode(...)   => ...
            case ScriptTask(...)     => ...
          }
        }
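Java has since gained pattern matching of its own. Assuming Java 21 or later (record patterns and pattern matching for `switch`), the Scala match above can be written directly over a sealed hierarchy; the record names here are illustrative, not the BPMN or jBPM types.

```java
// Requires Java 21+: sealed interfaces, record patterns, pattern matching for switch.
sealed interface FlowNode permits StartEvent, EndEvent, ScriptTask {}
record StartEvent(String id, String name) implements FlowNode {}
record EndEvent(String id, String name) implements FlowNode {}
record ScriptTask(String id, String name, String script) implements FlowNode {}

final class NodeDescriber {
    // Exhaustive: the compiler rejects this switch if a permitted subtype is missing.
    static String describe(FlowNode node) {
        return switch (node) {
            case StartEvent(String id, String name) -> "start " + id + " (" + name + ")";
            case EndEvent(String id, String name) -> "end " + id + " (" + name + ")";
            case ScriptTask(String id, String name, String script) -> "script " + id + ": " + script;
        };
    }

    public static void main(String[] args) {
        // prints: script _2: System.out.println("Hello World");
        System.out.println(describe(new ScriptTask("_2", "Hello", "System.out.println(\"Hello World\");")));
    }
}
```

Compared with the visitor and `instanceof` alternatives, the sealed hierarchy gives a compile-time exhaustiveness check with no `accept` boilerplate.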
  41. The Poor Man's Alternatives

        interface Visitor {
            void visit(TFlowElement el);
            void visit(TStartEventNode start);
            void visit(TEndEventNode end);
            void visit(TScriptTask task);
        }
        interface Visitable {
            void accept(Visitor v);
        }

        if (node instanceof StartEventNode) {
            StartEventNode evt = (StartEventNode) node;
            ...
        } else if (node instanceof EndEventNode) {
            EndEventNode evt = (EndEventNode) node;
            ...
        } else if (node instanceof ScriptTask) {
            ScriptTask evt = (ScriptTask) node;
            ...
        }
  42. Visitor Pattern (https://github.com/evacchi/ypaat)

        class NodeCollector implements Visitor {
            void visit(TStartEventNode start) {
                graphBuilder.add(new StartEventNode(start.getId(), start));
            }
            void visit(TEndEvent evt) {
                graphBuilder.add(new EndEventNode(evt.getId(), evt));
            }
            void visit(TScriptTask task) {
                graphBuilder.add(new ScriptTaskNode(task.getId(), task));
            }
        }
        class EdgeCollector implements Visitor {
            void visit(TSequenceFlow seq) {
                graphBuilder.addEdge(seq.getId(), seq.getSourceRef(), seq.getTargetRef());
            }
        }
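A self-contained version of the double-dispatch visitor, with simplified, hypothetical element types in place of the JAXB-generated ones. Default no-op methods let each collector override only the cases it cares about, and running the two collectors in sequence gives the node pass and the edge pass from the workflow-engine phases.

```java
import java.util.ArrayList;
import java.util.List;

// Default no-op methods: concrete visitors override only what they need.
interface Visitor {
    default void visit(StartEvent e) {}
    default void visit(EndEvent e) {}
    default void visit(SequenceFlow f) {}
}

interface Visitable { void accept(Visitor v); }

// Hypothetical element types; each accept() picks the right visit() overload statically.
record StartEvent(String id) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}
record EndEvent(String id) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}
record SequenceFlow(String id, String sourceRef, String targetRef) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}

// A toy graph builder: node ids and (source, target) pairs.
final class GraphBuilder {
    final List<String> nodes = new ArrayList<>();
    final List<String[]> edges = new ArrayList<>();
}

final class NodeCollector implements Visitor {
    private final GraphBuilder graph;
    NodeCollector(GraphBuilder graph) { this.graph = graph; }
    @Override public void visit(StartEvent e) { graph.nodes.add(e.id()); }
    @Override public void visit(EndEvent e) { graph.nodes.add(e.id()); }
}

final class EdgeCollector implements Visitor {
    private final GraphBuilder graph;
    EdgeCollector(GraphBuilder graph) { this.graph = graph; }
    @Override public void visit(SequenceFlow f) {
        graph.edges.add(new String[] {f.sourceRef(), f.targetRef()});
    }
}

final class VisitorDemo {
    static GraphBuilder build(List<Visitable> elements) {
        GraphBuilder graph = new GraphBuilder();
        Visitor nodes = new NodeCollector(graph);
        Visitor edges = new EdgeCollector(graph);
        elements.forEach(el -> el.accept(nodes)); // pass 1: nodes
        elements.forEach(el -> el.accept(edges)); // pass 2: edges
        return graph;
    }

    public static void main(String[] args) {
        GraphBuilder g = build(List.of(
                new SequenceFlow("_1-_3", "_1", "_3"), new StartEvent("_1"), new EndEvent("_3")));
        System.out.println(g.nodes + " " + g.edges.size()); // prints [_1, _3] 1
    }
}
```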
  43. Workflow Evaluation • Choose a representation suitable for evaluation • In our case, for each node, we need its outgoing edges to find the next node to visit • The most convenient representation of the graph is adjacency lists • adj(p) = { q | (p, q) ∈ edges }

        var graphBuilder = new GraphBuilder();
        ...
        // prepare graph for visit
        var engineGraph = EngineGraph.of(graphBuilder);
        // decorate with an evaluator
        var engine = new Engine(engineGraph);
        // evaluate the graph by visiting once more
        engine.eval();

        Map<Node, List<Node>> outgoing;
  44. Workflow Evaluation • The most convenient representation of the graph is adjacency lists • adj(p) = { q | (p, q) ∈ edges } • Map<Node, List<Node>> outgoing
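Building the adjacency lists is a single pass over the edges, a direct transcription of adj(p) = { q | (p, q) ∈ edges }. In this sketch node ids stand in for `Node` objects, and nodes without outgoing edges get an empty list so lookups never return `null`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Node ids stand in for Node objects; edges are (source, target) pairs.
final class Adjacency {
    record Edge(String source, String target) {}

    // adj(p) = { q | (p, q) in edges }, computed in one pass over the edge list.
    static Map<String, List<String>> outgoing(List<String> nodes, List<Edge> edges) {
        Map<String, List<String>> adj = new LinkedHashMap<>();
        for (String n : nodes) adj.put(n, new ArrayList<>()); // sinks keep an empty list
        for (Edge e : edges) {
            adj.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
        }
        return adj;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = outgoing(List.of("_1", "_2", "_3"),
                List.of(new Edge("_1", "_2"), new Edge("_2", "_3")));
        System.out.println(adj); // prints {_1=[_2], _2=[_3], _3=[]}
    }
}
```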
  45. Evaluation (https://github.com/evacchi/ypaat)

        class Engine implements GraphVisitor {
            void visit(StartEventNode node) {
                logger.info("Process '{}' started.", graph.name());
                graph.outgoing(node).forEach(this::visit);
            }
            void visit(EndEventNode node) {
                logger.info("Process ended.");
                // no outgoing edges
            }
            void visit(ScriptTaskNode node) {
                logger.info("Evaluating script task: {}", node.element().getScript().getContent());
                graph.outgoing(node).forEach(this::visit);
            }
            ...
        }
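A minimal, runnable variant of this `Engine`, assuming the graph is already available as a `Map<Node, List<Node>>`. It uses a single `visit(Node)` method with `instanceof` patterns (Java 16+) for dispatch and records log lines in a list instead of using a logger; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical node types; the graph is given as adjacency lists.
interface Node { String id(); }
record StartEventNode(String id) implements Node {}
record EndEventNode(String id) implements Node {}
record ScriptTaskNode(String id, String script) implements Node {}

final class Engine {
    private final String processName;
    private final Map<Node, List<Node>> outgoing;
    final List<String> log = new ArrayList<>();

    Engine(String processName, Map<Node, List<Node>> outgoing) {
        this.processName = processName;
        this.outgoing = outgoing;
    }

    // Dispatch on the node type, then follow the outgoing edges (depth-first).
    void visit(Node node) {
        if (node instanceof StartEventNode) {
            log.add("Process '" + processName + "' started.");
        } else if (node instanceof EndEventNode) {
            log.add("Process ended.");
            return; // no outgoing edges
        } else if (node instanceof ScriptTaskNode task) {
            // A real engine would execute the script; this sketch only records it.
            log.add("Evaluating script task: " + task.script());
        }
        outgoing.getOrDefault(node, List.of()).forEach(this::visit);
    }

    public static void main(String[] args) {
        Node start = new StartEventNode("_1");
        Node hello = new ScriptTaskNode("_2", "System.out.println(\"Hello World\");");
        Node end = new EndEventNode("_3");
        Engine engine = new Engine("Minimal", Map.of(start, List.of(hello), hello, List.of(end)));
        engine.visit(start);
        engine.log.forEach(System.out::println);
    }
}
```

The walk is plain recursion with no cycle detection, so a workflow with loops would need a visited set or an explicit work queue.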
  46. Workflow Layout • In this case, for each node and edge, we need the shape and position • No particular ordering is required • e.g. first render edges, then shapes (https://github.com/evacchi/ypaat)

        <?xml version="1.0" encoding="UTF-8"?>
        <definitions ...>
          <process id="Minimal" name="Minimal Process">
            <startEvent id="_1" name="Start"/>
            ...
          </process>
          <bpmndi:BPMNDiagram>
            <bpmndi:BPMNPlane bpmnElement="SubProcess">
              <bpmndi:BPMNShape bpmnElement="_1">
                <dc:Bounds x="11" y="30" width="48" height="48"/>
              ...
          </bpmndi:BPMNDiagram>
        </definitions>

        var canvas = new Canvas(graph, index);
        var bufferedImage = canvas.eval();

        void eval() {
            graph.edges().forEach(this::draw);
            graph.nodes().forEach(this::visit);
        }
  47. Layout

        class Canvas implements GraphVisitor {
            void draw(Edge edge) {
                var pts = index.edge(edge.id());
                setStroke(Color.BLACK);
                var left = pts.get(0);
                for (int i = 1; i < pts.size(); i++) {
                    var right = pts.get(i);
                    drawLine(left.x, left.y, right.x, right.y);
                    left = right;
                }
            }
            void visit(StartEventNode node) {
                var shape = shapeOf(node);
                setStroke(Color.BLACK);
                setFill(Color.GREEN);
                drawEllipse(shape.x, shape.y, shape.width, shape.height);
                drawLabel(node.element().getName());
            }
            ...
        }
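The edge-drawing loop in `Canvas.draw` does not depend on any graphics API: it turns a list of waypoints into consecutive line segments. This hypothetical helper isolates that step so it can be tested without a canvas; unlike the loop above, it returns no segments for an empty list instead of failing on `pts.get(0)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical point/segment types; a renderer would call drawLine once per segment.
final class Polyline {
    record Point(int x, int y) {}
    record Segment(Point from, Point to) {}

    // Same loop as Canvas.draw: connect each waypoint to the next one.
    static List<Segment> segments(List<Point> waypoints) {
        List<Segment> out = new ArrayList<>();
        if (waypoints.isEmpty()) return out;
        Point left = waypoints.get(0);
        for (int i = 1; i < waypoints.size(); i++) {
            Point right = waypoints.get(i);
            out.add(new Segment(left, right));
            left = right;
        }
        return out;
    }

    public static void main(String[] args) {
        // Waypoints of edge "_1-_2" from the BPMNDiagram section of the example.
        List<Segment> segs = segments(List.of(new Point(35, 50), new Point(229, 50)));
        System.out.println(segs.size() + " segment(s)"); // prints 1 segment(s)
    }
}
```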
  48. The Killer App • Move pre-processing out of program run-time • Generate code • Run-time effectively consists only of pure processing
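A toy illustration of "generate code": a build-time step that turns a linear workflow into Java source, so that at run time there is nothing left to parse or look up. `WorkflowCodegen` and `Step` are hypothetical; a real setup would write the source to a file and compile it as part of the build (not shown), similar in spirit to the generated Drools and jBPM code on slides 55 and 56.

```java
import java.util.List;

// A toy code generator: turns a linear workflow into Java source at build time.
final class WorkflowCodegen {
    record Step(String kind, String name, String script) {}

    static String generate(String className, List<Step> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        src.append("    public static void run() {\n");
        for (Step s : steps) {
            src.append("        // ").append(s.kind()).append(": ").append(s.name()).append('\n');
            if (s.kind().equals("script")) {
                // The script body is inlined verbatim: nothing is left to interpret at run time.
                src.append("        ").append(s.script()).append('\n');
            }
        }
        src.append("    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("MinimalProcess", List.of(
                new Step("start", "Start", null),
                new Step("script", "Hello", "System.out.println(\"Hello World\");"),
                new Step("end", "End", null))));
    }
}
```

Running `main` prints a `MinimalProcess` class whose `run()` method contains the `Hello World` statement between the start and end comments.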
  49. AI and Automation Platform • Drools rule engine • jBPM workflow platform • OptaPlanner constraint solver
  50. The Submarine Initiative • “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Edsger W. Dijkstra)
  51. GraalVM: “One VM to Rule Them All” • Polyglot VM with cross-language JIT • Java bytecode and JVM languages • Dynamic languages (Truffle API) • Native binary compilation (SubstrateVM)
  53. Native Image: Restrictions • Native binary compilation • Restriction: “closed-world assumption” • No dynamic code loading • You must declare the classes you want to reflect upon
  54. Drools and jBPM

        rule R1 when
            // constraints
            $r : Result()
            $p : Person( age >= 18 )
        then
            // consequence
            $r.setValue( $p.getName() + " can drink");
        end
  55. Drools DRL

        rule R1 when
            // constraints
            $r : Result()
            $p : Person( age >= 18 )
        then
            // consequence
            $r.setValue( $p.getName() + " can drink");
        end

    The same rule in the Java executable model:

        var r = declarationOf(Result.class, "$r");
        var p = declarationOf(Person.class, "$p");
        var rule = rule("com.example", "R1").build(
                pattern(r),
                pattern(p)
                        .expr("e", person -> person.getAge() >= 18,
                                alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1, Person::getAge, 18),
                                reactOn("age")),
                on(p, r).execute(
                        ($p, $r) -> $r.setValue($p.getName() + " can drink")));
  56. jBPM

        RuleFlowProcessFactory factory =
                RuleFlowProcessFactory.createProcess("demo.orderItems");
        factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
        factory.variable("item", new ObjectDataType("java.lang.String"));
        factory.name("orderItems");
        factory.packageName("com.myspace.demo");
        factory.dynamic(false);
        factory.version("1.0");
        factory.visibility("Private");
        factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
        org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
        startNode1.name("Start");
        startNode1.done();
        org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
        actionNode2.name("Show order details");
        actionNode2.action(kcontext -> {
  57. Takeaways • Process in phases • Do more in the pre-processing phase (compile-time) • Do less during the processing phase (run-time) • In other words, separate what you can do once from what you have to do repeatedly • Move all or some of your phases to compile-time
  58. Resources • Full source code: https://github.com/evacchi/ypaat • Your Program as a Transpiler (part I): Improving Application Performance by Applying Compiler Design, http://bit.ly/ypaat-performance • Other resources: • Submarine https://github.com/kiegroup/submarine-examples • Drools Blog http://blog.athico.com • Crafting Interpreters http://craftinginterpreters.com • GraalVM.org • Quarkus.io • Edoardo Vacchi @evacchi
  59. Q&A