Your Program as a Transpiler: Applying Compiler Design to Everyday Programming

Slide 1

Slide 1 text

Your Program as a Transpiler: Applying Compiler Design to Everyday Programming

Slide 2

Slide 2 text

About Me • Edoardo Vacchi @evacchi • Research @ University of Milan • Research @ UniCredit R&D • Drools and jBPM Team @ Red Hat

Slide 3

Slide 3 text

Motivation

Slide 4

Slide 4 text

Motivation • My first task at Red Hat: the marshalling backend for jBPM • Data model mapping • From an XML tree model to a graph representation • Apparently boring, but challenging in its own way

Slide 5

Slide 5 text

Motivation • Language implementation is often seen as a dark art • But some design patterns are simple at their core • Best practices can be applied to everyday programming

Slide 6

Slide 6 text

Motivation (cont'd) • Learning about language implementation will give you a different angle to deal with many problems • It will lead you to a better understanding of how GraalVM and Quarkus do their magic

Slide 7

Slide 7 text

Goals • Programs often have a pre-processing phase where they prepare for execution • Then comes the actual execution phase • Learn to recognize and structure the pre-processing phase

Slide 8

Slide 8 text

Transpilers

Slide 9

Slide 9 text

Transpilers vs. Compilers • Compiler: translates code written in a language (source code) into code written in a target language (object code). The target language may be at a lower level of abstraction • Transpiler: translates code written in a language into code written in another language at the same level of abstraction (Source-to-Source Translator).

Slide 10

Slide 10 text

Are transpilers simpler than compilers? • "Lower-level languages are complex" • They are not: if anything, they're simple • "Syntactic sugar is not a higher level of abstraction" • It is: a concise construct is expanded at compile time • "Proper compilers do low-level optimizations" • You are thinking of optimizing compilers.

Slide 11

Slide 11 text

The distinction is moot • It is pretty easy to write a crappy compiler, call it a transpiler, and feel at peace with yourself • Writing a good transpiler is no different from, and no harder than, writing a good compiler • So, how do you write a good compiler?

Slide 12

Slide 12 text

Your Program as a Compiler: Applying Compiler Design to Everyday Programming

Slide 13

Slide 13 text

Compiler-like workflows • At least two classes of problems can be solved with compiler-like workflows • Boot-time optimization problems • Data transformation problems

Slide 15

Slide 15 text

Running Example: Function Orchestration

Slide 16

Slide 16 text

Function Orchestration • You are building an immutable, Dockerized serverless function [Diagram: functions f and g]

Slide 17

Slide 17 text

Function Orchestration • Problem • No standard* way to describe function orchestration yet [Diagram: functions f and g]

* Yes, I know about https://github.com/cncf/wg-serverless

Slide 18

Slide 18 text

Solution: Roll your own YAML format

    process:
      elements:
        - start: &_1
            name: Start
        - function: &_2
            name: Hello
        - end: &_3
            name: End
        - edge:
            source: *_1
            target: *_2
        - edge:
            source: *_2
            target: *_3

[Diagram: Start → Hello → End]

Congratulations! Enjoy attending conferences worldwide.

Slide 19

Slide 19 text

Alternate Solution • You are describing a workflow • There is a perfectly fine standard: BPMN (Business Process Model and Notation) [Diagram: Task 1 → Task 2]

Slide 20

Slide 20 text

[Figure: BPMN XML for the Start → Hello → End process, containing the script System.out.println("Hello World");] https://github.com/evacchi/ypaat

Slide 21

Slide 21 text

[Diagram: Start → Hello → End] Downside: Nobody will invite you to their conference to talk about BPM.

Slide 22

Slide 22 text

[Diagram: Start → Hello → End] Downside: Nobody will invite you to their conference to talk about BPM. Unless you trick them.

Slide 23

Slide 23 text

Bonuses for choosing BPMN • Standard XML-based serialization format (that's not the bonus) • Standard tooling to validate and parse it (that is the bonus) • Moreover: • Different types of nodes are included in the main spec • An optional spec for laying out nodes on a diagram

Slide 24

Slide 24 text

Goals • Read a BPMN workflow • Execute that workflow • Visualize that workflow

Slide 25

Slide 25 text

Step 1: Recognize your compilation phase

Slide 26

Slide 26 text

What's a compilation phase? • It's your setup phase. • You do it only once before the actual processing begins

Slide 27

Slide 27 text

Configuring the application • Problem. Use config values from a file/env vars/etc • Do you validate config values each time you read them? • Compile-time: • Read config values into a validated data structure • Run-time: • Use validated config values
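A minimal sketch of this split in Java, assuming a hypothetical AppConfig record with just two settings (host, port) and a Map<String, String> standing in for the file or environment variables: all parsing and validation happen once in compile, and run-time code only receives an already-validated value.

```java
import java.util.Map;

// Hypothetical validated configuration: obtained through compile(), it is known to be valid.
record AppConfig(String host, int port) {

    // "Compile-time": parse and validate the raw values exactly once.
    static AppConfig compile(Map<String, String> raw) {
        String host = raw.getOrDefault("host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("missing 'host'");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("port", "").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("'port' is not a number", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("'port' out of range: " + port);
        }
        return new AppConfig(host, port);
    }
}

class ConfigDemo {
    // "Run-time": use the validated values directly, without re-checking them.
    static String endpoint(AppConfig config) {
        return config.host() + ":" + config.port();
    }
}
```

In this sketch, going through compile is a convention rather than something the type enforces; the point is that validation happens in one place, before processing starts.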

Slide 28

Slide 28 text

Data Transformation Pipeline • Problem. Manipulate data to produce analytics • Compile-time: • Define transformations (e.g. map, filter, etc. operations) • Decide the execution plan (local, distributed, etc.) • Run-time: • Evaluate the execution plan
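The same split for a data pipeline, sketched with a hypothetical Pipeline class over integer arrays: map and filter only record transformations, compile decides the execution plan (here merely sequential or parallel, standing in for local vs. distributed) and folds the steps into one function, and evaluation is then a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

// Hypothetical pipeline definition: transformations are recorded, not executed.
class Pipeline {
    private final List<Function<IntStream, IntStream>> steps = new ArrayList<>();

    Pipeline map(IntUnaryOperator f) {
        steps.add(s -> s.map(f));
        return this;
    }

    Pipeline filter(IntPredicate p) {
        steps.add(s -> s.filter(p));
        return this;
    }

    // "Compile-time": choose the execution plan once and fold all steps into one function.
    Function<int[], int[]> compile(boolean parallel) {
        Function<IntStream, IntStream> plan = Function.identity();
        for (var step : steps) {
            plan = plan.andThen(step);
        }
        final Function<IntStream, IntStream> compiled = plan;
        // "Run-time": only evaluate the prepared plan on the incoming data.
        return data -> {
            IntStream source = IntStream.of(data);
            if (parallel) {
                source = source.parallel();
            }
            return compiled.apply(source).toArray();
        };
    }
}
```

For example, new Pipeline().filter(x -> x % 2 == 0).map(x -> x * 10).compile(false) yields a function that turns {1, 2, 3, 4} into {20, 40}, and it can be applied to as many inputs as needed without rebuilding the plan.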

Slide 29

Slide 29 text

Example: BPMN Execution • Problem. Execute a workflow description. • Compile-time: • Read BPMN into a visitable structure (StartEvent) • Run-time: • Visit the structure • For each node, execute tasks

Slide 30

Slide 30 text

Example: BPMN Visualization • Problem. Visualize a workflow diagram. • Compile-time: • Read BPMN into a graph • Run-time: • For each node and edge, draw on a canvas

Slide 31

Slide 31 text

Read BPMN into a Data Structure • The full XML Schema Definition* is automatically mapped onto Java classes, and documents can be validated against the schema constraints

    TDefinitions tdefs = JAXB.unmarshal(resource, TDefinitions.class);

* Yes kids, we have working schemas

Slide 32

Slide 32 text

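BPMN: From Tree to Graph • No ordering is imposed on the description: an element may reference another element that is declared later (forward references) [Figure: BPMN XML of the Hello process, with forward references highlighted]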

Slide 33

Slide 33 text

Separate Layout Definition [Figure: BPMN XML in which the diagram layout is defined separately from the process elements] https://github.com/evacchi/ypaat

Slide 35

Slide 35 text

Step 2: Work like a compiler

Slide 36

Slide 36 text

Compiling a programming language • You start from a text representation of a program • The text representation is fed to a parser • The parser returns a parse tree • The parse tree is refined into an abstract syntax tree (AST) • The AST is further refined through intermediate representations (IRs) • until the final representation is produced
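To make the phases concrete, here is a toy pipeline for a hypothetical language of integer sums and products (such as 1 + 2 * 3): lex turns text into tokens, parse turns tokens into an AST of records, and eval walks the AST. It is a sketch assuming Java 17 or later (records, pattern matching for instanceof); a real compiler would add further IRs and emit code instead of evaluating.

```java
import java.util.ArrayList;
import java.util.List;

class ToyCompiler {

    // Phase 1: text -> tokens
    static List<String> lex(String text) {
        var tokens = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Phase 2: tokens -> abstract syntax tree
    interface Expr {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    static Expr parse(List<String> tokens) {
        int[] pos = {0};
        Expr e = parseSum(tokens, pos);
        if (pos[0] != tokens.size()) throw new IllegalArgumentException("trailing tokens");
        return e;
    }

    // sum := product ('+' product)*
    private static Expr parseSum(List<String> t, int[] pos) {
        Expr left = parseProduct(t, pos);
        while (pos[0] < t.size() && t.get(pos[0]).equals("+")) {
            pos[0]++;
            left = new Add(left, parseProduct(t, pos));
        }
        return left;
    }

    // product := number ('*' number)*  (so '*' binds tighter than '+')
    private static Expr parseProduct(List<String> t, int[] pos) {
        Expr left = parseNumber(t, pos);
        while (pos[0] < t.size() && t.get(pos[0]).equals("*")) {
            pos[0]++;
            left = new Mul(left, parseNumber(t, pos));
        }
        return left;
    }

    private static Expr parseNumber(List<String> t, int[] pos) {
        if (pos[0] >= t.size()) throw new IllegalArgumentException("unexpected end of input");
        return new Num(Integer.parseInt(t.get(pos[0]++)));
    }

    // Phase 3: AST -> result
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        throw new IllegalStateException("unknown node: " + e);
    }
}
```

Each intermediate result is a plain value: eval(parse(lex("1 + 2 * 3"))) returns 7, and the AST itself can be compared against new Add(new Num(1), new Mul(new Num(2), new Num(3))) in a test.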

Slide 37

Slide 37 text

Slide 38

Slide 38 text

What makes a compiler a proper compiler • Not optimization • Compilation Phases • You can have as many as you like

Slide 39

Slide 39 text

Example. A Configuration File • 1. Read file from (class)path • 2. Unmarshal file into a typed object • 3. Sanitize values • 4. Validate values • 5. Coerce to typed values

Slide 40

Slide 40 text

Example. Produce a Report • 1. Fetch data from different sources • 2. Discard invalid values • 3. Merge into a single data stream • 4. Compute aggregates (sums, averages, etc.) • 5. Generate a synthesis data structure
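A sketch of those phases as separate Java methods, assuming hypothetical "region,amount" text rows as input and one already-fetched list per source (phase 1 is left to the caller). Because every phase returns a plain value, each intermediate result can be tested on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ReportPhases {
    record Sale(String region, int amount) {}
    record Report(Map<String, Integer> totals, int grandTotal) {}

    // Phase 2: discard invalid values (wrong shape or non-numeric amount)
    static List<Sale> discardInvalid(List<String> rows) {
        return rows.stream()
                .map(row -> row.split(","))
                .filter(parts -> parts.length == 2 && parts[1].trim().matches("-?\\d+"))
                .map(parts -> new Sale(parts[0].trim(), Integer.parseInt(parts[1].trim())))
                .toList();
    }

    // Phase 3: merge all sources into a single stream of sales
    static List<Sale> merge(List<List<Sale>> sources) {
        return sources.stream().flatMap(List::stream).toList();
    }

    // Phase 4: compute aggregates (here: the sum per region, sorted by region)
    static Map<String, Integer> totals(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Sale::region, TreeMap::new, Collectors.summingInt(Sale::amount)));
    }

    // Phase 5: generate the synthesis data structure
    static Report synthesize(Map<String, Integer> totals) {
        int grand = totals.values().stream().mapToInt(Integer::intValue).sum();
        return new Report(totals, grand);
    }

    // Wires the phases together; phase 1 (fetching) is assumed done by the caller.
    static Report run(List<List<String>> fetched) {
        var cleaned = fetched.stream().map(ReportPhases::discardInvalid).toList();
        return synthesize(totals(merge(cleaned)));
    }
}
```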

Slide 41

Slide 41 text

Example. A Workflow Engine • 1. Read BPMN file • 2. Collect nodes • 3. Collect edges • 4. Prepare for visit/layout [Diagram: Start → Hello → End]

Slide 42

Slide 42 text

Compilation Phases • Better separation of concerns • Better testability • You can test each intermediate result • You can choose when and where each phase gets evaluated • More Requirements = More Phases!

Slide 43

Slide 43 text

Phase vs Pass • Many phases do not necessarily mean as many passes • You could do several phases in one pass • Logically phases are still distinct

Slide 44

Slide 44 text

One Pass vs. Multi-Pass

Myth: one pass doing many things is better than doing many passes, each doing one thing

    # one pass
    for value in config:
        sanitized = sanitize(value)
        validated = validate(sanitized)
        coerced += coerce(validated)

    # multi-pass
    for value in config:
        sanitized += sanitize(value)
    for value in sanitized:
        validated += validate(value)
    for value in validated:
        coerced += coerce(value)

Slide 45

Slide 45 text

It is not: both approaches have the same complexity • One pass: the loop runs n times, and each iteration does sanitize (1 op) + validate (1 op) + coerce (1 op), so (1 + 1 + 1) × n = 3n ops • Multi-pass: three loops of n iterations each, so sanitize = n ops, validate = n ops, coerce = n ops, and n + n + n = 3n ops
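The operation count can be checked directly. This sketch uses hypothetical sanitize, validate and coerce steps that each increment a counter; for n inputs, both versions perform exactly 3n operations and return the same result.

```java
import java.util.ArrayList;
import java.util.List;

class PassCounter {
    int ops = 0;

    // Hypothetical single-value steps; each call counts as one operation.
    String sanitize(String v) { ops++; return v.trim(); }

    String validate(String v) {
        ops++;
        if (v.isEmpty()) throw new IllegalArgumentException("empty value");
        return v;
    }

    Integer coerce(String v) { ops++; return Integer.valueOf(v); }

    // One pass: all three steps for each element.
    List<Integer> onePass(List<String> config) {
        var coerced = new ArrayList<Integer>();
        for (var value : config) {
            coerced.add(coerce(validate(sanitize(value))));
        }
        return coerced;
    }

    // Multi-pass: one loop per step, with intermediate lists in between.
    List<Integer> multiPass(List<String> config) {
        var sanitized = new ArrayList<String>();
        for (var value : config) sanitized.add(sanitize(value));
        var validated = new ArrayList<String>();
        for (var value : sanitized) validated.add(validate(value));
        var coerced = new ArrayList<Integer>();
        for (var value : validated) coerced.add(coerce(value));
        return coerced;
    }
}
```

The multi-pass version does allocate two intermediate lists, so the trade-off is memory, not the number of operations.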

Slide 46

Slide 46 text

Single-pass is not always possible • Doing everything in one pass may be cumbersome or plainly impossible, e.g., when there are forward references [Figure: BPMN XML of the Hello process, with forward references highlighted]
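A two-pass sketch of forward-reference resolution, assuming a hypothetical flat list of node and edge declarations (loosely modeled on BPMN flow elements, where a sequence flow may point to a node declared further down): the first pass indexes nodes by id, the second resolves every edge. A single pass would fail on the first edge whose target had not been seen yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForwardRefs {
    interface Element { String id(); }
    record NodeDecl(String id, String name) implements Element {}
    record EdgeDecl(String id, String source, String target) implements Element {}

    // Resolved edge: holds the node declarations, not just their ids.
    record Edge(NodeDecl source, NodeDecl target) {}

    static List<Edge> resolve(List<Element> elements) {
        // Pass 1: collect every node by id, wherever it appears in the list.
        Map<String, NodeDecl> nodes = new LinkedHashMap<>();
        for (var e : elements) {
            if (e instanceof NodeDecl n) nodes.put(n.id(), n);
        }
        // Pass 2: every reference can now be resolved.
        var edges = new ArrayList<Edge>();
        for (var e : elements) {
            if (e instanceof EdgeDecl d) {
                var source = nodes.get(d.source());
                var target = nodes.get(d.target());
                if (source == null || target == null) {
                    throw new IllegalArgumentException("dangling reference in edge " + d.id());
                }
                edges.add(new Edge(source, target));
            }
        }
        return edges;
    }
}
```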

Slide 47

Slide 47 text

Workflow Phases: Evaluation

    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);

    var graphBuilder = new GraphBuilder();

    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());

    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());

    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);

    // "interpret" the graph
    var engine = new Engine(engineGraph);
    engine.eval();

https://github.com/evacchi/ypaat

Slide 48

Slide 48 text

Workflow Phases: Layout

    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);

    var graphBuilder = new GraphBuilder();

    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());

    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());

    // extract layout information
    var extractor = new LayoutExtractor();
    extractor.visit(tdefs);
    var index = extractor.index();

    // "compile" into buffered image
    var canvas = new Canvas(graphBuilder, index);
    var bufferedImage = canvas.eval();

https://github.com/evacchi/ypaat

Slide 49

Slide 49 text

Visitors

Slide 50

Slide 50 text

Data Structures

    TFlowElement
      |
      +---- StartEventNode
      |
      +---- EndEventNode
      |
      `---- ScriptTask

Slide 51

Slide 51 text

Pattern Matching

    nodeCollector.visit(node)

    def visit(node: TFlowElement) = {
      node match {
        case StartEventNode(...) => ...
        case EndEventNode(...) => ...
        case ScriptTask(...) => ...
      }
    }

Slide 52

Slide 52 text

The Poor Man's Alternatives

    interface Visitor {
        void visit(TFlowElement el);
        void visit(TStartEventNode start);
        void visit(TEndEventNode end);
        void visit(TScriptTask task);
    }
    interface Visitable {
        void accept(Visitor v);
    }

    if (node instanceof StartEventNode) {
        StartEventNode evt = (StartEventNode) node;
        ...
    } else if (node instanceof EndEventNode) {
        EndEventNode evt = (EndEventNode) node;
        ...
    } else if (node instanceof ScriptTask) {
        ScriptTask evt = (ScriptTask) node;
        ...
    }

Slide 53

Slide 53 text

Visitor Pattern

    class NodeCollector implements Visitor {
        public void visit(TStartEventNode evt) {
            graphBuilder.add(new StartEventNode(evt.getId(), evt));
        }
        public void visit(TEndEvent evt) {
            graphBuilder.add(new EndEventNode(evt.getId(), evt));
        }
        public void visit(TScriptTask task) {
            graphBuilder.add(new ScriptTaskNode(task.getId(), task));
        }
    }

    class EdgeCollector implements Visitor {
        public void visit(TSequenceFlow seq) {
            graphBuilder.addEdge(seq.getId(), seq.getSourceRef(), seq.getTargetRef());
        }
    }

https://github.com/evacchi/ypaat
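The slide shows only the visitor side. For double dispatch to work, each element also needs an accept method that calls back the right overload. A self-contained sketch with simplified, hypothetical element types (not the JAXB-generated BPMN classes):

```java
import java.util.ArrayList;
import java.util.List;

class VisitorDemo {
    interface Visitor {
        void visit(StartEvent start);
        void visit(EndEvent end);
        void visit(ScriptTask task);
    }

    interface Visitable {
        void accept(Visitor v);
    }

    // Each accept() calls v.visit(this): the static type of `this` selects the overload.
    record StartEvent(String id) implements Visitable {
        public void accept(Visitor v) { v.visit(this); }
    }

    record EndEvent(String id) implements Visitable {
        public void accept(Visitor v) { v.visit(this); }
    }

    record ScriptTask(String id, String script) implements Visitable {
        public void accept(Visitor v) { v.visit(this); }
    }

    // Like the slide's NodeCollector, but recording node descriptions in a list.
    static class NodeCollector implements Visitor {
        final List<String> nodes = new ArrayList<>();

        public void visit(StartEvent start) { nodes.add("start:" + start.id()); }
        public void visit(EndEvent end) { nodes.add("end:" + end.id()); }
        public void visit(ScriptTask task) { nodes.add("script:" + task.id()); }

        void visitFlowElements(List<? extends Visitable> elements) {
            elements.forEach(e -> e.accept(this));
        }
    }
}
```

Adding a new phase (say, an edge collector or a layout extractor) means writing a new Visitor, without touching the element types.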

Slide 54

Slide 54 text

Step 3 Choose a run-time representation

Slide 55

Slide 55 text

Workflow Evaluation • Choose a representation suitable for evaluation • In our case, for each node, we need to get the outgoing edges, which lead to the next node to visit • The most convenient representation of the graph is an adjacency list • adj(p) = { q | (p, q) ∈ edges }

    var graphBuilder = new GraphBuilder();
    ...
    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);
    // decorate with an evaluator
    var engine = new Engine(engineGraph);
    // evaluate the graph by visiting once more
    engine.eval();

    Map<Node, List<Node>> outgoing;

Slide 56

Slide 56 text

Workflow Evaluation • The most convenient representation of the graph is an adjacency list • adj: p ↦ { q | (p, q) ∈ edges } • Map<Node, List<Node>> outgoing
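A sketch of building the adjacency list once, at compile time, assuming nodes are identified by plain String ids (the slides use Node objects): every edge (p, q) adds q to adj(p), and nodes without outgoing edges map to an empty list, so run-time lookups are a single map access.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Adjacency {
    record Edge(String source, String target) {}

    // Builds adj(p) = { q | (p, q) ∈ edges } for every node mentioned by an edge.
    static Map<String, List<String>> outgoing(List<Edge> edges) {
        Map<String, List<String>> adj = new LinkedHashMap<>();
        for (var e : edges) {
            adj.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
            adj.computeIfAbsent(e.target(), k -> new ArrayList<>()); // sinks get an empty list
        }
        return adj;
    }
}
```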

Slide 57

Slide 57 text

Evaluation

    class Engine implements GraphVisitor {
        void visit(StartEventNode node) {
            logger.info("Process '{}' started.", graph.name());
            graph.outgoing(node).forEach(this::visit);
        }
        void visit(EndEventNode node) {
            logger.info("Process ended.");
            // no outgoing edges
        }
        void visit(ScriptTaskNode node) {
            logger.info("Evaluating script task: {}",
                    node.element().getScript().getContent());
            graph.outgoing(node).forEach(this::visit);
        }
        ...
    }

https://github.com/evacchi/ypaat
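A self-contained miniature of the same idea, with hypothetical record types standing in for the slide's node classes and a list of strings standing in for the logger. Instead of overloads, it dispatches with instanceof patterns (Java 16+), then follows the precomputed outgoing map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MiniEngine {
    interface Node { String id(); }
    record StartEventNode(String id) implements Node {}
    record EndEventNode(String id) implements Node {}
    record ScriptTaskNode(String id, String script) implements Node {}

    private final Map<Node, List<Node>> outgoing; // prepared at "compile time"
    final List<String> log = new ArrayList<>();   // stands in for a logger

    MiniEngine(Map<Node, List<Node>> outgoing) {
        this.outgoing = outgoing;
    }

    // "Run time": act on the current node, then visit its successors.
    void visit(Node node) {
        if (node instanceof StartEventNode) {
            log.add("Process started.");
        } else if (node instanceof ScriptTaskNode t) {
            log.add("Evaluating script task: " + t.script());
        } else if (node instanceof EndEventNode) {
            log.add("Process ended.");
            return; // no outgoing edges
        }
        outgoing.getOrDefault(node, List.of()).forEach(this::visit);
    }
}
```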

Slide 58

Slide 58 text

Workflow Layout • In this case, for each node and edge, we need to get the shape and position • No particular ordering is required • e.g., first render all edges, then all shapes

    ...
    var canvas = new Canvas(graph, index);
    var bufferedImage = canvas.eval();

    void eval() {
        graph.edges().forEach(this::draw);
        graph.nodes().forEach(this::visit);
    }

https://github.com/evacchi/ypaat

Slide 59

Slide 59 text

Layout

    class Canvas implements GraphVisitor {
        void draw(Edge edge) {
            var pts = index.edge(edge.id());
            setStroke(Color.BLACK);
            var left = pts.get(0);
            for (int i = 1; i < pts.size(); i++) {
                var right = pts.get(i);
                drawLine(left.x, left.y, right.x, right.y);
                left = right;
            }
        }
        void visit(StartEventNode node) {
            var shape = shapeOf(node);
            setStroke(Color.BLACK);
            setFill(Color.GREEN);
            drawEllipse(shape.x, shape.y, shape.width, shape.height);
            drawLabel(node.element().getName());
        }
        ...
    }

[Figure: the rendered Start → Hello → End diagram]
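The edge-drawing loop can be tried with plain java.awt alone. In this sketch the waypoints are given directly as java.awt.Point values (rather than looked up in a layout index, as on the slide), and the target is a BufferedImage, which also works in headless environments.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.List;

class PolylineSketch {
    // Draws one edge as line segments between consecutive waypoints.
    static BufferedImage render(List<Point> waypoints, int width, int height) {
        var image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height); // white background
            g.setColor(Color.BLACK);
            var left = waypoints.get(0);
            for (int i = 1; i < waypoints.size(); i++) {
                var right = waypoints.get(i);
                g.drawLine(left.x, left.y, right.x, right.y);
                left = right;
            }
        } finally {
            g.dispose();
        }
        return image;
    }
}
```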

Slide 60

Slide 60 text

Bonus Step 4: Generate code at compile time

Slide 61

Slide 61 text

The Killer App • Move pre-processing out of the program's run time • Generate code • Run time effectively consists only of pure processing
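As a sketch of the idea only (not how Drools, jBPM or Quarkus actually generate code), a hypothetical build-time generator could turn a linear Start → task → End path into straight-line Java source, so that the graph walk happens when the code is generated, not when it runs. It assumes a path without branches and scripts that are valid Java statements.

```java
import java.util.List;

class ProcessCodeGen {
    // Hypothetical task on the path: a display name plus a Java statement to run.
    record Task(String name, String script) {}

    // Emits the source of a class whose run() method executes the tasks in order.
    static String generate(String className, List<Task> path) {
        var src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static void run() {\n");
        for (var task : path) {
            src.append("        // ").append(task.name()).append('\n');
            src.append("        ").append(task.script()).append('\n');
        }
        src.append("    }\n}\n");
        return src.toString();
    }
}
```

The returned string would be written to a source file and compiled together with the rest of the application, for example by a build plugin.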

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

AI and Automation Platform • Drools rule engine • jBPM workflow platform • OptaPlanner constraint solver

Slide 64

Slide 64 text

The Submarine Initiative • “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Edsger W. Dijkstra)

Slide 65

Slide 65 text

GraalVM: “One VM to Rule Them All” • Polyglot VM with cross-language JIT • Java Bytecode and JVM Languages • Dynamic Languages (Truffle API) • Native binary compilation (SubstrateVM)

Slide 67

Slide 67 text

Native Image: Restrictions • Native binary compilation • Restriction: “closed-world assumption” • No dynamic code loading • You must declare classes you want to reflect upon

Slide 68

Slide 68 text

Quarkus

Slide 69

Slide 69 text

Drools and jBPM

    rule R1 when
        // constraints
        $r : Result()
        $p : Person( age >= 18 )
    then
        // consequence
        $r.setValue( $p.getName() + " can drink");
    end

Slide 70

Slide 70 text

Drools DRL

    rule R1 when
        // constraints
        $r : Result()
        $p : Person( age >= 18 )
    then
        // consequence
        $r.setValue( $p.getName() + " can drink");
    end

is translated into:

    var r = declarationOf(Result.class, "$r");
    var p = declarationOf(Person.class, "$p");
    var rule = rule("com.example", "R1").build(
            pattern(r),
            pattern(p)
                .expr("e", person -> person.getAge() >= 18,
                      alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1,
                                     Person::getAge, 18),
                      reactOn("age")),
            on(p, r).execute(($p, $r) ->
                    $r.setValue($p.getName() + " can drink")));

Slide 71

Slide 71 text

jBPM

    RuleFlowProcessFactory factory =
            RuleFlowProcessFactory.createProcess("demo.orderItems");
    factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
    factory.variable("item", new ObjectDataType("java.lang.String"));
    factory.name("orderItems");
    factory.packageName("com.myspace.demo");
    factory.dynamic(false);
    factory.version("1.0");
    factory.visibility("Private");
    factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
    org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
    startNode1.name("Start");
    startNode1.done();
    org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
    actionNode2.name("Show order details");
    actionNode2.action(kcontext -> {

Slide 72

Slide 72 text

Startup Time

Slide 73

Slide 73 text

Conclusion

Slide 74

Slide 74 text

Takeaways • Process in phases • Do more in the pre-processing phase (compile time) • Do less during the processing phase (run time) • In other words, separate what you can do once from what you have to do repeatedly • Move all or some of your phases to compile time

Slide 75

Slide 75 text

Resources • Full Source Code https://github.com/evacchi/ypaat • Your Program as a Transpiler (part I) • Improving Application Performance by Applying Compiler Design http://bit.ly/ypaat-performance • Other resources • Submarine https://github.com/kiegroup/submarine-examples • Drools Blog http://blog.athico.com • Crafting Interpreters http://craftinginterpreters.com • GraalVM.org • Quarkus.io Edoardo Vacchi @evacchi

Slide 76

Slide 76 text

Q&A