Your Program as a Transpiler

Your Program as a Transpiler: Applying Compiler Design to Everyday Programming

About Me
• Edoardo Vacchi @evacchi
• Research @ University of Milan
• Research @ UniCredit R&D
• Drools, jBPM, Kogito @ Red Hat

Motivation

Motivation
• My first task at Red Hat: marshalling backend for jBPM
• Data model mapping
• From XML tree model to graph representation
• Apparently boring, but challenging in a way

Motivation
• Language implementation is often seen as a dark art
• But some design patterns are simple at their core
• Best practices can be applied to everyday programming

Motivation (cont'd)
• Learning about language implementation will give you a different angle to deal with many problems
• It will lead you to a better understanding of how Quarkus and GraalVM AoT do their magic

Goals
• Programs often have a pre-processing phase where you prepare for execution
• Then there is the actual processing phase
• Learn to recognize and structure the pre-processing phase

Transpilers

Transpilers vs. Compilers
• Compiler: translates code written in a language (source code) into code written in a target language (object code). The target language may be at a lower level of abstraction
• Transpiler: translates code written in a language into code written in another language at the same level of abstraction (Source-to-Source Translator)

Are transpilers simpler than compilers?
• Lower-level languages are complex
  • They are not: if anything, they're simple
• Syntactic sugar is not a higher level of abstraction
  • It is: a concise construct is expanded at compile-time
• Proper compilers do low-level optimizations
  • You are thinking of optimizing compilers

The distinction is moot
• It is pretty easy to write a crappy compiler, call it a transpiler and feel at peace with yourself
• Writing a good transpiler is no different or harder than writing a good compiler
• So, how do you write a good compiler?

Your Program as a Compiler: Applying Compiler Design to Everyday Programming

Compiler-like workflows
• At least two classes of problems can be solved with compiler-like workflows
  • Boot time optimization problems
  • Data transformation problems

Running Example: Function Orchestration

Function Orchestration
• You are building an immutable Dockerized serverless function
[Diagram: functions f and g]

Function Orchestration
• Problem
  • No standard* way to describe function orchestration yet
* Yes, I know about https://github.com/cncf/wg-serverless
[Diagram: functions f and g]

Solution: Roll your own YAML format
    process:
      elements:
        - start: &_1
            name: Start
        - function: &_2
            name: Hello
        - end: &_3
            name: End
        - edge:
            source: *_1
            target: *_2
        - edge:
            source: *_2
            target: *_3
[Diagram: Start -> Hello -> End]
Congratulations! Enjoy attending conferences worldwide

Alternate Solution
• You are describing a workflow
• There is a perfectly fine standard: BPMN
  • Business Process Model and Notation
[Diagram: Task 1, Task 2]

    <process id="Minimal" name="Minimal Process">
      <startEvent id="_1" name="Start"/>
      <scriptTask id="_2" name="Hello">
        <script>System.out.println("Hello World");</script>
      </scriptTask>
      <endEvent id="_3" name="End">
        <terminateEventDefinition/>
      </endEvent>
      <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
      <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
    </process>
[Diagram: Start -> Hello -> End]
https://github.com/evacchi/ypaat

[Diagram: Start -> Hello -> End]
Downside: Nobody will invite you to their conference to talk about BPM.
Unless you trick them.

Bonuses for choosing BPMN
• Standard XML-based serialization format
  • that's not the bonus
• There is standard tooling to validate and parse
  • that is a bonus
• Moreover:
  • Different types of nodes included in the main spec
  • Optional spec for laying out nodes on a diagram
[Diagram: Start -> Hello -> End]

Goals
• Read a BPMN workflow
• Execute that workflow
• Visualize that workflow
[Diagram: Start -> Hello -> End]

Step 1: Recognize your compilation phase

What's a compilation phase?
• It's your setup phase.
• You do it only once before the actual processing begins

Configuring the application
• Problem. Use config values from a file/env vars/etc.
• Do you validate config values each time you read them?
• Compile-time:
  • Read config values into a validated data structure
• Run-time:
  • Use validated config values
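The compile-time/run-time split for configuration can be sketched as follows. This is a minimal illustration, not code from the talk: the `AppConfig` class and the `host` and `port` keys are hypothetical. Raw strings are validated once, and the rest of the program only sees typed, already-checked fields.

```java
import java.util.Map;

class ConfigDemo {
    public static void main(String[] args) {
        // "Compile-time": validate the raw values once, up front.
        AppConfig config = AppConfig.compile(Map.of("host", " localhost ", "port", "8080"));
        // "Run-time": fields are already typed and validated, no re-checking needed.
        System.out.println(config.host + ":" + config.port);
    }
}

// Hypothetical validated configuration; the keys are illustrative only.
final class AppConfig {
    final String host;
    final int port;

    private AppConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Reads raw strings and fails fast on the first invalid value.
    static AppConfig compile(Map<String, String> raw) {
        String host = raw.getOrDefault("host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("port", "").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be an integer", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new AppConfig(host, port);
    }
}
```

Because the constructor is private, the only way to obtain an `AppConfig` is through `compile`, so any instance that exists has passed validation.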

Data Transformation Pipeline
• Problem. Manipulate data to produce analytics
• Compile-time:
  • Define transformations (e.g. map, filter, etc. operations)
  • Decide the execution plan (local, distributed, etc.)
• Run-time:
  • Evaluate the execution plan
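A rough sketch of the same split for a data pipeline, under the assumption of a purely local plan: the hypothetical `Plan` class below only records transformations while it is being defined, and nothing touches the data until `evaluate` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineDemo {
    public static void main(String[] args) {
        // "Compile-time": define the transformations; no data is processed yet.
        Plan<Integer> plan = new Plan<Integer>()
                .filter(x -> x % 2 == 0)
                .map(x -> x * 10);
        // "Run-time": evaluate the plan against actual data.
        System.out.println(plan.evaluate(List.of(1, 2, 3, 4)));
    }
}

// Hypothetical execution plan: an ordered list of stream transformations.
final class Plan<T> {
    private final List<Function<Stream<T>, Stream<T>>> steps = new ArrayList<>();

    Plan<T> map(Function<T, T> f) {
        steps.add(s -> s.map(f));
        return this;
    }

    Plan<T> filter(Predicate<T> p) {
        steps.add(s -> s.filter(p));
        return this;
    }

    // Applies the recorded steps, in order, to the input.
    List<T> evaluate(List<T> input) {
        Stream<T> stream = input.stream();
        for (Function<Stream<T>, Stream<T>> step : steps) {
            stream = step.apply(stream);
        }
        return stream.collect(Collectors.toList());
    }
}
```

The same `plan` object can be evaluated against many inputs, which is the point of separating its definition from its execution.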

Example: BPMN Execution
• Problem. Execute a workflow description.
• Compile-time:
  • Read BPMN into a visitable structure (StartEvent)
• Run-time:
  • Visit the structure
  • For each node, execute tasks
[Diagram: Start -> Hello -> End]

Example: BPMN Visualization
• Problem. Visualize a workflow diagram.
• Compile-time:
  • Read BPMN into a graph
• Run-time:
  • For each node and edge, draw on a canvas
[Diagram: Start -> Hello -> End]

Read BPMN into a Data Structure
• Full XML Schema Definition* is automatically mapped onto Java classes, validated against schema constraints
    TDefinitions tdefs = JAXB.unmarshal(resource, TDefinitions.class);
* Yes kids, we have working schemas

BPMN: From Tree to Graph
• No ordering imposed on the description
    <process id="Minimal" name="Minimal Process">
      <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
      <endEvent id="_3" name="End">
        <terminateEventDefinition/>
      </endEvent>
      <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
      <scriptTask id="_2" name="Hello">
        <script>System.out.println("Hello World");</script>
      </scriptTask>
      <startEvent id="_1" name="Start"/>
    </process>
Forward References

Separate Layout Definition
    <definitions>
      <process id="Minimal" name="Minimal Process">
        <startEvent id="_1" name="Start"/>
        <scriptTask id="_2" name="Hello">
          <script>System.out.println("Hello World");</script>
        </scriptTask>
        <endEvent id="_3" name="End">
          <terminateEventDefinition/>
        </endEvent>
        <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
        <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
      </process>
      <bpmndi:BPMNDiagram>
        <bpmndi:BPMNPlane bpmnElement="SubProcess">
          <bpmndi:BPMNShape bpmnElement="_1">
            <dc:Bounds x="11" y="30" width="48" height="48"/>
          </bpmndi:BPMNShape>
          <bpmndi:BPMNShape bpmnElement="_2">
            <dc:Bounds x="193" y="30" width="80" height="48"/>
          </bpmndi:BPMNShape>
          <bpmndi:BPMNShape bpmnElement="_3">
            <dc:Bounds x="396" y="30" width="48" height="48"/>
          </bpmndi:BPMNShape>
          <bpmndi:BPMNEdge bpmnElement="_1-_2">
            <di:waypoint x="35" y="50"/>
            <di:waypoint x="229" y="50"/>
          </bpmndi:BPMNEdge>
          <bpmndi:BPMNEdge bpmnElement="_2-_3">
            <di:waypoint x="229" y="50"/>
            <di:waypoint x="441" y="50"/>
          </bpmndi:BPMNEdge>
        </bpmndi:BPMNPlane>
      </bpmndi:BPMNDiagram>
    </definitions>
https://github.com/evacchi/ypaat

Step 2: Work like a compiler

Compiling a programming language
• You start from a text representation of a program
• The text representation is fed to a parser
• The parser returns a parse tree
• The parse tree is refined into an abstract syntax tree (AST)
• The AST is further refined through intermediate representations (IRs)
• Up until the final representation is returned

What makes a compiler a proper compiler
• Not optimization
• Compilation Phases
• You can have as many as you like

Example. A Configuration File
1 Read file from (class)path
2 Unmarshal file into a typed object
3 Sanitize values
4 Validate values
5 Coerce to typed values

Example. Produce a Report
1 Fetch data from different sources
2 Discard invalid values
3 Merge into single data stream
4 Compute aggregates (sums, avgs, etc.)
5 Generate synthesis data structure

Example. A Workflow Engine
1 Read BPMN file
2 Collect nodes
3 Collect edges
4 Prepare for visit/layout
[Diagram: Start -> Hello -> End]

Compilation Phases
• Better separation of concerns
• Better testability
  • You can test each intermediate result
• You can choose when and where each phase gets evaluated
• More requirements = more phases!

Phase vs Pass
• Many phases do not necessarily mean as many passes
• You could do several phases in one pass
• Logically, phases are still distinct

One Pass vs. Multi-Pass
Myth: one pass doing many things is better than doing many passes, each doing one thing

One pass:
    for value in config:
        sanitized = sanitize(value)
        validated = validate(sanitized)
        coerced = coerce(validated)

Multi-pass:
    for value in config:
        sanitized += sanitize(value)
    for value in sanitized:
        validated += validate(value)
    for value in validated:
        coerced += coerce(value)

It is not: Complexity

One pass:
    for value in config:
        sanitized = sanitize(value)
        validated = validate(sanitized)
        coerced = coerce(validated)
    Each of the n iterations costs: sanitize = 1 op, validate = 1 op, coerce = 1 op
    (1 op + 1 op + 1 op) × n = 3n

Multi-pass:
    for value in config:
        sanitized += sanitize(value)
    for value in sanitized:
        validated += validate(value)
    for value in validated:
        coerced += coerce(value)
    Each loop runs n times: sanitize = n ops, validate = n ops, coerce = n ops
    (n + n + n) = 3n
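The operation count can be checked with a small Java sketch. The three phases below are hypothetical stand-ins (trim, non-empty check, integer parsing), each bumping a shared counter, so both strategies can be compared on the same input.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class PassesDemo {
    // Counts how many phase invocations happen in total.
    static final AtomicInteger ops = new AtomicInteger();

    // Three hypothetical phases; each call counts as one operation.
    static String sanitize(String v) {
        ops.incrementAndGet();
        return v.trim();
    }

    static String validate(String v) {
        ops.incrementAndGet();
        if (v.isEmpty()) throw new IllegalArgumentException("empty value");
        return v;
    }

    static int coerce(String v) {
        ops.incrementAndGet();
        return Integer.parseInt(v);
    }

    // One pass: all phases are applied to a value before moving to the next value.
    static List<Integer> onePass(List<String> config) {
        return config.stream()
                .map(v -> coerce(validate(sanitize(v))))
                .collect(Collectors.toList());
    }

    // Three passes: each phase runs over the whole list and yields an intermediate result.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = config.stream().map(PassesDemo::sanitize).collect(Collectors.toList());
        List<String> validated = sanitized.stream().map(PassesDemo::validate).collect(Collectors.toList());
        return validated.stream().map(PassesDemo::coerce).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> config = List.of(" 1", "2 ", " 3 ");
        ops.set(0);
        List<Integer> single = onePass(config);
        System.out.println(single + " in " + ops.get() + " ops");
        ops.set(0);
        List<Integer> multi = multiPass(config);
        System.out.println(multi + " in " + ops.get() + " ops");
    }
}
```

With three input values, both variants perform nine phase invocations. The multi-pass version additionally exposes `sanitized` and `validated` as intermediate results that can be tested on their own, at the cost of allocating the intermediate lists.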

Single-pass is not always possible
However, doing everything in one pass may be cumbersome, or plainly impossible
    <process id="Minimal" name="Minimal Process">
      <sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
      <endEvent id="_3" name="End">
        <terminateEventDefinition/>
      </endEvent>
      <sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
      <scriptTask id="_2" name="Hello">
        <script>System.out.println("Hello World");</script>
      </scriptTask>
      <startEvent id="_1" name="Start"/>
    </process>
Forward References
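Forward references are the classic case for two passes. The sketch below is not the ypaat code: it assumes a hypothetical flattened input where each element is either {"node", id} or {"edge", sourceId, targetId}, listed in the same shuffled order as the BPMN above. The first pass collects node ids, and the second resolves edges once every node is known.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForwardRefDemo {
    // Hypothetical stand-in for the unmarshalled XML:
    // each element is {"node", id} or {"edge", sourceId, targetId}.
    static Map<String, List<String>> resolve(List<String[]> elements) {
        Map<String, List<String>> outgoing = new LinkedHashMap<>();
        // Pass 1: collect every node id, ignoring edges.
        for (String[] el : elements) {
            if (el[0].equals("node")) {
                outgoing.put(el[1], new ArrayList<>());
            }
        }
        // Pass 2: all nodes are known, so every edge reference can be checked and resolved.
        for (String[] el : elements) {
            if (el[0].equals("edge")) {
                if (!outgoing.containsKey(el[1]) || !outgoing.containsKey(el[2])) {
                    throw new IllegalStateException("dangling edge " + el[1] + " -> " + el[2]);
                }
                outgoing.get(el[1]).add(el[2]);
            }
        }
        return outgoing;
    }

    public static void main(String[] args) {
        // Edges appear before the nodes they mention.
        List<String[]> elements = List.of(
                new String[] {"edge", "_1", "_2"},
                new String[] {"node", "_3"},
                new String[] {"edge", "_2", "_3"},
                new String[] {"node", "_2"},
                new String[] {"node", "_1"});
        System.out.println(resolve(elements));
    }
}
```

A single pass would have to either reject the first edge (its endpoints are not defined yet) or keep a list of pending references to patch later, which is a second pass in disguise.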

Workflow Phases: Evaluation
    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);

    var graphBuilder = new GraphBuilder();

    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());

    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());

    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);

    // "interpret" the graph
    var engine = new Engine(engineGraph);
    engine.eval();
https://github.com/evacchi/ypaat

Workflow Phases: Layout
    <?xml version="1.0" encoding="UTF-8"?>
    <definitions ...>
      <process id="Minimal" name="Minimal Process">
        <startEvent id="_1" name="Start"/>
        ...
      </process>
      <bpmndi:BPMNDiagram>
        <bpmndi:BPMNPlane bpmnElement="SubProcess">
          <bpmndi:BPMNShape bpmnElement="_1">
            <dc:Bounds x="11" y="30" width="48" height="48"/>
          ...
      </bpmndi:BPMNDiagram>
    </definitions>

    var resource = getResourceAsStream("/example.bpmn2");
    var tdefs = unmarshall(resource, TDefinitions.class);

    var graphBuilder = new GraphBuilder();

    // collect nodes on the builder
    var nodeCollector = new NodeCollector(graphBuilder);
    nodeCollector.visitFlowElements(tdefs.getFlowElements());

    // collect edges on the builder
    var edgeCollector = new EdgeCollector(graphBuilder);
    edgeCollector.visitFlowElements(tdefs.getFlowElements());

    // extract layout information
    var extractor = new LayoutExtractor();
    extractor.visit(tdefs);
    var index = extractor.index();

    // "compile" into buffered image
    var canvas = new Canvas(graphBuilder, index);
    var bufferedImage = canvas.eval();
https://github.com/evacchi/ypaat

Visitors

Data Structures
    TFlowElement
     |
     +---- StartEventNode
     |
     +---- EndEventNode
     |
     `---- ScriptTask

Pattern Matching
    nodeCollector.visit(node)

    def visit(node: TFlowElement) = {
      node match {
        case StartEventNode(...) => ...
        case EndEventNode(...) => ...
        case ScriptTask(...) => ...
      }
    }

Pattern Matching
    nodeCollector.visit(node)

    void visit(TFlowElement node) {
      switch (node) {
        case StartEventNode(...) -> ...
        case EndEventNode(...) -> ...
        case ScriptTask(...) -> ...
      }
    }

The Poor Man's Alternatives
    interface Visitor {
      void visit(TFlowElement el);
      void visit(TStartEventNode start);
      void visit(TEndEventNode end);
      void visit(TScriptTask task);
    }
    interface Visitable {
      void accept(Visitor v);
    }

    if (node instanceof StartEventNode) {
      StartEventNode evt = (StartEventNode) node;
      ...
    } else if (node instanceof EndEventNode) {
      EndEventNode evt = (EndEventNode) node;
      ...
    } else if (node instanceof ScriptTask) {
      ScriptTask evt = (ScriptTask) node;
      ...
    }
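The Visitor/Visitable pair works through double dispatch: each element's `accept` calls back the `visit` overload for its own concrete type. A self-contained sketch follows, using simplified hypothetical classes (`StartEvent`, `ScriptTask`, `EndEvent`) rather than the JAXB-generated `T*` types, and a generic return type as a small convenience that the slide's interface does not have.

```java
import java.util.List;

class VisitorDemo {
    public static void main(String[] args) {
        List<FlowElement> elements =
                List.of(new StartEvent("Start"), new ScriptTask("Hello"), new EndEvent("End"));
        Describer describer = new Describer();
        for (FlowElement element : elements) {
            // The element picks the right visit(...) overload: no instanceof chain needed.
            String description = element.accept(describer);
            System.out.println(description);
        }
    }
}

// One visit method per concrete element type.
interface FlowVisitor<R> {
    R visit(StartEvent start);
    R visit(ScriptTask task);
    R visit(EndEvent end);
}

interface FlowElement {
    <R> R accept(FlowVisitor<R> visitor);
}

final class StartEvent implements FlowElement {
    final String name;
    StartEvent(String name) { this.name = name; }
    public <R> R accept(FlowVisitor<R> visitor) { return visitor.visit(this); }
}

final class ScriptTask implements FlowElement {
    final String name;
    ScriptTask(String name) { this.name = name; }
    public <R> R accept(FlowVisitor<R> visitor) { return visitor.visit(this); }
}

final class EndEvent implements FlowElement {
    final String name;
    EndEvent(String name) { this.name = name; }
    public <R> R accept(FlowVisitor<R> visitor) { return visitor.visit(this); }
}

// A sample visitor that turns each element into a short description.
final class Describer implements FlowVisitor<String> {
    public String visit(StartEvent start) { return "start event: " + start.name; }
    public String visit(ScriptTask task) { return "script task: " + task.name; }
    public String visit(EndEvent end) { return "end event: " + end.name; }
}
```

Adding a new element type forces every visitor to implement one more method, so the compiler flags each place that needs updating. An instanceof chain gives no such guarantee.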

Step 3: Choose a run-time representation

Workflow Evaluation
• Choose a representation suitable for evaluation
• In our case, for each node, we need to get the outgoing edges with the next node to visit
• The most convenient representation of the graph is adjacency lists
• adj(p) = { q | (p, q) ∈ edges }
    Map<Node, List<Node>> outgoing;

    var graphBuilder = new GraphBuilder();
    ...
    // prepare graph for visit
    var engineGraph = EngineGraph.of(graphBuilder);
    // decorate with an evaluator
    var engine = new Engine(engineGraph);
    // evaluate the graph by visiting once more
    engine.eval();

Workflow Evaluation
• The most convenient representation of the graph is adjacency lists
• adj(p) ↦ { q | (p, q) ∈ edges }
• Map<Node, List<Node>> outgoing
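As an illustration of why adjacency lists suit evaluation, the sketch below builds the `outgoing` map from a flat edge list and walks it from the start node. Plain strings stand in for the `Node` type, and the revisit guard is an addition for safety that the slides do not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacencyDemo {
    // Builds adj(p) = { q | (p, q) in edges } from a flat list of {source, target} pairs.
    static Map<String, List<String>> adjacency(List<String[]> edges) {
        Map<String, List<String>> outgoing = new HashMap<>();
        for (String[] edge : edges) {
            outgoing.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        return outgoing;
    }

    // Walks the graph depth-first from a start node and returns the visit order.
    static List<String> walk(Map<String, List<String>> outgoing, String start) {
        List<String> visited = new ArrayList<>();
        visit(outgoing, start, visited);
        return visited;
    }

    private static void visit(Map<String, List<String>> outgoing, String node, List<String> visited) {
        // Guard against cycles and converging paths (linear scan, fine for a small sketch).
        if (visited.contains(node)) return;
        visited.add(node);
        // Finding the next nodes is a single map lookup.
        for (String next : outgoing.getOrDefault(node, List.of())) {
            visit(outgoing, next, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> outgoing = adjacency(List.of(
                new String[] {"Start", "Hello"},
                new String[] {"Hello", "End"}));
        System.out.println(walk(outgoing, "Start"));
    }
}
```

With a flat edge list, each step of the walk would need a scan over all edges. The map turns "what comes after this node?" into one lookup.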

Evaluation
    class Engine implements GraphVisitor {
      void visit(StartEventNode node) {
        logger.info("Process '{}' started.", graph.name());
        graph.outgoing(node).forEach(this::visit);
      }
      void visit(EndEventNode node) {
        logger.info("Process ended.");
        // no outgoing edges
      }
      void visit(ScriptTaskNode node) {
        logger.info("Evaluating script task: {}", node.element().getScript().getContent());
        graph.outgoing(node).forEach(this::visit);
      }
      ...
    }
https://github.com/evacchi/ypaat

Workflow Layout
• In this case, for each node and edge, we need to get the shape and position
• No particular ordering is required
  • e.g. first render edges and then shapes
    <?xml version="1.0" encoding="UTF-8"?>
    <definitions ...>
      <process id="Minimal" name="Minimal Process">
        <startEvent id="_1" name="Start"/>
        ...
      </process>
      <bpmndi:BPMNDiagram>
        <bpmndi:BPMNPlane bpmnElement="SubProcess">
          <bpmndi:BPMNShape bpmnElement="_1">
            <dc:Bounds x="11" y="30" width="48" height="48"/>
          ...
      </bpmndi:BPMNDiagram>
    </definitions>

    var canvas = new Canvas(graph, index);
    var bufferedImage = canvas.eval();

    void eval() {
      graph.edges().forEach(this::draw);
      graph.nodes().forEach(this::visit);
    }
https://github.com/evacchi/ypaat

Layout
    class Canvas implements GraphVisitor {
      void draw(Edge edge) {
        var pts = index.edge(edge.id());
        setStroke(Color.BLACK);
        var left = pts.get(0);
        for (int i = 1; i < pts.size(); i++) {
          var right = pts.get(i);
          drawLine(left.x, left.y, right.x, right.y);
          left = right;
        }
      }
      void visit(StartEventNode node) {
        var shape = shapeOf(node);
        setStroke(Color.BLACK);
        setFill(Color.GREEN);
        drawEllipse(shape.x, shape.y, shape.width, shape.height);
        drawLabel(element.getName());
      }
      ...
    }
[Diagram: Start -> Hello -> End]

Bonus Step 4: Generate code at compile-time

The Killer App
• Move pre-processing out of program run-time
• Generate code
• Run-time effectively consists only of pure processing

Compiler-like workflows
• At least two classes of problems can be solved with compiler-like workflows
  • Boot time optimization problems
  • Data transformation problems

Example: Application Wiring
• You are building an immutable Dockerized microservice
• Do you really need all that Runtime Reflection?
• Do you really need Runtime Dependency Injection?
    public class Example {
      private final Animal animal;
      @Inject
      public Example(Animal animal) {
        this.animal = animal;
      }
      public Animal animal() { return animal; }
    }
    public interface Animal {}
    @InjectCandidate
    public class Dog implements Animal {}

All these things make your startup slow!
• But it's done only once!
  • Never is better than once
• But it's flexible
  • Ask yourself when was the last time you changed dependencies/startup config/classpath at runtime
  • If it's recent, ask yourself the price you pay for that flexibility

Example: A Quick DI Framework
    public class Example {
      private final Animal animal;
      @Inject
      public Example(Animal animal) {
        this.animal = animal;
      }
      public Animal animal() { return animal; }
    }
    public interface Animal {}
    @InjectCandidate
    public class Dog implements Animal {}
https://github.com/evacchi/reflection-vs-codegen

    Binder binder = new Binder();
    binder.scan();
    Example ex = binder.createInstance(Example.class);
    Animal animal = ex.animal();
    Objects.requireNonNull(animal);
    assert animal instanceof Dog;
https://github.com/evacchi/reflection-vs-codegen

    public class Binder {
      public Binder scan() {
        Reflections reflections = new Reflections();
        reflections.getTypesAnnotatedWith(InjectCandidate.class)
            .forEach(t -> bindings.put(interfaceOf(t), constructorOf(t)));
        return this;
      }
      public <T> T createInstance(Class<? extends T> t) {
        return (T) Arrays.stream(t.getDeclaredConstructors())
            .filter(c -> c.getAnnotation(Inject.class) != null)
            .peek(c -> c.setAccessible(true))
            .map(this::createInstance)
            .findFirst().get();
      }
      ...
    }
At run-time, complexity is easy to miss
This loop gets executed at each instance creation
https://github.com/evacchi/reflection-vs-codegen
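One way to stop paying for the lookup on every instance creation, while still using reflection, is to resolve each constructor once and cache it. The sketch below is a deliberate simplification and not the repository's Binder: the hypothetical `CachingFactory` only handles no-argument constructors and ignores `@Inject` entirely. It only illustrates hoisting the reflective search out of the per-instance path.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

class CachedFactoryDemo {
    public static void main(String[] args) {
        CachingFactory factory = new CachingFactory();
        factory.create(Dog.class);
        factory.create(Dog.class);
        // The reflective lookup ran once, although two instances were created.
        System.out.println("constructor lookups: " + factory.lookups());
    }
}

// Hypothetical class to instantiate; it has an implicit no-arg constructor.
final class Dog {}

// Not thread-safe: a plain HashMap keeps the sketch short.
final class CachingFactory {
    private final Map<Class<?>, Constructor<?>> constructors = new HashMap<>();
    private int lookups = 0;

    <T> T create(Class<T> type) {
        // Resolve the constructor only the first time a type is requested.
        Constructor<?> ctor = constructors.computeIfAbsent(type, this::resolve);
        try {
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    private Constructor<?> resolve(Class<?> type) {
        lookups++;
        try {
            return type.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no no-arg constructor on " + type.getName(), e);
        }
    }

    int lookups() {
        return lookups;
    }
}
```

Caching removes the repeated search but keeps reflection on the startup path. Generating the wiring code ahead of time removes it altogether.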

    public void scan() {
      Reflections reflections = new Reflections();
      // resolve injection candidates
      reflections.getTypesAnnotatedWith(InjectCandidate.class);
      // resolve injected constructors
      reflections.getConstructorsAnnotatedWith(Inject.class);
      // collect candidates
      reflections.forEach(this::collect);
      // resolve mappings
      resolveMappings();
    }

DI: Annotation Processor
The processor is triggered by the Java compiler for claimed annotations.
    Bindings bindings = processInjectionCandidates(
        env.getElementsAnnotatedWith(InjectCandidate.class));
    processInjectionSites(
        env.getElementsAnnotatedWith(Inject.class), bindings);
    generateJavaSources(bindings);
https://github.com/evacchi/reflection-vs-codegen
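The last step, generating sources from the collected bindings, can be as simple as string templating. The following sketch is an assumption about what such a generator might emit, not the repository's `generateJavaSources`. The `GeneratedBinder` class name and the `create<Interface>()` method naming are invented for illustration, and the source is returned as a string instead of being written through the annotation processing `Filer`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CodegenDemo {
    // Emits the source of a factory class with one method per binding.
    // Keys are interface names, values are implementation names (both simple names).
    static String generateFactory(String className, Map<String, String> bindings) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            src.append("    public static ").append(binding.getKey())
               .append(" create").append(binding.getKey()).append("() {\n")
               .append("        return new ").append(binding.getValue()).append("();\n")
               .append("    }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("Animal", "Dog");
        System.out.print(generateFactory("GeneratedBinder", bindings));
    }
}
```

The generated class contains plain `new Dog()` calls, so at run-time there is no scanning and no reflection left: the binding decision was made when the code was generated.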

Example
    % time java io.github.evacchi.Reflective
    6.94s user 0.29s system 259% cpu 2.785 total
    % time java io.github.evacchi.Codegen
    0.08s user 0.01s system 111% cpu 0.087 total
    % time ./io.github.evacchi.codegen
    ./io.github.evacchi.codegen 0.00s user 0.00s system 86% cpu 0.003 total

Kogito ergo cloud.

AI and Automation Platform
• Drools rule engine
• jBPM workflow platform
• OptaPlanner constraint solver

Quarkus

code.quarkus.io

Drools and jBPM
Drools:
    rule R1 when // constraints
      $r : Result()
      $p : Person( age >= 18 )
    then // consequence
      $r.setValue( $p.getName() + " can drink");
    end
jBPM:
[Diagram: jBPM workflow]

Drools
DRL:
    rule R1 when // constraints
      $r : Result()
      $p : Person( age >= 18 )
    then // consequence
      $r.setValue( $p.getName() + " can drink");
    end

    var r = declarationOf(Result.class, "$r");
    var p = declarationOf(Person.class, "$p");
    var rule = rule("com.example", "R1").build(
        pattern(r),
        pattern(p)
            .expr("e", p -> p.getAge() >= 18,
                alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1, this::getAge, 18),
                reactOn("age")),
        on(p, r).execute(
            ($p, $r) -> $r.setValue($p.getName() + " can drink")));

jBPM
    RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
    factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
    factory.variable("item", new ObjectDataType("java.lang.String"));
    factory.name("orderItems");
    factory.packageName("com.myspace.demo");
    factory.dynamic(false);
    factory.version("1.0");
    factory.visibility("Private");
    factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
    org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
    startNode1.name("Start");
    startNode1.done();
    org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
    actionNode2.name("Show order details");
    actionNode2.action(kcontext -> {

Startup Time

Conclusion

Takeaways
• Process in phases
• Do more in the pre-processing phase (compile-time)
• Do less during the processing phase (run-time)
• In other words, separate what you can do once from what you have to do repeatedly
• Move all or some of your phases to compile-time

Resources
• Full Source Code https://github.com/evacchi/ypaat
• Your Program as a Transpiler (part I)
• Improving Application Performance by Applying Compiler Design http://bit.ly/ypaat-performance https://github.com/evacchi/reflection-vs-codegen
• Other resources
  • Kogito https://github.com/kiegroup/kogito-examples
  • Drools Blog http://blog.athico.com
  • Crafting Interpreters http://craftinginterpreters.com
  • Quarkus.io
Edoardo Vacchi @evacchi
