Slide 1

Slide 1 text

The Shape of Kotlin: AST Parsing and Its Role in the Kotlin Compiler
Amanda Hinchman-Dominguez
47deg.com

Slide 2

Slide 2 text

The Shape of Kotlin
● How I stumbled onto AST parsing
● Debunking the black box that is the Kotlin compiler
● How 47 Degrees is leveraging AST parsing in Arrow-meta to elevate the power of Kotlin metaprogramming

Slide 4

Slide 4 text

Detecting UI inputs
● Abstract Syntax Tree (AST) parsing
● Program Structure Interface (PSI)

Slide 6

Slide 6 text

47 Degrees is a global consulting firm and certified Lightbend and Databricks Partner
Specializing in

Slide 7

Slide 7 text

We love Open Source!
● 656 contributors
● 3217 stars
● 93 repositories
● 18 languages
47deg.github.io

Slide 8

Slide 8 text

The Kotlin compiler is scary!

Slide 9

Slide 9 text

By studying the AST, we can learn a lot about the Kotlin compiler.

Slide 10

Slide 10 text

What is an Abstract Syntax Tree (AST)?

Slide 11

Slide 11 text

● An AST is an abstract representation of the source code that is generated and used in several roles within the compiler
● An AST is a tree of nodes that map directly to text ranges in the underlying document
● The bottom-most nodes of an AST match individual tokens
● Higher nodes match multiple-token fragments
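The mapping between nodes and text ranges can be sketched in a few lines of Kotlin. This is a toy model only: `AstNode`, `leaves`, `SOURCE`, and `tree` are hypothetical names invented for illustration and are not the Kotlin compiler's classes. The tree for `2 + 3` is built by hand, and whitespace nodes are left out for brevity.

```kotlin
// Hypothetical toy model; these are not the Kotlin compiler's real classes.
data class AstNode(
    val type: String,
    val range: IntRange,                       // offsets into the source text
    val children: List<AstNode> = emptyList()
) {
    val isLeaf: Boolean get() = children.isEmpty()

    // The text a node covers is simply a slice of the underlying document.
    fun textIn(source: String): String = source.substring(range)
}

// Collects the bottom-most nodes, left to right.
fun leaves(node: AstNode): List<AstNode> =
    if (node.isLeaf) listOf(node) else node.children.flatMap(::leaves)

const val SOURCE = "2 + 3"

// Hand-built tree for "2 + 3" (whitespace nodes omitted for brevity).
val tree = AstNode(
    "BINARY_EXPRESSION", 0..4,
    listOf(
        AstNode("INTEGER_CONSTANT", 0..0, listOf(AstNode("INTEGER_LITERAL", 0..0))),
        AstNode("PLUS", 2..2),
        AstNode("INTEGER_CONSTANT", 4..4, listOf(AstNode("INTEGER_LITERAL", 4..4)))
    )
)
```

Calling `leaves(tree)` returns the three leaf nodes, each covering exactly one token, while the root node covers the whole multiple-token fragment.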

Slide 12

Slide 12 text

What does AST parsing tell us?
● AST parsing tells us how the end user wrote the code
● AST parsing yields everything in the analyzed text range, including tokens, except punctuation

Slide 13

Slide 13 text

2 + 3
Element.BINARY_EXPRESSION
  Element.INTEGER_CONSTANT
    Token.INTEGER_LITERAL
  Token.PLUS
  Element.INTEGER_CONSTANT
    Token.INTEGER_LITERAL

Slide 14

Slide 14 text

2.plus(3)
Element.DOT_QUALIFIED_EXPRESSION
  Element.INTEGER_CONSTANT
    Token.NUMBER
  Token.DOT
  Element.CALL_EXPRESSION
    Element.REFERENCE_EXPRESSION
      Token.IDENTIFIER
    Element.VALUE_ARGUMENT_LIST
      Token.LPAR
      Element.VALUE_ARGUMENT
        Element.INTEGER_CONSTANT
          Token.INTEGER_LITERAL
      Token.RPAR

Slide 15

Slide 15 text

The Kotlin Compiler

Slide 16

Slide 16 text

Parsing Phase
● Builds the AST
● Analyzes the tree and augments it with complete information

Slide 17

Slide 17 text

Analysis Phase
● The basis for any compiler optimization
● Transforms the input program into an unoptimized intermediate representation (IR)
● Generates the Program Structure Interface (PSI)

Slide 18

Slide 18 text

Resolution Phase
● Generates two symbol tables:
  ○ One to accompany the AST
  ○ Another for the associated generated IR model
● The PSI is enhanced with descriptors that have been type-checked
● Optimizations are performed on the IR to improve the quality and performance of the machine code

Slide 19

Slide 19 text

Codegen
● Transforms the IR into native machine language

Slide 20

Slide 20 text

What role(s) does AST play in the Kotlin compiler?

Slide 21

Slide 21 text

Lexer
● The lexer breaks code text into a sequence of lexical tokens
● While scanning, the lexer may break code into multiple fragments or into lexemes
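As a rough illustration of the lexing step, here is a minimal hand-written lexer. It is a sketch under stated assumptions: `TokenType`, `Token`, and `lex` are hypothetical names, the token set covers only what the `2 + 3` and `2.plus(3)` examples need, and whitespace is skipped rather than emitted as a token. The real Kotlin lexer is far richer.

```kotlin
// Hypothetical token model for illustration; not the Kotlin compiler's lexer.
enum class TokenType { INTEGER_LITERAL, IDENTIFIER, PLUS, DOT, LPAR, RPAR }

data class Token(val type: TokenType, val text: String, val offset: Int)

fun lex(source: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var i = 0
    while (i < source.length) {
        val c = source[i]
        when {
            c.isWhitespace() -> i++            // skipped in this sketch
            c.isDigit() -> {
                // Consume a run of digits as one integer literal.
                val start = i
                while (i < source.length && source[i].isDigit()) i++
                tokens += Token(TokenType.INTEGER_LITERAL, source.substring(start, i), start)
            }
            c.isLetter() -> {
                // Consume letters and digits as one identifier.
                val start = i
                while (i < source.length && source[i].isLetterOrDigit()) i++
                tokens += Token(TokenType.IDENTIFIER, source.substring(start, i), start)
            }
            else -> {
                val type = when (c) {
                    '+' -> TokenType.PLUS
                    '.' -> TokenType.DOT
                    '(' -> TokenType.LPAR
                    ')' -> TokenType.RPAR
                    else -> throw IllegalArgumentException("Unexpected character '$c' at offset $i")
                }
                tokens += Token(type, c.toString(), i)
                i++
            }
        }
    }
    return tokens
}
```

Each token keeps its offset so that later stages can map nodes back to text ranges in the document.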

Slide 22

Slide 22 text

Syntax Analyzer
Builds the AST. In later phases of the compiler, the parse tree is often:
● Analyzed
● Augmented
● Transformed
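To make the tree-building step concrete, the following sketch parses a token list into a small tree. Everything here is an assumption for illustration: `Token`, `Node`, `IntegerConstant`, `BinaryExpression`, and `Parser` are hypothetical types, and the grammar handles only integer literals joined by `+`. The Kotlin compiler's parser works differently and handles the full language.

```kotlin
// Hypothetical toy types; none of these are the Kotlin compiler's real classes.
data class Token(val type: String, val text: String)

sealed class Node
data class IntegerConstant(val token: Token) : Node()
data class BinaryExpression(val left: Node, val operator: Token, val right: Node) : Node()

// Parses `INTEGER_LITERAL (PLUS INTEGER_LITERAL)*` into a left-associative tree.
class Parser(private val tokens: List<Token>) {
    private var pos = 0

    fun parseExpression(): Node {
        var left: Node = parseInteger()
        while (pos < tokens.size && tokens[pos].type == "PLUS") {
            val operator = tokens[pos++]
            left = BinaryExpression(left, operator, parseInteger())
        }
        check(pos == tokens.size) { "Unexpected token ${tokens[pos]}" }
        return left
    }

    private fun parseInteger(): Node {
        val token = tokens.getOrNull(pos)
            ?: throw IllegalArgumentException("Expected an integer literal but reached end of input")
        require(token.type == "INTEGER_LITERAL") { "Expected an integer literal but found $token" }
        pos++
        return IntegerConstant(token)
    }
}
```

Because the result is an ordinary tree of values, later phases can walk it, attach information to it, or rewrite it.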

Slide 23

Slide 23 text

Semantic Analyzer
● The compiler walks the AST to perform type checking and semantic analysis
● Generates the symbol table and IR
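A type check over a tree, backed by a symbol table, might look roughly like this. The sketch is deliberately simplified and every name in it (`Type`, `Expr`, `TypeError`, `typeOf`) is hypothetical. The rule that both operands of `+` must share a type is a toy rule chosen for the example, not Kotlin's actual typing rule, and the symbol table is just a map from names to types.

```kotlin
// Hypothetical toy types for illustration only.
enum class Type { INT, STRING }

sealed class Expr
data class IntLiteral(val value: Int) : Expr()
data class StringLiteral(val value: String) : Expr()
data class Reference(val name: String) : Expr()
data class Plus(val left: Expr, val right: Expr) : Expr()

class TypeError(message: String) : Exception(message)

// `symbols` maps declared names to their types (the "symbol table" of this sketch).
fun typeOf(expr: Expr, symbols: Map<String, Type>): Type = when (expr) {
    is IntLiteral -> Type.INT
    is StringLiteral -> Type.STRING
    is Reference -> symbols[expr.name] ?: throw TypeError("Unresolved reference: ${expr.name}")
    is Plus -> {
        val left = typeOf(expr.left, symbols)
        val right = typeOf(expr.right, symbols)
        // Toy rule: both operands must have the same type.
        if (left != right) throw TypeError("Cannot add $left and $right")
        left
    }
}
```

An unresolved name or a mismatched operand pair is reported as a `TypeError`, which stands in for the diagnostics a real compiler would emit.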

Slide 24

Slide 24 text

Intermediate Code Generator
● Outputs an unoptimized intermediate representation (IR)
● Analysis is performed on the IR:
  ○ Control flow
  ○ Call stacks
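The idea of lowering a tree into an unoptimized IR can be sketched as follows. Note the substitution: this example uses a flat list of stack-machine instructions because it is short, and it makes no claim about how Kotlin's own IR is structured. `Expr`, `Instruction`, `lower`, and `execute` are all hypothetical names.

```kotlin
// Hypothetical toy AST and IR; not the Kotlin compiler's representations.
sealed class Expr
data class IntLiteral(val value: Int) : Expr()
data class Plus(val left: Expr, val right: Expr) : Expr()

sealed class Instruction
data class Push(val value: Int) : Instruction()
object Add : Instruction()

// Lowers the tree to instructions in post-order, with no optimization at all.
fun lower(expr: Expr): List<Instruction> = when (expr) {
    is IntLiteral -> listOf(Push(expr.value))
    is Plus -> lower(expr.left) + lower(expr.right) + Add
}

// A tiny interpreter, included only to show that the lowered form keeps the meaning.
fun execute(program: List<Instruction>): Int {
    val stack = ArrayDeque<Int>()
    for (instruction in program) {
        when (instruction) {
            is Push -> stack.addLast(instruction.value)
            is Add -> {
                val right = stack.removeLast()
                val left = stack.removeLast()
                stack.addLast(left + right)
            }
        }
    }
    return stack.single()
}
```

For `Plus(IntLiteral(2), IntLiteral(3))`, `lower` produces push 2, push 3, add. An optimizer could later fold that sequence into a single push, which is the kind of work the next stage performs.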

Slide 25

Slide 25 text

Intermediate Code Optimizer
● Machine-dependent optimizations on the IR
● Improves the performance and quality of the produced machine code
● Resource and storage decisions

Slide 26

Slide 26 text

How does AST relate to PSI?

Slide 27

Slide 27 text

AST Tree: 2 + 3
Element.FILE
  Element.BINARY_EXPRESSION
    Element.INTEGER_LITERAL
      Token.NUMBER
    Token.PLUS
    Element.INTEGER_LITERAL
      INTEGER_LITERAL

Slide 28

Slide 28 text

PSI Tree: 2 + 3
KtFile
  KtBinaryExpression
    KtConstantExpression
    KtConstantExpression
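One way to picture the relationship between the two trees is to treat PSI elements as typed wrappers over AST nodes. The sketch below is a toy model under that assumption: the class names only echo the slides, and `AstNode`, `PsiElement`, `ConstantExpression`, `BinaryExpression`, and `toPsi` are hypothetical, not the real Kotlin PSI API.

```kotlin
// Hypothetical toy model; NOT the real Kotlin PSI API.
data class AstNode(val type: String, val text: String, val children: List<AstNode> = emptyList())

// Each PSI-like element wraps one AST node and exposes typed accessors over it.
sealed class PsiElement(val node: AstNode)

class ConstantExpression(node: AstNode) : PsiElement(node) {
    val value: Int get() = node.text.toInt()
}

class BinaryExpression(node: AstNode) : PsiElement(node) {
    val left: PsiElement get() = toPsi(node.children[0])
    val operator: String get() = node.children[1].text
    val right: PsiElement get() = toPsi(node.children[2])
}

// Chooses a wrapper based on the AST node's element type.
fun toPsi(node: AstNode): PsiElement = when (node.type) {
    "INTEGER_CONSTANT" -> ConstantExpression(node)
    "BINARY_EXPRESSION" -> BinaryExpression(node)
    else -> throw IllegalArgumentException("No PSI wrapper for ${node.type}")
}
```

The AST node stores only a type and text, while the wrapper gives callers named, typed access such as `left`, `operator`, and `right` instead of positional child lookups.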

Slide 29

Slide 29 text

PsiViewer Plugin

Slide 30

Slide 30 text

AST/PSI vs. Descriptors vs. IR?

Slide 31

Slide 31 text

● IR is generated as another abstract representation, one aimed at the CPU-level architecture
● PSI and IR each have symbol tables that map their nodes to descriptors
[Diagram: PSI, Descriptor, IR]

Slide 32

Slide 32 text

A sneak peek into Arrow-meta
Run Wild, Run Free!

Slide 33

Slide 33 text

Arrow-meta intercepts the AST and its resulting models
● The AST allows us to alter the surface level of the language without changing the rest of the compiler (although we can and usually do)

Slide 44

Slide 44 text

KEEP Insisting! - KotlinConf 2019

Slide 45

Slide 45 text

Thank you!

Slide 46

Slide 46 text

Sources Cited
● Arrow-meta: https://github.com/arrow-kt/arrow-meta
● Kotlin compiler crash course: https://github.com/ahinchman1/Kotlin-Compiler-Crash-Course