Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

PHP
• A multi-paradigm, simple, pragmatic language
• Targets the Web
• Two major runtimes:
  • Zend Engine
  • HHVM

Slide 3

Slide 3 text

PHP is critical

It powers 80% of all sites on the Internet

Slide 4

Slide 4 text

Vulnerabilities
• Almost 500 known
• Almost 50 with a CVSS score ≥ 9 out of 10, e.g.:
  • Memory corruptions [1, 2, 3]
  • Errors in parsers [4, 5, 6, 7, 8]
• They result in code execution or denial of service
• Same for Python or Java: it is not a language problem

Slide 5

Slide 5 text

Why so many?
• Because implementing such a language is hard
• Because it connects with a lot of exotic technologies
• Because the runtimes are written in C and C++

Slide 6

Slide 6 text

PHP is good
• The language is getting better and better
• But the tooling is becoming more important than the language features

Slide 7

Slide 7 text

Tagua VM
An experimental PHP virtual machine
Ivan Enderlin, May 2017

Slide 8

Slide 8 text

Hello, Tagua!

Provide safety and high quality by removing large classes of vulnerabilities, and thus avoid the cost of dramatic bugs

Slide 9

Slide 9 text

Hello, Tagua!

Provide modernity, a new developer experience, and state-of-the-art algorithms, and therefore performance

Slide 10

Slide 10 text

Hello, Tagua!

Provide a set of libraries that compose the VM and that can be reused outside of the project (like the parser, analysers, extensions, etc.)

Slide 11

Slide 11 text

Hello, Tagua!
1. Safe and high quality,
2. Modernity, new developer experience,
3. Designed as a set of libraries to be reused

Ambitious, isn’t it?

Slide 12

Slide 12 text

Hello, Tagua!
• Developed with Rust
  • A language that guarantees memory safety, threads without data races, zero-cost abstractions, a minimal runtime and, as a bonus, efficient C bindings, in addition to being as fast as C
• Developed with LLVM
  • A solid, state-of-the-art, research-driven, and widely used set of modular and reusable compiler and toolchain technologies

Slide 13

Slide 13 text

Compiler architecture

Front end → Middle end → Back end

Slide 14

Slide 14 text

Front end

Text input → Lexical analysis → Syntactic analysis → Abstract Syntax Tree

Slide 15

Slide 15 text

Lexical analysis

Text input → Lexical analysis → Sequence of lexemes

Slide 16

Slide 16 text

Lexical analysis

Text input → Lexical analysis → Sequence of lexemes

Text input: echo 'hello', 'world';

Sequence of lexemes:
• echo: lexeme echo, offset 0, length 4, …
• 'hello': lexeme string_single_quoted, offset 5, length 7, …
• ,: lexeme comma, offset 12, length 1, …
• 'world': lexeme string_single_quoted, offset 14, length 7, …
• ;: lexeme semicolon, offset 21, length 1, …
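The lexical analysis step on this slide can be sketched in a few lines of Python. This is only an illustration, not Tagua VM's lexer: the `Lexeme` type, the `TOKEN_RULES` table, and the `lex` function are hypothetical names, and the rules cover just the lexemes shown on the slide.

```python
import re
from typing import NamedTuple


class Lexeme(NamedTuple):
    kind: str
    offset: int
    length: int


# Hypothetical token table for this sketch; a real PHP lexer needs many more rules.
TOKEN_RULES = [
    ("echo", r"echo\b"),
    ("string_single_quoted", r"'(?:[^'\\]|\\.)*'"),
    ("comma", r","),
    ("semicolon", r";"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def lex(source: str) -> list[Lexeme]:
    """Turn the text input into a sequence of lexemes (kind, offset, length)."""
    lexemes = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at offset {position}")
        if match.lastgroup != "whitespace":  # whitespace is skipped, not emitted
            lexemes.append(Lexeme(match.lastgroup, match.start(), match.end() - match.start()))
        position = match.end()
    return lexemes


for lexeme in lex("echo 'hello', 'world';"):
    print(lexeme)
```

Each lexeme only records where it sits in the input (offset and length), so the original text never has to be copied.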

Slide 17

Slide 17 text

Syntactic analysis

Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis

Slide 18

Slide 18 text

Syntactic analysis

Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis

statement: ( echo | … | … ) “;”
echo: “echo” expression+ “,”
expression: string | … | …
string: …

Can the sequence of lexemes be derived from the grammar?

Slide 19

Slide 19 text

Abstract Syntax Tree

Syntactic analysis → Abstract Syntax Tree

Lexemes: echo 'hello' , 'world' ;
Tree: an echo node with two children, 'hello' and 'world'
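To make the step from lexemes to a tree concrete, here is a minimal recursive-descent sketch in Python. It assumes that `echo` takes a comma-separated list of expressions and that the only expression is a single-quoted string; the `Echo` and `String` node types and the `parse_statement` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class String:
    value: str


@dataclass
class Echo:
    arguments: list


def parse_statement(tokens):
    """Parse `echo expression (, expression)* ;` from a list of (kind, text) pairs."""
    position = 0

    def expect(kind):
        nonlocal position
        if position >= len(tokens) or tokens[position][0] != kind:
            found = tokens[position][0] if position < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found} at token {position}")
        text = tokens[position][1]
        position += 1
        return text

    def parse_expression():
        # Only single-quoted strings are handled in this sketch; strip the quotes.
        return String(expect("string_single_quoted")[1:-1])

    expect("echo")
    arguments = [parse_expression()]
    while position < len(tokens) and tokens[position][0] == "comma":
        position += 1
        arguments.append(parse_expression())
    expect("semicolon")
    if position != len(tokens):
        raise SyntaxError("trailing tokens after the statement")
    return Echo(arguments)


tokens = [
    ("echo", "echo"),
    ("string_single_quoted", "'hello'"),
    ("comma", ","),
    ("string_single_quoted", "'world'"),
    ("semicolon", ";"),
]
print(parse_statement(tokens))
```

The comma and the semicolon are checked but do not appear in the tree: they only guide the derivation.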

Slide 20

Slide 20 text

In Tagua VM
• Lexical and syntactic analyses are merged
  • 1 pass instead of 2 over all lexemes
• Zero-copy in the parser, few allocations in the producers
• Descriptive and contextual errors
• Based on nom, an extremely fast parser combinator library
• Safe parsing
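The idea of merging both analyses into a single pass can be illustrated with small parser combinators. The Python sketch below only mimics the general style of such libraries; it is not nom's API, and `tag`, `pattern`, `sequence`, and `separated_list` are hypothetical helpers. Matched strings are returned as (start, end) spans into the input rather than as copies, as a rough analogue of zero-copy parsing.

```python
import re


def tag(literal):
    """Match an exact piece of text at the current offset."""
    def parser(source, offset):
        if source.startswith(literal, offset):
            return literal, offset + len(literal)
        return None
    return parser


def pattern(regex):
    """Match a regular expression and return the matched span, not a copy."""
    compiled = re.compile(regex)

    def parser(source, offset):
        match = compiled.match(source, offset)
        if match is None:
            return None
        return match.span(), match.end()
    return parser


def sequence(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parser(source, offset):
        values = []
        for sub_parser in parsers:
            result = sub_parser(source, offset)
            if result is None:
                return None
            value, offset = result
            values.append(value)
        return values, offset
    return parser


def separated_list(separator, item):
    """Match one or more items separated by `separator`."""
    def parser(source, offset):
        first = item(source, offset)
        if first is None:
            return None
        values, offset = [first[0]], first[1]
        while True:
            separated = separator(source, offset)
            if separated is None:
                break
            following = item(source, separated[1])
            if following is None:
                break
            values.append(following[0])
            offset = following[1]
        return values, offset
    return parser


spaces = pattern(r"\s*")
string = pattern(r"'[^']*'")
echo = sequence(
    tag("echo"),
    pattern(r"\s+"),
    separated_list(sequence(spaces, tag(","), spaces), string),
    spaces,
    tag(";"),
)

source = "echo 'hello', 'world';"
values, end = echo(source, 0)
print([source[start:stop] for start, stop in values[2]], end)
```

No intermediate list of lexemes is built here: the text is recognised and structured in the same pass.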

Slide 21

Slide 21 text

Parsing in PHP?
• Sure, Hoa\Compiler
• A grammar description language (PP)
• LL(k) compiler-compiler

Slide 22

Slide 22 text

Middle end

Abstract Syntax Tree → Middle-level IR → Low-level IR, with various passes applied along the way

Slide 23

Slide 23 text

Intermediate Representation
• A text file is useless as such; a High-level IR (HIR) is required
  • An Abstract Syntax Tree is a form of HIR
• A Middle-level IR (MIR) may be needed
  • A Control Flow Graph is a form of MIR
• Emit a Low-level IR (LIR) for the back end
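As an illustration of a Control Flow Graph as a MIR, the Python sketch below splits a linear list of instructions into basic blocks and links them. The instruction format (`label`, `jump`, `branch`, `return`, `call`) and the `build_cfg` function are assumptions made for this example, not Tagua VM's MIR.

```python
def build_cfg(instructions):
    """Split a linear instruction list into basic blocks and compute their successors.

    Hypothetical instruction format: ("label", name), ("jump", name),
    ("branch", condition, name), ("return",), or any other tuple for
    straight-line code. "branch" jumps to `name` when the condition holds
    and falls through to the next block otherwise.
    """
    blocks = []
    current = []
    for instruction in instructions:
        # A label starts a new block; a jump, branch, or return ends one.
        if instruction[0] == "label" and current:
            blocks.append(current)
            current = []
        current.append(instruction)
        if instruction[0] in ("jump", "branch", "return"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    label_to_block = {
        block[0][1]: index for index, block in enumerate(blocks) if block[0][0] == "label"
    }
    edges = {}
    for index, block in enumerate(blocks):
        last = block[-1]
        successors = []
        if last[0] == "jump":
            successors.append(label_to_block[last[1]])
        elif last[0] == "branch":
            successors.append(label_to_block[last[2]])
            if index + 1 < len(blocks):
                successors.append(index + 1)
        elif last[0] != "return" and index + 1 < len(blocks):
            successors.append(index + 1)
        edges[index] = successors
    return blocks, edges


# A loop shaped like `while (valid) { process; next; }`.
program = [
    ("call", "rewind"),
    ("label", "loop"),
    ("branch", "not valid", "exit"),
    ("call", "process"),
    ("call", "next"),
    ("jump", "loop"),
    ("label", "exit"),
    ("return",),
]
blocks, edges = build_cfg(program)
print(edges)
```

The loop shows up as a cycle between the condition block and the body block, which is the kind of structure later passes can reason about.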

Slide 24

Slide 24 text

Pass
• Applies a transformation or a check on the IR, e.g.:
  • type inference
  • number overflow
  • data integrity
  • security vulnerabilities
  • remove dead branches
  • inline functions
  • constant folding
  • boolean and arithmetic simplifications
  • loop simplifications
  • tail recursion
  • …
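Constant folding is one of the simplest passes to picture. The Python sketch below works on a tiny expression tree; the `Const`, `Var`, and `BinOp` node types and the `fold_constants` function are hypothetical and only serve the illustration.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return an equivalent tree where operations on two constants are precomputed."""
    if isinstance(node, BinOp):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(OPERATORS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node


# $x + (2 * 3) becomes $x + 6
tree = BinOp("+", Var("x"), BinOp("*", Const(2), Const(3)))
print(fold_constants(tree))
```

The pass takes an IR and returns an IR of the same kind, which is what lets passes be chained in any order.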

Slide 25

Slide 25 text

In Tagua VM
• AST = HIR
• CFG + simplified PHP = MIR
  • Unfold intrinsic constructions (loops, iterators, magic methods…)
  • Static single assignment
  • Monomorphisation
  • etc.
• Should simplify transformation to LIR
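Static single assignment means that every variable is assigned exactly once. The Python sketch below renames variables in straight-line code only; it deliberately ignores branches, which would require phi nodes. The instruction format and the `to_ssa` function are assumptions for this example.

```python
def to_ssa(instructions):
    """Rename variables so that each one is assigned exactly once.

    Each instruction is (target, operation, operands), where operands are
    variable names (str) or integer constants. Straight-line code only.
    """
    versions = {}  # variable name -> latest version number
    renamed = []
    for target, operation, operands in instructions:
        # Operands read the latest version; version 0 stands for a value
        # defined before this piece of code.
        new_operands = tuple(
            f"{operand}_{versions.get(operand, 0)}" if isinstance(operand, str) else operand
            for operand in operands
        )
        versions[target] = versions.get(target, 0) + 1
        renamed.append((f"{target}_{versions[target]}", operation, new_operands))
    return renamed


# $x = 1; $x = $x + 2; $y = $x * $x;
code = [
    ("x", "const", (1,)),
    ("x", "add", ("x", 2)),
    ("y", "mul", ("x", "x")),
]
for instruction in to_ssa(code):
    print(instruction)
```

Once each name has a single definition, passes such as constant propagation no longer need to track which assignment of `$x` reaches a given use.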

Slide 26

Slide 26 text

Simplified PHP

foreach ($iterator as $key => $value) {
    process($value);
}

is unfolded into:

$iterator->rewind();

while ($iterator->valid()) {
    $key = $iterator->key();
    $value = $iterator->current();

    process($value);

    $iterator->next();
}
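The unfolded loop can be checked against the original by running it on a toy iterator. The Python sketch below mirrors the PHP `Iterator` methods used on the slide; the `ArrayIterator` class here is a hypothetical stand-in written for this example, and `run_unfolded_foreach` is an illustrative name.

```python
class ArrayIterator:
    """Hypothetical stand-in for an object implementing PHP's Iterator interface."""

    def __init__(self, items):
        self._items = list(items.items())
        self._index = 0

    def rewind(self):
        self._index = 0

    def valid(self):
        return self._index < len(self._items)

    def key(self):
        return self._items[self._index][0]

    def current(self):
        return self._items[self._index][1]

    def next(self):
        self._index += 1


def run_unfolded_foreach(iterator, process):
    """Same shape as the unfolded PHP: only method calls and a while loop remain."""
    iterator.rewind()
    while iterator.valid():
        key = iterator.key()  # assigned but unused, exactly as in the PHP version
        value = iterator.current()
        process(value)
        iterator.next()


seen = []
run_unfolded_foreach(ArrayIterator({"a": 1, "b": 2}), seen.append)
print(seen)
```

After unfolding, later passes only have to understand `while` and method calls, not `foreach`.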

Slide 27

Slide 27 text

Disclaimer
• Not yet implemented
• It has drawbacks (it can be costly, for instance), but:
  • It saves time for the MIR to LIR transformation
  • Passes and optimisations are simpler
  • It could simplify caching
• It is always about balancing the equation

Slide 28

Slide 28 text

Back end

Low-level IR → Execution engine → Result
Low-level IR → Another language

Slide 29

Slide 29 text

Flavours
• Execute the LIR directly, or a transformation of it into native code
  ‣ Just-in-time compilation: interpret the LIR, or compile it during the execution (online compiler)
vs
• Save or transform the LIR into another language (like native code, wasm, bytecode…)
  ‣ Ahead-of-time compilation: compile before the execution (offline compiler)
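The two flavours can be contrasted on a toy stack-based IR. In the Python sketch below, `interpret` executes the IR directly, while `compile_to_python` translates it ahead of time into another language (here, a Python expression) that can be saved and run later. The IR format and both function names are assumptions; a real JIT would emit machine code rather than interpret.

```python
def interpret(lir):
    """Online flavour: execute the instructions directly."""
    stack = []
    for op, *args in lir:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


def compile_to_python(lir):
    """Offline flavour: translate the instructions into source text of another language."""
    stack = []
    for op, *args in lir:
        if op == "push":
            stack.append(repr(args[0]))
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(interpret(program))
print(compile_to_python(program))
```

The translated text can be stored and executed at any later time without the original IR, which is the essence of the ahead-of-time flavour.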

Slide 30

Slide 30 text

Real world examples

[Diagram: three pipelines]
• PHP: Source → Opcode → Execution
• Java: Source → Bytecode → Execution
• Rust: Source → IRs → Execution later
Arrow labels in the diagram: AOT compilation, JIT compilation

Slide 31

Slide 31 text

Execution engine
• Must target machine code
• Must be aware of all the hardware features
  • x86, ARM, MIPS, SPARC…
  • 32, 64, 128, 256 bits
  • Number of CPUs
  • Kind of RAM
  • SIMD (MMX, SSE, AVX, NEON…)
  • GPU (OpenCL)
• Extremely difficult to do, and hard to test

Slide 32

Slide 32 text

LLVM
• A compiler infrastructure:
  • Low-level IR (LLVM IR)
  • Linker
  • Debugger (LLDB)
  • Execution engines
  • Much more…

Slide 33

Slide 33 text

In Tagua VM
• LIR = LLVM IR
• Execution engine = LLVM execution engine
  • JIT engine is currently MCJIT, OrcJIT is coming
• Debugger = LLDB

Slide 34

Slide 34 text

Back end

Low-level IR → Execution engine → Result

Slide 35

Slide 35 text

Back end

LLVM IR → OrcJIT → Result

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

–Automattic’s creed “I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.”

Slide 39

Slide 39 text

High quality
• Safety first
• Unit test suites
• Integration test suites
• Documentation test suites
• Continuously tested on macOS, Linux Ubuntu, and Windows
• Everything is documented

Slide 40

Slide 40 text

Developer experience
• Be 100% compatible with the PHP Language Specification
• Be ≃ 100% compatible with Zend Engine
• Modernise PHP development

Slide 41

Slide 41 text

Tools
• High performance HTTP server
• Powerful debugger
• Advanced analysers: memory, CPU, energy, performance
• Flamechart, DTrace, perf, call graph…
• Record and replay

Slide 42

Slide 42 text

Internet of Things
• Land PHP on new grounds, like small devices
• Requires low energy consumption
• Requires stability
• Requires compiling the VM on a lot of platforms
  • Can currently target 47 platforms (architectures: aarch64, arm, armv7, asmjs, i386, i586, i686, mips, mips64, mips64el, mipsel, powerpc, powerpc64, powerpc64le, s390x, sparc64, wasm32, x86_64)

Slide 43

Slide 43 text

tagua.io
1. Organise, manipulate, and share data
2. Passive tools: security audit, language conformance, advanced language analysis, transpiler, metrics, bots…
3. Active tools: many visualisations, online debugger, record and replay…
4. Team specific tools: boards, custom reports…

Slide 44

Slide 44 text

Be sustainable
• Tagua VM is free and open source (BSD)
• Tagua VM will expose all its features and tools for free
• tagua.io is the platform to earn money, and to offer better interfaces over the tools’ outputs

Slide 45

Slide 45 text

You
• We have a ton of ideas
• We are doing this for PHP, and for you

Slide 46

Slide 46 text

How to help?
• Share your fantasies and your frustrations
• Please, please, fill in this pad
• We are preparing two fundraisers
• Build a company behind Tagua VM

Slide 47

Slide 47 text

–Mark Twain “They did not know it was impossible, so they did it”

Slide 48

Slide 48 text

To serve you
• Julien Bianchi @jubianchi
• Ivan Enderlin @mnt_io
• Sébastien Houzé @sebastienhouze

Slide 49

Slide 49 text

Thanks! ❤
@mnt_io
@taguavm
https://github.com/tagua-vm/
http://tagua.io/