Tagua VM, a safe PHP virtual machine

This presentation was given at PHPTour'17 (AFUP, Nantes, France), http://event.afup.org/phptournantes2017/programme/?lang=en#2034.

During this talk, Tagua VM was introduced as a new experimental PHP virtual machine.

Recording on YouTube: https://www.youtube.com/watch?v=Ymy8qAEe0kQ (in French).

Ivan Enderlin

May 18, 2017

Transcript

  1. None
  2. PHP • A multi-paradigm, simple, pragmatic language • Targets the Web • Two major runtimes: • Zend Engine • HHVM
  3. PHP is critical • Powers 80% of all sites on the Internet

  4. Vulnerabilities • Almost 500 known • Almost 50 with a CVSS score ≥ 9 out of 10, e.g.: • Memory corruptions [1, 2, 3] • Errors in parsers [4, 5, 6, 7, 8] • Result in code execution or denial of service • Same for Python or Java, not a language problem
  5. Why so many? • Because implementing such a language is hard • It connects with a lot of exotic technologies • Because it is written in C and C++
  6. PHP is good • The language is getting better and better • But the tooling is becoming more important than the language features
  7. Tagua VM • An experimental PHP virtual machine • Ivan Enderlin, May 2017
  8. Hello, Tagua! Provide safety and high quality by removing large classes of vulnerabilities, thus avoiding the cost of dramatic bugs
  9. Hello, Tagua! Provide modernity, a new developer experience and state-of-the-art algorithms, hence performance
  10. Hello, Tagua! Provide a set of libraries that will compose the VM and that can be reused outside of the project (like the parser, analysers, extensions, etc.)
  11. Hello, Tagua! 1. Safe and high quality, 2. Modernity, new developer experience, 3. Designed as a set of libraries to be reused • Ambitious, isn’t it?
  12. Hello, Tagua! • Developed with Rust • A language that guarantees memory safety, threads without data races, zero-cost abstractions, a minimal runtime and, as a bonus, efficient C bindings, in addition to being as fast as C • Developed with LLVM • A solid, state-of-the-art, research-backed and widely used collection of modular and reusable compiler and toolchain technologies
  13. Compiler architecture • Front end → Middle end → Back end

  14. Front end • Text input → Lexical analysis → Syntactic analysis → Abstract Syntax Tree
  15. Lexical analysis • Text input → Lexical analysis → Sequence of lexemes

  16. Lexical analysis • Text input → Lexical analysis → Sequence of lexemes • Input: echo 'hello', 'world'; • echo: offset 0, length 4, … • string_single_quoted 'hello': offset 5, length 7, … • comma: offset 12, length 1, … • string_single_quoted 'world': offset 14, length 7, … • semicolon: offset 21, length 1, …
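
The lexeme table on this slide can be reproduced with a small hand-written lexer. What follows is only a sketch in Rust under stated assumptions: the names `Kind`, `Lexeme` and `lex` are hypothetical, only the four lexeme kinds shown on the slide are recognised, and escape sequences inside strings are ignored. It is not Tagua VM's actual lexer.

```rust
/// Kind of lexeme, named after the labels used on the slide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Echo,
    StringSingleQuoted,
    Comma,
    Semicolon,
}

/// A lexeme copies nothing from the input: it only records where it sits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lexeme {
    pub kind: Kind,
    pub offset: usize,
    pub length: usize,
}

/// Hypothetical helper: split `input` into lexemes, or return the byte
/// offset of the first lexeme that cannot be recognised.
pub fn lex(input: &str) -> Result<Vec<Lexeme>, usize> {
    let bytes = input.as_bytes();
    let mut lexemes = Vec::new();
    let mut offset = 0;

    while offset < bytes.len() {
        let (kind, length) = match bytes[offset] {
            // Whitespace separates lexemes but is not a lexeme itself.
            b' ' | b'\t' | b'\n' | b'\r' => {
                offset += 1;
                continue;
            }
            b',' => (Kind::Comma, 1),
            b';' => (Kind::Semicolon, 1),
            b'\'' => {
                // Look for the closing quote; escapes are ignored here.
                match bytes[offset + 1..].iter().position(|&b| b == b'\'') {
                    Some(end) => (Kind::StringSingleQuoted, end + 2),
                    None => return Err(offset),
                }
            }
            _ if bytes[offset..].starts_with(b"echo") => (Kind::Echo, 4),
            _ => return Err(offset),
        };
        lexemes.push(Lexeme { kind, offset, length });
        offset += length;
    }

    Ok(lexemes)
}
```

Calling `lex("echo 'hello', 'world';")` yields five lexemes whose offsets and lengths match the ones listed on the slide. Storing only offsets and lengths is one way to avoid copying the input.
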
  17. Syntactic analysis • Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis

  18. Syntactic analysis • Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis • statement: ( echo | … | … ) ";" • echo: "echo" expression+ "," • expression: string | … | … • string: … • Can the sequence be derived from the grammar?
  19. Abstract Syntax Tree • Syntactic analysis → Abstract Syntax Tree • Sequence of lexemes: echo 'hello' , 'world' ; • Tree: an echo node with two children, 'hello' and 'world'
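
The step from a sequence of lexemes to a tree can be sketched with a tiny recursive-descent style parser. This is a minimal Rust sketch under assumptions: `Token`, `Expression`, `Statement` and `parse_statement` are hypothetical names, the `echo` rule is read as a comma-separated list of expressions, and only string expressions are supported. It is hand-written for illustration and does not use nom.

```rust
/// Tokens as a lexer might hand them over (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Token<'a> {
    Echo,
    Str(&'a str),
    Comma,
    Semicolon,
}

/// Expressions of the tiny grammar: strings only.
#[derive(Debug, PartialEq)]
pub enum Expression<'a> {
    Str(&'a str),
}

/// Statements of the tiny grammar: `echo` only.
#[derive(Debug, PartialEq)]
pub enum Statement<'a> {
    Echo(Vec<Expression<'a>>),
}

/// Assumed grammar:
///   statement: echo ";"
///   echo:      "echo" expression ( "," expression )*
pub fn parse_statement<'a>(tokens: &[Token<'a>]) -> Result<Statement<'a>, String> {
    let mut iter = tokens.iter();

    if iter.next() != Some(&Token::Echo) {
        return Err("expected `echo`".to_string());
    }

    let mut expressions = Vec::new();
    loop {
        // One expression is mandatory after `echo` and after each comma.
        match iter.next() {
            Some(Token::Str(value)) => expressions.push(Expression::Str(*value)),
            other => return Err(format!("expected a string, found {:?}", other)),
        }
        // Then either another expression follows, or the statement ends.
        match iter.next() {
            Some(Token::Comma) => continue,
            Some(Token::Semicolon) => break,
            other => return Err(format!("expected `,` or `;`, found {:?}", other)),
        }
    }

    match iter.next() {
        None => Ok(Statement::Echo(expressions)),
        Some(extra) => Err(format!("unexpected trailing token {:?}", extra)),
    }
}
```

Parsing the tokens for `echo 'hello', 'world';` returns an `Echo` node with two string children, while a sequence that cannot be derived from the grammar returns an error describing what was expected.
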
  20. In Tagua VM • Lexical and syntactic analyses are merged • 1 pass instead of 2 over all lexemes • Zero-copy in the parser, few allocations in the producers • Descriptive and contextual errors • Based on an extremely fast parser, nom • Safe parsing
  21. Parsing in PHP? • Sure, Hoa\Compiler • A grammar description language (PP) • LL(k) compiler-compiler
  22. Middle end • Abstract Syntax Tree → Middle-level IR → Low-level IR • Various passes
  23. Intermediate Representation • A text file is useless as is, a High-level IR (HIR) is required • An Abstract Syntax Tree is a form of HIR • A Middle-level IR (MIR) can be needed • A Control Flow Graph is a form of MIR • Emit a Low-level IR (LIR) for the back end
  24. Pass • Applies a transformation or a check on the IR, e.g.: • type inference • number overflow • data integrity • security vulnerabilities • remove dead branches • inline functions • constant folding • boolean and arithmetic simplifications • loop simplifications • tail recursion • …
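
As an illustration of what a pass looks like, here is a minimal constant-folding sketch in Rust. The `Expr` type and the `fold` function are hypothetical and far simpler than any real IR. The pass rewrites sub-expressions whose operands are all known integers, and it leaves operations that would overflow untouched, which is an assumption made here so that the runtime keeps that decision.

```rust
/// A deliberately tiny expression IR (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold constant sub-expressions bottom-up.
pub fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Int(a), Expr::Int(b)) => match a.checked_add(b) {
                Some(sum) => Expr::Int(sum),
                // Overflow: keep the operation for the runtime.
                None => Expr::Add(Box::new(Expr::Int(a)), Box::new(Expr::Int(b))),
            },
            (lhs, rhs) => Expr::Add(Box::new(lhs), Box::new(rhs)),
        },
        Expr::Mul(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Int(a), Expr::Int(b)) => match a.checked_mul(b) {
                Some(product) => Expr::Int(product),
                // Overflow: keep the operation for the runtime.
                None => Expr::Mul(Box::new(Expr::Int(a)), Box::new(Expr::Int(b))),
            },
            (lhs, rhs) => Expr::Mul(Box::new(lhs), Box::new(rhs)),
        },
        // Integers and variables are already as simple as they can be.
        other => other,
    }
}
```

With this sketch, `(2 + 3) * $x` becomes `5 * $x`, and `2 * (3 + 4)` becomes the single constant `14`.
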
  25. In Tagua VM • AST = HIR • CFG + simplified PHP = MIR • Unfold intrinsic constructions (loops, iterators, magic methods…) • Static single assignment • Monomorphisation • etc. • Should simplify transformation to LIR
  26. Simplified PHP • foreach ($iterator as $key => $value) { process($value); } • becomes • $iterator->rewind(); while ($iterator->valid()) { $key = $iterator->key(); $value = $iterator->current(); process($value); $iterator->next(); }
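
The unfolding shown on this slide can be expressed as a rewrite over statements. The Rust sketch below is an assumption-laden illustration, not the Tagua VM design, which the talk itself says is not yet implemented: `Stmt` and `unfold` are hypothetical, and statements other than loops are kept as raw PHP source text to stay short.

```rust
/// A tiny statement IR (hypothetical). Non-loop statements are kept as
/// raw PHP source text in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Foreach {
        iterator: String,
        key: String,
        value: String,
        body: Vec<Stmt>,
    },
    While {
        condition: String,
        body: Vec<Stmt>,
    },
    Raw(String),
}

/// Rewrite every `foreach` into calls on the iterator plus a `while`.
pub fn unfold(stmt: Stmt) -> Vec<Stmt> {
    match stmt {
        Stmt::Foreach { iterator, key, value, body } => {
            // Bind the key and the value at the top of each iteration.
            let mut loop_body = vec![
                Stmt::Raw(format!("{} = {}->key();", key, iterator)),
                Stmt::Raw(format!("{} = {}->current();", value, iterator)),
            ];
            // The original body may itself contain loops to unfold.
            loop_body.extend(body.into_iter().flat_map(unfold));
            loop_body.push(Stmt::Raw(format!("{}->next();", iterator)));

            vec![
                Stmt::Raw(format!("{}->rewind();", iterator)),
                Stmt::While {
                    condition: format!("{}->valid()", iterator),
                    body: loop_body,
                },
            ]
        }
        Stmt::While { condition, body } => vec![Stmt::While {
            condition,
            body: body.into_iter().flat_map(unfold).collect(),
        }],
        other => vec![other],
    }
}
```

Applied to the `foreach` of the slide, `unfold` returns the `rewind()` call followed by the `while` loop, in the same order as the simplified PHP above. Later passes then only need to understand `while`.
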
  27. Disclaimer • Not yet implemented • It has drawbacks (it can be costly), but: • It saves time for the MIR to LIR transformation • Passes and optimisations are simpler • Could simplify caching • It is always about balancing the equation
  28. Back end • Low-level IR → Execution engine → Result • Low-level IR → Another language

  29. Flavours • Execute the LIR directly, or a transformation of it into native code ‣ Just-in-time compilation: interpret the LIR or compile it during the execution (online compiler) • Save or transform the LIR into another language (like native code, wasm, bytecode…) ‣ Ahead-of-time compilation: compile before the execution (offline compiler)
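
The two flavours can be contrasted on a toy low-level IR. This Rust sketch is purely illustrative and rests on assumptions: `Op`, `execute` and `emit` are hypothetical names, the IR is a three-instruction stack machine, and "another language" is reduced to an infix arithmetic expression. `execute` stands for the online side (here plain interpretation, not real JIT compilation), `emit` for the offline side.

```rust
/// A toy stack-based low-level IR (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Online flavour: run the IR directly. Returns `None` when the program
/// is malformed or an operation overflows.
pub fn execute(program: &[Op]) -> Option<i64> {
    let mut stack = Vec::new();
    for op in program {
        match *op {
            Op::Push(value) => stack.push(value),
            Op::Add => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs.checked_add(rhs)?);
            }
            Op::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs.checked_mul(rhs)?);
            }
        }
    }
    // A well-formed program leaves exactly one value.
    if stack.len() == 1 { stack.pop() } else { None }
}

/// Offline flavour: translate the same IR into another language, here a
/// fully parenthesised infix expression, to be executed later.
pub fn emit(program: &[Op]) -> Option<String> {
    let mut stack: Vec<String> = Vec::new();
    for op in program {
        match *op {
            Op::Push(value) => stack.push(value.to_string()),
            Op::Add | Op::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                let symbol = if *op == Op::Add { '+' } else { '*' };
                stack.push(format!("({} {} {})", lhs, symbol, rhs));
            }
        }
    }
    if stack.len() == 1 { stack.pop() } else { None }
}
```

Both functions walk the same IR: the first one produces a result now, the second one produces a text that some other tool would have to run later.
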
  30. Real world examples • PHP: Source → Opcode → JIT compilation → Execution • Java: Source → AOT compilation → Bytecode → JIT compilation → Execution • Rust: Source → IRs → AOT compilation → Execution later
  31. Execution engine • Must target the machine code • Must be aware of all the hardware features • x86, ARM, MIPS, SPARC… • 32, 64, 128, 256 bits • Number of CPUs • Kind of RAM • SIMD (MMX, SSE, AVX, NEON…) • GPU (OpenCL) • Extremely difficult to do, and hard to test
  32. LLVM • A compiler infrastructure: • Low-level IR (LLVM IR) • Linker • Debugger (LLDB) • Execution engines • Much more…
  33. In Tagua VM • LIR = LLVM IR • Execution engine = LLVM execution engine • JIT engine is currently MCJIT, OrcJIT is coming • Debugger = LLDB
  34. Back end • Lower-level IR → Execution engine → Result

  35. Back end • LLVM IR → OrcJIT → Result

  36. None
  37. None
  38. “I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.” (Automattic’s creed)
  39. High quality • Safety first • Unit test suites • Integration test suites • Documentation test suites • Continuously tested on macOS, Linux Ubuntu, and Windows • Everything is documented
  40. Developer experience • Be 100% compatible with the PHP Language Specification • Be ≃ 100% compatible with Zend Engine • Modernise PHP development
  41. Tools • High performance HTTP server • Powerful debugger • Advanced analysers: Memory, CPU, energy, performance • Flamechart, DTrace, perf, call graph… • Record and replay
  42. Internet of Things • Land PHP on new grounds, like small devices • Requires low energy consumption • Requires stability • Requires compiling the VM for a lot of platforms • Can currently target 47 platforms (aarch64, arm, armv7, asmjs, i386, i586, i686, mips, mips64, mips64el, mipsel, powerpc, powerpc64, powerpc64le, s390x, sparc64, wasm32, x86_64)
  43. tagua.io 1. Organise, manipulate, and share data 2. Passive tools: Security audit, language conformance, advanced language analysis, transpiler, metrics, bots… 3. Active tools: Many visualisations, online debugger, record and replay… 4. Team specific tools: Boards, custom reports…
  44. Be sustainable • Tagua VM is free, and open source (BSD) • Tagua VM will expose all the features, and tools for free • tagua.io is the platform to earn money, and to get better interfaces over tools’ outputs
  45. You • We have a ton of ideas • We are doing this for PHP, and for you
  46. How to help? • Share your fantasies, and your frustrations • Please, please, fill this pad • We are preparing 2 fundraising campaigns • Build a company behind Tagua VM
  47. “They did not know it was impossible, so they did it” (Mark Twain)
  48. To serve you • Julien Bianchi @jubianchi • Ivan Enderlin @mnt_io • Sébastien Houzé @sebastienhouze
  49. Thanks! ❤ @mnt_io @taguavm https://github.com/tagua-vm/ http://tagua.io/