
Tagua VM, a safe PHP virtual machine


This presentation was given at PHPTour'17 (AFUP, Nantes, France): http://event.afup.org/phptournantes2017/programme/?lang=en#2034.

During this talk, Tagua VM was introduced as a new experimental PHP virtual machine.

Recording on YouTube (in French): https://www.youtube.com/watch?v=Ymy8qAEe0kQ

Ivan Enderlin

May 18, 2017



Transcript


  2. PHP
    • A multi-paradigm, simple, pragmatic language
    • Target the Web
    • Two major runtimes:
      • Zend Engine
      • HHVM


  3. PHP is critical
    Powers 80% of all sites on the Internet


  4. Vulnerabilities
    • Almost 500 known
    • Almost 50 with a CVSS score ≥ 9 out of 10, e.g.:
      • Memory corruptions [1, 2, 3]
      • Errors in parsers [4, 5, 6, 7, 8]
    • They result in code execution or denial of service
    • Same for Python or Java: it is not a language problem


  5. Why so many?
    • Because implementing such a language is hard
    • Because it connects with a lot of exotic technologies
    • Because the runtimes are written in C and C++


  6. PHP is good
    • The language is getting better and better
    • But the tooling is becoming more important than the language features


  7. Tagua VM
    An experimental PHP virtual machine
    Ivan Enderlin, May 2017


  8. Hello, Tagua!
    Provide safety and high quality by removing large classes of vulnerabilities, thus avoiding the cost of dramatic bugs


  9. Hello, Tagua!
    Provide modernity, a new developer experience, and state-of-the-art algorithms, and therefore performance


  10. Hello, Tagua!
    Provide a set of libraries that will compose the VM and that can be reused outside of the project (like the parser, analysers, extensions, etc.)


  11. Hello, Tagua!
    1. Safe and high quality,
    2. Modernity, new developer experience,
    3. Designed as a set of libraries to be reused
    Ambitious, isn’t it?


  12. Hello, Tagua!
    • Developed with Rust
      • A language that guarantees memory safety and threads without data races, with zero-cost abstractions, a minimal runtime and, as a bonus, efficient C bindings, while being as fast as C
    • Developed with LLVM
      • A solid, state-of-the-art, widely used collection of modular and reusable compiler and toolchain technologies that came out of research


  13. Compiler architecture
    Front end → Middle end → Back end


  14. Front end
    Text input → Lexical analysis → Syntactic analysis → Abstract Syntax Tree


  15. Lexical analysis
    Text input → Lexical analysis → Sequence of lexemes


  16. Lexical analysis
    Text input → Lexical analysis → Sequence of lexemes
    Text input: echo 'hello', 'world';
    Sequence of lexemes:
    • echo → echo (offset 0, length 4, …)
    • 'hello' → string_single_quoted (offset 5, length 7, …)
    • , → comma (offset 12, length 1, …)
    • 'world' → string_single_quoted (offset 14, length 7, …)
    • ; → semicolon (offset 21, length 1, …)
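
The lexeme table above can be reproduced with a small hand-written lexer. This is only a minimal sketch in Rust, not Tagua VM's actual code: the `Kind`, `Lexeme` and `lex` names are hypothetical, only the four lexeme kinds from the slide are recognised, and escape sequences inside strings are ignored.

```rust
/// Kinds of lexemes; the variants mirror the labels used on the slide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Echo,
    StringSingleQuoted,
    Comma,
    Semicolon,
}

/// A lexeme stores where it lives in the input, not a copy of the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lexeme {
    pub kind: Kind,
    pub offset: usize,
    pub length: usize,
}

/// Splits `input` into lexemes, or returns the byte offset of the first
/// piece of text this sketch cannot handle.
pub fn lex(input: &str) -> Result<Vec<Lexeme>, usize> {
    let bytes = input.as_bytes();
    let mut lexemes = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let start = i;
        let kind = match bytes[i] {
            // Whitespace separates lexemes but produces none.
            b' ' | b'\t' | b'\n' | b'\r' => {
                i += 1;
                continue;
            }
            b',' => {
                i += 1;
                Kind::Comma
            }
            b';' => {
                i += 1;
                Kind::Semicolon
            }
            b'\'' => {
                // Scan to the closing quote (no escape handling here).
                match bytes[i + 1..].iter().position(|&b| b == b'\'') {
                    Some(p) => {
                        i += p + 2;
                        Kind::StringSingleQuoted
                    }
                    None => return Err(start),
                }
            }
            // Simplification: no check that the keyword ends at a word boundary.
            _ if input[i..].starts_with("echo") => {
                i += 4;
                Kind::Echo
            }
            _ => return Err(start),
        };
        lexemes.push(Lexeme { kind, offset: start, length: i - start });
    }
    Ok(lexemes)
}
```

Calling `lex("echo 'hello', 'world';")` yields five lexemes whose offsets and lengths match the ones listed on the slide.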


  17. Syntactic analysis
    Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis


  18. Syntactic analysis
    Lexical analysis → Sequence of lexemes + Grammar → Syntactic analysis
    Grammar:
      statement:  ( echo | … | … ) ";"
      echo:       "echo" expression+ (separated by ",")
      expression: string | … | …
      string:     …
    Can the sequence be derived from the grammar?


  19. Abstract Syntax Tree
    Syntactic analysis → Abstract Syntax Tree
    Sequence of lexemes: echo 'hello' , 'world' ;
    Abstract Syntax Tree:
      echo
      ├── 'hello'
      └── 'world'
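
To make the step from lexemes to tree concrete, here is a hedged recursive-descent sketch in Rust for the `echo` fragment of the grammar shown earlier in the talk. The `Token` and `Ast` types and the `parse_*` functions are hypothetical names chosen for illustration, and the comma-separated `expression+` rule is written out as `expression ( "," expression )*`. Punctuation is consumed during parsing and does not appear in the tree.

```rust
/// Lexemes as the parser sees them; strings borrow their text from the input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token<'a> {
    Echo,
    Str(&'a str),
    Comma,
    Semicolon,
}

/// AST nodes: `,` and `;` are dropped, only the structure remains.
#[derive(Debug, PartialEq, Eq)]
pub enum Ast<'a> {
    Echo(Vec<Ast<'a>>),
    Str(&'a str),
}

/// statement: echo ";"
pub fn parse_statement<'a>(tokens: &[Token<'a>]) -> Result<Ast<'a>, String> {
    let mut pos = 0;
    let node = parse_echo(tokens, &mut pos)?;
    match tokens.get(pos) {
        Some(Token::Semicolon) if pos + 1 == tokens.len() => Ok(node),
        Some(Token::Semicolon) => Err(format!("unexpected lexeme at index {}", pos + 1)),
        other => Err(format!("expected `;` at index {}, found {:?}", pos, other)),
    }
}

/// echo: "echo" expression ( "," expression )*
fn parse_echo<'a>(tokens: &[Token<'a>], pos: &mut usize) -> Result<Ast<'a>, String> {
    match tokens.get(*pos) {
        Some(Token::Echo) => *pos += 1,
        other => return Err(format!("expected `echo` at index {}, found {:?}", *pos, other)),
    }
    let mut arguments = vec![parse_expression(tokens, pos)?];
    while let Some(Token::Comma) = tokens.get(*pos) {
        *pos += 1;
        arguments.push(parse_expression(tokens, pos)?);
    }
    Ok(Ast::Echo(arguments))
}

/// expression: string (the other alternatives are elided on the slide)
fn parse_expression<'a>(tokens: &[Token<'a>], pos: &mut usize) -> Result<Ast<'a>, String> {
    match tokens.get(*pos) {
        Some(Token::Str(text)) => {
            *pos += 1;
            Ok(Ast::Str(*text))
        }
        other => Err(format!("expected a string at index {}, found {:?}", *pos, other)),
    }
}
```

A sequence that cannot be derived from the grammar, such as `echo ;`, is rejected with an error naming the index of the offending lexeme.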


  20. In Tagua VM
    • Lexical and syntactic analyses are merged
      • 1 pass instead of 2 over all lexemes
    • Zero-copy in the parser, few allocations in the producers
    • Descriptive and contextual errors
    • Based on nom, an extremely fast parser combinator library
    • Safe parsing


  21. Parsing in PHP?
    • Sure, Hoa\Compiler
      • A grammar description language (PP)
      • LL(k) compiler-compiler


  22. Middle end
    Abstract Syntax Tree → Middle-level IR → Low-level IR (through various passes)


  23. Intermediate Representation
    • A text file is useless as is; a High-level IR (HIR) is required
      • An Abstract Syntax Tree is a form of HIR
    • A Middle-level IR (MIR) can be needed
      • A Control Flow Graph is a form of MIR
    • Emit a Low-level IR (LIR) for the back end
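
As an illustration of what a Control Flow Graph looks like as a data structure, here is a minimal Rust sketch. The `Cfg`, `Block` and `Terminator` types are hypothetical and are not taken from Tagua VM; instructions are kept as plain strings to keep the example short.

```rust
use std::collections::HashSet;

/// How control leaves a basic block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Terminator {
    /// Unconditional jump to another block.
    Jump(usize),
    /// Two-way branch: (block taken if true, block taken if false).
    Branch(usize, usize),
    /// End of the function.
    Return,
}

/// A basic block: straight-line instructions followed by one terminator.
#[derive(Debug, Clone)]
pub struct Block {
    pub instructions: Vec<String>,
    pub terminator: Terminator,
}

/// A control flow graph; blocks are identified by index, block 0 is the entry.
pub struct Cfg {
    pub blocks: Vec<Block>,
}

impl Cfg {
    /// Indices of the blocks that can directly follow block `id`.
    pub fn successors(&self, id: usize) -> Vec<usize> {
        match self.blocks[id].terminator {
            Terminator::Jump(target) => vec![target],
            Terminator::Branch(then_block, else_block) => vec![then_block, else_block],
            Terminator::Return => vec![],
        }
    }

    /// Depth-first walk from the entry block; blocks missing from the
    /// result can never execute.
    pub fn reachable(&self) -> HashSet<usize> {
        let mut seen = HashSet::new();
        let mut stack = vec![0];
        while let Some(id) = stack.pop() {
            if id < self.blocks.len() && seen.insert(id) {
                stack.extend(self.successors(id));
            }
        }
        seen
    }
}
```

With the graph made explicit, a check such as "which blocks can never run" becomes a plain graph traversal, which is much harder to express on source text.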


  24. Pass
    • Applies a transformation or a check on the IR, e.g.:
      • type inference
      • number overflow
      • data integrity
      • security vulnerabilities
      • remove dead branches
      • inline functions
      • constant folding
      • boolean and arithmetic simplifications
      • loop simplifications
      • tail recursion
      • …
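
One of the passes listed above, constant folding, fits in a few lines. The sketch below is in Rust and runs on a hypothetical expression IR (`Expr`) that is far smaller than anything a real PHP compiler would use; the decision to leave overflowing operations untouched is an assumption of this example, not PHP's or Tagua VM's documented behaviour.

```rust
/// A tiny, hypothetical expression IR.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Constant folding: bottom-up, replace every operation whose operands are
/// both integer literals with its result.
pub fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            // Only fold when the addition does not overflow.
            (Expr::Int(a), Expr::Int(b)) if a.checked_add(b).is_some() => Expr::Int(a + b),
            (lhs, rhs) => Expr::Add(Box::new(lhs), Box::new(rhs)),
        },
        Expr::Mul(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            // Only fold when the multiplication does not overflow.
            (Expr::Int(a), Expr::Int(b)) if a.checked_mul(b).is_some() => Expr::Int(a * b),
            (lhs, rhs) => Expr::Mul(Box::new(lhs), Box::new(rhs)),
        },
        // Literals and variables are already as simple as they can be.
        leaf => leaf,
    }
}
```

For example, `(2 + 3) * $x` becomes `5 * $x`: the addition is evaluated at compile time, while the multiplication stays because `$x` is unknown.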


  25. In Tagua VM
    • AST = HIR
    • CFG + simplified PHP = MIR
      • Unfold intrinsic constructions (loops, iterators, magic methods…)
      • Static single assignment
      • Monomorphisation
      • etc.
    • Should simplify the transformation to LIR


  26. Simplified PHP
    // Original PHP
    foreach ($iterator as $key => $value) {
        process($value);
    }

    // Simplified PHP: the same loop with the iterator calls unfolded
    $iterator->rewind();
    while ($iterator->valid()) {
        $key   = $iterator->key();
        $value = $iterator->current();
        process($value);
        $iterator->next();
    }
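
The rewrite above could be expressed as a pass over a statement tree. The slides state that this part of Tagua VM is not implemented yet, so the following Rust code is purely a hypothetical sketch: the `Stmt` type and the `unfold` function are invented for illustration, expressions are kept as source strings, and a `continue` inside the loop body (which would skip the `next()` call) is not handled.

```rust
/// A hypothetical statement tree; expressions are kept as source text.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// An opaque statement, e.g. `process($value);`.
    Raw(String),
    /// `foreach (subject as key => value) { body }`
    Foreach { subject: String, key: String, value: String, body: Vec<Stmt> },
    /// `while (condition) { body }`
    While { condition: String, body: Vec<Stmt> },
}

/// Rewrites every `foreach` into `rewind()` plus a `while` loop, recursing
/// into nested loop bodies.
pub fn unfold(statements: Vec<Stmt>) -> Vec<Stmt> {
    let mut output = Vec::new();
    for statement in statements {
        match statement {
            Stmt::Foreach { subject, key, value, body } => {
                output.push(Stmt::Raw(format!("{}->rewind();", subject)));
                // The loop body starts by fetching the current key and value.
                let mut unfolded = vec![
                    Stmt::Raw(format!("{} = {}->key();", key, subject)),
                    Stmt::Raw(format!("{} = {}->current();", value, subject)),
                ];
                unfolded.extend(unfold(body));
                // It ends by advancing the iterator.
                unfolded.push(Stmt::Raw(format!("{}->next();", subject)));
                output.push(Stmt::While {
                    condition: format!("{}->valid()", subject),
                    body: unfolded,
                });
            }
            Stmt::While { condition, body } => {
                output.push(Stmt::While { condition, body: unfold(body) });
            }
            raw => output.push(raw),
        }
    }
    output
}
```

After this pass, later passes only need to understand `while`, which is the kind of simplification the MIR is meant to provide.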


  27. Disclaimer
    • Not yet implemented
    • It has drawbacks (it can be costly, for instance), but:
      • It saves time for the MIR to LIR transformation
      • Passes and optimisations are simpler
      • It could simplify caching
    • It is always about balancing the equation


  28. Back end
    Low-level IR → Execution engine → Result
    Low-level IR → Another language


  29. Flavours
    Online compiler
    • Execute the LIR directly, or a transformation of it into native code
      ‣ Just-in-time compilation: interpret the LIR or compile it during the execution
    vs
    Offline compiler
    • Save or transform the LIR into another language (like native code, wasm, bytecode…)
      ‣ Ahead-of-time compilation: compile before the execution
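
The two flavours can be contrasted on a toy example. The Rust sketch below assumes a hypothetical three-instruction stack-based LIR (`Op`), which has nothing to do with LLVM IR: `interpret` executes the program right away, while `emit` translates it into another language (here an arithmetic expression as text) that can be saved and run later.

```rust
/// A hypothetical stack-based LIR with three instructions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Online flavour: execute the LIR directly. Returns the value left on top
/// of the stack, or `None` if the program is malformed.
pub fn interpret(program: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in program {
        match *op {
            Op::Push(value) => stack.push(value),
            Op::Add | Op::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                // Wrapping arithmetic keeps the sketch panic-free.
                stack.push(if *op == Op::Add {
                    lhs.wrapping_add(rhs)
                } else {
                    lhs.wrapping_mul(rhs)
                });
            }
        }
    }
    stack.pop()
}

/// Offline flavour: translate the LIR into another language without
/// executing anything.
pub fn emit(program: &[Op]) -> Option<String> {
    let mut stack: Vec<String> = Vec::new();
    for op in program {
        match *op {
            Op::Push(value) => stack.push(value.to_string()),
            Op::Add | Op::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                let symbol = if *op == Op::Add { "+" } else { "*" };
                stack.push(format!("({} {} {})", lhs, symbol, rhs));
            }
        }
    }
    stack.pop()
}
```

Both functions walk the same LIR in the same way; the only difference is whether each instruction produces a value now or a piece of output for later.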


  30. Real world examples
    • PHP: Source → Opcode → Execution (JIT compilation: later)
    • Java: Source → AOT compilation → Bytecode → JIT compilation → Execution
    • Rust: Source → IRs → AOT compilation → Execution


  31. Execution engine
    • Must target machine code
    • Must be aware of all the hardware features:
      • x86, ARM, MIPS, SPARC…
      • 32, 64, 128, 256 bits
      • Number of CPUs
      • Kind of RAM
      • SIMD (MMX, SSE, AVX, NEON…)
      • GPU (OpenCL)
    • Extremely difficult to do, and hard to test
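
To give an idea of what "being aware of the hardware" means in practice, here is a small Rust sketch that queries a few of the properties listed above using only the standard library. The `Host` struct and the `detect` and `simd_features` functions are hypothetical names; SIMD detection is only shown for x86 and x86_64, and other architectures simply report an empty list in this sketch.

```rust
use std::mem::size_of;
use std::thread::available_parallelism;

/// A few properties of the machine the program is running on.
#[derive(Debug)]
pub struct Host {
    pub arch: &'static str,
    pub pointer_bits: usize,
    pub cpus: usize,
    pub simd: Vec<&'static str>,
}

/// Runtime SIMD detection on x86 and x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if is_x86_feature_detected!("sse2") {
        features.push("sse2");
    }
    if is_x86_feature_detected!("avx2") {
        features.push("avx2");
    }
    features
}

/// Other architectures would need their own detection code.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn simd_features() -> Vec<&'static str> {
    Vec::new()
}

/// Collects the properties an execution engine would start from.
pub fn detect() -> Host {
    Host {
        // Architecture the binary was compiled for, e.g. "x86_64".
        arch: std::env::consts::ARCH,
        // Pointer width in bits: 32 or 64 on common targets.
        pointer_bits: size_of::<usize>() * 8,
        // Falls back to 1 when the amount of parallelism cannot be queried.
        cpus: available_parallelism().map(|n| n.get()).unwrap_or(1),
        simd: simd_features(),
    }
}
```

Even this short list needs per-architecture code paths, which hints at why a complete execution engine is hard to write and to test.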


  32. LLVM
    • A compiler infrastructure:
      • Low-level IR (LLVM IR)
      • Linker
      • Debugger (LLDB)
      • Execution engines
      • Much more…


  33. In Tagua VM
    • LIR = LLVM IR
    • Execution engine = LLVM execution engine
    • JIT engine is currently MCJIT, OrcJIT is coming
    • Debugger = LLDB


  34. Back end
    Lower-level IR → Execution engine → Result


  35. Back end
    LLVM IR → OrcJIT → Result



  38. –Automattic’s creed
    “I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.”


  39. High quality
    • Safety first
    • Unit test suites
    • Integration test suites
    • Documentation test suites
    • Continuously tested on macOS, Ubuntu Linux, and Windows
    • Everything is documented


  40. Developer experience
    • Be 100% compatible with the PHP Language Specification
    • Be ≃ 100% compatible with Zend Engine
    • Modernise PHP development


  41. Tools
    • High performance HTTP server
    • Powerful debugger
    • Advanced analysers: memory, CPU, energy, performance
    • Flamechart, DTrace, perf, call graph…
    • Record and replay


  42. Internet of Things
    • Land PHP on new grounds, like small devices
    • Requires low energy consumption
    • Requires stability
    • Requires compiling the VM for a lot of platforms
    • Can currently target 47 platforms (aarch64, arm, armv7, asmjs, i386, i586, i686, mips, mips64, mips64el, mipsel, powerpc, powerpc64, powerpc64le, s390x, sparc64, wasm32, x86_64)


  43. tagua.io
    1. Organise, manipulate, and share data
    2. Passive tools: security audit, language conformance, advanced language analysis, transpiler, metrics, bots…
    3. Active tools: many visualisations, online debugger, record and replay…
    4. Team-specific tools: boards, custom reports…


  44. Be sustainable
    • Tagua VM is free and open source (BSD)
    • Tagua VM will expose all the features and tools for free
    • tagua.io is the platform to earn money, and to get better interfaces over the tools’ outputs


  45. You
    • We have a ton of ideas
    • We are doing this for PHP, and for you


  46. How to help?
    • Share your fantasies, and your frustrations
    • Please, please, fill in this pad
    • We are preparing 2 fundraising campaigns
    • Build a company behind Tagua VM


  47. –Mark Twain
    “They did not know it was impossible, so they did it”


  48. To serve you
    • Julien Bianchi (@jubianchi)
    • Ivan Enderlin (@mnt_io)
    • Sébastien Houzé (@sebastienhouze)


  49. Thanks!

    @mnt_io
    @taguavm https://github.com/tagua-vm/
    http://tagua.io/
