Tagua VM, a safe PHP virtual machine

This presentation was given at PHPTour'17 (AFUP, Nantes, France), http://event.afup.org/phptournantes2017/programme/?lang=en#2034.

During this talk, Tagua VM was introduced as a new experimental PHP virtual machine.

Recording on YouTube: https://www.youtube.com/watch?v=Ymy8qAEe0kQ (in French).

Ivan Enderlin

May 18, 2017

Transcript

  1. PHP • A multi-paradigm, simple, pragmatic language • Targets the Web • Two major runtimes: • Zend Engine • HHVM
  2. Vulnerabilities • Almost 500 known • Almost 50 with a CVSS score ≥ 9 out of 10, e.g.: • Memory corruptions [1, 2, 3] • Errors in parsers [4, 5, 6, 7, 8] • They result in code execution or denial of service • The same holds for Python or Java: it is not a language problem
  3. Why so many? • Because implementing such a language is hard • Because it connects with a lot of exotic technologies • Because the runtimes are written in C and C++
  4. PHP is good • The language is getting better and better • But the tooling is becoming more important than the language features
  5. Hello, Tagua! Provide safety and high quality by removing large classes of vulnerabilities and thus avoid the cost of dramatic bugs
  6. Hello, Tagua! Provide a set of libraries that will compose the VM and that can be reused outside of the project (like the parser, analysers, extensions, etc.)
  7. Hello, Tagua! 1. Safe and high quality, 2. Modernity, new developer experience, 3. Designed as a set of libraries to be reused • Ambitious, isn’t it?
  8. Hello, Tagua! • Developed with Rust • A language that guarantees memory safety, threads without data races, zero-cost abstractions, a minimal runtime and, as a bonus, efficient C bindings, in addition to being as fast as C • Developed with LLVM • A solid, state-of-the-art, research-backed, widely used collection of modular and reusable compiler and toolchain technologies
  9. Lexical analysis • Text input → lexical analysis → sequence of lexemes • Input: echo ‘hello’, ‘world’; • echo: offset 0, length 4, … • string_single_quoted (‘hello’): offset 5, length 7, … • comma: offset 12, length 1, … • string_single_quoted (‘world’): offset 14, length 7, … • semicolon: offset 21, length 1, …
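The lexeme table on this slide can be reproduced with a small hand-written lexer. This is only a sketch under stated assumptions, not Tagua VM's actual lexer: the names `Kind`, `Lexeme` and `lex` are hypothetical, the input uses plain ASCII quotes, and only the four lexeme kinds shown on the slide are recognised.

```rust
/// Kinds of lexemes shown on the slide (the Rust names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Echo,
    StringSingleQuoted,
    Comma,
    Semicolon,
}

/// A lexeme only records where it sits in the input; it copies no text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lexeme {
    pub kind: Kind,
    pub offset: usize,
    pub length: usize,
}

/// Hypothetical helper: split `input` into lexemes, or return the byte
/// offset of the first character that cannot be recognised.
pub fn lex(input: &str) -> Result<Vec<Lexeme>, usize> {
    let bytes = input.as_bytes();
    let mut lexemes = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let (kind, length) = match bytes[i] {
            // Whitespace separates lexemes and produces none.
            b' ' | b'\t' | b'\n' | b'\r' => {
                i += 1;
                continue;
            }
            b',' => (Kind::Comma, 1),
            b';' => (Kind::Semicolon, 1),
            b'\'' => {
                // Look for the closing quote (escape sequences are ignored here).
                match bytes[i + 1..].iter().position(|&b| b == b'\'') {
                    Some(end) => (Kind::StringSingleQuoted, end + 2),
                    None => return Err(i),
                }
            }
            // Simplification: the keyword is matched as a plain prefix.
            _ if bytes[i..].starts_with(b"echo") => (Kind::Echo, 4),
            _ => return Err(i),
        };

        lexemes.push(Lexeme { kind, offset: i, length });
        i += length;
    }

    Ok(lexemes)
}
```

Calling `lex("echo 'hello', 'world';")` yields five lexemes whose offsets and lengths match the slide. Storing offsets instead of substrings is one simple way to keep the lexer free of copies.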
  10. Syntactic analysis • Lexical analysis → sequence of lexemes + grammar → syntactic analysis • statement: ( echo | … | … ) “;” • echo: “echo” expression+ “,” • expression: string | … | … • string: … • Can the sequence be derived from the grammar?
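The question "can the sequence be derived from the grammar?" can be answered by a recursive-descent recogniser with one function per rule. The sketch below is hand-written for illustration and does not use nom, the parser library Tagua VM is based on. The `Token` type and the function names are hypothetical, the elided alternatives (`…`) are left out, and the `echo` rule is read as a comma-separated list of expressions, which is an assumption about the slide's notation.

```rust
/// Token kinds produced by a previous lexical analysis (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token {
    Echo,
    String,
    Comma,
    Semicolon,
}

/// statement: echo ";"
/// Returns true when the whole sequence can be derived from the grammar.
pub fn statement(tokens: &[Token]) -> bool {
    matches!(echo(tokens), Some([Token::Semicolon]))
}

/// echo: "echo" expression ( "," expression )*
/// Returns the tokens left after the rule, or `None` if it does not apply.
fn echo(tokens: &[Token]) -> Option<&[Token]> {
    let mut rest = match tokens {
        [Token::Echo, rest @ ..] => expression(rest)?,
        _ => return None,
    };
    // Each comma must be followed by another expression.
    while let [Token::Comma, after_comma @ ..] = rest {
        rest = expression(after_comma)?;
    }
    Some(rest)
}

/// expression: string
fn expression(tokens: &[Token]) -> Option<&[Token]> {
    match tokens {
        [Token::String, rest @ ..] => Some(rest),
        _ => None,
    }
}
```

With this sketch, the token sequence for `echo 'hello', 'world';` is accepted, while a sequence with a trailing comma before the semicolon is rejected.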
  11. In Tagua VM • Lexical and syntactic analyses are merged • 1 pass instead of 2 over all lexemes • Zero-copy in the parser, few allocations in the producers • Descriptive and contextual errors • Based on an extremely fast parser, nom • Safe parsing
  12. Parsing in PHP? • Sure, Hoa\Compiler • A grammar description language (PP) • LL(k) compiler-compiler
  13. Intermediate Representation • Text file is useless, a High-level IR (HIR) is required • Abstract Syntax Tree is a form of a HIR • A Middle-level IR (MIR) can be needed • Control Flow Graph is a form of a MIR • Emit a Low-level IR (LIR) for the back end
  14. Pass • Applies a transformation or a check on the IR, e.g.: • type inference • number overflow • data integrity • security vulnerabilities • remove dead branches • inline functions • constant folding • boolean and arithmetic simplifications • loop simplifications • tail recursion • …
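As an illustration of a pass, here is a minimal constant-folding sketch over a toy expression tree. The `Expr` type and the `fold` function are hypothetical and far smaller than a real HIR; the sketch only shows the shape of a pass: take an IR, return a simplified IR. Overflowing operations are deliberately left unfolded, so that a separate check could still report them.

```rust
/// A toy expression tree standing in for a high-level IR (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Constant folding: replace operations on two integer literals by their
/// result, bottom-up. Operations that would overflow are kept as they are.
pub fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Int(a), Expr::Int(b)) => match a.checked_add(b) {
                Some(sum) => Expr::Int(sum),
                None => Expr::Add(Box::new(Expr::Int(a)), Box::new(Expr::Int(b))),
            },
            (lhs, rhs) => Expr::Add(Box::new(lhs), Box::new(rhs)),
        },
        Expr::Mul(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Int(a), Expr::Int(b)) => match a.checked_mul(b) {
                Some(product) => Expr::Int(product),
                None => Expr::Mul(Box::new(Expr::Int(a)), Box::new(Expr::Int(b))),
            },
            (lhs, rhs) => Expr::Mul(Box::new(lhs), Box::new(rhs)),
        },
        // Literals and variables are already as simple as they can be.
        other => other,
    }
}
```

For example, folding the tree for `$x + 2 * 3` gives the tree for `$x + 6`, because only the right-hand side consists of literals.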
  15. In Tagua VM • AST = HIR • CFG + simplified PHP = MIR • Unfold intrinsic constructions (loops, iterators, magic methods…) • Static single assignment • Monomorphisation • etc. • Should simplify transformation to LIR
  16. Simplified PHP • foreach ($iterator as $key => $value) { process($value); } • is unfolded into • $iterator->rewind(); while ($iterator->valid()) { $key = $iterator->key(); $value = $iterator->current(); process($value); $iterator->next(); }
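The unfolding on this slide can be written as a tree rewrite. The sketch below is hypothetical, since the talk itself states that this step is not yet implemented in Tagua VM: the `Stmt` and `Call` types and the `unfold` function are invented for illustration, and statements that the rewrite does not look into are kept as opaque strings.

```rust
/// A method call on a variable, e.g. `$iterator->valid()` (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Call {
    pub receiver: String,
    pub method: String,
}

/// A tiny statement tree, just large enough for the slide's example.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Foreach { iterator: String, key: String, value: String, body: Vec<Stmt> },
    While { condition: Call, body: Vec<Stmt> },
    Assign { target: String, call: Call },
    Expr(Call),
    /// Any statement this sketch does not inspect, kept as source text.
    Opaque(String),
}

fn call(receiver: &str, method: &str) -> Call {
    Call { receiver: receiver.to_string(), method: method.to_string() }
}

/// Rewrite every `foreach` into `rewind()` followed by a `while` loop.
pub fn unfold(stmt: Stmt) -> Vec<Stmt> {
    match stmt {
        Stmt::Foreach { iterator, key, value, body } => {
            let mut loop_body = vec![
                Stmt::Assign { target: key, call: call(&iterator, "key") },
                Stmt::Assign { target: value, call: call(&iterator, "current") },
            ];
            // Nested loops inside the body are unfolded as well.
            loop_body.extend(body.into_iter().flat_map(unfold));
            loop_body.push(Stmt::Expr(call(&iterator, "next")));
            vec![
                Stmt::Expr(call(&iterator, "rewind")),
                Stmt::While { condition: call(&iterator, "valid"), body: loop_body },
            ]
        }
        Stmt::While { condition, body } => vec![Stmt::While {
            condition,
            body: body.into_iter().flat_map(unfold).collect(),
        }],
        other => vec![other],
    }
}
```

After this rewrite, later passes only need to understand `while`, assignments and calls, which is the simplification the slide is after.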
  17. Disclaimer • Not yet implemented • It has drawbacks, such as being costly, but: • It saves time for the MIR to LIR transformation • Passes and optimisations are simpler • It could simplify caching • It is always about balancing the equation
  18. Flavours • Execute the LIR directly, or a transformation of it into native code ‣ Just-in-time compilation (online compiler): interpret the LIR or compile it during the execution • vs • Save or transform the LIR into another language (like native code, wasm, bytecode…) ‣ Ahead-of-time compilation (offline compiler): compile before the execution
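The two flavours can be contrasted on a toy low-level IR. This sketch is an assumption-laden illustration: Tagua VM's LIR is LLVM IR, not the three-instruction stack language used here, and `Instr`, `interpret` and `emit` are hypothetical names. `interpret` executes the instructions directly, while `emit` saves them in a textual form that could be executed later.

```rust
/// A toy low-level IR for a stack machine (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    Push(i64),
    Add,
    Mul,
}

/// Execute the program directly. Returns `None` on stack underflow,
/// arithmetic overflow, or if the program does not leave exactly one value.
pub fn interpret(program: &[Instr]) -> Option<i64> {
    let mut stack = Vec::new();
    for instr in program {
        match *instr {
            Instr::Push(value) => stack.push(value),
            Instr::Add => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs.checked_add(rhs)?);
            }
            Instr::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs.checked_mul(rhs)?);
            }
        }
    }
    if stack.len() == 1 { stack.pop() } else { None }
}

/// Transform the program into another representation (here, plain text)
/// without executing it, so that it can be stored and run later.
pub fn emit(program: &[Instr]) -> String {
    program
        .iter()
        .map(|instr| match instr {
            Instr::Push(value) => format!("push {}", value),
            Instr::Add => "add".to_string(),
            Instr::Mul => "mul".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

For the program `[Push(2), Push(3), Mul, Push(1), Add]`, `interpret` computes `2 * 3 + 1`, whereas `emit` produces one textual instruction per line and computes nothing.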
  19. Real world examples • PHP: Source → Opcode → Execution (JIT compilation) • Java: Source → Bytecode (AOT compilation) → Execution (JIT compilation) • Rust: Source → IRs (AOT compilation) → Execution later
  20. Execution engine • Must target the machine code • Must be aware of all the hardware features • x86, ARM, MIPS, SPARC… • 32, 64, 128, 256 bits • Number of CPUs • Kind of RAM • SIMD (MMX, SSE, AVX, NEON…) • GPU (OpenCL) • Extremely difficult to do, and hard to test
  21. LLVM • A compiler infrastructure: • Low-level IR (LLVM IR) • Linker • Debugger (LLDB) • Execution engines • Much more…
  22. In Tagua VM • LIR = LLVM IR • Execution engine = LLVM execution engine • JIT engine is currently MCJIT, OrcJIT is coming • Debugger = LLDB
  23. –Automattic’s creed “I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.”
  24. High quality • Safety first • Unit test suites • Integration test suites • Documentation test suites • Continuously tested on macOS, Linux Ubuntu, and Windows • Everything is documented
  25. Developer experience • Be 100% compatible with the PHP Language Specification • Be ≃ 100% compatible with Zend Engine • Modernise PHP development
  26. Tools • High performance HTTP server • Powerful debugger • Advanced analysers: Memory, CPU, energy, performance • Flamechart, DTrace, perf, call graph… • Record and replay
  27. Internet of Things • Land PHP on new grounds, like on small devices • Requires low energy consumption • Requires stability • Requires compiling the VM for a lot of platforms • Can currently target 47 platforms (aarch64, arm, armv7, asmjs, i386, i586, i686, mips, mips64, mips64el, mipsel, powerpc, powerpc64, powerpc64le, s390x, sparc64, wasm32, x86_64)
  28. tagua.io 1. Organise, manipulate, and share data 2. Passive tools: Security audit, language conformance, advanced language analysis, transpiler, metrics, bots… 3. Active tools: Many visualisations, online debugger, record and replay… 4. Team specific tools: Boards, custom reports…
  29. Be sustainable • Tagua VM is free, and open source (BSD) • Tagua VM will expose all the features, and tools for free • tagua.io is the platform to earn money, and to get better interfaces over tools’ outputs
  30. You • We have a ton of ideas • We are doing this for PHP, and for you
  31. How to help? • Share your fantasies, and your frustrations • Please, please, fill in this pad • We are preparing 2 fundraising rounds • Build a company behind Tagua VM