From PHP to Machine Code - PHP UK conference 2016

What exactly happens when you run `php example.php`? I wanted to answer this, so I decided to build my own PHP interpreter. Let me tell you what I learned and what fancy words like opcodes and bytecode cache mean.

With two main rivals now, standard PHP and HHVM from Facebook, there are plenty of choices for running your code fast. But both are complicated projects, which makes their inner workings hard to understand. Rather than relying on them to magically do the right thing, learn the principles behind programming languages.

Juozas Kaziukėnas

February 19, 2016

Transcript

  1. FROM PHP TO MACHINE CODE

  2. hello my name is @JUOKAZ

  3. $> php helloworld.php

  4. $> g++ hello.c -o hello
    $> ./hello

  5. COMPILERS TRANSLATE SOURCE CODE TO EXECUTABLE INSTRUCTIONS

  6. PHP DOES ALL THIS AT RUNTIME

  7. None
  8. EVERYONE SHOULD WRITE A COMPILER ONCE AND THEN NEVER USE IT
  9. PyHP https://github.com/juokaz/pyhp ALPHA

  10. Python + PHP

  11. PYHP
    • PHP interpreter capable of running “any” PHP program*
    • Written in Python
    • Very fast
    * Supports most of the basic PHP, except objects
  12. PHP 6
    $> pyhp unicode.php
    䩚-Ղ
    $> php unicode.php
    ?-?

    <?php
    $a = "䩚Ղ"; // Tokyo
    print $a[0] . '-' . $a[1];
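One likely reading of the slide above, assuming the script is saved as UTF-8: `php` indexes the string byte by byte, so `$a[0]` and `$a[1]` are fragments of a multi-byte character, while PyHP indexes by code point. The Python sketch below reproduces both behaviours. The sample string and the function names are chosen for illustration and are not taken from the talk.

```python
# Illustrative only: contrasts code-point indexing with byte indexing.
text = "東京"                      # sample two-character string (an assumption)
encoded = text.encode("utf-8")     # each of these characters takes 3 bytes


def join_by_code_point(s):
    """Index by character, the way a Unicode-aware string type does."""
    return s[0] + "-" + s[1]


def join_by_byte(raw):
    """Index by byte; the fragments are not valid UTF-8 on their own."""
    broken = raw[0:1] + b"-" + raw[1:2]
    # Invalid fragments are shown with the U+FFFD replacement character.
    return broken.decode("utf-8", errors="replace")


by_code_point = join_by_code_point(text)
by_byte = join_by_byte(encoded)
```

Byte indexing returns two broken fragments around the dash, which a terminal typically renders as question marks or replacement characters.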
  13. None
  14. EXECUTION LIFECYCLE
    Parse → Compile to Opcodes → Execute

  15. EXECUTION LIFECYCLE
    Parse → Compile to Opcodes → Execute
    Cached using opcode-cache

  16. PARSE (TOKENIZE)
    • Parse the source code into labels/tokens
    • Parser uses a grammar file defining the structure of programs (Zend/zend_language_parser.y)
    • Index all variables, functions for faster lookup
    • Turn tokens into a tree structure called AST (in PHP since 7.0)
  17. PYHP GRAMMAR
    expression : <variable> | <literal> ;
    assignmentexpression : expression >assignmentoperator< assignmentexpression
                         | <expression> ;
    assignmentoperator : "=" | "\*=" | "\/=" | "\%=" | "\+=" | "\-=" | "<<=" | ">>=" | ">>>=" | "&=" | "^=" | "\|=" | ".=" ;
    ifstatement : ["if"] ["("] comparisonexpression [")"] statement ["else"] statement
                | ["if"] ["("] comparisonexpression [")"] statement ;
    statement : <block>
              | <assignmentexpression> [";"]
              | <ifstatement>
              | <returnstatement> [";"] ;
  18. token_get_all() PHP function

    <?php

    $a = 'Hello world';

    print $a;

    T_OPEN_TAG, <?php\n, 1
    T_WHITESPACE, \n, 2
    T_VARIABLE, $a, 3
    T_WHITESPACE, , 3
    =
    T_WHITESPACE, , 3
    T_CONSTANT_ENCAPSED_STRING, "Hello world", 3
    ;
    T_WHITESPACE, \n\n, 3
    T_PRINT, print, 5
    T_WHITESPACE, , 5
    T_VARIABLE, $a, 5
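To make the token stream above less magical, here is a minimal regex tokenizer in Python. It is a sketch that only knows the handful of tokens used in this example. The token names are borrowed from PHP, but the regular expressions and the `tokenize` function are assumptions made for illustration, not how Zend or PyHP tokenize.

```python
import re

# (name, pattern) pairs; earlier entries win when several could match.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php\s"),
    ("T_WHITESPACE", r"\s+"),
    ("T_VARIABLE", r"\$[A-Za-z_]\w*"),
    ("T_CONSTANT_ENCAPSED_STRING", r"'[^']*'|\"[^\"]*\""),
    ("T_PRINT", r"print\b"),
    ("CHAR", r"[=;]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (name, text, line) tuples; single characters stay plain strings."""
    tokens = []
    line = 1
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} on line {line}")
        kind, text = match.lastgroup, match.group()
        tokens.append(text if kind == "CHAR" else (kind, text, line))
        line += text.count("\n")  # keep the line counter in sync with the source
        pos = match.end()
    return tokens


source = "<?php\n\n$a = 'Hello world';\n\nprint $a;"
tokens = tokenize(source)
```

Like `token_get_all()`, the sketch returns single-character tokens such as `=` and `;` as bare strings and everything else as a name, text and line number.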
  19. T_CONSTANT_ENCAPSED_STRING, "Hello world", 3
    ;
    T_WHITESPACE, \n\n, 3
    T_PRINT, print, 5
    T_WHITESPACE, , 5
    T_VARIABLE, $a, 5
    ;

    $ast = [
        Stmt_Assign(
            Variable(0, '$a'),
            Scalar_String('Hello World'),
        ),
        Stmt_Echo([
            Variable(0, '$a')
        ]),
    ];
  20. Interesting fact #1: NAMESPACES ARE GONE AT THIS STAGE, FULLY QUALIFIED CLASS NAMES ARE RESOLVED
  21. Interesting fact #2: TRAITS ARE LITERALLY COPIED AND PASTED INTO CLASSES WHICH USE THEM
  22. COMPILE TO OPCODES
    • Process tokens into a flat list of opcodes
    • An opcode represents one operation in the VM, comparable to an assembler instruction
    • The result can be cached (“opcode cache”)
    • A collection of opcodes is called bytecode
    • Install the VLD extension to dump them or use http://3v4l.org/
  23. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  4
    compiled vars:  !0 = $a
    line     #* E I O op         fetch  ext  return  operands
    -------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        PRINT                   ~1      !0
             2        FREE                            ~1
             3      > RETURN                          1

    <?php

    $a = 'Hello world';

    print $a;
  24. $bytecode = [
        0 => Assign(0, 'Hello World'),
        1 => Print(0),
        2 => Free(),
        3 => Return(1),
    ];

    filename:       /tmp/example.php
    function name:  (null)
    number of ops:  4
    compiled vars:  !0 = $a
    line     #* E I O op         fetch  ext  return  operands
    -------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        PRINT                   ~1      !0
             2        FREE                            ~1
             3      > RETURN                          1
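The compile step can be sketched as a walk over the AST that appends opcodes to a flat list. The Python below is a minimal illustration. The node classes and opcode names are hypothetical, and it emits stack-style opcodes (push a value, then consume it) where the Zend dump above uses temporaries such as `~1`.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str


@dataclass
class String:
    value: str


@dataclass
class Assign:
    target: Variable
    value: object


@dataclass
class Print:
    expr: object


def compile_expr(node, code):
    """Emit an opcode that leaves the expression's value on the stack."""
    if isinstance(node, String):
        code.append(("LOAD_CONST", node.value))
    elif isinstance(node, Variable):
        code.append(("LOAD_VAR", node.name))
    else:
        raise TypeError(f"cannot compile expression {node!r}")


def compile_program(statements):
    """Flatten a list of statement nodes into a list of (opcode, argument) pairs."""
    code = []
    for stmt in statements:
        if isinstance(stmt, Assign):
            compile_expr(stmt.value, code)
            code.append(("ASSIGN", stmt.target.name))
        elif isinstance(stmt, Print):
            compile_expr(stmt.expr, code)
            code.append(("PRINT", None))
        else:
            raise TypeError(f"cannot compile statement {stmt!r}")
    # Every script ends with an implicit "return 1", as in the dump above.
    code.append(("LOAD_CONST", 1))
    code.append(("RETURN", None))
    return code


ast = [Assign(Variable("$a"), String("Hello world")), Print(Variable("$a"))]
bytecode = compile_program(ast)
```

The tree is gone after this step: what remains is a list that can be executed with a simple loop or stored in a cache.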
  25. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  9
    compiled vars:  !0 = $a
    line     #* E I O op         fetch  ext  return  operands
    ---------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        IS_EQUAL                ~1      !0, 'Wrong'
             2      > JMPZ                            ~1, ->6
       6     3    >   PRINT                   ~2      'ERROR'
             4        FREE                            ~2
       7     5      > JMP                             ->8
       8     6    >   PRINT                   ~3      !0
             7        FREE                            ~3
       9     8    > > RETURN                          1

    <?php

    $a = 'Hello world';

    if ($a == 'Wrong') {
        print 'ERROR';
    } else {
        print $a;
    }
  26. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  9
    compiled vars:  !0 = $a
    line     #* E I O op         fetch  ext  return  operands
    ---------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        IS_EQUAL                ~1      !0, 'Wrong'          If statement
             2      > JMPZ                            ~1, ->6
       6     3    >   PRINT                   ~2      'ERROR'              True branch
             4        FREE                            ~2
       7     5      > JMP                             ->8
       8     6    >   PRINT                   ~3      !0                   Else branch
             7        FREE                            ~3
       9     8    > > RETURN                          1

    <?php

    $a = 'Hello world';

    if ($a == 'Wrong') {
        print 'ERROR';
    } else {
        print $a;
    }
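The jump targets in the dump (`->6`, `->8`) are only known once the branches have been laid out, so a compiler usually emits the jump first and fills in the target afterwards. The Python sketch below shows that backpatching idea for an if/else. The `emit_if` helper and the list-based opcode format are assumptions for illustration, not Zend's or PyHP's actual code.

```python
def emit_if(code, condition_ops, then_ops, else_ops):
    """Append an if/else to `code` as a flat sequence with JMPZ and JMP."""
    code.extend(condition_ops)

    jmpz_at = len(code)
    code.append(["JMPZ", None])      # target unknown until the true branch is emitted

    code.extend(then_ops)
    jmp_at = len(code)
    code.append(["JMP", None])       # skips the else branch; patched below

    code[jmpz_at][1] = len(code)     # false condition: jump to the else branch
    code.extend(else_ops)
    code[jmp_at][1] = len(code)      # end of true branch: jump past the else branch


code = [["ASSIGN", "$a", "Hello world"]]
emit_if(
    code,
    condition_ops=[["IS_EQUAL", "$a", "Wrong"]],
    then_ops=[["PRINT", "ERROR"], ["FREE"]],
    else_ops=[["PRINT", "$a"], ["FREE"]],
)
code.append(["RETURN", 1])
```

With these inputs the patched targets come out as 6 and 8, matching the positions in the VLD output above.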
  27. EXECUTE (ZEND ENGINE)
    • Iterate over a list of opcodes
    • Each consumes/emits values, or jumps into a position
    • Values are stored in a stack or similar structure
    • Calling a function and running a script is almost the same
    • Zend Engine handles system level/memory management
  28. Bytecode <=> Machine code
    Zend VM <=> Linux

  29. $ast = [
        Stmt_Assign(
            Variable(0, '$a'),
            Scalar_String('Hello World'),
        ),
        Stmt_Echo([
            Variable(0, '$a')
        ]),
    ];
    Parse PHP file

    $bytecode = [
        0 => Assign(0, 'Hello World'),
        1 => Print(0),
        2 => Free(),
        3 => Return(1),
    ];
    Compile to bytecode

    Execute!
  30. EXECUTION LOOP
    function run(array $bytecode, Frame $frame) {
        $pc = 0; // program counter: index of the next opcode
        while ($pc < count($bytecode)) {
            $op = $bytecode[$pc];
            $result = $op->handle($frame);
            // Only a RETURN opcode yields a value, which ends this run
            if (isset($result)) {
                return $result;
            }
            if ($op instanceof Jump) {
                $pc = $op->jump();
            } else {
                $pc++;
            }
        }
        die('Bytecode missing a RETURN statement');
    }
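The same loop can be written as a small runnable program. The Python sketch below is a toy stack machine, not PyHP's or Zend's implementation. The opcode names, the `Frame` class and the tuple format are assumptions, and `PRINT` collects output in a list so the result is easy to inspect.

```python
class Frame:
    """Local variables, operand stack and captured output for one run."""

    def __init__(self):
        self.variables = {}
        self.stack = []
        self.output = []


def run(bytecode, frame):
    pc = 0  # program counter: index of the next opcode
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        pc += 1
        if op == "LOAD_CONST":
            frame.stack.append(arg)
        elif op == "LOAD_VAR":
            frame.stack.append(frame.variables[arg])
        elif op == "ASSIGN":
            frame.variables[arg] = frame.stack.pop()
        elif op == "IS_EQUAL":
            right = frame.stack.pop()
            left = frame.stack.pop()
            frame.stack.append(left == right)
        elif op == "JMPZ":          # jump only when the condition is false
            if not frame.stack.pop():
                pc = arg
        elif op == "JMP":           # unconditional jump
            pc = arg
        elif op == "PRINT":
            frame.output.append(str(frame.stack.pop()))
        elif op == "RETURN":
            return frame.stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("Bytecode missing a RETURN statement")


# $a = 'Hello world'; if ($a == 'Wrong') { print 'ERROR'; } else { print $a; }
program = [
    ("LOAD_CONST", "Hello world"), ("ASSIGN", "$a"),
    ("LOAD_VAR", "$a"), ("LOAD_CONST", "Wrong"), ("IS_EQUAL", None),
    ("JMPZ", 9),
    ("LOAD_CONST", "ERROR"), ("PRINT", None), ("JMP", 11),
    ("LOAD_VAR", "$a"), ("PRINT", None),
    ("LOAD_CONST", 1), ("RETURN", None),
]
frame = Frame()
result = run(program, frame)
```

Running it takes the else branch: the comparison pushes `False`, `JMPZ` moves the program counter to index 9, and the script's own value is printed.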
  31. OPCODES
    class Assign {
        function __construct($variable) {
            $this->name = $variable;
        }
        // Pop the top of the stack and store it in the named variable
        function handle($frame) {
            $value = $frame->popValue();
            $frame->assignValue($this->name, $value);
        }
    }
    // Named ReturnOp here because `return` is a reserved word in PHP
    class ReturnOp {
        function handle($frame) {
            $value = $frame->popValue();
            return $value;
        }
    }
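  32. “CALL” OPCODE
    class Call {
        // Assumption: the interpreter is passed in; the slide does not show where $interpreter comes from
        function __construct($interpreter, $function) {
            $this->interpreter = $interpreter;
            $this->func = $function;
        }
        function handle($frame) {
            $arguments = $frame->popValue();
            // Every call gets its own frame, linked to the caller's frame
            $new_frame = new Frame($frame);
            $new_frame->setArguments($arguments);
            $result = $this->interpreter->run($this->func, $new_frame);
            $frame->pushValue($result);
        }
    }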
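A runnable Python sketch of the same idea: a call is just another invocation of the execution loop with a fresh set of variables. The function table, the opcode names and the `(name, argument count)` operand of `CALL` are assumptions made for this example.

```python
def run(functions, name, arguments):
    """Execute the named function's bytecode in a new frame and return its result."""
    params, bytecode = functions[name]
    variables = dict(zip(params, arguments))  # the new frame's local variables
    stack = []
    pc = 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        pc += 1
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_VAR":
            stack.append(variables[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "CALL":
            callee, argc = arg
            # Arguments were pushed left to right, so popping reverses them.
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(run(functions, callee, args))
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("Bytecode missing a RETURN statement")


functions = {
    # function add_one($n) { return $n + 1; }
    "add_one": (["$n"], [("LOAD_VAR", "$n"), ("LOAD_CONST", 1), ("ADD", None), ("RETURN", None)]),
    # the top-level script: return add_one(41);
    "main": ([], [("LOAD_CONST", 41), ("CALL", ("add_one", 1)), ("RETURN", None)]),
}
answer = run(functions, "main", [])
```

The top-level script is stored as just another entry in the function table, which illustrates why calling a function and running a script are almost the same thing.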
  33. The parser reads the source code, which is then compiled into virtual machine code known as bytecode. The bytecode is executed by a virtual machine, which runs low-level instructions for every opcode.
  34. None
  35. PHP IS SLOW?

  36. SPECIALIZED CODE IS FAST

  37. SPECIALIZED CODE IS FAST
    • C is the ultimate specialized code
    • PHP is slower than C because it has to support dynamic code
    • eval() is the king of dynamic code
    • Strict typing makes life easier for compilers
  38. AUTOMATIC SPECIALIZATION - JIT

  39. JIT
    • Just-in-time compiler (standard PHP is an AOT, ahead-of-time, compiler; HHVM is a JIT)
    • Slower to start, likely very fast after N>1000 executions
    • Compiles code to machine code on execution, optimized for the host platform
    • Re-compiles “hot” code based on runtime information
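The bullet points above can be illustrated with a toy that does not generate any machine code. It only shows the bookkeeping: count executions, switch to a specialized path once the code is hot, and guard that path with a type check so other inputs still fall back to the generic one. All names and the default threshold are assumptions for this sketch.

```python
def generic_add(a, b):
    """Slow path: inspects types on every call and coerces numeric strings."""
    left = int(a) if isinstance(a, str) else a
    right = int(b) if isinstance(b, str) else b
    return left + right


class JitStub:
    """Counts calls and enables an int-only fast path once the code is hot."""

    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.calls = 0          # calls that went through the generic path
        self.compiled = False   # stands in for "machine code has been emitted"

    def add(self, a, b):
        # Guard: the specialized path is only valid for the types seen so far.
        if self.compiled and type(a) is int and type(b) is int:
            return a + b
        self.calls += 1
        if self.calls > self.threshold:
            self.compiled = True
        return generic_add(a, b)


stub = JitStub(threshold=3)
warmup = [stub.add(1, 2) for _ in range(3)]   # still below the threshold
compiled_after_warmup = stub.compiled
stub.add(1, 2)                                # fourth call crosses the threshold
fast_result = stub.add(2, 3)                  # takes the specialized path
fallback_result = stub.add("2", 3)            # guard fails, generic path again
```

A real JIT emits machine code for the hot path and discards it when a guard fails; the flag and the `if` here only stand in for that machinery.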
  40. CAN THIS BE OPTIMIZED?
    function simple() {
        $a = 0;
        for ($i = 0; $i < 1000000; $i++)
            $a++;

        $thisisanotherlongname = 0;
        for ($thisisalongname = 0; $thisisalongname < 1000000; $thisisalongname++)
            $thisisanotherlongname++;
    }
    simple();
  41. HOT LOOPS
    function simple() {
        $a = 0;
        for ($i = 0; $i < 1000000; $i++)
            $a++;

        $thisisanotherlongname = 0;
        for ($thisisalongname = 0; $thisisalongname < 1000000; $thisisalongname++)
            $thisisanotherlongname++;
    }
    simple();

    PLUS: all variables are integers, no need to check variable type!
  42. PyHP

  43. PYHP
    • Uses the PyPy/RPython technology stack
    • JIT support
    • Unit tested
    • Compiles in 10min
  44. WHY?
    • PHP is written in C, hard to play around with
    • Hard to learn by looking at existing interpreters
    • Prototype any new feature in a matter of minutes
  45. THINGS TO TRY
    • PHPPHP - A PHP VM implementation in PHP
      https://github.com/ircmaxell/PHPPHP
    • PyHP - A PHP VM implementation in Python
      https://github.com/juokaz/pyhp
    • HHVM - High-performance PHP interpreter with JIT
      http://hhvm.com/
  46. None
  47. WHAT’S NEXT?

  48. PyHP.JS

  49. (PYTHON + JAVASCRIPT) + PHP https://github.com/juokaz/pyhp.js

  50. None
  51. https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript

  52. QUESTIONS?

  53. THANKS! Juozas Kaziukėnas @juokaz