From PHP to Machine Code - PHPCon Japan 2015

What exactly happens when you run php example.php? I wanted to answer this, so I decided to build my own PHP interpreter. Let me tell you what I learned and what fancy words like opcodes and bytecode cache mean.

With two main rivals now, standard PHP and HHVM from Facebook, there are plenty of choices for running your code fast. But both are complicated projects, which makes their inner workings hard to understand. Rather than relying on them to magically do the right thing, understand the principles of programming languages.

Juozas Kaziukėnas

October 03, 2015

Transcript

  1. FROM PHP TO MACHINE CODE

  2. hello my name is @JUOKAZ

  3. Joe + Japan

  4. $> php helloworld.php

  5. $> g++ hello.c -o hello
    $> ./hello

  6. None
  7. EVERYONE SHOULD WRITE A COMPILER ONCE AND THEN NEVER USE IT
  8. PyHP https://github.com/juokaz/pyhp ALPHA

  9. Python + PHP

  10. PYHP
    • PHP interpreter capable of running “any” PHP program*
    • Written in Python
    • Very fast
    * Supports most of the basic PHP, except objects
  11. PHP 6
    $> ./pyhp unicode.php
    䩚-Ղ
    $> php unicode.php
    ?-?

    <?php
    $a = "䩚Ղ";
    print $a[0] . '-' . $a[1] . "\n";
  12. None
  13. EXECUTION LIFECYCLE Parse Compile to Execute

  14. EXECUTION LIFECYCLE Parse Compile to Execute Cached using opcode-cache

  15. PARSE (TOKENIZE)
    • Parse the source code into labels/tokens
    • Parser uses a grammar file defining the structure of programs (Zend/zend_language_parser.y)
    • Index all variables, functions for faster lookup
    • Turn tokens into a tree structure called AST (in PHP since 7.0)
  16. PYHP GRAMMAR
    expression : <variable>
               | <literal> ;
    assignmentexpression : expression >assignmentoperator< assignmentexpression
                         | <expression> ;
    assignmentoperator : "=" | "\*=" | "\/=" | "\%=" | "\+=" | "\-=" | "<<=" | ">>=" | ">>>=" | "&=" | "^=" | "\|=" | ".=" ;
    ifstatement : ["if"] ["("] comparisonexpression [")"] statement ["else"] statement
                | ["if"] ["("] comparisonexpression [")"] statement ;
    statement : <block>
              | <assignmentexpression> [";"]
              | <ifstatement>
              | <returnstatement> [";"] ;
  17. T_OPEN_TAG, <?php\n, 1
    T_WHITESPACE, \n, 2
    T_VARIABLE, $a, 3
    T_WHITESPACE, , 3
    =
    T_WHITESPACE, , 3
    T_CONSTANT_ENCAPSED_STRING, "Hello world", 3
    ;
    T_WHITESPACE, \n\n, 3
    T_PRINT, print, 5
    T_WHITESPACE, , 5
    T_VARIABLE, $a, 5

    <?php

    $a = 'Hello world';

    print $a;

    token_get_all() PHP function
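A token stream like the one above can be approximated with a handful of regular expressions. The following is a minimal sketch in Python (the language PyHP is written in) for a tiny subset of PHP. The token names mirror PHP's, but `TOKEN_SPEC`, `CHAR`, and `tokenize` are hypothetical names for illustration and are not taken from PyHP or the Zend lexer.

```python
import re

# Hypothetical token table for a tiny PHP subset. Order matters: the first
# alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php\s"),
    ("T_WHITESPACE", r"\s+"),
    ("T_VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("T_CONSTANT_ENCAPSED_STRING", r"'[^']*'"),
    ("T_PRINT", r"print\b"),
    ("CHAR", r"[=;]"),  # single-character tokens such as "=" and ";"
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(source):
    """Return (token name, text, line number) triples for the source string."""
    tokens, pos, line = [], 0, 1
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected %r on line %d" % (source[pos], line))
        tokens.append((match.lastgroup, match.group(), line))
        line += match.group().count("\n")  # track the line of the next token
        pos = match.end()
    return tokens


for token in tokenize("<?php\n\n$a = 'Hello world';\n\nprint $a;"):
    print(token)
```

Unlike the slide, this sketch reports the string token with the single quotes it found in the source and tags `=` and `;` with a made-up `CHAR` name; the line numbers follow the same convention of recording where each token starts.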
  18. T_CONSTANT_ENCAPSED_STRING, "Hello world", 3
    ;
    T_WHITESPACE, \n\n, 3
    T_PRINT, print, 5
    T_WHITESPACE, , 5
    T_VARIABLE, $a, 5
    ;

    $ast = [
        Stmt_Assign(
            Variable(0, '$a'),
            Scalar_String('Hello World'),
        ),
        Stmt_Echo([
            Variable(0, '$a')
        ]),
    ];
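Turning tokens into a tree can be done with a small hand-written recursive-descent parser. The sketch below is in Python and handles only the two statement forms from the example (`$var = expr;` and `print expr;`). The node types `Assign`, `Print`, `Variable`, and `String` are hypothetical stand-ins for the `Stmt_Assign`, `Stmt_Echo`, `Variable`, and `Scalar_String` nodes on the slide, and the `CHAR` token name is an assumption as well.

```python
from collections import namedtuple

# Hypothetical AST node types for illustration.
Assign = namedtuple("Assign", "target value")
Print = namedtuple("Print", "expr")
Variable = namedtuple("Variable", "name")
String = namedtuple("String", "value")


def parse(tokens):
    """Turn (name, text, ...) tokens into a list of statement nodes."""
    # Whitespace and the open tag carry no meaning for the tree.
    toks = [t for t in tokens if t[0] not in ("T_WHITESPACE", "T_OPEN_TAG")]
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else ("EOF", "")

    def expect(name, text=None):
        nonlocal pos
        tok = peek()
        if tok[0] != name or (text is not None and tok[1] != text):
            raise SyntaxError("expected %s, got %r" % (text or name, tok[1]))
        pos += 1
        return tok[1]

    def expression():
        if peek()[0] == "T_VARIABLE":
            return Variable(expect("T_VARIABLE"))
        # Strip the surrounding quotes from the string literal.
        return String(expect("T_CONSTANT_ENCAPSED_STRING")[1:-1])

    ast = []
    while pos < len(toks):
        if peek()[0] == "T_PRINT":
            expect("T_PRINT")
            ast.append(Print(expression()))
        else:
            name = expect("T_VARIABLE")
            expect("CHAR", "=")
            ast.append(Assign(Variable(name), expression()))
        expect("CHAR", ";")
    return ast
```

Real parsers, including PHP's, are usually generated from a grammar file rather than written by hand; the hand-written form is used here only because it fits in a few lines.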
  19. COMPILE TO OPCODES
    • Process tokens into a flat list of opcodes
    • An opcode represents one operation in the VM, comparable to an assembler instruction
    • The result can be cached (“opcode cache”)
    • A collection of opcodes is called bytecode
    • Install the VLD extension to dump them or use http://3v4l.org/
  20. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  4
    compiled vars:  !0 = $a
    line     #* E I O op          fetch  ext  return  operands
    -------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        PRINT                   ~1      !0
             2        FREE                            ~1
             3      > RETURN                          1

    <?php

    $a = 'Hello world';

    print $a;
  21. $bytecode = [
        0 => Assign(0, 'Hello World'),
        1 => Print(0),
        2 => Free(),
        3 => Return(1),
    ];

    filename:       /tmp/example.php
    function name:  (null)
    number of ops:  4
    compiled vars:  !0 = $a
    line     #* E I O op          fetch  ext  return  operands
    -------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        PRINT                   ~1      !0
             2        FREE                            ~1
             3      > RETURN                          1
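The step from tree to flat opcode list can be sketched as a single walk over the statements. The Python example below is an illustration under assumptions: it targets a stack machine (values are pushed, then consumed), the opcode names `LOAD_CONST`, `LOAD_VAR`, `ASSIGN`, `PRINT`, and `RETURN` are made up for the sketch, and the AST is given as plain tuples. It is not how the Zend compiler or PyHP lays out its bytecode.

```python
def compile_program(ast):
    """Flatten a list of statement tuples into (bytecode, variable names)."""
    code = []
    slots = {}  # variable name -> numeric slot, similar to VLD's "!0 = $a"

    def slot(name):
        # Index each variable once so the VM can use a number, not a name.
        return slots.setdefault(name, len(slots))

    def emit_expr(node):
        kind, value = node
        if kind == "string":
            code.append(("LOAD_CONST", value))
        elif kind == "var":
            code.append(("LOAD_VAR", slot(value)))
        else:
            raise ValueError("unknown expression %r" % kind)

    for stmt in ast:
        if stmt[0] == "assign":
            _, name, expr = stmt
            emit_expr(expr)                      # pushes the value
            code.append(("ASSIGN", slot(name)))  # pops it into the slot
        elif stmt[0] == "print":
            emit_expr(stmt[1])
            code.append(("PRINT",))
        else:
            raise ValueError("unknown statement %r" % stmt[0])

    # Every script ends by returning 1, as in the VLD dump.
    code.append(("LOAD_CONST", 1))
    code.append(("RETURN",))
    return code, sorted(slots, key=slots.get)


program = [
    ("assign", "$a", ("string", "Hello world")),
    ("print", ("var", "$a")),
]
bytecode, names = compile_program(program)
for index, instruction in enumerate(bytecode):
    print(index, instruction)
```

Because the result is a plain list of tuples, it could be serialized and reused on the next request without parsing again, which is the idea behind an opcode cache.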
  22. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  9
    compiled vars:  !0 = $a
    line     #* E I O op          fetch  ext  return  operands
    ---------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        IS_EQUAL                ~1      !0, 'Wrong'
             2      > JMPZ                            ~1, ->6
       6     3    >   PRINT                   ~2      'ERROR'
             4        FREE                            ~2
       7     5      > JMP                             ->8
       8     6    >   PRINT                   ~3      !0
             7        FREE                            ~3
       9     8    > > RETURN                          1

    <?php

    $a = 'Hello world';

    if ($a == 'Wrong') {
        print 'ERROR';
    } else {
        print $a;
    }
  23. filename:       /tmp/example.php
    function name:  (null)
    number of ops:  9
    compiled vars:  !0 = $a
    line     #* E I O op          fetch  ext  return  operands
    ---------------------------------------------------------------------------
       3     0  E >   ASSIGN                          !0, 'Hello+world'
       5     1        IS_EQUAL                ~1      !0, 'Wrong'          (If statement)
             2      > JMPZ                            ~1, ->6              (If statement)
       6     3    >   PRINT                   ~2      'ERROR'              (True branch)
             4        FREE                            ~2                   (True branch)
       7     5      > JMP                             ->8                  (True branch)
       8     6    >   PRINT                   ~3      !0                   (Else branch)
             7        FREE                            ~3                   (Else branch)
       9     8    > > RETURN                          1

    <?php

    $a = 'Hello world';

    if ($a == 'Wrong') {
        print 'ERROR';
    } else {
        print $a;
    }
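The jump targets in the dump (`->6` and `->8`) are only known once the compiler has laid out both branches. A common way to handle this is to emit a placeholder and patch it afterwards. The Python sketch below shows that idea; the function name `compile_if` and the tuple-based instruction format are assumptions for illustration, not the Zend compiler's actual code.

```python
def compile_if(cond, then_body, else_body, base=0):
    """Lay out condition, true branch and else branch as one flat list.

    cond, then_body and else_body are already-compiled instruction lists.
    base is the index the first instruction will get in the final bytecode,
    so that the jump targets are absolute positions.
    """
    code = list(cond)
    jmpz_at = len(code)
    code.append(None)              # placeholder: else position not known yet
    code.extend(then_body)
    jmp_at = len(code)
    code.append(None)              # placeholder: end position not known yet
    else_start = base + len(code)
    code.extend(else_body)
    end = base + len(code)
    # Patch the placeholders now that both targets are known.
    code[jmpz_at] = ("JMPZ", else_start)   # condition false: go to else branch
    code[jmp_at] = ("JMP", end)            # true branch done: skip else branch
    return code


# Same shape as the dump: one ASSIGN comes first, so the if starts at op 1.
ops = compile_if(
    cond=[("IS_EQUAL", "!0", "Wrong")],
    then_body=[("PRINT", "ERROR"), ("FREE",)],
    else_body=[("PRINT", "!0"), ("FREE",)],
    base=1,
)
for index, op in enumerate(ops, start=1):
    print(index, op)
```

With `base=1` the patched targets come out as 6 and 8, matching the `JMPZ ~1, ->6` and `JMP ->8` lines in the dump.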
  24. EXECUTE (ZEND ENGINE)
    • Iterate over a list of opcodes
    • Each consumes/emits values, or jumps to a position
    • Values are stored in a stack or similar structure
    • Calling a function and running a script is almost the same
    • Zend Engine handles system level/memory management
  25. Parse PHP file

    $ast = [
        Stmt_Assign(
            Variable(0, '$a'),
            Scalar_String('Hello World'),
        ),
        Stmt_Echo([
            Variable(0, '$a')
        ]),
    ];

    Compile to bytecode

    $bytecode = [
        0 => Assign(0, 'Hello World'),
        1 => Print(0),
        2 => Free(),
        3 => Return(1),
    ];

    Execute!
  26. EXECUTION LOOP
    function run(array $bytecode, Frame $frame) {
        $pc = 0; // program counter: index of the next opcode
        while ($pc < count($bytecode)) {
            $op = $bytecode[$pc];
            $result = $op->handle($frame);
            if (isset($result)) {
                return $result; // only a RETURN opcode yields a value
            }
            if ($op instanceof Jump) {
                $pc = $op->jump();
            } else {
                $pc++;
            }
        }
        die('Bytecode missing a RETURN statement');
    }
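The same loop can be written as a small stack machine. The Python sketch below is an illustration only: the opcode names and the tuple format are assumptions, Python's `==` stands in for PHP's loose comparison, and output is collected in a list instead of being written to stdout. The sample program is a hand-compiled stack-machine version of the if/else script from the earlier slides.

```python
def run(bytecode, variables, out):
    """Execute a flat list of (opcode, *args) tuples and return the result."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(bytecode):
        op, *args = bytecode[pc]
        pc += 1
        if op == "LOAD_CONST":
            stack.append(args[0])
        elif op == "LOAD_VAR":
            stack.append(variables[args[0]])
        elif op == "ASSIGN":
            variables[args[0]] = stack.pop()
        elif op == "IS_EQUAL":
            right, left = stack.pop(), stack.pop()
            stack.append(left == right)  # simplification of PHP's ==
        elif op == "PRINT":
            out.append(str(stack.pop()))
        elif op == "JMPZ":
            if not stack.pop():
                pc = args[0]             # condition false: jump
        elif op == "JMP":
            pc = args[0]
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % op)
    raise RuntimeError("bytecode missing a RETURN instruction")


program = [
    ("LOAD_CONST", "Hello world"),  # 0
    ("ASSIGN", 0),                  # 1  $a = 'Hello world'
    ("LOAD_VAR", 0),                # 2
    ("LOAD_CONST", "Wrong"),        # 3
    ("IS_EQUAL",),                  # 4  $a == 'Wrong'
    ("JMPZ", 9),                    # 5  false: go to else branch
    ("LOAD_CONST", "ERROR"),        # 6
    ("PRINT",),                     # 7
    ("JMP", 11),                    # 8  skip else branch
    ("LOAD_VAR", 0),                # 9
    ("PRINT",),                     # 10
    ("LOAD_CONST", 1),              # 11
    ("RETURN",),                    # 12
]

output = []
result = run(program, {}, output)
print(output, result)
```

Advancing `pc` before dispatching keeps the jump handling simple: a jump just overwrites the counter, and every other opcode falls through to the next instruction.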
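  27. OPCODES
    class Assign {
        function __construct($variable) {
            $this->name = $variable;
        }
        function handle($frame) {
            // Pop the value produced by earlier opcodes and store it.
            $value = $frame->popValue();
            $frame->assignValue($this->name, $value);
        }
    }
    // "Return" is a reserved word in PHP, so the class needs another name to run.
    class ReturnOp {
        function handle($frame) {
            $value = $frame->popValue();
            return $value;
        }
    }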
  28. “CALL” OPCODE
    class Call {
        function __construct($function) {
            $this->func = $function;
        }
        function handle($frame) {
            $arguments = $frame->popValue();
            // Each call gets its own frame, linked to the caller's frame.
            $new_frame = new Frame($frame);
            $new_frame->setArguments($arguments);
            // Reuse the run() loop that also executes the main script.
            $result = run($this->func, $new_frame);
            $frame->pushValue($result);
        }
    }
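The point that calling a function and running a script are almost the same becomes visible once the loop takes a frame as input: a call creates a fresh frame and re-enters the same loop. The Python sketch below illustrates this under assumptions: `Frame`, the opcode names, and the `functions` lookup table are hypothetical and not PyHP's actual classes.

```python
class Frame:
    """Per-call state: an operand stack plus local variable slots."""

    def __init__(self, arguments=()):
        self.stack = []
        self.variables = list(arguments)  # arguments become the first locals


def run(code, frame, functions):
    """Execute code in the given frame; CALL re-enters run() recursively."""
    pc = 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "LOAD_CONST":
            frame.stack.append(args[0])
        elif op == "LOAD_VAR":
            frame.stack.append(frame.variables[args[0]])
        elif op == "ADD":
            right, left = frame.stack.pop(), frame.stack.pop()
            frame.stack.append(left + right)
        elif op == "CALL":
            name, argc = args
            # Move the last argc stack values into a fresh frame.
            split = len(frame.stack) - argc
            arguments = frame.stack[split:]
            del frame.stack[split:]
            result = run(functions[name], Frame(arguments), functions)
            frame.stack.append(result)
        elif op == "RETURN":
            return frame.stack.pop()
        else:
            raise ValueError("unknown opcode %r" % op)
    raise RuntimeError("bytecode missing a RETURN instruction")


# function add($x, $y) { return $x + $y; }
functions = {
    "add": [("LOAD_VAR", 0), ("LOAD_VAR", 1), ("ADD",), ("RETURN",)],
}
# return add(2, 3);
main = [("LOAD_CONST", 2), ("LOAD_CONST", 3), ("CALL", "add", 2), ("RETURN",)]

print(run(main, Frame(), functions))
```

The main script and the function body are both just instruction lists passed to `run`, so the interpreter needs no separate mechanism for function bodies; the host language's call stack doubles as the PHP call stack in this sketch.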
  29. None
  30. PHP IS SLOW?

  31. SPECIALIZED CODE IS FAST

  32. SPECIALIZED CODE IS FAST
    • C is the ultimate specialized code
    • PHP is slower than C because it has to support dynamic code
    • eval() is the king of dynamic code
    • Strict typing makes life easier for compilers
  33. AUTOMATIC SPECIALIZATION - JIT

  34. JIT
    • Just-in-time compiler (Standard PHP is an AOT, ahead-of-time, compiler; HHVM is JIT)
    • Slower to start, likely very fast after N>1000 executions
    • Compiles code to machine code on execution, optimized for the host platform
    • Re-compile “hot” code based on runtime information
  35. CAN THIS BE OPTIMIZED?
    function simple() {
        $a = 0;
        for ($i = 0; $i < 1000000; $i++)
            $a++;

        $thisisanotherlongname = 0;
        for ($thisisalongname = 0; $thisisalongname < 1000000; $thisisalongname++)
            $thisisanotherlongname++;
    }
    simple();
  36. HOT LOOPS
    function simple() {
        $a = 0;
        for ($i = 0; $i < 1000000; $i++)
            $a++;

        $thisisanotherlongname = 0;
        for ($thisisalongname = 0; $thisisalongname < 1000000; $thisisalongname++)
            $thisisanotherlongname++;
    }
    simple();

    PLUS: all variables are integers, no need to check variable type!
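The control flow behind "hot" code and type specialization can be sketched without generating any machine code. In the Python example below, an operation counts how often it runs and which operand types it sees; once it is hot and has only seen integers, it switches to an integer-only fast path protected by a guard, and falls back if the guard ever fails. `HotAdd`, `generic_add`, and the threshold value are assumptions for illustration; a real JIT such as HHVM's or the one RPython generates works on traces or compiled regions and typically hoists such guards out of the loop, and the numeric-string handling here is deliberately simplified.

```python
HOT_THRESHOLD = 1000  # assumed value, echoing the "N>1000" on the JIT slide


def generic_add(a, b):
    """Slow path: inspect operand types on every call, as a dynamic VM must."""
    if type(a) is int and type(b) is int:
        return a + b
    return float(a) + float(b)  # simplified handling of numeric strings


class HotAdd:
    """An addition that specializes itself after enough int-only calls."""

    def __init__(self):
        self.calls = 0
        self.only_ints = True
        self.specialized = False

    def __call__(self, a, b):
        if self.specialized:
            # Guard: the fast path is only valid for the types observed so far.
            if type(a) is int and type(b) is int:
                return a + b
            # Guard failed: drop the specialized version (deoptimize).
            self.specialized = False
            self.only_ints = False
        self.calls += 1
        self.only_ints = self.only_ints and type(a) is int and type(b) is int
        if self.calls >= HOT_THRESHOLD and self.only_ints:
            self.specialized = True  # "recompile" using runtime information
        return generic_add(a, b)


add = HotAdd()
total = 0
for i in range(2000):
    total = add(total, 1)
print(total, add.specialized)
```

In pure Python the fast path is not actually faster; the sketch only shows the decision logic of counting, specializing, guarding, and falling back.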
  37. PyHP

  38. PYHP
    • Uses the PyPy/RPython technology stack
    • JIT support
    • Unit tested
    • Compiles in 10min
  39. WHY?
    • PHP is written in C, hard to play around with
    • Hard to learn by looking at existing interpreters
    • Prototype any new feature in a matter of minutes
  40. THINGS TO TRY
    • PHPPHP - A PHP VM implementation in PHP (https://github.com/ircmaxell/PHPPHP)
    • PyHP - A PHP VM implementation in Python (https://github.com/juokaz/pyhp)
    • HHVM - High-performance PHP interpreter with JIT (http://hhvm.com/)
  41. None
  42. WHAT’S NEXT?

  43. PyHP.JS

  44. (PYTHON + JAVASCRIPT) + PHP https://github.com/juokaz/pyhp.js

  45. None
  46. https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript

  47. QUESTIONS?

  48. THANKS! Juozas Kaziukėnas @juokaz