compiler_-_php010.pdf

Joshua Thijssen

September 08, 2016

Transcript

  1. Joshua Thijssen
    jaytaph
    an introduction to compilers, interpreters and JIT
    From source to code

  2. Computers don't understand:
    http://blog.ruslans.com/2012_11_01_archive.html

  3. 56e9 5200 4641 4c46 5245 0020 0102 0001
    e002 4000 f00b 0009 0012 0002 0000 0000
    0000 0000 0000 ef29 adbe 52de 4641 4c46
    5245 2020 2020 4146 3154 2032 2020 6152
    6666 656c 4f72 3a53 0020 414e 454d 2053
    2020 4144 0054 4900 fa37 c031 d88e c08e
    d08e 00bc fb7c 3ebe e87c 0117 00bb b880
    0001 12bf e800 00a3 00bb b8a4 0013 0ebf
    e800 0097 bffc a400 4abe b97c 000b f357
    5fa6 0d74 c781 0020 ff81 c000 ea76 fde9
    8bff 1c45 55a3 bb7c c000 458b 501a abe8
    5800 c381 0200 8953 d1c3 01eb 8bc3 0097
    5b80 01a9 7500 8107 ffe2 e90f 0003 eac1
    8104 f0fa 890f 72d0 31d4 cdc0 001a 5716
    e87c 0088 ff25 8900 bec3 c000 f789 3e03
    7c55 0ab4 2588 cbfe 0f74 39ac 7cfe be03
    c000 0a3c f074 f1e9 56ff b046 3a0a 7504
    c6f9 0004 e85e 006b fde9 50ff 0ae8 5800
    c381 0200 4f40 f375 89c3 81e5 08ec be00
    0012 d231 f6f7 c2fe 5688 beff 0002 d231
    f6f7 5688 89fe fc46 01b0 6e8a 8afc fe76
    4e8a b2ff b400 cd02 8913 c3ec 022d 3100
    b1c9 f701 05e1 0021 bee8 c3ff 57a1 ba7c
    8405 e2f7 063b 7c57 0375 d488 a340 7c57
    d089 acc3 003c 0a74 0eb4 07bb cd00 e910
    fff1 90c3 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 aa55
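  5.
        mov ax, [di + 0x1A]     ; Here starts FAT clusters (first cluster from the directory entry)
    loadClusters:
        push ax
        call ReadCluster        ; read cluster AX into the buffer at BX (routine not shown)
        pop ax
        add bx, 512             ; Next sector
        push bx
        mov bx, ax              ; FAT12 entries are 12 bits wide, so the
        shr bx, 1               ; byte offset of entry AX is AX * 1.5
        add bx, ax              ; (AX + AX / 2)
        mov dx, [0x8000 + bx]   ; read 16 bits from the FAT buffer at 0x8000
        pop bx
        test ax, 0x01           ; odd or even cluster number?
        jnz oddCluster
    evenCluster:
        and dx, 0x0FFF          ; even: entry is the low 12 bits
        jmp testCluster
    oddCluster:
        shr dx, 4               ; odd: entry is the high 12 bits
    testCluster:
        cmp dx, 0xFF0           ; 0xFF0 and up: reserved, bad or end of chain
        mov ax, dx              ; next cluster (mov leaves the flags untouched)
        jb loadClusters         ; continue while the entry is below 0xFF0
    https://github.com/domcode/rafflers/blob/master/jaytaph-bootsector-asm/raffler.S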
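The cluster-chain walk in the boot sector above can be sketched in Python. This is a minimal sketch, assuming the FAT is already available as a `bytes` object (the assembly reads it from offset 0x8000); the names `fat12_next`, `cluster_chain` and `pack_fat12` are hypothetical helpers for illustration only.

```python
def fat12_next(fat: bytes, cluster: int) -> int:
    """Return the FAT12 entry for `cluster`, i.e. the next cluster in the chain."""
    offset = cluster + cluster // 2               # 12-bit entries: offset = cluster * 1.5
    word = fat[offset] | (fat[offset + 1] << 8)   # little-endian 16-bit read
    if cluster & 1:                               # odd cluster: high 12 bits
        return word >> 4
    return word & 0x0FFF                          # even cluster: low 12 bits


def cluster_chain(fat: bytes, first: int) -> list[int]:
    """Follow the chain from `first` until an entry >= 0xFF0 is found."""
    chain = []
    cluster = first
    while True:
        chain.append(cluster)                     # the assembly calls ReadCluster here
        cluster = fat12_next(fat, cluster)
        if cluster >= 0xFF0:                      # reserved, bad or end-of-chain marker
            return chain


def pack_fat12(entries: list[int]) -> bytes:
    """Build a test FAT: two 12-bit entries are packed into three bytes."""
    if len(entries) % 2:
        entries = entries + [0]
    out = bytearray()
    for even, odd in zip(entries[0::2], entries[1::2]):
        out += bytes([even & 0xFF, (even >> 8) | ((odd & 0x0F) << 4), odd >> 4])
    return bytes(out)


# Clusters 2 -> 3 -> 5 -> end of chain (entries 0 and 1 are reserved).
fat = pack_fat12([0xFF0, 0xFFF, 3, 5, 0, 0xFFF])
print(cluster_chain(fat, 2))  # [2, 3, 5]
```

The `cluster + cluster // 2` expression mirrors the `mov bx, ax / shr bx, 1 / add bx, ax` sequence, and the odd/even branch mirrors `oddCluster` and `evenCluster`.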

  6.
    static void out_string(conn *c, const char *str) {
        size_t len;

        assert(c != NULL);

        if (c->noreply) {
            if (settings.verbose > 1)
                fprintf(stderr, ">%d NOREPLY %s\n", c->sfd, str);
            c->noreply = false;
            conn_set_state(c, conn_new_cmd);
            return;
        }

        if (settings.verbose > 1)
            fprintf(stderr, ">%d %s\n", c->sfd, str);

        /* Nuke a partial output... */
        c->msgcurr = 0;
        c->msgused = 0;
        c->iovused = 0;
        add_msghdr(c);

        len = strlen(str);
        if ((len + 2) > c->wsize) {
            /* ought to be always enough. just fail for simplicity */
            str = "SERVER_ERROR output line too long";
            len = strlen(str);
        }

        memcpy(c->wbuf, str, len);
        memcpy(c->wbuf + len, "\r\n", 2);
        /* ... */
    }
    https://github.com/memcached/memcached/blob/master/memcached.c

  7.
    <?php

    if (! isset($_POST['email'])) {
        die("Need mail");
    }

    $email_to = "[email protected]";
    $email_subject = "subject line";

    // What could possibly go wrong here...
    $email_message = $_POST['email'];

    @mail($email_to, $email_subject, $email_message);

  8. Let's write a language
    (in 5 minutes or less)

  12. example.010, where each line is an operator followed by its operands:
    set a 1
    set b 2
    add a b c
    print c

    A PHP interpreter for it:
    <?php
    $vars = array();

    $lines = file("example.010");

    foreach ($lines as $line) {
        // Split a line into its operator ($line[0]) and operands.
        $line = explode(" ", trim($line));
        switch ($line[0]) {
            case "set" :     // set <var> <value>
                $vars[$line[1]] = $line[2];
                break;
            case "add" :     // add <var1> <var2> <result>
                $vars[$line[3]] = $vars[$line[1]] + $vars[$line[2]];
                break;
            case "print" :   // print <var>
                echo $vars[$line[1]] . PHP_EOL;
                break;
        }
    }

  13. Interpreter

  19. ➡ Runs directly from the source.
    ➡ Platform agnostic.
    ➡ Runtime checks.
    ➡ Slow: every instruction is interpreted over and over again.

  20. PHP is an interpreted language
    (for now)

  21. What if:
    we could convert our
    source code into
    machine code?

  22. Compiler

  27. Pros
    ➡ Compiles source code into machine code.
    ➡ Runs very fast (no "layers" in between).
    ➡ The source code is not needed to run the program.
    ➡ Optimized for the target platform.

  31. Cons
    ➡ Can ONLY run on the architecture it was compiled for (x64, ARM, SPARC, etc.).
    ➡ Any change in the source code requires a recompilation.

  33. Intel 64 and IA-32 Architectures Software Developer's Manual:
    http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

  34. This is why sane people do not write compilers*
    * I'm currently writing a compiler.

  36. Hybrid system

  40. ➡ Source code is compiled into an intermediate code (bytecode).
    ➡ This intermediate code runs in a specialized interpreter.
    ➡ Interpreting it is faster than interpreting the source code directly.
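Applied to the toy language from the interpreter slide, the hybrid approach might look like the following minimal Python sketch: `compile_program` translates each source line once into a tuple of (opcode, operands), and `run` executes that list without looking at the source text again. The opcode numbers and function names are hypothetical and unrelated to PHP's real opcodes.

```python
SET, ADD, PRINT = 0, 1, 2  # hypothetical numeric opcodes
OPCODES = {"set": SET, "add": ADD, "print": PRINT}


def compile_program(source: str) -> list[tuple]:
    """Translate each source line into (opcode, *operands), once."""
    code = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue
        op = OPCODES[parts[0]]
        if op == SET:
            code.append((SET, parts[1], int(parts[2])))  # parse the literal now
        else:
            code.append((op, *parts[1:]))
    return code


def run(code: list[tuple]) -> list[int]:
    """Execute compiled instructions; returns everything that was printed."""
    variables = {}
    output = []
    for instr in code:
        op = instr[0]
        if op == SET:
            variables[instr[1]] = instr[2]
        elif op == ADD:
            variables[instr[3]] = variables[instr[1]] + variables[instr[2]]
        elif op == PRINT:
            output.append(variables[instr[1]])
            print(variables[instr[1]])
    return output


bytecode = compile_program("set a 1\nset b 2\nadd a b c\nprint c\n")
run(bytecode)  # prints 3
```

Splitting strings and converting literals happens once in `compile_program`; if the program were run many times, only the cheaper dispatch loop in `run` would repeat.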

  43. PHP:
    function foo() {
        $i = 1;
        $i = $i + 1;
        return $i;
    }

    line     #* E I O  op        ext  return  operands
    ---------------------------------------------------
       4     0  E >    ASSIGN                 !0, 1
       5     1         ADD            ~2      !0, 1
             2         ASSIGN                 !0, ~2
       6     3       > RETURN                 !0
       7     4*      > RETURN                 null

    Python:
    def foo():
        i = 1
        i = i + 1
        return i

      4     0 LOAD_CONST      1 (1)
            3 STORE_FAST      0 (i)
      5     6 LOAD_FAST       0 (i)
            9 LOAD_CONST      1 (1)
           12 BINARY_ADD
           13 STORE_FAST      0 (i)
      6    16 LOAD_FAST       0 (i)
           19 RETURN_VALUE
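The Python listing can be reproduced with the standard library's `dis` module. The exact opcodes differ between Python versions (the listing above uses the older `BINARY_ADD`; Python 3.11 and later emit `BINARY_OP` instead), so treat the output as illustrative rather than fixed.

```python
import dis


def foo():
    i = 1
    i = i + 1
    return i


# Human-readable listing: source line, offset, opcode name, argument.
dis.dis(foo)

# The same instructions as data, e.g. for inspection or testing.
opnames = [ins.opname for ins in dis.get_instructions(foo)]
print(opnames)
```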

  44. read PHP file into memory → compile PHP to bytecode → run PHP bytecode

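  47. read PHP file into memory → compile PHP to bytecode → OPcache → run PHP bytecode
    (OPcache stores the compiled bytecode in shared memory, so later requests do not have to load and compile the script again.)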

  48. JIT
    (just in time)

  54. ➡ Compiles to native machine code at runtime.
    ➡ Compiles per function/method.
    ➡ Or, compiles per code block.
    ➡ Or, only compiles code that is called multiple times.
    ➡ Or, interprets, compiles in the background, and switches to the compiled code when compilation is finished.
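The "only compile on multiple calls" strategy can be sketched in Python. As an analogy, "compiling" here means translating the toy-language program from the interpreter slide into one Python function via the built-in `compile()`, standing in for native code generation. `HOT_THRESHOLD`, `interpret`, `translate` and `ToyFunction` are hypothetical names, and the sketch assumes every toy variable name is a valid Python identifier.

```python
HOT_THRESHOLD = 3  # hypothetical: number of calls before we compile


def interpret(lines: list[str]) -> int:
    """Slow path: re-split and dispatch every line on every call."""
    env = {}
    result = 0
    for ln in lines:
        op, *args = ln.split()
        if op == "set":
            env[args[0]] = int(args[1])
        elif op == "add":
            env[args[2]] = env[args[0]] + env[args[1]]
        elif op == "print":
            result = env[args[0]]  # return the value instead of printing it
    return result


def translate(lines: list[str]):
    """'JIT' path: generate Python source once and compile it into a function."""
    body = ["def compiled():", "    result = 0"]
    for ln in lines:
        op, *args = ln.split()
        if op == "set":
            body.append(f"    {args[0]} = {int(args[1])}")
        elif op == "add":
            body.append(f"    {args[2]} = {args[0]} + {args[1]}")
        elif op == "print":
            body.append(f"    result = {args[0]}")
    body.append("    return result")
    namespace = {}
    exec(compile("\n".join(body) + "\n", "<jit>", "exec"), namespace)
    return namespace["compiled"]


class ToyFunction:
    """Interprets until the function is 'hot', then switches to compiled code."""

    def __init__(self, source: str):
        self.lines = [ln for ln in source.splitlines() if ln.strip()]
        self.calls = 0
        self.compiled = None

    def __call__(self) -> int:
        self.calls += 1
        if self.compiled is None and self.calls >= HOT_THRESHOLD:
            self.compiled = translate(self.lines)  # compile once, when hot
        if self.compiled is not None:
            return self.compiled()
        return interpret(self.lines)


f = ToyFunction("set a 1\nset b 2\nadd a b c\nprint c\n")
results = [f() for _ in range(5)]
print(results, f.compiled is not None)  # [3, 3, 3, 3, 3] True
```

The first two calls pay the interpretation cost; from the third call on, the translated function runs without re-reading the source lines.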

  55. init();
    $a = 1;
    for ($i = 0; $i != 1000; $i++) {
        // very CPU consuming functions
        $a = $i / 100 * sqrt($i / 163.21) + foobar($i);
        echo $a;
    }

  59. ➡ We don't have to "wait" for compilation.
    ➡ Still runs fast (as native binary code).
    ➡ Can optimize even better than precompilation (it has more context at runtime).

  60. PHP 7

  61. Lexing & parsing → Bytecode compilation → Execution

  62. Lexing & parsing → AST generation → Bytecode compilation → Execution

  63. Lexing & parsing → AST generation → whatever you want to do → Bytecode compilation → Execution
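Python's own toolchain exposes the same stages, which makes the pipeline easy to experiment with. A minimal sketch using only the standard library (`tokenize`, `ast`, `compile`, `exec`); this is Python's pipeline, not PHP's, and the "whatever you want to do" step is simply the AST object sitting between parsing and compilation.

```python
import ast
import io
import tokenize

source = "x = 6 * 7\n"

# 1. Lexing: turn the text into tokens (dropping NEWLINE/ENDMARKER for brevity).
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)  # ['x', '=', '6', '*', '7']

# 2. Parsing / AST generation.
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. "Whatever you want to do" with the tree would happen here.

# 4. Bytecode compilation.
code = compile(tree, "<example>", "exec")

# 5. Execution.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 42
```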

  64. Abstract Syntax Tree
    https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/Abstract_syntax_tree_for_Euclidean_algorithm.svg/400px-Abstract_syntax_tree_for_Euclidean_algorithm.svg.png
    while ($b != 0) {
        if ($a > $b) {
            $a = $a - $b;
        } else {
            $b = $b - $a;
        }
    }
    return $a;
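The Euclidean algorithm on the slide can be turned into an AST with Python's `ast` module (written in Python syntax, since PHP's parser is not available from Python). A minimal sketch that walks the tree, counts the node types, and shows that the tree can still be compiled and run:

```python
import ast

source = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""

tree = ast.parse(source)

# Walk every node in the tree and count each kind of node.
counts = {}
for node in ast.walk(tree):
    name = type(node).__name__
    counts[name] = counts.get(name, 0) + 1

print(counts["While"], counts["If"], counts["Compare"])  # 1 1 2
print(ast.dump(tree, indent=2))  # the full tree; indent= needs Python 3.9+

# The tree is still executable code.
namespace = {}
exec(compile(tree, "<gcd>", "exec"), namespace)
print(namespace["gcd"](48, 18))  # 6
```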

  65. ➡ Make changes to the tree (e.g. remove all else statements)
    ➡ Convert code to older PHP versions or to other languages (transpiling)
    ➡ Analyze code
    ➡ Optimize code
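As an example of changing the tree, here is a minimal Python sketch (using `ast.NodeTransformer` and `ast.unparse`, Python 3.9+) that removes every else branch and turns the tree back into source. The class name `RemoveElse` is hypothetical, and dropping else branches obviously changes the program's behaviour; it only demonstrates rewriting an AST.

```python
import ast

source = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""


class RemoveElse(ast.NodeTransformer):
    """Drop the else branch of every if statement."""

    def visit_If(self, node: ast.If) -> ast.If:
        self.generic_visit(node)  # transform nested statements first
        node.orelse = []          # an empty orelse means "no else branch"
        return node


tree = RemoveElse().visit(ast.parse(source))
tree = ast.fix_missing_locations(tree)
new_source = ast.unparse(tree)
print(new_source)
```

The same visitor pattern is the starting point for analysis (only read nodes) or optimization (replace nodes with cheaper equivalents).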

  66. The future
    ➡ JIT system (other than HHVM)
    ➡ LLVM?
    ➡ Bytecode interchange?
    ➡ PhpPhp (a PHP interpreter written in PHP?)

  67. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

  68. Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl