Introduction into interpreters, compilers and JIT

Joshua Thijssen
May 31, 2016

Transcript

  1. Joshua Thijssen (jaytaph)
    An introduction into compilers, interpreters and JIT
    From source to code

  2. Computers are really dumb.

  3. They do not understand:
    http://blog.ruslans.com/2012_11_01_archive.html

  4. 56e9 5200 4641 4c46 5245 0020 0102 0001
    e002 4000 f00b 0009 0012 0002 0000 0000
    0000 0000 0000 ef29 adbe 52de 4641 4c46
    5245 2020 2020 4146 3154 2032 2020 6152
    6666 656c 4f72 3a53 0020 414e 454d 2053
    2020 4144 0054 4900 fa37 c031 d88e c08e
    d08e 00bc fb7c 3ebe e87c 0117 00bb b880
    0001 12bf e800 00a3 00bb b8a4 0013 0ebf
    e800 0097 bffc a400 4abe b97c 000b f357
    5fa6 0d74 c781 0020 ff81 c000 ea76 fde9
    8bff 1c45 55a3 bb7c c000 458b 501a abe8
    5800 c381 0200 8953 d1c3 01eb 8bc3 0097
    5b80 01a9 7500 8107 ffe2 e90f 0003 eac1
    8104 f0fa 890f 72d0 31d4 cdc0 001a 5716
    e87c 0088 ff25 8900 bec3 c000 f789 3e03
    7c55 0ab4 2588 cbfe 0f74 39ac 7cfe be03
    c000 0a3c f074 f1e9 56ff b046 3a0a 7504
    c6f9 0004 e85e 006b fde9 50ff 0ae8 5800
    c381 0200 4f40 f375 89c3 81e5 08ec be00
    0012 d231 f6f7 c2fe 5688 beff 0002 d231
    f6f7 5688 89fe fc46 01b0 6e8a 8afc fe76
    4e8a b2ff b400 cd02 8913 c3ec 022d 3100
    b1c9 f701 05e1 0021 bee8 c3ff 57a1 ba7c
    8405 e2f7 063b 7c57 0375 d488 a340 7c57
    d089 acc3 003c 0a74 0eb4 07bb cd00 e910
    fff1 90c3 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 9090
    9090 9090 9090 9090 9090 9090 9090 aa55

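  6. https://github.com/domcode/rafflers/blob/master/jaytaph-bootsector-asm/raffler.S
        mov ax, [di + 0x1A]     ; first cluster number, taken from the directory entry
    loadClusters:
        push ax
        call ReadCluster
        pop ax
        add bx, 512             ; next sector
        push bx
        mov bx, ax
        shr bx, 1
        add bx, ax              ; bx = cluster * 1.5: byte offset of the 12-bit FAT entry
        mov dx, [0x8000 + bx]   ; read 16 bits that contain the entry
        pop bx
        test ax, 0x01
        jnz oddCluster
    evenCluster:
        and dx, 0x0FFF          ; even cluster: keep the low 12 bits
        jmp testCluster
    oddCluster:
        shr dx, 4               ; odd cluster: keep the high 12 bits
    testCluster:
        cmp dx, 0xFF0           ; 0xFF0 and above are reserved / end-of-chain values
        mov ax, dx              ; next cluster (mov leaves the flags untouched)
        jb loadClusters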

  7. https://github.com/jaytaph/saffire/blob/develop/src/components/general/hash/chained.c
        /**
         * Return bucket for specified key
         */
        static t_hash_table_bucket *find_bucket(t_hash_table *ht, t_hash_key *key) {
            // Locate the hash value in the bucket list.
            hash_t hash_value = ht_hash(ht, key);
            hash_t hash_value_capped = hash_value % ht->bucket_count;
            if (ht->bucket_list[hash_value_capped] == NULL) {
                // Not found
                return NULL;
            }

            // Found bucket. Try and find the key. Traverse linked list if needed.
            int found = 0;
            t_hash_table_bucket *htb = ht->bucket_list[hash_value_capped];


            // Cache the key hashval when we are dealing with objects.
            char *key_hash_val = NULL;
            if (key->type == HASH_KEY_OBJ) {
                key_hash_val = object_get_hash((t_object *)(key->val.o));
            }

            while (htb) {
                switch (key->type) {
                    case HASH_KEY_STR :
                        if (strcmp(htb->key->val.s, key->val.s) == 0) found = 1;
                        break;
                    case HASH_KEY_NUM :
                        if (htb->key->val.n == key->val.n) found = 1;

  8. https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Form/Form.php
        public function isEmpty()
        {
            foreach ($this->children as $child) {
                if (!$child->isEmpty()) {
                    return false;
                }
            }

            return FormUtil::isEmpty($this->modelData) ||
                // arrays, countables
                0 === count($this->modelData) ||
                // traversables that are not countable
                ($this->modelData instanceof \Traversable &&
                    0 === iterator_count($this->modelData));
        }

  9. Let's write a language (in 5 minutes or less)

  13. Each line is an operator followed by its operands:
        set a 1
        set b 2
        add a b c
        print c
    The interpreter:
        <?php

        $vars = array();

        // Read the program, one statement per line.
        $lines = file("example.ipc");

        foreach ($lines as $line) {
            // Token 0 is the operator, the remaining tokens are its operands.
            $line = explode(" ", trim($line));
            switch ($line[0]) {
                case "set" :    // set <name> <value>
                    $vars[$line[1]] = $line[2];
                    break;
                case "add" :    // add <left> <right> <target>
                    $vars[$line[3]] = $vars[$line[1]] + $vars[$line[2]];
                    break;
                case "print" :  // print <name>
                    echo $vars[$line[1]];
                    break;
            }
        }

  14. Interpreter

  20. ➡ The interpreter parses and runs code directly from the source.
    ➡ Works on "any" platform (as long as it runs the interpreter).
    ➡ "Platform" agnostic.
    ➡ Slow.
    ➡ Every instruction is interpreted over and over again (for instance, in loops).
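
The last point can be made concrete with the toy language from earlier. This is only a sketch under assumptions, not code from the talk: `interpret_repeated` is a hypothetical helper that runs the four-line program many times, the way a loop body would run, and counts how often a line has to be split into tokens again.

```php
<?php
// Hypothetical helper: interprets the program $times times (like a loop body)
// and reports how many times a source line had to be tokenized.
function interpret_repeated(array $lines, int $times): array
{
    $vars = [];
    $parses = 0;
    $output = '';
    for ($n = 0; $n < $times; $n++) {
        foreach ($lines as $line) {
            $tokens = explode(' ', trim($line)); // re-parsed on every pass
            $parses++;
            switch ($tokens[0]) {
                case 'set':
                    $vars[$tokens[1]] = (int) $tokens[2];
                    break;
                case 'add':
                    $vars[$tokens[3]] = $vars[$tokens[1]] + $vars[$tokens[2]];
                    break;
                case 'print':
                    $output .= $vars[$tokens[1]] . "\n";
                    break;
            }
        }
    }
    return ['parses' => $parses, 'output' => $output];
}

$result = interpret_repeated(['set a 1', 'set b 2', 'add a b c', 'print c'], 1000);
echo $result['parses'], "\n"; // 4 lines x 1000 passes = 4000
```

The program never changes between passes, yet every pass pays for the parsing again.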

  21. PHP is an interpreted language (for now)

  22. What if we could convert our source code into machine code?

  23. Compiler

  28. Pro
    ➡ Compiles source to machine code.
    ➡ Runs very fast.
    ➡ No need for the source code at run time.
    ➡ Optimized for the platform it runs on.

  32. Cons
    ➡ Runs ONLY on the architecture it was compiled for (x64, ARM, SPARC, etc.).
    ➡ A change in the source code means a recompilation is needed.
    https://xkcd.com/303/

  33. ➡ Instruction Set Architecture (ISA)
    ➡ Application Binary Interface (ABI)

  34. ISA
    ➡ The API of your computer / CPU.
    ➡ The specs are not funny.

  35. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

  36. ABI
    ➡ Again, an API.
    ➡ How does a program need to be loaded in order to work on the given ISA?
    ➡ File formats, function calls, stacks, etc.

  39. test.c:
        #include <stdio.h>
        int main(void) {
            int i = 41;
            i++;
            printf("The number is: %d\n", i);
        }
    GCC output, 32-bit x86:
            .file "test.c"
            .section .rodata
        .LC0:
            .string "The number is: %d\n"
            .text
            .globl main
            .type main, @function
        main:
            pushl %ebp
            movl %esp, %ebp
            andl $-16, %esp
            subl $32, %esp
            movl $41, 28(%esp)
            addl $1, 28(%esp)
            movl $.LC0, %eax
            movl 28(%esp), %edx
            movl %edx, 4(%esp)
            movl %eax, (%esp)
            call printf
            leave
            ret
            .size main, .-main
            .ident "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)"
            .section .note.GNU-stack,"",@progbits
    GCC output, x86-64:
            .file "test.c"
            .section .rodata
        .LC0:
            .string "The number is: %d\n"
            .text
            .globl main
            .type main, @function
        main:
        .LFB0:
            .cfi_startproc
            pushq %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq %rsp, %rbp
            .cfi_def_cfa_register 6
            subq $16, %rsp
            movl $41, -4(%rbp)
            addl $1, -4(%rbp)
            movl $.LC0, %eax
            movl -4(%rbp), %edx
            movl %edx, %esi
            movq %rax, %rdi
            movl $0, %eax
            call printf
            leave
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
        .LFE0:
            .size main, .-main
            .ident "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)"
            .section .note.GNU-stack,"",@progbits

  40. This is why sane people do not write compilers*
    * I'm currently writing a compiler.

  41. Hybrid system

  42. ➡ We "compile" source code to an intermediate code.
    ➡ Intermediate code looks very similar to machine code: few and simple instructions.

  46. ➡ Intermediate code is very simple.
    ➡ Uses few or no optimizations.
    ➡ Can be compiled at runtime, because compiling to it is very fast (compared to "real" compilation).

  50. ➡ Intermediate code gets "interpreted".
    ➡ Since intermediate code resembles machine code, interpretation and execution are relatively fast.
    ➡ Lots of error checks have already been dealt with during the compilation phase.
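
The hybrid idea can be sketched with the toy language from earlier. Everything here is an assumption made for illustration: the opcode constants, `compile_program` and `run_ops` are hypothetical names, and real bytecode formats are far richer. The point is only the split: parsing and error checking happen once, and the "virtual machine" loop works on instructions it can trust.

```php
<?php
// Opcode numbers are invented for this sketch.
const OP_SET = 0;
const OP_ADD = 1;
const OP_PRINT = 2;

// "Compile" phase: tokenize and validate every line exactly once.
function compile_program(array $lines): array
{
    $arity = ['set' => 3, 'add' => 4, 'print' => 2];
    $codes = ['set' => OP_SET, 'add' => OP_ADD, 'print' => OP_PRINT];
    $ops = [];
    foreach ($lines as $index => $line) {
        $t = explode(' ', trim($line));
        if (!isset($arity[$t[0]]) || count($t) !== $arity[$t[0]]) {
            throw new InvalidArgumentException(sprintf('line %d: cannot compile "%s"', $index + 1, $line));
        }
        if ($t[0] === 'set') {
            $t[2] = (int) $t[2]; // convert the literal now, not at run time
        }
        $t[0] = $codes[$t[0]];
        $ops[] = $t;
    }
    return $ops;
}

// "Virtual machine": a plain loop over already-validated instructions.
function run_ops(array $ops): string
{
    $vars = [];
    $out = '';
    foreach ($ops as $op) {
        switch ($op[0]) {
            case OP_SET:
                $vars[$op[1]] = $op[2];
                break;
            case OP_ADD:
                $vars[$op[3]] = $vars[$op[1]] + $vars[$op[2]];
                break;
            case OP_PRINT:
                $out .= $vars[$op[1]] . "\n";
                break;
        }
    }
    return $out;
}

$ops = compile_program(['set a 1', 'set b 2', 'add a b c', 'print c']);
echo run_ops($ops); // prints 3
```

Running `$ops` a second time costs no parsing at all, which is what makes the later caching step worthwhile.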

  52. ➡ Interpreting is done through a "Virtual Machine" (not to be confused with VMware or VirtualBox).
    ➡ Java virtual machine (JVM)
    ➡ Python, Ruby, PHP

  53. Finding entry points
    Branch analysis from position: 0
    Jump found. Position 1 = -2
    filename:       /in/CQ5uJ
    function name:  (null)
    number of ops:  6
    compiled vars:  !0 = $a, !1 = $b
    line   #*  E I O  op           fetch  ext  return  operands
    --------------------------------------------------------------------
       3   0   E >    ASSIGN                           !0, 'hello+world'
       5   1          ASSIGN                           !1, 41
       7   2          ASSIGN_ADD          0            !1, 1
       9   3          CONCAT                   ~5      !0, !1
           4          ECHO                             ~5
           5       >  RETURN                           1
    Generated using Vulcan Logic Dumper, using php 7.0.0

    $a = "hello world";
    $b = 41;
    $b += 1;
    echo $a . $b;

  54. Read PHP file into memory → compile PHP to bytecode → run PHP bytecode

  56. Read PHP file into memory → compile PHP to bytecode → run PHP bytecode
    OPCache: keeps the compiled bytecode in memory, so later requests can go straight to running it.
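
A rough sketch of the idea behind an opcode cache, under stated assumptions: `compile_counted` and `load_ops` are hypothetical names, the cache is a plain array keyed by a hash of the source text, and none of this reflects how the real OPCache extension is implemented. It only shows why the compile step drops out after the first request.

```php
<?php
// Stand-in "compiler": splits source lines into tokens and counts how often it runs.
function compile_counted(string $source, int &$compilations): array
{
    $compilations++;
    return array_map(function ($line) {
        return explode(' ', trim($line));
    }, explode("\n", trim($source)));
}

// Look the compiled form up first and compile only on a cache miss.
function load_ops(string $source, array &$cache, int &$compilations): array
{
    $key = md5($source);
    if (!isset($cache[$key])) {
        $cache[$key] = compile_counted($source, $compilations);
    }
    return $cache[$key];
}

$cache = [];
$compilations = 0;
$source = "set a 1\nset b 2\nadd a b c\nprint c";

// Three "requests" for the same script.
for ($request = 0; $request < 3; $request++) {
    $ops = load_ops($source, $cache, $compilations);
}
echo $compilations, "\n"; // 1: only the first request compiled
```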

  57. JIT (just in time)

  58. ➡ Many different "flavors"
    ➡ JIT != JIT != JIT


  64. ➡ Compiles to native machine code at runtime.
    ➡ Either compiles a "block" first, then runs it.
    ➡ Or compiles only after multiple calls.
    ➡ Or compiles per function.
    ➡ Or interprets, compiles in the background, and switches to the compiled code when compilation is finished.
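
The "compiles only after multiple calls" flavor can be sketched as a call counter. This is an illustration under assumptions, not a real JIT: PHP code cannot emit machine code, so a closure stands in for the compiled result, and the `HotFunction` class, its op format (`['mul', 3]`, `['add', 4]`) and the threshold are all hypothetical.

```php
<?php
class HotFunction
{
    private $ops;
    private $threshold;
    private $calls = 0;
    private $compiled = null; // closure standing in for native code

    public function __construct(array $ops, int $threshold)
    {
        $this->ops = $ops;
        $this->threshold = $threshold;
    }

    public function isCompiled(): bool
    {
        return $this->compiled !== null;
    }

    public function call(int $x): int
    {
        if ($this->compiled !== null) {
            return ($this->compiled)($x);       // fast path
        }
        if (++$this->calls >= $this->threshold) {
            $this->compiled = $this->compile(); // hot: use compiled code from now on
        }
        return $this->interpret($x);
    }

    // Slow path: walk the op list on every call.
    private function interpret(int $x): int
    {
        foreach ($this->ops as [$name, $arg]) {
            $x = $name === 'mul' ? $x * $arg : $x + $arg;
        }
        return $x;
    }

    // "Compilation": fold all ops into a single expression a * x + b, once.
    private function compile(): Closure
    {
        $a = 1;
        $b = 0;
        foreach ($this->ops as [$name, $arg]) {
            if ($name === 'mul') {
                $a *= $arg;
                $b *= $arg;
            } else {
                $b += $arg;
            }
        }
        return function (int $x) use ($a, $b): int {
            return $a * $x + $b;
        };
    }
}

// ((x * 3) + 4) * 2, compiled after the third call.
$f = new HotFunction([['mul', 3], ['add', 4], ['mul', 2]], 3);
$results = [];
for ($i = 0; $i < 5; $i++) {
    $results[] = $f->call($i);
}
echo implode(' ', $results), "\n"; // 8 14 20 26 32
```

The first three calls are interpreted and the remaining ones take the compiled path, with identical results either way.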

  65. init();
    $a = 1;
    for ($i = 0; $i != 1000; $i++) {
        // very CPU consuming functions
        $a = $i / 100 * sqrt($i / 163.21) + foobar($i);
        echo $a;
    }

  69. ➡ We don't have to "wait" for compilation.
    ➡ Still runs fast (it runs as binary code).
    ➡ Can optimize even better than precompilation (it has more context).

  70. PHP7

  71. ➡ Hybrid / ByteCode VM
    ➡ AST
    ➡ No static analysis (yet)
    ➡ No JIT (yet)

  72. Abstract Syntax Tree

  73. Lexing & Parsing → Bytecode Compilation → Execution

  75. Lexing & Parsing → AST Generation → Bytecode Compilation → Execution
    AST Generation → Whatever you want to do

  76. Abstract Syntax Tree
    https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/Abstract_syntax_tree_for_Euclidean_algorithm.svg/400px-Abstract_syntax_tree_for_Euclidean_algorithm.svg.png
        while ($b != 0) {
            if ($a > $b) {
                $a = $a - $b;
            } else {
                $b = $b - $a;
            }
        }
        return $a;

  77. ➡ Make changes to the tree (e.g. remove all else-statements)
    ➡ Convert code back to older versions or to another language (transpiling)
    ➡ Analyze code
    ➡ Optimize code
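
Two of these uses, optimizing and converting a tree back to code, can be sketched on a tiny hand-built tree. The node shapes (`['num', 5]`, `['var', 'a']`, `['binop', '+', $left, $right]`) and both function names are assumptions made for this example; they do not match PHP's internal AST.

```php
<?php
// Optimization pass: replace operations on two constants by their result.
function fold_constants(array $node): array
{
    if ($node[0] !== 'binop') {
        return $node; // leaves stay as they are
    }
    $left = fold_constants($node[2]);
    $right = fold_constants($node[3]);
    if ($left[0] === 'num' && $right[0] === 'num') {
        switch ($node[1]) {
            case '+':
                return ['num', $left[1] + $right[1]];
            case '-':
                return ['num', $left[1] - $right[1]];
            case '*':
                return ['num', $left[1] * $right[1]];
        }
    }
    return ['binop', $node[1], $left, $right];
}

// Turn a tree back into source text.
function to_source(array $node): string
{
    switch ($node[0]) {
        case 'num':
            return (string) $node[1];
        case 'var':
            return '$' . $node[1];
        default:
            return '(' . to_source($node[2]) . ' ' . $node[1] . ' ' . to_source($node[3]) . ')';
    }
}

// Tree for: $a - (2 * 3 + 4)
$tree = ['binop', '-', ['var', 'a'],
    ['binop', '+', ['binop', '*', ['num', 2], ['num', 3]], ['num', 4]]];

echo to_source($tree), "\n";                 // ($a - ((2 * 3) + 4))
echo to_source(fold_constants($tree)), "\n"; // ($a - 10)
```

Both passes are plain recursive walks, which is what makes a tree so much easier to work with than a flat token stream.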

  78. The future
    ➡ JIT system (other than HipHop VM)
    ➡ LLVM
    ➡ Bytecode interchange
    ➡ PhpPhp (PHP interpreter written in PHP?)

  79. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

  80. Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl

  81. class_declaration_statement:
            class_modifiers T_CLASS T_STRING extends_from implements_list '{' statement_list '}'
        |   T_CLASS T_STRING extends_from implements_list '{' statement_list '}'
    ;
    class_modifiers:
            class_modifier
        |   class_modifiers class_modifier { zend_add_class_modifiers($1, $2) }
    ;
    class_modifier:
            T_ABSTRACT
        |   T_FINAL
    ;

    abstract abstract final final class foo { }
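
The grammar above accepts any sequence of modifiers, including the nonsensical example line, so the rejection has to happen in the action that combines them. The following is a sketch of such a check under assumptions: the flag values, `add_class_modifier` and `check_modifiers` are hypothetical and only mimic the role `zend_add_class_modifiers` plays in the rule; they are not its actual implementation.

```php
<?php
const MOD_ABSTRACT = 1; // flag values are invented for this sketch
const MOD_FINAL = 2;

// Combine the modifiers seen so far with a new one, rejecting bad combinations.
function add_class_modifier(int $flags, int $new): int
{
    if ($flags & $new) {
        throw new DomainException('duplicate class modifier');
    }
    if (($flags | $new) === (MOD_ABSTRACT | MOD_FINAL)) {
        throw new DomainException('a class cannot be both abstract and final');
    }
    return $flags | $new;
}

// Feed a list of modifier keywords through the check, left to right.
function check_modifiers(array $words): string
{
    $map = ['abstract' => MOD_ABSTRACT, 'final' => MOD_FINAL];
    $flags = 0;
    try {
        foreach ($words as $word) {
            $flags = add_class_modifier($flags, $map[$word]);
        }
    } catch (DomainException $e) {
        return 'rejected: ' . $e->getMessage();
    }
    return 'ok';
}

echo check_modifiers(['abstract']), "\n";
echo check_modifiers(['abstract', 'abstract', 'final', 'final']), "\n";
```

The first call prints `ok`; the second is rejected at the repeated `abstract`, even though the grammar itself had no objection.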