Slide 1

Slide 1 text

From source to code: an introduction to compilers, interpreters and JIT
Joshua Thijssen (jaytaph)

Slide 2

Slide 2 text

Computers don't understand:
http://blog.ruslans.com/2012_11_01_archive.html

Slide 3

Slide 3 text

56e9 5200 4641 4c46 5245 0020 0102 0001 e002 4000 f00b 0009 0012 0002 0000 0000 0000 0000 0000 ef29 adbe 52de 4641 4c46 5245 2020 2020 4146 3154 2032 2020 6152 6666 656c 4f72 3a53 0020 414e 454d 2053 2020 4144 0054 4900 fa37 c031 d88e c08e d08e 00bc fb7c 3ebe e87c 0117 00bb b880 0001 12bf e800 00a3 00bb b8a4 0013 0ebf e800 0097 bffc a400 4abe b97c 000b f357 5fa6 0d74 c781 0020 ff81 c000 ea76 fde9 8bff 1c45 55a3 bb7c c000 458b 501a abe8 5800 c381 0200 8953 d1c3 01eb 8bc3 0097 5b80 01a9 7500 8107 ffe2 e90f 0003 eac1 8104 f0fa 890f 72d0 31d4 cdc0 001a 5716 e87c 0088 ff25 8900 bec3 c000 f789 3e03 7c55 0ab4 2588 cbfe 0f74 39ac 7cfe be03 c000 0a3c f074 f1e9 56ff b046 3a0a 7504 c6f9 0004 e85e 006b fde9 50ff 0ae8 5800 c381 0200 4f40 f375 89c3 81e5 08ec be00 0012 d231 f6f7 c2fe 5688 beff 0002 d231 f6f7 5688 89fe fc46 01b0 6e8a 8afc fe76 4e8a b2ff b400 cd02 8913 c3ec 022d 3100 b1c9 f701 05e1 0021 bee8 c3ff 57a1 ba7c 8405 e2f7 063b 7c57 0375 d488 a340 7c57 d089 acc3 003c 0a74 0eb4 07bb cd00 e910 fff1 90c3 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 9090 aa55 3

Slide 5

Slide 5 text

        mov     ax, [di + 0x1A]     ; Here starts FAT clusters
loadClusters:
        push    ax
        call    ReadCluster
        pop     ax
        add     bx, 512             ; Next sector
        push    bx
        mov     bx, ax
        shr     bx, 1
        add     bx, ax
        mov     dx, [0x8000 + bx]
        pop     bx
        test    ax, 0x01
        jnz     oddCluster
evenCluster:
        and     dx, 0x0FFF
        jmp     testCluster
oddCluster:
        shr     dx, 4
testCluster:
        cmp     dx, 0xFF0
        mov     ax, dx
        jb      loadClusters
Source: https://github.com/domcode/rafflers/blob/master/jaytaph-bootsector-asm/raffler.S

Slide 6

Slide 6 text

static void out_string(conn *c, const char *str) {
    size_t len;

    assert(c != NULL);

    if (c->noreply) {
        if (settings.verbose > 1)
            fprintf(stderr, ">%d NOREPLY %s\n", c->sfd, str);
        c->noreply = false;
        conn_set_state(c, conn_new_cmd);
        return;
    }

    if (settings.verbose > 1)
        fprintf(stderr, ">%d %s\n", c->sfd, str);

    /* Nuke a partial output... */
    c->msgcurr = 0;
    c->msgused = 0;
    c->iovused = 0;
    add_msghdr(c);

    len = strlen(str);
    if ((len + 2) > c->wsize) {
        /* ought to be always enough. just fail for simplicity */
        str = "SERVER_ERROR output line too long";
        len = strlen(str);
    }

    memcpy(c->wbuf, str, len);
    memcpy(c->wbuf + len, "\r\n", 2);
Source: https://github.com/memcached/memcached/blob/master/memcached.c

Slide 8

Slide 8 text

Let's write a language (in 5 minutes or less)

Slide 11

Slide 11 text

    set a 1
    set b 2
    add a b c
    print c
Each line starts with an operator (set, add, print), followed by its operands.

Slide 13

Slide 13 text

Interpreter

Slide 19

Slide 19 text

➡ Runs directly from the source.
➡ Platform agnostic.
➡ Runtime checks.
➡ Slow.
➡ Every instruction is interpreted over and over again.
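
To make these points concrete, here is a minimal sketch of an interpreter for the toy language from the earlier slide, written in Python. The slides do not spell out the semantics, so this assumes that `set` stores an integer, `add a b c` stores a + b in c, and `print` outputs a variable. The function name `interpret` is made up for this example.

```python
def interpret(source):
    """Run the toy language straight from its source text, one line at a time."""
    variables = {}
    output = []
    for line in source.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        operator, *operands = line.split()
        if operator == "set":             # set <name> <integer>
            name, value = operands
            variables[name] = int(value)
        elif operator == "add":           # add <left> <right> <target> (assumed meaning)
            left, right, target = operands
            variables[target] = variables[left] + variables[right]
        elif operator == "print":         # print <name>
            (name,) = operands
            output.append(variables[name])
            print(variables[name])
        else:
            # Runtime check: a bad line is only discovered when it is reached.
            raise ValueError(f"unknown operator: {operator}")
    return output


PROGRAM = """
set a 1
set b 2
add a b c
print c
"""

result = interpret(PROGRAM)
```

Every line is split and dispatched again each time it runs, which is the "interpreted over and over again" cost: put the same statement in a loop and the string handling is repeated on every pass.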

Slide 20

Slide 20 text

PHP is an interpreted language (for now)

Slide 21

Slide 21 text

What if we could convert our source code into machine code?

Slide 22

Slide 22 text

Compiler

Slide 27

Slide 27 text

Pro
➡ Compiles source into machine code.
➡ Runs very fast (no "layers" in between).
➡ No need for the source code.
➡ Optimized for the running platform.

Slide 31

Slide 31 text

Cons
➡ Can ONLY run on the architecture it was compiled for (x64, ARM, SPARC, etc.).
➡ A change in the source code means a recompilation is needed.

Slide 33

Slide 33 text

Intel 64 and IA-32 Architectures Software Developer's Manual:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Slide 34

Slide 34 text

This is why sane people do not write compilers.*
* I'm currently writing a compiler.

Slide 36

Slide 36 text

Hybrid system

Slide 40

Slide 40 text

➡ We compile source code to intermediate code (bytecode).
➡ We run this intermediate code in a specialized interpreter.
➡ Faster interpretation.
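
A rough sketch of the hybrid approach for the toy language shown earlier, in Python: the source is parsed once into a list of instruction tuples, and a small loop then runs those tuples. The opcode names (`SET`, `ADD`, `PRINT`) and both function names are invented for this example, and the meaning of `add a b c` (store a + b in c) is an assumption.

```python
def compile_to_bytecode(source):
    """Parse every source line once and emit (opcode, operand...) tuples."""
    bytecode = []
    for line in source.splitlines():
        if not line.strip():
            continue
        operator, *operands = line.split()
        if operator == "set":
            name, value = operands
            # The literal is converted to an int here, at compile time.
            bytecode.append(("SET", name, int(value)))
        elif operator == "add":
            left, right, target = operands
            bytecode.append(("ADD", left, right, target))
        elif operator == "print":
            (name,) = operands
            bytecode.append(("PRINT", name))
        else:
            # Unknown operators are rejected before anything runs.
            raise SyntaxError(f"unknown operator: {operator}")
    return bytecode


def run(bytecode):
    """The 'specialized interpreter': no string splitting or parsing left to do."""
    variables = {}
    output = []
    for instruction in bytecode:
        opcode = instruction[0]
        if opcode == "SET":
            _, name, value = instruction
            variables[name] = value
        elif opcode == "ADD":
            _, left, right, target = instruction
            variables[target] = variables[left] + variables[right]
        elif opcode == "PRINT":
            output.append(variables[instruction[1]])
    return output


bytecode = compile_to_bytecode("set a 1\nset b 2\nadd a b c\nprint c")
print(bytecode)
print(run(bytecode))
```

The parsing work happens once in `compile_to_bytecode`; `run` can then be called as often as needed on the same instruction list.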

Slide 43

Slide 43 text

PHP
          0    ASSIGN           !0, 1
    5     1    ADD       ~2     !0, 1
          2    ASSIGN           !0, ~2
    6     3  > RETURN           !0
    7     4* > RETURN           null
PYTHON
    def foo():
        i = 1
        i = i + 1
        return i
    4     0 LOAD_CONST      1 (1)
          3 STORE_FAST      0 (i)
    5     6 LOAD_FAST       0 (i)
          9 LOAD_CONST      1 (1)
         12 BINARY_ADD
         13 STORE_FAST      0 (i)
    6    16 LOAD_FAST       0 (i)
         19 RETURN_VALUE
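
The Python listing can be reproduced with the standard `dis` module. This is a small sketch; the exact opcodes and offsets depend on the CPython version (newer releases print `BINARY_OP` where the slide shows `BINARY_ADD`, for example), so no expected output is given here.

```python
import dis


def foo():
    i = 1
    i = i + 1
    return i


# Print the bytecode CPython generated for foo().
dis.dis(foo)

# The same instructions are also available as objects for inspection.
opnames = [instruction.opname for instruction in dis.get_instructions(foo)]
print(opnames)
```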

Slide 44

Slide 44 text

read PHP file into memory -> compile PHP to bytecode -> run PHP bytecode

Slide 46

Slide 46 text

read PHP file into memory -> compile PHP to bytecode -> store bytecode in OPCache -> run PHP bytecode

Slide 47

Slide 47 text

OPCache -> run PHP bytecode
(reading the PHP file into memory and compiling it to bytecode are skipped)
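
The idea behind an opcode cache can be sketched in a few lines of Python. This is not how OPCache itself is implemented; it only illustrates "compile once, run many times" using Python's built-in `compile` and `exec`. The cache dictionary, the counter and the function names are all made up for this example.

```python
compile_count = 0
bytecode_cache = {}   # hypothetical in-memory cache: source text -> code object


def load(source):
    """Return compiled bytecode, compiling only when the cache has no entry."""
    global compile_count
    if source not in bytecode_cache:
        compile_count += 1
        bytecode_cache[source] = compile(source, "<script>", "exec")
    return bytecode_cache[source]


def run(source):
    """Execute the (possibly cached) bytecode in a fresh namespace."""
    namespace = {}
    exec(load(source), namespace)
    return namespace


SCRIPT = "result = sum(range(10))"
first = run(SCRIPT)    # cache miss: compiles, then runs
second = run(SCRIPT)   # cache hit: runs the stored bytecode directly
print(first["result"], second["result"], compile_count)
```

The second call skips the compile step entirely, which is the saving the diagram shows.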

Slide 48

Slide 48 text

JIT (just in time)

Slide 54

Slide 54 text

➡ Compiles to native machine code at runtime.
➡ Compiles per function/method.
➡ Or, compiles a code block.
➡ Or, only compiles on multiple calls.
➡ Or, interprets, compiles in the background, and switches to compiled code when compilation is finished.
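
The "only compiles on multiple calls" strategy can be sketched as a call counter. In this Python sketch "compiling" means turning a small expression tree into a Python lambda, not into native machine code, so it only imitates the control flow of a JIT. `HotFunction`, `HOT_THRESHOLD` and the tuple-based tree format are assumptions made for this example.

```python
HOT_THRESHOLD = 3   # assumed value; a real JIT tunes this


class HotFunction:
    """Interprets an expression tree until it is 'hot', then uses compiled code."""

    def __init__(self, tree):
        self.tree = tree          # e.g. ("add", ("mul", "x", 2), 1)
        self.calls = 0
        self.compiled = None

    def _interpret(self, node, x):
        # Slow path: walk the tree on every call.
        if isinstance(node, (int, float)):
            return node
        if node == "x":
            return x
        op, left, right = node
        a, b = self._interpret(left, x), self._interpret(right, x)
        return a + b if op == "add" else a * b

    def _to_source(self, node):
        # Translate the tree into Python source text.
        if isinstance(node, (int, float)) or node == "x":
            return str(node)
        op, left, right = node
        symbol = "+" if op == "add" else "*"
        return f"({self._to_source(left)} {symbol} {self._to_source(right)})"

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)       # fast path after compilation
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # Hot: compile once so later calls skip the tree walk.
            self.compiled = eval(f"lambda x: {self._to_source(self.tree)}")
        return self._interpret(self.tree, x)


f = HotFunction(("add", ("mul", "x", 2), 1))   # computes x * 2 + 1
results = [f(n) for n in range(5)]
print(results)
```

The first three calls are interpreted; from the fourth call on, the compiled lambda is used and the counter no longer changes.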

Slide 55

Slide 55 text

    init();
    $a = 1;
    for ($i = 0; $i != 1000; $i++) {
        // very CPU consuming functions
        $a = $i / 100 * sqrt($i / 163.21) + foobar($i);
        echo $a;
    }

Slide 59

Slide 59 text

➡ We don't have to "wait" for compilation.
➡ Still runs fast (as in binary code).
➡ Can optimize even better than precompilation (it has more context).

Slide 60

Slide 60 text

PHP 7

Slide 61

Slide 61 text

Lexing & parsing -> Bytecode compilation -> Execution

Slide 62

Slide 62 text

Lexing & parsing -> AST generation -> Bytecode compilation -> Execution

Slide 63

Slide 63 text

Lexing & parsing -> AST generation -> Bytecode compilation -> Execution
AST generation -> Whatever you want to do

Slide 64

Slide 64 text

Abstract Syntax Tree
    while ($b != 0) {
        if ($a > $b) {
            $a = $a - $b;
        } else {
            $b = $b - $a;
        }
    }
    return $a;
Image: https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/Abstract_syntax_tree_for_Euclidean_algorithm.svg/400px-Abstract_syntax_tree_for_Euclidean_algorithm.svg.png

Slide 65

Slide 65 text

➡ Make changes to the tree (e.g. remove all else-statements).
➡ Convert code back to older language versions, or to another language (transpiling).
➡ Analyze code.
➡ Optimize code.
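
As an illustration of "make changes to the tree", here is a sketch that uses Python's own `ast` module as a stand-in for the PHP AST. It parses a Python version of the Euclidean algorithm from the previous slide, removes every else-branch, and prints the resulting source. The class name `RemoveElse` is made up for this example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""


class RemoveElse(ast.NodeTransformer):
    """Drop the else-branch of every if-statement in the tree."""

    def visit_If(self, node):
        self.generic_visit(node)   # handle nested if-statements first
        node.orelse = []           # an empty list means "no else-branch"
        return node


tree = ast.parse(SOURCE)
if_count = sum(isinstance(node, ast.If) for node in ast.walk(tree))

new_tree = ast.fix_missing_locations(RemoveElse().visit(tree))
new_source = ast.unparse(new_tree)   # turn the modified tree back into source
print(new_source)
```

Removing the else-branch changes what the function does, so the result is not meant to be run; the point is only that the program is now data that can be inspected and rewritten.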

Slide 66

Slide 66 text

The future
➡ JIT system (other than HHVM)
➡ LLVM?
➡ Bytecode interchange?
➡ PhpPhp (a PHP interpreter written in PHP?)

Slide 67

Slide 67 text

http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

Slide 68

Slide 68 text

Find me on Twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me by email: jthijssen@noxlogic.nl
Find me for blogs: www.adayinthelifeof.nl