PHP 8 and Just In Time Compilation

PHP 7 already brought a real performance gain. PHP 8 aims to go even further
by integrating a Just In Time compiler.

Just In Time compilation turns PHP opcodes into machine code that runs
directly on the processor, in order to achieve even better performance.

The aim of this talk is to dive into the JIT technology chosen by the Zend Engine development team,
and to present some performance benchmarks on Symfony applications.

Benoit Jacquemont

May 16, 2019
Transcript

  1. 1.

    PHP 8 & JIT Compilation

    Benoit Jacquemont @bjacquemont
  2. 5.

    Just In Time Compilation It's a way of executing computer

    code that involves compilation at run time rather than prior to execution. Expectation: compiled code speed >> interpreted code speed
  3. 6.

    Platforms With JIT Java with the Hotspot JVM .NET with

    the Common Language Runtime NodeJS with V8 ...
  4. 8.
  5. 9.
  6. 10.

    Compilation Is A Very CPU Intensive Process

    10 minutes to compile the Zend Engine. An hour and a half to compile the Linux kernel.
  7. 11.

    So, You Want To Make Your Code Execute Faster By

    Compiling Stuff During Execution?
  8. 12.

    Fast Compilation Time And Best Benefits From Compilation

    Compile Only The Most Executed Code. Less code to compile = less time spent on compilation. Most used code compiled = relevant performance improvements.
  9. 13.
  10. 14.

    JIT Standard Workflow Initial code ⤚ ⚙ syntax validation

    + compilation ⚙ → intermediate representation ⤚ ⚙ execution + profiling ⚙ → selection of most used code ⤚ ⚙ compilation to native code ⚙ → native code for most used code ➠ execution on the processor
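
    The "execution + profiling, then selection of most used code" step of this workflow can be illustrated with a toy sketch in PHP. The `HotCodeProfiler` class, its call-count threshold and its "compiled" flag are all hypothetical; they only model the idea and are not how the Zend Engine or any real JIT profiles code.

    ```php
    <?php
    // Toy model of "profile while executing, then compile only the hot code".
    // Class name, threshold and the "compiled" flag are illustrative assumptions.
    final class HotCodeProfiler
    {
        /** @var array<string,int> number of calls seen per function name */
        private $callCounts = [];

        /** @var array<string,bool> functions that crossed the threshold */
        private $compiled = [];

        /** @var int calls needed before a function is considered hot */
        private $hotThreshold;

        public function __construct(int $hotThreshold)
        {
            $this->hotThreshold = $hotThreshold;
        }

        /** Runs $fn, counting the call and flagging the function once it is hot. */
        public function call(string $name, callable $fn, ...$args)
        {
            $this->callCounts[$name] = ($this->callCounts[$name] ?? 0) + 1;

            if (!isset($this->compiled[$name])
                && $this->callCounts[$name] >= $this->hotThreshold) {
                // A real JIT would emit native code at this point.
                $this->compiled[$name] = true;
            }

            return $fn(...$args);
        }

        /** @return string[] names of the functions selected for compilation */
        public function compiledFunctions(): array
        {
            return array_keys($this->compiled);
        }
    }

    $profiler = new HotCodeProfiler(3);
    for ($i = 0; $i < 5; $i++) {
        $profiler->call('hot', 'strlen', 'abc');   // called 5 times: becomes hot
    }
    $profiler->call('cold', 'strtoupper', 'abc');  // called once: stays interpreted

    print_r($profiler->compiledFunctions());
    ```

    Only `hot` is selected here, which mirrors the trade-off from the previous slide: little code to compile, but it is the code that runs most often.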
  11. 15.

    Executing Native Code The Hardware Problem Native means built for

    a processor instruction set. And there's more than one... x86 x86_64 ARM MIPS RISC-V
  12. 16.

    Executing Native Code The OS Problem The OS controls what

    is executed. It should work on Linux, but also on Windows, macOS and the BSDs, 32-bit and 64-bit...
  13. 17.

    PHP JIT Requirements an internal profiler very fast compilers

    from Opcode to: x86 x86_64 ARM MIPS ... Multi-OS support
  14. 21.

    Without DynASM

    Need to generate each of the following for $i++;

    x86_64:

        mov ebx, 0x1234h
        mov eax, [ebx]
        inc eax
        mov [ebx], eax

    ARM:

        MOV R0, (#0x1234h)
        ADD R0, R0, #1
        MOV (#0x1234h), R0

    MIPS:

        lw $t0,0x1234h
        addw $t0,$t0,1
        sw $t0,0x1234h
  15. 22.

    With DynASM

    Only need to generate one assembly code for $i++; DynASM will generate the native code for the target.

    DynASM assembly:

        mov $0x1234, %rdi
        inc %rdi
        mov %rdi, $0x1234

    x86_64:

        mov ebx, 0x1234h
        mov eax, [ebx]
        inc eax
        mov [ebx], eax

    ARM:

        MOV R0, (#0x1234h)
        ADD R0, R0, #1
        MOV (#0x1234h), R0

    MIPS:

        lw $t0,0x1234h
        addw $t0,$t0,1
        sw $t0,0x1234h
  16. 24.

    JIT Compilation & PHP Instead of developing multiple compilers: PHP

    opcode to x86 PHP opcode to x86_64 PHP opcode to ARM PHP opcode to MIPS ... Only need: PHP opcode to DynASM Assembly
  17. 25.

    PHP JIT Workflow PHP code ⤚ ⚙ syntax validation

    + compilation ⚙ → opcode ⤚ ⚙ execution + profiling ⚙ → selection of most used code ⤚ ⚙ compilation to DynASM assembly ⚙ → DynASM Assembly ⤚ ⚙ compilation to native code (thanks DynASM!) ⚙ → Native code ➠ execution on the processor
  18. 27.
  19. 28.
  20. 29.
  21. 30.
  22. 31.
  23. 32.
  24. 33.
  25. 34.
  26. 35.
  27. 37.
  28. 38.
  29. 39.
  30. 41.

    Hello World! Opcode

        <?php echo "Hello world!";

        $_main:
        L0 (2): ECHO string("Hello world!")
        L1 (3): RETURN int(1)
  31. 42.

    Hello World! DynASM Assembly

        <?php echo "Hello world!";

        sub $0x10, %rsp
        mov %r15, (%r14)
        mov $0x40d29d48, %rdi
        mov $0xc, %rsi
        mov $php_output_write, %rax
        call *%rax
        mov $EG(exception), %rax
        cmp $0x0, (%rax)
        jnz JIT$$exception_handler
        add $0x20, %r15
        add $0x10, %rsp
        mov $0x560d02b357b1, %rax
        call *%rax
        jmp (%r15)
  32. 43.

    If Opcode

        $a = true;
        if ($a === true) {
            echo "Yes!";
        } else {
            echo "No!";
        }

        $_main:
        L0 (3): ASSIGN CV0 bool(true)
        L1 (5): T1 = IS_IDENTICAL CV0 bool(true)
        L2 (5): JMPZ T1 L5
        L3 (6): ECHO string("Yes!")
        L4 (10): RETURN int(1)
        L5 (8): ECHO string("No!")
        L6 (10): RETURN int(1)
  33. 44.

        $a = true;
        if ($a === true) {
            echo "Yes!";
        } else {
            echo "No!";
        }

        sub $0x10, %rsp
        lea 0x50(%r14), %rdi
        cmp $0xa, 0x8(%rdi)
        jnz .L1
        mov (%rdi), %rdi
        cmp $0x0, 0x18(%rdi)
        jnz .L7
        add $0x8, %rdi
        .L1:
        test $0x1, 0x9(%rdi)
        jnz .L8
        .L2:
        mov $0x3, 0x8(%rdi)
        .L3:
        mov $EG(exception), %rax
        cmp $0x0, (%rax)
        jnz JIT$$exception_handler
        lea 0x50(%r14), %rdi
        cmp $0xa, 0x8(%rdi)
        jnz .L4
        mov (%rdi), %rdi
        add $0x8, %rdi
        .L4:
        cmp $0x3, 0x8(%rdi)
        jz .L5
        jmp .L6
        .L5:
        add $0x60, %r15
        mov %r15, (%r14)
        mov $0x40af2d48, %rdi
        mov $0x4, %rsi
        mov $php_output_write, %rax
        call *%rax
        mov $EG(exception), %rax
        cmp $0x0, (%rax)
        jnz JIT$$exception_handler
        add $0x20, %r15
        add $0x10, %rsp
        mov $0x559ee5a027b1, %rax
        call *%rax
        jmp (%r15)
        .L6:
        mov $0x4115d6a0, %r15
        mov %r15, (%r14)
        mov $0x40af2d70, %rdi
        mov $0x3, %rsi
        mov $php_output_write, %rax
        call *%rax
        mov $EG(exception), %rax
        cmp $0x0, (%rax)
        jnz JIT$$exception_handler
        add $0x20, %r15
        add $0x10, %rsp
        mov $0x559ee5a027b1, %rax
        call *%rax
        jmp (%r15)
        .L7:
        mov $0x4115d5c0, %rsi
        mov $zend_jit_assign_const_to_typed_ref, %rax
        call *%rax
        jmp .L3
        .L8:
        mov (%rdi), %rax
        sub $0x1, (%rax)
        jnz .L9
        mov %rax, (%rsp)
        mov $0x3, 0x8(%rdi)
        mov (%rsp), %rdi
        mov %r15, (%r14)
        mov $rc_dtor_func, %rax
        call *%rax
        jmp .L3
        .L9:
        mov (%rdi), %rax
        mov 0x4(%rax), %eax
        and $0xfffffc10, %eax
        cmp $0x10, %eax
        jnz .L2
        mov %rdi, (%rsp)
        mov (%rdi), %rdi
        mov $gc_possible_root, %rax
        call *%rax
        mov (%rsp), %rdi
        jmp .L2
  34. 47.

    JIT Controls Aka CRTO

    C: CPU Optimization

        0 - none
        1 - enable AVX instruction generation

    R: Register Allocation

        0 - don't perform register allocation
        1 - use local linear-scan register allocator
        2 - use global linear-scan register allocator

    T: JIT Trigger

        0 - JIT all functions on first script load
        1 - JIT function on first execution
        2 - Profile on first request and compile hot functions on second request
        3 - Profile on the fly and compile hot functions
        4 - Compile functions with @jit tag in doc-comments

    O: Optimization level

        0 - don't JIT
        1 - minimal JIT (call standard VM handlers)
        2 - selective VM handler inlining
        3 - optimized JIT based on static type inference of individual function
        4 - optimized JIT based on static type inference and call tree
        5 - optimized JIT based on static type inference and inner procedure analyses

    opcache.jit=1235
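
    To make the four-digit notation on this slide concrete, here is a minimal sketch that splits a value such as `1235` into its C, R, T and O digits. The `decodeJitFlags()` helper and its array keys are hypothetical names chosen for this example; it does not validate that each digit is in the range listed above.

    ```php
    <?php
    /**
     * Hypothetical helper: split a four-digit opcache.jit value into
     * its C, R, T and O digits, in that order.
     *
     * @return array<string,int>
     */
    function decodeJitFlags(int $value): array
    {
        if ($value < 0 || $value > 9999) {
            throw new InvalidArgumentException('Expected at most four digits');
        }

        // Left-pad so that e.g. 235 is read as 0235 (C = 0).
        $digits = str_pad((string) $value, 4, '0', STR_PAD_LEFT);

        return [
            'cpu_optimization'    => (int) $digits[0], // C
            'register_allocation' => (int) $digits[1], // R
            'trigger'             => (int) $digits[2], // T
            'optimization_level'  => (int) $digits[3], // O
        ];
    }

    print_r(decodeJitFlags(1235));
    ```

    With `1235`, this yields C = 1 (AVX), R = 2 (global linear-scan), T = 3 (profile on the fly) and O = 5 (the highest optimization level listed).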
  35. 48.

    How To Run PHP8 JIT

    With Docker:

        docker run akondas/php:8.0-cli-alpine \
            php -d zend_extension=opcache.so \
            -d opcache.enable_cli=1 \
            -d opcache.jit_buffer_size=100M \
            -d opcache.jit=1235

    Compile It From: github.com/zendtech/php-src/tree/jit-dynasm
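
    Once PHP is started with these settings, a script can check whether the JIT is actually active. The sketch below relies on `opcache_get_status()` returning a `jit` entry with an `enabled` flag, as released PHP 8.0 builds do; whether the 2019 `jit-dynasm` branch exposed the same entry is an assumption. The `jitStatus()` function name is made up for this example.

    ```php
    <?php
    /**
     * Hypothetical helper: describe the JIT state of the running PHP process.
     * Assumes the 'jit' entry layout of released PHP 8.0 builds.
     */
    function jitStatus(): string
    {
        if (!function_exists('opcache_get_status')) {
            return 'opcache extension not loaded';
        }

        // false = do not include per-script details in the result.
        $status = @opcache_get_status(false);
        if ($status === false) {
            return 'opcache disabled';
        }

        if (!isset($status['jit']['enabled'])) {
            return 'no JIT information reported by this build';
        }

        return $status['jit']['enabled'] ? 'JIT enabled' : 'JIT disabled';
    }

    echo jitStatus(), "\n";
    ```

    Run from the command line, this needs `opcache.enable_cli=1`, as in the Docker command above; otherwise opcache stays off for CLI scripts.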
  36. 51.

    Zend/bench.php Very basic bench available in the PHP source tree

    Without JIT: 0.567s With JIT: 0.130s x4 improvement
  37. 52.

    Fibonacci

    Without JIT: 8.3s With JIT: 2.7s x3 improvement

        function fibonacci($n){
            return(($n < 2) ? 1 : fibonacci($n - 2) + fibonacci($n - 1));
        }
        $start = microtime(true);
        fibonacci(40);
        $stop = microtime(true);
        echo sprintf("Time: %s\n", $stop - $start);
  38. 54.

    Composer Benchmark

    composer update on Akeneo PIM Enterprise Edition. Without JIT: 53s. With JIT: Oops... JIT can have an effect on application behavior.

        Your requirements could not be resolved to an installable set of packages
        Problem 1
        - akeneo/pim-community-dev 3.2.x-dev requires doctrine/annotations 1.6.0 -> satisfiable by doctrine/annotations[v1.6.0].
        - Conclusion: don't install doctrine/annotations v1.1.2| remove doctrine/annotations v1.6.0
  39. 56.

    WordPress Front Page Without JIT: 190 requests/s with JIT CRTO

    1235: 160 requests/s with JIT CRTO 1225: 189 requests/s
  40. 58.

    IO Bound Vs CPU Bound In general, application performance is limited

    either by CPU or by IO (database, network, disk, etc.). Most Of The Time, PHP Applications Are IO Bound
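
    The consequence of being IO bound can be put into numbers with Amdahl's law, which is not on the slides but is the standard way to reason about this: if only a fraction of the request time is spent executing PHP code, speeding up that fraction has a limited effect overall. The `overallSpeedup()` function and the 20% / 3x figures below are illustrative assumptions, not measurements from the talk.

    ```php
    <?php
    /**
     * Amdahl's law: overall speedup when only $cpuFraction of the run time
     * (between 0 and 1) gets $jitSpeedup times faster.
     */
    function overallSpeedup(float $cpuFraction, float $jitSpeedup): float
    {
        if ($cpuFraction < 0.0 || $cpuFraction > 1.0) {
            throw new InvalidArgumentException('cpuFraction must be between 0 and 1');
        }
        if ($jitSpeedup <= 0.0) {
            throw new InvalidArgumentException('jitSpeedup must be positive');
        }

        // The IO part is unchanged; the CPU part shrinks by the JIT factor.
        return 1.0 / ((1.0 - $cpuFraction) + $cpuFraction / $jitSpeedup);
    }

    // Hypothetical request: 20% of the time in PHP code, JIT makes that part 3x faster.
    printf("%.2f\n", overallSpeedup(0.2, 3.0)); // prints 1.15
    ```

    Under these assumed figures, a 3x faster PHP engine makes the whole request only about 15% faster, because the remaining 80% of the time is still spent waiting on IO.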
  41. 59.

    Native Functions Are Very Fast PHP is maybe the fastest

    scripting language Large number of native functions. All written in C. Native functions already natively compiled to machine code.
  42. 60.

    GFX PHP Pure PHP image manipulation library 1.4MB PNG rescale

    20x Without JIT: 52s With JIT: 38s 27% faster
  43. 61.

    PHP Engine Dev State An estimated 5 million PHP devs worldwide

    4 active devs on PHP Zend Engine C makes it difficult for PHP devs to work on it github.com/php/php-src/graphs/contributors
  44. 62.

    What If PHP Internals Could Be Written In... PHP? If

    JIT Could Make PHP As Fast As C
  45. 63.

    GFX PHP JIT Vs GD 1.4MB PNG rescale 20x Without

    JIT: 52s With JIT: 38s Same with PHP+GD: 0.9s
  46. 68.

    Platform Support Current JIT supports only x86 and x86_64 on

    Linux, MacOSX and Windows DynASM supports more CPUs, but work needed on Zend Engine side
  47. 70.

    Potentially Different Bugs Depending On The JIT Configuration

    Profiling configuration, triggering conditions, @jit tagged functions...
  48. 73.

    Key Takeaways Don't expect the moon from JIT... ...but there's

    still a long way to go Test for your workload Look at PHP 7.4 preload for performance