PHP 8 and Just In Time Compilation

PHP 7 already brought a real performance gain, but PHP 8 tries to go even further
by integrating a Just In Time compiler.

Just In Time compilation turns PHP opcodes into machine code that can be
run directly on the processor, in order to achieve even better performance.

The aim of this talk is to dive into the JIT technology chosen by the Zend Engine development team,
and to present some performance benchmarks on Symfony applications.

Benoit Jacquemont

May 16, 2019

Transcript

  1. PHP 8 &
    JIT Compilation
    Benoit Jacquemont
    @bjacquemont

  2. PHP Perf Evolution
    Source: https://kinsta.com/blog/php-benchmarks/

  3. How To Get Even
    Further?
    JIT As The Next Frontier?

  4. What's JIT?

  5. Just In Time Compilation
    It's a way of executing computer code that involves
    compilation at run time rather than prior to execution.
    Expectation
    Compiled code speed >> Interpreted code speed

  6. Platforms With JIT
    Java with the HotSpot JVM
    .NET with the Common Language Runtime
    Node.js with V8
    ...

  7. Once upon a time,
    there were
    PHP and JIT...

  10. Compilation Is A Very
    CPU Intensive Process
    10 minutes to compile the Zend Engine
    An hour and a half to compile the Linux kernel

  11. So, You Want To Make Your Code
    Execute Faster By Compiling Stuff
    During Execution?

  12. Fast Compilation Time
    And
    Best Benefits From Compilation
    Compile Only The Most
    Executed Code
    Less code to compile = less time spent on compilation
    Most used code compiled = relevant performance
    improvements

  13. How To Know Which Code Parts
    Are The Most Executed?
    Add A Profiler Into The
    Mix...

  14. JIT Standard Workflow
    Initial code
    ⤚ ⚙ syntax validation + compilation ⚙ →
    intermediate representation
    ⤚ ⚙ execution + profiling ⚙ →
    selection of most used code
    ⤚ ⚙ compilation to native code ⚙ →
    native code for most used code
    ➠ execution on the processor
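
The workflow above can be modelled in a few lines of PHP. This is only a toy sketch under loud assumptions: the class name `HotPathRunner`, the threshold of 3 calls and the pair of "interpreted" and "compiled" callables are all hypothetical, and no real engine is implemented this way. It only shows the idea of counting executions and paying the compilation cost for hot code only.

```php
<?php
// Toy model of "profile first, then compile only the hot code".
// HotPathRunner, $threshold and the callables are hypothetical names.
final class HotPathRunner
{
    private int $threshold;
    /** @var array<string,int> execution counters kept by the "profiler" */
    private array $counters = [];
    /** @var array<string,callable> code already promoted to the fast path */
    private array $compiled = [];

    public function __construct(int $threshold = 3)
    {
        $this->threshold = $threshold;
    }

    public function run(string $name, callable $interpreted, callable $compile, ...$args)
    {
        if (isset($this->compiled[$name])) {
            // Hot code: use the "native" version directly.
            return ($this->compiled[$name])(...$args);
        }
        // Cold code: count the execution, then run the slow version.
        $this->counters[$name] = ($this->counters[$name] ?? 0) + 1;
        if ($this->counters[$name] >= $this->threshold) {
            // Hot enough: pay the compilation cost once.
            $this->compiled[$name] = $compile();
        }
        return $interpreted(...$args);
    }

    public function isCompiled(string $name): bool
    {
        return isset($this->compiled[$name]);
    }
}

$runner = new HotPathRunner(3);
// Stands in for the interpreter executing the intermediate representation.
$slow = function (int $x): int { return $x + 1; };
// Stands in for the JIT compiler producing native code.
$compile = function (): callable {
    return function (int $x): int { return $x + 1; };
};

$results = [];
for ($i = 0; $i < 5; $i++) {
    $results[] = $runner->run('increment', $slow, $compile, $i);
}
```

Both paths must return the same results; only the cost of getting them is supposed to differ.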

  15. Executing Native Code
    The Hardware Problem
    Native means built for one processor instruction set.
    And there's more than one...
    x86 x86_64 ARM MIPS RISC-V

  16. Executing Native Code
    The OS Problem
    The OS controls what is executed.
    Should work on Linux, but also on Windows, macOS
    and the BSDs, in 32-bit and 64-bit...

  17. PHP JIT Requirements
    an internal profiler
    very fast compilers from opcode to:
    x86
    x86_64
    ARM
    MIPS
    ...
    Multi-OS support

  18. Introducing
    DynASM
    Avoiding the Not Invented Here syndrome

  19. DynASM
    DynASM is a dynamic assembler developed for
    LuaJIT

  20. DynASM Is A Generic
    Assembler!

  21. Without DynASM
    Need to generate each of the following
    $i++;
    x86_64:
    mov ebx, 0x1234h
    mov eax, [ebx]
    inc eax
    mov [ebx], eax
    ARM:
    MOV R0, (#0x1234h)
    ADD R0, R0, #1
    MOV (#0x1234h), R0
    MIPS:
    lw $t0,0x1234h
    addw $t0,$t0,1
    sw $t0,0x1234h

  22. With DynASM
    Only need to generate one assembly code
    $i++;
    DynASM assembly:
    mov $0x1234, %rdi
    inc %rdi
    mov %rdi, $0x1234
    DynASM will generate the native code for the target
    x86_64:
    mov ebx, 0x1234h
    mov eax, [ebx]
    inc eax
    mov [ebx], eax
    ARM:
    MOV R0, (#0x1234h)
    ADD R0, R0, #1
    MOV (#0x1234h), R0
    MIPS:
    lw $t0,0x1234h
    addw $t0,$t0,1
    sw $t0,0x1234h

  23. DynASM Is Fast
    Very fast and lightweight "assembler".
    (100x faster than LLVM)

  24. JIT Compilation & PHP
    Instead of developing multiple compilers:
    PHP opcode to x86
    PHP opcode to x86_64
    PHP opcode to ARM
    PHP opcode to MIPS
    ...
    Only need:
    PHP opcode to DynASM Assembly

  25. PHP JIT Workflow
    PHP code
    ⤚ ⚙ syntax validation + compilation ⚙ →
    opcode
    ⤚ ⚙ execution + profiling ⚙ →
    selection of most used code
    ⤚ ⚙ compilation to DynASM assembly ⚙ →
    DynASM Assembly
    ⤚ ⚙ compilation to native code (thanks DynASM!) ⚙ →
    Native code
    ➠ execution on the processor

  26. JIT
    Implementation
    In PHP 8

  36. PHP JIT Is An Extension Of The
    Opcode Cache

  40. Let's Compile!

  41. Hello World!
    Opcode
    echo "Hello world!";
    $_main:
    L0 (2): ECHO string("Hello world!")
    L1 (3): RETURN int(1)

  42. Hello World!
    DynASM Assembly
    echo "Hello world!";
    sub $0x10, %rsp
    mov %r15, (%r14)
    mov $0x40d29d48, %rdi
    mov $0xc, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x560d02b357b1, %rax
    call *%rax
    jmp (%r15)

  43. If
    Opcode
    $a = true;
    if ($a === true) {
    echo "Yes!";
    } else {
    echo "No!";
    }
    $_main:
    L0 (3): ASSIGN CV0 bool(true)
    L1 (5): T1 = IS_IDENTICAL CV0 bool(true)
    L2 (5): JMPZ T1 L5
    L3 (6): ECHO string("Yes!")
    L4 (10): RETURN int(1)
    L5 (8): ECHO string("No!")
    L6 (10): RETURN int(1)

  44. $a = true; if ($a === true) { echo "Yes!"; } else { echo "No!"; }
    sub $0x10, %rsp
    lea 0x50(%r14), %rdi
    cmp $0xa, 0x8(%rdi)
    jnz .L1
    mov (%rdi), %rdi
    cmp $0x0, 0x18(%rdi)
    jnz .L7
    add $0x8, %rdi
    .L1:
    test $0x1, 0x9(%rdi)
    jnz .L8
    .L2:
    mov $0x3, 0x8(%rdi)
    .L3:
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    lea 0x50(%r14), %rdi
    cmp $0xa, 0x8(%rdi)
    jnz .L4
    mov (%rdi), %rdi
    add $0x8, %rdi
    .L4:
    cmp $0x3, 0x8(%rdi)
    jz .L5
    jmp .L6
    .L5:
    add $0x60, %r15
    mov %r15, (%r14)
    mov $0x40af2d48, %rdi
    mov $0x4, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x559ee5a027b1, %rax
    call *%rax
    jmp (%r15)
    .L6:
    mov $0x4115d6a0, %r15
    mov %r15, (%r14)
    mov $0x40af2d70, %rdi
    mov $0x3, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x559ee5a027b1, %rax
    call *%rax
    jmp (%r15)
    .L7:
    mov $0x4115d5c0, %rsi
    mov $zend_jit_assign_const_to_typed_ref, %rax
    call *%rax
    jmp .L3
    .L8:
    mov (%rdi), %rax
    sub $0x1, (%rax)
    jnz .L9
    mov %rax, (%rsp)
    mov $0x3, 0x8(%rdi)
    mov (%rsp), %rdi
    mov %r15, (%r14)
    mov $rc_dtor_func, %rax
    call *%rax
    jmp .L3
    .L9:
    mov (%rdi), %rax
    mov 0x4(%rax), %eax
    and $0xfffffc10, %eax
    cmp $0x10, %eax
    jnz .L2
    mov %rdi, (%rsp)
    mov (%rdi), %rdi
    mov $gc_possible_root, %rax
    call *%rax
    mov (%rsp), %rdi
    jmp .L2

  45. JIT
    Configuration

  46. JIT Buffer
    At 0, JIT is disabled (default value)
    opcache.jit_buffer_size=100M

  47. JIT Controls Aka CRTO
    C: CPU Optimization
    0 - none
    1 - enable AVX instruction generation
    R: Register Allocation
    0 - don't perform register allocation
    1 - use local linear-scan register allocator
    2 - use global linear-scan register allocator
    T: JIT Trigger
    0 - JIT all functions on first script load
    1 - JIT function on first execution
    2 - Profile on first request and compile hot functions on second request
    3 - Profile on the fly and compile hot functions
    4 - Compile functions with @jit tag in doc-comments
    O: Optimization level
    0 - don't JIT
    1 - minimal JIT (call standard VM handlers)
    2 - selective VM handler inlining
    3 - optimized JIT based on static type inference of individual function
    4 - optimized JIT based on static type inference and call tree
    5 - optimized JIT based on static type inference and inner procedure analyses
    opcache.jit=1235
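
Reading a four-digit `opcache.jit` value is a matter of splitting it into its C, R, T and O digits. The helper below is a sketch that only restates the table above; `describeJitSetting()` is a hypothetical name and is not part of opcache.

```php
<?php
// Hypothetical helper: explain an opcache.jit value such as "1235"
// using the CRTO table from the slide.
function describeJitSetting(string $crto): array
{
    // One digit per control, each limited to the values listed on the slide.
    if (!preg_match('/^[01][0-2][0-4][0-5]$/', $crto)) {
        throw new InvalidArgumentException("Not a valid CRTO value: $crto");
    }

    $cpu = ['none', 'enable AVX instruction generation'];
    $register = [
        "don't perform register allocation",
        'use local linear-scan register allocator',
        'use global linear-scan register allocator',
    ];
    $trigger = [
        'JIT all functions on first script load',
        'JIT function on first execution',
        'profile on first request and compile hot functions on second request',
        'profile on the fly and compile hot functions',
        'compile functions with @jit tag in doc-comments',
    ];
    $level = [
        "don't JIT",
        'minimal JIT (call standard VM handlers)',
        'selective VM handler inlining',
        'optimized JIT based on static type inference of individual function',
        'optimized JIT based on static type inference and call tree',
        'optimized JIT based on static type inference and inner procedure analyses',
    ];

    return [
        'C' => $cpu[(int) $crto[0]],
        'R' => $register[(int) $crto[1]],
        'T' => $trigger[(int) $crto[2]],
        'O' => $level[(int) $crto[3]],
    ];
}

$setting = describeJitSetting('1235');
foreach ($setting as $control => $meaning) {
    echo "$control: $meaning\n";
}
```

Changing only the third digit, for example from 1235 to 1225, switches the trigger from profiling on the fly to profiling on the first request.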

  48. How To Run PHP 8 JIT
    With Docker:
    docker run akondas/php:8.0-cli-alpine \
    php -d zend_extension=opcache.so \
    -d opcache.enable_cli=1 \
    -d opcache.jit_buffer_size=100M \
    -d opcache.jit=1235
    Compile it from:
    github.com/zendtech/php-src/tree/jit-dynasm

  49. How To Dump DynASM
    Assembly
    opcache.jit_debug=1
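
Before reading a dump or a benchmark, it helps to confirm that the JIT is really active in the running process. The sketch below assumes the `jit` section that `opcache_get_status()` returns in released PHP 8 builds; the development branch used for this talk may report things differently. `jitIsActive()` is a hypothetical helper name.

```php
<?php
// Sketch: report whether the JIT is active in the current process.
// Assumes the 'jit' => ['enabled' => ..., 'on' => ...] section of
// opcache_get_status() found in released PHP 8 builds.
function jitIsActive(?array $status): bool
{
    return isset($status['jit']['enabled'], $status['jit']['on'])
        && $status['jit']['enabled']
        && $status['jit']['on'];
}

// opcache_get_status() is missing when opcache is not loaded
// and returns false when opcache is disabled.
$status = function_exists('opcache_get_status') ? opcache_get_status(false) : null;
echo jitIsActive($status ?: null) ? "JIT is on\n" : "JIT is off\n";
```

On the command line, remember that opcache itself stays off unless `opcache.enable_cli=1` is set.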

  50. STOP WITH THE SUSPENSE!
    SHOW ME THE
    PERFORMANCE!!

  51. Zend/bench.php
    Very basic benchmark available in the PHP source tree
    Without JIT: 0.567s
    With JIT: 0.130s
    x4 improvement

  52. Fibonacci
    Without JIT: 8.3s
    With JIT: 2.7s
    x3 improvement
    function fibonacci($n) {
        return ($n < 2) ? 1 : fibonacci($n - 2) + fibonacci($n - 1);
    }
    $start = microtime(true);
    fibonacci(40);
    $stop = microtime(true);
    echo sprintf("Time: %s\n", $stop - $start);

  53. Real-Life Performance Results
    Aka Let's Crush Your Dreams

  54. Composer Benchmark
    composer update on Akeneo PIM Enterprise Edition
    Without JIT: 53s
    With JIT:
    Oops... JIT can have an effect on application behavior
    Your requirements could not be resolved to an installable set of packages
    Problem 1
    - akeneo/pim-community-dev 3.2.x-dev requires doctrine/annotations 1.6.0
    -> satisfiable by doctrine/annotations[v1.6.0].
    - Conclusion: don't install doctrine/annotations v1.1.2|
    remove doctrine/annotations v1.6.0

  55. Akeneo PIM EE
    Installation Time
    Without JIT: 2m34s
    With JIT: 2m37s

  56. WordPress Front Page
    Without JIT: 190 requests/s
    With JIT CRTO 1235: 160 requests/s
    With JIT CRTO 1225: 189 requests/s

  57. But Why Can't JIT Give
    Us More Perfz?

  58. IO Bound Vs CPU Bound
    In general, application performance is limited either
    by CPU or by IO (database, network, disk, etc.)
    Most Of The Time,
    PHP Applications Are IO Bound
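
One rough way to tell the two cases apart is to compare the CPU time a workload consumes with the wall-clock time it takes. The sketch below uses `getrusage()` and `microtime()`; `cpuSeconds()` and `cpuShare()` are hypothetical helper names, and the `usleep()` call and the loop are stand-ins for real IO and real computation.

```php
<?php
// Sketch: estimate how much of a workload's wall-clock time is spent on the CPU.
// A low share hints at an IO-bound workload, where a JIT has little to speed up.
function cpuSeconds(): float
{
    // User + system CPU time consumed by the current process so far.
    $usage = getrusage();
    return $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6
         + $usage['ru_stime.tv_sec'] + $usage['ru_stime.tv_usec'] / 1e6;
}

function cpuShare(callable $workload): float
{
    $wallStart = microtime(true);
    $cpuStart = cpuSeconds();
    $workload();
    $cpu = cpuSeconds() - $cpuStart;
    $wall = microtime(true) - $wallStart;
    return $wall > 0 ? $cpu / $wall : 0.0;
}

// usleep() stands in for waiting on a database, the network or the disk.
$ioLike = cpuShare(function () {
    usleep(200000);
});

// A tight loop stands in for pure computation.
$cpuLike = cpuShare(function () {
    for ($i = 0, $sum = 0; $i < 2000000; $i++) {
        $sum += $i % 7;
    }
});

printf("sleeping: %.0f%% CPU, looping: %.0f%% CPU\n", $ioLike * 100, $cpuLike * 100);
```

The exact percentages depend on the machine and its load, so they are best read as an order of magnitude.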

  59. Native Functions Are
    Very Fast
    PHP is maybe the fastest scripting language
    It has a large number of native functions, all written in C.
    Native functions are already compiled to machine
    code.

  60. GFX PHP
    Pure PHP image manipulation library
    1.4MB PNG rescale 20x
    Without JIT: 52s
    With JIT: 38s
    27% faster

  61. PHP Engine Dev State
    An estimated 5 million PHP devs worldwide
    4 active devs on the Zend Engine
    C makes it difficult for PHP devs to work on it
    github.com/php/php-src/graphs/contributors

  62. What If PHP Internals Could Be
    Written In... PHP
    If JIT Could Make PHP As Fast As C

  63. GFX PHP JIT Vs GD
    1.4MB PNG rescale 20x
    Without JIT: 52s
    With JIT: 38s
    Same with PHP+GD: 0.9s
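
The ratios behind these timings can be recomputed directly. The snippet below only divides the numbers quoted on the slide; `speedup()` is a hypothetical helper name.

```php
<?php
// Recompute the ratios from the timings quoted on the slide (in seconds).
function speedup(float $before, float $after): float
{
    return $before / $after;
}

$withoutJit = 52.0; // pure PHP library, JIT disabled
$withJit = 38.0;    // pure PHP library, JIT enabled
$withGd = 0.9;      // same operation with the GD extension (written in C)

printf("JIT vs no JIT: x%.1f\n", speedup($withoutJit, $withJit)); // x1.4
printf("GD vs JIT:     x%.1f\n", speedup($withJit, $withGd));     // x42.2
printf("GD vs no JIT:  x%.1f\n", speedup($withoutJit, $withGd));  // x57.8
```

With these numbers, the C extension remains roughly 42 times faster than the JIT-compiled pure PHP library.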

  64. JIT
    The Dark Side

  65. Impact For The Zend
    Engine Developers

  66. Maintenance
    Fixing bugs in something that generates assembly code

  67. Stability
    JIT can introduce behavior differences

  68. Platform Support
    The current JIT supports only x86 and x86_64 on Linux,
    macOS and Windows
    DynASM supports more CPUs, but work is needed on
    the Zend Engine side

  69. JIT Impact For The PHP
    Developers

  70. Potentially Different
    Bugs Depending On The
    JIT Configuration
    Profiling configuration, triggering conditions, @jit
    tagged functions...

  71. Debugger Support
    Xdebug doesn't work (for now) with JIT enabled

  72. Profiler Support
    Blackfire will need to be adapted to work with JIT

  73. Key Takeaways
    Don't expect the moon from JIT...
    ...but there's still a long way to go
    Test for your workload
    Look at PHP 7.4 preloading for performance
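
For the last takeaway, a preload script is an ordinary PHP file referenced by the `opcache.preload` ini setting and executed once at server startup. The sketch below is one possible shape for such a script, not a recommended setup: `phpFilesIn()`, `preloadDirectory()` and the `src` directory in the usage note are hypothetical, while `opcache_compile_file()` is the real opcache function.

```php
<?php
// Sketch of a PHP 7.4+ preload script, referenced from php.ini with
// opcache.preload=/path/to/preload.php (placeholder path).

// List every .php file below a directory, in a stable order.
function phpFilesIn(string $directory): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Compile each file into the opcode cache and return how many succeeded.
function preloadDirectory(string $directory): int
{
    $count = 0;
    foreach (phpFilesIn($directory) as $path) {
        // opcache_compile_file() only exists when opcache is loaded.
        if (function_exists('opcache_compile_file') && @opcache_compile_file($path)) {
            $count++;
        }
    }
    return $count;
}
```

A real preload script would end with a call such as `preloadDirectory(__DIR__ . '/src');`, where the directory is an assumption about the project layout.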

  74. Thank You!
    Questions?
    For more information:
    @bjacquemont
    wiki.php.net/rfc/jit
