PHP 8 and Just In Time Compilation

PHP 7 already brought a real performance gain, but PHP 8 aims to go even further
by integrating a Just-In-Time compiler.

Just-In-Time compilation turns PHP opcodes into machine code that can be
run directly on the processor, in order to achieve even better performance.

The aim of this talk is to dive into the JIT technology chosen by the Zend Engine development team,
as well as to present some performance benchmarks on Symfony applications.

Benoit Jacquemont

May 16, 2019
Transcript

  1. PHP 8 &
    JIT Compilation
    Benoit Jacquemont
    @bjacquemont

  2. PHP Perf Evolution
    Source: https://kinsta.com/blog/php-benchmarks/

  3. How To Go Even
    Further?
    JIT As The Next Frontier?

  4. Just In Time Compilation
    It's a way of executing computer code that involves
    compilation at run time rather than prior to execution.
    Expectation
    Compiled code speed >> Interpreted code speed

  5. Platforms With JIT
    Java with the HotSpot JVM
    .NET with the Common Language Runtime
    Node.js with V8
    ...

  6. Once upon a time,
    there were
    PHP and JIT...

  7. Compilation Is A Very
    CPU Intensive Process
    10 minutes to compile the Zend Engine
    An hour and a half to compile the Linux kernel

  8. So, You Want To Make Your Code
    Execute Faster By Compiling Stuff
    During Execution?

  9. Fast Compilation Time
    And
    Best Benefits From Compilation
    Compile Only The Most
    Executed Code
    Less code to compile = less time spent on compilation
    Most used code compiled = relevant performance
    improvements

  10. How To Know Which Code Is
    Executed The Most?
    Add A Profiler Into The
    Mix...

  11. JIT Standard Workflow
    Initial code
    ⤚ ⚙ syntax validation + compilation ⚙ →
    intermediate representation
    ⤚ ⚙ execution + profiling ⚙ →
    selection of most used code
    ⤚ ⚙ compilation to native code ⚙ →
    native code for most used code
    ➠ execution on the processor
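
The "execution + profiling" and "selection of most used code" steps can be illustrated with a toy call counter. This is only a sketch in userland PHP, not how the Zend Engine does it: the class `HotnessProfiler`, its threshold and the "compiled" flag are hypothetical names standing in for the real profiler and native compilation.

```php
<?php
// Hypothetical sketch: counts calls per function and marks a function "hot"
// once it reaches a threshold. Marking stands in for "compile to native code".
final class HotnessProfiler
{
    /** @var array<string, int> number of calls seen per function name */
    private $calls = [];
    /** @var array<string, bool> functions that crossed the threshold */
    private $compiled = [];
    /** @var int calls needed before a function is considered hot */
    private $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    /** Runs $fn through the "interpreter" while profiling it. */
    public function call(string $name, callable $fn, ...$args)
    {
        $this->calls[$name] = ($this->calls[$name] ?? 0) + 1;
        if ($this->calls[$name] >= $this->threshold) {
            $this->compiled[$name] = true; // a real JIT would emit native code here
        }
        return $fn(...$args);
    }

    public function isHot(string $name): bool
    {
        return isset($this->compiled[$name]);
    }
}

$profiler = new HotnessProfiler(3);
$square = function (int $x): int {
    return $x * $x;
};
for ($i = 0; $i < 5; $i++) {
    $profiler->call('square', $square, $i); // called 5 times: becomes hot
}
$profiler->call('rarely_used', 'strlen', 'abc'); // called once: stays cold

var_dump($profiler->isHot('square'));      // bool(true)
var_dump($profiler->isHot('rarely_used')); // bool(false)
```

Only `square` pays the (simulated) compilation cost, which is the trade-off the workflow aims for: little code compiled, but the code that matters most.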

  12. Executing Native Code
    The Hardware Problem
    Native means built for one processor instruction set.
    And there's more than one...
    x86, x86_64, ARM, MIPS, RISC-V

  13. Executing Native Code
    The OS Problem
    The OS controls what is executed.
    Should work on Linux, but also on Windows, macOS
    and the BSDs, 32-bit and 64-bit...

  14. PHP JIT Requirements
    An internal profiler
    Very fast compilers from opcode to:
    x86
    x86_64
    ARM
    MIPS
    ...
    Multi-OS support

  15. Introducing
    DynASM
    Avoiding the Not Invented Here syndrome

  16. DynASM
    DynASM is a dynamic assembler developed for
    LuaJIT

  17. DynASM Is A Generic
    Assembler!

  18. Without DynASM
    Need to generate each of the following for $i++;
    x86_64:
    mov ebx, 0x1234h
    mov eax, [ebx]
    inc eax
    mov [ebx], eax
    ARM:
    MOV R0, (#0x1234h)
    ADD R0, R0, #1
    MOV (#0x1234h), R0
    MIPS:
    lw $t0,0x1234h
    addw $t0,$t0,1
    sw $t0,0x1234h

  19. With DynASM
    Only need to generate one assembly code for $i++;
    DynASM assembly:
    mov $0x1234, %rdi
    inc %rdi
    mov %rdi, $0x1234
    DynASM will generate the native code for the target
    x86_64:
    mov ebx, 0x1234h
    mov eax, [ebx]
    inc eax
    mov [ebx], eax
    ARM:
    MOV R0, (#0x1234h)
    ADD R0, R0, #1
    MOV (#0x1234h), R0
    MIPS:
    lw $t0,0x1234h
    addw $t0,$t0,1
    sw $t0,0x1234h

  20. DynASM Is Fast
    Very fast and lightweight "assembler".
    (100x faster than LLVM)

  21. JIT Compilation & PHP
    Instead of developing multiple compilers:
    PHP opcode to x86
    PHP opcode to x86_64
    PHP opcode to ARM
    PHP opcode to MIPS
    ...
    Only need:
    PHP opcode to DynASM Assembly

  22. PHP JIT Workflow
    PHP code
    ⤚ ⚙ syntax validation + compilation ⚙ →
    opcode
    ⤚ ⚙ execution + profiling ⚙ →
    selection of most used code
    ⤚ ⚙ compilation to DynASM assembly ⚙ →
    DynASM Assembly
    ⤚ ⚙ compilation to native code (thanks DynASM!) ⚙ →
    Native code
    ➠ execution on the processor

  23. JIT
    Implementation
    In PHP8

  24. PHP JIT Is An Extension Of The
    Opcode Cache

  25. Let's Compile!

  26. Hello World!
    Opcode
    echo "Hello world!";
    $_main:
    L0 (2): ECHO string("Hello world!")
    L1 (3): RETURN int(1)

  27. Hello World!
    DynASM Assembly
    echo "Hello world!";
    sub $0x10, %rsp
    mov %r15, (%r14)
    mov $0x40d29d48, %rdi
    mov $0xc, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x560d02b357b1, %rax
    call *%rax
    jmp (%r15)

  28. If
    Opcode
    $a = true;
    if ($a === true) {
        echo "Yes!";
    } else {
        echo "No!";
    }
    $_main:
    L0 (3): ASSIGN CV0 bool(true)
    L1 (5): T1 = IS_IDENTICAL CV0 bool(true)
    L2 (5): JMPZ T1 L5
    L3 (6): ECHO string("Yes!")
    L4 (10): RETURN int(1)
    L5 (8): ECHO string("No!")
    L6 (10): RETURN int(1)

  29. $a = true; if ($a === true) { echo "Yes!"; } else { echo "No!"; }
    sub $0x10, %rsp
    lea 0x50(%r14), %rdi
    cmp $0xa, 0x8(%rdi)
    jnz .L1
    mov (%rdi), %rdi
    cmp $0x0, 0x18(%rdi)
    jnz .L7
    add $0x8, %rdi
    .L1:
    test $0x1, 0x9(%rdi)
    jnz .L8
    .L2:
    mov $0x3, 0x8(%rdi)
    .L3:
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    lea 0x50(%r14), %rdi
    cmp $0xa, 0x8(%rdi)
    jnz .L4
    mov (%rdi), %rdi
    add $0x8, %rdi
    .L4:
    cmp $0x3, 0x8(%rdi)
    jz .L5
    jmp .L6
    .L5:
    add $0x60, %r15
    mov %r15, (%r14)
    mov $0x40af2d48, %rdi
    mov $0x4, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x559ee5a027b1, %rax
    call *%rax
    jmp (%r15)
    .L6:
    mov $0x4115d6a0, %r15
    mov %r15, (%r14)
    mov $0x40af2d70, %rdi
    mov $0x3, %rsi
    mov $php_output_write, %rax
    call *%rax
    mov $EG(exception), %rax
    cmp $0x0, (%rax)
    jnz JIT$$exception_handler
    add $0x20, %r15
    add $0x10, %rsp
    mov $0x559ee5a027b1, %rax
    call *%rax
    jmp (%r15)
    .L7:
    mov $0x4115d5c0, %rsi
    mov $zend_jit_assign_const_to_typed_ref, %rax
    call *%rax
    jmp .L3
    .L8:
    mov (%rdi), %rax
    sub $0x1, (%rax)
    jnz .L9
    mov %rax, (%rsp)
    mov $0x3, 0x8(%rdi)
    mov (%rsp), %rdi
    mov %r15, (%r14)
    mov $rc_dtor_func, %rax
    call *%rax
    jmp .L3
    .L9:
    mov (%rdi), %rax
    mov 0x4(%rax), %eax
    and $0xfffffc10, %eax
    cmp $0x10, %eax
    jnz .L2
    mov %rdi, (%rsp)
    mov (%rdi), %rdi
    mov $gc_possible_root, %rax
    call *%rax
    mov (%rsp), %rdi
    jmp .L2

  30. JIT
    Configuration

  31. JIT Buffer
    At 0, JIT is disabled (default value)
    opcache.jit_buffer_size=100M

  32. JIT Controls Aka CRTO
    C: CPU Optimization
    0 - none
    1 - enable AVX instruction generation
    R: Register Allocation
    0 - don't perform register allocation
    1 - use local linear-scan register allocator
    2 - use global linear-scan register allocator
    T: JIT Trigger
    0 - JIT all functions on first script load
    1 - JIT function on first execution
    2 - Profile on first request and compile hot functions on second request
    3 - Profile on the fly and compile hot functions
    4 - Compile functions with @jit tag in doc-comments
    O: Optimization level
    0 - don't JIT
    1 - minimal JIT (call standard VM handlers)
    2 - selective VM handler inlining
    3 - optimized JIT based on static type inference of individual function
    4 - optimized JIT based on static type inference and call tree
    5 - optimized JIT based on static type inference and inner procedure analyses
    opcache.jit=1235
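
To make the four-digit CRTO value easier to read, here is a small sketch that splits it into its components. The function name `decodeJitSetting` and the array keys are illustrative choices, not part of PHP; only the digit order (C, R, T, O) comes from the slide.

```php
<?php
// Hypothetical helper: splits an opcache.jit CRTO value into its four digits.
function decodeJitSetting(int $crto): array
{
    // Pad on the left so that a short value such as 5 is read as 0005.
    $digits = str_pad((string) $crto, 4, '0', STR_PAD_LEFT);

    return [
        'cpu_optimization'    => (int) $digits[0], // C
        'register_allocation' => (int) $digits[1], // R
        'trigger'             => (int) $digits[2], // T
        'optimization_level'  => (int) $digits[3], // O
    ];
}

print_r(decodeJitSetting(1235));
```

For `1235` this reads as: AVX generation enabled, global linear-scan register allocation, profile on the fly, and the highest optimization level.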

  33. How To Run PHP 8 JIT
    With Docker
    docker run akondas/php:8.0-cli-alpine \
        php -d zend_extension=opcache.so \
        -d opcache.enable_cli=1 \
        -d opcache.jit_buffer_size=100M \
        -d opcache.jit=1235
    Compile It From
    github.com/zendtech/php-src/tree/jit-dynasm
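
Once such a build is running, a script can check whether the JIT is actually switched on. The sketch below assumes the `jit` entry that released PHP 8 versions expose in `opcache_get_status()`; the 2019 development branch referenced above may report this differently, and `jitIsActive` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: reports whether the opcache JIT is currently active.
function jitIsActive(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false; // opcache extension is not loaded
    }

    // Returns false when opcache is disabled; @ hides the restrict_api warning.
    $status = @opcache_get_status(false);

    // Assumption: the 'jit' key is only present on builds that include the JIT.
    return is_array($status) && !empty($status['jit']['on']);
}

echo jitIsActive() ? "JIT is active\n" : "JIT is not active\n";
```

On the CLI, remember that `opcache.enable_cli=1` is needed, as in the Docker command above, otherwise the check reports that JIT is not active.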

  34. How To Dump DynASM
    Assembly
    opcache.jit_debug=1

  35. STOP WITH THE SUSPENSE!
    SHOW ME THE
    PERFORMANCE!!

  36. Zend/bench.php
    Very basic benchmark available in the PHP source tree
    Without JIT: 0.567s
    With JIT: 0.130s
    x4 improvement

  37. Fibonacci
    Without JIT: 8.3s
    With JIT: 2.7s
    x3 improvement
    function fibonacci($n) {
        return ($n < 2) ? 1 : fibonacci($n - 2) + fibonacci($n - 1);
    }
    $start = microtime(true);
    fibonacci(40);
    $stop = microtime(true);
    echo sprintf("Time: %s\n", $stop - $start);

  38. Real-Life Performance Results
    Aka Let's Crush Your Dreams

  39. Composer Benchmark
    composer update on Akeneo PIM Enterprise Edition
    Without JIT: 53s
    With JIT:
    Oops... JIT can have an effect on application behavior
    Your requirements could not be resolved to an installable set of packages
    Problem 1
    - akeneo/pim-community-dev 3.2.x-dev requires doctrine/annotations 1.6.0
    -> satisfiable by doctrine/annotations[v1.6.0].
    - Conclusion: don't install doctrine/annotations v1.1.2|
    remove doctrine/annotations v1.6.0

  40. Akeneo PIM EE
    Installation Time
    Without JIT: 2m34s
    With JIT: 2m37s

  41. WordPress Front Page
    Without JIT: 190 requests/s
    With JIT CRTO 1235: 160 requests/s
    With JIT CRTO 1225: 189 requests/s

  42. But Why Can't JIT Give
    Us More Perf?

  43. IO Bound Vs CPU Bound
    In general, application performance is limited either
    by CPU or by IO (database, network, disk, etc.)
    Most Of The Time,
    PHP Applications Are IO Bound
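
One rough way to see which side of that line a piece of code falls on is to compare CPU time with wall-clock time. The sketch below uses `getrusage()` and `microtime()`; the helper names `cpuSeconds` and `cpuShare` are hypothetical, and `usleep()` merely stands in for waiting on a database or the network.

```php
<?php
// Hypothetical helper: total CPU time (user + system) used by this process so far.
function cpuSeconds(): float
{
    $usage = getrusage();

    return $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6
         + $usage['ru_stime.tv_sec'] + $usage['ru_stime.tv_usec'] / 1e6;
}

// Hypothetical helper: fraction of the elapsed time that was spent on the CPU.
// Close to 1 suggests CPU-bound work, close to 0 suggests waiting on IO.
function cpuShare(callable $work): float
{
    $wallStart = microtime(true);
    $cpuStart  = cpuSeconds();

    $work();

    $cpu  = cpuSeconds() - $cpuStart;
    $wall = microtime(true) - $wallStart;

    return $wall > 0 ? $cpu / $wall : 0.0;
}

// Stand-in for IO: the process sleeps and uses almost no CPU.
$waiting = cpuShare(function () {
    usleep(200000);
});

// Stand-in for computation: a tight arithmetic loop.
$computing = cpuShare(function () {
    $x = 0;
    for ($i = 0; $i < 3000000; $i++) {
        $x += $i % 7;
    }
});

printf("CPU share while waiting:   %.2f\n", $waiting);
printf("CPU share while computing: %.2f\n", $computing);
```

A JIT can only shorten the part of a request that is spent on the CPU running PHP code, so a low CPU share leaves it little to gain.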

  44. Native Functions Are
    Very Fast
    PHP may well be the fastest scripting language
    A large number of native functions, all written in C.
    Native functions are already compiled to machine
    code.

  45. GFX PHP
    Pure PHP image manipulation library
    1.4MB PNG rescale 20x
    Without JIT: 52s
    With JIT: 38s
    27% faster

  46. PHP Engine Dev State
    Estimated 5 million PHP devs worldwide
    4 active devs on the PHP Zend Engine
    C makes it difficult for PHP devs to work on it
    github.com/php/php-src/graphs/contributors

  47. What If PHP Internals Could Be
    Written In... PHP?
    If JIT Could Make PHP As Fast As C

  48. GFX PHP JIT Vs GD
    1.4MB PNG rescale 20x
    Without JIT: 52s
    With JIT: 38s
    Same with PHP+GD: 0.9s
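
To put these three timings side by side, the ratios can be worked out with a few lines of PHP. The helper names `speedup` and `timeSaved` are illustrative; the numbers are the ones from the slide.

```php
<?php
// Hypothetical helper: how many times faster $after is compared to $before.
function speedup(float $before, float $after): float
{
    return $before / $after;
}

// Hypothetical helper: percentage of the original run time that was saved.
function timeSaved(float $before, float $after): float
{
    return ($before - $after) / $before * 100;
}

// Timings in seconds, taken from the slide.
$withoutJit = 52.0;
$withJit    = 38.0;
$withGd     = 0.9;

printf("JIT vs no JIT: x%.2f (%.0f%% less time)\n",
    speedup($withoutJit, $withJit), timeSaved($withoutJit, $withJit));
printf("GD vs pure PHP without JIT: x%.1f\n", speedup($withoutJit, $withGd));
printf("GD vs pure PHP with JIT: x%.1f\n", speedup($withJit, $withGd));
// Prints:
// JIT vs no JIT: x1.37 (27% less time)
// GD vs pure PHP without JIT: x57.8
// GD vs pure PHP with JIT: x42.2
```

Even with JIT, the pure PHP library remains roughly 42 times slower than the C-based GD extension on this test.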

  49. JIT
    The Dark Side

  50. Impact For The Zend
    Engine Developers

  51. Maintenance
    Fixing bugs in something that generates assembly code

  52. Stability
    JIT can introduce behavior differences

  53. Platform Support
    The current JIT supports only x86 and x86_64 on Linux,
    macOS and Windows
    DynASM supports more CPUs, but work is needed on
    the Zend Engine side

  54. JIT Impact For The PHP
    Developers

  55. Potentially Different
    Bugs Depending On The
    JIT Configuration
    Profiling configuration, triggering conditions, @jit
    tagged functions...

  56. Debugger Support
    Xdebug doesn't work (for now) with JIT enabled

  57. Profiler Support
    Blackfire will need to be adapted to work with JIT

  58. Key Takeaways
    Don't expect the moon from JIT...
    ...but there's still a long way to go
    Test for your workload
    Look at PHP 7.4 preloading for performance

  59. Thank You!
    Questions?
    For more information:
    @bjacquemont
    wiki.php.net/rfc/jit
