
HHVM at Etsy

Dan Miller
February 20, 2015

In 2014, Etsy’s infrastructure group was handed a challenge: scale Etsy’s API cluster 20x. Several efforts ran in parallel to meet it, including a migration to HHVM after it showed a promising 5x increase in throughput. Getting our code to run on HHVM was easy; deploying and operating it proved much harder.

This was presented at the PHP UK 2015 Conference.


Transcript

  1. HHVM at Etsy
    Harder, Better, Faster, Stronger
    Dan Miller
    Core Platform Engineer
    Etsy


  2. @jazzdan
    http://bit.ly/hhvm_etsy


  3. The World’s Handmade Marketplace


  4. Chart: Deploys over Time
    About 60 deploys per day

  5. Overview
    • What is HHVM?
    • Why were we interested?
    • How did we migrate? What problems did we encounter?
    • What else can it do?
    • The Future

  6. Timeline: 2009 to 2015

  7. Timeline: 2009 to 2015
    • Linux
    • Apache
    • MySQL
    • PHP

  8. Timeline: 2009 to 2015
    • Linux
    • Apache
    • MySQL
    • PHP
    • Couldn’t keep up with traffic growth

  9. Timeline: 2009 to 2015

  10. Timeline: 2009 to 2015
    • HipHop (HPHP)
    • Compile PHP to C++
    • Deploy binary
    • Separate development environment

  11. HipHop Virtual Machine (HHVM)

  12. HHVM is not a source transformer
    (That was HPHPc)

  13. Webserver PHP
    Backend
    Services
    FastCGI
    index.php
    Logger.php
    Tpl.php


  14. Webserver HHVM
    Backend
    Services
    FastCGI
    index.php
    Logger.php
    Tpl.php


  15. Timeline: 2009 to 2015

  16. HHVM is Open Source
    • Internal diffs developed in the open
    • Included in Linux distros
    • Over 2000 bugs opened and closed
    • Over 1000 pull requests accepted

  17. HHVM is Open Source
    1. google.com
    2. facebook.com
    3. youtube.com
    4. yahoo.com
    5. baidu.com
    6. amazon.com
    7. wikipedia.org
    8. twitter.com
    9. taobao.com
    10. qq.com

  18. HHVM is Open Source
    1. google.com
    2. facebook.com
    3. youtube.com
    4. yahoo.com
    5. baidu.com
    6. amazon.com
    7. wikipedia.org
    8. twitter.com
    9. taobao.com
    10. qq.com

  19. HHVM is Compatible* with PHP
    • 60% of PHP unit tests fail
    • Missing Extensions
    • Different error message output
    • 20 of the top PHP projects on GitHub do pass
    • 97% of unit tests pass among top 50 projects on GitHub

  20. HHVM is Compatible* with PHP
    • 99% of Etsy unit tests pass
    • (20 suite failures/1,798 test suites)
    Chart: Fail 20, Pass 1,798

  21. HHVM is Faster

  22. “Between 3x-6x Faster”

  23. (for Facebook)

  24. Model
    Controller
    API
    Business Logic
    A
    Business Logic
    B
    Business Logic
    A


  25. Model
    API v3
    Business Logic


  26. “Bespoke” Endpoints
    • Specific to a view
    • Aggregate REST endpoints concurrently
    • Return bespoke response

  27. Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

  28. Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

  29. Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

  30. Overview
    Related
    Listing
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Seller

  31. curl_multi_*
    curl_multi_init();
    curl_multi_add_handle();
    curl_multi_exec();
    while (!$done) {
        curl_multi_select();
        curl_multi_exec();
        curl_multi_info_read();
    }
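
The curl_multi_* loop on this slide is how a bespoke endpoint can fan out to several REST endpoints at once and collect the results. As a rough illustration of the same fan-out and fan-in pattern, here is a minimal sketch in Python rather than PHP. The component names come from the listing-page slides; `fetch_component`, its simulated latency, and the thread pool are assumptions made for illustration, not Etsy's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Component names taken from the listing-page slides.
COMPONENTS = ["Listing", "Seller", "Shop", "Overview", "Related", "Recent"]


def fetch_component(name, delay=0.05):
    """Hypothetical stand-in for one REST sub-request; sleeps instead of doing I/O."""
    time.sleep(delay)
    return {"component": name, "status": 200}


def fetch_bespoke(components):
    """Issue all sub-requests concurrently and merge them into one response."""
    with ThreadPoolExecutor(max_workers=len(components)) as pool:
        results = list(pool.map(fetch_component, components))
    return {r["component"]: r for r in results}


if __name__ == "__main__":
    start = time.perf_counter()
    response = fetch_bespoke(COMPONENTS)
    elapsed = time.perf_counter() - start
    # With concurrency, wall time tracks the slowest sub-request, not the sum.
    print(sorted(response), f"{elapsed:.2f}s")
```

Because the sub-requests overlap, the bespoke response takes roughly as long as its slowest component, which is why backend latency matters so much for this API design.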


  32. Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

  33. API Traffic Web Traffic


  34. Lower latency
    = more RPS/box

  35. More RPS/box
    = Fewer boxes

  36. Fewer boxes
    = less $/datacenter
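
The chain from latency to box count can be made concrete with a back-of-the-envelope calculation. The sketch below assumes each box runs a fixed number of workers that each handle one request at a time, so per-box throughput is roughly workers divided by latency. Every number in it is hypothetical and not taken from Etsy's fleet.

```python
import math


def rps_per_box(workers, latency_ms):
    """Approximate max requests/second for one box with one request per worker."""
    return workers * 1000 / latency_ms


def boxes_needed(target_rps, workers, latency_ms):
    """Boxes required to serve target_rps at the given latency, rounded up."""
    return math.ceil(target_rps * latency_ms / (workers * 1000))


if __name__ == "__main__":
    # Hypothetical figures: 20 workers per box and a 10,000 rps target.
    for latency_ms in (200, 100, 50):
        boxes = boxes_needed(10_000, workers=20, latency_ms=latency_ms)
        print(f"{latency_ms} ms per request -> {boxes} boxes")
```

Under these assumptions the box count scales linearly with latency, so halving response time halves the hardware needed for the same traffic.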


  37. Fewer boxes
    =

  38. $ scp -r rlerdorf.vms.etsy.com:/home/rlerdorf/hhvm-root .

  39. Missing memcached constants?

  40. /*Missing memcached constants*/

  41. Missing geoip extension?

  42. /*Missing geoip extension?*/

  43. Missing msgpack extension?

  44. Login doesn’t need to work for now

  45. Ran some benchmarks

  46. It’s faster

  47. Way faster.

  48. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  49. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  50. Time to Compile HHVM

  51. How hard can it be?

  52. September 2nd, 2014:
    “Started compiling HHVM”

  53. September 23rd, 2014:
    “strace’d cmake”

  54. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  55. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  56. ( $ git clean -fdx )

  57. September 24th, 2014:
    “GOT IT TO COMPILE”

  58. September 25th, 2014:
    “BUILT AN RPM”

  59. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  60. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  61. libs=$(ldd "${hhvm_binary}" | awk '{print $3}' | grep -v '^$'
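
The pipeline on this slide takes the third whitespace-separated field of each ldd line (the resolved library path) and drops lines where that field is empty. For readers who want to see what that filter keeps, here is a small Python sketch of the same parsing run against a hard-coded sample. The sample ldd output and the library paths in it are made up for illustration, and `third_fields` is a hypothetical helper name.

```python
# Made-up ldd output for illustration; real output depends on the binary and host.
SAMPLE_LDD_OUTPUT = (
    "\tlinux-vdso.so.1 (0x00007ffc5d9ff000)\n"
    "\tlibjemalloc.so.1 => /usr/lib64/libjemalloc.so.1 (0x00007f1c2a000000)\n"
    "\tlibmemcached.so.11 => /usr/lib64/libmemcached.so.11 (0x00007f1c29c00000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)\n"
)


def third_fields(ldd_output):
    """Mimic awk '{print $3}' | grep -v '^$': keep the third field where one exists."""
    fields = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            fields.append(parts[2])
    return fields


if __name__ == "__main__":
    for path in third_fields(SAMPLE_LDD_OUTPUT):
        print(path)
```

Lines with fewer than three fields, such as the vDSO and the dynamic loader in this sample, produce an empty third field and are filtered out.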


  62. September 30th, 2014:
    “Fix the HHVM rpm we made last week that broke yum on every prod box”

  63. Had to Upgrade
    • gcc
    • libmcrypt
    • gmp
    • mpfr
    • mpc
    • glog
    • jemalloc
    • tbb
    • libdwarf
    • libmemcached
    • libc
    • cmake
    • libcurl
    • more

  64. Idea -> Code -> Release (axis: Time)

  65. Idea -> Code -> Release
    Idea -> Code -> Release -> A/B Test

  66. Idea -> Code -> Release
    Idea -> Code -> A/B Test -> Release
    “Oh crap…”
    <- Wasted effort (maybe)

  67. Idea -> Code -> Release
    Idea -> Code -> A/B Test -> Release
    Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release

  68. Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release
    <- Possibly quite crappy

  69. Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release
    <- Possibly quite crappy
    <- Make it less crappy here

  70. Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release
    Can HHVM run etsy.com?

  71. Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release
    Can HHVM run etsy.com?
    Is it faster?

  72. API Traffic Web Traffic


  73. Run synthetic benchmarks

  74. Chart: Response Time as Load Increases
    y-axis: Response Time (0 to 4000)
    x-axis: Requests per Second (10 to 270)
    Series: HHVM, PHP 5.4

  75. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  76. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  77. Run an experiment


  78. Idea -> Validate -> Prototype -> A/B Test -> Refinement -> A/B Test -> Release
    Can HHVM run Etsy’s internal API?
    Is it faster?
    Time to fix the problems we skipped

  79. How do we gain more confidence?

  80. … and also validate our hypothesis?

  81. Load Balancer
    API
    API-HHVM
    API-TEST


  82. Load Balancer
    API
    API-HHVM
    API-TEST


  83. Load Balancer
    API
    API-HHVM
    API-TEST
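
The load balancer diagram shows production API traffic alongside the API-HHVM and API-TEST pools, and the talk later calls teeing traffic a superpower. A minimal sketch of that kind of teeing follows, assuming the primary pool's response is the only one returned to the client and the shadow pools' responses are only recorded for comparison. The backend functions, the assumption that API-TEST is the PHP control, and the `tee_request` helper are all hypothetical, not Etsy's load balancer configuration.

```python
def api(request):
    """Hypothetical production backend."""
    return {"backend": "API", "body": f"listing {request['id']}"}


def api_hhvm(request):
    """Hypothetical shadow backend running HHVM."""
    return {"backend": "API-HHVM", "body": f"listing {request['id']}"}


def api_test(request):
    """Hypothetical shadow backend (assumed control); fails here to show isolation."""
    raise TimeoutError("shadow backend overloaded")


def tee_request(request, primary, shadows, log):
    """Serve from primary; replay the same request to shadows and log their outcomes."""
    response = primary(request)
    for shadow in shadows:
        try:
            shadow_response = shadow(dict(request))  # copy so shadows cannot mutate it
            log.append((shadow.__name__, shadow_response["body"] == response["body"]))
        except Exception as exc:  # a failing shadow must never affect the client
            log.append((shadow.__name__, repr(exc)))
    return response


if __name__ == "__main__":
    log = []
    print(tee_request({"id": 124740565}, api, [api_hhvm, api_test], log))
    print(log)
```

A real tee would replay requests asynchronously so shadow latency never delays the client; the sequential loop here only keeps the sketch short.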


  84. Infrastructure experiments are hard

  85. Same hardware

  86. Same traffic profile

  87. Same hacks

  88. Both Machines
    • Read Only MySQL Interface
    • Read Only memcached Interface
    • Read Only Redis Interface
    • iptables blocking almost all the things
    • No log forwarding

  89. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  90. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  91. Charts: HHVM CPU (14 hour, 140 rps peak); Zend CPU (14 hour, 140 rps peak)

  92. “Between 3x-6x Faster”

  93. #hhvm
    Repo Authoritative - Extra 20%
    • Produce bytecode SQLite database in advance
    • Build include map
    • Statically resolve file paths
    • Do non-type related optimizations at compile time


  94. Chart: Deploys over Time
    About 60 deploys per day

  95. What about writing data?

  96. Employee Only Traffic


  97. memcached

  98. Memcached operation failed (returned false) when decrementing KEY

  99. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  100. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  101. All get()s were returning false

  102. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  103. Chart: Happiness and Performance Correlation
    Labels: Time, Response Time, Happiness

  104. Takeaway:

  105. HHVM is rock solid

  106. Extensions sometimes have bugs

  107. Slow Ramp Up


  108. [23/janv./2015:22:40:32 +0000]


  109. [23/ 1月/2015:23:37:56]

  110. Diagram (axis: Time):
    request 1: setlocale("a"), then strftime()
    request 2: setlocale("b"), then strftime()

  111. Solution:
    newlocale()/uselocale()
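
The log lines with French and Japanese dates come from two requests sharing one process-wide locale: setlocale() changes it for every thread, while newlocale()/uselocale() give each thread its own. The sketch below reproduces that difference in Python with plain variables instead of real locale calls. The variable names, the locale strings, and the barrier that forces the interleaving are assumptions made for illustration.

```python
import threading

# Process-wide setting, standing in for the locale that setlocale() changes.
process_locale = "C"
# Per-thread setting, standing in for what newlocale()/uselocale() provide.
per_thread = threading.local()


def handle_request(locale, barrier, seen, use_thread_local):
    """Set a locale, wait for the other request to do the same, then read it back."""
    global process_locale
    if use_thread_local:
        per_thread.locale = locale
    else:
        process_locale = locale
    barrier.wait()  # both requests have now set "their" locale
    seen[locale] = per_thread.locale if use_thread_local else process_locale


def run_two_requests(use_thread_local):
    """Run two concurrent requests and report which locale each one observed."""
    barrier = threading.Barrier(2)
    seen = {}
    threads = [
        threading.Thread(target=handle_request, args=(loc, barrier, seen, use_thread_local))
        for loc in ("fr_FR", "ja_JP")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print("shared:    ", run_two_requests(use_thread_local=False))
    print("per-thread:", run_two_requests(use_thread_local=True))
```

With the shared variable both requests observe the same locale, so one of them formats its date in the wrong language; with per-thread state each request sees the locale it set.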


  112. Takeaway:

  113. HHVM is threaded

  114. Charts: HHVM Average Response Time (12 Hour); Zend Average Response Time (12 Hour)

  115. Charts: HHVM p95 Response Time (12 Hour); Zend p95 Response Time (12 Hour)

  116. Chart: HHVM vs PHP 5.5 on Etsy Internal API
    Rows: Median, p95, p99
    Axis: Response Time in Milliseconds (0 to 800)
    Series: HHVM, PHP
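
The comparison is reported as median, p95 and p99 rather than as an average because tail latency is what slow requests actually experience. As a reminder of how those figures are derived from raw samples, here is a minimal sketch using the nearest-rank definition of a percentile. The `percentile` helper and the sample response times are hypothetical and are not Etsy's data or tooling.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    response_times_ms = [120, 95, 110, 105, 100, 98, 102, 640, 101, 99]
    for label, p in (("median", 50), ("p95", 95), ("p99", 99)):
        print(label, percentile(response_times_ms, p))
```

A single slow outlier barely moves the median but dominates p95 and p99, which is why the slide plots all three.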


  117. What else can HHVM do?


  118. Flame Graphs

  119. HHVM Debugger

  120. pfff

  121. sgrep

  122. sgrep: Problem
    Find all invocations of foo() where the second argument is 1, with any number of arguments after

  123. sgrep: Solution
    $ sgrep -e 'foo(X, 1, ...)' *.php
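
sgrep matches on syntax rather than on text, so a pattern like the one above is not thrown off by whitespace or nested calls. To make the idea concrete without guessing at sgrep internals, the sketch below swaps in a different tool: Python's standard `ast` module, searching Python source rather than PHP, for calls to `foo` whose second argument is the literal 1. The sample source and the `find_calls` helper are hypothetical.

```python
import ast

# Hypothetical source to search; only lines 1, 2 and 4 should match.
SAMPLE_SOURCE = (
    "foo(a, 1)\n"
    "foo(a, 1, b, c)\n"
    "foo(a, 2)\n"
    "foo(bar(1, 1), 1 )\n"
    "foo(1)\n"
)


def find_calls(source, func_name, second_arg):
    """Return line numbers of calls shaped like func_name(X, second_arg, ...)."""
    matches = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
            and len(node.args) >= 2
            and isinstance(node.args[1], ast.Constant)
            and node.args[1].value == second_arg
        ):
            matches.append(node.lineno)
    return sorted(matches)


if __name__ == "__main__":
    print(find_calls(SAMPLE_SOURCE, "foo", 1))
```

A text-based grep for `foo(.*, 1` would struggle with the nested `bar(1, 1)` call on line 4; matching on the parsed tree handles it without special cases.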


  124. sgrep: Problem
    Find all && where both operands are the same

  125. sgrep: Solution
    $ sgrep -e 'X && X' *.php

  126. sgrep: Solution
    $ sgrep -pvar X -e 'X && X' *.php

  127. sgrep: Problem
    Find all calls to foo() where the first argument is 1

  128. sgrep: Solution
    $ sgrep -e 'foo(1, ...)' *.php

  129. sgrep: Problem
    Find all method calls “addPreparable()” with any number of arguments

  130. sgrep: Solution
    $ sgrep -e 'X->addPreparable(...)'

  131. spatch

  132. spatch: Problem
    Remove the second argument from all invocations of function foo()

  133. spatch: Solution
    //remove_second_arg_foo.spatch
    foo(X
    - ,Y
    )

  134. spatch: Solution
    $ spatch -f remove_second_arg_foo.spatch *.php

  135. spatch: Problem
    Rename a function with a variable number of arguments

  136. spatch: Solution
    - foo
    + bar
    (...)

  137. perf(1)

  138. perf(1)
    - 12.09% HPHP::f_sort
    + PHP::…getAllReplicantNames

  139. Lessons Learned


  140. If you’re running an old operating system…

  141. …you’re gonna have a bad time.

  142. Tee’ing traffic is a superpower

  143. HHVM is Rock Solid

  144. Extensions Sometimes Aren't

  145. Threads are Hard

  146. Tooling is Powerful

  147. Lessons Learned
    • Do:
      • Run a newer Linux distribution
      • Ramp up slowly
    • Don’t:
      • Trust that extensions are 100%
      • Assume that processes are like threads

  148. What does the future hold?

  149. No one knows

  150. But now we are better prepared

  151. Questions?
    ( https://joind.in/13384 )