HHVM at Etsy

Dan Miller
February 20, 2015

In 2014 Etsy’s infrastructure group was handed a challenge: scale Etsy’s API cluster 20x. We undertook many efforts at once to meet this challenge, including a migration to HHVM after it showed a promising 5x increase in throughput. Getting our code to run on HHVM was easy; working through deployment and operationalization proved more difficult.

This was presented at the PHP UK 2015 Conference.

Transcript

  1. HHVM at Etsy
    Harder, Better, Faster, Stronger
    Dan Miller
    Core Platform Engineer
    Etsy

    View Slide

  2. @jazzdan
    http://bit.ly/hhvm_etsy

    View Slide

  3. The World’s Handmade Marketplace

    View Slide

  4. View Slide

  5. About 60 deploys per day
    (chart: Deploys over Time)

    View Slide

  6. @jazzdan
    Overview
    • What is HHVM?
    • Why were we interested?
    • How did we migrate? What problems did we encounter?
    • What else can it do?
    • The Future

    View Slide

  7. View Slide

  8. Timeline: 2009 to 2015

    View Slide

  9. Timeline: 2009 to 2015
    • Linux
    • Apache
    • MySQL
    • PHP

    View Slide

  10. Timeline: 2009 to 2015
    • Linux
    • Apache
    • MySQL
    • PHP
    • Couldn’t keep up with traffic growth

    View Slide

  11. Timeline: 2009 to 2015

    View Slide

  12. Timeline: 2009 to 2015
    • HipHop (HPHP)
    • Compile PHP to C++
    • Deploy binary
    • Separate development environment

    View Slide

  13. @jazzdan
    HipHop Virtual Machine
    (HHVM)

    View Slide

  14. @jazzdan
    HHVM is not a
    source transformer
    (That was HPHPc)

    View Slide

  15. Webserver PHP
    Backend
    Services
    FastCGI
    index.php
    Logger.php
    Tpl.php

    View Slide

  16. Webserver HHVM
    Backend
    Services
    FastCGI
    index.php
    Logger.php
    Tpl.php

    View Slide

  17. Timeline: 2009 to 2015

    View Slide

  18. @jazzdan
    HHVM is Open Source
    • Internal diffs developed in the open
    • Included in Linux distros
    • Over 2000 bugs opened and closed
    • Over 1000 pull requests accepted

    View Slide

  19. View Slide

  20. View Slide

  21. View Slide

  22. @jazzdan
    HHVM is Open Source
    1. google.com
    2. facebook.com
    3. youtube.com
    4. yahoo.com
    5. baidu.com
    6. amazon.com
    7. wikipedia.org
    8. twitter.com
    9. taobao.com
    10. qq.com

    View Slide

  23. @jazzdan
    HHVM is Open Source
    1. google.com
    2. facebook.com
    3. youtube.com
    4. yahoo.com
    5. baidu.com
    6. amazon.com
    7. wikipedia.org
    8. twitter.com
    9. taobao.com
    10. qq.com

    View Slide

  24. @jazzdan
    HHVM is Compatible* with PHP
    • 60% of PHP unit tests fail
    • Missing Extensions
    • Different error message output
    • 20 of the top PHP projects on GitHub do pass
    • 97% of unit tests pass among top 50 projects on GitHub

    View Slide

  25. @jazzdan
    HHVM is Compatible* with PHP
    • 99% of Etsy unit tests pass
    • (20 suite failures/1,798 test suites)
    (chart: Fail 20, Pass 1,798)

    View Slide

  26. @jazzdan
    HHVM is Faster

    View Slide

  27. @jazzdan
    “Between 3x-6x Faster”

    View Slide

  28. @jazzdan
    (for Facebook)

    View Slide

  29. View Slide

  30. Why?

    View Slide

  31. View Slide

  32. Model
    Controller
    API
    Business Logic
    A
    Business Logic
    B
    Business Logic
    A

    View Slide

  33. Model
    API v3
    Business Logic

    View Slide

  34. @jazzdan
    “Bespoke” Endpoints
    • Specific to a view
    • Aggregate REST endpoints concurrently
    • Return bespoke response

    View Slide

  35. @jazzdan
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

    View Slide

  36. @jazzdan
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

    View Slide

  37. @jazzdan
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

    View Slide

  38. @jazzdan
    Overview
    Related
    Listing
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Seller

    View Slide

  39. View Slide

  40. @jazzdan
    curl_multi_*
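    $mh = curl_multi_init();
    curl_multi_add_handle($mh, $ch);       // $ch: a handle from curl_init(), one per request
    curl_multi_exec($mh, $running);        // start the transfers
    while ($running > 0) {
        curl_multi_select($mh);            // wait for activity on any handle
        curl_multi_exec($mh, $running);    // make progress; updates $running
        while ($info = curl_multi_info_read($mh)) {
            // $info['handle'] has finished; read it with curl_multi_getcontent()
        }
    }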

    View Slide
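The curl_multi_* loop on slide 40 is how a bespoke endpoint fans out to several REST endpoints at once and then combines the results. The same fan-out-then-aggregate shape can be sketched in shell with background jobs. This is only an analogy under stated assumptions: `fan_out` is a hypothetical helper, the jobs are separate processes rather than curl multi handles, and none of it is Etsy's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: fan_out is a hypothetical helper, not part of any library.
# usage: fan_out FETCH_FUNCTION ARG...
# Starts "FETCH_FUNCTION ARG" for every ARG without waiting in between,
# waits for all of them, then prints one "ARG<TAB>OUTPUT" line per ARG
# in argument order.
fan_out() {
  local fetch=$1; shift
  local tmpdir pids="" i=0 arg
  tmpdir=$(mktemp -d) || return 1
  for arg in "$@"; do
    "$fetch" "$arg" >"$tmpdir/$i" 2>/dev/null &   # start the job, do not wait
    pids="$pids $!"
    i=$((i + 1))
  done
  wait $pids                                      # block until every job is done
  i=0
  for arg in "$@"; do
    printf '%s\t%s\n' "$arg" "$(cat "$tmpdir/$i")"
    i=$((i + 1))
  done
  rm -rf "$tmpdir"
}
```

With a wrapper function such as `get() { curl -fsS "$1"; }` (also hypothetical), `fan_out get URL1 URL2` would issue both requests concurrently, so the total wait is close to the slowest request rather than the sum of all of them.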

  41. View Slide

  42. View Slide

  43. @jazzdan
    Bespoke Endpoints
    etsy.com/listing/124740565
    Bespoke Recent
    Shop
    Overview
    Related
    Seller
    Listing

    View Slide

  44. API Traffic Web Traffic

    View Slide

  45. View Slide

  46. View Slide

  47. @jazzdan
    Lower latency
    = more RPS/box

    View Slide

  48. @jazzdan
    More RPS/box
    = Fewer boxes

    View Slide

  49. View Slide

  50. @jazzdan
    Fewer boxes
    = less $/datacenter

    View Slide
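The chain on slides 47 to 50 is simple arithmetic. Assuming each box runs a fixed number of workers and each worker handles one request at a time, throughput per box is roughly workers divided by latency, so cutting latency cuts the number of boxes needed for the same load. A minimal sketch of that calculation follows; `boxes_needed` and every number passed to it are illustrative assumptions, not Etsy's figures.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.
# usage: boxes_needed TARGET_RPS WORKERS_PER_BOX LATENCY_MS
boxes_needed() {
  local target_rps=$1 workers=$2 latency_ms=$3
  # One worker completes 1000 / latency_ms requests per second, so
  #   rps_per_box = workers * 1000 / latency_ms
  #   boxes       = ceil(target_rps / rps_per_box)
  # Rearranged to stay in integer arithmetic:
  #   boxes       = ceil(target_rps * latency_ms / (workers * 1000))
  local denom=$((workers * 1000))
  echo $(( (target_rps * latency_ms + denom - 1) / denom ))
}
```

For example, `boxes_needed 10000 50 200` prints 40, while `boxes_needed 10000 50 50` prints 10: a 4x drop in latency means a 4x drop in boxes under these assumptions.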

  51. @jazzdan
    Fewer boxes
    =

    View Slide

  52. View Slide

  53. View Slide

  54. How?

    View Slide

  55. View Slide

  56. $ scp -r rlerdorf.vms.etsy.com:/home/rlerdorf/hhvm-root .

    View Slide

  57. @jazzdan
    Missing memcached
    constants?

    View Slide

  58. @jazzdan
    /*Missing memcached
    constants*/

    View Slide

  59. @jazzdan
    Missing geoip
    extension?

    View Slide

  60. @jazzdan
    /*Missing geoip
    extension?*/

    View Slide

  61. @jazzdan
    Missing msgpack
    extension?

    View Slide

  62. @jazzdan
    Login doesn’t need to
    work for now

    View Slide

  63. @jazzdan
    Etc

    View Slide

  64. @jazzdan
    Ran some benchmarks

    View Slide

  65. @jazzdan
    It’s faster

    View Slide

  66. @jazzdan
    Way faster.

    View Slide

  67. View Slide

  68. View Slide

  69. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  70. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  71. @jazzdan
    Time to Compile HHVM

    View Slide

  72. @jazzdan
    How hard can it be?

    View Slide

  73. @jazzdan
    September 2nd, 2014:
    “Started compiling
    HHVM”

    View Slide

  74. @jazzdan
    September 23rd, 2014:
    “strace’d cmake”

    View Slide

  75. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  76. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  77. @jazzdan
    ( $ git clean -fdx )

    View Slide

  78. @jazzdan
    September 24th, 2014:
    “GOT IT TO COMPILE”

    View Slide

  79. @jazzdan
    September 25th, 2014:
    “BUILT AN RPM”

    View Slide

  80. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  81. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  82. libs=$(ldd "${hhvm_binary}" | awk '{print $3}' | grep -v '^$'

    View Slide
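The pipeline on slide 82 is cut off in the transcript, so its full form is unknown. As a rough sketch of the general idea (listing the shared libraries a binary resolves to), the filter below keeps only the resolved absolute paths from `ldd` output. `resolved_paths` and `shared_libs` are hypothetical names, and this is not necessarily what the Etsy build script did.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Reads ldd output on stdin and prints the resolved library paths, i.e.
# lines shaped like "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)".
# Unresolved entries ("=> not found") and lines without "=>" (such as
# the vdso and the dynamic loader) are skipped.
resolved_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# usage: shared_libs /path/to/binary
shared_libs() {
  ldd "$1" | resolved_paths
}
```

Filtering on the `=>` column rather than only dropping empty fields avoids picking up the word "not" from `not found` lines as if it were a path.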

  83. @jazzdan
    September 30th, 2014:
    “Fix the HHVM rpm we
    made last week that broke
    yum on every prod box”

    View Slide

  84. 6

    View Slide

  85. @jazzdan
    Had to Upgrade
    • gcc
    • libmcrypt
    • gmp
    • mpfr
    • mpc
    • glog
    • jemalloc
    • tbb
    • libdwarf
    • libmemcached
    • libc
    • cmake
    • libcurl
    • more

    View Slide

  86. The Test

    View Slide

  87. View Slide

  88. Idea
    Code
    Release
    Time

    View Slide

  89. Idea
    Code
    Release
    Idea
    Code
    Release
    A/B Test

    View Slide

  90. Idea
    Code
    Release
    Idea
    Code
    A/B Test
    Release
    “Oh crap…”

    View Slide

  91. Idea
    Code
    Release
    Idea
    Code
    A/B Test
    Release
    Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release

    View Slide

  92. Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release

    View Slide

  93. Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release

    View Slide

  94. Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release
    Can HHVM run etsy.com?

    View Slide

  95. Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release
    Can HHVM run etsy.com?
    Is it faster?

    View Slide

  96. Scoping

    View Slide

  97. API Traffic Web Traffic

    View Slide

  98. Run synthetic
    benchmarks

    View Slide

  99. Response Time as Load Increases
    (chart: Response Time, 0 to 4000, against Requests per Second, 10 to 270; series: HHVM and PHP 5.4)

    View Slide

  100. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  101. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  102. Run an experiment

    View Slide

  103. Idea
    Validate
    Prototype
    A/B Test
    Refinement
    A/B Test
    Release
    Can HHVM run Etsy’s internal API?
    Is it faster?
    Time to fix the problems we skipped

    View Slide

  104. @jazzdan
    How do we gain
    more confidence?

    View Slide

  105. @jazzdan
    … and also
    validate our hypothesis?

    View Slide

  106. Tee Traffic

    View Slide

  107. Load Balancer
    API
    API-HHVM
    API-TEST

    View Slide

  108. Load Balancer
    API
    API-HHVM
    API-TEST

    View Slide

  109. Load Balancer
    API
    API-HHVM
    API-TEST

    View Slide

  110. @jazzdan
    Infrastructure
    experiments are hard

    View Slide

  111. @jazzdan
    Same hardware

    View Slide

  112. @jazzdan
    Same traffic profile

    View Slide

  113. @jazzdan
    Same hacks

    View Slide

  114. @jazzdan
    Both Machines
    • Read Only MySQL Interface
    • Read Only memcached Interface
    • Read Only Redis interface
    • iptables blocking almost all the things
    • No log forwarding

    View Slide

  115. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  116. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  117. HHVM CPU, 14 hour, 140 rps peak
    Zend CPU, 14 hour, 140 rps peak

    View Slide

  118. @jazzdan
    “Between 3x-6x Faster”

    View Slide

  119. #hhvm
    Repo Authoritative - Extra 20%
    • Produce bytecode SQLite database in advance
    • Build include map
    • Statically resolve file paths
    • Do non-type related optimizations at compile time

    View Slide

  120. About 60 deploys per day
    (chart: Deploys over Time)

    View Slide

  121. @jazzdan
    What about writing data?

    View Slide

  122. Employee Only Traffic

    View Slide

  123. @jazzdan
    memcached

    View Slide

  124. Memcached operation failed (returned false)
    when decrementing KEY

    View Slide

  125. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  126. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  127. View Slide

  128. @jazzdan
    All get()s were
    returning false

    View Slide

  129. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  130. Happiness and Performance Correlation
    (chart: Response Time and Happiness over Time)

    View Slide

  131. View Slide

  132. @jazzdan
    Takeaway:

    View Slide

  133. @jazzdan
    HHVM is rock solid

    View Slide

  134. @jazzdan
    Extensions
    sometimes have bugs

    View Slide

  135. Slow Ramp Up

    View Slide

  136. [23/janv./2015:22:40:32 +0000]

    View Slide

  137. [23/ 1月/2015:23:37:56]

    View Slide

  138. Time →
    request 1: setlocale("a"), then strftime()
    request 2: setlocale("b"), then strftime()

    View Slide

  139. @jazzdan
    Solution:
    newlocale()/uselocale()

    View Slide

  140. @jazzdan
    Takeaway:

    View Slide

  141. @jazzdan
    HHVM is threaded

    View Slide

  142. Release!

    View Slide

  143. View Slide

  144. HHVM: Average Response Time, 12 Hour
    Zend: Average Response Time, 12 Hour

    View Slide

  145. HHVM: p95 Response Time, 12 Hour
    Zend: p95 Response Time, 12 Hour

    View Slide

  146. HHVM vs PHP 5.5 on Etsy Internal API
    (chart: Median, p95, and p99 Response Time in Milliseconds, axis 0 to 800; series: HHVM and PHP)

    View Slide
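Median, p95, and p99 are percentiles of the observed response times. For readers who want to produce the same kind of summary from their own logs, here is a small sketch using the nearest-rank method. `percentile` is a hypothetical helper, it expects one number per line on stdin and an integer percentile, and the sample values below are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.
# usage: percentile P < file_with_one_number_per_line
# Prints the P-th percentile (integer P, 1..100) using the nearest-rank
# method: the value at position ceil(P/100 * N) in the sorted input.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer ceil(p * NR / 100)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

For example, `printf '40\n10\n30\n20\n' | percentile 50` prints 20 and `printf '40\n10\n30\n20\n' | percentile 95` prints 40. Tail percentiles such as p95 and p99 show the slow requests that an average hides.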

  147. What else can HHVM do?

    View Slide

  148. @jazzdan
    Flame Graphs

    View Slide

  149. View Slide

  150. @jazzdan
    HHVM Debugger

    View Slide

  151. View Slide

  152. @jazzdan
    pfff

    View Slide

  153. @jazzdan
    sgrep

    View Slide

  154. #hhvm
    sgrep: Problem
    Find all invocations of foo() where the
    second argument is 1, with any number of
    arguments after

    View Slide

  155. #hhvm
    sgrep: Solution
    $ sgrep -e 'foo(X, 1, ...)' *.php

    View Slide

  156. #hhvm
    sgrep: Problem
    Find all && where both operands are the same

    View Slide

  157. #hhvm
    sgrep: Solution
    $ sgrep -e 'X && X' *.php

    View Slide

  158. #hhvm
    sgrep: Solution
    $ sgrep -pvar X -e 'X && X' *.php

    View Slide

  159. #hhvm
    sgrep: Problem
    Find all calls to foo() where the first
    argument is 1

    View Slide

  160. #hhvm
    sgrep: Solution
    $ sgrep -e 'foo(1, ...)' *.php

    View Slide

  161. #hhvm
    sgrep: Problem
    Find all method calls “addPreparable()”
    with any number of arguments

    View Slide

  162. #hhvm
    sgrep: Solution
    $ sgrep -e 'X->addPreparable(...)'

    View Slide

  163. @jazzdan
    spatch

    View Slide

  164. #hhvm
    spatch: Problem
    Remove the second argument from all
    invocations of function foo()

    View Slide

  165. #hhvm
    spatch: Solution
    //remove_second_arg_foo.spatch
    foo(X
    - ,Y
    )

    View Slide

  166. #hhvm
    spatch: Solution
    $ spatch -f remove_second_arg_foo.spatch *.php

    View Slide

  167. #hhvm
    spatch: Problem
    Rename a function with a variable number of
    arguments

    View Slide

  168. #hhvm
    spatch: Solution
    - foo
    + bar
    (...)

    View Slide

  169. @jazzdan
    perf(1)

    View Slide

  170. View Slide

  171. View Slide

  172. @jazzdan
    perf(1)
    - 12.09% HPHP::f_sort
    + PHP::…getAllReplicantNames

    View Slide

  173. Lessons Learned

    View Slide

  174. @jazzdan
    If you’re running an old
    operating system…

    View Slide

  175. @jazzdan
    …you’re gonna have
    a bad time.

    View Slide

  176. @jazzdan
    Tee’ing traffic
    is a superpower

    View Slide

  177. @jazzdan
    HHVM is Rock Solid

    View Slide

  178. @jazzdan
    Extensions
    Sometimes Aren't

    View Slide

  179. @jazzdan
    Threads are Hard

    View Slide

  180. @jazzdan
    Tooling is Powerful

    View Slide

  181. @jazzdan
    Lessons Learned
    • Do:
    • Run a newer Linux distribution
    • Ramp up slowly
    • Don’t:
    • Trust that extensions are 100%
    • Assume that processes are like threads

    View Slide

  182. The Future

    View Slide

  183. View Slide

  184. View Slide

  185. View Slide

  186. View Slide

  187. View Slide

  188. View Slide

  189. @jazzdan
    What does
    the future hold?

    View Slide

  190. @jazzdan
    No one knows

    View Slide

  191. @jazzdan
    But now we are
    better prepared

    View Slide

  192. View Slide

  193. @jazzdan
    Questions?
    ( https://joind.in/13384 )

    View Slide