HHVM at Etsy

Dan Miller
February 20, 2015

In 2014 Etsy’s infrastructure group was handed a challenge: scale Etsy’s API cluster 20x. Many efforts were simultaneously undertaken to meet this challenge, including a migration to HHVM after it showed a promising 5x increase in throughput. While getting our code to run on HHVM was easy, working through the deployment and operationalization proved to be a more difficult challenge.

This was presented at the PHP UK 2015 Conference.

Transcript

  1. HHVM at Etsy: Harder, Better, Faster, Stronger. Dan Miller, Core Platform Engineer, Etsy

  2. @jazzdan http://bit.ly/hhvm_etsy

  3. The World’s Handmade Marketplace

  4. None
  5. Time Deploys About 60 deploys per day

  6. @jazzdan Overview • What is HHVM? • Why were we interested? • How did we migrate? What problems did we encounter? • What else can it do? • The Future

  7. None
  8. Timeline (2009 to 2015)

  9. Timeline (2009 to 2015) • Linux • Apache • MySQL • PHP

  10. Timeline (2009 to 2015) • Linux • Apache • MySQL • PHP • Couldn’t keep up with traffic growth

  11. Timeline (2009 to 2015)

  12. Timeline (2009 to 2015) • HipHop (HPHP) • Compile PHP to C++ • Deploy binary • Separate development environment

  13. @jazzdan HipHop Virtual Machine (HHVM)

  14. @jazzdan HHVM is not a source transformer (That was HPHPc)

  15. Webserver PHP Backend Services FastCGI index.php Logger.php Tpl.php

  16. Webserver HHVM Backend Services FastCGI index.php Logger.php Tpl.php

  17. Timeline (2009 to 2015)

  18. @jazzdan HHVM is Open Source • Internal diffs developed in the open • Included in Linux distros • Over 2000 bugs opened and closed • Over 1000 pull requests accepted

  19. None
  20. None
  21. None
  22. @jazzdan HHVM is Open Source 1. google.com 2. facebook.com 3. youtube.com 4. yahoo.com 5. baidu.com 6. amazon.com 7. wikipedia.org 8. twitter.com 9. taobao.com 10. qq.com

  23. @jazzdan HHVM is Open Source 1. google.com 2. facebook.com 3. youtube.com 4. yahoo.com 5. baidu.com 6. amazon.com 7. wikipedia.org 8. twitter.com 9. taobao.com 10. qq.com

  24. @jazzdan HHVM is Compatible* with PHP • 60% of PHP unit tests fail • Missing Extensions • Different error message output • 20 of the top PHP projects on GitHub do pass • 97% of unit tests pass among top 50 projects on GitHub

  25. @jazzdan HHVM is Compatible* with PHP • 99% of Etsy unit tests pass • (20 suite failures/1,798 test suites) [Chart: Fail 20, Pass 1,798]

  26. @jazzdan HHVM is Faster

  27. @jazzdan “Between 3x-6x Faster”

  28. @jazzdan (for Facebook)

  29. None
  30. Why?

  31. None
  32. [Diagram: Model, Controller, API, Business Logic A, Business Logic B, Business Logic A]

  33. Model API v3 Business Logic

  34. @jazzdan “Bespoke” Endpoints • Specific to a view • Aggregate REST endpoints concurrently • Return bespoke response

  35. @jazzdan Bespoke Endpoints etsy.com/listing/124740565 Bespoke Recent Shop Overview Related Seller Listing

  36. @jazzdan Bespoke Endpoints etsy.com/listing/124740565 Bespoke Recent Shop Overview Related Seller Listing

  37. @jazzdan Bespoke Endpoints etsy.com/listing/124740565 Bespoke Recent Shop Overview Related Seller Listing

  38. @jazzdan Overview Related Listing Bespoke Endpoints etsy.com/listing/124740565 Bespoke Recent Shop Seller

  39. None
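  40. @jazzdan curl_multi_*

    <?php
    // $urls: the REST endpoints to aggregate (assumed to be defined elsewhere)
    $mh = curl_multi_init();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
    }
    curl_multi_exec($mh, $running);
    while ($running > 0) {
        curl_multi_select($mh);            // wait for activity on any transfer
        curl_multi_exec($mh, $running);    // advance all transfers
        while ($info = curl_multi_info_read($mh)) {
            $body = curl_multi_getcontent($info['handle']);  // a finished response
        }
    }
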
  41. None
  42. None
  43. @jazzdan Bespoke Endpoints etsy.com/listing/124740565 Bespoke Recent Shop Overview Related Seller Listing

  44. API Traffic Web Traffic

  45. None
  46. None
  47. @jazzdan Lower latency = more RPS/box

  48. @jazzdan More RPS/box = Fewer boxes

  49. None
  50. @jazzdan Fewer boxes = less $/datacenter

  51. @jazzdan Fewer boxes =

  52. None
  53. None
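
The chain on slides 47 to 51 (lower latency, more RPS per box, fewer boxes) can be sketched with Little's law. This is an illustrative derivation, not something shown in the talk. All symbols are assumptions introduced here: N is the number of requests one box can hold in flight, R is the mean response time, X is requests per second per box, T is the target cluster-wide request rate, and B is the number of boxes.

```latex
\begin{align}
  X &= \frac{N}{R}
    && \text{(Little's law: throughput per box)} \\
  B &= \left\lceil \frac{T}{X} \right\rceil
     = \left\lceil \frac{T \, R}{N} \right\rceil
    && \text{(boxes needed to serve the target load } T\text{)}
\end{align}
```

With N and T held fixed, cutting R by some factor cuts B by roughly the same factor. In practice N also shifts with CPU and memory headroom, so this is only a first-order estimate.
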
  54. How?

  55. None
  56. $ scp -r rlerdorf.vms.etsy.com:/home/rlerdorf/hhvm-root .

  57. @jazzdan Missing memcached constants?

  58. @jazzdan /*Missing memcached constants*/

  59. @jazzdan Missing geoip extension?

  60. @jazzdan /*Missing geoip extension?*/

  61. @jazzdan Missing msgpack extension?

  62. @jazzdan Login doesn’t need to work for now

  63. @jazzdan Etc

  64. @jazzdan Ran some benchmarks

  65. @jazzdan It’s faster

  66. @jazzdan Way faster.

  67. None
  68. None
  69. Happiness and Performance Correlation Time Response Time Happiness

  70. Happiness and Performance Correlation Time Response Time Happiness

  71. @jazzdan Time to Compile HHVM

  72. @jazzdan How hard can it be?

  73. @jazzdan September 2nd, 2014: “Started compiling HHVM”

  74. @jazzdan September 23rd, 2014: “strace’d cmake”

  75. Happiness and Performance Correlation Time Response Time Happiness

  76. Happiness and Performance Correlation Time Response Time Happiness

  77. @jazzdan ( $ git clean -fdx )

  78. @jazzdan September 24th, 2014: “GOT IT TO COMPILE”

  79. @jazzdan September 25th, 2014: “BUILT AN RPM”

  80. Happiness and Performance Correlation Time Response Time Happiness

  81. Happiness and Performance Correlation Time Response Time Happiness

  82. libs=$(ldd "${hhvm_binary}" | awk '{print $3}' | grep -v '^$'

  83. @jazzdan September 30th, 2014: “Fix the HHVM rpm we made last week that broke yum on every prod box”

  84. 6

  85. @jazzdan Had to Upgrade • gcc • libmcrypt • gmp • mpfr • mpc • glog • jemalloc • tbb • libdwarf • libmemcached • libc • cmake • libcurl • more

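
Slide 82's truncated snippet gathers the shared libraries that the HHVM binary links against. A minimal sketch of that idea follows. The function name `list_shared_libs` and the demo target `/bin/sh` are hypothetical, and the awk filter is just one way to keep only resolved paths; it is not the pipeline Etsy used.

```shell
#!/bin/sh
# list_shared_libs BINARY
# Print the resolved path of every shared library BINARY links against,
# one per line, de-duplicated. (Hypothetical helper, not from the talk.)
list_shared_libs() {
    # ldd prints lines such as:  libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)
    # Keep only lines that have "=>" followed by an absolute path.
    ldd "$1" 2>/dev/null | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# Demo on a binary that exists on most systems.
list_shared_libs /bin/sh
```

On a dynamically linked system the output typically includes core libraries such as libc, so a list like this deserves a review before anything from it is bundled into a package.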
  86. The Test

  87. None
  88. Idea Code Release Time

  89. Idea Code Release Idea Code Release A/B Test

  90. Idea Code Release Idea Code A/B Test Release “Oh crap…” <- Wasted effort (maybe)

  91. Idea Code Release Idea Code A/B Test Release Idea Validate Prototype A/B Test Refinement A/B Test Release

  92. Idea Validate Prototype A/B Test Refinement A/B Test Release <- Possibly quite crappy

  93. Idea Validate Prototype A/B Test Refinement A/B Test Release <- Possibly quite crappy <- Make it less crappy here

  94. Idea Validate Prototype A/B Test Refinement A/B Test Release Can HHVM run etsy.com?

  95. Idea Validate Prototype A/B Test Refinement A/B Test Release Can HHVM run etsy.com? Is it faster?

  96. Scoping

  97. API Traffic Web Traffic

  98. Run synthetic benchmarks

  99. [Chart: Response Time as Load Increases. Y axis: response time (0 to 4000); X axis: requests per second (10 to 270); series: HHVM, PHP 5.4]

  100. Happiness and Performance Correlation Time Response Time Happiness

  101. Happiness and Performance Correlation Time Response Time Happiness

  102. Run an experiment

  103. Idea Validate Prototype A/B Test Refinement A/B Test Release Can HHVM run Etsy’s internal API? Is it faster? Time to fix the problems we skipped

  104. @jazzdan How do we gain more confidence?

  105. @jazzdan … and also validate our hypothesis?

  106. Tee Traffic

  107. Load Balancer API API-HHVM API-TEST

  108. Load Balancer API API-HHVM API-TEST

  109. Load Balancer API API-HHVM API-TEST

  110. @jazzdan Infrastructure experiments are hard

  111. @jazzdan Same hardware

  112. @jazzdan Same traffic profile

  113. @jazzdan Same hacks

  114. @jazzdan Both Machines • Read Only MySQL Interface • Read Only memcached Interface • Read Only Redis interface • iptables blocking almost all the things • No log forwarding

  115. Happiness and Performance Correlation Time Response Time Happiness

  116. Happiness and Performance Correlation Time Response Time Happiness

  117. [Charts: HHVM CPU over 14 hours (140 rps peak); Zend CPU over 14 hours (140 rps peak)]

  118. @jazzdan “Between 3x-6x Faster”

  119. #hhvm Repo Authoritative - Extra 20% • Produce bytecode SQLite database in advance • Build include map • Statically resolve file paths • Do non-type related optimizations at compile time

  120. Time Deploys About 60 deploys per day

  121. @jazzdan What about writing data?

  122. Employee Only Traffic

  123. @jazzdan memcached

  124. Memcached operation failed (returned false) when decrementing KEY

  125. Happiness and Performance Correlation Time Response Time Happiness

  126. Happiness and Performance Correlation Time Response Time Happiness

  127. None
  128. @jazzdan All get()s were returning false

  129. Happiness and Performance Correlation Time Response Time Happiness

  130. Happiness and Performance Correlation Time Response Time Happiness

  131. None
  132. @jazzdan Takeaway:

  133. @jazzdan HHVM is rock solid

  134. @jazzdan Extensions sometimes have bugs

  135. Slow Ramp Up

  136. [23/janv./2015:22:40:32 +0000]

  137. [23/ 1月/2015:23:37:56]

  138. [Diagram over time: request 1 calls setlocale(“a”), request 2 calls setlocale(“b”), then each request calls strftime()]

  139. @jazzdan Solution: newlocale()/uselocale()

  140. @jazzdan Takeaway:

  141. @jazzdan HHVM is threaded

  142. Release!

  143. None
  144. [Charts: HHVM average response time, 12 hours; Zend average response time, 12 hours]

  145. [Charts: HHVM p95 response time, 12 hours; Zend p95 response time, 12 hours]

  146. [Chart: HHVM vs PHP 5.5 on Etsy Internal API. Response time in milliseconds (0 to 800) at median, p95, and p99; series: HHVM, PHP]

  147. What else can HHVM do?

  148. @jazzdan Flame Graphs

  149. None
  150. @jazzdan HHVM Debugger

  151. None
  152. @jazzdan pfff

  153. @jazzdan sgrep

  154. #hhvm sgrep: Problem Find all invocations of foo() where the second argument is 1, with any number of arguments after

  155. #hhvm sgrep: Solution $ sgrep -e 'foo(X, 1, ...)' *.php

  156. #hhvm sgrep: Problem Find all && where both operands are the same

  157. #hhvm sgrep: Solution $ sgrep -e 'X && X' *.php

  158. #hhvm sgrep: Solution $ sgrep -pvar X -e 'X && X' *.php

  159. #hhvm sgrep: Problem Find all calls to foo() where the first argument is 1

  160. #hhvm sgrep: Solution $ sgrep -e 'foo(1, ...)' *.php

  161. #hhvm sgrep: Problem Find all method calls “addPreparable()” with any number of arguments

  162. #hhvm sgrep: Solution $ sgrep -e 'X->addPreparable(...)'

  163. @jazzdan spatch

  164. #hhvm spatch: Problem Remove the second argument from all invocations of function foo()

  165. #hhvm spatch: Solution //remove_second_arg_foo.spatch

    foo(X
    - ,Y
    )

  166. #hhvm spatch: Solution $ spatch -f remove_second_arg_foo.spatch *.php

  167. #hhvm spatch: Problem Rename a function with a variable number of arguments

  168. #hhvm spatch: Solution

    - foo
    + bar
    (...)

  169. @jazzdan perf(1)

  170. None
  171. None
  172. @jazzdan perf(1) - 12.09% HPHP::f_sort + PHP::…getAllReplicantNames

  173. Lessons Learned

  174. @jazzdan If you’re running an old operating system…

  175. @jazzdan …you’re gonna have a bad time.

  176. @jazzdan Tee’ing traffic is a superpower

  177. @jazzdan HHVM is Rock Solid

  178. @jazzdan Extensions Sometimes Aren't

  179. @jazzdan Threads are Hard

  180. @jazzdan Tooling is Powerful

  181. @jazzdan Lessons Learned • Do: • Run a newer Linux distribution • Ramp up slowly • Don’t: • Trust that extensions are 100% • Assume that processes are like threads

  182. The Future

  183. None
  184. None
  185. None
  186. None
  187. None
  188. None
  189. @jazzdan What does the future hold?

  190. @jazzdan No one knows

  191. @jazzdan But now we are better prepared

  192. None
  193. @jazzdan Questions? ( https://joind.in/13384 )