Inside Treasure Data API / CRuby and ERB optimization at TD Hackathon

Treasure Data Tech Talk 201706
https://eventdots.jp/event/620321

Takashi Kokubun

June 13, 2017

Transcript

  1. Inside Treasure Data “API”

    Treasure Data Tech Talk 201706 • @k0kubun
    Extra: CRuby and ERB optimization at TD Hackathon
  2. Who are you?

    • Takashi Kokubun (@k0kubun)
    • "API" team at Treasure Data
    • Ruby committer (New!)
    • Maintainer of ERB
  3. Agenda

    • What does "API" do at Treasure Data?
    • The architecture of our Rails application
    • Extra: Optimization of CRuby & ERB at TD Hackathon
  4. What does “API” do at Treasure Data?

  5. “API” team

    • @uu59: Team Lead; Embulk integration
    • @kamipo: Tech Lead; infrastructure, DBA
    • @crzrcn: frontend
    • @k0kubun: backend middleware
    The API team’s main task is developing the Rails app called “API”. The roles listed after each name are additional roles outside API.
  6. None
  7. "Senior APIs Engineer"?

    • At Treasure Data, "API" means the Rails app serving public and internal REST APIs
    • But it also does many other things…
  8. The Rails app does

    • Import data sent from Fluentd
    • Integrate with other systems by Embulk
    • Schema management of "PlazmaDB"
    • Manage Hive and Presto queries
  9. We should call it not just "API" but...

    • "Treasure Data hosted analytics platform"
    • That phrase actually appears in its description
    • It's related to almost every system in TD
    • Let's look into its internals
  10. The architecture of our Rails application

  11. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication
  12. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize
  13. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data
  14. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data • perfectqueue • Worker’s DB • Enqueue TD worker’s task for imported data
  15. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data • perfectqueue • Worker’s DB • Enqueue TD worker’s task for imported data • perfectqueue worker • Dequeue
  16. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data • perfectqueue • Worker’s DB • Enqueue TD worker’s task for imported data • perfectqueue worker • Dequeue • Worker class in jar • Spawn Java • Ingest
  17. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data • perfectqueue • Worker’s DB • Enqueue TD worker’s task for imported data • perfectqueue worker • Dequeue • Worker class in jar • Spawn Java • Ingest • “PlazmaDB”
  18. Import data sent from Fluentd

    [Diagram] “API” • Customer • TD’s hosted fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize • Import storage: S3 or RiakCS • Store raw data • perfectqueue • Worker’s DB • Enqueue TD worker’s task for imported data • perfectqueue worker • Dequeue • Worker class in jar • Spawn Java • Ingest • “PlazmaDB” • “API”’s work
  19. Import data sent from Fluentd

    • This part is under heavy load and is a bottleneck
    • The data import part will be replaced by @tagomoris's project
    • Maybe he'll publish its architecture in the future
    • I'm developing one of its middleware components in Java
  20. Integrate with other systems by Embulk

  21. Integrate with other systems by Embulk

  22. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails
  23. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails
    Rails doesn’t serve assets with the Asset Pipeline. The console application (React, Redux) is in a separate repository. Rails just proxies requests to its S3 URL for historical reasons.
  24. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority
  25. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Embulk “Data Connector” • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • guess, preview
  26. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Embulk “Data Connector” • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • guess, preview • “PlazmaDB” • Everything in the world • Bulk load, export
  27. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Embulk “Data Connector” • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • guess, preview • “PlazmaDB” • Everything in the world • Bulk load, export • TD’s hosted Digdag (Treasure Workflow) • API requests for integration
  28. Integrate with other systems by Embulk

    [Diagram] “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets specified by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Embulk “Data Connector” • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • guess, preview • “PlazmaDB” • Everything in the world • Bulk load, export • TD’s hosted Digdag (Treasure Workflow) • API requests for integration • “API”’s work
  29. Schema management of "PlazmaDB"

  30. Manage Hive and Presto queries

  31. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema
  32. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema • Worker • Periodically requests schema update with sampled data
  33. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema • Worker • Periodically requests schema update with sampled data • Worker’s DB • perfectqueue • Send query w/ column alias
  34. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema • Worker • Periodically requests schema update with sampled data • Worker’s DB • perfectqueue • Send query w/ column alias • perfectqueue worker
  35. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema • Worker • Periodically requests schema update with sampled data • Worker’s DB • perfectqueue • Send query w/ column alias • perfectqueue worker • Worker class in jar • Spawn Java • “PlazmaDB” • Query
  36. Schema management of "PlazmaDB" / Manage Hive and Presto queries

    [Diagram] “API” • Customer • Browser opening TD • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets specified by Rails • API’s DB, which has table schema • Worker • Periodically requests schema update with sampled data • Worker’s DB • perfectqueue • Send query w/ column alias • perfectqueue worker • Worker class in jar • Spawn Java • “PlazmaDB” • Query • “API”’s work
  37. What should we improve?

    • Scalability
      • It's a multi-tenant system and every part must be scalable
    • Productivity
      • Split the monolith into properly sized parts to reduce communication cost
      • Release cost is high but it can be improved
    • Reliability
      • To avoid unexpected errors, we need a developer with great Rails understanding
  38. Why you should join us

    • Our Rails application is integrated with many interesting things that typical Rails applications aren’t
      • Fluentd, Embulk, Digdag, Hive, Presto, Hadoop, perfectqueue, PlazmaDB, other Java workers and middlewares…
    • Treasure Data is a company that solves customers’ highly technical, difficult problems
      • Since the problems are complex, building a platform for them is hard and interesting
  39. Optimization of CRuby & ERB at TD Hackathon

  40. TD Hackathon

    • Treasure Data’s internal two-day hackathon
    • Free theme, but no usual TD work
    • I joined it with @nalsh to optimize something in CRuby
    Not all of the following was done at TD Hackathon. We did the SSE work and researched ERB optimization ideas 1 & 2 there; the rest is just my hobby.
  41. Optimize CGI.escapeHTML using SSE

    • HTML escaping is the bottleneck of template engine performance on Rails
    • We can search for target characters in a batch using SSE instructions
    • I created the "hescape" gem as a PoC in the past
    • Can we improve it and introduce it to CRuby?
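For readers unfamiliar with the method being optimized, here is a minimal sketch of what `CGI.escapeHTML` does. The input strings are made up for illustration; only the standard library call itself comes from the talk.

```ruby
require "cgi/escape"

# CGI.escapeHTML replaces the five HTML-special characters: & < > " '
input = %q(<a href="/?q=1&r=2">it's</a>)
escaped = CGI.escapeHTML(input)
puts escaped

# A string with nothing to escape comes back with the same content.
# This "0% escaped" input is the best case for a batch scan.
plain = "no special characters here"
puts CGI.escapeHTML(plain) == plain
```

The cost of this method is dominated by scanning for those five characters, which is why searching 16 bytes at a time is attractive.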
  42. CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)

    PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
  43. CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)

    PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
    • Create a mask of the characters to escape
    • Compare 16 chars at the same time, return the first found index
  44. CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)

    PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
  45. CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)

    PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
    0.87x
  46. CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)

    PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
    • In this version, hescape's problems are fixed
      • Interface problem is already solved, no page faults, reduced number of variables
    • 3.74x faster when there are no characters to escape
    • When many characters to escape are found, it's slow
    • Can we improve this using PCMPESTRM?
  47. CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c)

    PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
  48. CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c)

    PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
    • Load the found indices into a 16-bit mask
    • Compare 16 chars at the same time, return the mask of matches
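The difference between "return index" and "return mask" can be sketched in plain Ruby. This is a scalar emulation for one 16-byte chunk, not real SIMD and not the code from the patches; the helper names `first_match_index` and `match_mask` are hypothetical.

```ruby
# Bytes that CGI.escapeHTML has to replace
ESCAPED_BYTES = %q(&<>"').bytes.freeze
CHUNK = 16

# PCMPESTRI-like result: index of the first byte to escape, or 16 if none
def first_match_index(chunk)
  chunk.bytes.index { |b| ESCAPED_BYTES.include?(b) } || CHUNK
end

# PCMPESTRM-like result: bit i is set when byte i has to be escaped
def match_mask(chunk)
  chunk.bytes.each_with_index.inject(0) do |mask, (b, i)|
    ESCAPED_BYTES.include?(b) ? mask | (1 << i) : mask
  end
end

chunk = "ab<cd>ef&gh12345" # exactly 16 bytes; matches at offsets 2, 5 and 8
puts first_match_index(chunk)
puts format("%016b", match_mask(chunk))
```

With the index form, a chunk containing several matches has to be rescanned after each hit; with the mask form, one comparison reports every match position in the chunk, which is why it helps when many characters are escaped.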
  49. CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c)

    PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
    0.95x
  50. CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c)

    PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
    With all characters escaped (worst case for SSE):
    • Before: 87.3% of old ver.
    • After: 95.2% of old ver.
  51. CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c)

    PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
    • Much improved with escaped characters
      • 0% escaped: 3.74x → 3.10x
      • 1% escaped: 1.53x → 1.57x
      • 5% escaped: 1.30x → 1.28x
      • 10% escaped: 1.18x → 1.16x
      • 20% escaped: 0.97x → 0.98x
      • 100% escaped: 0.87x → 0.95x
    • Looks reasonable, but not good enough to justify spoiling readability…
  52. CGI.escapeHTML future work?

    • With SSE, for length < 16
      • To avoid page faults, the current patch just skips the last fragment
      • We can solve that problem by adjusting the scan position first
    • Reduce the number of memory allocations
      • houdini (and hescape) allocates extra (sometimes unnecessary) memory on buffer extension, and it’s fast
      • If we can predict the exact result length WITH SOME MAGIC, it'd be fairly fast
  53. ERB rendering optimization

    • With Ruby 2.4’s ERB, rendering is slower than Erubi
      • Note: Erubi is Rails’ default template engine for erb
    • Can we improve ERB's rendering time, mostly for Sinatra?
    • For one-time rendering like Chef usage, we also need to care about compiling time
    • Usually, optimizing rendering without slowing down compiling is hard
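The "compiling" and "rendering" phases discussed here can be seen directly with the standard `erb` library. A minimal sketch; the template and the `name` variable are invented for illustration.

```ruby
require "erb"

template = "Hello, <%= name %>!"

# Compiling: the template is turned into Ruby source code once.
erb = ERB.new(template)
puts erb.src # generated code; the exact text differs between Ruby versions

# Rendering: the generated code is evaluated against a binding.
name = "world"
rendered = erb.result(binding)
puts rendered
```

A long-running application compiles once and renders many times, so rendering speed dominates; a one-shot tool pays the compile cost on every run, which is why both are measured in the results.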
  54. ERB optimization idea 1: opt_ltlt

    • In YARV, there are instructions that can bypass method call overhead for some methods
      • e.g. +, -, *, /, <, >, ==, <<, …
    • ERB generated _erbout.concat instead of _erbout <<, so there was room for optimization
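The specialized instruction is visible in the bytecode. The sketch below is CRuby-specific (`RubyVM::InstructionSequence` is not available on other implementations), and the variable name `buf` is only illustrative.

```ruby
# Compile two equivalent snippets and look at the generated YARV instructions.
ltlt_asm   = RubyVM::InstructionSequence.compile("buf = String.new; buf << 'a'").disasm
concat_asm = RubyVM::InstructionSequence.compile("buf = String.new; buf.concat('a')").disasm

puts ltlt_asm.include?("opt_ltlt")   # << compiles to the specialized instruction
puts concat_asm.include?("opt_ltlt") # .concat is dispatched as a generic method call
```

Printing `ltlt_asm` and `concat_asm` in full shows the surrounding instructions; their exact layout varies between Ruby versions.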
  55. ERB optimization idea 1: opt_ltlt

    This patch won’t increase compiling cost!
  56. ERB optimization idea 1: opt_ltlt

    Before / After
  57. ERB optimization idea 1: opt_ltlt

    See ruby/ruby#1612 for benchmark details
    Optimized by 77%, but why? Are Ruby method calls that slow? (explained later)
  58. ERB optimization idea 2: opt_tos

    • If bypassing a method call is fast, would it be fast to bypass #to_s calls too?
    • It’s not registered as an opt_* instruction now
      • Note: the tostring instruction doesn’t check String#to_s redefinition, so they’re different
    • Since I made some effort to omit #to_s calls in Hamlit, I'd be happy if it’s fast
  59. ERB optimization idea 2: opt_tos

  60. ERB optimization idea 2: opt_tos

    On average, 2.103s → 2.022s, so only a 4% improvement. What’s different from <<?
  61. ERB optimization idea 2: opt_tos

    • String#concat was made slower in Ruby 2.4
      • It was changed to take multiple arguments and create a temporary buffer in Feature#12333
    • In Ruby 2.4+, String#<< is simply faster than String#concat
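A small sketch of the interface difference introduced by Feature#12333, plus a rough timing loop. The iteration count is arbitrary and the timings depend heavily on the Ruby version and machine, so no expected numbers are given.

```ruby
require "benchmark"

a = String.new
a.concat("foo", "bar") # multiple arguments are accepted since Ruby 2.4

b = String.new
b << "foo" << "bar"    # << takes exactly one argument per call
puts a == b

n = 200_000
t_concat = Benchmark.realtime { s = String.new; n.times { s.concat("x") } }
t_ltlt   = Benchmark.realtime { s = String.new; n.times { s << "x" } }
puts format("concat: %.4fs, <<: %.4fs", t_concat, t_ltlt)
```

This is only a micro-benchmark sketch; the talk's own measurements are the ones referenced on the slides.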
  62. ERB optimization idea 2: opt_tos

    • I restored String#concat’s performance (1.44x) and didn’t introduce an opt_tos instruction
  63. ERB optimization idea 3: skip force encoding

    • Formerly, the generated code included both a magic comment and force_encoding
    • I wanted to skip fetching __ENCODING__ and calling #force_encoding
  64. ERB optimization idea 3: skip force encoding

    • To remove it, we need to utilize the magic comment properly
    • String.new
      • It’s not affected by the magic comment since it’s not a literal
    • '' (two single quotes)
      • It’s affected by the magic comment since it’s a literal
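The difference between the two ways of creating an empty string can be checked by evaluating a snippet that carries its own magic comment. This is a minimal sketch; EUC-JP is an arbitrary choice of a non-default encoding.

```ruby
# The evaluated source starts with a magic comment, as ERB's generated code does.
src = <<~RUBY
  # -*- coding: EUC-JP -*-
  [''.encoding, String.new.encoding]
RUBY

literal_enc, new_enc = eval(src)
puts literal_enc # the literal follows the magic comment
puts new_enc     # String.new ignores it and stays binary
```

Because the literal already has the right encoding, the generated code no longer needs a separate `force_encoding` call on the output buffer.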
  65. ERB optimization idea 3: skip force encoding

    • ruby/ruby#1147
    • This patch doesn’t increase compiling cost
    • .dup is there to support frozen_string_literal
  66. ERB optimization idea 3: skip force encoding

    Before / After
  67. ERB optimization idea 3: skip force encoding

    • Possible effects:
      • A gem that uses ERB#src as part of a string eval to define a method may return the wrong encoding if a template in a non-script encoding is given to ERB
    • My opinion:
      • If you modify the result of ERB#src or depend on its internals, do so at your own risk and fix it in the third-party layer
      • If you use ERB#result or #def_method, it won’t be broken
      • In that case, you can move the magic comment to the beginning of the evaluated string. I’ll fix it if I find such a case.
  68. ERB optimization idea 4: opt_str_uminus

    • For a long time, I’ve wanted to freeze string literals generated by ERB
    • But I couldn’t do that without increasing compiling cost…
    For multiple newlines, it needs concatenation here and it’s slow
  69. ERB optimization idea 4: opt_str_uminus

    • Unary plus/minus operators were introduced in Ruby 2.3 (String#+@, String#-@)
    • And String#-@ was optimized like String#freeze in trunk (Ruby 2.5) on 2017/03/27
    • ERB can use it immediately because it’s a standard library!!!
  70. ERB optimization idea 4: opt_str_uminus

  71. ERB optimization idea 4: opt_str_uminus

    • No compilation overhead because we can put the operator here
    • String allocation is reduced, so it’s faster
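What String#-@ buys can be sketched outside ERB. The strings here are made up, and the object-identity behavior described in the comments assumes Ruby 2.5 or later.

```ruby
# Unary minus on a literal yields a frozen, deduplicated string.
a = -"hello"
b = -"hello"
puts a.frozen?   # true
puts a.equal?(b) # true on Ruby 2.5+: both names refer to one shared object

# It also works on strings built at runtime, without touching the receiver.
c = String.new("hel") << "lo"
d = -c
puts d.frozen?   # true
puts c.frozen?   # false
```

Since the same static template fragment is appended on every render, reusing one frozen object instead of allocating a fresh string each time is where the saving comes from.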
  72. ERB optimization idea 4: opt_str_uminus Before After

  73. ERB optimization idea 5: str_uplus

  74. ERB optimization idea 5: str_uplus

  75. ERB optimization idea 5: str_uplus

  76. ERB optimization idea 5: str_uplus

    If the string is not frozen, String#+@ skips rb_str_dup, so it’s faster than String#dup in that case. Very convenient!!!
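The behavior described above can be observed through object identity. A minimal sketch; the variable names are illustrative.

```ruby
mutable = String.new("abc") # not frozen
frozen  = "abc".freeze

same = +mutable
puts same.equal?(mutable) # true: not frozen, so the receiver itself is returned

copy = +frozen
puts copy.equal?(frozen)  # false: a frozen receiver is duplicated
puts copy.frozen?         # false: the duplicate is mutable

dup = mutable.dup
puts dup.equal?(mutable)  # false: dup always allocates a copy
```

So `+''` gives a mutable buffer whether or not frozen_string_literal is enabled, and costs nothing extra when the literal was already mutable.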
  77. ERB optimization idea 5: str_uplus Faster~

  78. ERB optimization idea 5: str_uplus

    Before / After
    No unnecessary things!!
  79. ERB optimization result (compiling & rendering)

    • About 1.000s → 0.700s, 1.43x faster in total for compiling & rendering
    • No performance regression in total for compiling & rendering
  80. ERB optimization result (compiling & rendering)

    • Not only Feature#12333 (String#concat) but also Feature#11936 (which I proposed) was bad for compiling performance
    • The main cause of 2.4’s large slowdown was actually Feature#11936, and I fixed it as Bug#12074
    • For compiling, I just cleaned up a problem that I had created myself
    • Ruby 2.4’s features were bad for compiling, but that’s fixed in 2.5
  81. ERB optimization result (rendering only)

    ruby/benchmark/bm_erb_render.rb (Ruby 2.5.0 is trunk: 1542ab670e)
    w/ Erubi: Erubi::Engine.new(data).src, Erubis: Erubis::Eruby.new(data).src
    [Chart] Time to render (s): ERB (Ruby 2.5.0) • Erubis 2.7.0 (Ruby 2.5.0) • ERB (Ruby 2.3.4) • Erubi 1.6.0 (Ruby 2.5.0) • ERB (Ruby 2.4.1)
    In Ruby 2.5, ERB became the fastest erb implementation in the world, and 2.21x faster than Ruby 2.4 (actually, 1.12x faster than Ruby 2.3)