Slide 1

Slide 1 text

Inside Treasure Data “API” Treasure Data Tech Talk 201706 @k0kubun Extra: CRuby and ERB optimization at TD Hackathon

Slide 2

Slide 2 text

Who are you? • Takashi Kokubun (@k0kubun) • "API" team at Treasure Data • Ruby committer (New!) • Maintainer of ERB

Slide 3

Slide 3 text

Agenda • What does "API" do at Treasure Data? • The architecture of our Rails application • Extra: Optimization of CRuby & ERB at TD Hackathon

Slide 4

Slide 4 text

What does “API” do at Treasure Data?

Slide 5

Slide 5 text

“API” team • @uu59: Team Lead, Embulk Integration • @kamipo: Tech Lead, Infrastructure, DBA • @crzrcn: Frontend • @k0kubun: Backend middleware • The API team’s main task is developing the Rails app called “API”. The other roles listed are additional responsibilities outside “API”.

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

"Senior APIs Engineer"? • In Treasure Data, "API" means Rails app serving public and internal REST APIs • But it also does many other things…

Slide 8

Slide 8 text

The Rails app does • Import data sent from Fluentd • Integrate with other systems by Embulk • Schema management of "PlazmaDB" • Manage Hive and Presto queries

Slide 9

Slide 9 text

We should call it not just "API" but... • "Treasure Data hosted analytics platform" • Actually, this phrase is written in its description • It's related to almost every system in TD • Let's look into its internals

Slide 10

Slide 10 text

The architecture of our Rails application

Slide 11

Slide 11 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication

Slide 12

Slide 12 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • API’s DB, which has “PlazmaDB” table schema • Fetch table permission and authorize

Slide 13

Slide 13 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • Fetch table permission and authorize • API’s DB, which has “PlazmaDB” table schema • Import storage: S3 or RiakCS • Store raw data

Slide 14

Slide 14 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • perfectqueue • API’s DB, which has “PlazmaDB” table schema • Worker’s DB • Enqueue TD worker’s task for imported data • Import storage: S3 or RiakCS • Store raw data • Fetch table permission and authorize

Slide 15

Slide 15 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • perfectqueue • API’s DB, which has “PlazmaDB” table schema • Worker’s DB • Enqueue TD worker’s task for imported data • Import storage: S3 or RiakCS • perfectqueue worker • Dequeue • Store raw data • Fetch table permission and authorize

Slide 16

Slide 16 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • perfectqueue • API’s DB, which has “PlazmaDB” table schema • Worker’s DB • Enqueue TD worker’s task for imported data • Import storage: S3 or RiakCS • perfectqueue worker • Dequeue • Store raw data • Worker class in jar • Ingest • Spawn Java • Fetch table permission and authorize

Slide 17

Slide 17 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • perfectqueue • API’s DB, which has “PlazmaDB” table schema • Worker’s DB • Enqueue TD worker’s task for imported data • Import storage: S3 or RiakCS • perfectqueue worker • Dequeue • Store raw data • Worker class in jar • “PlazmaDB” • Spawn Java • Fetch table permission and authorize • Ingest

Slide 18

Slide 18 text

Import data sent from Fluentd • “API” • Customer • TD’s hosted Fluentd “Event Collector” • Browser (JS SDK) • ELB (ALB) • Receive data chunks w/ authentication • perfectqueue • API’s DB, which has “PlazmaDB” table schema • Worker’s DB • Enqueue TD worker’s task for imported data • Import storage: S3 or RiakCS • perfectqueue worker • Dequeue • Store raw data • Worker class in jar • “PlazmaDB” • “API”’s work • Spawn Java • Fetch table permission and authorize • Ingest

Slide 19

Slide 19 text

Import data sent from Fluentd • This part is under high load and is a bottleneck • The data import part will be replaced by @tagomoris's project • Maybe he'll publish its architecture in the future • I'm developing one of its middleware components in Java

Slide 20

Slide 20 text

Integrate with other systems by Embulk

Slide 21

Slide 21 text

Integrate with other systems by Embulk

Slide 22

Slide 22 text

Integrate with other systems by Embulk • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails

Slide 23

Slide 23 text

Integrate with other systems by Embulk • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Rails doesn’t serve assets with the Asset Pipeline. The console application (React, Redux) lives in a separate repository. Rails just proxies requests to its S3 URL for historical reasons.

Slide 24

Slide 24 text

Integrate with other systems by Embulk • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority

Slide 25

Slide 25 text

Integrate with other systems by Embulk • TD’s hosted Embulk “Data Connector” • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • guess, preview

Slide 26

Slide 26 text

Integrate with other systems by Embulk • TD’s hosted Embulk “Data Connector” • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • “PlazmaDB” • Everything in the world • guess, preview • Bulk load, export

Slide 27

Slide 27 text

Integrate with other systems by Embulk • TD’s hosted Embulk “Data Connector” • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Digdag (Treasure Workflow) • API requests for integration • “PlazmaDB” • Everything in the world • guess, preview • Bulk load, export

Slide 28

Slide 28 text

Integrate with other systems by Embulk • TD’s hosted Embulk “Data Connector” • “API” • Customer • Browser opening TD console • ELB (ALB) • API requests for integration & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Dequeue w/ priority • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Embulk configs and account info • Enqueue w/ priority • TD’s hosted Digdag (Treasure Workflow) • API requests for integration • “PlazmaDB” • Everything in the world • guess, preview • “API”’s work • Bulk load, export

Slide 29

Slide 29 text

Schema management of "PlazmaDB"

Slide 30

Slide 30 text

Manage Hive and Presto queries

Slide 31

Slide 31 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • API’s DB, which has Table schema

Slide 32

Slide 32 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • API’s DB, which has Table schema • Worker • Periodically requests schema update with sampled data

Slide 33

Slide 33 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue • API’s DB, which has Table schema • Send query w/ column alias • Worker • Periodically requests schema update with sampled data

Slide 34

Slide 34 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • perfectqueue • API’s DB, which has Table schema • Send query w/ column alias • Worker • Periodically requests schema update with sampled data

Slide 35

Slide 35 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Table schema • Send query w/ column alias • “PlazmaDB” • Query • Worker • Periodically requests schema update with sampled data

Slide 36

Slide 36 text

Schema management of “PlazmaDB” • Manage Hive and Presto queries • Browser opening TD • “API” • Customer • ELB (ALB) • API requests to see schema and queries & asset requests • Serve assets told by Rails • Worker’s DB • perfectqueue worker • Worker class in jar • Spawn Java • perfectqueue • API’s DB, which has Table schema • Send query w/ column alias • “PlazmaDB” • Query • Worker • Periodically requests schema update with sampled data • “API”’s work

Slide 37

Slide 37 text

What should we improve? • Scalability • It's a multi-tenant system and every part must be scalable • Productivity • Split the monolith into properly sized pieces to reduce communication cost • Release cost is high, but it can be improved • Reliability • To avoid unexpected errors, we need a developer with a deep understanding of Rails

Slide 38

Slide 38 text

Why you should join us • Our Rails application is integrated with many interesting things that typical Rails apps aren’t • Fluentd, Embulk, Digdag, Hive, Presto, Hadoop, perfectqueue, PlazmaDB, other Java workers and middleware… • Treasure Data is a company that solves customers’ highly technical, difficult problems • Since the problems are complex, building a platform for them is hard and interesting

Slide 39

Slide 39 text

Optimization of CRuby & ERB at TD Hackathon

Slide 40

Slide 40 text

TD Hackathon • Treasure Data’s internal two-day hackathon • Any theme is allowed, as long as it isn't usual TD work • I joined it with @nalsh to optimize something in CRuby • Not everything that follows was done at TD Hackathon: we did the SSE work and researched ERB optimization ideas 1 & 2 there, but the rest is just my hobby.

Slide 41

Slide 41 text

Optimize CGI.escapeHTML using SSE • For template engine performance on Rails, HTML escaping is the bottleneck • We can search for target characters in a batch using SSE instructions • I created the "hescape" gem as a PoC in the past • Can we improve it and introduce it into CRuby?

Slide 42

Slide 42 text

CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2) PCMPESTRI: Packed Compare Explicit Length Strings, Return Index

Slide 43

Slide 43 text

CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2) PCMPESTRI: Packed Compare Explicit Length Strings, Return Index • Create escaped characters mask • Compare 16 chars at the same time, return the first found index
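The scan-then-copy strategy can be sketched in plain Ruby. This is only an analogy, not the SSE patch itself: `String#index` with a character class stands in for PCMPESTRI (find the next character that needs escaping), and the clean run before it is copied in one go. The names `ESCAPE_TABLE`, `ESCAPE_PATTERN`, and `escape_html_by_runs` are hypothetical helpers made up for this illustration.

```ruby
require "cgi/escape"

# Characters that CGI.escapeHTML rewrites, with their replacements.
ESCAPE_TABLE = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&#39;"
}.freeze
ESCAPE_PATTERN = /[&<>"']/

# Hypothetical helper: jump to the next character that needs escaping,
# bulk-copy the clean run before it, then emit the replacement.
def escape_html_by_runs(str)
  out = String.new(encoding: str.encoding)
  pos = 0
  while (idx = str.index(ESCAPE_PATTERN, pos))
    out << str[pos...idx]          # clean run, copied in one step
    out << ESCAPE_TABLE[str[idx]]  # replacement for the found character
    pos = idx + 1
  end
  out << str[pos..-1]              # remaining tail without escapable chars
end

input = %q(<a href="/?a=1&b=2">it's</a>)
puts escape_html_by_runs(input)
puts escape_html_by_runs(input) == CGI.escapeHTML(input)
```

The fewer escapable characters the input has, the longer the runs that get copied at once, which matches the shape of the benchmark numbers on the following slides.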

Slide 44

Slide 44 text

CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2) PCMPESTRI: Packed Compare Explicit Length Strings, Return Index

Slide 45

Slide 45 text

CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2) PCMPESTRI: Packed Compare Explicit Length Strings, Return Index 0.87x

Slide 46

Slide 46 text

CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2) PCMPESTRI: Packed Compare Explicit Length Strings, Return Index • In this version, hescape's problems are fixed • The interface problem is already solved, no page faults, fewer variables • 3.74x faster when no characters need HTML escaping • When many escaped characters are found, it's slow • Can we improve this using PCMPESTRM?

Slide 47

Slide 47 text

CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c) PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask

Slide 48

Slide 48 text

CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c) PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask • Load found indices in a 16-bit mask • Compare 16 chars at the same time, return the found mask

Slide 49

Slide 49 text

CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c) PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask 0.95x

Slide 50

Slide 50 text

CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c) PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask • With all characters escaped (worst case for SSE): • Before: 87.3% of old ver. • After: 95.2% of old ver.

Slide 51

Slide 51 text

CGI.escapeHTML w/ PCMPESTRM (k0kubun/ruby @ 4aea42c) PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask • Much improved when all characters are escaped • 0% escaped: 3.74x → 3.10x • 1% escaped: 1.53x → 1.57x • 5% escaped: 1.30x → 1.28x • 10% escaped: 1.18x → 1.16x • 20% escaped: 0.97x → 0.98x • 100% escaped: 0.87x → 0.95x • Looks reasonable, but not good enough to justify hurting readability…

Slide 52

Slide 52 text

CGI.escapeHTML future works? • Use SSE for length < 16 too • To avoid page faults, the current patch just skips the last fragment • We can solve that problem by adjusting the scan position first • Reduce the number of memory allocations • houdini (and hescape) allocates extra (sometimes unnecessary) memory on buffer extension, and it’s fast • If we can predict the exact result length WITH SOME MAGIC, it'd be fairly fast

Slide 53

Slide 53 text

ERB rendering optimization • With Ruby 2.4’s ERB, rendering is slower than with Erubi • Note: Erubi is Rails’ default template engine for erb • Can we improve ERB's rendering time, mostly for Sinatra? • For one-time rendering like Chef usage, we also need to care about compile time • Usually, optimizing rendering without slowing down compiling is hard
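For readers who have not looked inside ERB: compiling turns a template into Ruby source (`ERB#src`), and rendering evaluates that source. A minimal sketch, assuming a made-up one-line template; the exact text of the generated source differs between Ruby versions, so it is printed here rather than quoted.

```ruby
require "erb"

template = ERB.new("Hello, <%= name %>!\n")

# Compiling: the Ruby code generated from the template.
# The optimizations on the following slides all change this generated code.
puts template.src

# Rendering: evaluate the generated code against a binding.
name = "Treasure Data"
puts template.result(binding)
```

Compile cost is paid once per `ERB.new`, render cost once per `result` call, which is why one-time rendering cares about both.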

Slide 54

Slide 54 text

ERB optimization idea 1: opt_ltlt • In YARV, there are instructions that can bypass method call overhead for some methods • e.g. +, -, *, /, <, >, ==, <<, … • ERB generates _erbout.concat instead of _erbout <<, so there was a chance to optimize
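The difference is visible in the bytecode. The sketch below, which assumes CRuby (it relies on `RubyVM::InstructionSequence`), disassembles both spellings; `yarv_disasm` is a hypothetical helper name and the two source snippets are made up for the comparison.

```ruby
# Hypothetical helper: compile a snippet and return its YARV disassembly.
def yarv_disasm(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

shovel_src = "buf = String.new; buf << 'a'"
concat_src = "buf = String.new; buf.concat('a')"

puts yarv_disasm(shovel_src)
puts yarv_disasm(concat_src)

# `<<` compiles to the specialized opt_ltlt instruction;
# `.concat` goes through a generic method-call instruction instead.
puts yarv_disasm(shovel_src).include?("opt_ltlt")
puts yarv_disasm(concat_src).include?("opt_ltlt")
```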

Slide 55

Slide 55 text

ERB optimization idea 1: opt_ltlt This patch won’t increase compiling cost!

Slide 56

Slide 56 text

ERB optimization idea 1: opt_ltlt Before After

Slide 57

Slide 57 text

ERB optimization idea 1: opt_ltlt • See ruby/ruby#1612 for benchmark details • Optimized by 77%, but why? Are Ruby method calls really that slow? (explained later)

Slide 58

Slide 58 text

ERB optimization idea 2: opt_tos • If bypassing a method call is fast, would bypassing #to_s calls be fast too? • It’s not registered as an opt_* instruction now • Note: the tostring instruction doesn’t check for String#to_s redefinition, so they’re different • Since I made some effort to omit #to_s calls in Hamlit, I'd be happy if it’s fast

Slide 59

Slide 59 text

ERB optimization idea 2: opt_tos

Slide 60

Slide 60 text

ERB optimization idea 2: opt_tos • On average, 2.103s → 2.022s, so it only improved by 4%. What’s different from <<?

Slide 61

Slide 61 text

ERB optimization idea 2: opt_tos • String#concat was made slower in Ruby 2.4 • It was changed to take multiple arguments and create a temporary buffer in Feature#12333 • In Ruby 2.4+, String#<< is simply faster than String#concat
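A small illustration of the API difference described above, assuming Ruby 2.4 or later; `append_both_ways` is a hypothetical helper name. It shows only the calling conventions, not the performance gap.

```ruby
# Hypothetical helper contrasting the two append methods.
def append_both_ways
  buf = String.new
  buf.concat("a", "b", "c")  # multiple arguments are accepted since Ruby 2.4
  buf << "d"                 # << always takes exactly one argument
  buf
end

puts append_both_ways
```

Because `<<` has a fixed arity of one, it never needs the multi-argument handling that `concat` gained.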

Slide 62

Slide 62 text

ERB optimization idea 2: opt_tos • I restored String#concat’s performance (1.44x) and didn’t introduce the opt_tos instruction

Slide 63

Slide 63 text

ERB optimization idea 3: skip force encoding • Formerly, the generated code included both a magic comment and force_encoding • I wanted to skip fetching __ENCODING__ and calling #force_encoding

Slide 64

Slide 64 text

ERB optimization idea 3: skip force encoding • To remove it, we need to use the magic comment properly • String.new • It’s not affected by the magic comment since it’s not a literal • '' (two single quotes) • It’s affected by the magic comment since it’s a literal
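The distinction can be checked directly. In this sketch `buffer_from_constructor` and `buffer_from_literal` are hypothetical helper names; the literal's encoding is compared against `__ENCODING__` rather than a fixed name, since it depends on the source encoding of the file the code lives in.

```ruby
# Not a literal: always starts as a binary (ASCII-8BIT) string,
# whatever the magic comment of the file says.
def buffer_from_constructor
  String.new
end

# A literal: picks up the source encoding, so no force_encoding is needed.
# .dup keeps the encoding and returns an unfrozen copy even when
# frozen_string_literal is enabled.
def buffer_from_literal
  ''.dup
end

puts buffer_from_constructor.encoding == Encoding::ASCII_8BIT
puts buffer_from_literal.encoding == __ENCODING__
puts buffer_from_literal.frozen?
```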

Slide 65

Slide 65 text

ERB optimization idea 3: skip force encoding • ruby/ruby#1147 • This patch doesn’t increase compiling cost • .dup is there to support frozen_string_literal

Slide 66

Slide 66 text

ERB optimization idea 3: skip force encoding Before After

Slide 67

Slide 67 text

ERB optimization idea 3: skip force encoding • Possible effects: • A gem that uses ERB#src as part of a string eval to define a method may return the wrong encoding if a non-script encoding is given to ERB • My opinion: • If you modify the result of ERB#src or depend on its internals, that is at your own risk; fix it in the third-party layer • If you use ERB#result or #def_method, it won’t be broken • In the ERB#src case, you can move the magic comment to the top of the evaluated string. I’ll fix it if I find such a case.

Slide 68

Slide 68 text

ERB optimization idea 4: opt_str_uminus • For a long time, I’ve wanted to freeze string literals generated by ERB • But I couldn’t do that without increasing compiling cost… • For multiple newlines, it needs concatenation here and it’s slow

Slide 69

Slide 69 text

ERB optimization idea 4: opt_str_uminus • Unary plus/minus operators were introduced in Ruby 2.3 (String#+@, String#-@) • String#-@ was optimized like String#freeze on trunk (Ruby 2.5) on 2017/03/27 • ERB can use it immediately because it’s a standard library!!!
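What the unary minus buys can be seen in a few lines. This sketch assumes Ruby 2.5 or later, where `-"literal"` is compiled to return a shared frozen string instead of allocating a new one on every evaluation; `frozen_template_text` is a hypothetical helper name.

```ruby
# Hypothetical helper: the kind of static text ERB emits between tags.
def frozen_template_text
  -"static template text"
end

first  = frozen_template_text
second = frozen_template_text

puts first.frozen?         # the result of String#-@ is frozen
puts first.equal?(second)  # on Ruby 2.5+, both calls return the same object
```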

Slide 70

Slide 70 text

ERB optimization idea 4: opt_str_uminus

Slide 71

Slide 71 text

ERB optimization idea 4: opt_str_uminus • No compilation overhead because we can put the operator here • String allocation is reduced, so it’s faster

Slide 72

Slide 72 text

ERB optimization idea 4: opt_str_uminus Before After

Slide 73

Slide 73 text

ERB optimization idea 5: str_uplus

Slide 74

Slide 74 text

ERB optimization idea 5: str_uplus

Slide 75

Slide 75 text

ERB optimization idea 5: str_uplus

Slide 76

Slide 76 text

ERB optimization idea 5: str_uplus • If the string is not frozen, String#+@ skips rb_str_dup, so it’s faster than String#dup in that case. Very convenient!!!
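The behaviour described above is observable from Ruby through object identity. A minimal sketch; `mutable_buffer` is a hypothetical helper name.

```ruby
# Hypothetical helper: get a string that is safe to mutate.
def mutable_buffer(str)
  +str
end

unfrozen = String.new("abc")
frozen   = "abc".freeze

# Not frozen: String#+@ returns the receiver itself, no copy is made.
puts mutable_buffer(unfrozen).equal?(unfrozen)

# Frozen: String#+@ returns an unfrozen duplicate.
puts mutable_buffer(frozen).equal?(frozen)
puts mutable_buffer(frozen).frozen?
```

String#dup, by contrast, always allocates a copy, even when the receiver was already mutable.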

Slide 77

Slide 77 text

ERB optimization idea 5: str_uplus Faster~

Slide 78

Slide 78 text

Before After ERB optimization idea 5: str_uplus No unnecessary things!!

Slide 79

Slide 79 text

ERB optimization result (compiling & rendering) • About 1.000s → 0.700s: 1.43x faster for compiling & rendering combined • No performance regression for compiling & rendering combined

Slide 80

Slide 80 text

ERB optimization result (compiling & rendering) • Not only Feature#12333 (String#concat) but also Feature#11936 (which I proposed) was bad for compiling performance • The main cause of 2.4’s large slowdown was actually Feature#11936, and I fixed it as Bug#12074 • So for compiling, I just fixed a problem I had caused myself • Ruby 2.4’s features were bad for compiling, but that is fixed in 2.5

Slide 81

Slide 81 text

ERB optimization result (rendering only) • Benchmark: ruby/benchmark/bm_erb_render.rb (Ruby 2.5.0 is trunk: 1542ab670e) • w/ Erubi: Erubi::Engine.new(data).src, Erubis: Erubis::Eruby.new(data).src • Chart axis: Time to render m times (s) • Chart bars: ERB (Ruby 2.5.0), Erubis 2.7.0 (Ruby 2.5.0), ERB (Ruby 2.3.4), Erubi 1.6.0 (Ruby 2.5.0), ERB (Ruby 2.4.1) • In Ruby 2.5, ERB became the fastest erb implementation in the world, and 2.21x faster than Ruby 2.4 (actually, 1.12x faster than Ruby 2.3)