Infrastructure, DBA @crzrcn Frontend @k0kubun Backend middleware
The API team’s main task is developing a Rails app called “API”. The roles listed above are additional roles besides API.
[Diagram: data import]
• (JS SDK) → ELB (ALB): receive data chunks w/ authentication
• API’s DB, which has “PlazmaDB” table schema: fetch table permission and authorize
• import storage (S3 or RiakCS): store raw data
• perfect queue (Worker’s DB): enqueue TD worker’s task for imported data
• perfectqueue worker: dequeue, spawn Java (Worker class in jar)
• Worker class in jar → “PlazmaDB”: ingest
• Label: “API”’s work
high loads and bottleneck
• Data import part will be replaced by @tagomoris's project
• Maybe he'll publish its architecture in the future
• I'm developing one of its middlewares in Java
[Diagram: TD console and data integration]
• Browser opening TD console → ELB (ALB): API requests for integration & asset requests
• Serve assets told by Rails
• API’s DB, which has Embulk configs and account info
• perfect queue (Worker’s DB): enqueue w/ priority
• perfectqueue worker: dequeue w/ priority, spawn Java (Worker class in jar)
• Worker class in jar: guess, preview; bulk load, export (“PlazmaDB”, “Everything in the world”)
• TD’s hosted Digdag (Treasure Workflow): API requests for integration
• Label: “API”’s work

Rails doesn’t serve assets with the Asset Pipeline. The console application (React, Redux) is in a separate repository. For historical reasons, Rails just proxies requests to its S3 URL.
[Diagram: schema and queries]
• Browser opening TD “API” Customer → ELB (ALB): API requests to see schema and queries & asset requests
• Serve assets told by Rails
• API’s DB, which has table schema
• Worker: periodically requests schema update with sampled data
• perfect queue (Worker’s DB): send query w/ column alias
• perfectqueue worker: spawn Java (Worker class in jar)
• Worker class in jar → “PlazmaDB”: query
• Label: “API”’s work
and every part must be scalable
• Productivity
  • Splitting the monolith into properly sized parts to reduce communication cost
  • Release cost is high, but it can be improved
• Reliability
  • To avoid unexpected errors, we need a developer with a deep understanding of Rails
integrated with many interesting things which normal ones aren’t
• Fluentd, Embulk, Digdag, Hive, Presto, Hadoop, perfectqueue, PlazmaDB, other Java workers and middlewares…
• Treasure Data is a company that solves customers’ highly technical, difficult problems
• Since the problems are complex, building a platform for them is hard and interesting
• Free theme, but not usual TD work
• I joined it with @nalsh to optimize something in CRuby
Not all of the following content was done at the TD Hackathon. We did the SSE work and researched ERB optimization ideas 1 & 2 at the TD Hackathon. The rest is just my hobby.
Rails, HTML escaping is the bottleneck
• We can search for target characters in a batch using SSE instructions
• I created the "hescape" gem as a PoC in the past
• Can we improve it and introduce it to CRuby?
problem is already solved, no page faults, reduced number of variables
• 3.74x faster when there are no characters to escape
• When many characters to escape are found, it's slow
• Can we improve this using PCMPESTRM?
CGI.escapeHTML w/ PCMPESTRI (k0kubun/ruby @ 5f9b4e2)
PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
• To avoid page faults, the current patch just skips the last fragment
• We can solve that problem by adjusting the scan position first
• Reduce the number of memory allocations
  • houdini (and hescape) allocates extra (sometimes unnecessary) memory on buffer extension, and it’s fast
  • If we can predict the exact result length WITH SOME MAGIC, it'd be fairly fast
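The "predict exact result length" idea can be sketched in plain Ruby. This is only an illustration of the allocation strategy, not the SSE patch or the houdini/hescape implementation: `ESCAPE_TABLE`, `escaped_bytesize`, and `escape_html_sketch` are hypothetical names, and a character-by-character Ruby loop says nothing about the speed of the C version.

```ruby
require "cgi/escape"

# The five characters CGI.escapeHTML rewrites, with their replacements.
ESCAPE_TABLE = {
  "'" => "&#39;",
  "&" => "&amp;",
  '"' => "&quot;",
  "<" => "&lt;",
  ">" => "&gt;",
}.freeze

# Hypothetical helper: byte length of the escaped result,
# computed without building the result.
def escaped_bytesize(str)
  extra = 0
  str.each_char do |char|
    replacement = ESCAPE_TABLE[char]
    extra += replacement.bytesize - 1 if replacement
  end
  str.bytesize + extra
end

# Hypothetical helper: escape into a buffer preallocated at the exact size.
def escape_html_sketch(str)
  size = escaped_bytesize(str)
  return str.dup if size == str.bytesize # nothing to escape: plain copy

  out = String.new(capacity: size, encoding: str.encoding)
  str.each_char { |char| out << (ESCAPE_TABLE[char] || char) }
  out
end

input = %q(<a href='/?a=1&b=2'>"hi"</a>)
puts escape_html_sketch(input)
puts escape_html_sketch(input) == CGI.escapeHTML(input)
```

The first pass costs an extra scan, which is the trade-off the slide hints at: it only pays off if computing the length is cheaper than the reallocations it avoids.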
slower than Erubi
• Note: Erubi is Rails’ default template engine for ERB templates
• Can we improve ERB's rendering time, mostly for Sinatra?
• For one-time rendering like Chef usage, we also need to care about compiling time
• Usually, optimizing rendering without slowing down compiling is hard
instructions that can bypass method call overhead for some methods
• ex) +, -, *, /, <, >, ==, <<, …
• ERB generates _erbout.concat instead of _erbout <<, so there was a chance for optimization
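The difference is visible in the bytecode. The following sketch uses CRuby's `RubyVM::InstructionSequence` to compare the two call styles; the `disasm` helper and the sample snippets are made up for illustration, and the exact disassembly text varies between Ruby versions.

```ruby
# CRuby-specific: RubyVM::InstructionSequence exposes the compiled bytecode.
def disasm(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

shovel_code = disasm('buf = String.new; buf << "x"')
concat_code = disasm('buf = String.new; buf.concat("x")')

# `<<` gets the specialized opt_ltlt instruction,
# while `.concat` goes through a generic method-call instruction.
puts shovel_code.include?("opt_ltlt")
puts concat_code.include?("opt_ltlt")
```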
is fast, would it be fast to bypass #to_s calls too?
• It’s not registered as an opt_* instruction now
• Note: the tostring instruction doesn’t check String#to_s redefinition, so they’re different
• Since I made some effort to omit #to_s calls in Hamlit, I’d be happy if it were fast
in Ruby 2.4
• It was changed to take multiple arguments and create a temporary buffer in Feature#12333.
• In Ruby 2.4+, String#<< is just faster than String#concat
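A small sketch of the behavior difference, assuming Ruby 2.4 or later. The last case follows the example in the String#concat documentation and shows why arguments have to be gathered before the receiver is modified.

```ruby
a = String.new
a.concat("foo", "bar")    # Ruby 2.4+: several arguments in one call

b = String.new
b << "foo" << "bar"       # String#<< takes exactly one argument per call

c = "sn".dup
c.concat("_", c, "_", c)  # arguments are read before c itself is modified

puts a
puts b
puts c
```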
it, we need to utilize magic comment properly
• String.new
  • It’s not affected by magic comment since it’s not a literal
• '' (two single quotes)
  • It’s affected by magic comment since it’s a literal
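A minimal sketch of the difference, assuming the magic comment in question is the encoding one (the kind ERB puts at the top of its generated source). EUC-JP is an arbitrary choice for illustration.

```ruby
# Both snippets start with an encoding magic comment.
literal_source = "# coding: EUC-JP\n''.encoding"
new_source     = "# coding: EUC-JP\nString.new.encoding"

# The literal picks up the source encoding declared by the magic comment.
literal_encoding = eval(literal_source)
# String.new ignores it and returns a binary (ASCII-8BIT) string.
new_encoding = eval(new_source)

puts literal_encoding
puts new_encoding
```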
• A gem that uses ERB#src as part of a string eval to define a method may return the wrong encoding if a template in a non-script encoding is given to ERB
• My opinion:
  • If you modify the result of ERB#src or depend on its internals, do so at your own risk and fix it in the third-party layer
  • If you use ERB#result or #def_method, it won’t be broken
  • In that case, you can pull the magic comment up to the first line of the evaluated string. I’ll fix it if I find such a case.
literals generated by ERB
• But I couldn’t do that without increasing compiling cost…
ERB optimization idea 4: opt_str_uminus
For multiple newlines, it needs concatenation here and it’s slow
String#-@)
• And String#-@ was optimized like String#freeze on trunk (Ruby 2.5) on 2017/03/27
• ERB can immediately use it because it’s a standard library!!!
ERB optimization idea 4: opt_str_uminus
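What String#-@ gives to generated template code can be sketched as follows, assuming Ruby 2.5 or later. The sample string is made up; it stands in for a static chunk of a template.

```ruby
# Two occurrences of the same static chunk, as template code might contain.
first  = -"<p>static chunk</p>\n"
second = -"<p>static chunk</p>\n"

puts first.frozen?         # the result of String#-@ is frozen
puts first.equal?(second)  # on Ruby 2.5+, equal contents share one object
```

Because the literal is deduplicated and frozen, rendering the same chunk repeatedly should not allocate a new String each time.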
but also Feature#11936 (which I proposed) was bad for compiling performance. The main cause of 2.4’s large slowdown is actually Feature#11936, and I fixed it as Bug#12074. For compiling, I was just putting out a fire that I had started myself. Ruby 2.4’s features were bad for compiling, but this is fixed in 2.5