
Treasure Data Summer Intern 2015 Final Report

mururu
September 30, 2015



  1. Summer Intern 2015 Final Report • September 30, 2015 • Yuki Ito

  2. Who am I? • Yuki Ito • Master’s student, Information Science and Technology, The University of Tokyo • msgpack-erlang and fluent-logger-erlang maintainer
  3. TreasureData Summer Intern • 2015/08/03 ~ 2015/09/30 • @ TreasureData Tokyo office
  4. What I did

  5. What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

  6. What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

  7. Nanosecond timestamp in Fluentd

  8. Current Fluentd timestamp • Unix timestamp • Second resolution • 2015-09-29 15:55:43 +0900 => 1443509743
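
The conversion on this slide can be reproduced in plain Ruby. This is a minimal sketch, not Fluentd code; the 123456789 ns fraction is an assumed value, added only to show what a second-resolution Integer discards.

```ruby
# Plain Ruby sketch (not Fluentd code).
# The 123_456_789 ns fraction is an assumed value for illustration.
t = Time.new(2015, 9, 29, 15, 55, Rational(43_123_456_789, 1_000_000_000), "+09:00")

unix_seconds = t.to_i  # second resolution, like the current Fluentd timestamp
lost_nsec    = t.nsec  # sub-second part that an Integer timestamp cannot hold

puts unix_seconds      # 1443509743
puts lost_nsec         # 123456789
```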
  9. Problem • Some storages and platforms (Elasticsearch, GCP…) expect or want sub-second (millisecond, nanosecond) timestamps. • But the current Fluentd timestamp cannot hold sub-second values. • Supporting nanosecond resolution covers all of these requirements, because it is generally the finest resolution in use (at the cost of slightly more overhead than millisecond/microsecond).
  10. New Fluentd Timestamp with nanosecond resolution

  11. EventTime

  12. Implementation - EventTime • Two attributes • @sec: integer seconds (same as the current timestamp) • @nsec: integer nanoseconds • In most cases, it behaves just like the current timestamp • It is serialized as a MessagePack Ext type • Fluent::Engine.now and the built-in parsers return EventTime
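
The idea of a two-attribute timestamp that still behaves like an integer can be sketched as below. This is not the Fluentd implementation: the class name `SketchEventTime`, its methods, and the payload layout (two 32-bit big-endian unsigned integers) are assumptions for illustration; the slide only states that an Ext type is used.

```ruby
# Hypothetical sketch of an EventTime-like value; not the Fluentd implementation.
class SketchEventTime
  include Comparable
  attr_reader :sec, :nsec

  def initialize(sec, nsec = 0)
    @sec = sec
    @nsec = nsec
  end

  # Build from a Ruby Time, keeping the nanosecond part.
  def self.from_time(time)
    new(time.to_i, time.nsec)
  end

  # Integer-like behavior, so code written for second-resolution timestamps keeps working.
  def to_i
    @sec
  end

  # Against another SketchEventTime compare (sec, nsec);
  # against a plain Integer compare seconds only, like the old timestamp.
  def <=>(other)
    case other
    when SketchEventTime then [@sec, @nsec] <=> [other.sec, other.nsec]
    when Integer then @sec <=> other
    end
  end

  # Assumed Ext payload: two 32-bit big-endian unsigned integers (8 bytes).
  def to_ext_payload
    [@sec, @nsec].pack("NN")
  end

  def self.from_ext_payload(bytes)
    new(*bytes.unpack("NN"))
  end
end

et = SketchEventTime.new(1443509743, 123_456_789)
restored = SketchEventTime.from_ext_payload(et.to_ext_payload)
```

Comparing against an Integer by seconds only is one possible way to keep old plugin code working; a plugin that needs the sub-second part would read `nsec` explicitly.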
  13. Difficulties - EventTime • Backward compatibility • Performance 

  14. Difficulties - EventTime • Backward compatibility • Performance 

  15. Backward compatibility 1 • There are many plugins, and they must keep working without modification. • EventTime behaves like the current timestamp in most cases. • If a plugin doesn’t want to lose sub-second precision, it may need additional code to handle EventTime. • I have checked many, many plugins.
  16. Backward compatibility 2 • To keep sub-second resolution across nodes (forward plugins), the external data format (MessagePack) has to serialize EventTime as a different type from the old timestamp, but this may break old nodes. • Introduced the time_as_integer option in the forward output plugin to force timestamps to be serialized as Integer (same as the current timestamp).
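
A forward output section using this option might look like the following. This is a sketch only: the match pattern, host, and port are placeholder values, and the exact section syntax depends on the Fluentd version in use.

```
<match **>
  @type forward
  # Serialize event time as Integer so that old receivers can read it.
  time_as_integer true
  <server>
    # Placeholder address of the receiving node.
    host 192.0.2.10
    port 24224
  </server>
</match>
```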
  17. Difficulties - EventTime • Backward compatibility • Performance 

  18. Performance concerns 1 • When time_as_integer is true and the forward output plugin receives PackedForward, the data is deserialized and re-serialized to convert EventTime to Integer. • By keeping source nodes on the old version, or keeping time_as_integer true on them, we can set time_as_integer to false on the relay node.
  19. Performance concerns 2 • If the timestamp format of the logs includes a sub-second part, it is difficult to cache the results of Time.strptime (Time.strptime is a heavy operation). • Introduced the strptime gem, which can precompile a format string.
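
Why a result cache stops helping once timestamps carry a sub-second part can be shown with a small stdlib-only sketch. `CachedTimeParser` is a hypothetical helper written for this illustration, not Fluentd's parser, and the strptime gem itself is not used here.

```ruby
require 'time'

# Hypothetical helper: memoize Time.strptime results by the raw timestamp string.
class CachedTimeParser
  attr_reader :hits, :misses

  def initialize(pattern)
    @pattern = pattern
    @cache = {}
    @hits = 0
    @misses = 0
  end

  def parse(text)
    if @cache.key?(text)
      @hits += 1
      @cache[text]
    else
      @misses += 1
      @cache[text] = Time.strptime(text, @pattern)
    end
  end
end

# Second resolution: many log lines share the same timestamp string.
sec_parser = CachedTimeParser.new('%Y-%m-%d %H:%M:%S %z')
1000.times { sec_parser.parse('2015-09-29 15:55:43 +0900') }

# Nanosecond resolution: every line here has a distinct string, so nothing is reused.
nsec_parser = CachedTimeParser.new('%Y-%m-%d %H:%M:%S.%N %z')
1000.times { |i| nsec_parser.parse(format('2015-09-29 15:55:43.%09d +0900', i)) }

puts "second resolution:     #{sec_parser.hits} hits, #{sec_parser.misses} misses"
puts "nanosecond resolution: #{nsec_parser.hits} hits, #{nsec_parser.misses} misses"
```

With no cache hits, every line pays the full cost of parsing the format, which is the cost a precompiled format string is meant to reduce.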
  20. Benchmark Results - strptime • Measured in_tail parsing performance • Used dummer/flowcounter_simple • Machine spec • CPU: Core i5 5250U • Memory: 8GB • Disk: SSD 256GB • OS: OS X Yosemite • [Chart: in_tail throughput (lines/sec) for cache, nocache, and nocache with strptime]
  21. Elasticsearch DEMO

  22. None
  23. Summary • The current timestamp has only second resolution. • Some storages and platforms want nanosecond (or millisecond) resolution. • I introduced EventTime, a new timestamp that can hold nanosecond resolution.
  24. Future work • Documentation for users and plugin developers. • “This branch will be merged in master branch about next month.” by Fluentd committers
  25. EventTime will be released as a part of v0.14!

  26. What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

  27. Perfect Monitor • Prototype of a monitoring service for TD customers

  28. What for? - Perfect Monitor • Visualize the computing resources customers use, in near real-time • e.g. number of records/bytes going through the event collector, number of running jobs, number of CPU cores allocated to a particular job • Reduce support cost by helping customers understand how they are using our computing resources • Let TD staff know what data processing our customers are doing, and how
  29. What is? - Perfect Monitor • Dashboard for customers to see how they are using our computing resources • Collector for various metrics sent from workers • Storage for the metrics (InfluxDB/TD) • API server to handle requests from the dashboard and query the backend storage • Dashboard application
  30. Use Cases 1 - number of records - • Most system administrators don't know how many logs they are generating. • The number of logs is affected by many events, such as the release of new services, new versions of apps, and so on.

  32. Use Cases 2 - number of running tasks - • Customers don’t know how many CPU cores each of their jobs is using right now.
  33. Use Cases 3 - support/sales team side - • A support/sales engineer can find the cause of problems more easily.
  34. System Architecture


  35. Architecture • [Diagram: InfluxDB, API Server, Redis, Dashboard, User; arrows: send metrics, query, cache, query (only old data)]

  36. How to collect metrics • Workers send metrics to the monitoring server. • The monitoring server filters and aggregates the metrics (if needed), then stores them in InfluxDB and TD. • [Diagram: InfluxDB; arrow: send metrics]
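
The filter, aggregate, and store steps can be sketched as follows. Everything here is hypothetical: the metric names, the per-minute summing, and the in-memory arrays standing in for InfluxDB and TD are assumptions for illustration, not the actual monitoring server.

```ruby
# Hypothetical monitoring-server step: filter, aggregate per minute, store twice.
ALLOWED_METRICS = %w[records bytes running_jobs].freeze

def aggregate(metrics)
  metrics
    .select { |m| ALLOWED_METRICS.include?(m[:name]) }        # filter unknown metrics
    .group_by { |m| [m[:name], m[:time] - (m[:time] % 60)] }  # one bucket per metric per minute
    .map { |(name, minute), ms| { name: name, time: minute, value: ms.sum { |m| m[:value] } } }
end

influxdb_store = []  # stand-in for InfluxDB (recent data)
td_store = []        # stand-in for TD (all data)

incoming = [
  { name: "records", time: 1443509743, value: 10 },
  { name: "records", time: 1443509750, value: 5 },
  { name: "debug",   time: 1443509751, value: 1 },  # dropped by the filter
]

points = aggregate(incoming)
influxdb_store.concat(points)  # the same data goes to both stores
td_store.concat(points)
```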
  37. How to show metrics • Dashboard asks the API Server for metrics based on its configuration. • API Server queries InfluxDB or TD (Presto) depending on the time window. • Dashboard renders graphs from the query results. • [Diagram: TD, Presto, InfluxDB, API Server, Redis, Dashboard; arrows: query (only old data), query, cache]

  38. The Point of Architecture 1 • [Diagram: InfluxDB, API Server, Redis, Dashboard, User; arrows: send metrics, query, cache, query (only old data)]
  39. The Point of Architecture 1 • Perfect Monitor stores the same data in InfluxDB and TD (Presto) • InfluxDB holds only recent data (e.g. 30 days) • TD (Presto) holds all data • Why did I choose InfluxDB? • It is fast enough. • We can write time-series data as a hash and query it flexibly on the fly, which makes trial and error easy. • [Diagram: TD, Presto, InfluxDB; arrows: query, query, store, store]

  40. The Point of Architecture 2 • [Diagram: InfluxDB, API Server, Redis, Dashboard, User; arrows: send metrics, query, cache, query (only old data)]
  41. The Point of Architecture 2 • The API server interprets queries from the dashboard and queries InfluxDB or TD (Presto) • The API Server queries TD (Presto) only when old data is needed, and InfluxDB otherwise • It can cache query results in Redis. • [Diagram: API Server, Redis; arrows: query, cache, query (only old data)]
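
The routing rule on this slide can be sketched as below. `SketchApiServer` is hypothetical: the backends are lambdas, a Hash stands in for Redis, and the 30-day window is an assumption taken from the "e.g. 30 days" retention example.

```ruby
# Hypothetical API-server routing; not the actual Perfect Monitor code.
class SketchApiServer
  RECENT_WINDOW = 30 * 24 * 60 * 60  # assumed InfluxDB retention: 30 days, in seconds

  def initialize(influxdb:, presto:, cache: {})
    @influxdb = influxdb
    @presto = presto
    @cache = cache  # a Hash standing in for Redis
  end

  # Pick the backend from the start of the requested time window.
  def backend_for(from, now)
    now - from > RECENT_WINDOW ? :presto : :influxdb
  end

  def query(metric, from, to, now: Time.now.to_i)
    key = [metric, from, to]
    @cache.fetch(key) do
      backend = backend_for(from, now) == :presto ? @presto : @influxdb
      @cache[key] = backend.call(metric, from, to)
    end
  end
end

calls = []
api = SketchApiServer.new(
  influxdb: ->(_metric, from, _to) { calls << :influxdb; [[from, 1]] },
  presto:   ->(_metric, from, _to) { calls << :presto;   [[from, 2]] }
)

now = 1443509743
api.query("records", now - 3600, now, now: now)         # recent window: InfluxDB
api.query("records", now - 90 * 86_400, now, now: now)  # old window: TD (Presto)
api.query("records", now - 3600, now, now: now)         # repeated query: served from the cache
```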
  42. Failure case 1 • InfluxDB tends to be overloaded • When InfluxDB is down, just launch a new InfluxDB node and load data from TD (Presto). • [Diagram: TD, Presto, new InfluxDB; arrow: restore]
  43. Failure case 2 • When InfluxDB is down or its response is delayed, the API Server queries TD (Presto). • [Diagram: TD, Presto, InfluxDB, API Server; arrow: query]
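
This fallback behavior can be sketched with Ruby's standard `Timeout` module. The function name, the lambdas standing in for the two backends, and the timeout values are all hypothetical; this is not the actual API Server code.

```ruby
require 'timeout'

# Hypothetical fallback: try InfluxDB first, use TD (Presto) if it fails or is too slow.
def query_with_fallback(influxdb, presto, timeout_sec: 1.0)
  Timeout.timeout(timeout_sec) { influxdb.call }
rescue StandardError  # includes Timeout::Error and connection errors
  presto.call
end

healthy = -> { :from_influxdb }
down    = -> { raise IOError, "connection refused" }
slow    = -> { sleep 0.5; :from_influxdb }
presto  = -> { :from_presto }

healthy_result = query_with_fallback(healthy, presto)
down_result    = query_with_fallback(down, presto)
slow_result    = query_with_fallback(slow, presto, timeout_sec: 0.05)
```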
  44. For operation • The API server returns results even when InfluxDB is degraded, • because the API server then uses TD (Presto) automatically. • To add new metrics, you just need to add configuration to the monitoring server and the dashboard application. • Of course, workers need to be able to send the metrics.
  45. Summary • Showing customers more metrics about their computing resources benefits both customers and TD. • Perfect Monitor makes it easy.
  46. Impression of Summer Intern • I thought seriously about “data” every day. This was a first for me, and it was exciting. • I learned a lot by working on both OSS and TD internal tasks.
  47. Thank you!