
Treasure Data Summer Intern 2015 Final Report

mururu
September 30, 2015

Transcript

  1. Summer Intern 2015 Final Report
    September 30, 2015
    Yuki Ito

  2. Who am I?
    • Yuki Ito
    • Master’s student, Information Science and Technology, The University of Tokyo
    • msgpack-erlang and fluent-logger-erlang maintainer
  3. TreasureData Summer Intern
    • 2015/08/03 ~ 2015/09/30
    • @ TreasureData Tokyo office
  4. What I did

  5. What I did
    • Nanosecond timestamp in Fluentd
    • Perfect Monitor
  6. What I did
    • Nanosecond timestamp in Fluentd
    • Perfect Monitor
  7. Nanosecond timestamp in Fluentd

  8. Current Fluentd timestamp
    • Unix timestamp
    • Second resolution
    • 2015-09-29 15:55:43 +0900 => 1443509743
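To make the limitation concrete, here is a minimal Ruby sketch using only the standard `Time` class. It rebuilds the instant from the slide and attaches a sub-second part; the value 123456789 ns is an arbitrary assumption for illustration, not taken from the slides.

```ruby
# The slide's example instant, plus an assumed (illustrative) sub-second part.
# Time.at with the :nsec unit needs Ruby 2.5 or later.
t = Time.at(1443509743, 123_456_789, :nsec)

unix_seconds = t.to_i   # what a second-resolution timestamp keeps
lost_nanos   = t.nsec   # what it cannot carry

# Rebuilding a Time from the integer alone loses the fraction.
rebuilt = Time.at(unix_seconds)
formatted = rebuilt.getlocal("+09:00").strftime("%Y-%m-%d %H:%M:%S %z")
```

Two events that arrive within the same second become indistinguishable once only `unix_seconds` is stored.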
  9. Problem
    • Some storages and platforms (Elasticsearch, GCP…) expect or want sub-second (millisecond, nanosecond) timestamps.
    • But the current Fluentd timestamp cannot hold sub-second values.
    • Nanosecond resolution would cover all of these requirements, because it is generally the finest resolution in use (at the cost of slightly more overhead than millisecond/microsecond).
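The claim that nanoseconds cover the coarser resolutions comes down to integer arithmetic. A small sketch, with hypothetical helper names, of deriving millisecond and microsecond epoch values from a seconds/nanoseconds pair:

```ruby
# Hypothetical helpers: derive coarser epoch values from (sec, nsec).
def to_epoch_millis(sec, nsec)
  sec * 1_000 + nsec / 1_000_000       # integer division truncates
end

def to_epoch_micros(sec, nsec)
  sec * 1_000_000 + nsec / 1_000
end

millis = to_epoch_millis(1443509743, 123_456_789)
micros = to_epoch_micros(1443509743, 123_456_789)
```

The reverse direction is not possible: a millisecond value cannot be turned back into the original nanoseconds, which is why the finest resolution is the one worth storing.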
  10. New Fluentd Timestamp with nanosecond resolution

  11. EventTime

  12. Implementation - EventTime
    • Two attributes:
    • @sec: integer seconds (same as the current timestamp)
    • @nsec: integer nanoseconds
    • In most cases, it behaves just like the current timestamp
    • It is serialized as a MessagePack Ext type
    • Fluent::Engine.now and the built-in parsers return EventTime
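The shape of such a class can be sketched as follows. This is not the Fluentd implementation: the class name `SketchEventTime`, the comparison rule, and the 8-byte big-endian payload layout are all assumptions made for illustration. The payload stands in for the body of a MessagePack Ext value; the msgpack gem is deliberately not used so the sketch runs on the standard library alone.

```ruby
# Hypothetical sketch of a nanosecond timestamp that still acts like an integer.
class SketchEventTime
  attr_reader :sec, :nsec

  def initialize(sec, nsec = 0)
    @sec = sec
    @nsec = nsec
  end

  # Code that only needs seconds keeps working through to_i.
  def to_i
    @sec
  end

  # Compares fully against another SketchEventTime,
  # and by seconds only against a plain integer.
  def ==(other)
    if other.is_a?(SketchEventTime)
      @sec == other.sec && @nsec == other.nsec
    else
      @sec == other
    end
  end

  # Assumed payload layout: two 32-bit big-endian unsigned integers.
  def to_payload
    [@sec, @nsec].pack("NN")
  end

  def self.from_payload(data)
    new(*data.unpack("NN"))
  end
end

et = SketchEventTime.new(1443509743, 123_456_789)
restored = SketchEventTime.from_payload(et.to_payload)
```

Comparing by seconds against integers is one way to read "behaves just like the current timestamp"; a plugin that cares about the fraction reads `nsec` explicitly.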
  13. Difficulties - EventTime • Backward compatibility • Performance 

  14. Difficulties - EventTime • Backward compatibility • Performance 

  15. Backward compatibility 1
    • There are many plugins. They must keep working as they are.
    • EventTime behaves like the current timestamp in most cases.
    • If a plugin doesn’t want to lose sub-second precision, it may need additional code to handle EventTime.
    • I have checked a large number of plugins.
  16. Backward compatibility 2
    • To keep sub-second resolution across nodes (forward plugins), the external data format (MessagePack) has to serialize EventTime as a different type from the old timestamp, but this may break old nodes.
    • Introduced the time_as_integer option in the forward output plugin to force the timestamp to be serialized as an Integer (same as the current timestamp).
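The effect of such an option can be sketched in a few lines. The names `SketchTime` and `prepare_entries` are hypothetical, and the real plugin works on MessagePack data rather than Ruby arrays; the point is only that the conversion has to visit every entry.

```ruby
# Hypothetical stand-in for a nanosecond-resolution event time.
SketchTime = Struct.new(:sec, :nsec) do
  def to_i
    sec
  end
end

# entries: array of [time, record] pairs, as they would be forwarded.
# With time_as_integer set, every time is downgraded to plain seconds
# so that an old receiving node sees the format it already understands.
def prepare_entries(entries, time_as_integer:)
  return entries unless time_as_integer
  entries.map { |time, record| [time.to_i, record] }
end

entries = [
  [SketchTime.new(1443509743, 123_456_789), { "message" => "a" }],
  [SketchTime.new(1443509744, 5), { "message" => "b" }]
]
for_old_node = prepare_entries(entries, time_as_integer: true)
for_new_node = prepare_entries(entries, time_as_integer: false)
```

When the option is off, the entries pass through untouched, which is the cheap path.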
  17. Difficulties - EventTime • Backward compatibility • Performance 

  18. Performance concerns 1
    • When time_as_integer is true and the forward output plugin receives PackedForward, the data is deserialized and re-serialized to convert EventTime to Integer.
    • By keeping source nodes on the old version, or keeping time_as_integer true on them, we can set time_as_integer to false on the relay node.
  19. Performance concerns 2
    • If the timestamp format of logs includes a sub-second part, it is difficult to cache results of Time.strptime (Time.strptime is a heavy operation).
    • Introduced the strptime gem, which can precompile a format string.
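Why sub-second timestamps defeat caching can be shown with a small experiment. The class below is a hypothetical one-entry cache written for this illustration, not Fluentd's parser: it reuses the previous result when the timestamp text repeats and counts how often it has to fall through to `Time.strptime`.

```ruby
require "time"

# Hypothetical one-entry cache in front of Time.strptime.
class CachingTimeParser
  attr_reader :misses

  def initialize(format)
    @format = format
    @last_text = nil
    @last_time = nil
    @misses = 0
  end

  def parse(text)
    return @last_time if text == @last_text   # cache hit: no parsing
    @misses += 1
    @last_text = text
    @last_time = Time.strptime(text, @format)
  end
end

# 1000 log lines written within the same second.
coarse = CachingTimeParser.new("%Y-%m-%d %H:%M:%S %z")
1000.times { coarse.parse("2015-09-29 15:55:43 +0900") }

# The same 1000 lines, but each carries its own millisecond.
fine = CachingTimeParser.new("%Y-%m-%d %H:%M:%S.%N %z")
1000.times { |i| fine.parse(format("2015-09-29 15:55:43.%03d +0900", i)) }
```

With second resolution the cache absorbs almost every call; with a sub-second part every line is a miss, so the cost of the parse itself becomes the bottleneck. Precompiling the format, as the strptime gem does, targets that miss path; the gem is not used here so the sketch stays on the standard library.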
  20. Benchmark Results - strptime
    • Measured in_tail parsing performance
    • Used dummer/flowcounter_simple
    • Machine spec
    • CPU: Core i5 5250U
    • Memory: 8GB
    • Disk: SSD 256GB
    • OS: OS X Yosemite
    [Chart: parsing throughput in lines/sec for "cache", "no cache", and "no cache with strptime"; the measured values are not recoverable]
  21. Elasticsearch DEMO

  22. None
  23. Summary
    • The current timestamp has only second resolution.
    • Some storages and platforms want nanosecond (or millisecond) resolution.
    • I introduced EventTime, a new timestamp with nanosecond resolution.
  24. Future work
    • Documentation for users and plugin developers.
    • “This branch will be merged in master branch about next month.” by Fluentd committers
  25. EventTime will be released as a part of v0.14!

  26. What I did
    • Nanosecond timestamp in Fluentd
    • Perfect Monitor
  27. Perfect Monitor Prototype of monitoring service for TD customers

  28. What for? - Perfect Monitor
    • Visualize the computing resources customers use, in near real time
    • e.g. number of records/bytes over the event collector, number of running jobs, number of CPU cores allocated to a particular job
    • Reduce support cost by helping customers understand how they are using our computing resources
    • Let TD staff know what data processing our customers are doing, and how
  29. What is? - Perfect Monitor
    • Dashboard for customers to see how they are using our computing resources
    • Collector for various metrics sent from workers
    • Storage for metrics (InfluxDB/TD)
    • API server to handle requests from the dashboard and query the backend storage
    • Dashboard application
  30. Use Cases 1 - number of records -
    • Most system administrators don't know how many logs they are generating.
    • The number of logs is affected by many events, such as the release of new services, new versions of apps, and so on.
  31. 

  32. Use Cases 2 - number of running tasks -
    • Customers don’t know how many CPU cores each of their jobs is using right now.
  33. Use Cases 3 - support/sales team side -
    • A support/sales engineer can find the cause of problems more easily.
  34. System Architecture

  35. Architecture
    [Diagram labels: Monitoring Server, Hadoop Worker, Presto Worker, API Server (shown twice), TD (Presto), InfluxDB, Redis, Dashboard, User; arrow labels: send metrics, query, cache, query (only old data)]
  36. How to collect metrics
    [Diagram labels: Monitoring Server, Hadoop Worker, Presto Worker, API Server, TD (Presto), InfluxDB; arrow label: send metrics]
    • Workers send metrics to the Monitoring Server.
    • The Monitoring Server filters and aggregates (if needed) the metrics, then stores them in InfluxDB and TD.
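The filter, aggregate, and store steps can be sketched as below. The metric shape (`:name`, `:time`, `:value`), the one-minute bucket, and the two in-memory arrays standing in for InfluxDB and TD are all assumptions for illustration; the slides do not describe the real data model.

```ruby
# Hypothetical sketch: keep only known metrics, sum them per time bucket.
def aggregate(metrics, allowed_names:, bucket_sec: 60)
  kept = metrics.select { |m| allowed_names.include?(m[:name]) }
  grouped = kept.group_by { |m| [m[:name], m[:time] - (m[:time] % bucket_sec)] }
  grouped.map do |(name, bucket), points|
    { name: name, time: bucket, value: points.sum { |pt| pt[:value] } }
  end
end

# Write the same points to every backend (here: plain arrays).
def store(points, sinks)
  sinks.each { |sink| sink.concat(points) }
end

incoming = [
  { name: "records", time: 120, value: 5 },
  { name: "records", time: 150, value: 7 },
  { name: "records", time: 185, value: 1 },
  { name: "debug",   time: 130, value: 9 }   # filtered out below
]

influx_stub = []
td_stub = []
points = aggregate(incoming, allowed_names: ["records"])
store(points, [influx_stub, td_stub])
```

Writing identical points to both stores is what later allows either one to answer a query.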
  37. How to show metrics
    [Diagram labels: TD (Presto), InfluxDB, API Server, Redis, Dashboard; arrow labels: query, query (only old data), cache]
    • The Dashboard asks the API Server for metrics based on its configuration.
    • The API Server queries InfluxDB or TD (Presto) depending on the time window.
    • The Dashboard renders graphs from the query results.
  38. The Point of Architecture 1
    [Diagram: the full architecture diagram from slide 35, repeated]
  39. The Point of Architecture 1
    • Perfect Monitor stores the same data in InfluxDB and TD (Presto)
    • InfluxDB holds only recent data (e.g. 30 days)
    • TD (Presto) holds all data
    • Why did I choose InfluxDB?
    • It is fast enough.
    • We can write time-series data as a hash and query it flexibly on the fly, which makes trial and error easy.
    [Diagram labels: TD (Presto), InfluxDB; arrow labels: store, query]
  40. The Point of Architecture 2
    [Diagram: the full architecture diagram from slide 35, repeated]
  41. The Point of Architecture 2
    • The API Server interprets queries from the dashboard and queries InfluxDB or TD (Presto)
    • The API Server queries TD (Presto) only when old data is needed; otherwise it queries InfluxDB
    • It can cache query results in Redis.
    [Diagram labels: API Server, Redis; arrow labels: query, cache, query (only old data)]
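The routing rule on this slide can be sketched as follows. Everything named here is hypothetical: `SketchApiServer`, the 30-day threshold (borrowed from the "e.g. 30 days" retention example), the cache key, and the use of a plain Hash where the real system uses Redis. The backends are passed in as callables so the sketch runs without any database.

```ruby
# Assumed boundary between "recent" and "old" data, in seconds.
RECENT_WINDOW_SEC = 30 * 24 * 60 * 60

# Hypothetical sketch of cache-then-route query handling.
class SketchApiServer
  attr_reader :backend_calls

  def initialize(influx:, td:, cache: {})
    @influx = influx
    @td = td
    @cache = cache          # stand-in for Redis
    @backend_calls = []     # records which backend served each miss
  end

  def query(metric, from:, now:)
    key = [metric, from]
    return @cache[key] if @cache.key?(key)

    # Old data lives only in TD; recent data is served by InfluxDB.
    backend = (now - from) > RECENT_WINDOW_SEC ? :td : :influx
    @backend_calls << backend
    source = backend == :td ? @td : @influx
    @cache[key] = source.call(metric, from)
  end
end

server = SketchApiServer.new(
  influx: ->(metric, from) { "influx:#{metric}" },
  td:     ->(metric, from) { "td:#{metric}" }
)
now = 1443509743
recent = server.query("records", from: now - 3600, now: now)
old    = server.query("records", from: now - 90 * 24 * 3600, now: now)
again  = server.query("records", from: now - 3600, now: now)   # served from cache
```

A real cache would also need an expiry policy for windows that are still receiving data; that is left out here.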
  42. Failure case 1
    • InfluxDB tends to be overloaded
    • When InfluxDB is down, just launch a new InfluxDB node and load data from TD (Presto).
    [Diagram labels: TD (Presto), new InfluxDB; arrow label: restore]
  43. Failure case 2
    • When InfluxDB is down or its response is delayed, the API Server queries TD (Presto).
    [Diagram labels: TD (Presto), InfluxDB, API Server; arrow label: query]
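One way to express this fallback is a timeout around the primary query with a rescue that switches to the secondary. The function name, the one-second default, and the lambdas standing in for the two backends are assumptions of this sketch; the slides do not say how the real API Server detects a slow or failed InfluxDB.

```ruby
require "timeout"

# Hypothetical sketch: try the primary store, fall back on error or delay.
def query_with_fallback(primary, fallback, timeout_sec: 1.0)
  Timeout.timeout(timeout_sec) { primary.call }
rescue Timeout::Error, StandardError
  fallback.call
end

td = -> { :from_td }

healthy = query_with_fallback(-> { :from_influx }, td)
down    = query_with_fallback(-> { raise IOError, "connection refused" }, td)
slow    = query_with_fallback(-> { sleep 0.5; :from_influx }, td, timeout_sec: 0.05)
```

Because both stores hold the same data, the caller gets a usable answer in all three cases; only the latency differs.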
  44. For operation
    • The API Server returns results even when InfluxDB is degraded, because it then uses TD (Presto) automatically.
    • To add new metrics, you just need to add configuration to the monitoring server and the dashboard application.
    • Of course, workers need to be able to send the metrics.
  45. Summary
    • Showing customers more metrics about computing resources benefits both customers and TD.
    • Perfect Monitor makes it easy.
  46. Impression of Summer Intern
    • I thought seriously about “data” every day. This was a first for me, and it was exciting.
    • I learned a lot by working on both OSS and TD internal tasks.
  47. Thank you!