Treasure Data Summer Intern 2015 Final Report

Summer Intern 2015 Final Report
September 30, 2015
Yuki Ito

Who am I?
• Yuki Ito
• Master’s student, Information Science and Technology, The University of Tokyo
• msgpack-erlang and fluent-logger-erlang maintainer

TreasureData Summer Intern
• 2015/08/03 ~ 2015/09/30
• @ TreasureData Tokyo office

What I did

What I did
• Nanosecond timestamp in Fluentd
• Perfect Monitor

Nanosecond timestamp in Fluentd

Current Fluentd timestamp
• Unix timestamp
• Second resolution
• 2015-09-29 15:55:43 +0900 => 1443509743
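The conversion on this slide can be reproduced in a few lines. The following is a minimal sketch in Python (chosen only for illustration; Fluentd itself is written in Ruby). It shows that a second-resolution Unix timestamp cannot distinguish two events inside the same second. The 123456 microsecond value is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone

# The wall-clock time from the slide, in JST (+09:00).
jst = timezone(timedelta(hours=9))
t = datetime(2015, 9, 29, 15, 55, 43, tzinfo=jst)

# Second-resolution Unix timestamp, as on the slide.
unix_seconds = int(t.timestamp())
print(unix_seconds)  # 1443509743

# A time with a sub-second part maps to the same integer: the fraction is lost.
# The 123456 microseconds are an arbitrary illustrative value.
t_frac = t.replace(microsecond=123456)
print(int(t_frac.timestamp()) == unix_seconds)  # True
```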

Problem
• Some storages and platforms (Elasticsearch, GCP…) expect or want sub-second (millisecond, nanosecond) timestamps.
• But the current Fluentd timestamp cannot hold sub-second precision.
• If we support nanosecond resolution, we can cover all of these requirements, because nanoseconds are generally the finest resolution in use (at the cost of a little more overhead than millisecond/microsecond resolution).

New Fluentd Timestamp with nanosecond resolution

EventTime

Implementation - EventTime
• Two attributes
  • @sec: integer seconds (same as the current timestamp)
  • @nsec: integer nanoseconds
• In most cases, it behaves just like the current timestamp
• It is serialized as a MessagePack Ext type
• Fluent::Engine.now and the built-in parsers return EventTime
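To make the idea concrete, here is a rough Python sketch of a value that carries seconds and nanoseconds but still converts to the old integer. It is not the Fluentd (Ruby) implementation: the class name `EventTimeSketch` is hypothetical, and the 8-byte payload layout (two big-endian unsigned 32-bit integers) is an assumption made for illustration, not something stated on the slides.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTimeSketch:
    """Hypothetical stand-in for EventTime: seconds plus nanoseconds."""
    sec: int
    nsec: int

    def __int__(self) -> int:
        # Used as an integer, it yields the old second-resolution value.
        return self.sec

    def to_ext_payload(self) -> bytes:
        # Assumed layout: two big-endian unsigned 32-bit integers (sec, nsec).
        return struct.pack(">II", self.sec, self.nsec)

    @classmethod
    def from_ext_payload(cls, payload: bytes) -> "EventTimeSketch":
        sec, nsec = struct.unpack(">II", payload)
        return cls(sec, nsec)


t = EventTimeSketch(1443509743, 123456789)
payload = t.to_ext_payload()
print(int(t))        # 1443509743
print(len(payload))  # 8
print(EventTimeSketch.from_ext_payload(payload) == t)  # True
```

Converting with `int(t)` drops the nanosecond part, which is why code that treats the value as a plain integer keeps working but loses sub-second precision.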

Difficulties - EventTime
• Backward compatibility
• Performance

Backward compatibility 1
• There are many plugins. They must keep working fine as they are.
• EventTime behaves like the current timestamp in most cases.
• If a plugin doesn’t want to lose sub-second precision, it may need additional code to handle EventTime.
• I checked a large number of existing plugins.

Backward compatibility 2
• To keep sub-second resolution across nodes (forward plugins), the external data format (MessagePack) has to serialize EventTime as a different type from the old timestamp, but this may break old nodes.
• Introduced a time_as_integer option in the forward output plugin to force the timestamp to be serialized as an Integer (same as the current timestamp).

Difficulties - EventTime
• Backward compatibility
• Performance

Performance concerns 1
• When time_as_integer is true and the forward output plugin receives PackedForward, the data is deserialized and re-serialized to convert EventTime to Integer.
• By keeping source nodes on the old version, or keeping time_as_integer true on them, we can set time_as_integer to false on relay nodes.

Performance concerns 2
• If the timestamp format of logs includes a sub-second part, it is difficult to cache the results of Time.strptime (Time.strptime is a heavy task).
• Introduced the strptime gem, which can precompile a format string.
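The caching problem can be illustrated with a small Python sketch. It does not show the strptime gem; it only shows why a cache keyed on the timestamp string stops helping once each line carries a distinct sub-second part. The log lines and the `parse_cached` and `hit_rate` helpers are hypothetical.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_cached(text: str, fmt: str) -> datetime:
    # Parsing is relatively expensive; identical inputs are parsed only once.
    return datetime.strptime(text, fmt)


def hit_rate(lines, fmt):
    """Parse every line and return the fraction served from the cache."""
    parse_cached.cache_clear()
    for line in lines:
        parse_cached(line, fmt)
    info = parse_cached.cache_info()
    return info.hits / (info.hits + info.misses)


# 1,000 hypothetical log lines emitted within the same second.
second_res = ["2015-09-29 15:55:43"] * 1000
# The same lines, each with a distinct millisecond part.
sub_second = [f"2015-09-29 15:55:43.{ms:03d}" for ms in range(1000)]

print(hit_rate(second_res, "%Y-%m-%d %H:%M:%S"))     # 0.999
print(hit_rate(sub_second, "%Y-%m-%d %H:%M:%S.%f"))  # 0.0
```

With second resolution, almost every lookup is a cache hit. With sub-second timestamps every string is new, so the parser itself has to become cheaper, which is the motivation for precompiling the format string.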

Benchmark Results - strptime
• Measured in_tail parsing performance
• Used dummer/flowcounter_simple
• Machine spec
  • CPU: Core i5 5250U
  • Memory: 8GB
  • Disk: SSD 256GB
  • OS: OS X Yosemite
[Chart: in_tail throughput (lines/sec) for "cache", "no cache", and "no cache with strptime"]

Elasticsearch DEMO

Summary
• The current timestamp has only second resolution.
• Some storages and platforms want nanosecond (or millisecond) resolution.
• I introduced EventTime, a new timestamp that can hold nanosecond resolution.

Future work
• Documentation for users and plugin developers.
• “This branch will be merged in master branch about next month.” by Fluentd committers

EventTime will be released as a part of v0.14!

What I did
• Nanosecond timestamp in Fluentd
• Perfect Monitor

Perfect Monitor
Prototype of a monitoring service for TD customers

What for? - Perfect Monitor
• Visualize the computing resources customers use, in near real time
  • e.g. number of records/bytes over the event collector, number of running jobs, number of CPU cores allocated to a particular job
• Reduce support cost by helping customers understand how they are using our computing resources
• Let TD staff know what data processing our customers are doing, and how

What is? - Perfect Monitor
• Dashboard for customers to know how they are using our computing resources
• Collector for various metrics sent from workers
• Storages to store metrics (InfluxDB/TD)
• API server to handle requests from the dashboard and query the backend storage
• Dashboard application

Use Cases 1 - number of records -
• Most system administrators don't know how many logs they are generating.
• The number of logs is affected by many events, such as releases of new services, new versions of apps, and so on.

Use Cases 2 - number of running tasks -
• Customers don’t know how many CPU cores each of their jobs is using right now.

Use Cases 3 - support/sales team side -
• A support/sales engineer can find the cause of problems more easily.

System Architecture

Architecture
[Diagram: Hadoop Worker, Presto Worker, Monitoring Server, InfluxDB, TD (Presto), API Server, Redis, Dashboard, User; arrow labels: send metrics, query, query (only old data), cache]

How to collect metrics
• Workers send metrics to the Monitoring Server.
• The Monitoring Server filters and aggregates (if needed) the metrics, then stores them in InfluxDB and TD.
[Diagram excerpt: InfluxDB; arrow label: send metrics]
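The filter, aggregate, and store steps might look roughly like the Python sketch below. Everything concrete in it is an assumption made for illustration: the metric shape, the per-minute summing, the `allowed_names` filter, and the in-memory lists standing in for InfluxDB and TD.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two stores.
influxdb_store = []
td_store = []


def process_batch(metrics, allowed_names):
    """Filter, aggregate per (name, minute), then write to both stores."""
    totals = defaultdict(int)
    for m in metrics:
        if m["name"] not in allowed_names:
            continue  # filter: drop metrics that are not configured
        minute = m["time"] - m["time"] % 60  # round down to the minute
        totals[(m["name"], minute)] += m["value"]
    points = [
        {"name": name, "time": minute, "value": value}
        for (name, minute), value in sorted(totals.items())
    ]
    # The same points go to both the recent store and the full archive.
    influxdb_store.extend(points)
    td_store.extend(points)
    return points


batch = [
    {"name": "records", "time": 1443509743, "value": 10},
    {"name": "records", "time": 1443509750, "value": 5},
    {"name": "debug", "time": 1443509751, "value": 1},
]
print(process_batch(batch, allowed_names={"records"}))
# [{'name': 'records', 'time': 1443509700, 'value': 15}]
```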

How to show metrics
• The Dashboard asks the API Server for metrics based on its configuration.
• The API Server queries InfluxDB or TD (Presto) based on the time window.
• The Dashboard renders graphs from the query results.
[Diagram excerpt: TD (Presto), InfluxDB, API Server, Redis, Dashboard; arrow labels: query, query (only old data), cache]

The Point of Architecture 1
[Diagram: InfluxDB, API Server, Redis, Dashboard, User; arrow labels: send metrics, query, cache, query (only old data)]

The Point of Architecture 1
• Perfect Monitor stores the same data in InfluxDB and TD (Presto)
  • InfluxDB holds only recent data (e.g. 30 days)
  • TD (Presto) holds all data
• Why did I choose InfluxDB?
  • It is fast enough.
  • We can write time-series data as a hash and query it flexibly on the fly, which makes trial and error easy.
[Diagram excerpt: TD (Presto), InfluxDB; arrow labels: store, query]

The Point of Architecture 2
[Diagram: InfluxDB, API Server, Redis, Dashboard, User; arrow labels: send metrics, query, cache, query (only old data)]

The Point of Architecture 2
• The API Server interprets queries from the Dashboard and queries InfluxDB or TD (Presto)
• The API Server queries TD (Presto) only when old data is needed; otherwise it queries InfluxDB
• It can cache query results in Redis.
[Diagram excerpt: API Server, Redis; arrow labels: query, cache, query (only old data)]
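The routing and caching described on this slide could be sketched as follows in Python. This is an illustration under assumptions, not the prototype's code: the 30-day cutoff reuses the retention example from the slides, a plain dict stands in for Redis, and the backend functions are fakes that only record which store was asked.

```python
RECENT_WINDOW_SEC = 30 * 24 * 3600  # assumed retention of the recent store

cache = {}  # hypothetical stand-in for Redis


def choose_backend(start, now):
    """Pick the store by time window: old data is only available in TD (Presto)."""
    return "influxdb" if now - start <= RECENT_WINDOW_SEC else "td_presto"


def run_query(metric, start, end, now, backends):
    key = (metric, start, end)
    if key in cache:
        return cache[key]  # cache hit: no backend is queried
    result = backends[choose_backend(start, now)](metric, start, end)
    cache[key] = result
    return result


calls = []


def fake_backend(name):
    # Fake store that records each call and returns a labeled result.
    def query(metric, start, end):
        calls.append(name)
        return f"{metric} from {name}"
    return query


backends = {"influxdb": fake_backend("influxdb"), "td_presto": fake_backend("td_presto")}
now = 1443509743
day = 24 * 3600
print(run_query("records", now - 7 * day, now, now, backends))   # records from influxdb
print(run_query("records", now - 90 * day, now, now, backends))  # records from td_presto
print(run_query("records", now - 7 * day, now, now, backends))   # records from influxdb (cached)
print(calls)  # ['influxdb', 'td_presto']
```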

Failure case 1
• InfluxDB tends to be overloaded
• When InfluxDB is down, just launch a new InfluxDB node and load the data from TD (Presto).
[Diagram excerpt: TD (Presto), new InfluxDB; arrow label: restore]

Failure case 2
• When InfluxDB is down or its response is delayed, the API Server queries TD (Presto).
[Diagram excerpt: TD (Presto), InfluxDB, API Server; arrow label: query]
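A fallback of this kind can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the client for the primary store raises `ConnectionError` when the store is down and `TimeoutError` when the response is too slow.

```python
def query_with_fallback(primary, fallback, *args):
    """Try the primary store; on error or timeout, use the fallback store."""
    try:
        return primary(*args)
    except (ConnectionError, TimeoutError):
        return fallback(*args)


def influxdb_down(metric):
    raise ConnectionError("InfluxDB is down")  # simulated outage


def td_presto(metric):
    return f"{metric} from td_presto"


print(query_with_fallback(influxdb_down, td_presto, "records"))  # records from td_presto
```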

For operation
• The API Server returns results even while InfluxDB is degraded.
  • This is because the API Server then uses TD (Presto) automatically.
• To add new metrics, you just need to add configuration to the Monitoring Server and the Dashboard application.
  • Of course, workers need to be able to send the metrics.

Summary
• Showing customers more metrics about computing resources benefits both customers and TD.
• Perfect Monitor makes it easy.

Impression of Summer Intern
• I thought seriously about “data” every day. This was a first for me, and it was exciting.
• I learned a lot by working on both OSS and TD-internal tasks.

Thank you!
