Slide 1

Summer Intern 2015 Final Report
September 30, 2015
Yuki Ito

Slide 2

Who am I? • Yuki Ito • Master’s student, Information Science and Technology, The University of Tokyo • msgpack-erlang and fluent-logger-erlang maintainer

Slide 3

TreasureData Summer Intern • 2015/08/03 ~ 2015/09/30 • @ TreasureData Tokyo office

Slide 4

What I did

Slide 5

What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

Slide 6

What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

Slide 7

Nanosecond timestamp in Fluentd

Slide 8

Current Fluentd timestamp • Unix Timestamp • Second resolution • 2015-09-29 15:55:43 +0900 => 1443509743
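
As a quick illustration, the conversion above can be reproduced with the Ruby standard library; this is plain Ruby, not Fluentd code.

    require "time"

    # Parse the wall-clock time from the slide and convert it to a Unix timestamp.
    t = Time.parse("2015-09-29 15:55:43 +0900")
    t.to_i  # => 1443509743 (second resolution only; any sub-second part is dropped)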

Slide 9

Problem • Some storages and platforms (Elasticsearch, GCP…) expect sub-second (millisecond or nanosecond) timestamps. • But the current Fluentd timestamp cannot hold sub-seconds. • Supporting nanosecond resolution covers all of these requirements, since nanoseconds are generally the finest resolution needed (at the cost of a little more overhead than millisecond/microsecond).

Slide 10

New Fluentd Timestamp with nanosecond resolution

Slide 11

EventTime

Slide 12

Implementation - EventTime • Two attributes • @sec: second integer (same as the current Timestamp) • @nsec: nanosecond integer • In most cases it behaves just like the current Timestamp • It is serialized as a MessagePack Ext type • Fluent::Engine.now and the built-in parsers return EventTime
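
As a rough illustration of these two attributes, here is a minimal EventTime-like value object; this is a simplified sketch, not the actual Fluentd v0.14 class.

    # Simplified sketch of an EventTime-like value object (not the real Fluentd class).
    class EventTime
      attr_reader :sec, :nsec

      def initialize(sec, nsec = 0)
        @sec  = sec   # seconds since the epoch, same as the current timestamp
        @nsec = nsec  # extra nanoseconds (0..999_999_999)
      end

      # Behaves like the current integer timestamp in most contexts.
      def to_i
        @sec
      end

      def to_f
        @sec + @nsec / 1_000_000_000.0
      end

      def self.now
        t = Time.now
        new(t.to_i, t.nsec)
      end
    end

    EventTime.now.to_f  # e.g. 1443509743.123456789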

Slide 13

Difficulties - EventTime • Backward compatibility • Performance

Slide 14

Difficulties - EventTime • Backward compatibility • Performance

Slide 15

Backward compatibility 1 • There are many plugins. They must keep working as they are. • EventTime behaves like the current Timestamp in most cases. • If a plugin does not want to lose the sub-second part, it may need additional code to handle EventTime. • I checked many, many plugins.

Slide 16

Backward compatibility 2 • To keep sub-second resolution across nodes (forward plugins), the external data format (MessagePack) has to serialize EventTime as a different type from the old timestamp, but this may break old nodes. • Introduced a time_as_integer option in the forward output plugin to force the timestamp to be serialized as an Integer (same as the current timestamp).
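
A sketch of how the option might be set on a sending node; the exact directives depend on the Fluentd version, so treat this configuration as illustrative rather than definitive.

    <match **>
      @type forward
      # Serialize time as a plain Integer so that older receiving nodes,
      # which do not understand the EventTime Ext type, keep working.
      time_as_integer true
      <server>
        host relay.example.com
        port 24224
      </server>
    </match>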

Slide 17

Difficulties - EventTime • Backward compatibility • Performance

Slide 18

Performance concerns 1 • When time_as_integer is true and the forward output plugin receives a PackedForward chunk, the chunk has to be deserialized and re-serialized to convert EventTime into an Integer. • By keeping the source nodes on the old version, or keeping time_as_integer true on them, we can set time_as_integer to false on the relay node.

Slide 19

Performance concerns 2 • If the timestamp format of the logs includes a sub-second part, the results of Time.strptime are hard to cache, and Time.strptime itself is a heavy operation. • Introduced the strptime gem, which can precompile a format string.
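
A small example of the precompiled-parser idea with the strptime gem; the API shown (Strptime.new and #exec) is my understanding of the gem and worth verifying against its README.

    require "strptime"

    # Compile the format once, then reuse the parser for every log line.
    parser = Strptime.new("%Y-%m-%d %H:%M:%S.%N %z")
    t = parser.exec("2015-09-29 15:55:43.123456789 +0900")
    t.nsec  # => 123456789, so the sub-second part survives parsing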

Slide 20

Benchmark Results - strptime • Measured in_tail parsing performance • Used dummer/flowcounter_simple • Machine spec • CPU: Core i5 5250U • Memory: 8GB • Disk: 256GB SSD • OS: OS X Yosemite • (Chart: in_tail throughput in lines/sec, comparing cache, no cache, and no cache with strptime)

Slide 21

Elasticsearch DEMO

Slide 22

No content

Slide 23

Summary • The current Timestamp has only second resolution. • Some storages and platforms want nanosecond (or millisecond) resolution. • I introduced a new Timestamp called EventTime, which can hold nanosecond resolution.

Slide 24

Future work • Documentation for users and plugin developers. • “This branch will be merged into the master branch in about a month.” (Fluentd committers)

Slide 25

EventTime will be released as a part of v0.14!

Slide 26

What I did • Nanosecond timestamp in Fluentd • Perfect Monitor

Slide 27

Perfect Monitor: a prototype of a monitoring service for TD customers

Slide 28

What for? - Perfect Monitor • Visualize the computing resources customers use, in near real time • e.g. number of records/bytes through the event collector, number of running jobs, number of CPU cores allocated to a particular job • Reduce support cost by helping customers understand how they are using our computing resources • Let TD staff know what data processing our customers are doing and how

Slide 29

What is it? - Perfect Monitor • A dashboard for customers to see how they are using our computing resources • A collector for the various metrics sent from workers • Storage for the metrics (InfluxDB/TD) • An API server to handle requests from the dashboard and query the backend storage • A dashboard application

Slide 30

Use Cases 1 - number of records - • Most system administrators don't know how many logs they are generating. • The number of logs is affected by many events, such as releases of new services, new versions of apps, and so on.

Slide 31

Slide 32

Use Cases 2 - number of running tasks - • Customers don't know how many CPU cores are currently being used by each of their jobs.

Slide 33

Use Cases 3 - support/sales team side - • A support/sales engineer can find the cause of problems more easily.

Slide 34

System Architecture

Slide 35

Architecture
(Diagram: Hadoop and Presto workers send metrics to the Monitoring Server; metrics are stored in TD (Presto) and InfluxDB; an API Server with a Redis cache serves the User's Dashboard, querying TD (Presto) only for old data)

Slide 36

How to collect metrics • Workers send metrics to the Monitoring Server. • The Monitoring Server filters and (if needed) aggregates the metrics, then stores them in InfluxDB and TD.
(Diagram: Hadoop/Presto workers send metrics to the Monitoring Server, which writes to TD (Presto) and InfluxDB)
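
A rough sketch of that filter-aggregate-store flow; aggregate and store_to_td are placeholder helpers (not real internal code), and influxdb is assumed to be an influxdb-ruby client.

    require "influxdb"  # influxdb-ruby client (assumed)

    # Placeholder helpers standing in for internal Monitoring Server code.
    def aggregate(points, window:)
      points  # the real server would aggregate per time window here
    end

    def store_to_td(point)
      # the real server would import the point into TD here
    end

    def handle_metrics(influxdb, points)
      points = points.reject { |p| p[:value].nil? }  # filter malformed points
      points = aggregate(points, window: 60)         # aggregate if needed
      points.each do |p|
        # Write the same point to both stores: InfluxDB for recent data, TD for history.
        influxdb.write_point(p[:name], values: { value: p[:value] }, tags: p[:tags])
        store_to_td(p)
      end
    end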

Slide 37

How to show metrics • The Dashboard asks the API Server for metrics based on its configuration. • The API Server queries InfluxDB or TD (Presto) depending on the time window. • The Dashboard renders graphs from the query results.
(Diagram: Dashboard queries the API Server with its Redis cache; the API Server queries InfluxDB, or TD (Presto) for old data only)

Slide 38

The Point of Architecture 1
(Same architecture diagram as Slide 35)

Slide 39

The Point of Architecture 1 • Perfect Monitor stores the same data in both InfluxDB and TD (Presto) • InfluxDB holds only recent data (e.g. 30 days) • TD (Presto) holds all data • Why did I choose InfluxDB? • It is fast enough. • We can write time-series data as a hash and query it on the fly flexibly, which makes trial and error easy.
(Diagram: the same points are stored in both InfluxDB and TD (Presto), and both can be queried)
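
To illustrate the "write a hash, query it on the fly" point, a small sketch using the influxdb-ruby gem; the database, series, and field names are made up for the example.

    require "influxdb"

    client = InfluxDB::Client.new("perfect_monitor")  # hypothetical database name

    # Time-series points are just hashes: no schema migration needed.
    client.write_point("event_collector_records",
                       values: { records: 1200 },
                       tags:   { account: "example" })

    # Ad-hoc aggregation query, composed on the fly.
    client.query "SELECT sum(records) FROM event_collector_records " \
                 "WHERE time > now() - 1h GROUP BY time(1m)"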

Slide 40

The Point of Architecture 2
(Same architecture diagram as Slide 35)

Slide 41

The Point of Architecture 2 • The API server interprets queries from the dashboard and queries InfluxDB or TD (Presto) • Only when old data is needed does the API Server query TD (Presto); otherwise it queries InfluxDB • It can cache query results in Redis.
(Diagram: API Server with a Redis query cache; TD (Presto) is queried only for old data)
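
A hedged sketch of that routing rule; the 30-day threshold, the cache key, and the redis/influxdb/presto clients are assumptions for illustration, not the real API server code.

    require "json"
    require "time"

    RECENT_WINDOW = 30 * 24 * 60 * 60  # assume InfluxDB keeps roughly 30 days of data

    # Route a metrics query to InfluxDB or TD (Presto), with a Redis cache in front.
    # `redis`, `influxdb`, and `presto` stand in for the real client objects.
    def fetch_metrics(redis, influxdb, presto, metric, from, to)
      key = "metrics:#{metric}:#{from.to_i}:#{to.to_i}"
      if (cached = redis.get(key))
        return JSON.parse(cached)  # serve repeated dashboard queries from the cache
      end

      result =
        if from < Time.now - RECENT_WINDOW
          presto.query_old_data(metric, from, to)  # placeholder call for the TD (Presto) path
        else
          influxdb.query("SELECT * FROM #{metric} WHERE time > '#{from.utc.iso8601}'")
        end

      redis.setex(key, 300, result.to_json)  # cache for a few minutes
      result
    end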

Slide 42

Failure case 1 • InfluxDB tends to be overloaded. • When InfluxDB is down, just launch a new InfluxDB node and load the data back from TD (Presto).
(Diagram: the new InfluxDB is restored from TD (Presto))

Slide 43

Failure case 2 • When InfluxDB is down or its responses are delayed, the API Server queries TD (Presto) instead.
(Diagram: API Server falls back to TD (Presto) when InfluxDB is unavailable)

Slide 44

For operation • The API server returns results even during InfluxDB degradation, because it then falls back to TD (Presto) automatically. • To add a new metric, you only need to add configuration to the monitoring server and the dashboard application. • Of course, the workers need to be able to send that metric.

Slide 45

Summary • Showing more metrics about computing resource usage helps both customers and TD. • Perfect Monitor makes this easy.

Slide 46

Impressions of the Summer Internship • I thought seriously about “data” every day, for the first time. It was exciting. • I learned a lot by working on both OSS and TD-internal tasks.

Slide 47

Thank you!