(millisecond, nanosecond) as timestamp. • But the current Fluentd timestamp cannot hold sub-second precision. • If we support nanosecond resolution, it can cover all requirements, because nanosecond is generally the minimum resolution. (but it adds a little more overhead than millisecond/microsecond)
(same as the current Timestamp) • @nsec: nanosecond integer • In most cases, it behaves just like the current Timestamp • It is serialized as a MessagePack Ext type • Fluent::Engine.now and built-in parsers return EventTime
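The @sec/@nsec pair described above can be sketched as a small Ruby class. This is an illustrative model, not Fluentd's actual `Fluent::EventTime` implementation; the point is that integer-style code keeps working while the nanosecond part is carried alongside.

```ruby
# Minimal sketch of an EventTime-like value: behaves like the old
# integer timestamp, but also carries a nanosecond part.
class EventTime
  attr_reader :sec, :nsec

  def initialize(sec, nsec = 0)
    @sec  = sec   # same as the current integer timestamp
    @nsec = nsec  # sub-second part, 0..999_999_999
  end

  # In most cases it behaves just like the current timestamp:
  def to_i
    @sec
  end

  def to_f
    @sec + @nsec / 1_000_000_000.0
  end

  def ==(other)
    other.respond_to?(:to_i) && to_i == other.to_i
  end
end
```

Code that only ever compared or formatted integer timestamps can treat an `EventTime` the same way via `to_i`, which is why most existing plugins keep working unchanged.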
works fine just as it is. • EventTime behaves like the current Timestamp in most cases. • If a plugin doesn't want to lose sub-second precision, it may need additional code to handle EventTime. • I have checked many, many plugins.
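The "additional code" a plugin may need is typically a branch like the following hedged sketch: calling `to_i` on the time silently drops the sub-second part, so a plugin that wants to keep it has to check for a nanosecond-carrying time first. `record_time` is a hypothetical helper name, not a Fluentd API.

```ruby
require 'time'

# Hypothetical output-plugin helper that formats the event time.
# Branch on whether the time object carries nanoseconds, so that
# sub-second precision is not lost for EventTime values.
def record_time(time)
  if time.respond_to?(:nsec)
    # EventTime (or Time): keep nanosecond precision in the output
    Time.at(time.to_i, time.nsec, :nanosecond).utc.iso8601(9)
  else
    # plain Integer timestamp, e.g. from an old node
    Time.at(time).utc.iso8601
  end
end
```

A plugin that does not care about sub-second precision can keep calling `time.to_i` and needs no change at all.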
(forward plugins), external data formats (MessagePack) have to serialize EventTime as a different type from the old timestamp, but this may break old nodes. • Introduced the time_as_integer option in the forward output plugin to force timestamps to be serialized as Integer (same as the current timestamp).
forward plugin receives a PackedForward, it is deserialized and re-serialized to convert EventTime to Integer. • By keeping source nodes old, or keeping time_as_integer true on them, we can set time_as_integer to false on the relay node.
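The rollout strategy above can be expressed as a small forward-output configuration on an upgraded source node. The host name below is illustrative; `time_as_integer` is the option introduced for this compatibility case.

```
# Upgraded source node still talking to old nodes: keep serializing
# timestamps as Integer until every downstream node understands EventTime.
<match **>
  @type forward
  time_as_integer true   # serialize EventTime as the old Integer format
  <server>
    host relay.example.com
    port 24224
  </server>
</match>
```

Once the relay only receives traffic from old or `time_as_integer true` sources, it can safely set the option to false and forward full-precision EventTime downstream.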
for customers in near real-time • e.g. number of records/bytes over event collectors, number of running jobs, number of allocated CPU cores for a particular job • Reduce support cost by helping customers understand how they are using our computing resources • Let TD staff know what and how our customers are doing with data processing
know how they are using our computing resources • Collector for various metrics sent from workers • Storage to hold metrics (InfluxDB/TD) • API server to handle requests from the dashboard and query the backend storage • Dashboard application
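The metrics listed above (records/bytes, running jobs, allocated CPU cores) suggest a simple point-per-measurement shape for what workers send to the collector. The following Ruby sketch is purely illustrative; the field names are assumptions, not TD's actual schema.

```ruby
require 'json'

# Illustrative shape of one metric point a worker might send to the
# collector. Field names here are assumptions, not TD's actual schema.
def metric_point(name, value, tags = {})
  {
    name:  name,           # e.g. "records", "bytes", "running_jobs"
    value: value,
    tags:  tags,           # e.g. which account / worker it belongs to
    time:  Time.now.to_i   # timestamp for the time-series storage
  }
end

payload = metric_point('allocated_cpu_cores', 8, account: 'example')
puts JSON.generate(payload)
```

Keeping each point a flat name/value/tags hash is what makes it easy to write into both InfluxDB and TD without per-metric schema work.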
How to show metrics • The dashboard asks the API Server for metrics based on its configuration. • The API Server queries InfluxDB or TD(Presto) based on the time window. • The dashboard renders graphs from the query results.
data to InfluxDB and TD(Presto) • InfluxDB holds only recent data (e.g. 30 days) • TD(Presto) holds all data • Why did I choose InfluxDB? • It is fast enough. • We can write time-series data as a hash and query it on the fly flexibly, which makes trial and error easy.
from the dashboard and queries InfluxDB or TD(Presto) • The API Server queries TD(Presto) only when old data is needed; otherwise it queries InfluxDB • It can cache query results in Redis.
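The routing-plus-cache behavior described above can be sketched in a few lines of Ruby. Everything here is an illustrative stand-in: `INFLUX_RETENTION_DAYS`, the two query stubs, and `CACHE` (a Hash in place of a real Redis client).

```ruby
# Hedged sketch of the API server's storage selection and cache-aside
# logic. All names are illustrative stand-ins for the real components.
INFLUX_RETENTION_DAYS = 30
CACHE = {}  # a Hash standing in for Redis

# Stubs standing in for the real InfluxDB / Presto clients:
def query_influxdb(metric, from, to)
  [:influxdb, metric]
end

def query_presto(metric, from, to)
  [:presto, metric]
end

def fetch_metrics(metric, from, to)
  cutoff = Time.now - INFLUX_RETENTION_DAYS * 24 * 3600
  if from >= cutoff
    # Recent window: InfluxDB still holds it and is fast enough.
    query_influxdb(metric, from, to)
  else
    # Old data: query TD(Presto) and cache the result. Old data is
    # immutable, so the cached answer stays valid.
    key = "#{metric}:#{from.to_i}:#{to.to_i}"
    CACHE[key] ||= query_presto(metric, from, to)
  end
end
```

Caching only the Presto path is the sensible design choice here: Presto queries are the slow ones, and historical windows never change, so cached results never go stale.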
InfluxDB degradation is happening. • That is because the API server then falls back to TD(Presto) automatically. • To add a new metric, you just need to add configuration to the monitoring server and the dashboard application. • Of course, workers need to be able to send the metric.