Agenda • A casual introduction to log-collectors in Go • fluent-forwarder, fluent-agent-hydra, Heka • The case of Mercari • Fluentd and fluent-agent-hydra • The road to introducing fluent-agent-hydra into the production environment
fluent-agent-hydra • https://github.com/fujiwara/fluent-agent-hydra • Features • in_tail & (in|out)_forward • Handles multiple files per process • Monitoring API • Configuration with TOML • Supports LTSV and JSON
Heka • https://github.com/mozilla-services/heka • Event processing / windowed stream operations • Provides a plugin system • Users can write plugins in Go or Lua • Custom Input, Output, Encoder, Decoder, Filter • Configuration with TOML
Fluentd • Fluentd is a very flexible and robust log-collector • Plugin system • Active developers and community • But its performance is not always sufficient
Pascal: Mercari analysis base • Built with the software blocks below • Puree, OpenResty (ngx_lua), Fluentd, fluent-agent-hydra, Google BigQuery • Aggregates various logs into Google BigQuery • Events in the app • A/B testing • etc…
Why did we switch to fluent-agent-hydra? • Pascal needs to run with low server resource usage • Fluentd consumed a non-negligible amount of CPU (50-60% at peak) • Pascal handles a moderately high workload • OpenResty processes a lot of JSON and outputs various logs • Requests come not only from devices but also from API servers
Why did we switch to fluent-agent-hydra? • fluent-agent-hydra • CPU usage is lower than Fluentd's • About half of Fluentd's in our case! • Handles multiple logs per process • Simple
fluent-agent-hydra internals • [Diagram, each box is a goroutine: main() starts monitor, out_forwarder, in_forwarder, and a watcher & in_tail per file with go func(), then waits for a signal] • Some goroutines spawn more goroutines
Goroutines communicate over channels • [Diagram, each box is a goroutine: monitor, out_forwarder, in_forwarder, watcher & in_tail per file; the channels are labeled "receiver is monitor" and "receiver is out_forwarder"]
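The goroutine layout on the two slides above can be sketched roughly as follows. This is a minimal illustration and not fluent-agent-hydra's actual code: the `record` and `stat` types and the `inTail` and `run` functions are hypothetical names, and the sketch stops when its input is exhausted instead of waiting for a signal.

```go
package main

import (
	"fmt"
	"sync"
)

// record and stat are hypothetical message types for this sketch.
type record struct {
	tag  string
	line string
}

type stat struct {
	tag   string
	bytes int
}

// inTail stands in for one file tailer: it sends every line to the
// out_forwarder channel and a size report to the monitor channel.
func inTail(tag string, lines []string, out chan<- record, mon chan<- stat, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, l := range lines {
		out <- record{tag: tag, line: l}
		mon <- stat{tag: tag, bytes: len(l)}
	}
}

// run starts one goroutine per component and returns the number of
// forwarded records and the bytes seen per tag.
func run(files map[string][]string) (int, map[string]int) {
	out := make(chan record) // receiver is out_forwarder
	mon := make(chan stat)   // receiver is monitor

	forwarded := 0
	bytesPerTag := make(map[string]int)

	var receivers sync.WaitGroup
	receivers.Add(2)
	go func() { // out_forwarder
		defer receivers.Done()
		for range out {
			forwarded++ // a real forwarder would send to Fluentd here
		}
	}()
	go func() { // monitor
		defer receivers.Done()
		for s := range mon {
			bytesPerTag[s.tag] += s.bytes
		}
	}()

	var tailers sync.WaitGroup
	for tag, lines := range files {
		tailers.Add(1)
		go inTail(tag, lines, out, mon, &tailers)
	}

	tailers.Wait() // all senders are done, so closing is safe
	close(out)
	close(mon)
	receivers.Wait()
	return forwarded, bytesPerTag
}

func main() {
	n, b := run(map[string][]string{
		"access": {"a", "bb"},
		"error":  {"ccc"},
	})
	fmt.Println(n, b)
}
```

Each channel has exactly one receiver, so the counters need no locking; they are only read after both receivers have finished.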
Monitoring fluent-agent-hydra • fluent-agent-hydra provides monitoring APIs • Application stats • Current positions in the tailed log files • Messages and bytes sent per log • Other information (e.g. errors) • System stats • Powered by golang-stats-api-handler
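To give a feel for what such a monitoring endpoint can look like, here is a small sketch using only the Go standard library. It does not reproduce fluent-agent-hydra's real API or golang-stats-api-handler: the JSON field names, the `fileStats` and `report` types, and the `statsHandler` and `fetchStats` helpers are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// fileStats and report are hypothetical shapes, not hydra's real output.
type fileStats struct {
	Position int64 `json:"position"`
	Sent     int64 `json:"sent_messages"`
	Bytes    int64 `json:"sent_bytes"`
}

type report struct {
	Files      map[string]fileStats `json:"files"`
	Goroutines int                  `json:"goroutines"`
}

// statsHandler serves application stats plus one system stat as JSON.
// A real agent would guard the map with a mutex or feed it via a channel.
func statsHandler(files map[string]fileStats) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		rep := report{Files: files, Goroutines: runtime.NumGoroutine()}
		if err := json.NewEncoder(w).Encode(rep); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})
}

// fetchStats calls the handler in-process and decodes its response.
func fetchStats(h http.Handler) (report, error) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	h.ServeHTTP(rec, req)
	var rep report
	err := json.NewDecoder(rec.Body).Decode(&rep)
	return rep, err
}

func main() {
	h := statsHandler(map[string]fileStats{
		"/var/log/app.log": {Position: 1024, Sent: 10, Bytes: 1024},
	})
	rep, err := fetchStats(h)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rep)
}
```

In a running agent the handler would be registered with `http.Handle` and served with `http.ListenAndServe` on a port of your choice.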
After switching to fluent-agent-hydra • BigQuery error in load operation: Error processing job • Field:xxx: Could not convert value to integer (bad value or out of range) • Field:yyy: Could not convert value to integer (bad value or out of range) • Field:xxx: Could not convert value to integer (bad value or out of range) • Field:xxx: Could not convert value to integer (bad value or out of range) • Field:xxx: Could not convert value to integer (bad value or out of range) • … !?
BigQuery's demands • Google BigQuery demands a fixed table schema and a strict data format • schema.json: [ { "name": "value", "type": "INTEGER" } ] • Valid data format: {"value":150} • Invalid data format: {"value":150.0} {"value":"150"}
Special conversion behavior for numerical values • fluent-agent-hydra treats a numerical value as float64 even if it is an integer • This happens when the log format is JSON • Why? • Because fluent-agent-hydra uses encoding/json and unmarshals JSON into interface values
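The encoding/json behavior described above is easy to reproduce in isolation. The sketch below is a standalone demonstration, not code taken from fluent-agent-hydra; `decodedType` and `decodedTypeUseNumber` are helper names invented here. It also shows `Decoder.UseNumber`, a standard-library option that keeps the number as a `json.Number` instead of a float64.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodedType reports the Go type that encoding/json picks for the
// "value" field when the object is unmarshaled into interface values.
func decodedType(input string) (string, error) {
	var rec map[string]interface{}
	if err := json.Unmarshal([]byte(input), &rec); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", rec["value"]), nil
}

// decodedTypeUseNumber does the same, but asks the decoder to keep
// numbers as json.Number instead of converting them to float64.
func decodedTypeUseNumber(input string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber()
	var rec map[string]interface{}
	if err := dec.Decode(&rec); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", rec["value"]), nil
}

func main() {
	for _, in := range []string{`{"value":150}`, `{"value":150.0}`} {
		plain, err := decodedType(in)
		if err != nil {
			panic(err)
		}
		num, err := decodedTypeUseNumber(in)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: default=%s, UseNumber=%s\n", in, plain, num)
	}
}
```

With the default settings, the integer literal 150 and the float literal 150.0 both arrive as float64, so the information that the log contained an integer is lost before the record is forwarded.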
By the way, • fluent-agent-hydra provides the Types directive for type conversion • # in config.toml • Types = "value:integer" • But at that time this was provided only for LTSV…
Now • fluent-agent-hydra provides the Types directive for type conversion • # in config.toml • Types = "value:integer" • The value is always converted to int64, regardless of whether the format is LTSV or JSON
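The effect of a directive like `Types = "value:integer"` can be sketched as below. This is not fluent-agent-hydra's implementation: `parseTypes`, `toInt64`, and `convertRecord` are hypothetical helpers, only the `integer` type from the slide is handled, and accepting several comma-separated entries is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTypes turns a spec such as "value:integer" into a field-to-type map.
// Accepting several comma-separated entries is an assumption of this sketch.
func parseTypes(spec string) (map[string]string, error) {
	types := make(map[string]string)
	for _, entry := range strings.Split(spec, ",") {
		parts := strings.SplitN(strings.TrimSpace(entry), ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid type entry %q", entry)
		}
		types[parts[0]] = parts[1]
	}
	return types, nil
}

// toInt64 handles values as they typically arrive: float64 from JSON
// decoded into interface values, string from LTSV.
func toInt64(v interface{}) (int64, error) {
	switch x := v.(type) {
	case int64:
		return x, nil
	case float64:
		return int64(x), nil // truncates any fractional part
	case string:
		return strconv.ParseInt(x, 10, 64)
	default:
		return 0, fmt.Errorf("cannot convert %T to integer", v)
	}
}

// convertRecord rewrites, in place, every field declared as "integer".
func convertRecord(record map[string]interface{}, types map[string]string) error {
	for field, typ := range types {
		v, ok := record[field]
		if !ok || typ != "integer" {
			continue // other types are out of scope for this sketch
		}
		n, err := toInt64(v)
		if err != nil {
			return fmt.Errorf("field %q: %w", field, err)
		}
		record[field] = n
	}
	return nil
}

func main() {
	types, err := parseTypes("value:integer")
	if err != nil {
		panic(err)
	}
	fromJSON := map[string]interface{}{"value": float64(150)}
	fromLTSV := map[string]interface{}{"value": "150"}
	for _, rec := range []map[string]interface{}{fromJSON, fromLTSV} {
		if err := convertRecord(rec, types); err != nil {
			panic(err)
		}
		fmt.Printf("%T %v\n", rec["value"], rec["value"])
	}
}
```

Both input shapes end up as int64, which is the property BigQuery's INTEGER column needs.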
Summary • Fluentd is a very flexible and robust log-collector • But its performance is not always sufficient • There are some alternatives in Go • fluent-agent-hydra might fit the case below • You want a faster, lightweight log-collector for in_tail & out_forward • But it is less robust than Fluentd • e.g. position files are not supported