whoami
● @syu_cream
● Mercari SRE newbie :)
● Develop/maintain data processing systems and some middleware
● My interests:
  ○ Go, mruby, and automation
Deep dive into the pipeline around BigQuery
● Receive logs forwarded by Fluentd
● Manage the upload job with cron
  ○ Split and compress large log files
  ○ Upload logs to GCS
  ○ Load logs from GCS into BigQuery
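A minimal sketch of what the upload and load steps could look like using the Google Cloud Java client libraries from Scala; the bucket, file, and table names are hypothetical, and in practice this ran as a cron-driven job:

    import java.nio.file.{Files, Paths}

    import com.google.cloud.storage.{BlobInfo, StorageOptions}
    import com.google.cloud.bigquery.{
      BigQueryOptions, FormatOptions, JobInfo, LoadJobConfiguration, TableId
    }

    object UploadLogs {
      def main(args: Array[String]): Unit = {
        // Upload a pre-split, gzip-compressed log file to GCS.
        // Bucket, object, and table names here are hypothetical.
        val storage = StorageOptions.getDefaultInstance.getService
        val blob    = BlobInfo.newBuilder("example-log-bucket", "logs/access.log.gz").build()
        storage.create(blob, Files.readAllBytes(Paths.get("/var/log/access.log.gz")))

        // Load the uploaded object from GCS into a BigQuery table.
        val bigquery = BigQueryOptions.getDefaultInstance.getService
        val loadConf = LoadJobConfiguration
          .newBuilder(
            TableId.of("example_dataset", "access_log"),
            "gs://example-log-bucket/logs/access.log.gz")
          .setFormatOptions(FormatOptions.json()) // newline-delimited JSON
          .build()
        val job = bigquery.create(JobInfo.of(loadConf))
        job.waitFor() // block until the load job finishes (or fails)
      }
    }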
Issues...
● It was hard to retry a failed job
  ○ We receive a Slack alert when it fails
  ○ But we have to reproduce some commands by hand
● Job status was not visualized
● The uploading logic became complex
  ○ Due to BigQuery resource limits
    ■ Compressed JSON has a 4GB upper limit per file
  ○ So we had to split large files before uploading
Improve the job w/ digdag
● We started to use digdag, a simple workflow engine (sketch below)
  ○ https://www.digdag.io/
● And split log files in Fluentd
  ○ By adjusting buffer_chunk_limit
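A minimal sketch of what the digdag workflow might look like; the task names and scripts are hypothetical. digdag's built-in _retry parameter gives the retry-on-failure behaviour the plain cron job lacked:

    # upload_logs.dig -- a hypothetical workflow definition
    timezone: UTC

    +upload_to_gcs:
      _retry: 3
      sh>: scripts/upload_logs_to_gcs.sh

    +load_to_bigquery:
      _retry: 3
      sh>: scripts/load_gcs_to_bigquery.sh

And on the Fluentd side, capping the buffer chunk size keeps every flushed file under the BigQuery limit; the tag and path are hypothetical, and buffer_chunk_limit is the v0.12-era parameter name:

    <match logs.**>
      @type file
      path /var/log/td-agent/logs/forwarded
      append false            # each flushed chunk becomes its own file
      buffer_chunk_limit 1g   # so no single file grows past ~1GB
      compress gzip
    </match>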
Issues...
● It’s hard to maintain…
  ○ The analyser became very complex :(
  ○ Memory usage is very high
● Its performance is poor
  ○ Single thread
  ○ Sequential processing
Reimplement the statistics analyser
● Replace it with Cloud Dataflow
  ○ https://cloud.google.com/dataflow/
  ○ Fully managed ETL service
  ○ It can offload machine resource issues
● Reimplement the analyser logic in Scala
  ○ Using https://github.com/spotify/scio
  ○ Performance is good and it auto-scales
  ○ Job status is visualized automatically
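A minimal sketch of the shape a scio pipeline takes; the input/output arguments and the word-count logic are placeholders, not the real analyser logic:

    import com.spotify.scio._

    object StatisticsAnalyser {
      def main(cmdlineArgs: Array[String]): Unit = {
        // ContextAndArgs parses --input=.../--output=... style arguments
        // together with the runner options (DataflowRunner, project, etc.).
        val (sc, args) = ContextAndArgs(cmdlineArgs)

        sc.textFile(args("input"))        // e.g. gs://example-bucket/logs/*
          .flatMap(_.split("""\s+""").filter(_.nonEmpty))
          .countByValue                   // SCollection[(String, Long)]
          .map { case (word, n) => s"$word\t$n" }
          .saveAsTextFile(args("output")) // e.g. gs://example-bucket/stats/

        sc.run()  // submit the pipeline (sc.close() on older scio versions)
      }
    }

Running it with --runner=DataflowRunner hands scaling and the job-status UI to Cloud Dataflow, which is where the auto-scaling and automatic visualization above come from.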