History of Event Collector in Treasure Data

History of Event Collector
One of the legacy systems in Treasure Data
Mitsunori Komatsu

About me
• Mitsunori Komatsu (@komamitsu), Software engineer (Backend team)
• Joined Treasure Data almost 5 years ago
• Hive, Presto, PlazmaDB, Mobile SDKs, Datatank, Workflow, … Event Collector, Bigdam (Pig, Impala…)
• Favorite language: OCaml
• OSS dev: MessagePack-Java, Digdag, Fluency

Retired legacy systems in Treasure Data
• Apache Pig integration
• Apache Impala integration
• Prestogres
• Event Collector (retirement candidate)

What’s Event Collector?
• An HTTP server application that receives events from the JavaScript SDK, Mobile SDKs, etc.
• Buffers events on local disk for several minutes and uploads them to the existing import endpoint in Treasure Data
• The existing data ingestion endpoint, td-api, isn’t good at handling frequent small uploads
• Consists of Fluentd input/output plugins (similar to in_http / out_tdlog) so that it can rely on Fluentd’s buffering mechanism
• It’s been developed ad hoc and improved ad hoc…

[Diagram: System architecture (2014-06). JS SDK / Mobile SDKs / Postback from SaaS send each event w/ UUID over HTTPS to Fluentd; in_event_collector dedups the UUIDs (UUID#0, UUID#1, …, UUID#N) in Redis; out_event_collector keeps events in buffer chunks (retention time: 1min) and uploads each events set over HTTPS to td-api.]

Problem #0 (2014-11)
• Event Collector stores the UUID of each request from Mobile SDKs in Redis for de-duplication
• Event Collector needed to be scaled out, but then Redis needed to be scaled out too…

Sharded Redis
• UUIDs of requests are hash-partitioned and stored in a sharded Redis cluster
• We created a somewhat intelligent Redis client that can
  • fail over to a secondary Redis instance
  • double-write UUIDs to another Redis instance as well as the currently assigned one, so that re-partitioning can be done without duplicated data

Sharded Redis (example: re-partitioning from 2 to 3 instances)
• Redis list: Redis#0, Redis#1
  • event#0 w/ UUID#0 (hash => 1000): 1000 % 2 = 0 => Redis#0 stores UUID#0
  • event#1 w/ UUID#1 (hash => 1001): 1001 % 2 = 1 => Redis#1 stores UUID#1
• New Redis list: Redis#0, Redis#1, Redis#2 (UUIDs are replicated in advance)
  • event#4 w/ UUID#4 (hash => 1004): 1004 % 2 = 0 => Redis#0 for dedup, and 1004 % 3 = 2 => Redis#2 for replication
  • Redis#2 is not used for dedup for now
• Redis list switched to: Redis#0, Redis#1, Redis#2
  • event#11 w/ UUID#11 (hash => 1011): 1011 % 3 = 0 => Redis#0
  • For UUID#4/7, Redis#2 already stores them in advance and can dedup them
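The routing and double-write steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual Event Collector client (which lived in a Ruby/Fluentd plugin): each shard is an in-memory set standing in for a Redis instance, `ShardedDedup` and its methods are hypothetical names, and fail-over to a secondary instance is not modeled. The hash function is pluggable so the example can reuse the slide's `UUID#n => 1000 + n` convention.

```python
import hashlib
from typing import Callable, List, Optional, Set


def md5_hash(uuid: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(uuid.encode()).digest()[:8], "big")


class ShardedDedup:
    """Hypothetical hash-partitioned UUID store with double-write for re-sharding."""

    def __init__(self, shards: List[Set[str]],
                 hash_fn: Callable[[str], int] = md5_hash):
        self.shards = shards                                # list used for dedup now
        self.next_shards: Optional[List[Set[str]]] = None   # list being warmed up
        self.hash_fn = hash_fn

    def start_resharding(self, next_shards: List[Set[str]]) -> None:
        self.next_shards = next_shards

    def finish_resharding(self) -> None:
        if self.next_shards is None:
            raise RuntimeError("no re-sharding in progress")
        self.shards, self.next_shards = self.next_shards, None

    def seen_before(self, uuid: str) -> bool:
        """Get-and-set: report whether uuid was stored before, then store it."""
        h = self.hash_fn(uuid)
        current = self.shards[h % len(self.shards)]
        duplicated = uuid in current
        current.add(uuid)
        if self.next_shards is not None:
            # Double-write: also store the UUID where the new list will route it,
            # so that shard can already dedup it after the switch.
            self.next_shards[h % len(self.next_shards)].add(uuid)
        return duplicated
```

With `r0, r1, r2 = set(), set(), set()` and `start_resharding([r0, r1, r2])` called before UUID#7 arrives, UUID#7 lands in `r1` (1007 % 2 = 1) and in `r2` (1007 % 3 = 2), so after `finish_resharding()` a retry of UUID#7 is still detected. UUIDs stored before `start_resharding` are not copied, so presumably the double-write period has to last at least as long as the dedup TTL before switching. A real client would use an atomic command per instance (for example Redis `GETSET`) rather than a separate check and add.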

[Diagram: Sharded Redis]

[Diagram: System architecture (2014-11). JS SDK / Mobile SDKs / Postback from SaaS send events w/ UUID over HTTPS to Fluentd; in_event_collector dedups UUIDs against a sharded Redis cluster (e.g. Redis#0 holds UUID#0, UUID#4, UUID#8, UUID#12, …); out_event_collector keeps events in buffer chunks (retention time: 1min) and uploads them over HTTPS to td-api.]

Problem #1 (2015-01)
• The usage of Event Collector kept increasing, and it sometimes went down…
  • “TCP: Possible SYN flooding on port xxxx. Sending cookies.” in kern.log
  • “$ netstat -s
    19855 times the listen queue of a socket overflowed
    19855 SYNs to LISTEN sockets dropped”

Listen socket backlog
• The default listen socket backlog queue length was only 1024
• Such a short queue was vulnerable to traffic spikes
• We increased it to 8192
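At the socket level, the backlog is the argument passed to `listen()`. The slides don't say how Event Collector's server applied the new value, so the Python sketch below only illustrates the knob itself; `open_listener` is a hypothetical helper. On Linux the value passed to `listen()` is silently truncated to `net.core.somaxconn`, so that sysctl also has to be at least 8192 for the larger queue to take effect.

```python
import socket

# Values from the slide: the old default was 1024, raised to 8192.
OLD_BACKLOG = 1024
NEW_BACKLOG = 8192


def open_listener(host: str = "127.0.0.1", port: int = 0,
                  backlog: int = NEW_BACKLOG) -> socket.socket:
    """Hypothetical helper: open a TCP listening socket with the given backlog.

    Linux silently truncates `backlog` to /proc/sys/net/core/somaxconn,
    so raise that sysctl too (e.g. `sysctl -w net.core.somaxconn=8192`).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # Connections that completed the TCP handshake wait in this queue until
    # accept() picks them up; when it overflows, the kernel drops incoming
    # connection attempts.
    sock.listen(backlog)
    return sock
```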

[Diagram: System architecture (2015-01). Same pipeline as 2014-11 (in_event_collector with sharded Redis dedup, out_event_collector buffer chunks with retention time: 1min, uploads over HTTPS to td-api), plus LISTEN socket backlog: 8192.]

Problem #2 (2015-05)
• The usage of Event Collector kept increasing, and it still sometimes went down…
• There were 2 options: optimize the source code, or run Event Collector in multiple processes

Optimization of source code
• First…, profile! profile! profile!
• The 2 performance bottlenecks were:
  • sending metrics to another Fluentd on every request
    ➡ Aggregated metrics over 5-second ranges in memory to reduce the number of messages sent to another Fluentd
  • parsing the User-Agent
    ➡ Cached 100 UserAgentParser (ua-parser) instances with LRU eviction
• The performance improved 50 times
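Both fixes follow common patterns, sketched here in Python. The slides give only the window length (5 seconds) and the cache size (100); everything else is an assumption: `WindowedMetrics`, `LRUCache`, and the `emit` callback are hypothetical names, the window is flushed lazily on the next `record()` call instead of by a timer, and the cache is assumed to be keyed by the raw User-Agent string so repeated User-Agents skip parsing. The original Ruby code is not shown in the deck.

```python
import time
from collections import Counter, OrderedDict
from typing import Any, Callable, Dict, Hashable, Optional


class WindowedMetrics:
    """Count per-request metrics in memory; emit one message per window."""

    def __init__(self, emit: Callable[[Dict[str, int]], None],
                 window_sec: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.emit = emit            # hypothetical: sends one message to another Fluentd
        self.window_sec = window_sec
        self.clock = clock
        self.window_start = clock()
        self.counts: Counter = Counter()

    def record(self, name: str, value: int = 1) -> None:
        now = self.clock()
        if now - self.window_start >= self.window_sec:
            self.flush(now)         # close the previous window first
        self.counts[name] += value

    def flush(self, now: Optional[float] = None) -> None:
        if self.counts:
            self.emit(dict(self.counts))
        self.counts.clear()
        self.window_start = self.clock() if now is None else now


class LRUCache:
    """Keep at most `capacity` computed values, evicting the least recently used."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[Hashable], Any]) -> Any:
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        value = compute(key)                    # e.g. parse a User-Agent string
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # drop the least recently used entry
        return value
```

With these, thousands of requests in a 5-second window turn into one metrics message, and the expensive parse runs once per distinct User-Agent as long as it stays among the 100 most recently seen.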

In multiprocess
• Only 1 Event Collector process was running, but Ruby can’t make the best use of multiple cores with multiple threads (CRuby’s global VM lock lets only one thread run Ruby code at a time)
• It was time to run Event Collector in multiple processes!

fluent-plugin-multiprocess
• With fluent-plugin-multiprocess, 8 sets of Event Collector input / output plugin workers can run in multiple processes
• The performance improved 6 times with it
• Drawback:
  • The number of output plugin workers also increased
  • As a result, the number of chunks uploaded to td-api increased significantly, and td-api sometimes suffered from lots of tiny uploaded chunk files…
  • In other words, td-api was a bottleneck in Event Collector’s scalability

[Diagram: fluent-plugin-multiprocess. Nginx in front of Event Collector, which runs several input plugin worker / output plugin worker pairs; every output plugin worker uploads events to td-api.]

[Diagram: System architecture (2015-05). Nginx in front of multiple Fluentd processes (multi process #0, …), each running in_event_collector (dedup against sharded Redis) and out_event_collector (buffer chunks, retention time: 1min) that uploads over HTTPS to td-api. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]

Problem #3 (2015-12)
• td-api sometimes became unstable because the number of uploaded chunk files had increased…
• We needed to reduce the number of chunk files uploaded from Event Collector to td-api

detach_process
• Found a Fluentd configuration item, “detach_process”, for input plugins
• With it, multiple input_plugin workers can run in multiple processes while keeping the number of output_plugin workers at 1
• The number of chunk files uploaded to td-api would be reduced!

[Diagram: detach_process. Nginx in front of Event Collector, which runs several input plugin workers (one in the parent process, the rest detached); all events go to a single output plugin worker, which uploads to td-api.]

[Diagram: System architecture (2015-12). Nginx in front of one Fluentd running several in_event_collector processes (detach_process #0, …) with dedup against sharded Redis, and a single out_event_collector (buffer chunks, retention time: 1min) uploading over HTTPS to td-api. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]

Problem #4 (2016-04)
• There were some race condition issues with “detach_process”
• Also, “detach_process” became a deprecated feature! https://www.fluentd.org/blog/fluentd-v0.14.9-has-been-released

Back to fluent-plugin-multiprocess…

[Diagram: System architecture (2016-04). Back to the multiprocess layout: Nginx in front of multiple Fluentd processes, each running in_event_collector (dedup against sharded Redis) and out_event_collector (buffer chunks, retention time: 1min) uploading over HTTPS to td-api. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]

Problem #5 (2016-05)
• The throughput of the Redis cluster sometimes became a bottleneck
• When dedup with the Redis cluster for requests from Mobile SDKs got stuck, processing of requests from the JavaScript SDK / Postback from SaaS got stuck too…

dedup in another thread
• If de-duplication runs in a different thread from the input_plugin worker, the input_plugin worker can keep processing requests even when access to Redis gets stuck
• The existing output_plugin runs in another thread, so deduplicating in it might be an option
• But the output_plugin handles large chunk files. If it retries around the end of a chunk file, all records in the chunk file are treated as duplicated records (their UUIDs were already stored during the first attempt)
  • We needed to mitigate the impact of this case
• Let’s insert a new thin output plugin worker!

[Diagram: dedup in another thread, before. The input_plugin worker receives an event w/ UUID, runs GetAndSet: UUID on the Redis cluster, emits the event into Fluentd's buffer and responds OK; the output_plugin worker uploads chunked events to td-api. If dedup gets stuck, it affects processing of all requests…]

[Diagram: dedup in another thread, after. The input_plugin worker emits the event w/ UUID into Fluentd's Buffer#0 (small chunked events, retention time: 5s) and returns the response ASAP without dedup; a thin output_plugin worker runs GetAndSet: UUID on the Redis cluster and emits events into Buffer#1 (chunked events, retention time: 1m); the output_plugin worker uploads them to td-api. Even if dedup gets stuck, it doesn’t affect processing of all requests!]
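The “after” design is a producer/consumer hand-off. The Python sketch below assumes an in-memory `queue.Queue` as a stand-in for the small 5-second Fluentd buffer (Buffer#0) and a plain list for the 1-minute upload buffer (Buffer#1); `DedupPipeline` and `seen_before` are hypothetical names, and persistence, retries, and chunking are not modeled. It shows the property the slide is after: `handle_request` answers “OK” without touching the dedup store, so a stuck store only delays the worker thread.

```python
import queue
import threading
from typing import Callable, Dict, List


class DedupPipeline:
    """Hypothetical sketch: respond first, dedup later in a separate thread."""

    def __init__(self, seen_before: Callable[[str], bool]):
        self.seen_before = seen_before        # e.g. GetAndSet on the dedup cluster
        self.handover = queue.Queue()         # ~ Buffer#0 (retention time: 5s)
        self.upload_buffer: List[Dict] = []   # ~ Buffer#1 (1min), uploaded to td-api
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def handle_request(self, event: Dict) -> str:
        # input_plugin side: hand the event over and answer immediately.
        self.handover.put(event)
        return "OK"

    def _run(self) -> None:
        # thin output_plugin side: the only place that talks to the dedup store.
        while True:
            event = self.handover.get()
            if event is None:                 # shutdown sentinel
                return
            if not self.seen_before(event["uuid"]):
                with self._lock:
                    self.upload_buffer.append(event)

    def close(self) -> None:
        self.handover.put(None)
        self._worker.join()
```

Keeping the hand-off buffer small is what limits the retry problem described above: if the thin worker has to retry, only a few seconds of events are re-checked, not a whole 1-minute chunk.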

[Diagram: System architecture (2016-05). Nginx in front of multiple Fluentd processes; in each one, in_event_collector hands events to out_object_handover (small events sets, buffer chunks with retention time: 5s), which dedups against sharded Redis and passes events to out_event_collector (buffer chunks, retention time: 1min) uploading over HTTPS to td-api. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]

Problem #6 (2016-06)
• De-duplication with Redis tended to be delayed
• Even if we upgraded the instance type of the Redis cluster, Redis runs on a single core and can’t benefit from multiple cores…
• Actually, we used Redis as just a KVS; Redis’s complex data types weren’t needed in the end

Memcached
• Replaced the dedup cluster with Memcached
• No problems occurred during the migration thanks to the double-write feature
• Based on benchmark results using Event Collector’s actual access pattern, the performance doubled and the memory consumption of the dedup cluster was reduced to 68% of Redis’s
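The deck doesn't say which Memcached command the dedup cluster used. A natural fit is the text-protocol `add` command, which stores an item only if the key doesn't exist yet and replies `STORED` or `NOT_STORED`, so check-and-set becomes one atomic round trip. The Python sketch below assumes that; the `dedup:` key prefix, the one-byte value, and the function names are hypothetical, and a real deployment would go through a client library that also handles sharding across Memcached instances.

```python
import socket

DEFAULT_TTL_SEC = 3600  # the default 1-hour TTL mentioned in Problem #7


def _read_line(sock: socket.socket) -> bytes:
    # Read one CRLF-terminated reply line, byte by byte for simplicity.
    line = b""
    while not line.endswith(b"\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("memcached closed the connection")
        line += chunk
    return line[:-2]


def is_duplicate(sock: socket.socket, uuid: str,
                 ttl_sec: int = DEFAULT_TTL_SEC) -> bool:
    """Record uuid with a TTL; return True if it was already recorded.

    Uses "add <key> <flags> <exptime> <bytes>", which only stores the item
    when the key is absent, so the check and the write are one atomic step.
    """
    key = f"dedup:{uuid}"   # hypothetical key prefix
    value = b"1"            # the value is irrelevant; only the key matters
    header = f"add {key} 0 {ttl_sec} {len(value)}\r\n".encode()
    sock.sendall(header + value + b"\r\n")
    reply = _read_line(sock)
    if reply == b"STORED":
        return False
    if reply == b"NOT_STORED":
        return True
    raise RuntimeError(f"unexpected reply from memcached: {reply!r}")
```

Memcached treats an exptime of up to 30 days (2592000 seconds) as relative, so both the 1-hour default (3600) and the 36-hour TTL from the next problem (129600) can be passed as-is.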

[Diagram: System architecture (2016-06). Same layout as 2016-05 (in_event_collector → out_object_handover with 5s buffer chunks → out_event_collector with 1min buffer chunks → td-api, behind Nginx), but dedup now uses a sharded Memcached cluster (e.g. Memcached#0 holds UUID#0, UUID#4, UUID#8, UUID#12, …) instead of Redis. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]

Problem #7 (2016-12)
• Needed to support a 36-hour TTL on the dedup cluster for one of our customers while keeping the default 1-hour TTL for other customers’ requests
• It sounded easy since Memcached’s APIs support TTL
• But the Memcached dedup cluster stopped reclaiming expired entries with a 1-hour TTL, and memory consumption increased drastically…

Reclamation mechanism in Memcached
• Reading the Memcached source code, we found the cause
• Memcached keeps removing expired entries as long as it continues to find expired entries in a row. This works fine when all entries have similar TTLs
• But if Memcached has both 1-hour TTL and 36-hour TTL entries, it stops reclaiming when it finds a 36-hour TTL entry, even if a lot of expired 1-hour TTL entries sit behind it
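The effect is easy to reproduce with a toy model. The Python sketch below simplifies the behavior described on this slide and is not memcached's actual code (which, among other details, keeps a separate LRU per slab class): items sit in one queue ordered from tail (least recently used) to head, `reclaim_from_tail` pops expired items from the tail until it meets the first live one, and `crawl` models a full tail-to-head walk that drops every expired item. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

HOUR = 3600


@dataclass
class Item:
    key: str
    expires_at: float  # absolute expiry time in seconds


def reclaim_from_tail(lru: deque, now: float) -> int:
    """Pop expired items from the tail until the first live one.

    lru[0] is the tail (least recently used); lru[-1] is the head.
    Returns how many items were reclaimed.
    """
    reclaimed = 0
    while lru and lru[0].expires_at <= now:
        lru.popleft()
        reclaimed += 1
    return reclaimed


def crawl(lru: deque, now: float) -> int:
    """Walk the whole list and drop every expired item, wherever it sits."""
    live = [item for item in lru if item.expires_at > now]
    reclaimed = len(lru) - len(live)
    lru.clear()
    lru.extend(live)
    return reclaimed
```

With only 1-hour items stored at time 0, a pass at `now=2 * HOUR` reclaims all of them. If a single 36-hour item sits at the tail, the same pass reclaims nothing, and in this model every expired 1-hour item behind it stays until the 36-hour item itself expires; a full `crawl` removes them regardless.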

[Diagram: Reclamation mechanism in Memcached]

“lru_crawler crawl all”
• Found that Memcached provides an API, “lru_crawler”
  - Takes a single, or a list of, numeric classids (ie: 1,3,10). This instructs the crawler to start at the tail of each of these classids and run to the head. The crawler cannot be stopped or restarted until it completes the previous request. The special keyword "all" instructs it to crawl all slabs with items in them. https://github.com/memcached/memcached/blob/master/doc/protocol.txt
• Let’s call “lru_crawler crawl all” repeatedly!
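A periodic trigger takes only a few lines of the text protocol. The Python sketch below assumes memcached on the default port 11211 and a hypothetical 60-second interval (the slide only says “repeatedly”); per protocol.txt, the server answers with a single line such as `OK`, or `BUSY ...` if the previous crawl is still running. Depending on the memcached version, the LRU crawler may also need to be enabled first (`lru_crawler enable`, or the `-o lru_crawler` start-up option). `send_crawl_all` and `crawl_repeatedly` are hypothetical names.

```python
import socket
import time


def send_crawl_all(sock: socket.socket) -> bytes:
    """Send "lru_crawler crawl all" and return memcached's one-line reply."""
    sock.sendall(b"lru_crawler crawl all\r\n")
    reply = b""
    while not reply.endswith(b"\r\n"):
        chunk = sock.recv(1)  # byte-at-a-time keeps the sketch simple
        if not chunk:
            raise ConnectionError("memcached closed the connection")
        reply += chunk
    return reply[:-2]


def crawl_repeatedly(host: str = "127.0.0.1", port: int = 11211,
                     interval_sec: float = 60.0) -> None:
    """Hypothetical loop: ask memcached to crawl all slab classes every interval."""
    while True:
        with socket.create_connection((host, port), timeout=10) as sock:
            reply = send_crawl_all(sock)
        # A "BUSY ..." reply means the previous crawl is still running;
        # the next round simply tries again.
        print(reply.decode("ascii", "replace"))
        time.sleep(interval_sec)
```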

[Diagram: System architecture (2016-12). Same layout as 2016-06 (sharded Memcached dedup in out_object_handover with 5s buffer chunks, behind Nginx), with out_event_collector buffer chunks now at retention time: 3min and lru_crawler running against Memcached. Notes: LISTEN socket backlog: 8192; avoid instantiating Ruby objects; reduce metrics requests to Fluentd.]


Usage change
• Requests/sec: 910 in 2015-05 → 19300 in 2018-02
• Increased more than 21 times!
• Event Collector is very important as a part of the CDP service
[Chart: requests/sec from 2015-05 to 2018-02, annotated “Triggered an alert!”]

But it still has problems…
• Buffered data stored on local disk isn’t replicated ➡ Bigdam will resolve it!
• Uploads of many small chunk files to td-api
  • Aggregating those small ones with the in_forward plugin is an option, though... ➡ Bigdam will resolve it!
• Further performance improvement ➡ Bigdam will resolve it!
