
CPEN 431 Final Presentation

Tony Li
April 27, 2016

The final summary presentation of the project details for "Design of Distributed Software Applications".

There were lots of challenges and plenty of fun in this course. According to RescueTime (https://goo.gl/keiw1p), I spent over 300 hours programming from January to April.

Code is at: https://github.com/tonglil/CPEN-431-2015.

Transcript

  1. Architecture
     • Leverage Go's concurrency primitives
       ◦ Goroutines (like threads, but lighter, faster, safer), channels (thread-safe FIFO queues), and selects (a switch that waits on n channel operations, blocking until a case can run)
     • Heavy lifting done on initialization
       ◦ Store, cache, and worker creation; cluster initialization & synchronization
       ◦ Send-socket creation (flame-graph-ed optimization)
     • One lightweight goroutine listening for inbound UDP requests
       ◦ Loops, reads bytes, and pushes each buffer into a jobs channel
     • N heavyweight worker goroutines (sketched after this slide)
       ◦ Loop, pulling a request from the jobs channel
       ◦ Handle forwarding, replication, local storage interactions, and caching
         ▪ Responsible for sending any requests they need to
     • Interesting?
       ◦ Cache expiration is done on the GET() call and only iterates every 5 s (flame-graph-ed optimization)
       ◦ Cluster membership is delegated to an open source library (hashicorp/memberlist)
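
     A minimal Go sketch of the listener/worker split described above; the channel capacity, worker count, port, and handle function are illustrative assumptions, not the repository's actual names.

       package main

       import (
           "log"
           "net"
       )

       // job carries one raw request plus the client address to reply to.
       type job struct {
           buf  []byte
           addr *net.UDPAddr
       }

       func handle(j job) { /* forwarding, replication, storage, caching elided */ }

       func main() {
           conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 8080})
           if err != nil {
               log.Fatal(err)
           }
           jobs := make(chan job, 1024) // buffered jobs channel

           // N heavyweight workers pull requests off the jobs channel.
           for i := 0; i < 16; i++ {
               go func() {
                   for j := range jobs {
                       handle(j)
                   }
               }()
           }

           // One lightweight listener: read bytes, push the buffer into the channel.
           for {
               buf := make([]byte, 16*1024)
               n, addr, err := conn.ReadFromUDP(buf)
               if err != nil {
                   continue
               }
               jobs <- job{buf: buf[:n], addr: addr}
           }
       }
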
  2. Replication Approach
     • PUT/REMOVE: identical to Dynamo
     • GET (fan-out sketched after this slide)
       ◦ The receiving node sends GET_REP to all replicas (three), with responses routed directly to the client
       ◦ The client accepts only the first response (duplicates are dropped by UID)
     • Advantages
       ◦ Superficially lowers GET latency
       ◦ Simple to implement
         ▪ Less state to maintain - no tracking of average node latency, etc.
     • Disadvantages
       ◦ Poor scaling
         ▪ Increases system load by 3x
         ▪ No noticeable drop in throughput until high load
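
     A rough Go sketch of the GET fan-out above; the function name, arguments, and the idea that the message already embeds the client address and request UID are assumptions for illustration, not the course protocol verbatim.

       package node

       import "net"

       // fanOutGetRep sends GET_REP to every replica. Each replica replies
       // straight to the client, which keeps the first response and drops
       // duplicates by UID.
       func fanOutGetRep(conn *net.UDPConn, msg []byte, replicas []*net.UDPAddr) error {
           for _, addr := range replicas {
               // msg is assumed to already carry the client's address and the
               // request UID, so replicas can route responses to the client.
               if _, err := conn.WriteToUDP(msg, addr); err != nil {
                   return err
               }
           }
           return nil
       }
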
  3. Performance and Resilience Factors
     • Memory management (due to Go)
       ◦ No explicit stack limit, and the heap cannot be limited from the command line
         ▪ The Linux kernel was too old to use the ulimit command
       ◦ Required us to 'manually' limit memory usage to stay comparable
         ▪ Recycle memory byte buffers (one option is sketched after this slide)
       ◦ Used a community tool to trace GC collections and memory usage
     • Network performance realizations
       ◦ 'Fire and ForGET'
       ◦ Currently only acknowledging forwards (retried based on A1 behavior)
         ▪ We were too busy during A7 and A8 to change this; the next step is to remove the acknowledgements and try only once
       ◦ Drop incoming requests when the system is full (instead of responding with overload)
         ▪ Timeouts stay low until heavy overload
     • Performance analysis flame graphs
       ◦ Took <5 minutes to set up, thanks to Go's design & community
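
     One way to recycle byte buffers, as mentioned above, is Go's sync.Pool; this is a sketch of that technique under assumed buffer sizes, not necessarily how the project does it (it might, for example, push used buffers back onto a channel instead).

       package node

       import "sync"

       // bufPool recycles request buffers so each UDP read does not allocate a
       // fresh 16 KB slice, keeping heap growth and GC pressure down.
       var bufPool = sync.Pool{
           New: func() interface{} { return make([]byte, 16*1024) },
       }

       // getBuf is called before a read; putBuf returns the buffer once the
       // worker has written its response.
       func getBuf() []byte  { return bufPool.Get().([]byte) }
       func putBuf(b []byte) { bufPool.Put(b) }
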
  4. Flame graphs 1: stop sharing the connection
     • Before: share one connection for both inbound requests and outbound responses (locking of the connection)
     • After: create a new connection for each worker to write outbound responses (no locking of the listen connection; sketched after this slide)
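
     A sketch of the per-worker outbound socket, with assumed names: each worker listens on its own ephemeral port and writes responses there, so it never contends for the shared listen connection's lock. Responses then come from a different source port than the listen port, which is fine when the client matches replies by UID.

       package node

       import "net"

       type worker struct {
           out *net.UDPConn // per-worker socket for outbound responses
       }

       // newWorker opens a dedicated socket on an ephemeral port so this
       // worker's responses never lock the shared inbound listener.
       func newWorker() (*worker, error) {
           out, err := net.ListenUDP("udp", &net.UDPAddr{Port: 0})
           if err != nil {
               return nil, err
           }
           return &worker{out: out}, nil
       }

       func (w *worker) respond(resp []byte, client *net.UDPAddr) error {
           _, err := w.out.WriteToUDP(resp, client)
           return err
       }
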
  5. Flame graphs 2: initialize connections for replication ahead of time
     • Base: create a new connection for each replication request (the socket system call is slow, so it piles up in the flame graph stack)
     • Replication: create the connections during worker initialization instead (reduced stack, leaving time to spend on other work; sketched after this slide)
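
     A sketch of moving socket creation out of the request path, with assumed names: each worker dials its replica peers once at initialization and reuses those connections for every replication request.

       package node

       import "net"

       type repWorker struct {
           replicas []*net.UDPConn // one pre-dialed socket per replica peer
       }

       // newRepWorker dials each replica once, up front, so no socket system
       // call remains on the per-request hot path.
       func newRepWorker(replicaAddrs []*net.UDPAddr) (*repWorker, error) {
           w := &repWorker{}
           for _, addr := range replicaAddrs {
               conn, err := net.DialUDP("udp", nil, addr)
               if err != nil {
                   return nil, err
               }
               w.replicas = append(w.replicas, conn)
           }
           return w, nil
       }

       // replicate fans a message out on the pre-dialed sockets.
       func (w *repWorker) replicate(msg []byte) {
           for _, conn := range w.replicas {
               conn.Write(msg) // best-effort; 'fire and forget'
           }
       }
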
  6. Flame graphs 3: reuse replication connections for forwarding, improves but re-introduces issues
     • Replication: use the replication connection for forwarding requests as well (note: these graphs are zoomed to the workers.worker call level)
     • Forwarding: re-introduces the connection locking, with forwarding and replication contending for the same connection (as seen in flame graph 1)
  7. Performance results from flame graph optimizations (15 nodes, local machine, A7 test client stage 1 results)

     Clients   Base   Replication   Forwarding
     S1         500           820         1190
     S16       1430          1670         3020
     S32       1470          1760         2810
     R1         590           610         1030
     R16       1300          1580         2560
     R32       1230          1560         2470
     R64       1320          1670         2600
     R128      1220          1500         2620
     R256       970          1290            X

     X: out of buffer
  8. What We Learned
     • Log, log, log!
       ◦ A centralized logging service let us collect details on node & cluster state
     • It's about PlanetLab (PL) node selection - pick wisely
       ◦ NA: easy to set up, higher latency, fewer cores per node on average, weird behavior
       ◦ EU: more port availability & connection restrictions, nodes closer together, more cores
     • It's about timing
       ◦ Performance varies throughout the day, but stabilizes during the night (1-3 AM)
       ◦ But there is no support when the submission server goes down late at night and we can't post good results
     • It's about resilience
       ◦ Our server's throughput was crippled if just one node misbehaved
     • It's about dedication
       ◦ Effort tailed off towards the end of class: ~5 hours each for A6, A7, and A8
       ◦ We realized the problem was node selection, not the server, so we skipped many identified optimizations
     • Give Go a try!
       ◦ Light, simple to write, great community - just waiting for the GC to become generational