Ten war stories from doing large-scale data management at Accudelta, and some of the approaches we took to survive the tsunami.
MONEYMATE / ACCUDELTA
Since 1991, ~100 employees
Offices in Dublin, Boston, London, New York, Stockholm, Milan and Valencia.
Valencia is an engineering office only
BlackRock, Fidelity, J.P. Morgan US, M&G, Prudential,
Charles Schwab, Schroders, State Street, Columbia
Threadneedle, Canada Life, IFDS, New Ireland, ...
#1 Wild Data
So… is that a bunch of Excel and
CSV files randomly piled up?
- Day 1, MoneyMate developer
#2 Timing ⏰
Data is inconsistent most of the time!
#3 Schema Agnostic
Every client has its own schema;
the loading system has to be fast.
#3 Schema Agnostic
• Reduced load time from 22 h to 9 min
• In-Memory and DB modes
• Avoid write-locks as much as possible
• Homeostasis: resilient/adaptive loading
• Reactive async publishing
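Schema-agnostic loading means no fixed table layout is assumed before a client file arrives. A minimal sketch of the idea in Python (all names and the CSV layout here are illustrative, not MoneyMate's actual loader):

```python
import csv
import io

def infer_columns(header_row):
    """Derive a working schema from whatever columns the client's file has."""
    return [name.strip().lower().replace(" ", "_") for name in header_row]

def load_rows(csv_text):
    """Schema-agnostic load: each client file defines its own columns,
    and rows become dicts keyed by the inferred column names."""
    reader = csv.reader(io.StringIO(csv_text))
    columns = infer_columns(next(reader))
    return [dict(zip(columns, row)) for row in reader]

rows = load_rows("Fund Name,NAV\nAlpha,101.5\nBeta,99.2\n")
# rows[0] == {"fund_name": "Alpha", "nav": "101.5"}
```

The real system layers in-memory vs. DB modes and async publishing on top of this basic shape.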
#4 Parallel Testing
Replay one month of events through the system,
… using two software versions,
… then compare row-by-row, cell-by-cell.
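The comparison step above can be sketched as a cell-by-cell diff over the two versions' outputs (a simplified illustration that assumes both replays produce the same number of rows):

```python
def diff_outputs(old_rows, new_rows):
    """Compare replay output of two software versions row-by-row,
    cell-by-cell; return (row_index, column, old_value, new_value)
    for every mismatch."""
    diffs = []
    for i, (old, new) in enumerate(zip(old_rows, new_rows)):
        for col in old:
            if old[col] != new.get(col):
                diffs.append((i, col, old[col], new.get(col)))
    return diffs

old = [{"fund": "Alpha", "nav": "101.5"}, {"fund": "Beta", "nav": "99.2"}]
new = [{"fund": "Alpha", "nav": "101.6"}, {"fund": "Beta", "nav": "99.2"}]
print(diff_outputs(old, new))  # [(0, 'nav', '101.5', '101.6')]
```

An empty diff means the new version reproduced a month of production behavior exactly.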
#5 Schema Evolutions
• ~50MB of SQL, several more CSVs
• VCS and code review friendly
• Test-data & container migrations
• Forward-only, no rollbacks
• Exercised many times per day through CI builds
• etcd distributed locks for coordination
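A forward-only migration runner can be sketched in a few lines: migrations are applied in order, each recorded in a version table, and a mistake is fixed by a new forward migration rather than a rollback. This sketch uses sqlite3 as a stand-in database and omits the distributed lock; the migration names and SQL are invented for illustration:

```python
import sqlite3

MIGRATIONS = [  # ordered, forward-only: fix mistakes with a new migration
    ("001_create_funds", "CREATE TABLE funds (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_nav", "ALTER TABLE funds ADD COLUMN nav REAL"),
]

def migrate(conn):
    """Apply any not-yet-applied migrations, in order. In production a
    distributed lock (e.g. held via etcd) would wrap this so only one
    node migrates at a time."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_version")}
    for name, sql in MIGRATIONS:
        if name not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (name) VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: re-running applies nothing new
```

Because the runner is idempotent, CI can exercise it many times a day against fresh containers.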
Right after Brexit, one of our clients started
loading data on a daily basis instead of monthly.
Solvency II regulation was delayed by more than two years
#7 Latency, the hard way
Minimum network latency between New York and Dublin
Distance: d = 5111.28 km
Best fiber refractive index: n = 1.5 (n = c / v)
Max speed in that fiber: v = c / n = 199,861,639 m/s
One-way latency through fiber: d / v = 25.57 ms
Theoretical floor in vacuum: d / c = 17.05 ms
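The arithmetic on this slide is short enough to check directly:

```python
C = 299_792_458   # speed of light in vacuum, m/s
N = 1.5           # refractive index of good fiber
D = 5_111_280     # New York to Dublin, metres

v = C / N                   # light speed inside the fiber
fiber_ms = D / v * 1000     # one-way latency through fiber
vacuum_ms = D / C * 1000    # theoretical floor at c

print(round(v))             # 199861639 m/s
print(round(fiber_ms, 2))   # 25.57 ms
print(round(vacuum_ms, 2))  # 17.05 ms
```

Physics alone puts a hard floor under transatlantic latency; no amount of software tuning gets below it.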
#8 DIY Cluster
Cloud? Over my dead body.
- One of our lovely customers
That moment when you realize an undersea cable broke and the cluster is down (2014)
#9 Who needs a cluster?
Most of the problems are small.
Distributed systems are hard.
#10 Small Data
Big data is an excuse,
a catalyst that improves the tools we have today