10 insane things on Big Data

Luis Belloch
September 22, 2016

Ten war stories from doing large-scale data management at Accudelta, and some approaches we took to survive the tsunami.

Transcript

  1. 10 insane things on big data. LUIS BELLOCH, MONEYMATE / ACCUDELTA. SEPT. 2016, ETSINF UPV
  2. Since 1991, ~100 employees. Offices in Dublin, Boston, London, New York, Stockholm, Milan and Valencia. Valencia is an engineering office only. Black Rock, Fidelity, J.P. Morgan US, M&G, Prudential, Charles Schwab, Schroders, State Street, Columbia Threadneedle, Canada Life, IFDS, New Ireland, ...
  3. None
  4. None
  5. #1 Wild Data: “So… is that a bunch of Excel and CSV files randomly piled up?” - Day 1, MoneyMate developer
  6. None
  7. None
  8. #2 Timing ⏰ Data is inconsistent most of the time!

  9. #3 Schema Agnostic: Every client has its own schema, and the loading system has to be fast.
  10. #3 Schema Agnostic • Reduced load time from 22 h to 9 min • In-memory and DB modes • Avoid write-locks as much as possible • Homeostasis: resilient/adaptive loading • Reactive async publishing • Stages: LOADING, PUBLISHING
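One way to picture schema-agnostic loading is to flatten every client file into (key, field, value) triples, so the loader never needs to know the columns in advance. The sketch below is only an illustration under that assumption; the slides do not describe the actual storage model, and the function name, sample files, and column names are all hypothetical.

```python
import csv
import io

def load_rows(csv_text, key_column):
    """Turn a CSV with arbitrary columns into (key, field, value) triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        key = row[key_column]
        for field, value in row.items():
            if field == key_column or value in (None, ""):
                continue  # skip the key column itself and empty cells
            triples.append((key, field, value))
    return triples

# Two hypothetical clients with unrelated schemas.
client_a = "isin,price,currency\nIE0001,10.5,EUR\n"
client_b = "fund_id,nav,as_of\nF-42,99.1,2016-09-22\n"

# Both files end up in the same shape, whatever their headers were.
store = load_rows(client_a, "isin") + load_rows(client_b, "fund_id")
for triple in store:
    print(triple)
```

The trade-off of this shape is that typing and validation move out of the loader and into whatever reads the triples later.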
  11. #4 Parallel Testing: Replay one month of events in the system … using two software versions … then compare row-by-row, cell-by-cell.
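The comparison step of parallel testing can be sketched as a keyed diff over the two outputs. This is a minimal illustration, not the tooling used at Accudelta: it assumes each version's output fits in memory as a list of dicts with a unique key column, and `diff_tables` and the sample rows are hypothetical.

```python
def diff_tables(old_rows, new_rows, key):
    """Compare two result sets row-by-row, then cell-by-cell."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    diffs = []
    for k in sorted(old_by_key.keys() | new_by_key.keys()):
        old, new = old_by_key.get(k), new_by_key.get(k)
        if old is None or new is None:
            # The row exists in only one of the two versions.
            diffs.append((k, "<row>", old, new))
            continue
        for column in sorted(old.keys() | new.keys()):
            if old.get(column) != new.get(column):
                diffs.append((k, column, old.get(column), new.get(column)))
    return diffs

# Output of the same replayed events under two software versions.
v1_output = [
    {"id": 1, "nav": 10.0, "ccy": "EUR"},
    {"id": 2, "nav": 7.5, "ccy": "USD"},
]
v2_output = [
    {"id": 1, "nav": 10.0, "ccy": "EUR"},
    {"id": 2, "nav": 7.6, "ccy": "USD"},
    {"id": 3, "nav": 1.0, "ccy": "GBP"},
]

differences = diff_tables(v1_output, v2_output, "id")
for d in differences:
    print(d)
```

An empty result means the new version reproduced the old one exactly; each reported tuple names the row key and the column that changed.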
  12. #5 Schema Evolutions • ~50MB of SQL, several more CSVs • VCS and code review friendly • Test-data & container migrations • Forward-only, no rollbacks • Exercised many times per day through CI builds • etcd distributed locks, coordination
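A forward-only migration runner can be sketched in a few lines: record the highest applied version, apply anything newer in order, and never define a downgrade path. The example below uses SQLite purely for illustration; the `schema_version` table, the `MIGRATIONS` list, and the `migrate` function are hypothetical, and the etcd-based locking mentioned on the slide is left out.

```python
import sqlite3

# Hypothetical migrations, ordered by version. In practice these would be
# SQL files kept under version control and reviewed like any other code.
MIGRATIONS = [
    (1, "CREATE TABLE funds (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE funds ADD COLUMN currency TEXT"),
]

def migrate(conn, migrations):
    """Apply every migration newer than the recorded version, in order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() returns NULL on an empty table
    applied = []
    for version, sql in sorted(migrations):
        if version <= current:
            continue  # already applied; there is no rollback path
        with conn:  # commit after each migration so progress is recorded
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        applied.append(version)
    return applied

conn = sqlite3.connect(":memory:")
first_run = migrate(conn, MIGRATIONS)   # applies both migrations
second_run = migrate(conn, MIGRATIONS)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(funds)")]
print(first_run, second_run, columns)
```

Because running it twice is harmless, a runner like this can be exercised on every CI build against test data, which is what makes the "many times per day" point practical.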
  13. #6a Market: Right after Brexit, one of our clients started to load data on a daily basis instead of monthly.
  14. #6b Government: Solvency II regulation was delayed by more than 2 years.

  15. #7 Latency, the hard way: Minimum network latency between New York and Dublin. Distance: d = 5111.28 km. Best fiber refractive index: n = 1.5 (n = c / v). Max speed in that fiber: v_max = c / n = 199,861,639 m/s. t_fiber = d / v_max = 25.57 ms. t_min = d / c = 17.05 ms.
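The arithmetic on this slide can be reproduced directly. The snippet below uses only the distance and refractive index quoted above plus the defined value of the speed of light; the variable names are illustrative, and the round-trip figure is an added convenience that simply doubles the one-way time.

```python
C_VACUUM_M_S = 299_792_458     # speed of light in vacuum (exact by definition)
DISTANCE_M = 5_111.28 * 1000   # New York to Dublin, figure from the slide
REFRACTIVE_INDEX = 1.5         # fiber refractive index quoted on the slide

v_fiber = C_VACUUM_M_S / REFRACTIVE_INDEX    # from n = c / v
t_fiber_ms = DISTANCE_M / v_fiber * 1000     # one-way time through fiber
t_min_ms = DISTANCE_M / C_VACUUM_M_S * 1000  # one-way time in vacuum
round_trip_fiber_ms = 2 * t_fiber_ms         # a request needs both directions

print(f"v_fiber    = {v_fiber:,.0f} m/s")
print(f"t_fiber    = {t_fiber_ms:.2f} ms one way")
print(f"t_min      = {t_min_ms:.2f} ms one way")
print(f"round trip = {round_trip_fiber_ms:.2f} ms in fiber")
```

These are physical lower bounds for a straight-line path; real cable routes are longer and add switching delay, so measured latency will be higher.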
  16. (http://www.nanex.net/aqck2/4680.html)

  17. #8 DIY Cluster: “Cloud? Over my dead body.” - One of our lovely customers
  18. That moment when you realize an undersea cable broke and the cluster is down (2014)
  19. #9 Who needs a cluster? Most of the problems are small. Distributed systems are hard.
  20. #10 Small Data: Big data is an excuse, a catalyst for improving the tools we have today.
  21. thanks! @luisbelloch