10 insane things on Big Data

Luis Belloch
September 22, 2016

Ten war stories from doing large-scale data management at Accudelta, and some approaches we took to survive the tsunami.

Transcript

  1. 10 insane
    things on
    big data
    LUIS BELLOCH 

    MONEYMATE / ACCUDELTA
    SEPT. 2016
    ETSINF UPV

  2. Since 1991, ~100 employees
    Offices in Dublin, Boston, London, New York,
    Stockholm, Milan and Valencia.
    Valencia is an engineering office only
    BlackRock, Fidelity, J.P. Morgan US, M&G, Prudential,
    Charles Schwab, Schroders, State Street, Columbia
    Threadneedle, Canada Life, IFDS, New Ireland, ...

  3. #1 Wild Data
    So… is that a bunch of Excel and
    CSV files randomly piled up?
    - Day 1, MoneyMate developer

  4. #2 Timing ⏰
    Data is inconsistent most of the time!

  5. #3 Schema Agnostic
    Every client has its own schema;

    the loading system has to be fast.

  6. #3 Schema Agnostic
    • Reduced load time from 22 h to 9 min
    • In-Memory and DB modes
    • Avoid write-locks as much as possible
    • Homeostasis: resilient/adaptive loading
    • Reactive async publishing
    LOADING
    PUBLISHING

  7. #4 Parallel Testing
    Replay one month of events in the system,

    … using two software versions,

    … then compare row by row, cell by cell.
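
A minimal sketch of the cell-by-cell comparison step, assuming each software version exports its results as rows identified by a unique key. The function name `diff_tables`, the `id` key and the sample rows are hypothetical illustrations, not the tooling used at Accudelta.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_tables(old: List[Row], new: List[Row], key: str) -> List[Tuple[object, str, object, object]]:
    """Return (row key, column, old value, new value) for every differing cell."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    diffs = []
    # Walk the union of keys so rows missing on either side are reported too.
    for k in sorted(old_by_key.keys() | new_by_key.keys()):
        a = old_by_key.get(k, {})
        b = new_by_key.get(k, {})
        for col in sorted(a.keys() | b.keys()):
            if a.get(col) != b.get(col):
                diffs.append((k, col, a.get(col), b.get(col)))
    return diffs


# Hypothetical output of the same replayed events under two software versions.
v1 = [{"id": 1, "nav": 10.5, "ccy": "EUR"}, {"id": 2, "nav": 7.0, "ccy": "USD"}]
v2 = [{"id": 1, "nav": 10.5, "ccy": "EUR"}, {"id": 2, "nav": 7.1, "ccy": "USD"}]

for row_key, column, before, after in diff_tables(v1, v2, key="id"):
    print(f"row {row_key}, column {column}: {before} -> {after}")
```

An empty result means both versions agree on the replayed data; any entry points directly at the row and column where behaviour changed.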

  8. #5 Schema Evolutions
    • ~50MB of SQL, several more CSVs
    • VCS and code review friendly
    • Test-data & container migrations
    • Forward-only, no rollbacks
    • Exercised many times per day through CI builds
    • etcd distributed locks, coordination
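
The forward-only idea can be sketched as below, assuming an ordered, append-only list of SQL migrations and a version table. SQLite stands in for the real database, which the slides do not name; the table names and statements are hypothetical. The etcd lock that would guard concurrent runs is not modelled here.

```python
import sqlite3

# Hypothetical migrations: ordered, append-only, never edited once released.
MIGRATIONS = [
    (1, "CREATE TABLE funds (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE funds ADD COLUMN currency TEXT"),
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the recorded version; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    for version, statement in MIGRATIONS:
        if version <= current:
            continue  # already applied; there is no "down" step to undo it
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        current = version
    return current


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first run applies both migrations
print(migrate(conn))  # second run finds nothing new to apply
```

Because re-running is a no-op, the same entry point can be exercised on every CI build against fresh test data.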

  9. #6a Market
    Right after Brexit, one of our clients started to
    load data on a daily basis instead of monthly.

  10. #6b Government
    The Solvency II regulation was delayed by more than 2 years

  11. #7 Latency, the hard way
    Minimum network latency between New York and Dublin

    Distance: d = 5111.28 km
    Best fiber refractive index: n = 1.5 (n = c / v)
    Max speed on that fiber: v_max = c / n = 199,861,639 m/s
    t_fiber = d / v_max ≈ 25.57 ms
    t_min = d / c ≈ 17.05 ms
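
A short sketch that recomputes these one-way latencies from the distance and refractive index given on the slide. The speed of light constant is the standard defined value; the function and variable names are illustrative, and the round-trip line is an added convenience, not a figure from the talk.

```python
C = 299_792_458.0        # speed of light in vacuum, m/s
DISTANCE_M = 5111.28e3   # New York to Dublin distance from the slide, in metres
REFRACTIVE_INDEX = 1.5   # fiber refractive index from the slide


def one_way_latency_ms(distance_m: float, speed_m_s: float) -> float:
    """Time for a signal to cover distance_m at speed_m_s, in milliseconds."""
    return distance_m / speed_m_s * 1000.0


v_fiber = C / REFRACTIVE_INDEX  # n = c / v, so v = c / n
t_min = one_way_latency_ms(DISTANCE_M, C)
t_fiber = one_way_latency_ms(DISTANCE_M, v_fiber)

print(f"v_fiber = {v_fiber:,.0f} m/s")
print(f"t_min   = {t_min:.2f} ms")
print(f"t_fiber = {t_fiber:.2f} ms")
print(f"round trip over fiber = {2 * t_fiber:.2f} ms")
```

No amount of engineering gets below `t_min`; real routes add cable detours, repeaters and switching on top of `t_fiber`.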

  12. (http://www.nanex.net/aqck2/4680.html)

  13. #8 DIY Cluster
    Cloud? Over my dead body.
    - One of our lovely customers

  14. That moment when you realize an undersea cable broke and the cluster is down (2014)

  15. #9 Who needs a cluster?
    Most of the problems are small.

    Distributed systems are hard.

  16. #10 Small Data
    Big data is an excuse,

    a catalyst for improving the tools we have today

  17. thanks!
    @luisbelloch
