PyBay
August 20, 2016

2016 - Trisha Kothari - Data in a dynamic system: Strategies for backwards compatibility

Description
There are several unanswered questions in deploying huge schema or logic changes: How do you modify systems with zero downtime or service interruption? How do you optimize online data migrations to allow for fallbacks? Any changes in schema or code in dynamic systems may cause existing users to experience downtime. The talk focuses on strategies to ensure backwards compatibility and prevent breaking data integrity.

Abstract
In an ideal scenario, feature development is easy. Just replace the old code with new code, and you're done. This is, in fact, true for a system in a state of inertia. However, in a dynamic system, with constantly moving pieces of business logic, this presents a hard problem. There are several unanswered questions when deploying huge schema or logic changes: How do you make code and schema changes with zero downtime or service interruption? How do you optimize online migrations of data to allow for fallbacks? Any changes in schema or code in dynamic systems may cause existing users to experience downtime. The talk focuses on strategies to ensure backwards compatibility and prevent breaking data integrity.

Bio
Trisha works as a Software Engineer at Affirm, a take on modern banking started by Max Levchin. At Affirm, Trisha has worked on several projects, including the creation of the underlying financial system, the architecture of systems for underwriting data processing, and various product features. She graduated from the University of Pennsylvania, where she studied Computer Science.

Transcript

  1. What is a loan? Movement of money in a ledger

    Double-entry accounting system:

      id  balance    amount  mate id
      1   cash          100  2
      2   principal    -100  1
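The two ledger rows on this slide can be modeled as mated entries: each entry points at its offsetting partner, and the amounts of a pair cancel. A minimal sketch, assuming integer amounts and sequential ids; `Entry`, `record_loan`, and `check_integrity` are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    id: int
    balance: str  # which balance the entry moves, e.g. "cash" or "principal"
    amount: int   # signed; the two entries of a pair have opposite signs
    mate_id: int  # id of the offsetting entry


def record_loan(entries: list, amount: int) -> list:
    # Append a mated pair mirroring the slide: +amount on cash, -amount on principal.
    # Assumes ids are sequential, so the next id is len(entries) + 1.
    first = len(entries) + 1
    return entries + [
        Entry(first, "cash", amount, first + 1),
        Entry(first + 1, "principal", -amount, first),
    ]


def check_integrity(entries: list) -> bool:
    # Every entry needs a mate that points back at it and exactly offsets it.
    by_id = {e.id: e for e in entries}
    for e in entries:
        mate = by_id.get(e.mate_id)
        if mate is None or mate.mate_id != e.id or mate.amount != -e.amount:
            return False
    return True
```

Because every entry must be balanced by its mate, a half-written pair (for example, after a crash mid-migration) is detectable with a single pass over the ledger.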
  2. Dynamic systems
    • Service-level changes
    • Changes to data at rest
    DATA INTEGRITY IS OF PARAMOUNT IMPORTANCE!
  3. Changes to service-level code
    • Conditionals
      ◦ if <Condition1>:
            Treatment1()
        else:
            Treatment2()
      ◦ Messy code :(
    • API versioning
    • Deploying new dependencies first
      ◦ Optional arguments
      ◦ Make sure results can be consumed by the caller
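The conditional and optional-argument ideas on this slide can be sketched in Python. All names here (`condition_1`, `treatment_1`, `treatment_2`, `quote`, and the `currency` parameter) are hypothetical, and the rollout rule is only a stand-in for whatever flag a real system would check.

```python
from typing import Optional


def condition_1(user_id: int) -> bool:
    # Hypothetical rollout rule standing in for <Condition1>, e.g. a feature flag.
    return user_id % 2 == 0


def treatment_1(amount: int) -> dict:
    # Old behavior.
    return {"amount": amount}


def treatment_2(amount: int) -> dict:
    # New behavior: returns a superset of treatment_1's keys, so existing
    # callers that only read "amount" can still consume the result.
    return {"amount": amount, "fee": amount // 100}


def quote(user_id: int, amount: int, currency: Optional[str] = None) -> dict:
    # `currency` is a new optional argument: callers deployed before it
    # existed simply omit it and get the default.
    result = treatment_2(amount) if condition_1(user_id) else treatment_1(amount)
    result["currency"] = currency if currency is not None else "USD"
    return result
```

Keeping the branch in one function means the old path can be deleted in a single place once every user is on the new treatment.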
  4. Why is dealing with data at rest hard?
    Data is big! Data is dumb! How do you get backwards compatibility?
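Because stored data cannot adapt itself, one common pattern (assumed here, not taken from the slides) is to have new code write only the new format while reading both the old and new formats until every row has been migrated. The payment schema below, with `amount` as a dollar string in old rows and integer `amount_cents` plus `schema_version` in new rows, is hypothetical.

```python
from decimal import Decimal


def write_payment(payment_id: int, amount_cents: int) -> dict:
    # New code only ever writes the new shape.
    return {"id": payment_id, "schema_version": 2, "amount_cents": amount_cents}


def read_payment(row: dict) -> dict:
    # Rows written before the change have no schema_version and store
    # `amount` as a dollar string; newer rows store integer cents.
    if row.get("schema_version", 1) >= 2:
        cents = row["amount_cents"]
    else:
        # Decimal avoids float rounding when converting dollars to cents.
        cents = int(Decimal(row["amount"]) * 100)
    return {"id": row["id"], "amount_cents": cents}
```

Since readers accept both shapes, the backfill of old rows can run at its own pace, and rolling back the writer does not strand any data.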
  5. Luigi
    • Open-sourced in late 2012
    • Awesome for batch jobs
    • Not for replacing Hive or Pig
    • Used by Spotify, Foursquare, Stripe, Affirm, hotels.com, etc.
    Why Luigi for data migration?
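Luigi treats a task as complete once its output target exists, so rerunning a failed batch job skips the work that already finished. The stdlib-only sketch below imitates that idea for a chunked data migration; it does not use Luigi's API, and `migrate_batch`, `run_migration`, the marker-file naming, and the `migrated` field are all hypothetical.

```python
import os
import tempfile


def migrate_batch(rows: list, start: int, end: int, out_dir: str) -> bool:
    # A batch counts as done once its marker file exists (the Luigi-style
    # "output exists" check). Returns True if work was done, False if skipped.
    marker = os.path.join(out_dir, f"batch_{start}_{end}.done")
    if os.path.exists(marker):
        return False
    for row in rows[start:end]:
        row["migrated"] = True  # hypothetical transformation
    # Write to a temp file and rename it, so a crash never leaves a
    # half-written marker that would wrongly mark the batch as complete.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    with os.fdopen(fd, "w") as f:
        f.write(str(end - start))
    os.replace(tmp_path, marker)
    return True


def run_migration(rows: list, batch_size: int, out_dir: str) -> int:
    # Returns how many batches actually ran on this invocation.
    ran = 0
    for start in range(0, len(rows), batch_size):
        end = min(start + batch_size, len(rows))
        if migrate_batch(rows, start, end, out_dir):
            ran += 1
    return ran
```

Running `run_migration` a second time over the same directory does nothing, which is the property that makes a long migration safe to restart after a failure.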
  6. Alembic: database migration tool for SQLAlchemy
    • Autogenerate:
      ◦ Easy!
      ◦ Gotcha: renaming a column ⇒ removal of the old column and addition of a new one
      ◦ Another gotcha: “Multiple heads not supported”
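The rename gotcha can be reproduced with the standard-library `sqlite3` module. The `account` table and its columns are hypothetical; the rename requires SQLite 3.25 or newer, and the drop step only runs on SQLite 3.35+, where `DROP COLUMN` is available. In an Alembic migration script, the data-preserving version is typically hand-written as `op.alter_column("account", "name", new_column_name="full_name")` in place of the generated `op.drop_column` / `op.add_column` pair.

```python
import sqlite3


def full_names_after_migration(rename: bool) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO account (name) VALUES ('alice'), ('bob')")
    if rename:
        # Hand-written migration: rename in place, keeping the existing values.
        conn.execute("ALTER TABLE account RENAME COLUMN name TO full_name")
    else:
        # What autogenerate infers: a new column appeared and an old one vanished.
        conn.execute("ALTER TABLE account ADD COLUMN full_name TEXT")
        if sqlite3.sqlite_version_info >= (3, 35, 0):
            conn.execute("ALTER TABLE account DROP COLUMN name")
    rows = conn.execute("SELECT full_name FROM account ORDER BY id").fetchall()
    conn.close()
    return [value for (value,) in rows]
```

The add/drop path leaves `full_name` empty for every existing row, and once the old column is dropped the original values are gone.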