Handling data in distributed systems

Aviran Mordo
September 27, 2018

Some of the patterns Wix.com uses to handle data in a distributed system.


Transcript

  1. @aviranm Aviran Mordo, VP of Engineering, Wix.com. Handling Data in Distributed Systems: Arrested by the CAP. Twitter: @aviranm, linkedin/aviran, aviransplace.com
  2. Service A → Service B
  3. What is this arrow? Service A → Service B
  4. Microservices = Distributed System
  5. Over 800 Microservices (unique) in Production
  6. Hello: Aviran Mordo, VP of Engineering, Wix.com
  7. @aviranm
  8. Wix.com in Numbers
    • 130M website builders (+2M monthly)
    • 600M monthly visitors
    • Multiple clouds & data centers (Google, Amazon)
    • 2000 employees (~50% R&D)
    • 5 R&D centers: Vilnius (Lithuania); Kiev, Dnipro (Ukraine); Tel-Aviv, Be’er Sheva (Israel)
    • #5 among the best software companies to work for worldwide (according to Glassdoor)
  9. AGENDA
    • Avoiding database transactions
    • Handling database schema changes
    • Read consistency in a distributed system
    • Dealing with multiple datacenters
  10. 01 Avoid DB Transactions
  11. Create an Invoice
  12. Create an Invoice: Header, Multiple line items
  13. Create an Invoice: Header, Multiple line items. Save as Transaction
  14. Create a Site: Multiple Pages
  15. How do we save multiple pages in a transaction (without a DB transaction)?
  16. Replace DB Transaction with Logical Transaction
  17. Logical DB transaction: Saving a Wix Site’s Data
    1. Save each page as an atomic operation
    2. Finalize transaction by sending site header (pointers to pages)
    Can generate orphaned pages, not a problem in practice
    Diagram: Browser → Editor Server: Save page(s), Save header (list of page IDs); Editor Server → Site Pages DB: Save page(s); Editor Server → Site Header DB: Save header
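
A minimal Python sketch of the logical transaction described on slide 17, assuming two in-memory dicts stand in for the Site Pages DB and the Site Header DB. The names `pages_db`, `headers_db`, `save_site`, and `load_site` are hypothetical and do not come from the talk.

```python
from typing import Dict, List

pages_db: Dict[str, dict] = {}         # stand-in for the Site Pages DB
headers_db: Dict[str, List[str]] = {}  # stand-in for the Site Header DB


def save_site(site_id: str, pages: Dict[str, dict]) -> None:
    """Save every page on its own, then commit by writing the header."""
    for page_id, content in pages.items():
        pages_db[page_id] = content  # step 1: each page is one atomic write
    # Step 2: the header write is the commit point. If the process dies
    # before this line, the new pages are orphans and the previous header
    # still describes a complete site.
    headers_db[site_id] = list(pages)


def load_site(site_id: str) -> List[dict]:
    """Readers only follow the header, so they never see a partial save."""
    return [pages_db[page_id] for page_id in headers_db[site_id]]
```

An orphaned page (a row in `pages_db` that no header points to) costs storage but is never read, which matches the slide's remark that orphans are not a problem in practice.
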
  18. Master-Master Replication across DCs
    Diagram: Pages MySQL in DC-1 and Pages MySQL in DC-2, MySQL Active-Active
  19. Write Traffic may Flow to Both Datacenters
    Diagram: one Browser saves a page to Pages MySQL in DC-1, another Browser saves a page to Pages MySQL in DC-2
  20. Replication Conflict
    MySQL strategy: stop replication or ignore conflict (drop incoming)
    Wix users change millions of pages every day.
    Diagram: Pages MySQL in DC-1, Pages MySQL in DC-2
  21. Avoiding Replication Conflicts
    Page ID is a content-based hash:
    • Immutable data
    • Idempotent operation
    DB conflicts can be safely ignored as content is identical
    Diagram: Pages MySQL in DC-1, Pages MySQL in DC-2
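
A small Python sketch of why content-based IDs make replication conflicts harmless, assuming SHA-256 as the hash (the talk does not name one) and plain dicts as the two Pages databases. `page_id`, `save_page`, and `replicate` are illustrative names only.

```python
import hashlib


def page_id(content: bytes) -> str:
    """Derive the page ID from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def save_page(db: dict, content: bytes) -> str:
    """Insert-only write: pages are immutable, so an existing row never changes."""
    pid = page_id(content)
    db.setdefault(pid, content)
    return pid


def replicate(source: dict, target: dict) -> int:
    """Copy rows to the other DC and drop incoming rows whose key already exists.

    Returns the number of conflicts that were ignored.
    """
    ignored = 0
    for pid, content in source.items():
        if pid in target:
            ignored += 1  # same ID means same content, so nothing is lost
        else:
            target[pid] = content
    return ignored
```

Saving the same page in both datacenters yields the same key on both sides, so "ignore conflict (drop incoming)" leaves both copies identical.
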
  22. 02 Database & Schema Changes
  23. No More Downtime
  24. Database Changes
    1. Add Fields
    2. Remove Fields
    3. Complete Schema / Database Change
    Altering very large tables may take a very long time and cause downtime.
  25. Database Changes
    1. Add Fields
      1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
    2. Remove Fields
    3. Complete Schema / Database Change
  26. Database Changes
    1. Add Fields
      1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
      1.2. If the fields are searchable (indexed): use another table and join by primary key.
    2. Remove Fields
    3. Complete Schema / Database Change
  27. Database Changes
    1. Add Fields
      1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
      1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
    2. Remove Fields: stop using it in the code. Do not do any DB schema changes.
    3. Complete Schema / Database Change
  28. Database Changes
    1. Add Fields
      1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
      1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
    2. Remove Fields: stop using it in the code. Do not do any DB schema changes.
    3. Complete Schema / Database Change: lazy migration
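
A hedged Python sketch of rules 1.1 and 1.2, using the standard-library `sqlite3` module in place of MySQL. The `products` and `product_sku` tables, the `sku` field, and both helper functions are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, data TEXT)")
# 1.2: a searchable field lives in its own indexed table, joined by primary key.
conn.execute("CREATE TABLE product_sku (product_id INTEGER PRIMARY KEY, sku TEXT)")
conn.execute("CREATE INDEX idx_sku ON product_sku (sku)")


def save_product(product_id, name, metadata, sku=None):
    # 1.1: non-indexed metadata goes into the JSON blob column,
    # so a new key never needs an ALTER TABLE on the large table.
    conn.execute(
        "INSERT INTO products (id, name, data) VALUES (?, ?, ?)",
        (product_id, name, json.dumps(metadata)),
    )
    if sku is not None:
        conn.execute(
            "INSERT INTO product_sku (product_id, sku) VALUES (?, ?)",
            (product_id, sku),
        )


def find_by_sku(sku):
    row = conn.execute(
        "SELECT p.id, p.name, p.data FROM products p "
        "JOIN product_sku s ON s.product_id = p.id WHERE s.sku = ?",
        (sku,),
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1], **json.loads(row[2])}
```

Removing a field under this scheme means the code simply stops reading that key; the stored rows are left alone.
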
  29. Feature Toggles
  30. Feature Toggle = Code branch
    Not just a Boolean, can also be a state. Can have criteria:
    • Company employees
    • Specific users / group
    • Percentage of traffic
    • By GEO
    • By Language
    • By user-agent
    • User Profile based
    • Any other context…
    Diagram: FT open? New Code, otherwise Old Code
    http://github.com/wix/petri
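
A minimal Python sketch of a feature toggle with three of the criteria listed on slide 30 (employees, specific users, percentage of traffic). This is not Petri's API; `Toggle`, `is_open`, and `render_editor` are hypothetical names, and hashing the user ID into a bucket is an assumed way to split traffic.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Toggle:
    name: str
    open_to_employees: bool = False
    allowed_users: set = field(default_factory=set)
    percentage: int = 0  # share of traffic that gets the new code, 0..100

    def is_open(self, user_id: str, is_employee: bool = False) -> bool:
        if user_id in self.allowed_users:
            return True
        if self.open_to_employees and is_employee:
            return True
        # Hash the toggle name and user ID into a stable bucket 0..99 so the
        # same user always lands on the same side of the branch.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest, 16) % 100 < self.percentage


def render_editor(toggle: Toggle, user_id: str, is_employee: bool = False) -> str:
    # The toggle is a code branch: both paths stay deployed side by side.
    if toggle.is_open(user_id, is_employee):
        return "new code"
    return "old code"
```
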
  31. New DB Schema with Data Migration
    • Deploy the new schema/DB
    • Plan a lazy migration path controlled by feature toggle
  32. Migration steps
    Write to old / Read from old
    #1 Backward compatibility is a must!
    #2 Write to both (first old then new) / Read from old
    #3 Write to both / Read from new, fallback to old
    #4 Write only to new / Read from new, fallback to old
    #5 Eagerly migrate data in the background
    #6 Write and read to new, remove migration code
    Callout: Distributed transaction. Fail on write to old, "ignore" failure on new.
    Callout: Point of No Return warning! Your old DB is now read-only and will not change.
    http://www.aviransplace.com/2015/12/15/safe-database-migration-pattern-without-downtime/
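
A Python sketch of the migration phases on slide 32, assuming dicts stand in for the old and new databases and that the phase would be switched by a feature toggle. `Phase` and `MigratingStore` are hypothetical. Applying "fail on write to old, ignore failure on new" to the dual-write phases is my reading of the slide, not something the transcript states outright.

```python
from enum import Enum


class Phase(Enum):
    OLD_ONLY = "write old / read old"
    DUAL_READ_OLD = "write both / read old"
    DUAL_READ_NEW = "write both / read new, fall back to old"
    NEW_FALLBACK = "write new / read new, fall back to old"
    NEW_ONLY = "write new / read new"


class MigratingStore:
    def __init__(self, old: dict, new: dict, phase: Phase):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        dual = (Phase.DUAL_READ_OLD, Phase.DUAL_READ_NEW)
        if self.phase is Phase.OLD_ONLY or self.phase in dual:
            self.old[key] = value  # an error here propagates and fails the write
        if self.phase in dual:
            try:
                self.new[key] = value
            except Exception:
                pass  # assumption: failures on the new DB are ignored while dual-writing
        elif self.phase in (Phase.NEW_FALLBACK, Phase.NEW_ONLY):
            self.new[key] = value  # old DB is no longer written

    def read(self, key):
        if self.phase in (Phase.OLD_ONLY, Phase.DUAL_READ_OLD):
            return self.old.get(key)
        if self.phase is Phase.NEW_ONLY:
            return self.new.get(key)
        # Lazy path: prefer the new DB, fall back to the old one.
        if key in self.new:
            return self.new[key]
        return self.old.get(key)

    def migrate_in_background(self):
        """Eager migration: copy what is missing, never overwrite newer data."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```
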
  33. Remove old DB
    Image: https://hiveminer.com/Tags/cosplay%2Cgoldfish
    http://www.aviransplace.com/2015/12/15/safe-database-migration-pattern-without-downtime/
  34. 03 Consistent Read
  35. Store owner, Glasses.com, Customer
  36. Store owner updates a product’s details
    Diagram: UpdateProduct(…) → Product Service → Save data → Master DB → Replicate → Slave DB
  37. Customer wants to view a product
    Diagram: GetProduct(…) → Product Service, which reads data from the Slave DB; Master DB replicates to Slave DB
  38. Store owner wants to view a product for update
    Usually not an issue...
    Diagram: GetProduct(…) → Product Service, which reads data from the Slave DB; Master DB replicates to Slave DB
  39. Store owner wants to view a product for update
    ...unless there’s a replication lag.
    Diagram: GetProduct(…) → Product Service, which reads data from the Slave DB; Master DB replicates to Slave DB
  40. Store owner wants to view a product for update
    Separate API for consistent reads
    Diagram: GetConsistentProduct(…) → Product Service, which reads data from the Master DB; Master DB replicates to Slave DB
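
A short Python sketch of the separate consistent-read API from slide 40, assuming two dicts stand in for the master and replica and that an explicit `replicate()` call models asynchronous replication. The class and method names mirror the slide's `GetProduct` / `GetConsistentProduct` but are otherwise hypothetical.

```python
class ProductService:
    def __init__(self):
        self.master = {}
        self.replica = {}

    def update_product(self, product_id, details):
        self.master[product_id] = dict(details)  # writes always go to the master

    def replicate(self):
        """Stand-in for asynchronous replication; until it runs, the replica lags."""
        self.replica = {pid: dict(d) for pid, d in self.master.items()}

    def get_product(self, product_id):
        """Cheap read for customers: may be stale during replication lag."""
        return self.replica.get(product_id)

    def get_consistent_product(self, product_id):
        """Read for the store owner's edit flow: always sees the latest write."""
        return self.master.get(product_id)
```

Keeping the two reads as separate APIs lets most traffic stay on the replica while only the edit flow pays for a master read.
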
  41. 04 Multiple Datacenters
  42. Multiple Data Centers
    Diagram: GetConsistentProduct(…) arrives at DC-1 and at DC-2; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  43. Cross DC Replication Lag: Inconsistent data
    Diagram: GetConsistentProduct(…) arrives at DC-1 and at DC-2; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  44. Cross DC Flows
    Diagram: DC-1 and DC-2 each have a Load Balancer, a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  45. Option #1 Pin APIs to Active DC
  46. Master DC: DC-1
    Configure Master DC in the LB. Configure API-level Stickiness.
    Diagram: GetConsistentProduct(…) arrives at the Load Balancer of DC-1 and of DC-2; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  47. Master DC: DC-1
    Configure Master DC in the LB. Configure API-level Stickiness.
    Diagram: GetConsistentProduct(…) arrives at the Load Balancer of DC-1 and of DC-2; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
    Pros:
    • Fine-grained control over API
    • No changes for the service
    Cons:
    • Complicated LB configuration
    • Multiple connection strings (one for master and one for replica DB)
  48. Option #2 Separate read/write Services
  49. Master DC: DC-1
    Configure Master DC in the LB. Configure Service-level Stickiness.
    Diagram: GetConsistentProduct(…); DC-1 and DC-2 each have a Load Balancer, a Product Write Service, a Product Read Service, a Master DB and a Slave DB (Replicate), with replication between the DCs
  50. Master DC: DC-1
    Configure Master DC in the LB. Configure Service-level Stickiness.
    Diagram: GetConsistentProduct(…); DC-1 and DC-2 each have a Load Balancer, a Product Write Service, a Product Read Service, a Master DB and a Slave DB (Replicate), with replication between the DCs
    Pros:
    • No multiple DB connection strings
    • Simpler LB configuration
    • Fits microservices architecture best practice
    • Better for scaling read services
    Cons:
    • More complicated system (adding another microservice)
    • Additional service for the client to talk with
  51. Option #3 Pin DB to Service using SQLProxy
  52. Master DC: DC-1
    Configure Master DC in the SQL Proxy.
    Diagram: GetConsistentProduct(…); DC-1 and DC-2 each have a Load Balancer, a Product Service, a SQL Proxy, a Master DB and a Slave DB (Replicate), with replication between the DCs
  53. Master DC: DC-1
    Configure Master DC in the SQL Proxy.
    Diagram: GetConsistentProduct(…); DC-1 and DC-2 each have a Load Balancer, a Product Service, a SQL Proxy, a Master DB and a Slave DB (Replicate), with replication between the DCs
    Pros:
    • Simple microservice DB configuration
    • DB replication lag monitoring
    • Adds DB maintenance flexibility
    Cons:
    • Adds DB access latency
    • Takes control away from the developers
  54. Option #4 Redirect Client
  55. Client Routing
    Diagram: the Browser sends GetProduct(…) to DC-1 and to DC-2; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  56. Client Routing with a Master DC
    Diagram: the Browser sends GetProduct(…) to DC-1 and to DC-2, and GetConsistentProduct(…) to the Master DC; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
  57. Client Routing with a Master DC
    Diagram: the Browser sends GetProduct(…) to DC-1 and to DC-2, and GetConsistentProduct(…) to the Master DC; each DC has a Product Service (Read data), a Master DB and a Slave DB (Replicate), with replication between the DCs
    Pros:
    • Fine-grained control over API
    • Simpler DC configuration
    Cons:
    • Complicated client configuration
    • Traffic changes need to update all clients with new config
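
A rough Python sketch of client-side routing as in Option #4, assuming the client holds a small routing config. The config shape, the `example.com` endpoints, and `resolve_url` are all invented for illustration; the talk does not show the real client configuration.

```python
ROUTING_CONFIG = {
    "master_dc": "dc-1",
    "endpoints": {
        "dc-1": "https://dc1.example.com",  # placeholder hosts
        "dc-2": "https://dc2.example.com",
    },
    # APIs that must always hit the master DC to avoid cross-DC replication lag.
    "consistent_apis": {"GetConsistentProduct"},
}


def resolve_url(api_name: str, nearest_dc: str, config: dict = ROUTING_CONFIG) -> str:
    """Pick the DC for a call: the master DC for consistent APIs, else the nearest."""
    if api_name in config["consistent_apis"]:
        dc = config["master_dc"]
    else:
        dc = nearest_dc
    return f'{config["endpoints"][dc]}/{api_name}'
```

The cons on slide 57 show up directly here: moving the master DC means shipping a new `ROUTING_CONFIG` to every client.
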
  58. RECAP
    • Option 1: API-level cross DC
    • Option 2: Separate Service
    • Option 3: ProxySQL (pin to DC)
    • Option 4: Client routing
  59. WHAT WE DO AT WIX
    • Option 1: API-level cross DC
    • Option 2: Separate Service
    • Option 3: ProxySQL (pin to DC)
    • Option 4: Client routing
  60. Informing the users of eventual consistency processes
    "Your changes are being applied, it may take a few minutes to show up on the site…"
  61. Arrow -> Distributed System
    • Avoiding database transactions
    • Handling database schema changes
    • Read consistency in a distributed system
    • Dealing with multiple datacenters
  62. Thank You. Twitter: @aviranm, linkedin/aviran, aviransplace.com. Download presentation at: http://wix.to/sUCZAGY