Slide 1

Slide 1 text

@aviranm Handling Data in Distributed Systems: Arrested by the CAP. Aviran Mordo, VP of Engineering, Wix.com. Twitter: @aviranm, linkedin/aviran, aviransplace.com

Slide 2

Slide 2 text

@aviranm Service A Service B

Slide 3

Slide 3 text

@aviranm What is this arrow? Service A Service B

Slide 4

Slide 4 text

@aviranm Microservices = Distributed System

Slide 5

Slide 5 text

Over 800 Microservices (unique) in Production

Slide 6

Slide 6 text

Aviran Mordo, VP of Engineering, Wix.com Hello @aviranm

Slide 7

Slide 7 text

@aviranm

Slide 8

Slide 8 text

@aviranm Wix.com in Numbers
• 130M website builders (+2M monthly)
• 600M monthly visitors
• Multiple clouds & data centers (Google, Amazon)
• 2000 Employees (~50% R&D)
• 5 R&D centers: Vilnius (Lithuania), Kiev and Dnipro (Ukraine), Tel-Aviv and Be’er Sheva (Israel)
• #5 best software companies to work for worldwide (according to Glassdoor)

Slide 9

Slide 9 text

@aviranm AGENDA
• Avoiding database transactions
• Handling database schema changes
• Read consistency in a distributed system
• Dealing with multiple datacenters

Slide 10

Slide 10 text

@aviranm Avoid DB Transactions 01

Slide 11

Slide 11 text

@aviranm Create an Invoice

Slide 12

Slide 12 text

@aviranm Create an Invoice Multiple line items Header

Slide 13

Slide 13 text

@aviranm Create an Invoice Multiple line items Header Save as Transaction

Slide 14

Slide 14 text

@aviranm Create a Site Multiple Pages

Slide 15

Slide 15 text

How do we save multiple pages in a transaction (without a DB transaction)?

Slide 16

Slide 16 text

Replace DB Transaction with Logical Transaction

Slide 17

Slide 17 text

@aviranm Logical DB transaction: Saving a Wix Site’s Data
1. Save each page as an atomic operation
2. Finalize transaction by sending site header (pointers to pages)
Can generate orphaned pages, not a problem in practice
[Diagram: Browser, Editor Server, Site Pages DB, Site Header DB; flows: Save page(s), Save header (list of page IDs)]
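The two-step save on this slide can be sketched roughly as follows. This is a minimal illustration, not Wix's implementation: the in-memory dicts stand in for the Site Pages DB and the Site Header DB, and the function names and UUID-based page IDs are assumptions made for the example. What it shows is that readers only see new pages once the header write lands, so a failure between the two steps leaves orphaned pages but never a half-saved site.

```python
import uuid

# Hypothetical in-memory stand-ins for the two databases on the slide.
pages_db = {}    # page ID -> page content
headers_db = {}  # site ID -> list of page IDs (the "pointers to pages")


def save_page(content):
    """Step 1: each page is written on its own, as one atomic operation."""
    page_id = uuid.uuid4().hex  # assumed ID scheme, for illustration only
    pages_db[page_id] = content
    return page_id


def save_header(site_id, page_ids):
    """Step 2: a single header write finalizes the logical transaction."""
    headers_db[site_id] = list(page_ids)


def load_site(site_id):
    """Readers follow the header, so they only see finalized pages."""
    return [pages_db[pid] for pid in headers_db.get(site_id, [])]


def orphaned_pages():
    """Pages that no header points to (for example after a failed save)."""
    referenced = {pid for ids in headers_db.values() for pid in ids}
    return set(pages_db) - referenced
```

If the editor saves a new page and then fails before `save_header`, `load_site` still returns the previous version, and the new page simply shows up in `orphaned_pages()`.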

Slide 18

Slide 18 text

@aviranm Master-Master Replication across DCs Pages MySQL Pages MySQL MySQL Active – Active DC-2 DC-1

Slide 19

Slide 19 text

@aviranm Write Traffic may Flow to Both Datacenters Pages MySQL Pages MySQL DC-2 DC-1 Browser Browser Save page Save page

Slide 20

Slide 20 text

@aviranm Replication Conflict
Wix users change millions of pages every day.
MySQL strategy: stop replication, or ignore conflict (drop incoming)
[Diagram: Pages MySQL in DC-1 and DC-2]

Slide 21

Slide 21 text

@aviranm Avoiding Replication Conflicts
Page ID is a content-based hash:
• Immutable data
• Idempotent operation
DB conflicts can be safely ignored as content is identical
[Diagram: Pages MySQL in DC-1 and DC-2]
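To see why a content-based ID makes replication conflicts harmless, here is a small sketch. The slide does not say which hash is used, so SHA-256 is an assumption, and `PagesTable` is a hypothetical in-memory stand-in for the pages table in one datacenter. Two datacenters that receive the same page compute the same key, so when replication later reports a duplicate key, the incoming row is identical and can be dropped.

```python
import hashlib


def page_id(content: bytes) -> str:
    """Content-based ID: identical content always yields the same key.
    SHA-256 is an assumed choice; the slide only says "content-based hash"."""
    return hashlib.sha256(content).hexdigest()


class PagesTable:
    """Hypothetical stand-in for the pages table in one datacenter."""

    def __init__(self):
        self.rows = {}

    def insert(self, content: bytes) -> str:
        """Idempotent write: rows are immutable, so insert-if-absent is enough."""
        pid = page_id(content)
        self.rows.setdefault(pid, content)
        return pid

    def apply_replicated(self, pid: str, content: bytes) -> bool:
        """Apply a row arriving from the other DC.
        Returns True when the key already existed (a replication conflict)."""
        if pid in self.rows:
            return True  # drop incoming: same key implies same content
        self.rows[pid] = content
        return False
```

Saving the same page in both datacenters and then replicating in each direction produces a conflict on both sides, yet both tables end up with the same single row.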

Slide 22

Slide 22 text

@aviranm Database & Schema Changes 02

Slide 23

Slide 23 text

@aviranm No More Downtime

Slide 24

Slide 24 text

@aviranm Database Changes
1. Add Fields
2. Remove Fields
3. Complete Schema / Database Change
Altering very large tables may take a very long time and cause downtime.

Slide 25

Slide 25 text

@aviranm Database Changes
1. Add Fields
   1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
2. Remove Fields
3. Complete Schema / Database Change

Slide 26

Slide 26 text

@aviranm Database Changes
1. Add Fields
   1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
   1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
2. Remove Fields
3. Complete Schema / Database Change

Slide 27

Slide 27 text

@aviranm Database Changes
1. Add Fields
   1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
   1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
2. Remove Fields: stop using it in the code. Do not do any DB schema changes.
3. Complete Schema / Database Change

Slide 28

Slide 28 text

@aviranm Database Changes
1. Add Fields
   1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
   1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
2. Remove Fields: stop using it in the code. Do not do any DB schema changes.
3. Complete Schema / Database Change: lazy migration
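A small sketch of points 1.1 and 1.2, under stated assumptions: SQLite (from the Python standard library) stands in for MySQL so the example is self-contained, and the `products` and `product_sku` tables and all function names are hypothetical. New non-indexed fields go into the JSON blob with no ALTER TABLE on the large table, while a new searchable field lives in a separate, indexed table joined by primary key.

```python
import json
import sqlite3

# SQLite is only a stand-in here; the talk's databases are MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)"
)
# 1.2: a searchable field gets its own table, keyed by the same primary key.
conn.execute(
    "CREATE TABLE product_sku (product_id INTEGER PRIMARY KEY, sku TEXT)"
)
conn.execute("CREATE INDEX idx_product_sku ON product_sku (sku)")


def save_product(product_id, name, metadata, sku=None):
    """1.1: non-indexed fields are serialized into the JSON blob column."""
    conn.execute(
        "INSERT OR REPLACE INTO products (id, name, metadata) VALUES (?, ?, ?)",
        (product_id, name, json.dumps(metadata)),
    )
    if sku is not None:
        conn.execute(
            "INSERT OR REPLACE INTO product_sku (product_id, sku) VALUES (?, ?)",
            (product_id, sku),
        )


def load_metadata(product_id):
    """Read the blob back; fields missing from older rows are simply absent."""
    row = conn.execute(
        "SELECT metadata FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


def find_by_sku(sku):
    """Search on the indexed side table and join back by primary key."""
    return conn.execute(
        "SELECT p.id, p.name FROM products p "
        "JOIN product_sku s ON s.product_id = p.id WHERE s.sku = ?",
        (sku,),
    ).fetchone()
```

Rows written before a field existed simply lack that key in the blob, so reading code should treat every blob field as optional.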

Slide 29

Slide 29 text

@aviranm Feature Toggles

Slide 30

Slide 30 text

@aviranm Feature Toggle = Code branch
Not just a Boolean, can also be a state.
Can have criteria:
• Company employees
• Specific users / group
• Percentage of traffic
• By GEO
• By Language
• By user-agent
• User Profile based
• Any other context…
[Diagram: FT Open, New Code, Old Code]
http://github.com/wix/petri
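As a rough illustration of a toggle with criteria, the sketch below evaluates a few of the listed conditions (employees, specific users, GEO, percentage of traffic). It is not the Petri API: `Context`, `FeatureToggle`, and `serve` are hypothetical names, and the hash-based bucketing is one assumed way to keep a user in the same percentage bucket across requests.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical request context used to evaluate toggle criteria."""
    user_id: str
    is_employee: bool = False
    country: str = ""


@dataclass(frozen=True)
class FeatureToggle:
    """Hypothetical toggle; empty criteria mean "no restriction"."""
    name: str
    employees_only: bool = False
    allowed_users: frozenset = frozenset()
    countries: frozenset = frozenset()
    percentage: int = 100  # share of matching traffic that gets the new code

    def is_open(self, ctx: Context) -> bool:
        if self.employees_only and not ctx.is_employee:
            return False
        if self.allowed_users and ctx.user_id not in self.allowed_users:
            return False
        if self.countries and ctx.country not in self.countries:
            return False
        return self._bucket(ctx.user_id) < self.percentage

    def _bucket(self, user_id: str) -> int:
        """Stable bucket in 0..99, so a user always gets the same branch."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % 100


def serve(ctx: Context, toggle: FeatureToggle) -> str:
    """The toggle acts as a code branch between old and new code paths."""
    return "new code" if toggle.is_open(ctx) else "old code"
```

A toggle that carries a state rather than a Boolean could return a variant name from `is_open` instead; the Boolean form is used here only to keep the sketch short.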

Slide 31

Slide 31 text

@aviranm New DB Schema with Data Migration
• Deploy the new schema/DB
• Plan a lazy migration path controlled by a feature toggle

Slide 32

Slide 32 text

@aviranm
#1 Write to old / Read from old
#2 Write to both (first old then new) / Read from old
#3 Write to both / Read from new, fallback to old
#4 Write only to new / Read from new, fallback to old
#5 Eagerly migrate data in the background
#6 Write and read to new, remove migration code
Warning! Distributed transaction: fail on write to old, "ignore" failure on new.
Point of no return: backward compatibility is a must! Your old DB is now read-only and will not change.
http://www.aviransplace.com/2015/12/15/safe-database-migration-pattern-without-downtime/
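The six migration steps can be modeled as phases of a small wrapper around the two stores. This is a sketch under assumptions: `Phase` and `MigratingStore` are hypothetical names, plain dicts stand in for the old and new databases, and in practice the phase would be driven by a feature toggle rather than set by hand.

```python
from enum import IntEnum


class Phase(IntEnum):
    OLD_ONLY = 1              # write to old / read from old
    DUAL_WRITE_READ_OLD = 2   # write to both (old first) / read from old
    DUAL_WRITE_READ_NEW = 3   # write to both / read from new, fallback to old
    NEW_WITH_FALLBACK = 4     # write only to new / read new, fallback to old
    BACKGROUND_MIGRATION = 5  # as phase 4, while old data is copied eagerly
    NEW_ONLY = 6              # write and read new; migration code can go


class MigratingStore:
    """Hypothetical wrapper; `old` and `new` are dict-like stores."""

    def __init__(self, old, new, phase=Phase.OLD_ONLY):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        if self.phase == Phase.OLD_ONLY:
            self.old[key] = value
        elif self.phase <= Phase.DUAL_WRITE_READ_NEW:
            self.old[key] = value  # a failure here fails the whole write
            try:
                self.new[key] = value
            except Exception:
                pass  # "ignore" failure on new; a real system would log it
        else:
            self.new[key] = value  # point of no return: old is read-only

    def read(self, key):
        if self.phase <= Phase.DUAL_WRITE_READ_OLD:
            return self.old.get(key)
        if self.phase == Phase.NEW_ONLY:
            return self.new.get(key)
        value = self.new.get(key)
        return value if value is not None else self.old.get(key)

    def migrate_in_background(self):
        """Phase 5: copy rows the new store lacks, never overwriting new data."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Because reads fall back to the old store from phase 3 onward, records that were never rewritten stay reachable until the background copy in phase 5 completes.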

Slide 33

Slide 33 text

@aviranm Remove old DB https://hiveminer.com/Tags/cosplay%2Cgoldfish http://www.aviransplace.com/2015/12/15/safe-database-migration-pattern-without-downtime/

Slide 34

Slide 34 text

@aviranm Consistent Read 03

Slide 35

Slide 35 text

Store owner Glasses.com Customer

Slide 36

Slide 36 text

@aviranm Product Service Slave DB Master DB UpdateProduct(…) Save data Replicate Store owner updates a product’s details

Slide 37

Slide 37 text

@aviranm Product Service Slave DB Master DB GetProduct(…) Replicate Read data Customer wants to view a product

Slide 38

Slide 38 text

@aviranm Product Service Slave DB Master DB GetProduct(…) Replicate Read data Usually not an issue... Store owner wants to view a product for update

Slide 39

Slide 39 text

@aviranm Product Service Slave DB Master DB GetProduct(…) Replicate Read data Store owner wants to view a product for update ...unless there’s a replication lag.

Slide 40

Slide 40 text

@aviranm Separate API for consistent reads
Store owner wants to view a product for update
[Diagram: GetConsistentProduct(…), Product Service, Read data, Master DB, Replicate, Slave DB]
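A minimal sketch of the separate consistent-read API, assuming that the regular read goes to the replica and the consistent read goes to the master (the slides imply this but the flattened text does not spell it out). The dicts stand in for the Master DB and Slave DB, `replicate` is a hypothetical helper that simulates asynchronous replication, and the method names mirror the API names on the slides.

```python
class ProductService:
    """Hypothetical service; dicts stand in for the Master DB and Slave DB."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def update_product(self, product_id, data):
        """Writes always go to the master."""
        self.master[product_id] = data

    def get_product(self, product_id):
        """Default read: served from the replica, may lag behind the master."""
        return self.replica.get(product_id)

    def get_consistent_product(self, product_id):
        """Consistent read: served from the master, for read-before-update."""
        return self.master.get(product_id)


def replicate(master, replica):
    """Simulates replication catching up (asynchronous in a real system)."""
    replica.update(master)
```

Customers browsing the store can tolerate the replica's lag, so only the store owner's edit flow pays the cost of hitting the master.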

Slide 41

Slide 41 text

@aviranm Multiple Datacenters 04

Slide 42

Slide 42 text

@aviranm GetConsistentProduct(…) Multiple Data Centers Product Service Slave DB Master DB Read data Replicate DC-1 Product Service Slave DB Master DB Read data Replicate DC-2 Replicate GetConsistentProduct(…)

Slide 43

Slide 43 text

@aviranm GetConsistentProduct(…) Product Service Slave DB Master DB Read data Replicate DC-1 Product Service Slave DB Master DB Read data Replicate DC-2 Replicate GetConsistentProduct(…) Inconsistent data Cross DC Replication Lag

Slide 44

Slide 44 text

@aviranm Cross DC Flows DC-1 DC-2 Load Balancer Load Balancer Product Service Slave DB Master DB Read data Replicate Product Service Slave DB Master DB Read data Replicate Replicate

Slide 45

Slide 45 text

Option 1 Pin APIs to Active DC

Slide 46

Slide 46 text

@aviranm Master DC Configure Master DC in the LB Configure API-level Stickiness DC-1 GetConsistentProduct(…) GetConsistentProduct(…) Product Service Slave DB Master DB Read data Replicate DC-1 Product Service Slave DB Master DB Read data Replicate DC-2 Load Balancer Load Balancer Replicate

Slide 47

Slide 47 text

@aviranm Master DC
Configure Master DC in the LB
Configure API-level Stickiness
Pros:
• Fine-grained control over API
• No changes for the service
Cons:
• Complicated LB configuration
• Multiple connection strings (one for master and one for replica DB)
[Diagram: DC-1 and DC-2, each with a Load Balancer, Product Service, Master DB and Slave DB; GetConsistentProduct(…); Read data; Replicate]
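The API-level stickiness rule can be pictured as a small routing decision in the load balancer. The sketch below is only an illustration: `PINNED_APIS` and `choose_datacenter` are hypothetical names, and which APIs are pinned is an assumption (here, just the consistent read shown on the slide).

```python
# Hypothetical set of APIs that must be served by the master DC.
PINNED_APIS = frozenset({"GetConsistentProduct"})


def choose_datacenter(api_name: str, local_dc: str, master_dc: str) -> str:
    """Route pinned APIs to the master DC; everything else stays local."""
    return master_dc if api_name in PINNED_APIS else local_dc
```

With DC-1 configured as the master, a `GetConsistentProduct` call arriving at DC-2 is forwarded to DC-1, while `GetProduct` is answered locally. Every newly pinned API is another load-balancer rule, which is where the "complicated LB configuration" con comes from.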

Slide 48

Slide 48 text

Option #2 Separate read/write Services

Slide 49

Slide 49 text

@aviranm Master DC Configure Master DC in the LB Configure Service-level Stickiness DC-1 GetConsistentProduct(…) Product Write Service Slave DB Master DB Replicate DC-1 Slave DB Master DB Replicate DC-2 Replicate Load Balancer Load Balancer Product Read Service Product Write Service Product Read Service

Slide 50

Slide 50 text

@aviranm Master DC
Configure Master DC in the LB
Configure Service-level Stickiness
Pros:
• No multiple DB connection strings
• Simpler LB configuration
• Fits microservices architecture best practice
• Better for scaling read services
Cons:
• More complicated system (adding another microservice)
• Additional service for the client to talk with
[Diagram: DC-1 and DC-2, each with a Load Balancer, Product Write Service, Product Read Service, Master DB and Slave DB; GetConsistentProduct(…); Replicate]

Slide 51

Slide 51 text

Option #3 Pin DB to Service using SQLProxy

Slide 52

Slide 52 text

@aviranm Master DC Configure Master DC in the SQL Proxy DC-1 GetConsistentProduct(…) Slave DB Master DB Replicate DC-1 Slave DB Master DB Replicate DC-2 Replicate Load Balancer Load Balancer Product Service Product Service SQL Proxy SQL Proxy

Slide 53

Slide 53 text

@aviranm Master DC
Configure Master DC in the SQL Proxy
Pros:
• Simple microservice DB configuration
• DB replication lag monitoring
• Adds DB maintenance flexibility
Cons:
• Adds DB access latency
• Takes control away from the developers
[Diagram: DC-1 and DC-2, each with a Load Balancer, Product Service, SQL Proxy, Master DB and Slave DB; GetConsistentProduct(…); Replicate]

Slide 54

Slide 54 text

Option #4 Redirect Client

Slide 55

Slide 55 text

@aviranm Product Service Slave DB Master DB Read data Replicate Product Service Slave DB Master DB Read data Replicate Replicate Browser Client Routing GetProduct(…) GetProduct(…) DC-1 DC-2

Slide 56

Slide 56 text

@aviranm Master DC GetConsistentProduct(…) Product Service Slave DB Master DB Read data Replicate DC-1 Product Service Slave DB Master DB Read data Replicate DC-2 Replicate GetProduct(…) Browser Client Routing GetProduct(…)

Slide 57

Slide 57 text

@aviranm Master DC
Pros:
• Fine-grained control over API
• Simpler DC configuration
Cons:
• Complicated client configuration
• Traffic changes require updating all clients with the new config
[Diagram: Browser with Client Routing; DC-1 and DC-2, each with a Product Service, Master DB and Slave DB; GetProduct(…); GetConsistentProduct(…); Read data; Replicate]

Slide 58

Slide 58 text

@aviranm RECAP
Option 1 – API-level cross DC
Option 2 – Separate Service
Option 3 – ProxySQL (pin to DC)
Option 4 – Client routing

Slide 59

Slide 59 text

@aviranm WHAT WE DO AT WIX
Option 1 – API-level cross DC
Option 2 – Separate Service
Option 3 – ProxySQL (pin to DC)
Option 4 – Client routing

Slide 60

Slide 60 text

@aviranm Informing the users of eventual consistency processes: "Your changes are being applied, it may take a few minutes to show up on the site…"

Slide 61

Slide 61 text

@aviranm Arrow -> Distributed System
• Avoiding database transactions
• Handling database schema changes
• Read consistency in a distributed system
• Dealing with multiple datacenters

Slide 62

Slide 62 text

@aviranm Thank You twitter@aviranm linkedin/aviran aviransplace.com Download presentation at: http://wix.to/sUCZAGY