Upgrade to Pro — share decks privately, control downloads, hide ads and more …

"Just" shard it

"Just" shard it

Given April 15 2015 with Keyur Govande at Percona Live in Santa Clara, CA

A talk on Etsy's second take on our database sharding architecture.

Etsy started with a monolithic postgres database and moved to a horizontally sharded mysql store with lookup table and master-master replication. Scaling your data store is never as easy as just adding more hardware though, even with sharding.

In this talk, we'll go over the pains and perils of sharding databases and how we solved some of the problems in Etsy's second take on sharding, what we call logical sharding. Logical sharding is where we put several databases on one mysqld, as opposed to one database per server. Upon hitting a resource limit on a box (CPU, Memory, Connections, etc) that would normally require individual row migration, we can instead do file level migrations of a database, replicate, then drop previous databases, which is super fast and a cleaner way to delete old unnecessary shard data. We'll also go over how logical sharding benefits other daily operational challenges like schema changes and backups, as well as makes life easier for our downstream analytics consumers.

Etsy is a fast growing site with over 40 million members and 26 million items listed. Learn about Etsy's painpoints in growing a 120TB+ sharded MySQL datastore and how the architecture and tooling evolved over time.

Maggie Zhou

April 15, 2015
Tweet

More Decks by Maggie Zhou

Other Decks in Programming

Transcript

  1. “Just” shard it Logical sharding at Etsy Maggie Zhou @zmagg

    Percona Live 2015 Keyur Govande @keyurdg
  2. Overview • How did Etsy do sharding the first time?

    • What were the problems with it? • How did we solve these problems?
  3. Migrations were • locked shops & users out of changes

    for up to hours • error prone • arbitrary • developers had to be aware • created orphaned data • slow to administer
  4. Migrations were Error prone? We can fix the errors! We

    can make the script more robust!! Arbitrary? We can write tooling that figures out which rows are right to migrate off for optimal balance!! Developers had to be aware? We can write better interfaces!
  5. Orphaned data? • Deletes are expensive, so we didn’t do

    them. • Migrations created orphaned data on old pairs that were still picked up by full table scans (downstream systems: search, analytics). !
  6. Migrations were • error prone • arbitrary • developers had

    to be aware • created orphaned data • locked shops & users out of changes for up to hours • slow to administer
  7. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  8. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  9. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  10. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  11. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  12. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  13. Etsy%App Shard%1% Shard%2% Shard%3% Shard%4 db_pair_1 Shard%1% Shard%2% Shard%3% Shard%4

    A B Shard%1% Shard%2% Shard%3% Shard%4 db_pair_31 Shard%1% Shard%2% Shard%3% Shard%4 A B
  14. Etsy%App Shard%1% Shard%2 db_pair_1 Shard%1% Shard%2 A B ! !

    Shard%3% Shard%4 db_pair_31 ! ! Shard%3% Shard%4 A B
  15. Nice side effects! • schema changes (alters) now run in

    parallel, significantly faster! • downstream analytics systems replicate faster at the shard- by-shard level! • no more orphaned data! • multi-threaded replication w/ 5.6 • Index files can be shipped
  16. Summary • How did Etsy do sharding the first time?

    • What were the problems with it? • How did we solve these problems?
  17. Resources • Using a tickets database for ID generation: http://code.flickr.net/2010/02/08/ticket-servers-distributed-

    unique-primary-keys-on-the-cheap/ • Using master-master replication: https://codeascraft.com/2012/04/20/two-sides-for-salvation/ • Etsy’s shard architecture, the 2012 edition: http://www.percona.com/live/mysql-conference-2012/sessions/ etsy-shard-architecture-starts-s-and-ends-hard • Scaling Etsy, 2011: http://surge.omniti.com/2011/speakers/ross-snyder • Instagram’s shard & id architecture: http://instagram-engineering.tumblr.com/post/10853187575/sharding- ids-at-instagram • Scaling Pinterest: https://speakerdeck.com/yashh/scaling-pinterest
  18. “Just” shard it Logical sharding at Etsy Maggie Zhou @zmagg

    Percona Live 2015 Keyur Govande @keyurdg