Challenges when scaling: continued adventures in Swift's sharding

A talk on the progress of sharding Swift's container metadata layer, which will allow a huge number of objects to be stored in a single OpenStack Swift container.

mattoliverau

January 18, 2017

Transcript

  1. (Slide 2) Who am I • Software Developer at Rackspace Australia • OpenStack Swift Core • Been attending LCA since Melbourne 2008
  2. (Slide 3) Overview • Swift overview • Problem description • Journey so far – POC 1 - 3 • Reaching the technical limit on some technology choices – POC 4 • Eventual consistency strikes – POC 5
  3. (Slide 4) Swift overview - Accounts, Containers and Objects • In essence there are 3 resources in Swift: the account, the container and the object. • GET /v1/AUTH_matt/LCA2017/example.png
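
The request path on the slide can be read as version / account / container / object. A minimal sketch of that split, where `SwiftPath` and `split_swift_path` are hypothetical names used only for illustration and are not part of Swift:

```python
from typing import NamedTuple, Optional


class SwiftPath(NamedTuple):
    """Hypothetical holder for the pieces of a Swift-style request path."""
    version: str
    account: str
    container: Optional[str]
    obj: Optional[str]


def split_swift_path(path: str) -> SwiftPath:
    """Split '/<version>/<account>/<container>/<object>' into its parts.

    Object names may contain '/', so only the first three separators
    are treated as structural.
    """
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a Swift-style path: {path!r}")
    container = parts[2] if len(parts) > 2 and parts[2] else None
    obj = parts[3] if len(parts) > 3 and parts[3] else None
    return SwiftPath(parts[0], parts[1], container, obj)


example = split_swift_path("/v1/AUTH_matt/LCA2017/example.png")
print(example.account, example.container, example.obj)
```

Here `AUTH_matt` is the account, `LCA2017` the container and `example.png` the object.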
  4. (Slide 10) Journey so far, or recap from Act 1 • Proofs of concept: 1. Object hashing: shard_no = md5(object) >> shard_power
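
The object-hashing formula can be sketched as below. The slide shifts by `shard_power` directly; this sketch instead keeps the top `shard_power` bits of the first 32 bits of the digest (a shift of `32 - shard_power`), which is an interpretation and not something the slide confirms. The function name `shard_for` is hypothetical.

```python
import hashlib
import struct


def shard_for(object_name: str, shard_power: int) -> int:
    """Map an object name to one of 2**shard_power shards (assumed meaning)."""
    if not 0 <= shard_power <= 32:
        raise ValueError("shard_power must be between 0 and 32")
    digest = hashlib.md5(object_name.encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as a big-endian unsigned int.
    top32 = struct.unpack_from(">I", digest)[0]
    # Keep only the top shard_power bits.
    return top32 >> (32 - shard_power)


shard = shard_for("example.png", 4)
print(shard)
```

Hashing spreads names evenly, but it also scatters names that sort next to each other, so an ordered container listing would have to consult every shard.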
  5. (Slide 11) Journey so far, or recap from Act 1 • Proofs of concept: 1. Object hashing 2. Distributed prefix trees
  6. (Slide 12) Journey so far, or recap from Act 1 • Proofs of concept: 1. Object hashing 2. Distributed prefix trees 3. Pivot Points/Split tree
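
One way to picture the pivot-point approach is a sorted list of pivot names that cuts the namespace into ranges. This is a sketch under the assumption that shard `i` holds every name greater than the previous pivot and less than or equal to pivot `i`; the slides do not spell out the boundary rule, and `shard_index` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Sequence


def shard_index(pivots: Sequence[str], object_name: str) -> int:
    """Return the index of the range that object_name falls into.

    `pivots` must be sorted. n pivots produce n + 1 ranges; the last
    range holds every name greater than the final pivot.
    """
    return bisect_left(pivots, object_name)


pivots = ["g", "p"]  # illustrative pivot names
for name in ("apple", "g", "kiwi", "zebra"):
    print(name, shard_index(pivots, name))
```

Unlike hashing, this keeps neighbouring names in the same range, so an ordered listing can walk the ranges in sequence.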
  7. (Slide 13) POC #3 - What went wrong • Finding the pivot point was very simple, but it doesn't scale: SELECT name FROM object WHERE deleted=0 LIMIT 1 OFFSET (SELECT object_count / 2 FROM policy_stat);
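
The pivot query can be tried against a toy database. The schema below is a heavily simplified stand-in for a container database (only the columns the query touches), and the `ORDER BY name` is added here so the result is deterministic; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (name TEXT PRIMARY KEY, deleted INTEGER DEFAULT 0);
    CREATE TABLE policy_stat (object_count INTEGER);
""")

# Nine illustrative object names: obj-000 .. obj-008.
names = [f"obj-{i:03d}" for i in range(9)]
conn.executemany("INSERT INTO object (name) VALUES (?)", [(n,) for n in names])
conn.execute("INSERT INTO policy_stat VALUES (?)", (len(names),))

# Skip half of the live rows and take the next name as the pivot.
pivot = conn.execute("""
    SELECT name FROM object WHERE deleted = 0
    ORDER BY name
    LIMIT 1 OFFSET (SELECT object_count / 2 FROM policy_stat)
""").fetchone()[0]
print(pivot)  # obj-004
```

With nine rows the offset is 4 (integer division), so the fifth name in sorted order becomes the pivot.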
  8. (Slide 14) POC #3 - What's wrong with OFFSET? • In SQLite, OFFSET under the hood really is just: LIMIT x OFFSET y becomes LIMIT x + y (and discard the first y values).
  9. (Slide 15) POC #3 - What's wrong with OFFSET? • In SQLite, OFFSET under the hood really is just: LIMIT 1 OFFSET 350 000 000 becomes LIMIT 1 + 350 000 000 (and discard the first 350 million values).
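
The cost described on these two slides can be mimicked in plain Python. This is only an analogy for the behaviour, not SQLite's implementation, and `limit_offset` is a hypothetical helper: it steps through `offset` rows, throws them away, and only then collects `limit` rows, counting every row it touches.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def limit_offset(rows: Iterable[str], limit: int,
                 offset: int) -> Tuple[List[str], int]:
    """Return (selected rows, number of rows visited)."""
    visited = 0

    def counting():
        nonlocal visited
        for row in rows:
            visited += 1
            yield row

    it = counting()
    for _ in islice(it, offset):
        pass  # discard the first `offset` rows, but still pay to read them
    return list(islice(it, limit)), visited


rows = [f"obj-{i}" for i in range(10)]
result, visited = limit_offset(rows, limit=1, offset=5)
print(result, visited)  # ['obj-5'] 6
```

One row is returned but six are read; with an offset of 350 million the same ratio applies.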
  10. (Slide 21) POC #4 - Eventual consistency strikes again! • At any point in time, other primary nodes could be in any other state. Any action we perform on a container database needs a timestamp, so that at some point all primary nodes coalesce into the correct final state. Deleting rows, after moving them to a shard, means writing rows to the large database.
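
Why a delete has to be written as a row can be seen in a toy model of timestamped merging. This is a simplified illustration, not Swift's schema or replication code: each replica maps a name to a hypothetical `(timestamp, deleted)` pair, and `merge` keeps whichever row is newer.

```python
from typing import Dict, Tuple

Row = Tuple[float, bool]  # (timestamp, deleted flag)


def merge(local: Dict[str, Row], incoming: Dict[str, Row]) -> Dict[str, Row]:
    """Newest timestamp wins per name; ties keep the local row."""
    merged = dict(local)
    for name, row in incoming.items():
        if name not in merged or row[0] > merged[name][0]:
            merged[name] = row
    return merged


replica_a = {"cat.png": (1.0, False), "dog.png": (2.0, False)}
# The delete is recorded as a newer row (a tombstone), not as a missing row.
replica_b = {"cat.png": (3.0, True)}

a_then_b = merge(replica_a, replica_b)
b_then_a = merge(replica_b, replica_a)
print(a_then_b == b_then_a, a_then_b["cat.png"])
```

If the row were simply removed from `replica_b`, merging with `replica_a` would bring `cat.png` back; the timestamped tombstone is what lets both replicas settle on the same state, at the price of another write to the already large database.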
  11. (Slide 22) POC #5 - Read only db • [Diagram: Container DB, Container DB, Pivot DB] • New database can store: • Metadata • References to pivots • Holding table for objects
  12. (Slide 23) POC #5 - Read only db • [Diagram: Container DB, Container DB, Pivot DB, Pivot DB; states: UNSHARDED, SHARDING, SHARDED]