
Ceph, Gluster, Swift : Similarities and differences

Prashanth Pai
February 05, 2016


Ceph, Gluster and OpenStack Swift are among the most popular and widely used open source distributed storage solutions deployed in the cloud today. This talk briefly introduces the audience to these projects and covers the similarities and differences between them without debating which is better. All three projects have to solve the same set of problems: distribution, replication, availability, access methods and data consistency.

This was presented at DevConf.CZ 2016


Transcript

  1. Projects, brief history and Community

               Ceph    Gluster   Swift
     Started   2007    2005      2010
     Language  C++     C         Python

     • Open Source
     • Software defined storage
     • Commodity hardware
     • No vendor lock-in
     • Massively scalable
       ◦ CERN, Facebook, Rackspace
     • Vibrant Community
  2. Ceph Architecture (diagram: a cluster of many OSDs with a few Monitors and MDSs)

     OSDs
     • 10s to 1000s
     • One per disk
     • Serves objects to clients
     • Peer replication

     Monitors
     • Maintain cluster membership and state
     • Consensus for decision making
     • Small, odd number
  3. Distribution and Replication in Ceph - CRUSH

     • Pools are logical groups
     • Pools are made up of PGs (placement groups)
     • PGs are mapped to OSDs
     • Rule-based configuration
     • Pseudo-random placement
     • Repeatable and deterministic
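
     A minimal sketch of the idea behind CRUSH-style placement, not the real
     algorithm: hashing the object name into a PG and ranking OSDs by a hash of
     (PG, OSD) gives every client the same answer with no lookup table. The PG
     count and OSD names below are made up for illustration.

        import hashlib

        NUM_PGS = 8                       # placement groups in a hypothetical pool
        OSDS = ["osd.0", "osd.1", "osd.2", "osd.3"]

        def pg_for_object(name):
            """Object name -> placement group: a stable hash, no lookup table."""
            return int(hashlib.md5(name.encode()).hexdigest(), 16) % NUM_PGS

        def osds_for_pg(pg, replicas=2):
            """PG -> ordered OSD list; same inputs always give the same output."""
            ranked = sorted(OSDS,
                            key=lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest())
            return ranked[:replicas]

        pg = pg_for_object("kitten.jpg")
        print(pg, osds_for_pg(pg))        # repeatable and deterministic on every client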
  4. Gluster Architecture (diagram: bricks on Node 1 - Node 4 forming one volume)

     • Volume (2 x 2), Distributed-Replicate
     • Bricks are aggregated and presented as a single namespace
     • Access methods: FUSE, NFS, SMB, Object
  5. Distribution in Gluster - DHT

     Example: for /animals/cats/kitten.jpg, 41 = hash_fn(kitten.jpg); with brick
     hash ranges 0-24, 25-74 and 75-99, the file lands on the second brick.

     • No central metadata server (no SPOF).
     • Hash space divided into N ranges mapped to N bricks.
     • Directories are created on all bricks.
     • Hash ranges are assigned to directories.
     • Renames are special.
     • Rebalance moves data.
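
     A toy sketch of the range-based lookup described above, not GlusterFS's
     real 32-bit hash function: the layout (hash ranges to bricks) and the brick
     paths are invented for illustration.

        import hashlib

        # Hypothetical per-directory layout: hash range -> brick
        LAYOUT = [((0, 24),  "server1:/export/brick1"),
                  ((25, 74), "server2:/export/brick2"),
                  ((75, 99), "server3:/export/brick3")]

        def hash_fn(filename):
            """Stand-in for Gluster's DHT hash, squashed into 0-99 for the example."""
            return int(hashlib.md5(filename.encode()).hexdigest(), 16) % 100

        def brick_for(path):
            """Only the basename is hashed; the parent directory's ranges pick the brick."""
            h = hash_fn(path.rsplit("/", 1)[-1])
            for (lo, hi), brick in LAYOUT:
                if lo <= h <= hi:
                    return brick

        print(brick_for("/animals/cats/kitten.jpg"))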
  6. Distribution + Replication in Gluster

     (diagram: DHT distributes /animals/cats/kitten.jpg, hash 41, across two AFR
     replica pairs covering hash ranges 0-49 and 50-99)

     • Replication (AFR) is synchronous.
     • Provides high availability on failure.
     • Self-healing (automatic file repair).
     • Optionally enforce quorum.
     • Follows a transaction model.
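
     A toy illustration of synchronous replication with an optional quorum, not
     AFR's actual changelog-based transaction; the "bricks" here are plain
     dictionaries.

        class ReplicaSet:
            """Write to all replicas before returning; fail if quorum is not met."""

            def __init__(self, bricks, quorum=None):
                self.bricks = bricks                 # stand-ins for brick backends
                self.quorum = quorum or len(bricks)  # default: all replicas must ack

            def write(self, path, data):
                acks = 0
                for brick in self.bricks:
                    try:
                        brick[path] = data           # synchronous: done before we return
                        acks += 1
                    except Exception:
                        pass                         # a failed copy is self-healed later
                if acks < self.quorum:
                    raise IOError("quorum not met, write rejected")
                return acks

        pair = ReplicaSet([{}, {}], quorum=2)        # one 2-way replica pair
        pair.write("/animals/cats/kitten.jpg", b"...")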
  7. Distribution and Replication in Swift

     • “Ring files” = cluster map
     • Recreated when the cluster layout changes
     • Hash range divided into partitions
     • Devices are assigned to partitions
     • Replica placement: replicas are spread as far apart as possible across
       failure domains
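
     A simplified sketch of a ring lookup: the top bits of an MD5 of the object
     path select a partition, and a prebuilt table maps each partition to its
     replica devices. Real Swift also salts the hash with a cluster-wide
     prefix/suffix; the part power and device names below are made up.

        import hashlib

        PART_POWER = 8                        # 2**8 = 256 partitions (hypothetical)
        PART_SHIFT = 32 - PART_POWER

        # replica -> (partition -> device); normally built offline into a ring file
        REPLICA2PART2DEV = [[f"dev{(part + r) % 4}" for part in range(2 ** PART_POWER)]
                            for r in range(3)]

        def partition_for(account, container, obj):
            """Top PART_POWER bits of the MD5 of /account/container/object."""
            key = f"/{account}/{container}/{obj}".encode()
            return int.from_bytes(hashlib.md5(key).digest()[:4], "big") >> PART_SHIFT

        def devices_for(part):
            return [part2dev[part] for part2dev in REPLICA2PART2DEV]

        part = partition_for("AUTH_test", "animals", "cat/kitten.jpg")
        print(part, devices_for(part))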
  8. Storage Nodes (each daemon sits on a local filesystem on top of block storage)

     Ceph:    OSD (Object Storage Daemon); communicates directly with the ceph
              client (socket); backed by xfs (fs with xattr support)
     Gluster: glusterfsd; communicates directly with the glusterfs client
              (socket); backed by xfs (fs with xattr support)
     Swift:   swift object server; communicates with the swift proxy server
              (HTTP); backed by xfs (fs with xattr support)
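
     All three daemons keep per-object metadata in extended attributes, which is
     why the slide calls for a filesystem with xattr support. A quick way to see
     this from Python on Linux (the brick path below is hypothetical):

        import os

        path = "/export/brick1/animals/cat/kitten.jpg"   # a file as stored on a brick
        # user.* attributes are freely settable; the storage daemons use their own namespaces
        os.setxattr(path, "user.demo.note", b"devconf-2016")
        print(os.getxattr(path, "user.demo.note"))
        print(os.listxattr(path))                        # trusted.* xattrs are visible only to root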
  9. Differences in Redundancy and Rebalance (Ceph / Gluster / Swift)

     • Redundancy type (replication and EC) and redundancy factor granularity:
       Pool / Volume / Container
     • Replica placement into failure domains:
       Managed by CRUSH / Manual effort by Admin [1] / Managed by Rings
     • Rebalance migrates:
       Placement Groups / Individual Files / Partitions

     [1] Luis Pabon will save us: https://github.com/heketi/heketi
  10. Replication - client-side data path (diagram)

     • Ceph: clients talk to the OSDs directly, via RBD or librados
     • Gluster: clients talk to the bricks directly, via FUSE or libgfapi
     • Swift: HTTP clients talk to the proxy server, which fans out to the
       object servers
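
     A sketch of the three client-side paths in the diagram, assuming the
     python-rados, libgfapi-python and python-swiftclient bindings are
     installed; hosts, pool/volume names and credentials are placeholders, and
     the libgfapi import path varies between releases.

        # Ceph: librados talks to the OSDs directly
        import rados
        cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
        cluster.connect()
        ioctx = cluster.open_ioctx("testpool")
        ioctx.write_full("kitten.jpg", b"...")
        ioctx.close()
        cluster.shutdown()

        # Gluster: libgfapi talks to the bricks directly, no FUSE mount needed
        from gluster import gfapi
        vol = gfapi.Volume("gluster-host", "gluster-vol")
        vol.mount()
        with vol.fopen("/animals/cats/kitten.jpg", "wb") as f:
            f.write(b"...")
        vol.unmount()

        # Swift: HTTP through the proxy server, which writes the replicas
        from swiftclient import client as swift
        conn = swift.Connection(authurl="http://example.com:8080/auth/v1.0",
                                user="test:tester", key="testing")
        conn.put_container("animals")
        conn.put_object("animals", "cat/kitten.jpg", contents=b"...")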
  11. Replication - access through gateways (diagram)

     • Ceph: HTTP clients go through RGW (the RADOS gateway) to the OSDs
     • Gluster: NFS clients go through an NFS server to the bricks
  12. Where’s my data?

     Ceph:
       # rados put -p testpool kitten.jpg kitten.jpg
       # ceph osd map testpool kitten.jpg
       osdmap e14 pool 'testpool' (3) object 'kitten.jpg' -> pg 3.9e17671a (3.2) -> up [2,1] acting [2,1]
       /var/lib/ceph/osd/ceph-2/current/3.2_head/kitten.jpg__head_9E17671A__3

     Gluster:
       # cd /mnt/gluster-vol
       # touch animals/cat/kitten.jpg
       /export/brick1/animals/cat/kitten.jpg

     Swift:
       # curl -X PUT http://example.com:8080/v1/AUTH_test/animals/cat/kitten.jpg
       /mnt/sdb1/objects/778/69f/c2b307d78b6c419c0c1b76d91c08c69f/1412628708.01757.data
  13. Feature Parity (Ceph / Gluster / Swift)

     • Quota: Pool, bucket and user quota / Volume, directory and inode count /
       Account and container quota
     • Tiering: yes / yes / no
     • Geo-replication: active-passive* / active-passive / active-active
     • Erasure Coding: yes / yes / yes
     • Bit-rot detection: yes / yes / yes