Slide 1

Ceph | Gluster | Swift: Similarities and Differences
Thiago da Silva, Prashanth Pai

Slide 2

Who, What, Why?

Slide 3

Projects, brief history and Community

          Ceph   Gluster   Swift
Started   2007   2005      2010
Language  C++    C         Python

● Open Source
● Software defined storage
● Commodity hardware
● No vendor lock-in
● Massively scalable
  ○ CERN, Facebook, Rackspace
● Vibrant Community

Slide 4

Storage Types BLOCK FILE OBJECT

Slide 5

Architecture

Slide 6

Ceph Architecture

Slide 7

(diagram: a cluster of many OSDs alongside a few Monitors and MDS daemons)

OSDs
● 10s to 1000s
● One per disk
● Serves objects to clients
● Peer replication

Monitors
● Maintain cluster membership and state
● Consensus for decision making
● Small, odd number

Slide 8

Distribution and Replication in Ceph - CRUSH
● Pools are logical groups
● Pools are made up of PGs
● PGs mapped to OSDs
● Rule based configuration
● Pseudo-random placement
● Repeatable and deterministic (see the sketch below)
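
Because placement is pseudo-random but repeatable, any client that holds the cluster map can compute where an object lives without consulting a central server. A minimal Python sketch of that idea follows; it is not the actual CRUSH algorithm, and the PG count, OSD ids and hash choices are made up for illustration.

import hashlib

def place_object(pool_pg_count, osds, replicas, obj_name):
    """Illustration only: map an object name to a placement group (PG), then
    pick `replicas` OSDs deterministically. Real Ceph uses CRUSH, which also
    honours per-pool rules and failure domains."""
    # Object name -> PG: a stable hash any client can compute.
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pool_pg_count
    # PG -> ordered OSD list: pseudo-random but repeatable for the same inputs.
    ranked = sorted(osds, key=lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest())
    return pg, ranked[:replicas]

# Every client computes the same placement without asking a metadata server.
print(place_object(pool_pg_count=8, osds=list(range(6)), replicas=2, obj_name="kitten.jpg"))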

Slide 9

Gluster Architecture
(diagram: Node 1 to Node 4 forming a 2 x 2 Distributed-Replicate volume, accessed via FUSE, NFS, SMB and Object interfaces)
● Bricks are aggregated and presented as a single namespace

Slide 10

Distribution in Gluster - DHT
(diagram: hash ranges 0-24, 25-74 and 75-99 spread across three bricks; 41 = hash_fn(kitten.jpg) places /animals/cats/kitten.jpg on the 25-74 brick)
● No central metadata server (no SPOF).
● Hash space divided into N ranges mapped to N bricks (see the sketch below).
● Directories are created on all bricks.
● Hash ranges assigned to directories.
● Renames are special.
● Rebalance moves data.
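
A hedged Python sketch of that lookup: the file name (not the full path) is hashed into a 32-bit space, and whichever brick's assigned range contains the hash owns the file. The brick names and ranges below are placeholders; Gluster's real hash is a Davies-Meyer variant, and the per-directory ranges live in extended attributes on the bricks.

import hashlib

# Hypothetical per-directory layout: each brick owns a slice of the 32-bit hash space.
BRICK_RANGES = {
    "node1:/export/brick1": (0x00000000, 0x55555555),
    "node2:/export/brick1": (0x55555556, 0xAAAAAAAA),
    "node3:/export/brick1": (0xAAAAAAAB, 0xFFFFFFFF),
}

def dht_lookup(filename):
    """Illustration only: hash the file name into 32 bits and pick the brick
    whose range contains it. No metadata server is involved."""
    h = int(hashlib.md5(filename.encode()).hexdigest()[:8], 16)  # 32-bit value
    for brick, (lo, hi) in BRICK_RANGES.items():
        if lo <= h <= hi:
            return brick, h

print(dht_lookup("kitten.jpg"))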

Slide 11

Distribution + Replication in Gluster
(diagram: DHT places /animals/cats/kitten.jpg (41 = hash_fn(kitten.jpg)) on one AFR replica pair; each pair of bricks covers one hash range, e.g. 0-49 and 49-99)
● Replication is synchronous.
● Provides high availability on failure.
● Self-healing (automatic file repair).
● Optionally enforce quorum (see the sketch below).
● Follows a transaction model.
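
A rough Python sketch of the synchronous write path with optional quorum enforcement; write_to_brick and the brick names are stand-ins, and the real AFR transaction additionally takes locks and records changelog xattrs that drive self-heal.

def write_to_brick(brick, data):
    # Placeholder: pretend every brick accepted the write.
    return True

def replicated_write(bricks, data, enforce_quorum=True):
    """Illustration of AFR-style synchronous replication with optional quorum."""
    acks = [brick for brick in bricks if write_to_brick(brick, data)]
    quorum = len(bricks) // 2 + 1
    if enforce_quorum and len(acks) < quorum:
        raise IOError(f"only {len(acks)}/{len(bricks)} bricks acknowledged the write")
    return acks  # bricks missing from this list are repaired later by self-heal

print(replicated_write(["node1:/export/brick1", "node2:/export/brick1"], b"kitten"))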

Slide 12

Swift Architecture

Slide 13

Distribution and Replication in Swift
● “Ring files” = cluster map
● Recreated when the cluster layout changes
● Hash range divided into partitions
● Devices are assigned to partitions
● Replica algorithm: as far as possible (see the sketch below)
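
A minimal sketch of how a ring maps an object to a partition and then to devices, assuming a part power of 10 and a made-up device list. Real rings are built with swift-ring-builder, mix a per-cluster hash prefix/suffix into the path hash, and precompute the partition-to-device assignment rather than deriving it on the fly as this toy does.

import hashlib

PART_POWER = 10          # 2**10 = 1024 partitions (assumed)
REPLICAS = 3

def object_partition(account, container, obj):
    """Hash the object path and keep the top PART_POWER bits as the partition."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

def devices_for_partition(part, devices):
    """Illustration only: pick REPLICAS devices; the real ring spreads them
    across regions, zones and servers 'as far as possible'."""
    return [devices[(part + i) % len(devices)] for i in range(REPLICAS)]

devices = ["node1/sdb1", "node2/sdb1", "node3/sdb1", "node4/sdb1"]
part = object_partition("AUTH_test", "animals", "cat/kitten.jpg")
print(part, devices_for_partition(part, devices))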

Slide 14

Similarities and Differences

Slide 15

Storage Nodes
(diagram: each storage node runs its daemon on a local filesystem over block devices)
● Ceph: OSD (Object Storage Daemon), on xfs (fs with xattr support); communicates directly with the ceph client (socket)
● Gluster: glusterfsd, on xfs (fs with xattr support); communicates directly with the glusterfs client (socket)
● Swift: swift object server, on xfs (fs with xattr support); communicates with the swift proxy server (HTTP)
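
All three daemons keep per-file metadata in extended attributes on the backing filesystem, which is why "xfs (fs with xattr support)" appears in every column. A small Linux-only Python sketch for listing those attributes; the path is the hypothetical Swift object file from the "Where's my data?" slide later in the deck, and the attribute names in the comment (user.swift.metadata for Swift, trusted.* for Gluster) are the usual ones, not output captured from this cluster.

import os

def list_xattrs(path):
    """List extended attributes on a file (Linux only).
    Typical examples on storage nodes:
      Swift object file : user.swift.metadata (pickled object metadata)
      Gluster brick file: trusted.gfid, trusted.glusterfs.dht.* (visible to root)"""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}

# Hypothetical path; adjust to a real object file on one of your nodes.
print(list_xattrs("/mnt/sdb1/objects/778/69f/"
                  "c2b307d78b6c419c0c1b76d91c08c69f/1412628708.01757.data"))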

Slide 16

Differences in Redundancy and Rebalance
● Redundancy type (replication and EC) and redundancy factor granularity: Pool (Ceph), Volume (Gluster), Container (Swift)
● Replica placement into failure domains: managed by CRUSH (Ceph), manual effort by the admin [1] (Gluster), managed by Rings (Swift)
● Rebalance migrates: Placement Groups (Ceph), individual files (Gluster), partitions (Swift)

[1] Luis Pabon will save us: https://github.com/heketi/heketi

Slide 17

Replication
(diagram: the Ceph client (RBD / librados) writes to one OSD, which replicates to its peer OSDs; the Gluster client (FUSE / libgfapi) writes to the replica bricks itself; the Swift HTTP client talks to the proxy, which writes to the object servers)

Slide 18

Replication (via gateways)
(diagram: an HTTP client talks to RGW, which writes to the Ceph OSDs; an NFS client talks to the NFS server, which writes to the Gluster bricks)

Slide 19

Where’s my data?

Ceph:
# rados put -p testpool kitten.jpg kitten.jpg
# ceph osd map testpool kitten.jpg
osdmap e14 pool 'testpool' (3) object 'kitten.jpg' -> pg 3.9e17671a (3.2) -> up [2,1] acting [2,1]
on disk: /var/lib/ceph/osd/ceph-2/current/3.2_head/kitten.jpg__head_9E17671A__3

Gluster:
# cd /mnt/gluster-vol
# touch animals/cat/kitten.jpg
on disk: /export/brick1/animals/cat/kitten.jpg

Swift:
# curl -X PUT http://example.com:8080/v1/AUTH_test/animals/cat/kitten.jpg
on disk: /mnt/sdb1/objects/778/69f/c2b307d78b6c419c0c1b76d91c08c69f/1412628708.01757.data
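
The Swift path above can be reconstructed by hand: the account/container/object path is hashed (salted with a per-cluster prefix/suffix), the top bits of that hash give the ring partition, and the last three hex characters become the "suffix" directory. A hedged Python sketch, assuming an empty hash prefix/suffix and a part power of 10; real clusters set these in swift.conf and the ring builder.

import hashlib

HASH_PREFIX = b""        # assumed empty; real clusters set swift_hash_path_prefix
HASH_SUFFIX = b""        # assumed empty; real clusters set swift_hash_path_suffix
PART_POWER = 10          # assumed; defined when the ring was built

def object_disk_path(device_mount, account, container, obj, timestamp):
    """Rebuild the .data path layout shown above (illustration only)."""
    path = f"/{account}/{container}/{obj}".encode()
    name_hash = hashlib.md5(HASH_PREFIX + path + HASH_SUFFIX).hexdigest()
    part = int(name_hash[:8], 16) >> (32 - PART_POWER)   # ring partition
    suffix = name_hash[-3:]                              # e.g. '69f' above
    return f"{device_mount}/objects/{part}/{suffix}/{name_hash}/{timestamp}.data"

print(object_disk_path("/mnt/sdb1", "AUTH_test", "animals",
                       "cat/kitten.jpg", "1412628708.01757"))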

Slide 20

Feature Parity
● Quota: pool, bucket and user quota (Ceph); volume, directory and inode count (Gluster); account and container quota (Swift)
● Tiering: yes (Ceph), yes (Gluster), no (Swift)
● Geo-replication: active-passive* (Ceph), active-passive (Gluster), active-active (Swift)
● Erasure Coding: yes (Ceph), yes (Gluster), yes (Swift)
● Bit-rot detection: yes (Ceph), yes (Gluster), yes (Swift)

Slide 21

Thank You
IRC: #ceph-devel, #gluster-dev, #openstack-swift