Slide 1

Slide 1 text

PITHOS @PYR #CASSANDRASUMMIT

Slide 2

Slide 2 text

@PYR CTO at Exoscale, Swiss Cloud Hosting. Open source developer: pithos, cyanite, riemann, collectd.

Slide 3

Slide 3 text

AIM OF THIS TALK
Presenting object storage
Showcasing efficient uses of object storage
Presenting pithos
Feedback on usage

Slide 4

Slide 4 text

OUTLINE
Object Storage 101
6 things you should do with S3
Pithos, your personal Object Store
Pithos in production

Slide 5

Slide 5 text

OBJECT STORAGE 101

Slide 6

Slide 6 text

THE ELEVATOR PITCH
"Object Storage is a storage architecture that manages data as objects" (Wikipedia)

Slide 7

Slide 7 text

INCEPTION Asset and content storage for large hosting platforms. Livejournal's MogileFS. A shift in how we perceive distributed storage.

Slide 8

Slide 8 text

ESSENTIAL PROPERTIES
No POSIX guarantees
No atomicity
Eventual consistency
Pushes some responsibility back to the application.

Slide 9

Slide 9 text

THE OBJECT STORAGE LANDSCAPE
Mostly hosted solutions: AWS S3, Rackspace Cloud Files, DreamObjects, Exoscale SOS
No real API standardisation
AWS S3 is the de-facto standard

Slide 10

Slide 10 text

THE ON-PREMISE OBJECT STORAGE LANDSCAPE
Some vendor-backed solutions:
EMC Atmos
Scality
Cloudian
Swift
Ceph
Riak CS
Pithos

Slide 11

Slide 11 text

A TYPICAL OBJECT STORE REQUEST
    # curl -X PUT -d @file.txt https://mybucket.myprovider.com/some-file.txt
    # curl https://mybucket.myprovider.com/some-file.txt
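
The two curl commands above map onto plain HTTP verbs. As a rough sketch (not something shown in the talk), the same PUT and GET can be prepared with Python's standard library. The host name is the placeholder from the slide, the helper names are made up for illustration, and the requests are only built, never sent.

```python
import urllib.request

# Placeholder endpoint taken from the slide; nothing is sent over the network.
BASE_URL = "https://mybucket.myprovider.com"


def build_put(key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that stores an object."""
    return urllib.request.Request(f"{BASE_URL}/{key}", data=body, method="PUT")


def build_get(key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that reads the object back."""
    return urllib.request.Request(f"{BASE_URL}/{key}", method="GET")


put_req = build_put("some-file.txt", b"hello object storage\n")
get_req = build_get("some-file.txt")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Against a real service each request would also need the authentication headers covered later in the deck.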

Slide 12

Slide 12 text

S3 TERMINOLOGY
Region: Determines where objects will be stored.
Storage Class: Storage properties for objects.
Bucket: A named container for objects.
Object: A file.

Slide 13

Slide 13 text

THE S3 API
A global bucket namespace
Artificial hierarchy support
Authentication and Authorization through ACLs
Multipart uploads
CORS support & Form based uploads
Eventual consistency

Slide 14

Slide 14 text

A GLOBAL BUCKET NAMESPACE
A single consistent namespace for buckets, shared across tenants.
There is only one highlander bucket: a given name can be taken only once.
A bucket is located within a region.

Slide 15

Slide 15 text

HIERARCHY SUPPORT
Listing requests may supply a delimiter and prefix.
Emulates directories when keys contain slashes.
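
To make the delimiter and prefix behaviour concrete, here is a small in-memory sketch of how a listing groups keys. It is an illustration only, not pithos or S3 code; the function name and the sample keys are invented.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Return (contents, common_prefixes) the way a delimiter listing groups keys."""
    contents, common = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is rolled up
            # into a single "directory" entry.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common:
                common.append(rolled)
        else:
            contents.append(key)
    return contents, common


keys = ["sample.txt", "photos/a.jpg", "photos/2014/b.jpg", "docs/readme.md"]
top = list_keys(keys, delimiter="/")
photos = list_keys(keys, prefix="photos/", delimiter="/")
print(top)
print(photos)
```

Listing the top level shows one object and two emulated directories; listing with the prefix "photos/" descends one level.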

Slide 16

Slide 16 text

HIERARCHY SUPPORT
    GET /?delimiter=/ HTTP/1.1
    Host: mybucket.service.uri
    Date: <date>
    Authorization: AWS <key>:<signature>
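
The Authorization header above follows the older S3 "v2" signing scheme: an HMAC-SHA1 over a string built from the verb, a few headers and the resource, then Base64-encoded. The sketch below is a simplified illustration, assuming no x-amz-* headers are present. The credentials are the example values from the configuration slide, the date is made up, and the resource "/mybucket/" assumes a virtual-hosted bucket named mybucket.

```python
import base64
import hashlib
import hmac


def sign_v2(secret, verb, date, resource, content_md5="", content_type=""):
    """Return Base64(HMAC-SHA1(secret, string-to-sign)) for a v2-style request.

    Simplification: x-amz-* headers are not handled, so they must be absent.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Example credentials from the configuration slide; the date is hypothetical.
ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"
SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
DATE = "Fri, 17 Oct 2014 12:35:10 GMT"

signature = sign_v2(SECRET, "GET", DATE, "/mybucket/")
authorization = f"AWS {ACCESS_KEY}:{signature}"
print(authorization)
```

The server recomputes the same signature from the request it receives, which is why the Date header must be sent exactly as signed.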

Slide 17

Slide 17 text

HIERARCHY SUPPORT
    <?xml version="1.0" encoding="UTF-8"?>
    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Name>batman</Name>
      <Prefix></Prefix>
      <MaxKeys>100</MaxKeys>
      <Delimiter>/</Delimiter>
      <IsTruncated>false</IsTruncated>
      <Contents>
        <Key>sample.txt</Key>
        <LastModified>2014-10-17T12:35:10.423Z</LastModified>
        <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
        <Size>1603</Size>
        <Owner>
          <ID>test@example.com</ID>
          <DisplayName>test@example.com</DisplayName>
        </Owner>
        <StorageClass>Standard</StorageClass>
      </Contents>
    </ListBucketResult>
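
A client has to parse this response with the XML namespace in mind. The following is a minimal sketch using Python's standard library on a shortened copy of the response above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The namespace declared on the ListBucketResult root element.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Shortened copy of the listing response from the slide.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
  </Contents>
</ListBucketResult>
"""

root = ET.fromstring(RESPONSE.encode("utf-8"))
bucket = root.findtext("s3:Name", namespaces=NS)
objects = [
    {
        "key": item.findtext("s3:Key", namespaces=NS),
        "size": int(item.findtext("s3:Size", namespaces=NS)),
        # The ETag value is wrapped in literal double quotes.
        "etag": item.findtext("s3:ETag", namespaces=NS).strip('"'),
    }
    for item in root.findall("s3:Contents", NS)
]
print(bucket, objects)
```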

Slide 18

Slide 18 text

AUTHENTICATION & AUTHORIZATION THROUGH ACLS Simple canned ACLs cover common settings, e.g. public. An XML syntax is also available.

Slide 19

Slide 19 text

MULTIPART UPLOADS Allows uploading a file as several chunks. User-controlled re-aggregation step. Limits the impact of upload failures for large files.
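
The benefit for large files can be sketched without any network: split the payload into numbered parts, retry only the part that fails, then reassemble in part-number order. Everything below is a simulation; `flaky_send` is a hypothetical stand-in for a real part upload, and the part size is arbitrary.

```python
def split_parts(data: bytes, part_size: int):
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        yield number, data[offset:offset + part_size]


def upload(parts, send, max_attempts=3):
    """Send each part independently so a failure only costs one part."""
    done = {}
    for number, chunk in parts:
        for attempt in range(max_attempts):
            try:
                done[number] = send(number, chunk)
                break
            except IOError:
                if attempt == max_attempts - 1:
                    raise
    return done


stored = {}
failed_once = set()


def flaky_send(number, chunk):
    """Stand-in for a part upload: part 2 fails on its first attempt."""
    if number == 2 and number not in failed_once:
        failed_once.add(number)
        raise IOError("simulated network error")
    stored[number] = chunk
    return len(chunk)


payload = bytes(range(256)) * 10  # 2560 bytes
receipt = upload(split_parts(payload, 1000), flaky_send)
# The re-aggregation step: concatenate parts in part-number order.
reassembled = b"".join(stored[n] for n in sorted(stored))
print(receipt, reassembled == payload)
```

Only part 2 is sent twice; parts 1 and 3 are never retransmitted.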

Slide 20

Slide 20 text

CORS SUPPORT AND FORM-BASED UPLOADS Web interaction without any backend components. CORS setup through an XML configuration syntax. Form based uploads through pre-signed requests.

Slide 21

Slide 21 text

EVENTUAL CONSISTENCY
An easy sell at Cassandra Summit
Possible delay between PUT and GET availability.
Checksums avoid massive inconsistencies.
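
One way an application can take back the responsibility that eventual consistency hands it is to compare a checksum before trusting a read. The sketch below polls until the body's MD5 matches what was written. It assumes the checksum is a plain MD5 of the object, and the store is simulated with a list of canned responses.

```python
import hashlib
import time


def wait_for_version(get, expected_md5, attempts=5, delay=0.0):
    """Call get() until the body's MD5 matches the version we wrote."""
    for _ in range(attempts):
        body = get()
        if body is not None and hashlib.md5(body).hexdigest() == expected_md5:
            return body
        time.sleep(delay)
    raise TimeoutError("object still not consistent after retries")


# Simulated store: first "not found", then a stale copy, then the new data.
new_body = b"new version"
responses = iter([None, b"old version", new_body])
result = wait_for_version(lambda: next(responses),
                          hashlib.md5(new_body).hexdigest())
print(result)
```

A real client would use a non-zero delay, ideally with backoff, between attempts.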

Slide 22

Slide 22 text

6 THINGS TO DO WITH S3

Slide 23

Slide 23 text

12-FACTOR APP SUPPORT FOR PERSISTENCE
Eliminates the need for NFS
Eases interaction with PaaS type platforms
http://12factor.net/

Slide 24

Slide 24 text

STATIC CONTENT HOSTING
Perfect for hosting CSS, JS and other static assets
Simply requires setting a bucket's ACL to public

Slide 25

Slide 25 text

FORM BASED UPLOADS
Pre-signed requests
Requests encapsulate a policy
No proxying to the S3 service required
Great for supporting user generated content
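
As a sketch of what "requests encapsulate a policy" means under the older v2-style scheme: the backend Base64-encodes a JSON policy, signs that string with HMAC-SHA1, and hands both to the browser as hidden form fields. The function name, bucket, key prefix and expiration below are all illustrative assumptions, and the credentials are the example values from the configuration slide.

```python
import base64
import hashlib
import hmac
import json


def presign_form(access_key, secret, bucket, key_prefix, expiration):
    """Return the hidden form fields for a browser upload (v2-style policy)."""
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            # Uploaded keys must start with this prefix.
            ["starts-with", "$key", key_prefix],
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    digest = hmac.new(secret.encode("utf-8"),
                      policy_b64.encode("ascii"),
                      hashlib.sha1).digest()
    return {
        "AWSAccessKeyId": access_key,
        "policy": policy_b64,
        "signature": base64.b64encode(digest).decode("ascii"),
    }


form = presign_form("AKIAIOSFODNN7EXAMPLE",
                    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                    "mybucket", "uploads/", "2014-12-31T12:00:00.000Z")
decoded = json.loads(base64.b64decode(form["policy"]))
print(sorted(form), decoded["conditions"])
```

The secret never reaches the browser; only the policy and its signature do, which is why no proxy in front of the storage service is needed.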

Slide 26

Slide 26 text

ARTIFACT STORAGE
Supported in Maven
Supported in Docker Registry
Supported in Apt
Supported in Mesos fetcher

Slide 27

Slide 27 text

BACKUPS Great Open-Source options like duplicity. Commercial storage gateway support. Some home NAS-type products support S3 as well.

Slide 28

Slide 28 text

CLIENT-SIDE ENCRYPTION GPG encryption support. Guarantees full data ownership, even when leveraging third-party providers. Don't lose your keys!

Slide 29

Slide 29 text

PITHOS, YOUR PERSONAL OBJECT-STORE

Slide 30

Slide 30 text

FROM THE WEBSITE Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster.

Slide 31

Slide 31 text

WHY?
Provide your own S3-compatible service (that's us!)
Restricted from using hosted object-storage services.
Willingness to fully own availability.

Slide 32

Slide 32 text

PITHOS ESSENTIAL PROPERTIES Extensive S3 API coverage. Fully Stateless. Multi-region support. Fully Cassandra-backed. Extensible. Open-Source.

Slide 33

Slide 33 text

MISC. Runs on the JVM. Written in Clojure. Small codebase (~ 5300 LoC). Can run an embedded Cassandra for testing purposes.

Slide 34

Slide 34 text

PITHOS ARCHITECTURE A daemon built out of 5 isolated and pluggable components.

Slide 35

Slide 35 text

PITHOS ARCHITECTURE
Keystore
Bucketstore
Metastore
Blobstore
Reporter

Slide 36

Slide 36 text

OVERALL CONCEPT

Slide 37

Slide 37 text

THE KEYSTORE Authentication & Authorization handled outside of pithos. Only component which doesn't rely on Cassandra by default. Default implementation relies on the pithos configuration file. Maps an API key to a set of credentials. Example alternative implementation in the documentation.

Slide 38

Slide 38 text

THE KEYSTORE
    {
      "tenant": "tenant name",
      "secret": "secret key",
      "memberof": ["group1", "group2"]
    }
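
Conceptually the keystore is just a lookup from API key to a record shaped like the one above. The sketch below is a Python illustration of that idea only; pithos itself is written in Clojure, and the class name, method name and sample data here are hypothetical.

```python
class ConfigKeystore:
    """Maps an API key to its credentials, backed by a plain dictionary."""

    def __init__(self, keys):
        self._keys = dict(keys)

    def fetch(self, api_key):
        """Return the credentials for api_key, or None if the key is unknown."""
        return self._keys.get(api_key)


# Sample data modelled on the slides; not real credentials.
keystore = ConfigKeystore({
    "AKIAIOSFODNN7EXAMPLE": {
        "tenant": "test@example.com",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "memberof": ["group1", "group2"],
    }
})

known = keystore.fetch("AKIAIOSFODNN7EXAMPLE")
unknown = keystore.fetch("NOSUCHKEY")
print(known["tenant"], unknown)
```

Because the interface is this small, swapping the dictionary for a database or an identity service leaves the rest of the request path unchanged.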

Slide 39

Slide 39 text

THE BUCKETSTORE
Stores essential bucket properties:
Bucket tenant.
Region and storage-class where bucket is located.
Optional CORS properties.

Slide 40

Slide 40 text

THE BUCKETSTORE Bucket ownership is transactional. Cassandra is not the best suited for this task. The lightweight transaction features help.
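
In Cassandra, lightweight transactions are the conditional writes such as `INSERT ... IF NOT EXISTS`, which report whether the write was applied. The sketch below only imitates that "first writer wins" outcome in memory, to show why bucket creation needs it. It does not talk to Cassandra, and the class and method names are invented.

```python
import threading


class BucketRegistry:
    """In-memory imitation of a conditional insert on the bucket name."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, bucket, tenant):
        """Return (applied, owner): applied is False if the name was taken."""
        with self._lock:
            if bucket in self._owners:
                return False, self._owners[bucket]
            self._owners[bucket] = tenant
            return True, tenant


registry = BucketRegistry()
first = registry.claim("batman", "test@example.com")
second = registry.claim("batman", "other@example.com")
print(first, second)
```

Without the conditional check, two tenants creating the same bucket at the same moment could each believe they own it.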

Slide 41

Slide 41 text

THE BUCKETSTORE
    {
      "bucket": "batman",
      "created": "2012-01-01 01:30:00",
      "tenant": "test@example.com",
      "region": "ch-dk-2",
      "acl": "...",
      "cors": "..."
    }

Slide 42

Slide 42 text

THE METASTORE Stores all object details. References an inode and a version in the blobstore. Using the path as a key in a wide column ensures keys are sorted.
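
Sorted keys matter because a prefix listing then becomes a contiguous range scan rather than a full scan. The sketch below shows the idea on a sorted Python list with a binary search; it is an analogy for the storage layout, not pithos code, and the sample keys are invented.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix from an already sorted list."""
    # Binary search finds the first candidate; matches are then contiguous.
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


object_keys = sorted(["sample.txt", "photos/a.jpg",
                      "photos/2014/b.jpg", "docs/readme.md"])
photos_keys = prefix_scan(object_keys, "photos/")
print(photos_keys)
```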

Slide 43

Slide 43 text

THE METASTORE
    {
      "bucket": "test",
      "object": "file.txt",
      "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
      "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
      "atime": "2012-01-01 01:30:00",
      "size": 1024,
      "checksum": "d41d8cd98f00b204e9800998ecf8427e",
      "storageclass": "standard",
      "acl": "...",
      "metadata": {}
    }

Slide 44

Slide 44 text

THE BLOBSTORE Stores data. Inodes are lists of blocks. Blocks are lists of chunks. Chunks contain small (128k) chunks of the file.
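
The inode, block and chunk nesting can be sketched as two rounds of slicing. The defaults below mirror the `max-chunk: "128k"` and `max-block-chunk: 1024` values from the configuration slide; the function itself is an illustration with an invented name, not the pithos implementation.

```python
MAX_CHUNK = 128 * 1024    # bytes per chunk ("128k")
MAX_BLOCK_CHUNKS = 1024   # chunks per block


def layout(data, max_chunk=MAX_CHUNK, max_block_chunks=MAX_BLOCK_CHUNKS):
    """Return an inode as a list of blocks, each block a list of chunks."""
    chunks = [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]
    return [chunks[i:i + max_block_chunks]
            for i in range(0, len(chunks), max_block_chunks)]


# Tiny limits so the structure is visible: 3-byte chunks, 2 chunks per block.
inode = layout(b"abcdefghij", max_chunk=3, max_block_chunks=2)
restored = b"".join(chunk for block in inode for chunk in block)
print(inode, restored)
```

With the default limits a single block covers up to 128 MiB of data, so most objects fit in one block.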

Slide 45

Slide 45 text

THE BLOBSTORE Not what Cassandra was meant for. Works surprisingly well.

Slide 46

Slide 46 text

THE REPORTER Emits useful usage information. Good basis for building billing extensions.

Slide 47

Slide 47 text

CONFIGURATION
A single configuration file to configure all aspects:
Logging & server options.
Keystore, bucketstore, metastore and blobstore.
Each can have its own details / Cassandra cluster.

Slide 48

Slide 48 text

CONFIGURATION
    service:
      host: "0.0.0.0"
      port: 8080
    logging:
      level: info
      console: true
      overrides:
        io.pithos: debug
    options:
      service-uri: s3.example.com
      default-region: myregion

Slide 49

Slide 49 text

CONFIGURATION
    keystore:
      keys:
        AKIAIOSFODNN7EXAMPLE:
          tenant: test@example.com
          secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
    bucketstore:
      default-region: myregion
      cluster: "localhost"
      keyspace: storage
    regions:
      myregion:
        metastore:
          cluster: "localhost"
          keyspace: storage
        storage-classes:
          standard:
            cluster: "localhost"
            keyspace: storage
            max-chunk: "128k"
            max-block-chunk: 1024

Slide 50

Slide 50 text

AREAS OF IMPROVEMENT V4 Signatures. Overall S3 API coverage. Overall S3 Client coverage. Promoting Cassandra compact storage. Simple web interface. More contributors and users!

Slide 51

Slide 51 text

V4 SIGNATURES V4 type signatures are still not supported in pithos and are item number 1 on the todo-list.

Slide 52

Slide 52 text

OVERALL S3 API COVERAGE The S3 API is byzantine and corner cases are poorly documented. Still missing some useful bits (versioning, bucket policies, session tokens).

Slide 53

Slide 53 text

OVERALL S3 CLIENT COVERAGE Some clients are very sensitive with regard to API behavior. The essentials work. Glitches are quickly fixed when caught.

Slide 54

Slide 54 text

PROMOTING CASSANDRA COMPACT STORAGE WITH COMPACT STORAGE gives great benefits. Not yet promoted or automatically converged on startup.

Slide 55

Slide 55 text

SIMPLE WEB INTERFACE A simple JavaScript SPA would be nice.

Slide 56

Slide 56 text

PITHOS IN PRODUCTION

Slide 57

Slide 57 text

A WORD OF WARNING Running an object-store is not necessarily for the faint of heart.

Slide 58

Slide 58 text

HOW WE USE IT No multi-datacenter clusters. Dedicated metadata cluster. Dedicated "blobstore" clusters.

Slide 59

Slide 59 text

ELSEWHERE Few known installations (in the 10s). Always rather large. Always used where cassandra previously existed.

Slide 60

Slide 60 text

MAINTENANCE (PITHOS)
A few cases generate orphan inodes, which must be pruned manually.
Internal tooling is used for this and should eventually be released.
Rather worry-free.

Slide 61

Slide 61 text

MAINTENANCE (CASSANDRA)
The usual applies
Schedule regular repairs of your clusters
Follow releases
Best supported version: 2.1.x
Quorum is satisfactory in terms of performance.

Slide 62

Slide 62 text

SCALING
Pithos is stateless.
Colocate Cassandra and pithos daemons.
Split blobstore and metastore keyspaces into separate clusters.
Splitting data and proxy nodes is worth investigating for huge deployments.
HAProxy to distribute queries to pithos instances.

Slide 63

Slide 63 text

PARTING WORDS
Try it out! (There's an all-in-one version)
Get involved
Docs need proof-reading, additions.
Some issues need to be tackled.

Slide 64

Slide 64 text

THANKS!
Pithos owes a lot to:
Max Penet (@mpenet) for the great alia & jet libraries
Datastax for the awesome Cassandra java-driver
Its contributors
Apache Cassandra obviously
@pyr