
Pithos


Pierre-Yves Ritschard

September 23, 2015


Transcript

  1. AIM OF THIS TALK
    Presenting object storage
    Showcasing efficient uses of object storage
    Presenting pithos
    Feedback on usage
  2. OUTLINE
    Object Storage 101
    6 things you should do with S3
    Pithos, your personal Object Store
    Pithos in production
  3. INCEPTION
    Asset and content storage for large hosting platforms.
    LiveJournal's MogileFS.
    A shift in how we perceive distributed storage.
  4. THE OBJECT STORAGE LANDSCAPE
    Mostly hosted solutions:
      AWS S3
      Rackspace Cloud Files
      DreamObjects
      Exoscale SOS
    No real API standardisation
    AWS S3 is the de-facto standard
  5. A TYPICAL OBJECT STORE REQUEST
    # curl -X PUT -d @file.txt https://mybucket.myprovider.com/some-file.txt
    # curl https://mybucket.myprovider.com/some-file.txt
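The two curl calls on slide 5 can be sketched with Python's standard library. This is only an illustration: the host name is the placeholder from the slide, the helper names are made up here, and the requests are built but never sent. A real call would also need an Authorization header unless the bucket is public.

```python
import urllib.request

# Placeholder endpoint copied from the slide; it is not a real service.
BASE_URL = "https://mybucket.myprovider.com"


def build_put(key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that stores `body` under `key`."""
    return urllib.request.Request(f"{BASE_URL}/{key}", data=body, method="PUT")


def build_get(key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that reads `key` back."""
    return urllib.request.Request(f"{BASE_URL}/{key}", method="GET")


put_req = build_put("some-file.txt", b"hello object storage\n")
get_req = build_get("some-file.txt")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to urllib.request.urlopen would perform the call against a reachable endpoint.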
  6. S3 TERMINOLOGY
    Region: Determines where objects will be stored.
    Storage Class: Storage properties for objects.
    Bucket: A named container for objects.
    Object: A file.
  7. THE S3 API
    A global bucket namespace
    Artificial hierarchy support
    Authentication and Authorization through ACLs
    Multipart uploads
    CORS support & Form based uploads
    Eventual consistency
  8. A GLOBAL BUCKET NAMESPACE
    A single consistent namespace for buckets:
      Across tenants.
      There is only one highlander bucket.
      A bucket is located within a region.
  9. HIERARCHY SUPPORT
    Listing requests may supply a delimiter and prefix.
    Emulates directories when keys contain slashes.
  10. HIERARCHY SUPPORT
    GET /?delimiter=/ HTTP/1.1
    Host: mybucket.service.uri
    Date: <date>
    Authorization: AWS <key>:<signature>
  11. HIERARCHY SUPPORT
    <?xml version="1.0" encoding="UTF-8"?>
    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Name>batman</Name>
      <Prefix></Prefix>
      <MaxKeys>100</MaxKeys>
      <Delimiter>/</Delimiter>
      <IsTruncated>false</IsTruncated>
      <Contents>
        <Key>sample.txt</Key>
        <LastModified>2014-10-17T12:35:10.423Z</LastModified>
        <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
        <Size>1603</Size>
        <Owner>
          <ID>test@example.com</ID>
          <DisplayName>test@example.com</DisplayName>
        </Owner>
        <StorageClass>Standard</StorageClass>
      </Contents>
    </ListBucketResult>
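How a prefix and a delimiter emulate directories (slides 9 to 11) can be sketched in a few lines. The function name and the sample keys are invented for illustration; this models the grouping behaviour only and is not how S3 or pithos implement listings.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Return (contents, common_prefixes) the way an S3-style listing groups keys."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to the first delimiter into one "directory".
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Hypothetical bucket content.
keys = ["sample.txt", "photos/2014/a.jpg", "photos/2015/b.jpg", "docs/readme.md"]
print(list_keys(keys, delimiter="/"))
print(list_keys(keys, prefix="photos/", delimiter="/"))
```

With a delimiter of "/" and no prefix, only sample.txt is returned as an object while docs/ and photos/ show up as rolled-up prefixes, which is what a client renders as folders.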
  12. AUTHENTICATION & AUTHORIZATION THROUGH ACLS
    Simple canned ACLs allow common settings, e.g. public.
    An XML syntax is also available.
  13. MULTIPART UPLOADS
    Allows uploading a file in several chunks.
    User-controlled re-aggregation step.
    Limits the impact of upload failures for large files.
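The idea behind multipart uploads can be modelled in memory: cut the payload into numbered parts, send each part on its own, then ask for re-aggregation in an explicit order. The helper names and the tiny part size below are illustrative assumptions, and no S3 call is made.

```python
import hashlib


def split_parts(data: bytes, part_size: int) -> dict:
    """Cut `data` into numbered parts (part numbers start at 1, as in S3)."""
    return {
        n + 1: data[offset:offset + part_size]
        for n, offset in enumerate(range(0, len(data), part_size))
    }


def complete_upload(uploaded: dict, part_numbers: list) -> bytes:
    """Re-aggregate parts in the order the client asks for."""
    return b"".join(uploaded[n] for n in part_numbers)


payload = bytes(range(256)) * 40          # 10240 bytes of sample data
parts = split_parts(payload, part_size=4096)

uploaded = {}
for number, chunk in parts.items():
    uploaded[number] = chunk              # a failed part could be re-sent alone

result = complete_upload(uploaded, sorted(uploaded))
print(len(parts), hashlib.md5(result).hexdigest() == hashlib.md5(payload).hexdigest())
```

Because each part is independent, a network failure costs one part rather than the whole file.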
  14. CORS SUPPORT AND FORM-BASED UPLOADS
    Web interaction without any backend components.
    CORS setup through an XML configuration syntax.
    Form based uploads through pre-signed requests.
  15. EVENTUAL CONSISTENCY
    An easy sell at Cassandra Summit
    Possible delay between PUT and GET availability.
    Checksums avoid massive inconsistencies.
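A client can use the checksum to detect that a GET returned something other than what was last PUT. The sketch below assumes the ETag is the hex MD5 of the body, which holds for simple (non-multipart) S3 uploads; the function name is made up for illustration.

```python
import hashlib


def matches_etag(body: bytes, etag: str) -> bool:
    """Compare a downloaded body with the MD5 hex digest reported as its ETag."""
    return hashlib.md5(body).hexdigest() == etag.strip('"')


# MD5 of an empty body; the metastore example on slide 30 carries the same value.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"
print(matches_etag(b"", f'"{EMPTY_MD5}"'))
print(matches_etag(b"stale copy", f'"{EMPTY_MD5}"'))
```

On a mismatch the client can simply retry the GET after a short delay.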
  16. 12-FACTOR APP SUPPORT FOR PERSISTENCE
    Eliminates the need for NFS
    Eases interaction with PaaS type platforms
    http://12factor.net/
  17. STATIC CONTENT HOSTING
    Perfect for hosting CSS, JS and other static assets
    Simply requires setting a bucket's ACL to public
  18. FORM BASED UPLOADS
    Pre-signed requests
    Requests encapsulate a policy
    No proxying to the S3 service required
    Great for supporting user generated content
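A pre-signed form upload boils down to a policy document that the backend signs once and embeds in the HTML form. The sketch below assumes the older (V2-style) S3 browser POST scheme, where the base64-encoded policy is signed with HMAC-SHA1; the bucket name, key prefix, size limit and expiration are made-up example values, and the secret is the sample key from the configuration slide.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(policy: dict, secret_key: str) -> tuple:
    """Return (policy_b64, signature) for the hidden fields of an upload form."""
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8"))
    digest = hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    return policy_b64.decode("ascii"), base64.b64encode(digest).decode("ascii")


# Hypothetical policy: every value below is illustrative.
policy = {
    "expiration": "2015-12-31T12:00:00.000Z",
    "conditions": [
        {"bucket": "mybucket"},
        ["starts-with", "$key", "uploads/"],
        ["content-length-range", 0, 1048576],
    ],
}
policy_b64, signature = sign_policy(policy, "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
print(policy_b64[:16], signature)
```

The browser then posts the file together with the policy, the signature and the access key directly to the object store, so the secret never leaves the backend.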
  19. CLIENT-SIDE ENCRYPTION
    GPG encryption support.
    Guarantees full data ownership, even when leveraging third-party providers.
    Don't lose your keys!
  20. FROM THE WEBSITE
    Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster.
  21. WHY ?
    Provide your own S3-compatible service (that's us!)
    Restricted from using hosted object-storage services.
    Willingness to fully own availability.
  22. PITHOS ESSENTIAL PROPERTIES
    Extensive S3 API coverage.
    Fully Stateless.
    Multi-region support.
    Fully Cassandra-backed.
    Extensible.
    Open-Source.
  23. MISC.
    Runs on the JVM.
    Written in Clojure.
    Small codebase (~ 5300 LoC).
    Can run an embedded Cassandra for testing purposes.
  24. THE KEYSTORE
    Authentication & Authorization handled outside of pithos.
    Only component which doesn't rely on Cassandra by default.
    Default implementation relies on the pithos configuration file.
    Maps an API key to credentials.
    Example alternative implementation in the documentation.
  25. THE KEYSTORE
    {
      "tenant": "tenantname",
      "secret": "secretkey",
      "memberof": ["group1", "group2"]
    }
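Conceptually the keystore is a lookup from an API key to a record shaped like the one on slide 25. The sketch below is an in-memory stand-in written for illustration only: the dictionary, the function name and the group names are assumptions, and it is not the pithos keystore protocol itself.

```python
# Hypothetical in-memory keystore; the record shape follows slide 25 and the
# key/secret pair is the sample one from the configuration slide.
KEYS = {
    "AKIAIOSFODNN7EXAMPLE": {
        "tenant": "test@example.com",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "memberof": ["group1", "group2"],
    },
}


def lookup(api_key: str):
    """Return the credentials record for `api_key`, or None when it is unknown."""
    return KEYS.get(api_key)


record = lookup("AKIAIOSFODNN7EXAMPLE")
print(record["tenant"], record["memberof"])
print(lookup("UNKNOWNKEY"))
```

An alternative keystore would answer the same question from a database or an identity service instead of a static map.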
  26. THE BUCKETSTORE
    Stores essential bucket properties
      Bucket tenant.
      Region and storage-class where bucket is located.
      Optional CORS properties.
  27. THE BUCKETSTORE
    Bucket ownership is transactional.
    Cassandra is not the best fit for this task.
    The lightweight transaction features help.
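Transactional ownership means that when two tenants race to create the same bucket name, exactly one of them wins. In Cassandra this is what a conditional write (INSERT ... IF NOT EXISTS) provides. The class below is a toy in-memory model of that first-writer-wins behaviour; the class name, the table and the column names in the comment are assumptions, not the pithos schema.

```python
import threading


class BucketRegistry:
    """Toy stand-in for a conditional insert such as:

    INSERT INTO bucket (bucket, tenant) VALUES (?, ?) IF NOT EXISTS
    (table and column names are hypothetical).
    """

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def create(self, bucket: str, tenant: str):
        """Claim `bucket` for `tenant`; return (applied, current_owner)."""
        with self._lock:
            if bucket in self._owners:
                return False, self._owners[bucket]
            self._owners[bucket] = tenant
            return True, tenant


registry = BucketRegistry()
print(registry.create("batman", "test@example.com"))
print(registry.create("batman", "someone-else@example.com"))
```

The second call is rejected and reports the existing owner, which is the information a server needs to answer with a "bucket already exists" error.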
  28. THE BUCKETSTORE
    {
      "bucket": "batman",
      "created": "2012-01-01 01:30:00",
      "tenant": "test@example.com",
      "region": "ch-dk-2",
      "acl": "...",
      "cors": "..."
    }
  29. THE METASTORE
    Stores all object details.
    References an inode and version in the blobstore.
    Using the path as a key in a wide column ensures keys are sorted.
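Sorted keys matter because a prefix listing then becomes a contiguous range scan rather than a full scan. The sketch below imitates that with a sorted Python list and two binary searches; the function name and sample keys are illustrative, and the upper bound relies on "\uffff" sorting after every character used in the sample keys.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return every key starting with `prefix` using two binary searches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character expected in these sample keys.
    end = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


# Hypothetical object paths, kept sorted as a wide row would keep them.
object_keys = sorted(
    ["docs/readme.md", "photos/2014/a.jpg", "photos/2015/b.jpg", "sample.txt"]
)
print(prefix_scan(object_keys, "photos/"))
```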
  30. THE METASTORE
    {
      "bucket": "test",
      "object": "file.txt",
      "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
      "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
      "atime": "2012-01-01 01:30:00",
      "size": 1024,
      "checksum": "d41d8cd98f00b204e9800998ecf8427e",
      "storageclass": "standard",
      "acl": "...",
      "metadata": {}
    }
  31. THE BLOBSTORE
    Stores data.
    Inodes are lists of blocks.
    Blocks are lists of chunks.
    Chunks contain small (128k) pieces of the file.
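The inode, block and chunk layering can be illustrated by computing where the bytes of an object would land. The sketch assumes that "128k" means 128 * 1024 bytes and borrows the max-block-chunk value of 1024 from the configuration slide; the function name and the returned structure are invented for illustration and do not mirror the actual Cassandra schema.

```python
MAX_CHUNK = 128 * 1024      # bytes per chunk (assumed meaning of "128k")
MAX_BLOCK_CHUNKS = 1024     # chunks per block ("max-block-chunk" on slide 34)


def layout(size, max_chunk=MAX_CHUNK, max_block_chunks=MAX_BLOCK_CHUNKS):
    """Return the inode as a list of blocks, each a list of (offset, length) chunks."""
    chunks = [
        (offset, min(max_chunk, size - offset))
        for offset in range(0, size, max_chunk)
    ]
    return [
        chunks[i:i + max_block_chunks]
        for i in range(0, len(chunks), max_block_chunks)
    ]


blocks = layout(300 * 1024 * 1024)  # a 300 MiB object
print(len(blocks), [len(block) for block in blocks])
```

Under these assumptions a block covers at most 128 MiB, so a 300 MiB object needs three blocks while a small file fits in a single chunk.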
  32. CONFIGURATION
    A single configuration file to configure all aspects
    Logging & server options.
    Keystore, bucketstore, metastore and blobstore.
    Each can have its own details / Cassandra cluster.
  33. CONFIGURATION
    service:
      host: "0.0.0.0"
      port: 8080
    logging:
      level: info
      console: true
      overrides:
        io.pithos: debug
    options:
      service-uri: s3.example.com
      default-region: myregion
  34. CONFIGURATION
    keystore:
      keys:
        AKIAIOSFODNN7EXAMPLE:
          tenant: test@example.com
          secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
    bucketstore:
      default-region: myregion
      cluster: "localhost"
      keyspace: storage
    regions:
      myregion:
        metastore:
          cluster: "localhost"
          keyspace: storage
        storage-classes:
          standard:
            cluster: "localhost"
            keyspace: storage
            max-chunk: "128k"
            max-block-chunk: 1024
  35. AREAS OF IMPROVEMENT
    V4 Signatures.
    Overall S3 API coverage.
    Overall S3 Client coverage.
    Promoting Cassandra compact storage.
    Simple web interface.
    More contributors and users!
  36. V4 SIGNATURES
    V4 type signatures are still not supported in pithos and are item number 1 on the todo-list.
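For context, the scheme that predates V4 is the V2 signature behind the "Authorization: AWS <key>:<signature>" header shown on slide 10. The sketch below follows the V2 string-to-sign layout as a simplification: x-amz-* headers are left out, the function names are made up here, and the date and resource are example values.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return the base64 HMAC-SHA1 signature of a V2 string-to-sign.

    Simplification: x-amz-* headers are not part of the string-to-sign here.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, verb, resource, date):
    """Build the value of the Authorization header for a body-less request."""
    return f"AWS {access_key}:{sign_v2(secret_key, verb, resource, date)}"


# Sample credentials from the configuration slide; date and bucket are examples.
header = authorization_header(
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "GET",
    "/mybucket/",
    "Wed, 23 Sep 2015 10:00:00 +0000",
)
print(header)
```

The server recomputes the same HMAC from the secret it holds for the access key and compares the two values.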
  37. OVERALL S3 API COVERAGE
    The S3 API is byzantine and corner cases are poorly documented.
    Still missing some useful bits (versioning, bucket policies, session tokens).
  38. OVERALL S3 CLIENT COVERAGE
    Some clients are very sensitive to API behavior.
    The essentials work.
    Glitches are quickly fixed when caught.
  39. PROMOTING CASSANDRA COMPACT STORAGE
    WITH COMPACT STORAGE gives great benefits.
    Not yet promoted or automatically converged on startup.
  40. ELSEWHERE
    Few known installations (in the 10s).
    Always rather large.
    Always used where Cassandra previously existed.
  41. MAINTENANCE (PITHOS)
    A few cases generate orphan inodes, which must be pruned manually.
    Internal tooling is used for this and should eventually be released.
    Rather worry-free
  42. MAINTENANCE (CASSANDRA)
    The usual applies
    Schedule regular repairs of your clusters
    Follow releases
    Best supported version: 2.1.x
    Quorum is satisfactory in terms of performance.
  43. SCALING
    Pithos is stateless.
    Colocate Cassandra and pithos daemons.
    Split blobstore and metastore keyspaces into separate clusters.
    Splitting data and proxy nodes is worth investigating for huge deployments.
    Haproxy to distribute queries to pithos instances.
  44. PARTING WORDS
    Try it out! (There's an all-in-one version)
    Get involved
    Docs need proof-reading, additions.
    Some issues need to be tackled.
  45. THANKS !
    Pithos owes a lot to:
      Max Penet (@mpenet) for the great alia & jet libraries
      Datastax for the awesome cassandra java-driver
      Its contributors
      Apache Cassandra obviously
    @pyr