Pithos

Pierre-Yves Ritschard

September 23, 2015

Transcript

  1. PITHOS @PYR #CASSANDRASUMMIT

  2. @PYR

    CTO at Exoscale, Swiss Cloud Hosting. Open-source developer: pithos, cyanite, riemann, collectd.
  3. AIM OF THIS TALK

    Presenting object storage. Show-casing efficient uses of object storage. Presenting pithos. Feedback on usage.
  4. OUTLINE

    Object Storage 101. 6 things you should do with S3. Pithos, your personal Object Store. Pithos in production.
  5. OBJECT STORAGE 101

  6. THE ELEVATOR PITCH

    "Object Storage is a storage architecture that manages data as objects." (Wikipedia)
  7. INCEPTION

    Asset and content storage for large hosting platforms. LiveJournal's MogileFS. A shift in how we perceive distributed storage.
  8. ESSENTIAL PROPERTIES

    No POSIX guarantees. No atomicity. Eventual consistency. Pushes some responsibility back to the application.
  9. THE OBJECT STORAGE LANDSCAPE

    Mostly hosted solutions: AWS S3, Rackspace Cloud Files, DreamObjects, Exoscale SOS. No real API standardisation: AWS S3 is the de-facto standard.
  10. THE ON-PREMISE OBJECT STORAGE LANDSCAPE

    Some vendor-backed solutions: EMC Atmos, Scality, Cloudian, Swift, Ceph, Riak CS, Pithos.
  11. A TYPICAL OBJECT STORE REQUEST

    # curl -X PUT --data-binary @file.txt https://mybucket.myprovider.com/some-file.txt
    # curl https://mybucket.myprovider.com/some-file.txt
  12. S3 TERMINOLOGY

    Region: determines where objects will be stored.
    Storage class: storage properties for objects.
    Bucket: a named container for objects.
    Object: a file.
  13. THE S3 API

    A global bucket namespace. Artificial hierarchy support. Authentication and authorization through ACLs. Multipart uploads. CORS support & form-based uploads. Eventual consistency.
  14. A GLOBAL BUCKET NAMESPACE

    A single consistent namespace for buckets, across tenants: highlander style, there can be only one bucket with a given name. A bucket is located within a region.
  15. HIERARCHY SUPPORT

    Listing requests may supply a delimiter and a prefix. This emulates directories when keys contain slashes.
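
The delimiter/prefix rule can be modelled in a few lines. This is a minimal in-memory sketch, not pithos code, and `list_keys` is a hypothetical helper: keys matching the prefix are returned as plain contents, unless the rest of the key (after the prefix) contains the delimiter, in which case they are rolled up into a "common prefix" that plays the role of a directory.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Return (contents, common_prefixes) the way an S3 listing groups keys."""
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter acts as a "directory".
            cp = prefix + rest.split(delimiter, 1)[0] + delimiter
            if cp not in common_prefixes:
                common_prefixes.append(cp)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = ["sample.txt", "photos/2015/a.jpg", "photos/2015/b.jpg", "photos/cat.jpg"]
print(list_keys(keys, delimiter="/"))
# (['sample.txt'], ['photos/'])
print(list_keys(keys, prefix="photos/", delimiter="/"))
# (['photos/cat.jpg'], ['photos/2015/'])
```

Browsing a "directory" is then just a new listing request whose prefix is one of the returned common prefixes.
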
  16. HIERARCHY SUPPORT

    GET /?delimiter=/ HTTP/1.1
    Host: mybucket.service.uri
    Date: <date>
    Authorization: AWS <key>:<signature>
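
The `<signature>` above is an AWS Signature Version 2 value (pithos does not support V4 yet, see slide 51): Base64 of an HMAC-SHA1, keyed with the secret key, over a newline-separated string-to-sign. A minimal sketch, assuming a request with no `Content-MD5`, no `Content-Type` and no `x-amz-*` headers; the credentials are the AWS documentation example keys that also appear in the configuration slide, the date is an arbitrary example, and `sign_v2` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, content_md5, content_type, date, resource):
    """SigV2 string-to-sign for a request without x-amz-* headers."""
    return "\n".join([verb, content_md5, content_type, date, resource])


def sign_v2(secret_key, verb, date, resource, content_md5="", content_type=""):
    """Return Base64(HMAC-SHA1(secret_key, string_to_sign))."""
    sts = string_to_sign(verb, content_md5, content_type, date, resource)
    digest = hmac.new(secret_key.encode("utf-8"), sts.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


access_key = "AKIAIOSFODNN7EXAMPLE"
secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
date = "Tue, 27 Mar 2007 19:36:42 +0000"

# For GET /?delimiter=/ on mybucket, the canonicalized resource is "/mybucket/":
# plain listing parameters such as delimiter are not part of it.
signature = sign_v2(secret_key, "GET", date, "/mybucket/")
print("Authorization: AWS %s:%s" % (access_key, signature))
```

Since the server recomputes the same HMAC from the same request fields, any change to the verb, date or resource invalidates the signature.
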
  17. HIERARCHY SUPPORT

    <?xml version="1.0" encoding="UTF-8"?>
    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Name>batman</Name>
      <Prefix></Prefix>
      <MaxKeys>100</MaxKeys>
      <Delimiter>/</Delimiter>
      <IsTruncated>false</IsTruncated>
      <Contents>
        <Key>sample.txt</Key>
        <LastModified>2014-10-17T12:35:10.423Z</LastModified>
        <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
        <Size>1603</Size>
        <Owner>
          <ID>test@example.com</ID>
          <DisplayName>test@example.com</DisplayName>
        </Owner>
        <StorageClass>Standard</StorageClass>
      </Contents>
    </ListBucketResult>
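
Clients consume this listing with any namespace-aware XML parser. A minimal sketch with Python's standard `xml.etree.ElementTree`, extracting keys, sizes and ETags (with their surrounding quotes stripped) plus any `CommonPrefixes`, which is where S3 reports emulated directories when a delimiter is set; the sample response above has none. `parse_listing` is a hypothetical helper, and the `Owner` element is omitted for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <Prefix></Prefix>
  <MaxKeys>100</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <LastModified>2014-10-17T12:35:10.423Z</LastModified>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
    <StorageClass>Standard</StorageClass>
  </Contents>
</ListBucketResult>"""


def parse_listing(body):
    """Return (is_truncated, [(key, size, etag)], [common_prefix, ...])."""
    root = ET.fromstring(body)
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=NS) == "true"
    objects = [
        (
            item.findtext("s3:Key", namespaces=NS),
            int(item.findtext("s3:Size", namespaces=NS)),
            item.findtext("s3:ETag", namespaces=NS).strip('"'),  # ETags are quoted
        )
        for item in root.findall("s3:Contents", NS)
    ]
    prefixes = [
        cp.findtext("s3:Prefix", namespaces=NS)
        for cp in root.findall("s3:CommonPrefixes", NS)
    ]
    return truncated, objects, prefixes


print(parse_listing(BODY))
# (False, [('sample.txt', 1603, 'a4b7923f7b2df9bc96fb263978c8bc40')], [])
```

When `IsTruncated` is true, a client issues another request to fetch the next page of results.
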
  18. AUTHENTICATION & AUTHORIZATION THROUGH ACLS

    Simple canned ACLs allow common settings, e.g. public. An XML syntax is also available.
  19. MULTIPART UPLOADS

    Allows uploading a file in several chunks, with a user-controlled re-aggregation step. Limits the impact of upload failures for large files.
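
On the client side, a multipart upload starts with planning byte ranges, so that each part can be uploaded (and, on failure, retried) on its own before the final completion request re-aggregates them. A minimal sketch; the 5 MiB minimum part size and the 10,000-part limit are Amazon S3's documented limits, whether pithos enforces the same values is an assumption, and `plan_parts` is a hypothetical helper.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10000                # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Split total_size bytes into (part_number, offset, length) ranges."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 5 MiB minimum")
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts, increase part_size")
    return parts


MiB = 1024 * 1024
print(plan_parts(20 * MiB))
# [(1, 0, 8388608), (2, 8388608, 8388608), (3, 16777216, 4194304)]
```

If part 2 fails, only those 8 MiB are sent again; the parts already stored are untouched.
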
  20. CORS SUPPORT AND FORM-BASED UPLOADS

    Web interaction without any backend components. CORS setup through an XML configuration syntax. Form-based uploads through pre-signed requests.
  21. EVENTUAL CONSISTENCY

    An easy sell at Cassandra Summit. Possible delay between PUT and GET availability. Checksums avoid massive inconsistencies.
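
One way a client can lean on checksums to cope with the PUT/GET delay: remember the MD5 of what it uploaded, and retry reads until the returned body and ETag match it. A minimal sketch, assuming the ETag is the hex MD5 of the body (as S3 does for simple single-part uploads); `fetch` is a hypothetical callable standing in for an HTTP GET.

```python
import hashlib
import time


def md5_hex(data):
    """Hex MD5 digest, the form S3 uses as ETag for single-part uploads."""
    return hashlib.md5(data).hexdigest()


def get_consistent(fetch, expected_md5, attempts=5, delay=0.1):
    """Retry fetch() until the returned body matches expected_md5.

    fetch is a hypothetical callable returning (body, etag).
    """
    for attempt in range(attempts):
        body, etag = fetch()
        if etag.strip('"') == expected_md5 and md5_hex(body) == expected_md5:
            return body
        if attempt + 1 < attempts:
            # Possibly a stale replica: back off exponentially before retrying.
            time.sleep(delay * 2 ** attempt)
    raise RuntimeError("object did not converge to the expected checksum")


data = b"hello pithos"
responses = iter([
    (b"old contents", '"%s"' % md5_hex(b"old contents")),  # stale read
    (data, '"%s"' % md5_hex(data)),                         # converged read
])
print(get_consistent(lambda: next(responses), md5_hex(data), delay=0))
# b'hello pithos'
```
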
  22. 6 THINGS TO DO WITH S3

  23. 12-FACTOR APP SUPPORT FOR PERSISTENCE

    Eliminates the need for NFS. Eases interaction with PaaS-type platforms. http://12factor.net/
  24. STATIC CONTENT HOSTING

    Perfect for hosting CSS, JS and other static assets. Simply requires setting a bucket's ACL to public.
  25. FORM-BASED UPLOADS

    Pre-signed requests. Requests encapsulate a policy. No proxying to the S3 service required. Great for supporting user-generated content.
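
A sketch of what "requests encapsulate a policy" means for a V2 browser POST: the application server builds a JSON policy, Base64-encodes it, and signs the encoded policy with HMAC-SHA1; the browser then posts these fields plus the file straight to the bucket URL. This follows the AWS Signature Version 2 POST format; which policy conditions pithos checks is an assumption, and `presign_post` is a hypothetical helper. `${filename}` is S3's placeholder for the uploaded file's name.

```python
import base64
import hashlib
import hmac
import json


def presign_post(access_key, secret_key, bucket, key_prefix, expiration, max_size):
    """Build the form fields for a SigV2 browser-based POST upload."""
    policy = {
        "expiration": expiration,  # ISO 8601, e.g. "2015-12-01T12:00:00Z"
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"acl": "private"},
            ["content-length-range", 0, max_size],
        ],
    }
    encoded = base64.b64encode(json.dumps(policy).encode("utf-8"))
    # The signature covers the Base64-encoded policy, not the raw JSON.
    digest = hmac.new(secret_key.encode("utf-8"), encoded, hashlib.sha1).digest()
    return {
        "key": key_prefix + "${filename}",
        "acl": "private",
        "AWSAccessKeyId": access_key,
        "policy": encoded.decode("ascii"),
        "signature": base64.b64encode(digest).decode("ascii"),
    }


fields = presign_post("AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                      "mybucket", "uploads/", "2015-12-01T12:00:00Z", 10 * 1024 * 1024)
for name, value in fields.items():
    print(name, value)
```

The secret never leaves the server, and the policy bounds what the browser may upload (bucket, key prefix, size, expiry).
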
  26. ARTIFACT STORAGE

    Supported in Maven. Supported in the Docker Registry. Supported in Apt. Supported in the Mesos fetcher.
  27. BACKUPS

    Great open-source options like duplicity. Commercial storage gateway support. Some home NAS-type products support S3 as well.
  28. CLIENT-SIDE ENCRYPTION

    GPG encryption support. Guarantees full data ownership, even when leveraging third-party providers. Don't lose your keys!
  29. PITHOS, YOUR PERSONAL OBJECT-STORE

  30. FROM THE WEBSITE

    "Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster."
  31. WHY?

    Provide your own S3-compatible service (that's us!). Restricted from using hosted object-storage services. Willingness to fully own availability.
  32. PITHOS ESSENTIAL PROPERTIES

    Extensive S3 API coverage. Fully stateless. Multi-region support. Fully Cassandra-backed. Extensible. Open-source.
  33. MISC.

    Runs on the JVM. Written in Clojure. Small codebase (~5300 LoC). Can run an embedded Cassandra for test purposes.
  34. PITHOS ARCHITECTURE

    A daemon built out of 5 isolated and pluggable components.
  35. PITHOS ARCHITECTURE

    Keystore, Bucketstore, Metastore, Blobstore, Reporter.

  36. OVERALL CONCEPT

  37. THE KEYSTORE

    Authentication & authorization are handled outside of pithos. The only component which doesn't rely on Cassandra by default. The default implementation relies on the pithos configuration file and maps an API key to a set of credentials. An example alternative implementation is given in the documentation.
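
Conceptually, the keystore contract reduces to one lookup: API key in, credentials out. A Python model of that idea, not pithos's actual Clojure interface; the dictionary mirrors the credential fields shown in slide 38 (`tenant`, `secret`, `memberof`), and `KEYSTORE`, `lookup` and the error name are hypothetical. An alternative keystore (LDAP, a database) would only have to answer the same lookup.

```python
KEYSTORE = {
    "AKIAIOSFODNN7EXAMPLE": {
        "tenant": "tenantname",
        "secret": "secretkey",
        "memberof": ["group1", "group2"],
    },
}


def lookup(access_key, keystore=KEYSTORE):
    """Map an API key to its credentials, or reject unknown keys."""
    try:
        return keystore[access_key]
    except KeyError:
        raise PermissionError("InvalidAccessKeyId") from None


creds = lookup("AKIAIOSFODNN7EXAMPLE")
print(creds["tenant"], "group1" in creds["memberof"])
# tenantname True
```

The returned secret is what the signature check recomputes the HMAC with; the tenant and groups then feed ACL decisions.
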
  38. THE KEYSTORE

    {
      "tenant": "tenantname",
      "secret": "secretkey",
      "memberof": ["group1", "group2"]
    }
  39. THE BUCKETSTORE

    Stores essential bucket properties: bucket tenant, region and storage class where the bucket is located, optional CORS properties.
  40. THE BUCKETSTORE

    Bucket ownership is transactional. Cassandra is not the best suited for this task, but its lightweight transaction features help.
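
In CQL, a lightweight transaction such as `INSERT ... IF NOT EXISTS` runs a Paxos round and reports in an `[applied]` column whether the write won. A Python stand-in for those compare-and-set semantics, assuming bucket creation works this way; the lock plays the role of the Paxos round, and `BucketStore` is a hypothetical in-memory class, not pithos's storage code.

```python
import threading


class BucketStore:
    """In-memory stand-in for a bucket table written with INSERT ... IF NOT EXISTS."""

    def __init__(self):
        self._buckets = {}
        self._lock = threading.Lock()  # stands in for Cassandra's Paxos round

    def create_bucket(self, bucket, tenant):
        """Return True if the bucket was created, False if someone already owns it."""
        with self._lock:
            if bucket in self._buckets:
                return False  # like "[applied] = False"
            self._buckets[bucket] = tenant
            return True

    def owner(self, bucket):
        return self._buckets.get(bucket)


store = BucketStore()
print(store.create_bucket("batman", "test@example.com"))   # True
print(store.create_bucket("batman", "other@example.com"))  # False
print(store.owner("batman"))                               # test@example.com
```

Without the conditional write, two tenants racing on the same name could both believe they own the bucket in the global namespace.
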
  41. THE BUCKETSTORE

    {
      "bucket": "batman",
      "created": "2012-01-01 01:30:00",
      "tenant": "test@example.com",
      "region": "ch-dk-2",
      "acl": "...",
      "cors": "..."
    }
  42. THE METASTORE

    Stores all object details. References an inode and version in the blobstore. Using the path as a key in a wide column ensures keys are sorted.
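
Cassandra keeps clustering columns sorted within a partition, so all paths sharing a prefix are contiguous and a prefix listing becomes a slice rather than a full scan. A Python stand-in for one bucket's row, where a sorted list plus `bisect` play the role of Cassandra's ordering; `MetaRow` is hypothetical, and the exclusive marker mirrors S3 listing pagination rather than documented pithos internals.

```python
import bisect


class MetaRow:
    """Hypothetical model of one bucket's wide row: object paths kept sorted."""

    def __init__(self):
        self.paths = []

    def put(self, path):
        i = bisect.bisect_left(self.paths, path)
        if i == len(self.paths) or self.paths[i] != path:
            self.paths.insert(i, path)

    def scan(self, prefix="", marker=None, max_keys=1000):
        """Return up to max_keys paths starting with prefix, after marker."""
        # Keys sharing a prefix are contiguous and start at bisect_left(prefix).
        i = bisect.bisect_left(self.paths, prefix)
        if marker is not None:
            # Markers are exclusive: resume strictly after the last key seen.
            i = max(i, bisect.bisect_right(self.paths, marker))
        out = []
        while i < len(self.paths) and self.paths[i].startswith(prefix) and len(out) < max_keys:
            out.append(self.paths[i])
            i += 1
        return out


row = MetaRow()
for p in ["photos/cat.jpg", "file.txt", "photos/2015/a.jpg", "photos/2015/b.jpg", "zebra.txt"]:
    row.put(p)
print(row.scan(prefix="photos/", max_keys=2))
# ['photos/2015/a.jpg', 'photos/2015/b.jpg']
print(row.scan(prefix="photos/", marker="photos/2015/b.jpg"))
# ['photos/cat.jpg']
```
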
  43. THE METASTORE

    {
      "bucket": "test",
      "object": "file.txt",
      "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
      "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
      "atime": "2012-01-01 01:30:00",
      "size": 1024,
      "checksum": "d41d8cd98f00b204e9800998ecf8427e",
      "storageclass": "standard",
      "acl": "...",
      "metadata": {}
    }
  44. THE BLOBSTORE

    Stores data. Inodes are lists of blocks. Blocks are lists of chunks. Chunks contain small (128k) pieces of the file.
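
A minimal in-memory sketch of that inode → block → chunk layout. It assumes `max-chunk: "128k"` means 128 KiB per chunk and `max-block-chunk: 1024` means at most 1024 chunks per block, as in the sample configuration of slide 49; how pithos writes these as Cassandra rows is not shown, and `split_inode` and `locate` are hypothetical helpers. `locate` shows why the layout is handy: a byte offset (e.g. from a range GET) maps directly to a block, a chunk and an offset within it.

```python
MAX_CHUNK = 128 * 1024   # assumed meaning of max-chunk: "128k"
MAX_BLOCK_CHUNK = 1024   # assumed meaning of max-block-chunk: 1024


def split_inode(data, max_chunk=MAX_CHUNK, max_block_chunk=MAX_BLOCK_CHUNK):
    """Split file contents into blocks, each block being a list of chunks."""
    chunks = [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]
    return [chunks[i:i + max_block_chunk] for i in range(0, len(chunks), max_block_chunk)]


def locate(offset, max_chunk=MAX_CHUNK, max_block_chunk=MAX_BLOCK_CHUNK):
    """Map a byte offset to (block index, chunk index in block, offset in chunk)."""
    chunk_index, within = divmod(offset, max_chunk)
    block_index, chunk_in_block = divmod(chunk_index, max_block_chunk)
    return block_index, chunk_in_block, within


blocks = split_inode(b"x" * (300 * 1024))
print(len(blocks), [len(c) for c in blocks[0]])
# 1 [131072, 131072, 45056]
```

Keeping chunks small bounds the size of each Cassandra cell, which is what makes this "not what Cassandra was meant for" workload tolerable.
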
  45. THE BLOBSTORE

    Not what Cassandra was meant for. Works surprisingly well.
  46. THE REPORTER

    Emits useful usage information. A good basis for building billing extensions.
  47. CONFIGURATION

    A single configuration file configures all aspects: logging and server options; keystore, bucketstore, metastore and blobstore. Each store can have its own details and its own Cassandra cluster.
  48. CONFIGURATION

    service:
      host: "0.0.0.0"
      port: 8080
    logging:
      level: info
      console: true
      overrides:
        io.pithos: debug
    options:
      service-uri: s3.example.com
      default-region: myregion
  49. CONFIGURATION

    keystore:
      keys:
        AKIAIOSFODNN7EXAMPLE:
          tenant: test@example.com
          secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
    bucketstore:
      default-region: myregion
      cluster: "localhost"
      keyspace: storage
    regions:
      myregion:
        metastore:
          cluster: "localhost"
          keyspace: storage
        storage-classes:
          standard:
            cluster: "localhost"
            keyspace: storage
            max-chunk: "128k"
            max-block-chunk: 1024
  50. AREAS OF IMPROVEMENT

    V4 signatures. Overall S3 API coverage. Overall S3 client coverage. Promoting Cassandra compact storage. Simple web interface. More contributors and users!
  51. V4 SIGNATURES

    V4-type signatures are still not supported in pithos and are item number 1 on the to-do list.
  52. OVERALL S3 API COVERAGE

    The S3 API is byzantine and its corner cases are poorly documented. Still missing some useful bits (versioning, bucket policies, session tokens).
  53. OVERALL S3 CLIENT COVERAGE

    Some clients are very sensitive to API behavior. The essentials work. Glitches are quickly fixed when caught.
  54. PROMOTING CASSANDRA COMPACT STORAGE

    WITH COMPACT STORAGE gives great benefits. It is not yet promoted, nor automatically converged on startup.
  55. SIMPLE WEB INTERFACE

    A simple JavaScript SPA would be nice.

  56. PITHOS IN PRODUCTION

  57. A WORD OF WARNING

    Running an object store is not necessarily for the faint of heart.
  58. HOW WE USE IT

    No multi-datacenter clusters. A dedicated metadata cluster. Dedicated "blobstore" clusters.
  59. ELSEWHERE

    A few known installations (in the tens). Always rather large. Always used where Cassandra previously existed.
  60. MAINTENANCE (PITHOS)

    A few cases generate orphan inodes, which must be pruned manually. Internal tooling is used for this and should eventually be released. Otherwise rather worry-free.
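
The internal pruning tooling is not public, so this is only a sketch of the idea, with hypothetical names: an orphan is an (inode, version) present in the blobstore that no metastore entry references. The grace period is a design choice of this sketch: if an upload writes its blocks before its metastore entry exists (an assumption about write ordering), recent unreferenced inodes may simply be uploads in flight.

```python
def find_orphans(metastore_entries, blob_versions, now, grace_seconds=86400):
    """Return (inode, version) pairs that no metastore entry references.

    blob_versions maps (inode, version) to the time it was written, in seconds.
    Only entries older than grace_seconds are reported.
    """
    referenced = {(e["inode"], e["version"]) for e in metastore_entries}
    return sorted(
        iv for iv, written_at in blob_versions.items()
        if iv not in referenced and now - written_at > grace_seconds
    )


meta = [{"inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
         "version": "c97894cd-e2cd-46d5-a217-1add544e88a4"}]
blobs = {
    ("4e682d3d-28fa-4ea6-aa28-282c2757f31b", "c97894cd-e2cd-46d5-a217-1add544e88a4"): 0,
    ("dead", "beef"): 0,            # old and unreferenced: an orphan
    ("fresh", "upload"): 99000,     # unreferenced but recent: left alone
}
print(find_orphans(meta, blobs, now=100000))
# [('dead', 'beef')]
```
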
  61. MAINTENANCE (CASSANDRA)

    The usual applies: schedule regular repairs of your clusters and follow releases. Best-supported version: 2.1.x. Quorum is satisfactory in terms of performance.
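
What quorum buys, as arithmetic: with replication factor RF, a QUORUM operation waits for floor(RF / 2) + 1 replicas, so QUORUM writes followed by QUORUM reads always overlap on at least one replica (R + W > RF), while RF - quorum replicas may be down. A minimal sketch for a single datacenter (matching "no multi-datacenter clusters" in slide 58); the RF values are examples, since the talk does not state the one used.

```python
def quorum(replication_factor):
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def reads_see_writes(replication_factor, write_acks, read_acks):
    """True when every read replica set must overlap every write replica set."""
    return write_acks + read_acks > replication_factor


for rf in (3, 5):
    q = quorum(rf)
    print(rf, q, rf - q, reads_see_writes(rf, q, q))
# 3 2 1 True
# 5 3 2 True
```
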
  62. SCALING

    Pithos is stateless. Colocate Cassandra and pithos daemons. Split the blobstore and metastore keyspaces into separate clusters. Splitting data and proxy nodes is worth investigating for huge deployments. Use HAProxy to distribute queries across pithos instances.
  63. PARTING WORDS

    Try it out! (There's an all-in-one version.) Get involved: the docs need proof-reading and additions, and some issues need to be tackled.
  64. THANKS!

    Pithos owes a lot to: Max Penet (@mpenet) for the great alia & jet libraries; DataStax for the awesome Cassandra java-driver; its contributors; Apache Cassandra, obviously. @pyr