Mucura - Bringing cloud storage to your desk

airnandez
October 18, 2012

Talk given at the HEPiX Fall 2012 Workshop in Beijing (China) in October 2012.
Details: http://indico.cern.ch/conferenceDisplay.py?confId=199025

Transcript

1. Mucura: Bringing cloud storage to your desk
   HEPiX Fall 2012, Beijing, October 18th 2012
   Fabio Hernandez, [email protected], on behalf of the Mucura team
2. Preamble
   Fabio Hernandez, [email protected], CAS/IHEP Computing Centre, CNRS/IN2P3 Computing Centre
   • This talk covers ongoing exploratory work
   • Your feedback is very much appreciated, both as an individual and as a data center operator
   • This work is funded by IN2P3/CNRS, IHEP/CAS and the Embassy of France in China
3. Contributors
   • Current: Wu Wenjing, Kan Wenxiao, Li Sha
   • Past: Wu Jie, Du Ran
   • Experts/Advisors: Wang Lu, Cheng Yaodong
   • Infrastructure support: Yan Xiaofei
4. Summary
   We implemented a prototype of a multi-user remote file repository backed by unstructured data stores, usable both interactively and by grid jobs.
5. “If you’re not embarrassed by the first version of your product, you’ve launched too late.” — Reid Hoffman, founder of LinkedIn
6. Outlook
   • Goal
   • System overview
   • Architecture
   • Implementation status
   • Demo
   • Perspectives
7. Goal
   • To develop an open-source software system for operating multi-tenant, highly available remote file repositories
   • Targets
     - end users: individuals needing a personal, always-on, shareable storage space that is convenient to use
     - service providers: data centres supporting scientific research communities
8. Mucura
   [mükürə] A clay container, a sort of amphora, used for storing beverages, water and cereals, and employed in funeral rites by pre-Columbian ethnic groups in Colombia and other American countries. Conveys the notion of a container for personal items.
   (Adapted from the mucura entry in Wikipedia Español. Photo: Wikipedia)
9. Vision
   Your files are physically stored on remote servers, accessible seamlessly through the network. You interact with your remote files as you usually do with your local files, using your personal computer’s metaphors (file explorer, drag & drop, etc.). You are free to organize your own storage space. The system provides you significantly more storage capacity than is locally available in your personal computer. You can share your files with selected users of the system.
10. Vision (cont.)
   • Who is this service intended for? Individuals, in particular those of the high energy physics community
   • Why would you use this service?
     - it provides a convenient way to store your individual files, for instance the logs of your (grid) jobs, datasets you are analyzing, etc.
     - you may want to share some files with colleagues sitting next door or across the world
     - you would have total flexibility to manage your own space according to your needs and working methods
     - it may eventually provide you more storage capacity than you have internally on your personal computer
11. Vision (cont.)
   • Who would operate the service?
     - computing centres of the high energy physics community: good national and international connectivity, 24x7 service, expertise in running IT services, trustworthy, reliable, ...
     - files would be physically located on disk servers managed and operated by one or more of those centres
12. Use case & scale
   • Use-case profile
     - retrieval of files is more frequent than storage
     - repository used as a high-capacity, highly available archive system: not intended to directly serve I/O-intensive applications
     - in addition to allowing interactive file access, grid jobs can also interact with the repository for storing/retrieving files: a kind of personal storage element
   • Scale
     - initially, provide each user an individual storage capacity of a few hundred GB
     - thousands of directories and tens of thousands of files per user
     - file sizes in the region of 1 KB to 5 GB, but most files expected to be around a few hundred MB
13. Features
   • Reduced set of basic operations
     - list, store, retrieve, delete files
     - create, delete directories
     - although the system does not expose complete POSIX semantics, emulation on the client side is possible
     - compliant with a well-documented and well-supported API: Amazon S3
   • Service operation
     - the system must be operator-friendly: scalable, requiring little skilled manpower to operate, resilient to hardware failures and to network partitioning (both inter- and intra-centre)
     - cost-effective: built with commodity hardware
     - no need to explicitly back the files up
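The basic operations above map directly onto the Amazon S3 REST API, which is what makes off-the-shelf S3 clients usable against the repository. A minimal sketch of that mapping (the helper function and its return shape are illustrative, not part of Mucura's code):

```python
# Sketch: how the repository's basic operations map onto S3 REST requests.
# The function name and return shape are ours, not Mucura's actual code.

def s3_request(operation, bucket, key=None):
    """Return the (HTTP verb, resource path) pair for a basic operation."""
    resource = f"/{bucket}" if key is None else f"/{bucket}/{key}"
    table = {
        "list":     ("GET", f"/{bucket}"),  # list the objects in a bucket
        "store":    ("PUT", resource),      # upload an object
        "retrieve": ("GET", resource),      # download an object
        "delete":   ("DELETE", resource),   # remove an object
        # directories have no native S3 equivalent; clients typically
        # emulate them with '/'-delimited key prefixes
    }
    return table[operation]

print(s3_request("store", "public", "pictures/moon.jpg"))  # ('PUT', '/public/pictures/moon.jpg')
print(s3_request("list", "public"))                        # ('GET', '/public')
```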
14. Features (cont.)
   • Confidentiality
     - use secure channels for transporting the data between the user’s personal computer and the remote disk servers
     - should files be stored encrypted?
   • Performance
     - no stringent I/O performance requirements
     - the store is NOT intended to be used as a high-performance file system serving data directly to applications, but rather as a resilient and scalable repository for storing files online
     - however, some level of interactivity is highly desirable
15. Architecture
   • Client-server model: client and server are interconnected through a wide area network
   • Client
     - personal computer
     - worker node of a (grid) compute element
     - both GUI- and CLI-based interfaces
   • Server: runs on machines operated by the service provider (data center)
   • Interface: Amazon S3 REST API
16. Architecture (cont.)
   • Server-side modular design
     - front-end server: exposes the Amazon S3 REST API
     - metadata store: stores file metadata
     - file contents store: stores opaque sequences of bytes
     - temporary credentials delivery service: generates time-limited, S3-compatible credentials from grid credentials (X.509 certificates and proxies)
     - user registration service: allows new users to enroll, and administration of the user’s account settings
17. Storage back-ends
   • Modular design allows for interchangeable storage back-ends
   • The aim is to build the metadata and file contents (blob) stores on top of other systems, in particular unstructured distributed data stores
18. Implementation status
   • Client
     - GUI: the Cyberduck application works unmodified; ExpanDrive and Transmit were also tested, but they need modification to be used with a non-Amazon server
     - CLI: boto (open-source Python API speaking S3) works unmodified; FUSE-based s3fs does not work out of the box on Mac OS X
   • Authorization
     - Amazon S3-style credentials generated when the user enrolls
     - time-limited credentials generated from grid proxies, intended to be used mainly by grid jobs
   • HTTP reverse proxy: nginx (http://nginx.org) for workload balancing and client-facing connection management
   • Metadata store, file contents store, S3 servers: see next slides
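The deck does not detail how the time-limited credentials are derived from a grid proxy. A stdlib-only sketch of what such a temporary-credentials service could look like; the function names, key lengths and in-memory registry are all our assumptions, not Mucura's actual scheme:

```python
# Hypothetical sketch of a temporary-credentials delivery service:
# given the identity (DN) extracted from an already-validated grid proxy,
# mint a random S3-style access/secret key pair with an expiry timestamp.
# Names, key lengths and the registry dict are illustrative assumptions.
import secrets
import string
import time

_ALPHABET = string.ascii_uppercase + string.digits
_credentials = {}  # access key -> (owner DN, secret key, expiry time)

def issue_temporary_credentials(proxy_dn, lifetime_seconds=3600):
    """Return an S3-compatible (access_key, secret_key) pair valid for lifetime_seconds."""
    access_key = "".join(secrets.choice(_ALPHABET) for _ in range(20))
    secret_key = secrets.token_urlsafe(30)
    expiry = time.time() + lifetime_seconds
    _credentials[access_key] = (proxy_dn, secret_key, expiry)
    return access_key, secret_key

def credentials_valid(access_key):
    """True if the access key exists and has not yet expired."""
    entry = _credentials.get(access_key)
    return entry is not None and entry[2] > time.time()
```

A grid job would obtain such a pair once (presenting its proxy) and then talk plain S3 for the rest of its lifetime, which is why unmodified S3 clients like boto keep working.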
19. Storage back-end: metadata
   • The current implementation uses Redis (http://redis.io) to store file metadata
   • Redis
     - open-source, advanced key-value store
     - a data structure server, since keys can contain strings, hashes, lists, sets and sorted sets
     - in-memory data set, with several options for persistence
     - atomic operations
     - master-slave data replication
     - very fast
   • Durability is not its most prominent feature, but it offers some possibilities to prevent data loss
20. Storage back-end: metadata (cont.)
   Key:   index:buckets:<user id>
   Value: private, public — the list of buckets belonging to <user id>*

   Key:   bucket:<user id>:public:attributes
   Value: JSON-encoded attributes for bucket ‘public’ owned by user <user id>:
          { "owner": "161ef303-c60a-4bc1-ac6c-...", "cdate": "2012-10-17T04:03:09Z", "bucket": "private", "acl": "public-read" }

   * <user id> is a unique identifier of the form “161ef303-c60a-4bc1-ac6c-366afb59bc6f”
21. Storage back-end: metadata (cont.)
   Key:   bucket:<user id>:public:contents
   Value: the objects contained in bucket ‘public’ owned by user <user id>:
          documents/ documents/EmptyFile.txt videos/ videos/vision-Saturn.avi pictures/ pictures/pleine_lune.jpg pictures/cry-128.png pictures/big_smile-128.png ...

   Key:   object:<user id>:public:pictures/Alice Detector.jpg
   Value: JSON-encoded attributes of object ‘pictures/Alice Detector.jpg’ contained in bucket ‘public’ owned by user <user id>:
          { "name": "pictures/Alice Detector.jpg", "bucket": "public", "acl": "public-read", "content": "image/jpeg", "owner": "161ef303-c60a-4bc1-ac6c-366afb59bc6f", "cdate": "2012-10-17T04:05:15Z", "usermd": { "my-metadata-field": "my metadata value" }, "md5": "5c03d54a555284688704a765bc1efcb0", "size": 51699 }
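The key layout on the two slides above can be reproduced with a few string-building helpers. A sketch assuming exactly the schema shown; the helper names are ours, only the key formats and the JSON attribute layout come from the slides:

```python
# Sketch of the metadata key schema shown above. Helper names are ours;
# the key formats and attribute layout are taken from the slides.
import json

def buckets_index_key(user_id):
    return f"index:buckets:{user_id}"

def bucket_attrs_key(user_id, bucket):
    return f"bucket:{user_id}:{bucket}:attributes"

def bucket_contents_key(user_id, bucket):
    return f"bucket:{user_id}:{bucket}:contents"

def object_key(user_id, bucket, name):
    return f"object:{user_id}:{bucket}:{name}"

uid = "161ef303-c60a-4bc1-ac6c-366afb59bc6f"
attrs = json.dumps({"owner": uid, "cdate": "2012-10-17T04:03:09Z",
                    "bucket": "public", "acl": "public-read"})
# With a real Redis client `r`, populating the store would look like:
#   r.sadd(buckets_index_key(uid), "public")
#   r.set(bucket_attrs_key(uid, "public"), attrs)
print(object_key(uid, "public", "pictures/Alice Detector.jpg"))
```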
22. Storage back-end: file contents
   • Current implementation sits on top of a networked or local file system (Gluster, Lustre, GPFS, ext3, xfs, etc.)
   • Configurable depth of intermediate directories, to prevent creating too many files under a single directory in the underlying file system
   • Object metadata is also stored as an extended attribute of each file: useful for recovering the metadata of the whole system in case of a disaster with the metadata store
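A minimal illustration of the configurable-depth layout, deriving the intermediate directories from a hash of the object name. The deck only states that the depth is configurable; hashing with MD5 and the helper name are our assumptions:

```python
# Sketch: spread files across intermediate directories of configurable depth,
# so that no single directory in the underlying file system holds too many
# files. Hashing the object name with MD5 is our assumption; the deck only
# says the directory depth is configurable.
import hashlib
import os.path

def content_path(root, user_id, bucket, name, depth=2):
    """Return the on-disk path for an object's contents."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Two hex characters per intermediate directory level.
    levels = [digest[2 * i : 2 * i + 2] for i in range(depth)]
    return os.path.join(root, user_id, bucket, *levels, digest)

p = content_path("/data/mucura", "161ef303", "public", "pictures/moon.jpg")
print(p)  # e.g. /data/mucura/161ef303/public/<xx>/<yy>/<full md5 digest>
```

With depth 2 and 256 entries per level, 65536 leaf directories share the load; raising the depth trades directory count for path length.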
23. S3 servers
   • Implement the S3 protocol
     - basically a gateway between the object store and the client
     - handle authentication of requests following the S3 specifications
     - no storage-related intelligence: independent of the way the storage back-ends are implemented
   • Most file-related operations specified in the Amazon S3 API are implemented
     - store (single or multiple parts), retrieve, copy, delete, list files
     - update object metadata
   • Built on the Tornado web framework (http://www.tornadoweb.org): open source, active community, scalable, Python
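At the time of this talk, the S3 REST API authenticated requests with HMAC-SHA1 signatures (AWS signature version 2), so verifying a request on the server side reduces to recomputing the signature over the canonical string-to-sign and comparing it with the one in the Authorization header. A stdlib-only sketch; the helper name is ours:

```python
# Sketch of S3-style (signature v2) request signing, which the front-end
# servers must verify. Stdlib only; the helper name is ours.
import base64
import hashlib
import hmac

def sign_s3_request(secret_key, verb, resource,
                    content_md5="", content_type="", date="", amz_headers=""):
    """Compute the signature part of an S3 v2 Authorization header."""
    # Canonical form: VERB \n Content-MD5 \n Content-Type \n Date \n
    #                 CanonicalizedAmzHeaders + CanonicalizedResource
    string_to_sign = "\n".join([verb, content_md5, content_type, date]) \
                     + "\n" + amz_headers + resource
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

sig = sign_s3_request("uM9F3YluFGv41cknvTaGwgjvx4QpvBcleU8e7H2o",
                      "GET", "/public/pictures/moon.jpg",
                      date="Thu, 18 Oct 2012 08:00:00 +0000")
print(sig)  # a 28-character base64 string; the server accepts the request if it matches
```

Because the scheme is deterministic, the server needs only the client's secret key (looked up by access key) to validate any request, which is what lets stock clients such as Cyberduck and boto work unmodified.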
24. Implementation status (cont.)
   • Pending
     - ACLs, virtual hosting, file versioning, per-user quotas
     - tools for operations, e.g. throttling traffic per user, draining servers, etc.
   • Missing
     - server-side file encryption
     - torrent downloads
   • All components developed in Python
25. What you are going to see
   • Interactive usage: we will use Cyberduck to interact with our personal storage
     - create directories
     - upload files
     - download files
     - browse existing files and directories
     - delete files & directories
     - share files via browser
   • Batch usage (not shown today): we will submit some jobs to IHEP’s DIRAC server; those jobs will create files and upload them to our personal repository; once uploaded, we can visualize their contents interactively
26. User credentials
   Mucura instance endpoint: https://www.mucura.org:44000
   Access Key: 0PT5J1746GZHT7XF3X82
   Secret Key: uM9F3YluFGv41cknvTaGwgjvx4QpvBcleU8e7H2o
27. User credentials (cont.)
   [Screenshot: Cyberduck (http://cyberduck.ch) connection settings, showing the Mucura server and port number and the user access key]
28. Perspectives
   • We intend to explore other storage back-ends, both for storing file contents and for storing metadata: Riak is an interesting candidate, but HDFS (the Hadoop file system), HBase and even OpenStack’s Swift also look well suited
   • Demonstrate integration with ROOT: you could share files in ROOT format with your peers
   • Explore other storage back-ends’ performance, scalability and manageability
   • Perform end-to-end performance and scalability tests, including when client and server are distant
   • Make the code and process compliant with open-source standards: distribute the software through GitHub, with documentation, issue tracking, a mailing list, a web presence, …
   • Start alpha tests with (adventurous) users and collect feedback; add instrumentation to collect data to identify usage patterns
29. Perspectives (cont.)
   • Explore other client-side tools that integrate better with the personal computer environment: “The most interesting technology is technology that doesn’t appear to be there at all.” (G. Booch)
   • Explore ways of implementing inter-site file replication and erasure coding
   • Prototype a service using the Huawei S3 appliance just installed at IHEP
   • Explore building value-added services on top of an S3 back-end, e.g. synchronization of files between several client hosts (à la Dropbox), clients for mobile devices (iOS, Android)
30. Perspectives (cont.)
   • As an individual, would you be interested in using such a service?
   • As a data center operator, would you operate such a service? What features would you like to see in it to make it more operator-friendly?