MultiStack @ OpenStack Mini-conference - OSIDays

MultiStack presentation at the OpenStack mini-conference during Open Source India Days 2013.

Other talks at http://dharmeshkakadia.github.io/talks

dharmeshkakadia

November 11, 2013

Transcript

  1. MultiStack
    Dharmesh Kakadia, Shashank Sahni
    Search and Information Extraction Lab
    IIIT-Hyderabad

  2. $whoarewe
    • Research Students @ IIIT-H with Prof. Vasudeva
    Varma
    • Cloud research group - Scheduling, MultiCloud,
    SDN
    • Manage the small OpenStack-based private cloud
    and data infrastructure of SIEL
    • Available for hire :)

  3. About SIEL
    • Deal with a lot of data and a wide range of
    processing frameworks
    • Early adopters (and contributors) of cloud and
    big data
    • FOSS Lovers
    • Academic environment (limited resources,
    unlimited scope)

  4. Our Story
    Hadoop Cluster
    • Unmanaged
    • Underutilized

  5. Our Story
    • Managed Infrastructure
    • Quick Cloud Adoption

  6. But we still need Hadoop
    • Shell Scripts
        • worked fine
        • for a short time !!
    • Chef-cookbooks
    • Need higher abstraction

  7. HadoopStack
    • Abstraction of a Hadoop job
    • It worked great !! Everyone liked it !!
    • They wanted to run more frameworks
    • We ran out of capacity on our OpenStack cloud
    • Need more Hardware ?
    • Need more stacks ?

  8. Going MultiCloud…
    Public clouds are the new Hardware

  9. MultiStack

  10. Multi Stack
    Data
    Infrastructure

  11. Multi Stack
    Data
    Infrastructure

  12. Multi Stack
    Data
    Infrastructure

  13. Multi Stack
    Data
    Infrastructure

  14. Multi Stack
    Data
    Infrastructure

  15. Demo
    {Start Praying}

  16. Features
    • Support for multiple clouds and frameworks
    • Auto-scaling
    • Smart scheduling - deadline, cost
    • Job workflows
    • Auto-selection of frameworks
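
The "Smart scheduling - deadline, cost" bullet could look roughly like the sketch below. This is not MultiStack's actual scheduler: the `CloudOffer` class, the `pick_cloud` function, and all runtime and price figures are hypothetical, chosen only to illustrate picking the cheapest cloud that still meets a deadline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CloudOffer:
    """Hypothetical estimate for running one job on one cloud."""
    name: str
    est_runtime_hours: float  # assumed runtime estimate, not a measured value
    cost_per_hour: float      # assumed price for the whole cluster

    @property
    def total_cost(self) -> float:
        return self.est_runtime_hours * self.cost_per_hour


def pick_cloud(offers: Sequence[CloudOffer],
               deadline_hours: float) -> Optional[CloudOffer]:
    """Return the cheapest offer that finishes within the deadline, or None."""
    feasible = [o for o in offers if o.est_runtime_hours <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.total_cost)


# Illustrative numbers only: a slow but free private cloud, a fast paid one.
offers = [
    CloudOffer("private-openstack", est_runtime_hours=6.0, cost_per_hour=0.0),
    CloudOffer("public-ec2", est_runtime_hours=2.0, cost_per_hour=4.0),
]
```

With a relaxed deadline the free private cloud wins; a tight deadline pushes the job to the faster public cloud, and an impossible deadline yields `None`.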

  17. View Slide

  18. API Server - Flask

  19. Scheduling - Quota based
    API Server - Flask

  20. Provisioning - Chef
    Scheduling - Quota based
    API Server - Flask

  21. Provisioning - Chef
    Scheduling - Quota based
    API Server - Flask
    Database - MongoDB
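
As a rough illustration of the "Scheduling - Quota based" layer, here is a minimal sketch, assuming each cloud exposes a simple instance quota. The `Quota` class and `schedule` function are hypothetical names; the slides do not describe the real data model, and the Flask, Chef, and MongoDB layers are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Quota:
    """Hypothetical per-cloud instance quota."""
    max_instances: int
    used_instances: int = 0

    def can_fit(self, requested: int) -> bool:
        return self.used_instances + requested <= self.max_instances


def schedule(job_nodes: int, quotas: Dict[str, Quota]) -> Optional[str]:
    """Reserve job_nodes on the first cloud with enough free quota.

    Clouds are tried in the dict's insertion order, which here acts as a
    preference order. Returns the chosen cloud name, or None if no cloud
    has room.
    """
    for name, quota in quotas.items():
        if quota.can_fit(job_nodes):
            quota.used_instances += job_nodes
            return name
    return None


# Illustrative state: the private cloud is nearly full, the public one is empty.
quotas = {
    "openstack": Quota(max_instances=4, used_instances=3),
    "ec2": Quota(max_instances=20),
}
```

A two-node job does not fit in the remaining private quota and spills over to the second cloud, which matches the "we ran out of capacity" story from the earlier slides.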

  22. Challenge Accepted
    • Hiding differences between clouds while leveraging cloud-specific
    features
        • Run mysecurejob on-premise, batchjob on public cloud
        • Spot Instances on EC2
        • Quotas on OpenStack
    • Everything dynamic and automated
    • Opportunities to improve
        • Better caching - Tachyon
        • Optimized configuration
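
One way to hide differences between clouds while still using cloud-specific features is a common driver interface. The sketch below is an assumption for illustration, not MultiStack's code: `CloudDriver`, `OnPremOpenStack`, `PublicEC2`, `plan_job`, and the `secure` flag are all hypothetical, and the drivers only return a plan dictionary instead of calling any real cloud API.

```python
from abc import ABC, abstractmethod


class CloudDriver(ABC):
    """Hypothetical common interface that every cloud backend implements."""
    name: str

    @abstractmethod
    def launch_plan(self, nodes: int) -> dict:
        """Describe how this cloud would launch the requested nodes."""


class OnPremOpenStack(CloudDriver):
    name = "openstack"

    def launch_plan(self, nodes: int) -> dict:
        # Cloud-specific detail: the private cloud is limited by quota.
        return {"cloud": self.name, "nodes": nodes, "check": "quota"}


class PublicEC2(CloudDriver):
    name = "ec2"

    def launch_plan(self, nodes: int) -> dict:
        # Cloud-specific detail: batch work can use cheaper spot capacity.
        return {"cloud": self.name, "nodes": nodes, "pricing": "spot"}


def plan_job(job: dict) -> dict:
    """Route jobs flagged secure on-premise and everything else to the public cloud."""
    driver = OnPremOpenStack() if job.get("secure") else PublicEC2()
    return driver.launch_plan(job["nodes"])


secure_plan = plan_job({"name": "mysecurejob", "secure": True, "nodes": 3})
batch_plan = plan_job({"name": "batchjob", "nodes": 10})
```

Callers only see `plan_job`; the quota check and the spot pricing stay inside the driver for the cloud they belong to.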

  23. Currently working on
    • Better UI (no more curls)
    • Better Documentation (to get started quickly)
    • Monitoring + Logging (App + Cloud)
    • Test-cases
    • Production-ready
    • Logo :)

  24. Next is what?
    • More Clouds (Rackspace, CloudStack, Azure)
    • More Frameworks (GraphLab)
    • Vertical frameworks (Hive, Shark, Mahout,
    MLBase, …)

  25. Lessons
    • Need “One BigData Cloud”.
    • Give users what they want (which they don’t
    know yet…)

  26. Get in touch
    MultiStack.org
    @dharmeshkakadia @shredder12
