MultiStack
Dharmesh Kakadia, Shashank Sahni
Search and Information Extraction Lab
IIIT-Hyderabad
Slide 2
Slide 2 text
$whoarewe
• Research students @ IIIT-H with Prof. Vasudeva Varma
• Cloud research group - Scheduling, MultiCloud, SDN
• Manage a small OpenStack-based private cloud and the data infrastructure of SIEL
• Available for hire :)
Slide 3
Slide 3 text
About SIEL
• Deal with a lot of data and a wide range of processing frameworks
• Early adopters (and contributors) of cloud and big data
• FOSS lovers
• Academic environment (limited resources, unlimited scope)
Slide 4
Slide 4 text
Our Story
Hadoop Cluster
• Unmanaged
• Underutilized
Slide 5
Slide 5 text
Our Story
• Managed Infrastructure
• Quick Cloud Adoption
Slide 6
Slide 6 text
But we still need Hadoop
• Shell scripts
  • Worked fine
  • For a short time!!
• Chef cookbooks
  • Need a higher abstraction
Slide 7
Slide 7 text
HadoopStack
• Abstraction of a Hadoop job
• It worked great!! Everyone liked it!!
• They wanted to run more frameworks
• We ran out of capacity on our OpenStack cloud
• Need more hardware?
• Need more stacks?
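To make "abstraction of a Hadoop job" concrete: the user describes the job, not the cluster. The sketch below is a hypothetical job description; the class name `HadoopJob` and every field (`jar`, `main_class`, `input_path`, `output_path`, `workers`, `args`) are illustrative assumptions, not HadoopStack's actual schema. Only the `hadoop jar <jar> [mainClass] args...` command form is Hadoop's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadoopJob:
    """Hypothetical job description; field names are illustrative only."""
    name: str
    jar: str                    # path or URL of the job JAR
    main_class: str             # class containing main()
    input_path: str             # e.g. an HDFS path
    output_path: str
    workers: int = 2            # cluster size the user asks for (assumed field)
    args: tuple[str, ...] = ()  # extra job arguments

    def __post_init__(self) -> None:
        # Reject descriptions that could never run.
        if self.workers < 1:
            raise ValueError("a job needs at least one worker node")

    def to_command(self) -> list[str]:
        """Build the `hadoop jar` invocation to run once a cluster exists."""
        return ["hadoop", "jar", self.jar, self.main_class,
                self.input_path, self.output_path, *self.args]
```

In this sketch, provisioning `workers` nodes, running the command and tearing the cluster down would all be the service's job; the user only fills in the description.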
Slide 8
Slide 8 text
Going MultiCloud…
Public clouds are the new Hardware
Slide 9
Slide 9 text
MultiStack
Slide 10
Slide 10 text
[Diagram: MultiStack, Data, Infrastructure]
Slide 15
Slide 15 text
Demo
{Start Praying}
Slide 16
Slide 16 text
Features
• Support for multiple clouds and frameworks
• Auto-scaling
• Smart scheduling - deadline, cost
• Job workflows
• Auto-selection of frameworks
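"Smart scheduling - deadline, cost" can be read as: among all clouds and cluster sizes, pick the cheapest plan that still meets the deadline. The sketch below is a brute-force version under loudly stated assumptions: the job scales linearly with instance count, each cloud is billed per started instance-hour, and the `Offering` prices and throughputs are made-up inputs, not MultiStack's model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    cloud: str             # illustrative cloud label, e.g. "private" or "ec2"
    price_per_hour: float  # hypothetical price of one instance-hour
    work_per_hour: float   # work units one instance completes per hour (assumed)
    max_instances: int     # how many instances this cloud will give us


@dataclass(frozen=True)
class Plan:
    cloud: str
    instances: int
    hours: float
    cost: float


def cheapest_plan(work: float, deadline_hours: float,
                  offerings: list[Offering]) -> Optional[Plan]:
    """Return the lowest-cost plan that finishes `work` before the deadline, or None."""
    best: Optional[Plan] = None
    for off in offerings:
        for n in range(1, off.max_instances + 1):
            # Assume the job scales linearly with the number of instances.
            hours = work / (n * off.work_per_hour)
            if hours > deadline_hours:
                continue  # too slow at this cluster size
            # Assume billing per started instance-hour.
            cost = n * math.ceil(hours) * off.price_per_hour
            if best is None or cost < best.cost:
                best = Plan(off.cloud, n, hours, cost)
    return best
```

With a cheap but small private cloud and a pricier public one, a loose deadline lands on the private cloud, while a tight deadline forces the job onto the public cloud. A real scheduler would replace the linear-scaling and hourly-billing assumptions with measured job profiles and actual pricing.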
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
API Server - Flask
Slide 19
Slide 19 text
Scheduling - Quota-based
API Server - Flask
Slide 20
Slide 20 text
Provisioning - Chef
Scheduling - Quota-based
API Server - Flask
Slide 21
Slide 21 text
Provisioning - Chef
Scheduling - Quota-based
API Server - Flask
Database - MongoDB
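The "Scheduling - Quota based" layer sits between the API server and provisioning. A minimal sketch of the idea, assuming quotas are counted in instances and clouds are tried in a fixed preference order (private first): a job is placed on the first cloud with enough free quota and its instances are returned when it finishes. `CloudQuota` and `QuotaScheduler` are hypothetical names; in the real system this state would presumably live in MongoDB rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CloudQuota:
    name: str
    max_instances: int  # quota on this cloud, counted in instances (assumed unit)
    used: int = 0

    @property
    def free(self) -> int:
        return self.max_instances - self.used


@dataclass
class QuotaScheduler:
    clouds: list[CloudQuota]  # in order of preference, e.g. private cloud first
    # job id -> (cloud name, instances reserved for it)
    placements: dict[str, tuple[str, int]] = field(default_factory=dict)

    def schedule(self, job_id: str, instances: int) -> Optional[str]:
        """Reserve capacity on the first cloud whose remaining quota fits the job."""
        for cloud in self.clouds:
            if cloud.free >= instances:
                cloud.used += instances
                self.placements[job_id] = (cloud.name, instances)
                return cloud.name
        return None  # nothing fits right now; the caller can queue and retry

    def release(self, job_id: str) -> None:
        """Give a finished job's instances back to its cloud's quota."""
        name, instances = self.placements.pop(job_id)
        for cloud in self.clouds:
            if cloud.name == name:
                cloud.used -= instances
                break
```

First-fit in preference order keeps the private cloud busy and spills over to a public cloud only when the private quota is exhausted, which matches the "public clouds are the new hardware" motivation.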
Slide 22
Slide 22 text
Challenge Accepted
• Hiding differences between clouds while leveraging cloud-specific features
  • Run mysecurejob on-premise, batchjob on a public cloud
  • Spot Instances on EC2
  • Quotas on OpenStack
• Everything dynamic and automated
• Opportunities to improve
  • Better caching - Tachyon
  • Optimized configuration
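One common way to hide differences between clouds without losing their special features is a shared driver interface, with each cloud's extras kept inside its own driver. The sketch below assumes that design; `CloudDriver`, `EC2Driver`, `OpenStackDriver` and `pick_driver` are hypothetical, and `launch` only fabricates ids instead of calling any real EC2 or OpenStack API. The placement rule mirrors the slide: mysecurejob stays on-premise, batchjob goes to a public cloud.

```python
from abc import ABC, abstractmethod


class CloudDriver(ABC):
    """Common interface the scheduler talks to; every cloud hides behind it."""

    name: str
    on_premise: bool

    @abstractmethod
    def launch(self, instances: int) -> list[str]:
        """Start `instances` VMs and return their ids (fake ids in this sketch)."""


class EC2Driver(CloudDriver):
    name = "ec2"
    on_premise = False

    def __init__(self, use_spot: bool = True) -> None:
        self.use_spot = use_spot  # cloud-specific feature: spot instances

    def launch(self, instances: int) -> list[str]:
        # A real driver would call the EC2 API here; this one only builds ids.
        kind = "spot" if self.use_spot else "ondemand"
        return [f"ec2-{kind}-{i}" for i in range(instances)]


class OpenStackDriver(CloudDriver):
    name = "openstack"
    on_premise = True

    def __init__(self, quota: int) -> None:
        self.quota = quota  # cloud-specific feature: per-project quota
        self.used = 0

    def launch(self, instances: int) -> list[str]:
        if self.used + instances > self.quota:
            raise RuntimeError("OpenStack quota exceeded")
        start, self.used = self.used, self.used + instances
        return [f"os-{i}" for i in range(start, self.used)]


def pick_driver(secure: bool, drivers: list[CloudDriver]) -> CloudDriver:
    """Secure jobs stay on-premise; batch jobs go to a public cloud."""
    for driver in drivers:
        if driver.on_premise == secure:
            return driver
    raise LookupError("no cloud satisfies the placement rule")
```

The rest of the system only ever calls `launch`, so adding another cloud would mean writing one more driver rather than touching the scheduler.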
Slide 23
Slide 23 text
Currently working on
• Better UI (no more curls)
• Better Documentation (to get started quickly)
• Monitoring + Logging (App + Cloud)
• Test cases
• Production-ready
• Logo :)
Slide 24
Slide 24 text
What's next?
• More clouds (Rackspace, CloudStack, Azure)
• More Frameworks (GraphLab)
• Vertical frameworks (Hive, Shark, Mahout, MLBase, …)
Slide 25
Slide 25 text
Lessons
• Need “One BigData Cloud”.
• Give users what they want (which they don’t know yet…)
Slide 26
Slide 26 text
Get in touch!
MultiStack.org
@dharmeshkakadia @shredder12