Slide 1

Slide 1 text

MultiStack
Dharmesh Kakadia, Shashank Sahni
Search and Information Extraction Lab, IIIT-Hyderabad

Slide 2

Slide 2 text

$whoarewe
• Research students @ IIIT-H with Prof. Vasudeva Varma
• Cloud research group: scheduling, MultiCloud, SDN
• Manage the small OpenStack-based private cloud and data infrastructure of SIEL
• Available for hire :)

Slide 3

Slide 3 text

About SIEL
• Deal with a lot of data and a wide range of processing frameworks
• Early adopters (and contributors) of cloud and big data
• FOSS lovers
• Academic environment (limited resources, unlimited scope)

Slide 4

Slide 4 text

Our Story
Hadoop cluster
• Unmanaged
• Underutilized

Slide 5

Slide 5 text

Our Story
• Managed infrastructure
• Quick cloud adoption

Slide 6

Slide 6 text

But we still need Hadoop
• Shell scripts
  • Worked fine
  • For a short time!!
• Chef cookbooks
• Need a higher abstraction

Slide 7

Slide 7 text

HadoopStack
• Abstraction of a Hadoop job
• It worked great!! Everyone liked it!!
• They wanted to run more frameworks
• We ran out of capacity on our OpenStack cloud
• Need more hardware?
• Need more stacks?

Slide 8

Slide 8 text

Going MultiCloud…
Public clouds are the new hardware

Slide 9

Slide 9 text

MultiStack

Slide 10

Slide 10 text

Multi Stack Data Infrastructure

Slide 11

Slide 11 text

Multi Stack Data! Infrastructure

Slide 12

Slide 12 text

Multi Stack Data! Infrastructure

Slide 13

Slide 13 text

Multi Stack Data Infrastructure

Slide 14

Slide 14 text

Multi Stack Data Infrastructure

Slide 15

Slide 15 text

Demo {Start Praying}

Slide 16

Slide 16 text

Features
• Support for multiple clouds and frameworks
• Auto-scaling
• Smart scheduling: deadline, cost
• Job workflows
• Auto-selection of frameworks
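The "smart scheduling" bullet above mentions deadline and cost as criteria. A minimal sketch of how such a choice could look, assuming each candidate cloud comes with an estimated runtime and an hourly price; the `CloudOffer` type, the `pick_cloud` function, and all numbers are hypothetical illustrations, not MultiStack's actual scheduler:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CloudOffer:
    """Hypothetical description of running one job on one cloud."""
    name: str
    hourly_cost: float        # assumed price per cluster-hour
    est_runtime_hours: float  # assumed runtime estimate for the job


def pick_cloud(offers: List[CloudOffer], deadline_hours: float) -> Optional[CloudOffer]:
    """Return the cheapest offer that finishes within the deadline, or None."""
    feasible = [o for o in offers if o.est_runtime_hours <= deadline_hours]
    if not feasible:
        return None
    # Total cost = hourly price times estimated runtime.
    return min(feasible, key=lambda o: o.hourly_cost * o.est_runtime_hours)


# Made-up offers for illustration only.
offers = [
    CloudOffer("openstack-private", hourly_cost=0.0, est_runtime_hours=6.0),
    CloudOffer("ec2-on-demand", hourly_cost=4.0, est_runtime_hours=2.0),
    CloudOffer("ec2-spot", hourly_cost=1.5, est_runtime_hours=3.0),
]
```

With a loose deadline the free private cloud wins; tightening the deadline pushes the job to a faster, paid option.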

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

API Server - Flask

Slide 19

Slide 19 text

Scheduling - Quota based
API Server - Flask

Slide 20

Slide 20 text

Provisioning - Chef
Scheduling - Quota based
API Server - Flask

Slide 21

Slide 21 text

Provisioning - Chef
Scheduling - Quota based
API Server - Flask
Database - MongoDB
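The slides only say that scheduling is "quota based". One way to read that is per-user admission control: a request is accepted only if it fits in the user's remaining instance quota. The sketch below assumes exactly that; `QuotaScheduler`, `try_admit`, and `release` are hypothetical names, and a real deployment would presumably persist the counters (for example in the MongoDB layer) rather than keep them in memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuotaScheduler:
    """Hypothetical per-user quota check; not MultiStack's actual code."""
    quotas: Dict[str, int]                      # user -> max instances allowed
    in_use: Dict[str, int] = field(default_factory=dict)

    def try_admit(self, user: str, instances: int) -> bool:
        """Reserve instances for the user if the quota allows it."""
        used = self.in_use.get(user, 0)
        # Users without a configured quota get a quota of zero.
        if used + instances > self.quotas.get(user, 0):
            return False
        self.in_use[user] = used + instances
        return True

    def release(self, user: str, instances: int) -> None:
        """Give instances back when a cluster is torn down."""
        self.in_use[user] = max(0, self.in_use.get(user, 0) - instances)
```

A rejected request could then be queued or, in a multi-cloud setup, redirected to another cloud.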

Slide 22

Slide 22 text

Challenge Accepted
• Hiding differences between clouds and leveraging cloud-specific features
  • Run mysecurejob on-premise, batchjob on public cloud
  • Spot Instance on EC2
  • Quota on OpenStack
• Everything dynamic and automated
• Opportunities to improve
  • Better caching - Tachyon
  • Optimized configuration
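The first challenge, hiding cloud differences while still using cloud-specific features, can be illustrated with a common provider interface. This is a sketch under stated assumptions: the class names, the `secure` job flag, and the option keys `enforce_quota` and `use_spot` are all hypothetical, and no real OpenStack or EC2 API is called.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class CloudProvider(ABC):
    """Hypothetical common interface that hides per-cloud differences."""
    name: str
    on_premise: bool

    @abstractmethod
    def launch_options(self, job: Dict) -> Dict:
        """Return cloud-specific options for launching this job."""


class OpenStackProvider(CloudProvider):
    name = "openstack"
    on_premise = True

    def launch_options(self, job: Dict) -> Dict:
        # Cloud-specific feature from the slide: quota on OpenStack.
        return {"enforce_quota": True}


class EC2Provider(CloudProvider):
    name = "ec2"
    on_premise = False

    def launch_options(self, job: Dict) -> Dict:
        # Cloud-specific feature from the slide: spot instances on EC2.
        return {"use_spot": True}


def place(job: Dict, providers: List[CloudProvider]) -> Tuple[str, Dict]:
    """Send secure jobs on-premise and everything else to a public cloud."""
    want_on_premise = bool(job.get("secure", False))
    for provider in providers:
        if provider.on_premise == want_on_premise:
            return provider.name, provider.launch_options(job)
    raise LookupError("no suitable cloud for job %r" % job.get("name"))


providers = [OpenStackProvider(), EC2Provider()]
```

The caller only sees `place`; whether a job ends up behind a quota check or on spot capacity stays inside the provider classes.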

Slide 23

Slide 23 text

Currently working on
• Better UI (no more curls)
• Better documentation (to get started quickly)
• Monitoring + logging (app + cloud)
• Test cases
• Production-ready
• Logo :)

Slide 24

Slide 24 text

What's next?
• More clouds (Rackspace, CloudStack, Azure)
• More frameworks (GraphLab)
• Vertical frameworks (Hive, Shark, Mahout, MLBase, …)

Slide 25

Slide 25 text

Lessons
• Need “One BigData Cloud”
• Give users what they want (which they don’t know yet…)

Slide 26

Slide 26 text

Get in touch!
MultiStack.org
@dharmeshkakadia
@shredder12