Managing Resources at Scale with
Apache Mesos
Dharmesh Kakadia
@dharmeshkakadia
Large Scale Production Engineering Meetup
June, 2014
Slide 2
whoami
● Research Assistant @ Microsoft Research India
● Have been stuck with schedulers
● Working on predicting resource requirements and execution time of distributed jobs/queries, to improve resource management @ MSR
● Love large scale data/cloud/distributed-*
● Writing a book on Apache Mesos
Slide 3
Mesos
is a data center kernel
Slide 4
Why?
• Because distributed systems
○ everything fails
○ everything needs to scale, linearly
○ are hard to get right
• Because Murphy’s law
• Lamport got a Turing award for a reason
Slide 5
Symptoms
• I have a lot of data or I have a lot of
applications
• They are dynamic
• I have low resource utilization
Slide 6
Mesos
Analytics
ML
Schedulers
Graph Processing
Databases
Web frameworks
Slide 7
Why now?
● Single machine → VMs → Containers
● More powerful machines, but even more data
● One kind of analysis → all kinds of analytics
● Static → Dynamic
● Everything connected
Slide 8
Why now?
• Can’t afford static partitioning anymore
• Can’t afford to be inaccessible
• Can’t afford to wait to release the next feature
Slide 9
What do you care about?
• Scalable
• Fault tolerant
• High resource utilization
• Isolation
Slide 10
Bonus
• Mesos-ify anything. Extremely easy to port.
• Battle tested in the field.
• Great community.
• Awesome UI.
Slide 11
Who is using Mesos?
Slide 12
Popular?
Slide 13
Give it a try
• Mesos has always had good tooling. It's becoming even easier.
• Run on AWS. Now also Elastic Mesos.
• Vagrant scripts
• Chef cookbooks
• Binary packages, debs, ...