Benjamin Hindman (@benh)
Apache Mesos
incubator.apache.org/mesos
@ApacheMesos
Slide 2
Slide 2 text
history
Berkeley research project including Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica
http://incubator.apache.org/mesos/research.html
Slide 3
Slide 3 text
Mesos aims to make it easier to build distributed applications/frameworks and share datacenter resources
Slide 4
Slide 4 text
applications/frameworks
services
analytics
Slide 5
Slide 5 text
analytics
services
applications/frameworks
Slide 6
Slide 6 text
deploying things today: static partitioning
[diagram: Nodes statically partitioned between Hadoop, service, …]
Slide 7
Slide 7 text
static partitioning considered harmful
[diagram: Nodes statically partitioned between Hadoop, service, …]
Slide 8
Slide 8 text
static partitioning considered harmful
hard to fully utilize machines (e.g., 72 GB RAM and 24 CPUs)
[diagram: Nodes statically partitioned between Hadoop, service, …]
Slide 9
Slide 9 text
static partitioning considered harmful
harder to deal with failures
[diagram: Nodes statically partitioned between Hadoop, service, …; one marked X]
Slide 10
Slide 10 text
static partitioning considered harmful
harder to scale elastically
[diagram: Nodes statically partitioned between Hadoop, service, …; three more Nodes added]
Slide 11
Slide 11 text
Mesos
[diagram: Mesos, Hadoop, service, …, Nodes]
Slide 12
Slide 12 text
level of indirection
[diagram: Mesos between Hadoop, service, … and the Nodes]
Slide 13
Slide 13 text
a “kernel” for the datacenter
[diagram: Mesos, Hadoop, service, …, Nodes]
Slide 14
Slide 14 text
Twitter’s “kernel” for the datacenter
[diagram: Mesos, Hadoop, service, …, Nodes]
architecture
Mesos master
Mesos slave
Mesos slave
service Y scheduler: requests resources, assigns tasks
Slide 19
Slide 19 text
frameworks
1. scheduler
2. executor (optional, if you don’t just want to run a single command)
Slide 20
Slide 20 text
architecture
Mesos master
Mesos slave
Mesos slave
service Y scheduler
service Y executor: runs tasks, reports status updates
service Y task (Netty server)
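The executor's role on this slide (run tasks, report status updates) can be sketched in a few lines. This is a minimal illustration, not the Mesos executor API: `TaskState`, `Executor`, `launch_task`, and the `send_status_update` callback are hypothetical names, and the task runs synchronously here, whereas a real task such as the Netty server would keep running.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical task states, chosen for this sketch only.
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


class Executor:
    """Toy executor: runs a task and reports each state change."""

    def __init__(self, send_status_update):
        # Callback standing in for the channel back to the scheduler.
        self._send = send_status_update

    def launch_task(self, task_id, work):
        # Report that the task started, run it, then report the outcome.
        self._send(task_id, TaskState.RUNNING)
        try:
            work()
        except Exception:
            self._send(task_id, TaskState.FAILED)
        else:
            self._send(task_id, TaskState.FINISHED)
```

A scheduler would pass its own callback as `send_status_update` and react to the reported states, for example by relaunching a failed task.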
Slide 21
Slide 21 text
architecture
service X scheduler
service Y scheduler
Mesos master
allocation module: decides how to allocate resources
Mesos slave
Mesos slave
service Y executor
service Y task (Netty server)
Slide 22
Slide 22 text
“two-level scheduling”
Mesos: controls resource allocations to applications/frameworks
applications/frameworks: make decisions about what to run
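The split between the two levels can be sketched as two small functions. This is a toy model under stated assumptions, not the Mesos API: `Offer`, `allocate`, and `schedule` are hypothetical names, and round-robin is used only as a stand-in allocation policy.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Free resources on one slave (hypothetical shape for this sketch).
    slave: str
    cpus: float
    mem_gb: float


def allocate(offers, frameworks):
    """Level one (Mesos side): decide which framework is offered what.

    Round-robin is an assumption made for the sketch.
    """
    return {fw: offers[i::len(frameworks)] for i, fw in enumerate(frameworks)}


def schedule(offers, pending, cpus_per_task, mem_per_task):
    """Level two (framework side): decide what to run on the offers received."""
    launched = []
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_gb
        # Pack pending tasks onto this offer while they still fit.
        while pending and cpus >= cpus_per_task and mem >= mem_per_task:
            launched.append((pending.pop(0), offer.slave))
            cpus -= cpus_per_task
            mem -= mem_per_task
    return launched
```

The point of the split is that `allocate` never needs to know what a task is, and `schedule` never needs to know about other frameworks.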
Slide 23
Slide 23 text
dominant resource-fairness
default allocation policy (see incubator.apache.org/mesos/research.html for more info)
help us write new allocators!
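As a rough illustration of dominant resource fairness: a framework's dominant share is the largest fraction it holds of any single resource, and the allocator keeps giving the next task to the framework whose dominant share is lowest. The sketch below is a simplified progressive-filling version, not the Mesos allocator; `drf_allocate`, the resource names, and the numbers in the usage note are assumptions made for the example.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Simplified DRF: hand out one task at a time to the framework with
    the lowest dominant share, stopping when that framework's next task
    no longer fits in the cluster."""
    used = {r: 0 for r in capacity}
    tasks = {name: 0 for name in demands}
    dominant = {name: Fraction(0) for name in demands}
    while True:
        # Pick the framework with the smallest dominant share.
        name = min(dominant, key=dominant.get)
        demand = demands[name]
        if any(used[r] + demand[r] > capacity[r] for r in capacity):
            break
        for r in capacity:
            used[r] += demand[r]
        tasks[name] += 1
        # Dominant share: largest fraction of any one resource held.
        dominant[name] = max(
            Fraction(tasks[name] * demand[r], capacity[r]) for r in capacity
        )
    return tasks
```

With a hypothetical cluster of 9 CPUs and 18 GB of memory, framework A needing 1 CPU and 4 GB per task, and framework B needing 3 CPUs and 1 GB per task, the sketch gives A three tasks and B two, leaving both with a dominant share of 2/3.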
Slide 24
Slide 24 text
architecture
service X scheduler
service Y scheduler
Mesos master
allocation module
Mesos slave: launches, isolates, and monitors tasks and executors
Mesos slave
service X executor
service X task
service Y executor
service Y task (Netty server)
[arrows: request, offer]
Slide 25
Slide 25 text
“kernel” primitives for building frameworks
messaging (unreliable)
mechanisms for high-availability
fault-detection
resource isolation (cgroups)
resource monitoring
Twitter framework
a framework that makes deploying and managing production servers easy
jobs/servers are submitted to the framework via a configuration file
provides mechanisms:
» rolling restarts/updates
» relaunching processes after failures (if requested)
» and more!
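The slides do not say how the Twitter framework implements rolling restarts, so the following is only a generic sketch of the idea: restart a few instances at a time and stop if a batch does not come back healthy. `rolling_restart`, `restart`, and `is_healthy` are hypothetical names.

```python
def rolling_restart(instances, batch_size, restart, is_healthy):
    """Restart instances batch by batch.

    Returns (instances restarted successfully, whether the rollout completed).
    Stops at the first batch containing an unhealthy instance, so the
    remaining instances keep serving on the old version.
    """
    restarted = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for instance in batch:
            restart(instance)
        if not all(is_healthy(instance) for instance in batch):
            return restarted, False
        restarted.extend(batch)
    return restarted, True
```

Keeping the batch size small bounds how much capacity is offline at any moment, which is the usual reason for rolling a restart rather than doing it all at once.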
Slide 29
Slide 29 text
demo
Slide 30
Slide 30 text
details
50,000+ lines of C++
libprocess for asynchronous actor style concurrency (github.com/libprocess)
APIs in C++, Java, Python
protobuf for data transport, data types
zookeeper support for high-availability
linux control groups support (LXC/cgroups)
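To give a feel for actor style concurrency, here is a small Python sketch of the general pattern: each actor owns its state and processes messages from a mailbox one at a time, so callers never touch that state directly. This is not libprocess (which is a C++ library) and does not mirror its API; `Actor`, `Counter`, `send`, and `stop` are hypothetical names.

```python
import queue
import threading


class Actor:
    """Toy actor: one thread draining one mailbox, one message at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, method, *args):
        # Asynchronous: enqueue the message and return immediately.
        self._mailbox.put((method, args))

    def stop(self):
        # A None message asks the actor to exit after earlier messages.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            method, args = message
            getattr(self, method)(*args)


class Counter(Actor):
    def __init__(self):
        self.total = 0  # set state before the actor thread starts
        super().__init__()

    def add(self, n):
        # Only ever called from the actor's own thread, so no lock is needed.
        self.total += n
```

Because messages are handled strictly in order on a single thread, `Counter` needs no locking even when many callers send to it.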
genomics researchers using Hadoop and Spark
Building a new framework for job workflows, wants to use Spark and Hadoop too
Built DPark (a Python clone of Spark), also running MPI
Hadoop and Spark used by machine learning researchers
Slide 33
Slide 33 text
future
smarter allocator support (priority, weighted fair-sharing, etc)
better resource monitoring/collection
other primitives for building applications/frameworks systems?
other frameworks!?
Slide 34
Slide 34 text
try it out!
run on bare-metal or virtual machines
develop against Mesos API and run in private datacenter, or the cloud, or both!