George Stathis
VP Engineering
14+ years of experience building full-stack web software systems, with a past focus on e-commerce and publishing. Currently responsible for building engineering capability to enable Traackr's growth goals.
What’s this talk about?
• Share what we know about Big Data/NoSQL: what’s behind the buzz words?
• Our reasons and method for picking a NoSQL database
• Share the lessons we learned going through the process
What is Big Data? Volume + Velocity
• Data sets too large, or arriving at too high a velocity, to process using traditional databases or desktop tools. E.g.:
big science, web logs, RFID, sensor networks, social networks, social data, internet text and documents, internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, military surveillance, medical records, photography archives, video archives, large-scale e-commerce
What is NoSQL?
• NoSQL ≠ No SQL
• NoSQL ≈ Not Only SQL
• NoSQL addresses RDBMS limitations; it’s not about the SQL language
• RDBMS = static schema
• NoSQL = schema flexibility; you don’t have to know the exact structure before storing
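To make the static-schema vs. schema-flexibility contrast concrete, here is a minimal Python sketch. It uses the standard library's in-memory SQLite as a stand-in RDBMS and a plain list of dicts as a stand-in for a schema-flexible store; the table, field names and sample values are hypothetical.

```python
import sqlite3

# RDBMS: the schema is declared up front and enforced on every write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE influencer (name TEXT, blog_url TEXT)")
conn.execute(
    "INSERT INTO influencer (name, blog_url) VALUES (?, ?)",
    ("Jane Doe", "http://example.com/jane"),
)

rdbms_error = None
try:
    # A new attribute needs an ALTER TABLE before any row can use it.
    conn.execute(
        "INSERT INTO influencer (name, twitter) VALUES (?, ?)",
        ("John Roe", "@jroe"),
    )
except sqlite3.OperationalError as err:
    rdbms_error = str(err)

# Schema-flexible store, modelled as a list of dicts: each record
# carries its own structure, so new attributes need no migration.
records = [
    {"name": "Jane Doe", "blog_url": "http://example.com/jane"},
    {"name": "John Roe", "twitter": "@jroe", "topics": ["cloud", "IT"]},
]
all_fields = sorted({field for record in records for field in record})
```

The RDBMS rejects the second row because `twitter` is not part of the declared schema, while the flexible store accepts records of different shapes side by side.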
What is Distributed Computing?
• Sharing the workload: divide a problem into many tasks, each of which can be solved by one or more computers
• Allows computations to be accomplished in acceptable timeframes
• Distributed computation approaches were developed to leverage multiple machines: MapReduce
• With MapReduce, the program goes to the data, since the data is too big to move
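The MapReduce idea above can be sketched in a few lines of single-process Python. This only illustrates the map, shuffle and reduce phases using a word count, the classic example. It is not Hadoop's API. The `partitions` dict stands in for data already stored on separate nodes, and its contents are made up.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each "node" already holds its own partition of the data.
partitions = {
    "node-1": ["big data is big", "nosql is not only sql"],
    "node-2": ["data goes nowhere", "the program goes to the data"],
}

def map_phase(lines):
    """Runs next to the data on each node; emits (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Groups intermediate values by key (the framework does this in real MR)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combines all values emitted for one key."""
    return key, sum(values)

# Ship the map function to each partition instead of moving the raw data.
mapped = chain.from_iterable(map_phase(lines) for lines in partitions.values())
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Only the small `(word, 1)` pairs cross the "network" in this model. The raw lines never leave their partition.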
What is Big Data? Velocity
• In some instances, being able to process large amounts of data in real time can yield a competitive advantage. E.g.
– Online retailers leveraging buying history and click-through data for real-time recommendations
• No time to wait for MapReduce jobs to finish
• Solutions: stream processing (e.g. Twitter Storm), pre-computing (e.g. aggregate and count analytics as data arrives), quick-to-read key/value stores (e.g. distributed hashes)
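One way to read "pre-computing" is to update aggregates incrementally as each event arrives, so that a read is a cheap lookup instead of a batch job. Below is a minimal in-process sketch. It assumes a hypothetical click-event shape `{"product": ...}`. A production system would keep these counters in a stream processor or a key/value store, not in a Python object.

```python
from collections import Counter

class RunningAggregates:
    """Keeps counts current on every event so reads never rescan raw data."""

    def __init__(self):
        self.views_per_product = Counter()
        self.total_events = 0

    def on_event(self, event):
        # Constant-time update per incoming click event.
        self.total_events += 1
        self.views_per_product[event["product"]] += 1

    def top_products(self, n):
        # Served straight from the pre-computed counters.
        return [product for product, _ in self.views_per_product.most_common(n)]

agg = RunningAggregates()
for e in [{"product": "camera"}, {"product": "tripod"}, {"product": "camera"}]:
    agg.on_event(e)
```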
What is Big Data? Data Science
• Emergence of Data Science
• Data Scientist ≈ Statistician
• Possess scientific discipline & expertise
• Formulate and test hypotheses
• Understand the math behind the algorithms so they can tweak them when they don’t work
• Can distill the results into an easy-to-understand story
• Help businesses gain actionable insights
Traackr: context
• A cloud computing company is about to launch a new platform; how does it find the most influential IT bloggers on the web who can help bring visibility to the new product? How does it find the opinion leaders, the people who matter?
Requirement: batch processing
MapReduce + RDBMS: possible, but only through proprietary solutions. Usually involves exporting data from the RDBMS into a NoSQL system anyway, which defeats the data-locality benefit of MR.
Traackr’s Datastore Requirements
• Schema flexibility ✓
• Good at storing lots of variable-length text ✓
• Batch processing options ✓
A NoSQL option is the right fit
Bewildering number of options (early 2010)
Key/Value Databases
• Distributed hashtables
• Designed for high load
• In-memory or on-disk
• Eventually consistent
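"Distributed hashtables" need a rule for deciding which node owns which key. A common technique is consistent hashing, and Dynamo-inspired stores such as Riak and Project Voldemort use variants of it. With consistent hashing, adding a node only moves the keys that the new node takes over. The sketch below is a minimal, illustrative ring. The MD5 hash, the 100 virtual nodes per server and the node names are arbitrary choices made for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Place `vnodes` points per server on the ring, sorted by hash.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Owner is the first point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"influencer:{i}" for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
```

Every key in `moved` ends up on the new `node-d`. Keys never shuffle between the existing nodes, and only about a quarter of them change owner.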
Column Databases
• Spreadsheet-like
• Key is a row id
• Attributes are columns
• Columns can be grouped into families
Document Databases
• Like Key/Value
• Value = Document
• Document = JSON/BSON
• JSON = Flexible Schema
Graph Databases
• Graph Theory: G = (V, E)
• Great for modeling networks
• Great for graph-based query algorithms
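As a small example of a graph-based query, the sketch below runs a breadth-first search over a made-up influence graph G = (V, E) stored as an adjacency list. It returns everyone reachable within a given number of hops. A graph database would run this kind of traversal natively; plain Python is used here only to show the algorithm.

```python
from collections import deque

# Hypothetical influence graph: an edge u -> v means "u links to v".
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search returning {vertex: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand past the hop limit
        for v in graph.get(u, []):
            if v not in dist:  # first visit is the shortest distance in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

reach = within_hops(edges, "alice", 2)
```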
Trimming options
• Graph Databases: while we can model our domain as a graph, we don’t want to pigeonhole ourselves into this structure. We’d rather use these tools for specialized data analysis, not as the main data store.
• Memcache: memory-based; we need true persistence.
• Amazon SimpleDB: not willing to store our data in a proprietary datastore.
• Redis and LinkedIn’s Project Voldemort: no query filters; better used as queues or distributed caches.
• CouchDB: no ad-hoc queries; its maturity in early 2010 made us shy away, although we did try early prototypes.
• Cassandra: in early 2010, maturity questions, no secondary indexes and no batch processing options (these came later on).
• MongoDB: in early 2010, maturity questions, adoption questions and no batch processing options.
• Riak: very close, but in early 2010 we had adoption questions.
• HBase: came across as the most mature at the time, with several deployments, a healthy community, "out-of-the-box" secondary indexes through a contrib, and support for batch processing using Hadoop/MR.
Rewards: Choices
When Big Data = Big Architectures
Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
• Must have a Hadoop HDFS cluster of at least 2x replication factor nodes
• Must have an odd number of ZooKeeper quorum nodes
• Then you can run your HBase nodes, but it’s recommended to co-locate regionservers with Hadoop datanodes, so you have to manage resources.
• Master/slave architecture means a single point of failure, so you need to protect your master.
• And then we also have to manage the MapReduce processes and resources in the Hadoop layer.
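The odd-number rule for ZooKeeper follows from majority quorums: an ensemble of n servers stays available only while a strict majority, floor(n/2) + 1, is up. The arithmetic below is plain Python with no ZooKeeper involved. It shows that an even-sized ensemble tolerates no more failures than the odd-sized one just below it.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority quorum is still reachable."""
    return n - majority(n)

# Ensemble size -> failures tolerated.
table = {n: tolerated_failures(n) for n in range(1, 8)}
```

Going from 3 to 4 servers (or 5 to 6) adds a machine to manage without adding any fault tolerance.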
Mapping a saved search to a column store
• Unique key
• “attributes” column family for general attributes
• “influencerId” column family for influencer ranks and foreign keys
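The mapping above can be pictured as nested maps: row key → column family → column qualifier → value. The Python sketch below models one saved-search row that way. Several details are assumptions for illustration: the row-key format, the attribute names, and the reading of the "influencerId" family (qualifier = influencer id acting as a foreign key, value = that influencer's rank). Real HBase cells are byte arrays and also carry timestamps.

```python
# Hypothetical saved-search row in an HBase-style column store:
# row key -> column family -> column qualifier -> value.
saved_search_row = {
    "search:1234": {                 # unique row key (hypothetical format)
        "attributes": {              # family for general attributes
            "name": "IT bloggers",
            "created": "2010-06-01",
        },
        "influencerId": {            # qualifiers are influencer ids (foreign keys)
            "inf:42": "1",           # value: rank within this search (assumed)
            "inf:7": "2",
        },
    },
}

def ranked_influencers(row):
    """Return influencer ids ordered by the rank stored in the influencerId family."""
    family = row["influencerId"]
    return sorted(family, key=lambda qualifier: int(family[qualifier]))
```

Because qualifiers are just keys inside a family, each row can carry a different set of influencer columns without any schema change.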
Let’s get this straight
• HBase no longer comes with secondary indexing out-of-the-box
• It’s been moved out of the trunk to GitHub
• Where only one other company besides us seems to care about it
Cracks in the data model
Diagram: huffingtonpost.com appears twice, once for each influencer (arianna-huffington, shaun-donovan). Each influencer “writes for” the site; each post (http://www.huffingtonpost.com/arianna-huffington/post_1.html to post_3.html, http://www.huffingtonpost.com/shaun-donovan/post1.html to post3.html) is “authored by” its influencer and “published under” the site.
• Denormalized/duplicated for fast runtime access and storage of influencer-to-site relationship properties
• Content attribution logic could sometimes mis-attribute posts because of the duplicated data.
• Exacerbated when we started tracking people’s content on a daily basis in mid-2011
Fixing the cracks in the data model
Diagram: the same influencers and posts, now sharing a single huffingtonpost.com site record.
• Normalize the sites
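The general hazard of denormalizing can be shown with a small Python sketch. Once the same site record is copied under each influencer, an update that reaches only one copy leaves the copies disagreeing. A normalized design keeps a single record that every influencer references by id. The `rank` property and the id format are hypothetical; this illustrates the principle, not Traackr's attribution logic.

```python
# Denormalized: each influencer embeds its own copy of the site record.
denormalized = {
    "arianna-huffington": {"site": {"domain": "huffingtonpost.com", "rank": 10}},
    "shaun-donovan": {"site": {"domain": "huffingtonpost.com", "rank": 10}},
}
# An update that reaches only one copy leaves two versions of the same site.
denormalized["arianna-huffington"]["site"]["rank"] = 3
copies = {inf["site"]["rank"] for inf in denormalized.values()}

# Normalized: one site record, referenced by id from each influencer.
sites = {"site:huffingtonpost.com": {"domain": "huffingtonpost.com", "rank": 10}}
normalized = {
    "arianna-huffington": {"site_id": "site:huffingtonpost.com"},
    "shaun-donovan": {"site_id": "site:huffingtonpost.com"},
}
# A single update is seen by every influencer that references the site.
sites["site:huffingtonpost.com"]["rank"] = 3
resolved = {sites[inf["site_id"]]["rank"] for inf in normalized.values()}
```

Here `copies` holds two conflicting values for one site, while `resolved` holds one. Normalizing trades an extra lookup at read time for a single source of truth.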
Traackr’s Datastore Requirements (Revisited)
• Schema flexibility
• Good at storing lots of variable-length text
• Out-of-the-box SECONDARY INDEX support!
• Simple to use and administer
NoSQL picking – Round 2 (mid 2011)
Nope!
• Graph Databases: we looked at Neo4j a bit more closely but passed again for the same reasons as before.
• Memcache: still no.
• Amazon SimpleDB: still no.
• Redis and LinkedIn’s Project Voldemort: still no.
• CouchDB: more mature, but still no ad-hoc queries.
• Cassandra: matured quite a bit and added secondary indexes and batch processing options, but is more restrictive in its use than other solutions. After the HBase lesson, simplicity of use was now more important.
• Riak: still a strong contender, but adoption questions remained.
• MongoDB: matured by leaps and bounds, with increased adoption, support from 10gen, advanced indexing out-of-the-box and some batch processing options; a breeze to use, well documented, and fit into our existing code base very nicely.
Immediate Benefits
• No more maintaining custom application-layer secondary indexing code
• Single binary installation greatly simplifies administration
• Our NoSQL store could now support our domain model
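"Custom application-layer secondary indexing code" means that every write path must also maintain a reverse lookup from an attribute value to row keys. A minimal in-memory Python sketch of that bookkeeping follows. The class and field names are hypothetical. A real implementation over HBase would also have to cope with concurrent writers and partial failures, which this ignores. A datastore with built-in secondary indexes does this work itself.

```python
class IndexedStore:
    """Primary rows keyed by id plus a hand-maintained secondary index.

    Every write path must keep the index in sync; missing one is a bug.
    """

    def __init__(self, indexed_field):
        self.field = indexed_field
        self.rows = {}    # row_id -> record
        self.index = {}   # indexed value -> set of row_ids

    def put(self, row_id, record):
        self.delete(row_id)  # drop any stale index entry for this row first
        self.rows[row_id] = record
        value = record.get(self.field)
        if value is not None:
            self.index.setdefault(value, set()).add(row_id)

    def delete(self, row_id):
        old = self.rows.pop(row_id, None)
        if old is not None and old.get(self.field) is not None:
            ids = self.index[old[self.field]]
            ids.discard(row_id)
            if not ids:  # remove empty index buckets
                del self.index[old[self.field]]

    def find(self, value):
        """Secondary-index lookup: records whose indexed field equals value."""
        return [self.rows[r] for r in sorted(self.index.get(value, ()))]

store = IndexedStore("site")
store.put("inf:1", {"name": "A", "site": "huffingtonpost.com"})
store.put("inf:2", {"name": "B", "site": "huffingtonpost.com"})
store.put("inf:1", {"name": "A", "site": "example.com"})  # update must re-index
```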
Other Benefits
• Ad hoc queries and reports became easier to write with JavaScript: no need for a Java developer to write MapReduce code to extract the data in a usable form, as was needed with HBase.
• Simpler backups: HBase mostly relied on HDFS redundancy; intra-cluster replication is available but experimental and a lot more involved to set up.
• Great documentation
• Great adoption and community
Recap & Final Thoughts
• 3 Vs of Big Data:
– Volume
– Velocity
– Variety ← Traackr
• Big Data technologies are complementary to SQL and RDBMS
• Until machines can think for themselves, Data Science will be increasingly important
• Be prepared to deal with less mature tech
• Be as flexible as the data => fearless refactoring
• The importance of ease of use and administration cannot be overstated for a small startup