What about it?
• We always associate Solr with searching
• Solr can also serve as your non-relational data layer
Slide 3
Solr? NoSQL?
Slide 4
No content
Slide 5
Hmmm, why not?
• Hey, Solr is already part of my stack
• I love Solr
• It's fast, scalable, and there are some great Python interfaces out there
Slide 6
When would you consider it?
• You have a DB that's frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want an awesome, scalable data layer
Slide 7
What's not so cool?
• Doesn't support transactions
• Not all SQL queries can be translated into Solr queries
• Generating indices can take a long time
• Index optimization can take a long time
Slide 8
But...
• You don't have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
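A minimal sketch of such a layer, assuming a hypothetical relational schema (a `respondent` table plus an `answer` table with one row per chosen answer) held in an in-memory SQLite database. The relational tables stay the source of truth; each respondent is flattened into one document that could then be sent to Solr. All table, column, and field names here are illustrative, not taken from the talk.

```python
import sqlite3

# Hypothetical relational schema: one row per respondent, one row per chosen answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE respondent (id INTEGER PRIMARY KEY, age INTEGER, profession TEXT);
    CREATE TABLE answer (respondent_id INTEGER, field TEXT);
    INSERT INTO respondent VALUES (1, 54, 'nurse'), (2, 61, 'teacher');
    INSERT INTO answer VALUES (1, 'q1_lt_1y_osteo'), (1, 'q1_gt_1y_traumatic'),
                              (2, 'q1_gt_1y_other');
""")


def denormalize(conn: sqlite3.Connection) -> list[dict]:
    """Build one flat, Solr-style document per respondent from the relational tables."""
    docs = {}
    for rid, age, profession in conn.execute(
            "SELECT id, age, profession FROM respondent ORDER BY id"):
        docs[rid] = {"id": str(rid), "age": age, "profession": profession}
    # Each chosen answer becomes a boolean field set to True on its respondent's document.
    for rid, field in conn.execute("SELECT respondent_id, field FROM answer"):
        docs[rid][field] = True
    return list(docs.values())


docs = denormalize(conn)
for doc in docs:
    print(doc)
```

Reads and writes of the authoritative data keep going through SQL; only the flattened copy is indexed for search and faceting.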
Slide 9
Why did we choose Solr?
• We deal with medical survey data
• Say:
  – About 300 multiple-choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing
Slide 10
What a survey question looks like

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  ☑               ☐                     ☐                    ☐                    ☐
More than a year ago  ☐               ☐                     ☑                    ☐                    ☐
Slide 11
Storing a single response

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  1               0                     0                    0                    0
More than a year ago  0               0                     1                    0                    0
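One way to store this 0/1 grid in a flat Solr document is one boolean field per (row, column) cell. A minimal sketch, assuming a hypothetical `<question>_<row>_<column>` naming scheme and hypothetical row/column slugs; the talk does not say how the fields were actually named.

```python
# Rows and columns of the example question; the slugs are hypothetical.
ROWS = ["lt_1y", "gt_1y"]  # "Less than a year ago", "More than a year ago"
COLUMNS = ["osteo", "rheumatoid", "traumatic", "psoriatic", "other"]


def flatten_grid(question_id: str, grid: list[list[int]]) -> dict[str, bool]:
    """Turn a 0/1 answer grid into one boolean field per cell.

    Field names follow a hypothetical "<question>_<row>_<column>" scheme.
    """
    fields = {}
    for row_name, row in zip(ROWS, grid):
        for col_name, cell in zip(COLUMNS, row):
            fields[f"{question_id}_{row_name}_{col_name}"] = bool(cell)
    return fields


# The single response from the slide: Osteoarthritis less than a year ago,
# Traumatic Arthritis more than a year ago.
response = [[1, 0, 0, 0, 0],
            [0, 0, 1, 0, 0]]
doc_fields = flatten_grid("q1", response)
print(sorted(name for name, chosen in doc_fields.items() if chosen))
# ['q1_gt_1y_traumatic', 'q1_lt_1y_osteo']
```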
Slide 12
Aggregating over 2000 responses

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  63              155                   19                   27                   268
More than a year ago  190             46                    8                    213                  325
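Each aggregated cell is simply the number of documents in which that cell's boolean field is true; in Solr, this is what facet counts return. The pure-Python version below (using hypothetical field names and a few made-up documents) only illustrates what is being counted; the point of using Solr is that the server computes these counts instead of the application.

```python
from collections import Counter


def aggregate(docs: list[dict[str, bool]]) -> Counter:
    """Count, for each boolean field, how many documents have it set to True."""
    counts = Counter()
    for doc in docs:
        counts.update(name for name, chosen in doc.items() if chosen)
    return counts


docs = [
    {"q1_lt_1y_osteo": True, "q1_gt_1y_traumatic": True},
    {"q1_lt_1y_osteo": True, "q1_gt_1y_traumatic": False},
    {"q1_lt_1y_osteo": False, "q1_gt_1y_other": True},
]
totals = aggregate(docs)
print(totals["q1_lt_1y_osteo"], totals["q1_gt_1y_traumatic"], totals["q1_gt_1y_other"])
# 2 1 1
```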
Slide 13
What did we do?
• Each survey response = one Solr document
• Add respondent meta information: age, profession, interests
• Up to 3000 boolean variables per document, indicating the chosen answers
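A sketch of one such document, assuming hypothetical field names and a hypothetical catalogue `ALL_FIELDS` of answer fields (in a real setup these would have to be declared in the Solr schema, for example through a dynamic field rule). Solr's JSON update handler accepts a JSON array of documents, so `payload` is the kind of body that could be POSTed to a core's `/update` endpoint; the talk does not describe how indexing was actually done.

```python
import json

# Hypothetical catalogue of every answer field in the survey
# (the slide mentions up to 3000 per document; four are shown here).
ALL_FIELDS = ["q1_lt_1y_osteo", "q1_lt_1y_other", "q1_gt_1y_traumatic", "q1_gt_1y_other"]


def build_doc(response_id: str, meta: dict, chosen: set[str]) -> dict:
    """One document per survey response: respondent metadata plus one boolean per answer."""
    doc = {"id": response_id, **meta}
    for field in ALL_FIELDS:
        # Storing explicit False values lets queries such as field:false match too.
        doc[field] = field in chosen
    return doc


doc = build_doc("r-0001",
                {"age": 54, "profession": "nurse", "interests": ["running", "cooking"]},
                {"q1_lt_1y_osteo"})
payload = json.dumps([doc])  # JSON array of documents for the update handler
print(payload)
```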
Slide 14
What did we do?
• Filter by age, interest, profession
• Facet across the boolean fields
• Result: which group of people chose which group of answers
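The filter-then-facet step maps onto standard Solr request parameters: one `fq` per respondent filter and one `facet.query` per boolean answer field, with `rows=0` because only the counts are needed. The sketch below only builds the query string. It assumes a hypothetical core named `surveys` on a local Solr and the same hypothetical field names; the slides do not give the exact parameters the team used. Calling `facet.field` on each boolean field would be an alternative that returns both true and false counts.

```python
from urllib.parse import urlencode


def facet_params(filters: dict[str, str], answer_fields: list[str]) -> str:
    """Build a facet-only Solr select query string.

    filters       -> one fq (filter query) per respondent attribute
    answer_fields -> one facet.query per boolean field, counting field:true
    """
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("wt", "json")]
    params += [("fq", f"{name}:{value}") for name, value in filters.items()]
    params += [("facet.query", f"{field}:true") for field in answer_fields]
    return urlencode(params)


qs = facet_params({"profession": "nurse", "age": "[40 TO 60]"},
                  ["q1_lt_1y_osteo", "q1_gt_1y_traumatic"])
# Hypothetical host and core name.
print("http://localhost:8983/solr/surveys/select?" + qs)
```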
Slide 15
Why Solr is awesome
• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1 to 2 ms
• Allowed us to reduce API response time from a variable 2 to 15 seconds (sucked!) to an almost constant ~50 ms
Slide 16
Good to know
• sunburnt: awesome Python Solr interface (github.com/tow/sunburnt)
• Programmatic querying as well as raw queries
• Supports most advanced Solr options
• If you only need facets, specify rows=0
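Whichever client sends the request (sunburnt or a raw HTTP call), a facet-only query with `rows=0` returns an empty `docs` list plus a `facet_counts` section, so no documents are transferred. The sketch below reads the `facet.query` counts back out of Solr's JSON response. It uses a hand-written sample response with made-up numbers instead of a live server, and it does not use sunburnt's API.

```python
import json


def facet_grid(raw: str) -> tuple[int, dict[str, int]]:
    """Extract numFound and per-field true counts from a facet-only Solr JSON response."""
    data = json.loads(raw)
    total = data["response"]["numFound"]
    # facet.query results are keyed by the query string, e.g. "q1_lt_1y_osteo:true".
    counts = {
        query.rsplit(":", 1)[0]: count
        for query, count in data["facet_counts"]["facet_queries"].items()
    }
    return total, counts


# Hand-written sample in the shape Solr returns for rows=0 with facet.query
# (made-up numbers, not real survey data).
sample = """{
  "response": {"numFound": 25, "start": 0, "docs": []},
  "facet_counts": {"facet_queries": {"q1_lt_1y_osteo:true": 10,
                                     "q1_gt_1y_traumatic:true": 3}}
}"""
total, counts = facet_grid(sample)
print(total, counts)
# 25 {'q1_lt_1y_osteo': 10, 'q1_gt_1y_traumatic': 3}
```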