The solr Power

Tareque Hossain, Sr. Software Engineer

What about it?
•  We always associate solr with searching
•  solr can also serve as your non-relational data layer

solr? NoSQL?

Hmm, why not?
•  Hey, solr is already part of my stack
•  I love solr
•  It’s fast, scalable, and there are some great Python interfaces out there

When would you consider it?
•  You have a DB that’s frequently read and infrequently written
•  You want robust search & filtering on your data
•  You want to leverage the faceting feature
•  You want an awesome, scalable data layer

What’s not so cool?
•  Doesn’t support transactions
•  Not all SQL queries can be translated into solr queries
•  Generating indices can take a long time
•  Index optimization can take a long time

But..
•  You don’t have to give up your relational data layer
•  Create a non-relational layer on top of your relational data layer
•  Get the best of both worlds
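The talk doesn't say how the two layers are kept in sync, so the following is only a minimal sketch of the "non-relational layer on top" idea: the relational database stays the source of truth, and an export step denormalizes it into one flat document per respondent. The tables `respondent` and `answer`, their columns, and the `q<question>_r<row>_c<col>` field names are all hypothetical.

```python
import sqlite3

# Hypothetical sketch: derive flat, solr-style documents from a relational
# source of truth. Tables, columns, and field names are assumptions.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE respondent (id INTEGER PRIMARY KEY, age INTEGER, profession TEXT);
    CREATE TABLE answer (respondent_id INTEGER REFERENCES respondent(id), field TEXT);
    INSERT INTO respondent VALUES (1, 42, 'nurse'), (2, 35, 'teacher');
    INSERT INTO answer VALUES (1, 'q1_r0_c0'), (1, 'q1_r1_c2'), (2, 'q1_r0_c1');
""")


def export_documents(conn):
    """Return one flat dict per respondent: metadata plus True per chosen answer."""
    docs = {}
    for rid, age, profession in conn.execute(
            "SELECT id, age, profession FROM respondent ORDER BY id"):
        docs[rid] = {"id": str(rid), "age": age, "profession": profession}
    # Denormalize: each answer row becomes a boolean field on its respondent's doc.
    for rid, field in conn.execute("SELECT respondent_id, field FROM answer"):
        docs[rid][field] = True
    return list(docs.values())


docs = export_documents(conn)
```

Writes still go through the relational layer (which has transactions); the exported documents are what gets indexed for reads.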

Why did we choose solr?
•  We deal with medical survey data
•  Say:
   –  About 300 multiple-choice questions
   –  Responses can be multi-dimensional
   –  7000+ different answer choices per question
   –  2000+ respondents per survey
   –  15+ surveys and growing

What a survey question looks like
When were you diagnosed with the following types of Arthritis?

                       Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                       Arthritis   Arthritis  Arthritis
Less than a year ago   ☑               ☐           ☐          ☐          ☐
More than a year ago   ☐               ☐           ☑          ☐          ☐

Storing a single response
When were you diagnosed with the following types of Arthritis?

                       Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                       Arthritis   Arthritis  Arthritis
Less than a year ago   1               0           0          0          0
More than a year ago   0               0           1          0          0
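One way to picture the storing step: flatten the grid cell by cell into named boolean values, one per (row, column) answer choice. This is a sketch; the field-name pattern `q<question>_r<row>_c<col>` and the question id `1` are hypothetical, not the naming used in the talk.

```python
# Hypothetical sketch: flatten one grid-question response into boolean fields.
# The field-name pattern q{question}_r{row}_c{col} is an illustrative assumption.


def flatten_response(question_id, grid):
    """Map a rows x cols grid of 0/1 cells to {field_name: bool}."""
    fields = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            fields[f"q{question_id}_r{r}_c{c}"] = bool(cell)
    return fields


# The single response shown above: Osteoarthritis less than a year ago,
# Traumatic Arthritis more than a year ago.
response = [[1, 0, 0, 0, 0],
            [0, 0, 1, 0, 0]]
fields = flatten_response(1, response)
chosen = sorted(name for name, value in fields.items() if value)
print(chosen)  # ['q1_r0_c0', 'q1_r1_c2']
```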

Aggregating over 2000 responses
When were you diagnosed with the following types of Arthritis?

                       Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                       Arthritis   Arthritis  Arthritis
Less than a year ago   63              155         19         27         268
More than a year ago   190             46          8          213        325
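Each cell of the aggregate is the number of responses whose corresponding value is 1, i.e. the element-wise sum of the single-response grids. A minimal pure-Python sketch of that arithmetic, using three invented sample grids; in the approach described in this talk, solr's faceting produces these counts instead of application code.

```python
# Sketch: aggregating 0/1 response grids by element-wise summation.
# The three grids below are made-up sample data, not survey results.


def aggregate(grids):
    """Sum equally shaped 0/1 grids cell by cell to get per-answer counts."""
    rows, cols = len(grids[0]), len(grids[0][0])
    totals = [[0] * cols for _ in range(rows)]
    for grid in grids:
        for r in range(rows):
            for c in range(cols):
                totals[r][c] += grid[r][c]
    return totals


sample = [
    [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]],
    [[0, 1, 0, 0, 1], [1, 0, 0, 0, 0]],
    [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0]],
]
print(aggregate(sample))  # [[2, 1, 0, 0, 1], [1, 0, 1, 1, 0]]
```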

What did we do?
•  Each survey response = one solr document
•  Add respondent meta information: age, profession, interests
•  Up to 3000 boolean variables per document indicating chosen answers
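A sketch of what one such document might look like when serialized for solr's JSON update handler. The field names (`id`, `age`, `profession`, `interests`, `q<question>_r<row>_c<col>`) are assumptions that would have to match your schema, and writing only the chosen answers (leaving the rest absent) is a design choice of this sketch; the talk doesn't say whether false values were stored explicitly.

```python
import json

# Hypothetical sketch: one survey response as a solr document, serialized for
# solr's JSON update handler. Field names are assumptions and must match
# whatever your schema declares.


def build_document(response_id, age, profession, interests, chosen_fields):
    doc = {
        "id": response_id,
        "age": age,
        "profession": profession,
        "interests": interests,  # a multi-valued field
    }
    # Only chosen answers are written (as True); unchosen booleans are left out.
    for name in chosen_fields:
        doc[name] = True
    return doc


doc = build_document("resp-0001", 42, "nurse", ["running", "gardening"],
                     ["q1_r0_c0", "q1_r1_c2"])
payload = json.dumps([doc])  # the JSON update handler accepts a list of documents
```

The payload would then be POSTed to the core's update endpoint with `Content-Type: application/json`; the exact URL depends on your solr version and core setup.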

What did we do?
•  Filter by age, interest, profession
•  Facet across boolean fields
•  Result: which group of people chose which group of answers

Why solr is awesome..
•  Faceting across boolean fields uses very little memory
•  Combining 3000 fields for 2000 documents takes 1 to 2 ms
•  Allowed us to reduce API response time from a variable 2 to 15 seconds (sucked!) to an almost constant ~50 ms
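On the client side, the facet counts still have to be folded back into the question grid. A sketch, assuming solr's default `json.nl=flat` output (each field's counts as an alternating `[value, count, value, count, ...]` list) and the hypothetical `q<question>_r<row>_c<col>` field names; the sample response is hand-written, reusing two counts from the aggregate table above.

```python
import json

# Sketch: turn solr's facet_counts (default json.nl=flat, i.e. alternating
# [value, count, value, count, ...] lists) into "how many chose this answer".
# The response below is a hand-written sample, not real solr output.


def true_counts(response):
    fields = response["facet_counts"]["facet_fields"]
    counts = {}
    for name, flat in fields.items():
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[name] = pairs.get("true", 0)
    return counts


def to_grid(counts, question_id, rows, cols):
    """Rebuild a rows x cols grid using the hypothetical q{q}_r{r}_c{c} names."""
    return [[counts.get(f"q{question_id}_r{r}_c{c}", 0) for c in range(cols)]
            for r in range(rows)]


sample = json.loads("""
{"facet_counts": {"facet_queries": {},
  "facet_fields": {
    "q1_r0_c0": ["false", 1937, "true", 63],
    "q1_r1_c2": ["false", 1992, "true", 8]}}}
""")
counts = true_counts(sample)
grid = to_grid(counts, 1, 2, 5)  # [[63, 0, 0, 0, 0], [0, 0, 8, 0, 0]]
```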

Good to know..
•  sunburnt: awesome Python solr interface (github.com/tow/sunburnt)
•  Programmatic querying as well as raw queries
•  Supports most advanced solr options
•  If you only need facets, specify rows=0

Questions?
•  wisertogether.com
•  @tarequeh
