The solr Power - Speaker Deck

Slide 1

Slide 1 text

The solr Power
Tareque Hossain, Sr. Software Engineer

Slide 2

Slide 2 text

What about it?
• We always associate solr with searching
• solr can also serve as your non-relational data layer

Slide 3

Slide 3 text

solr? NoSQL?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Hmmm, why not?
• Hey, solr is already part of my stack
• I love solr
• It’s fast, scalable and there are some great Python interfaces out there

Slide 6

Slide 6 text

When would you consider it?
• You have a DB that’s frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want an awesome scalable data layer

Slide 7

Slide 7 text

What’s not so cool?
• Doesn’t support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Index optimization can take a long time

Slide 8

Slide 8 text

But..
• You don’t have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds

Slide 9

Slide 9 text

Why did we choose solr?
• We deal with medical survey data
• Say:
  – About 300 multiple-choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing

Slide 10

Slide 10 text

What a survey question looks like

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  ☑               ☐                     ☐                    ☐                    ☐
More than a year ago  ☐               ☐                     ☑                    ☐                    ☐

Slide 11

Slide 11 text

Storing a single response

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  1               0                     0                    0                    0
More than a year ago  0               0                     1                    0                    0

Slide 12

Slide 12 text

Aggregating over 2000 responses

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  63              155                   19                   27                   268
More than a year ago  190             46                    8                    213                  325
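
The step from a single stored response to the aggregated grid is a cell-by-cell sum. A minimal pure-Python sketch of that idea follows; the second response is made up for illustration, and the function name `aggregate` is an assumption, not something from the talk.

```python
# Column and row labels taken from the example survey question.
COLUMNS = ["Osteoarthritis", "Rheumatoid Arthritis", "Traumatic Arthritis",
           "Psoriatic Arthritis", "Other"]
ROWS = ["Less than a year ago", "More than a year ago"]


def aggregate(responses):
    """Sum a list of 0/1 grids cell by cell into one grid of counts."""
    totals = [[0] * len(COLUMNS) for _ in ROWS]
    for grid in responses:
        for r, row in enumerate(grid):
            for c, chosen in enumerate(row):
                totals[r][c] += chosen
    return totals


# The single response shown on the slide, plus a second, made-up one.
first = [[1, 0, 0, 0, 0],
         [0, 0, 1, 0, 0]]
second = [[1, 0, 0, 0, 0],
          [0, 0, 0, 1, 0]]

totals = aggregate([first, second])
for label, row in zip(ROWS, totals):
    print(label, row)
```

Doing this in application code touches every response on every request, which is the work the talk hands over to solr.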

Slide 13

Slide 13 text

What did we do?
• Each survey response = solr document
• Add respondent meta information: age, profession, interests
• Up to 3000 boolean variables per document indicating chosen answers
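
One way to picture such a document is a flat dictionary: respondent metadata plus one boolean per answer cell. The sketch below is only an illustration; the field naming scheme (`q12_r0_c2` for question 12, row 0, column 2), the function name, and the sample values are assumptions, not the schema used in the talk.

```python
def response_to_document(response_id, meta, answers):
    """Flatten one survey response into a single flat dict.

    answers maps a question id to a 0/1 grid (rows x columns).
    Field names such as "q12_r0_c2" are a hypothetical naming scheme.
    """
    doc = {"id": response_id}
    doc.update(meta)
    for question_id, grid in answers.items():
        for r, row in enumerate(grid):
            for c, chosen in enumerate(row):
                doc["%s_r%d_c%d" % (question_id, r, c)] = bool(chosen)
    return doc


# Made-up respondent; the grid is the single response from the example question.
doc = response_to_document(
    "resp-1",
    {"age": 54, "profession": "teacher", "interests": ["running"]},
    {"q12": [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]},
)
print(len(doc), "fields")
```

A dictionary of this shape is what a Python solr client would typically be handed when adding a document.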

Slide 14

Slide 14 text

What did we do?
• Filter by age, interest, profession
• Facet across boolean field
• Result: what group of people chose what group of answers
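
As a rough sketch, filtering and faceting map onto solr's standard `fq`, `facet` and `facet.field` request parameters. The helper below only builds the query string and sends nothing; the field names and filter values are hypothetical, and `rows=0` is set because only the counts are wanted.

```python
from urllib.parse import urlencode


def facet_params(filters, facet_fields):
    """Build a solr query string that filters documents and facets on fields."""
    params = [
        ("q", "*:*"),       # match every document, then narrow with filters
        ("rows", "0"),      # return no documents, only facet counts
        ("wt", "json"),
        ("facet", "true"),
    ]
    params += [("fq", fq) for fq in filters]
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


# Hypothetical filters and boolean field names.
query_string = facet_params(
    ["age:[50 TO 60]", "profession:teacher"],
    ["q12_r0_c0", "q12_r1_c2"],
)
print(query_string)
```

Each `fq` narrows the group of people, and each `facet.field` on a boolean field yields how many of them have that answer set to true or false.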

Slide 15

Slide 15 text

Why solr is awesome..
• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1 to 2 ms
• Allowed us to reduce API response time from a variable 2 to 15 seconds (sucked!) to an almost constant ~50 ms

Slide 16

Slide 16 text

Good to know..
• sunburnt: Awesome Python solr interface github.com/tow/sunburnt
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only require facets, specify rows=0
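
With rows=0 the response carries no documents, only the facet counts. The sketch below reads those counts from a JSON response, assuming solr's default layout in which each facet field is a flat list of alternating values and counts. The sample response is hand-written for illustration, and the field names are hypothetical; it does not use sunburnt.

```python
import json


def true_counts(solr_json):
    """Return {field: number of documents where the boolean field is true}."""
    fields = json.loads(solr_json)["facet_counts"]["facet_fields"]
    counts = {}
    for name, flat in fields.items():
        # Assumed layout: [value, count, value, count, ...]
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[name] = pairs.get("true", 0)
    return counts


# Hand-written sample response; "docs" is empty because rows=0.
sample = json.dumps({
    "response": {"numFound": 2000, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "q12_r0_c0": ["false", 1937, "true", 63],
            "q12_r1_c2": ["false", 1992, "true", 8],
        },
    },
})

counts = true_counts(sample)
print(counts)
```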

Slide 17

Slide 17 text

Questions?
• wisertogether.com
• @tarequeh