Slide 1

Tareque Hossain
Sr. Software Engineer
The Power

Slide 2

What about it?
• We always associate solr with searching
• solr can also serve as your non-relational data layer

Slide 3

solr? NoSQL?

Slide 4

No content

Slide 5

Hmm, why not?
• Hey, solr is already part of my stack
• I love solr
• It's fast, scalable, and there are some great Python interfaces out there

Slide 6

When would you consider it?
• You have a DB that's frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want an awesome, scalable data layer

Slide 7

What's not so cool?
• Doesn't support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Index optimization can take a long time

Slide 8

But...
• You don't have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
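A minimal sketch of putting a non-relational layer on top of a relational one: read normalized rows and flatten each respondent into a single dict that could then be sent to solr as one document. The schema (respondent and answer tables), the column names and the sample rows are all hypothetical, and sqlite3 only stands in for whatever relational database is actually in use.

```python
import sqlite3

# Hypothetical relational schema: respondents and the answers they chose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE respondent (id INTEGER PRIMARY KEY, age INTEGER, profession TEXT);
CREATE TABLE answer (respondent_id INTEGER, choice TEXT);
INSERT INTO respondent VALUES (1, 34, 'nurse'), (2, 58, 'teacher');
INSERT INTO answer VALUES (1, 'q1_a'), (1, 'q2_c'), (2, 'q1_b');
""")


def build_documents(conn):
    """Flatten each respondent plus their answers into one dict (one future solr document)."""
    docs = {}
    for rid, age, profession in conn.execute(
        "SELECT id, age, profession FROM respondent ORDER BY id"
    ):
        docs[rid] = {"id": str(rid), "age": age, "profession": profession}
    for rid, choice in conn.execute("SELECT respondent_id, choice FROM answer"):
        # Each chosen answer becomes a field whose value is True.
        docs[rid][choice] = True
    return list(docs.values())


documents = build_documents(conn)
```

The relational tables stay the source of truth; the flat documents are derived from them and can be regenerated whenever the data changes. Unchosen answers are simply left out here; indexing them explicitly as false would be an equally valid design choice.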

Slide 9

Why did we choose solr?
• We deal with medical survey data
• Say:
  – About 300 multiple choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing

Slide 10

What a survey question looks like

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                      Arthritis   Arthritis  Arthritis
Less than a year ago  ☑               ☐           ☐          ☐          ☐
More than a year ago  ☐               ☐           ☑          ☐          ☐

Slide 11

Storing a single response

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                      Arthritis   Arthritis  Arthritis
Less than a year ago  1               0           0          0          0
More than a year ago  0               0           1          0          0
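The grid above maps naturally onto one boolean per cell. The sketch below shows that encoding for the arthritis question; the question id (7) and the field-name pattern `q<question>_r<row>_c<col>` are hypothetical conventions chosen for illustration, not necessarily the naming used in the original system.

```python
# Rows and columns of the grid question shown on the slide.
ROWS = ["Less than a year ago", "More than a year ago"]
COLUMNS = ["Osteoarthritis", "Rheumatoid Arthritis", "Traumatic Arthritis",
           "Psoriatic Arthritis", "Other"]


def encode_grid_response(question_id, checked):
    """Turn a set of checked (row, column) cells into one boolean field per cell.

    The field name pattern q<question>_r<row>_c<col> is a hypothetical convention.
    """
    fields = {}
    for r in range(len(ROWS)):
        for c in range(len(COLUMNS)):
            fields[f"q{question_id}_r{r}_c{c}"] = (r, c) in checked
    return fields


# The response on the slide: Osteoarthritis diagnosed less than a year ago,
# Traumatic Arthritis more than a year ago.
response = encode_grid_response(7, {(0, 0), (1, 2)})
```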

Slide 12

Aggregating over 2000 responses

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid  Traumatic  Psoriatic  Other
                                      Arthritis   Arthritis  Arthritis
Less than a year ago  63              155         19         27         268
More than a year ago  190             46          8          213        325
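Each cell of the aggregated grid is just the number of responses in which that cell's boolean is true. The in-memory sketch below spells out that computation on a few toy documents (made-up data, not the survey's), using the same hypothetical field names as before; in the setup described here, solr computes these counts server-side through faceting instead.

```python
from collections import Counter


def aggregate(documents, field_names):
    """Count, for each boolean field, how many documents have it set to True."""
    counts = Counter()
    for doc in documents:
        for name in field_names:
            if doc.get(name):  # a missing field counts the same as False
                counts[name] += 1
    return {name: counts[name] for name in field_names}


# Toy documents using the hypothetical q<question>_r<row>_c<col> field names.
docs = [
    {"q7_r0_c0": True, "q7_r1_c2": True},
    {"q7_r0_c0": True},
    {"q7_r1_c2": False, "q7_r1_c4": True},
]
totals = aggregate(docs, ["q7_r0_c0", "q7_r1_c2", "q7_r1_c4"])
```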

Slide 13

What did we do?
• Each survey response = solr document
• Add respondent meta information: age, profession, interests
• Up to 3000 boolean variables per document indicating chosen answers
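A sketch of what one such document might look like: respondent metadata plus a true boolean for every chosen answer, serialized as a JSON array, the shape solr's JSON update handler accepts. The id format, the field names and the multi-valued `interests` field are assumptions, and the schema would need to declare matching field types (for example through a dynamic field pattern).

```python
import json


def make_solr_document(response_id, age, profession, interests, chosen_fields):
    """Build one flat document: respondent metadata plus one boolean per chosen answer."""
    doc = {
        "id": response_id,
        "age": age,
        "profession": profession,
        "interests": sorted(interests),  # multi-valued field; sorted for stable output
    }
    for name in chosen_fields:
        doc[name] = True  # only chosen answers are stored; absent means "not chosen"
    return doc


# Hypothetical respondent and answer field names.
doc = make_solr_document("survey3-resp42", 47, "physician", {"running", "cooking"},
                         ["q7_r0_c0", "q7_r1_c2"])
payload = json.dumps([doc])  # JSON array body for an update request
```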

Slide 14

What did we do?
• Filter by age, interest, profession
• Facet across boolean fields
• Result: which groups of people chose which groups of answers
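The filter-then-facet step can be expressed with standard solr select parameters: `fq` for each filter, `facet=true` with one `facet.field` per answer boolean, and `rows=0` because only the counts are needed. The field names and filter values below are hypothetical, and the resulting query string would be appended to the select URL of whichever core holds the survey documents. Faceting on a boolean field returns a count for both the true and the false values.

```python
from urllib.parse import urlencode


def build_facet_query(filters, boolean_fields):
    """Build solr select parameters: filter the respondents, facet on answer fields."""
    # rows=0: we only want facet counts, not the matching documents themselves.
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("wt", "json")]
    for field, value in filters.items():
        # One filter query per condition; values are assumed to be valid solr syntax.
        params.append(("fq", f"{field}:{value}"))
    for name in boolean_fields:
        params.append(("facet.field", name))
    return urlencode(params)


qs = build_facet_query({"profession": "physician", "age": "[40 TO 60]"},
                       ["q7_r0_c0", "q7_r1_c2"])
```

Keeping each filter in its own `fq` lets solr cache the filters independently, so the same profession or age filter can be reused across many facet requests.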

Slide 15

Why solr is awesome...
• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1–2 ms
• Allowed us to reduce API response time from a variable 2–15 seconds (sucked!) to an almost constant ~50 ms
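Since the counts come back per field, the client only has to pick out the true count for each cell to rebuild the aggregated grid. This sketch assumes solr's default flat JSON layout for `facet_fields`, where each field maps to an alternating `[value, count, value, count, ...]` list. The sample response is hand-written: its true counts are borrowed from two cells of the aggregated grid, and the false counts are made up for illustration.

```python
def facet_true_counts(response):
    """Extract the number of documents with value true for each faceted boolean field.

    Assumes the default flat JSON layout, where each field maps to an
    alternating [value, count, value, count, ...] list.
    """
    fields = response["facet_counts"]["facet_fields"]
    result = {}
    for name, flat in fields.items():
        pairs = dict(zip(flat[::2], flat[1::2]))  # {"true": n, "false": m}
        result[name] = pairs.get("true", 0)
    return result


# Hand-written sample response using hypothetical field names.
sample = {"facet_counts": {"facet_fields": {
    "q7_r0_c0": ["true", 63, "false", 1937],
    "q7_r1_c2": ["false", 1992, "true", 8],
}}}
counts = facet_true_counts(sample)
```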

Slide 16

Good to know...
• sunburnt: awesome Python solr interface (github.com/tow/sunburnt)
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only need facets, specify rows=0

Slide 17

Questions?
• wisertogether.com
• @tarequeh