Slide 1

Slide 1 text

MAGGIE NELSON (@MAGGIE1000) Findery: an Elasticsearch case study

Slide 2

Slide 2 text

REDISCOVER YOUR WORLD
What is Findery?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

FINDERY TIMELINE

Slide 7

Slide 7 text

LONG, LONG AGO…

Slide 8

Slide 8 text

• Young startup

Slide 9

Slide 9 text

• RoR application

Slide 10

Slide 10 text

• Easy to prototype

Slide 11

Slide 11 text

• Easy to build in technical debt

Slide 12

Slide 12 text

“Hey, Maggie, can you add search?”

Slide 13

Slide 13 text

SEARCH: FIRST ITERATION

Slide 14

Slide 14 text

• using Solr
• welcoming and informative community

Slide 15

Slide 15 text

• Solr isn’t AWS’s target audience
• lots of babysitting of machines

Slide 16

Slide 16 text

• very robust, lots of features
• geo features not included by default before the 4.x release; managed via plugins

Slide 17

Slide 17 text

• query language has a steep learning curve
• difficult to share code with new/inexperienced developers

Slide 18

Slide 18 text

SEARCH: ELASTICSEARCH

Slide 19

Slide 19 text

“Y’know, for search!”

Slide 20

Slide 20 text

• a welcoming and informative community

Slide 21

Slide 21 text

• easy to set up and monitor on AWS
• chef driven setup, multiple data nodes behind a load balancer
• new nodes discoverable via well-maintained plugins
• additional internal monitoring using Sematext

Slide 22

Slide 22 text

• easy to share with new developers (JSON ftw!)
• geo features in by default

Slide 23

Slide 23 text

AS THE COMPANY GROWS…

Slide 24

Slide 24 text

• Moving portions of the app from RoR to Node.js
• Easy transition

Slide 25

Slide 25 text

• JSON queries usually remain unchanged
• some changes to indexing strategy

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

GENERAL SITE SEARCH
• ES 101
• index objects, search objects using basic match / query_string queries
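A minimal sketch of the two query types named on this slide, written as Python dicts that serialize to Elasticsearch's JSON query DSL. The field name `body` and both helper names are hypothetical; the slides do not show Findery's actual mappings or queries.

```python
import json


def match_query(field, text):
    """Build a basic full-text `match` query body for a single field."""
    return {"query": {"match": {field: text}}}


def query_string_query(text, default_field=None):
    """Build a `query_string` query body (parses Lucene-style syntax such as AND/OR)."""
    inner = {"query": text}
    if default_field is not None:
        inner["default_field"] = default_field
    return {"query": {"query_string": inner}}


def to_json(body):
    """Serialize a query body the way it would be sent over HTTP."""
    return json.dumps(body, sort_keys=True)
```

Either body could be sent to an index's `_search` endpoint; `match` is usually the safer default for user-typed text, since `query_string` can reject input that is not valid query syntax.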

Slide 28

Slide 28 text

WORD AUTOCOMPLETE
• I <3 ngrams
• tags autocomplete
• username autocomplete
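To illustrate why ngrams suit autocomplete, here is a small sketch that generates edge n-grams (prefixes) in plain Python, next to index settings that ask Elasticsearch to do the same at index time. The choice of edge n-grams rather than full n-grams, the gram sizes, and the names `autocomplete` and `autocomplete_filter` are assumptions; the slides do not say which configuration Findery used.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the prefixes of `word` whose length is between min_gram and max_gram."""
    word = word.lower()
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


# Hypothetical index settings: a custom analyzer that lowercases tokens and
# emits their edge n-grams, so a typed prefix matches an indexed term exactly.
AUTOCOMPLETE_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}
```

With these settings, a username such as "maggie" would be indexed as "ma", "mag", "magg", and so on, so typing "mag" finds it with an ordinary term lookup instead of a slower wildcard scan.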

Slide 29

Slide 29 text

HAVERSINE FORMULA
DISTANCE BETWEEN POINTS ON THE SURFACE OF A SPHERE
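For reference, the haversine formula can be sketched in a few lines of Python. The mean Earth radius of 6371 km is an assumption (the Earth is not a perfect sphere), and the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

For example, `haversine_km(0, 0, 0, 90)` covers a quarter of the equator, roughly 10,000 km.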

Slide 30

Slide 30 text

AUTOCOMPLETE WITH A GEO COMPONENT
• “find venues near me”
• about 60M records
• ngram autocomplete for names of venues
• combined with sorting by latitude and longitude
• good performance when queries are pre-warmed (tens of ms)
• even better performance if you can set a max on distance
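A "find venues near me" request along these lines might combine an ngram-backed match, a maximum-distance filter, and a distance sort. This is a sketch only: the field names `name.autocomplete` and `location`, the 5 km default, and the `bool`/`filter` syntax of current Elasticsearch versions are all assumptions, not Findery's actual query.

```python
def venues_near_me(prefix, lat, lon, max_distance="5km", size=10):
    """Build a search body: name autocomplete, capped by distance, nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "size": size,
        "query": {
            "bool": {
                # Assumes `name.autocomplete` is analyzed with an ngram analyzer.
                "must": {"match": {"name.autocomplete": prefix}},
                # The distance cap shrinks the candidate set before sorting.
                "filter": {
                    "geo_distance": {"distance": max_distance, "location": point}
                },
            }
        },
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}
        ],
    }
```

The distance filter is what makes the last bullet work: sorting by distance only has to consider documents inside the radius rather than every matching venue.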

Slide 31

Slide 31 text

DATA ON A MAP: ISSUES

Slide 32

Slide 32 text

• potentially lots of notes in a small area

Slide 33

Slide 33 text

• potentially lots of notes in a small area
• potentially very few notes in a big area

Slide 34

Slide 34 text

• potentially lots of notes in a small area
• potentially very few notes in a big area
• some notes are (very subjectively) better than others

Slide 35

Slide 35 text

• potentially lots of notes in a small area
• potentially very few notes in a big area
• some notes are (very subjectively) better than others
• some notes are (very objectively) better than others

Slide 36

Slide 36 text

DATA ON A MAP: SOLUTIONS

Slide 37

Slide 37 text

“MANY NOTES IN A SMALL AREA”
• simple bounding box query
• boost notes based on scores
• prevent “rich get richer” with score half life
• boost newer content where it makes sense
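One way to read "score half life" is exponential decay: a note's score halves every fixed number of days, so old popular notes cannot stay on top forever. The sketch below assumes that interpretation; the 30-day half-life and the function name are illustrative, not taken from the talk.

```python
def decayed_score(raw_score, age_days, half_life_days=30.0):
    """Halve `raw_score` for every `half_life_days` of age (exponential decay)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return raw_score * 0.5 ** (age_days / half_life_days)


def rank_notes(notes, half_life_days=30.0):
    """Sort notes (dicts with `score` and `age_days`) by decayed score, best first."""
    return sorted(
        notes,
        key=lambda n: decayed_score(n["score"], n["age_days"], half_life_days),
        reverse=True,
    )
```

Under these assumptions a 90-day-old note with a raw score of 100 decays to 12.5 and ranks below a fresh note scoring 20, which is the "boost newer content" effect from the last bullet.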

Slide 38

Slide 38 text

“FEW NOTES IN A BIG AREA”
• expand bbox until you find something?
• order by distance AND score? (kind of expensive)
• determine some sort of a base score?
• quality vs. performance
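The first option, expanding the bounding box until something turns up, can be sketched as a loop. To keep the example self-contained it searches an in-memory list instead of calling Elasticsearch; the starting size, growth factor, and size cap are arbitrary assumptions, and the box math ignores the poles and the antimeridian.

```python
def bbox_around(lat, lon, half_width):
    """Return (south, west, north, east) for a square box in degrees."""
    return (lat - half_width, lon - half_width, lat + half_width, lon + half_width)


def notes_in_bbox(notes, bbox):
    """Stand-in for a bounding box query: filter notes that fall inside the box."""
    south, west, north, east = bbox
    return [n for n in notes if south <= n["lat"] <= north and west <= n["lon"] <= east]


def expand_until_found(notes, lat, lon, start=0.01, factor=2.0, max_half_width=10.0):
    """Grow the box geometrically until it contains a note or hits the size cap."""
    half_width = start
    while half_width <= max_half_width:
        hits = notes_in_bbox(notes, bbox_around(lat, lon, half_width))
        if hits:
            return hits, half_width
        half_width *= factor
    return [], half_width
```

Each pass through the loop stands for one round trip to the search cluster, which is where the "quality vs. performance" trade-off on this slide shows up: a smaller growth factor gives tighter results but more queries.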

Slide 39

Slide 39 text

“SOME NOTES ARE (SUBJECTIVELY) BETTER THAN OTHERS”
• people tend to like what their friends like (incorporate the friend graph into queries)
• people who liked X also liked Y (generate a few Y for every X)
• a little bit of editorial content (easiest tech-wise, needs lots of people)

Slide 40

Slide 40 text

“SOME NOTES ARE (OBJECTIVELY) BETTER THAN OTHERS”
• spam detection
• note scoring per-user, per-IP, per-anything

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

“POPULAR TAGS IN THIS AREA”
• fun to implement
• lots of lat/long pairs
• essentially a simple geo search based on object’s attributes
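A "popular tags in this area" request could be sketched as a geo filter plus a count of tag values among the matching notes. The version below uses a `terms` aggregation from current Elasticsearch versions, which is an assumption (the talk does not show the query), as are the field names `location` and `tags`.

```python
def popular_tags_in_bbox(north, west, south, east, top_n=10):
    """Build a body that counts the most common tags among notes inside a box."""
    return {
        "size": 0,  # only the tag counts are needed, not the notes themselves
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": {"lat": north, "lon": west},
                            "bottom_right": {"lat": south, "lon": east},
                        }
                    }
                }
            }
        },
        "aggs": {"popular_tags": {"terms": {"field": "tags", "size": top_n}}},
    }
```

The response would list each tag with a document count under `aggregations.popular_tags.buckets`, already ordered from most to least frequent.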

Slide 43

Slide 43 text

COLLECTIONS OF NOTES
• multiple lat/long points in a collection
• what does “near me” mean?
• shape queries
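One possible reading of "shape queries" here: index each collection's points as a single multipoint shape, then ask which collections intersect the area the user is looking at. The sketch below assumes a `geo_shape` field with the hypothetical name `footprint`; note that shapes use [lon, lat] order, the reverse of the lat/long pairs elsewhere in the talk.

```python
def collection_footprint(points):
    """Turn (lat, lon) pairs into a multipoint shape; coordinates are [lon, lat]."""
    return {"type": "multipoint", "coordinates": [[lon, lat] for lat, lon in points]}


def collections_in_view(north, west, south, east, field="footprint"):
    """Build a geo_shape query for collections with any point inside the box."""
    envelope = {
        "type": "envelope",
        # An envelope is given as [[west, north], [east, south]].
        "coordinates": [[west, north], [east, south]],
    }
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {field: {"shape": envelope, "relation": "intersects"}}
                }
            }
        }
    }
```

With `relation` set to `intersects`, a collection counts as "near me" if at least one of its notes falls inside the box, which is only one of several reasonable answers to the question on this slide.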

Slide 44

Slide 44 text

“The interesting problems are the ones that are difficult.”

Slide 45

Slide 45 text

Q & A

Slide 46

Slide 46 text

• Photo credits:
• http://www.flickr.com/photos/m2w2/1400035437/
• https://www.artsjournal.com/aestheticgrounds/bunnies-and-rabbits-and-hares-passivity-in-publicart/roas-rabbit-in-hackney-2008a/
• http://www.florentijnhofman.nl/dev/project.php?id=181
• http://exhibitioninquisition.wordpress.com/2012/02/10/four-facts-this-will-have-been/
• http://silence-design.deviantart.com/art/Rabbit-Street-Art-159681282
• http://www.flickr.com/photos/boonovista/391812573/
• More rabbits to visit: https://findery.com/heather/sets/rabbits-to-visit

Slide 47

Slide 47 text

MAGGIE NELSON @MAGGIE1000 THANKS!!!!!! FINDERY.COM