An Introduction to Search-index.js

An Introduction to search-index.js
Fergus McDowall
Web Rebels, Oslo 2015

Deepish, but not too deep

Fergus McDowall
https://github.com/fergiemcdowall
https://twitter.com/fergiemcdowall
https://no.linkedin.com/in/fergusmcdowall

search-index.js: a Node.js module on the LeBroN stack containing core search functionality (a bit like Lucene)
Norch: an HTTP-GET wrapper around search-index.js (a bit like Solr/Elasticsearch)

LeBroN (Level*, Browserify, Node.js) http://lebron.technology/

Old ’n’ Busted: Search indexes on the server
All magic happens on the server
Clients send search queries to the server and read the returned responses

New Hotness: Search indexes in the browser
Less magic happens on the server
Clients replicate and build their own indexes, so query<->result is local

Make “Small Data” Search Apps

The GitHubs you need:
https://github.com/fergiemcdowall/search-index
https://github.com/fergiemcdowall/norch

The Document Processing Pipeline

Document processing

Lodash is your friend! For example, you might process
https://raw.githubusercontent.com/fergiemcdowall/world-bank-dataset/master/world-bank-projects.json
with
https://gist.github.com/fergiemcdowall/dceec9930327cb92467b
EXPLORE!
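The gist itself is not reproduced in this transcript. As a rough idea of what such processing can look like, here is a minimal sketch in plain JavaScript (no Lodash, so it runs without dependencies) that maps raw records to the document format described under "Indexing Documents". The input field names (`_id.$oid`, `project_name`, `mjtheme`, `totalamt`) are assumptions about the World Bank file, `toSearchDoc` is a hypothetical helper, and the sample record is made up.

```javascript
// Sketch only: the input field names are ASSUMED; check them against the real file.
function toSearchDoc(record) {
  return {
    id: record._id && record._id.$oid,   // assumed location of the record id
    title: record.project_name,          // assumed title field
    mjtheme: record.mjtheme || [],       // field used with filterOn in the norch example
    // numbers stored as zero-padded strings (see "A word on numeric sorting")
    totalamt: [String(record.totalamt || 0).padStart(15, '0')]
  };
}

// Made-up record for illustration only.
const raw = [
  {
    _id: { $oid: 'abc123' },
    project_name: 'Example project',
    mjtheme: ['Example theme'],
    totalamt: 130000000
  }
];

const docs = raw.map(toSearchDoc);
console.log(JSON.stringify(docs, null, 2));
```

In practice you would read the JSON file, map every record like this, and write the result out as a batch for indexing.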

Norch.js Apps

Install norch.js:
➜ norchdir npm install norch
Run norch.js:
➜ norchdir ./node_modules/norch/bin/norch
Index some data into norch.js:
➜ datadir curl --form document=@node_modules/reuters-21578-json/data/full/reuters-000.json http://localhost:3030/indexer --form filterOn=places,topics,organisations

An ng front end that talks to your norch server.
git clone https://github.com/fergiemcdowall/norch-angular-app
cd norch-angular-app
curl --form document=@world-bank-projects-norchified.json http://localhost:3030/indexer --form filterOn=mjtheme,totalamt
(For localhost development, be aware of the Access-Control-Allow-Origin header.)
norch -c http://localhost:8000

Node.js Apps

main.js
Run main.js:
➜ examples node main.js

Replicating an entire index over the net: MADNESS…?

…or perhaps not?
Runs on ALL browsers
Persistent (because IndexedDB)
Indexes are small
Lower server costs
Network caching is magical
The net is getting faster
User experience

Browser Apps (Data from Browser)

index.html:
main.js:
Browserify that bad boy:
➜ dir browserify main.js -o bundle.js
…and open index.html in a browser:

Browser Apps (Replicate to Browser)

Index some data and then use the replicate API to create a snapshot:
Code snippet in main.js that handles replication:
Run:
➜ browserifydir node indexgenerator.js
➜ browserifydir gunzip backup.gz
➜ browserifydir browserify main.js -o bundle.js

Browser Apps (Data from PouchDB)

Index an entire PouchDB instance:

For source code, deeper explanation, and full demos, check out
https://github.com/fergiemcdowall/search-index/tree/master/examples

Indexing Documents (getting stuff in)

Document Format
All fields are optional, but if id isn’t present, it will be autogenerated:
{
  id: 'aTotallyOptionalID',
  title: 'A Really Cool Title',
  tags: ['coolness', 'awesomeness'],
  body: 'Bla bla bla bla, lots of text here…'
}

Batch Format
Use batches to index lots of data. Bigger batches are faster if your hardware can cope.
[
  {
    id: '1',
    title: 'A Really Cool Title',
    tags: ['coolness', 'awesomeness'],
    body: 'Sparkly w00p w00p, lots of text here…'
  },
  {
    id: 'two',
    title: 'A Really Boring Title',
    tags: ['dullness', 'boringness'],
    body: 'Bla bla bla bla, lots of text here…'
  }
]
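To index a large collection in batches, you first have to split it up. A minimal sketch of that splitting step; `toBatches` is a hypothetical helper and the batch size of 1000 is an arbitrary assumption to tune for your hardware. How each batch is then handed to search-index is left out, because the indexing call is not shown on these slides.

```javascript
// Split an array of documents into batches of at most `size` documents.
// Larger batches tend to index faster but need more memory.
function toBatches(docs, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Illustrative documents in the batch format shown above.
const docs = Array.from({ length: 2500 }, (_, i) => ({
  id: String(i + 1),
  title: 'Document ' + (i + 1),
  body: 'Bla bla bla bla, lots of text here…'
}));

const batches = toBatches(docs, 1000);
console.log(batches.map(b => b.length)); // [ 1000, 1000, 500 ]
```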

A word on numeric sorting
search-index sorts alphabetically, so numbers have to be stored as zero-padded strings of equal length (otherwise '100000' would sort before '50000'):
[
  {
    id: '1',
    name: 'Ruckus',
    price: ['000000000050000'],
    manufacturer: 'Honda'
  },
  {
    id: '2',
    name: 'Grom',
    price: ['000000000100000'],
    manufacturer: 'Honda'
  }
]

A word on numeric sorting
Here’s a nice number stringify-padding function:
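The function from the original slide is not included in this transcript. What follows is a minimal sketch of the same idea, not the author's code: `padNumber` is a hypothetical name, the default width of 15 matches the padded examples above, and it only handles non-negative integers.

```javascript
// Zero-pad a non-negative integer to a fixed width so that alphabetical
// (string) order matches numeric order.
function padNumber(n, width = 15) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  const s = String(n);
  if (s.length > width) {
    throw new RangeError('number ' + n + ' is wider than ' + width + ' digits');
  }
  return s.padStart(width, '0');
}

// Unpadded strings sort "wrongly": '100000' < '50000' alphabetically.
console.log(['50000', '100000'].sort()); // [ '100000', '50000' ]
// Padded strings sort in numeric order.
console.log([padNumber(100000), padNumber(50000)].sort()); // [ '000000000050000', '000000000100000' ]
```

Negative numbers and decimals would need a different encoding (for example an offset or a fixed number of decimal places), which this sketch deliberately rejects.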

Working with Queries (getting stuff out)

Basic Queries
Search all fields for “africa bank”:
{"query": {"*": ["africa", "bank"]}}
Search the title field for “africa bank”:
{"query": {"title": ["africa", "bank"]}}
Search the title field for “africa” and the body field for “bank”:
{"query": {"title": ["africa"], "body": ["bank"]}}
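When the search terms come from a text box, the user's input has to be turned into one of these query objects. A minimal sketch, assuming that lowercasing and splitting on whitespace matches how your documents were tokenized at index time (check this against your own setup); `toQuery` is a hypothetical helper.

```javascript
// Build a query object of the shape shown above from free text.
// ASSUMPTION: the index holds lowercase, whitespace-separated tokens.
function toQuery(text, field = '*') {
  const terms = text
    .toLowerCase()
    .split(/\s+/)
    .filter(t => t.length > 0); // drop empty strings from leading/trailing spaces
  return { query: { [field]: terms } };
}

console.log(JSON.stringify(toQuery('  Africa   Bank ')));
// {"query":{"*":["africa","bank"]}}
console.log(JSON.stringify(toQuery('Africa Bank', 'title')));
// {"query":{"title":["africa","bank"]}}
```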

Basic Queries
Return everything in the index:
{"query": {"*": ["*"]}}

Facets
Simple facets:
{
  "query": {"*": ["africa", "bank"]},
  "facets": {"totalamt": {}}
}
Or define ranges of values:
{
  "query": {"*": ["africa", "bank"]},
  "facets": {
    "totalamt": {
      "ranges": [
        ["000000000000000", "000000050000000"],
        ["000000050000001", "100000000000000"]
      ]
    }
  }
}
You can also sort and limit your facets.
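Writing padded range pairs by hand is error-prone. A sketch that generates the `ranges` array from plain numeric boundaries, following the pattern in the example above where each range starts one above the previous upper bound. `makeRanges` is a hypothetical helper, and treating both endpoints as inclusive is an assumption here.

```javascript
const WIDTH = 15; // padded width used in these examples

function pad(n) {
  return String(n).padStart(WIDTH, '0');
}

// Turn ascending integer boundaries [b0, b1, ..., bk] into k contiguous,
// non-overlapping ranges: [b0, b1], [b1 + 1, b2], ..., [b(k-1) + 1, bk].
function makeRanges(boundaries) {
  const ranges = [];
  for (let i = 1; i < boundaries.length; i++) {
    const low = i === 1 ? boundaries[0] : boundaries[i - 1] + 1;
    ranges.push([pad(low), pad(boundaries[i])]);
  }
  return ranges;
}

const facetQuery = {
  query: { '*': ['africa', 'bank'] },
  facets: {
    totalamt: { ranges: makeRanges([0, 50000000, 100000000000000]) }
  }
};
console.log(JSON.stringify(facetQuery.facets, null, 2));
```

With boundaries `[0, 50000000, 100000000000000]` this reproduces the two ranges from the facet example.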

Filters
Filters allow you to query on facets:
{
  "query": {"*": ["africa", "bank"]},
  "filter": {
    "totalamt": [["000000000000000", "000000050000000"]]
  }
}
You always specify a range, so to filter on one value do:
{
  "query": {"*": ["africa", "bank"]},
  "filter": {
    "totalamt": [["000000050000000", "000000050000000"]]
  }
}
You can filter on as many ranges as you want.
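Because a single value is just a range whose endpoints are equal, one small helper can build both kinds of filter. A sketch assuming each field in `filter` maps to an array of `[low, high]` pairs of padded strings; `rangeFilter` and `exactFilter` are hypothetical helpers.

```javascript
// ASSUMPTION: each field in "filter" maps to an array of [low, high] pairs.
function pad(n) {
  return String(n).padStart(15, '0');
}

// Build a filter object from { field: [[low, high], ...] } with numeric bounds.
function rangeFilter(spec) {
  const filter = {};
  for (const [field, pairs] of Object.entries(spec)) {
    filter[field] = pairs.map(([low, high]) => [pad(low), pad(high)]);
  }
  return filter;
}

// Filtering on one exact value is a range whose endpoints are equal.
function exactFilter(field, value) {
  return rangeFilter({ [field]: [[value, value]] });
}

const q = {
  query: { '*': ['africa', 'bank'] },
  filter: exactFilter('totalamt', 50000000)
};
console.log(JSON.stringify(q.filter));
// {"totalamt":[["000000050000000","000000050000000"]]}
```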

Other stuff
pageSize: hits per page
offset: used for paging
teaser: creates a small text preview containing the query terms in the document
weight: used to create relevancy models
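Paging with pageSize and offset usually comes down to one line of arithmetic. A sketch assuming `offset` is the zero-based number of hits to skip from the start of the result list; `pageParams` is a hypothetical helper, and the query reuses the shape from the examples above.

```javascript
// Convert a 1-based page number into pageSize/offset values,
// ASSUMING offset is the zero-based number of hits to skip.
function pageParams(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  return { pageSize: pageSize, offset: (page - 1) * pageSize };
}

const q = Object.assign(
  { query: { '*': ['africa', 'bank'] } },
  pageParams(3, 20)
);
console.log(JSON.stringify(q));
// {"query":{"*":["africa","bank"]},"pageSize":20,"offset":40}
```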

Results Example here

Parsing Results
Caveman JavaScript
Angular
…and anything else you can think of

Replication

Replication in Norch
Make a snapshot of the mother index:
curl http://localhost:3030/snapshot -o snapshot.gz
Empty the target index (if necessary):
curl http://localhost:3030/empty
Replicate into a new or emptied index:
curl -X POST http://localhost:3030/replicate --data-binary @snapshot.gz -H "Content-Type: application/gzip"

Replication in search-index.js
Make a snapshot of the mother index
Empty the target index
Replicate into a new or emptied index

Strengths and Weaknesses

Strengths
Super-portable
Easy to install and use
Performant for simple queries
Runs on low-end hardware (server and browser)
Replication
Weaknesses
Strictly for small data
Limited feature set
Relatively small community compared to other search technologies (Elasticsearch, Solr)

Future Direction

Richer query syntax
Better docs, examples and tutorials
Service for generating indexes
Better compression
Mad science
Performance, bugfixes, etc.

Hey Browsers!
Allow OpenSearch to talk to search indexes in the webpage

Get Involved
Submit a pull request
Make cool stuff
Anything on iOS or Android

Thanks For Listening!
@fergiemcdowall
https://github.com/fergiemcdowall/norch
https://github.com/fergiemcdowall/search-index
