An Introduction to Search-index.js

Fergie McDowall
May 23, 2015

WebRebels, Oslo, May 2015.


Transcript

  1. search-index.js: a Node.js module on the LeBroN stack containing core search functionality (a bit like Lucene).
     Norch: an HTTP GET wrapper around search-index.js (a bit like Solr/Elastic).
  2. Old ’n’ Busted: search indexes on the server.
     All the magic happens on the server.
     Clients send search queries to the server and read the returned responses.
  3. New Hotness: search indexes in the browser.
     Less magic happens on the server.
     Clients replicate and build their own indexes; query <-> result is local.
  4. Lodash is your friend! For example, you might process
     https://raw.githubusercontent.com/fergiemcdowall/world-bank-dataset/master/world-bank-projects.json
     with
     https://gist.github.com/fergiemcdowall/dceec9930327cb92467b
     EXPLORE!
  5. Install norch.js:
       ➜ norchdir npm install norch
     Run norch.js:
       ➜ norchdir ./node_modules/norch/bin/norch
     Index some data into norch.js:
       ➜ datadir curl --form document=@node_modules/reuters-21578-json/data/full/reuters-000.json http://localhost:3030/indexer --form filterOn=places,topics,organisations
  6. An ng (Angular) front end that talks to your norch server:
       git clone https://github.com/fergiemcdowall/norch-angular-app
       cd norch-angular-app
       curl --form [email protected] http://localhost:3030/indexer --form filterOn=mjtheme,totalamt
     For localhost development, be aware of the Access-Control-Allow-Origin header:
       norch -c http://localhost:8000
  7. …or perhaps not?
     Runs on ALL browsers
     Persistent (because IndexedDB)
     Indexes are small
     Lower server costs
     Network caching is magical
     Net getting faster
     User experience
  8. index.html:
     main.js:
     Browserify that bad boy:
       ➜ dir browserify main.js -o bundle.js
     …and open index.html in a browser:
  9. Index some data and then use the replicate API to create a snapshot:
     Code snippet in main.js that handles replication:
     Run:
       ➜ browserifydir node indexgenerator.js
       ➜ browserifydir gunzip backup.gz
       ➜ browserifydir browserify main.js -o bundle.js
  10. For source code, deeper explanation, and full demos, check out
      https://github.com/fergiemcdowall/search-index/tree/master/examples
  11. Document Format
      All fields are optional, but if id isn’t present, it will be autogenerated.
        {
          id: 'aTotallyOptionalID',
          title: 'A Really Cool Title',
          tags: ['coolness', 'awesomeness'],
          body: 'Bla bla bla bla, lots of text here…'
        }
  12. Batch Format
      Use batches to index lots of data. Bigger batches are faster if your hardware can cope.
        [
          {
            id: '1',
            title: 'A Really Cool Title',
            tags: ['coolness', 'awesomeness'],
            body: 'Sparkly w00p w00p, lots of text here…'
          },
          {
            id: 'two',
            title: 'A Really Boring Title',
            tags: ['dullness', 'boringness'],
            body: 'Bla bla bla bla, lots of text here…'
          }
        ]
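The batching advice above can be sketched as a small helper that splits a large array of documents into fixed-size batches. This is only an illustration: `toBatches` and the batch size of 2 are hypothetical, not part of search-index, and a sensible size depends on your hardware.

```javascript
// Hypothetical helper (not part of search-index): split an array of
// documents into batches of at most `size` documents each.
function toBatches(docs, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Example documents in the format shown on the slide.
const docs = [
  { id: '1', title: 'A Really Cool Title' },
  { id: 'two', title: 'A Really Boring Title' },
  { id: '3', title: 'A Third Title' }
];

const batches = toBatches(docs, 2);
console.log(batches.length); // 2
```

Each element of `batches` is itself an array in the batch format, so it could be handed to the indexer one batch at a time.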
  13. A word on numeric sorting
      search-index sorts alphabetically, so all numbers have to be stored as strings, zero-padded to a fixed width so that alphabetical order matches numeric order.
        [
          {
            id: '1',
            name: 'Ruckus',
            price: ['000000000050000'],
            manufacturer: 'Honda'
          },
          {
            id: '2',
            name: 'Grom',
            price: ['000000000100000'],
            manufacturer: 'Honda'
          }
        ]
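A minimal sketch of how such sortable strings might be produced before indexing. The helper name `toSortableString` is hypothetical, and the width of 15 is an assumption taken from the length of the price strings on the slide; it only handles non-negative integers.

```javascript
// Assumed fixed width, matching the 15-character prices on the slide.
const WIDTH = 15;

// Hypothetical helper: turn a non-negative integer into a zero-padded
// string so that alphabetical order agrees with numeric order.
function toSortableString(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  const digits = String(n);
  if (digits.length > WIDTH) {
    throw new RangeError('number does not fit in ' + WIDTH + ' digits');
  }
  return digits.padStart(WIDTH, '0');
}

console.log(toSortableString(50000));  // '000000000050000'
console.log(toSortableString(100000)); // '000000000100000'

// Padded strings compare in numeric order...
console.log(toSortableString(50000) < toSortableString(100000)); // true
// ...whereas unpadded strings do not.
console.log('50000' < '100000'); // false
```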
  14. Basic Queries
      Search all fields for “africa bank”:
        {
          "query": {"*": ["africa", "bank"]}
        }
      Search the title field for “africa bank”:
        {
          "query": {"title": ["africa", "bank"]}
        }
      Search the title field for “africa” and the body field for “bank”:
        {
          "query": {"title": ["africa"], "body": ["bank"]}
        }
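As an illustration, query objects of this shape could be built from a user's search string along these lines. `buildQuery` is a hypothetical helper, not part of search-index, and lowercasing the terms is an assumption based on the lowercase terms in the examples above.

```javascript
// Hypothetical helper: split a search string into terms and wrap them in
// the query shape shown on the slide. '*' searches all fields.
function buildQuery(text, field = '*') {
  const terms = text
    .toLowerCase()    // assumption: terms are matched in lowercase
    .split(/\s+/)     // split on runs of whitespace
    .filter(Boolean); // drop empty strings from leading/trailing spaces
  return { query: { [field]: terms } };
}

console.log(JSON.stringify(buildQuery('africa bank')));
// {"query":{"*":["africa","bank"]}}

console.log(JSON.stringify(buildQuery('Africa Bank', 'title')));
// {"query":{"title":["africa","bank"]}}
```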
  15. Facets
      Simple facets:
        {
          "query": {"*": ["africa", "bank"]},
          "facets": {"totalamt": {}}
        }
      Or define ranges of values:
        {
          "query": {"*": ["africa", "bank"]},
          "facets": {
            "totalamt": {
              "ranges": [
                ["000000000000000", "000000050000000"],
                ["000000050000001", "100000000000000"]
              ]
            }
          }
        }
      You can also sort and limit your facets.
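The range boundaries above are zero-padded strings, so writing them by hand is error-prone. A sketch of generating contiguous, non-overlapping ranges from numeric boundaries follows; `facetRanges` is a hypothetical helper and the 15-character width is an assumption taken from the slide.

```javascript
// Assumed fixed width, matching the 15-character boundaries on the slide.
const WIDTH = 15;
const pad = (n) => String(n).padStart(WIDTH, '0');

// Hypothetical helper: given ascending integer boundaries, return
// [lower, upper] string pairs. Every range after the first starts one
// above the previous upper bound, so the ranges do not overlap.
function facetRanges(boundaries) {
  const ranges = [];
  for (let i = 0; i < boundaries.length - 1; i++) {
    const lower = i === 0 ? boundaries[i] : boundaries[i] + 1;
    ranges.push([pad(lower), pad(boundaries[i + 1])]);
  }
  return ranges;
}

const facets = {
  totalamt: { ranges: facetRanges([0, 50000000, 100000000000000]) }
};
console.log(JSON.stringify(facets.totalamt.ranges));
// [["000000000000000","000000050000000"],["000000050000001","100000000000000"]]
```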
  16. Filters
      Filters allow you to query on facets:
        {
          "query": {"*": ["africa", "bank"]},
          "filter": {
            "totalamt": [["000000000000000", "000000050000000"]]
          }
        }
      You always specify a range, so to filter on a single value, use it as both the lower and the upper bound:
        {
          "query": {"*": ["africa", "bank"]},
          "filter": {
            "totalamt": [["000000050000000", "000000050000000"]]
          }
        }
      You can filter on as many ranges as you want.
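To make the range semantics concrete, here is a plain JavaScript sketch of an inclusive range check on zero-padded strings. It is not how search-index implements filtering internally; `inRange` is a hypothetical function, and inclusive bounds are an assumption implied by the single-value example, where the same value serves as lower and upper bound.

```javascript
// Hypothetical illustration: inclusive, alphabetical range check.
// Works for numbers only because the strings are zero-padded to equal width.
function inRange(value, [lower, upper]) {
  return value >= lower && value <= upper;
}

// A "single value" filter is a range whose bounds are identical.
const singleValue = ['000000050000000', '000000050000000'];

console.log(inRange('000000050000000', singleValue)); // true
console.log(inRange('000000050000001', singleValue)); // false
console.log(inRange('000000020000000', ['000000000000000', '000000050000000'])); // true
```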
  17. Other stuff
      pageSize: hits per page
      offset: used for paging
      teaser: creates a small text preview containing query terms in the document
      weight: used to create relevancy models
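As a sketch of how pageSize and offset typically work together, a page number can be converted to paging values like this. `pagingFor` is a hypothetical helper, and it assumes that offset counts the hits to skip, starting from zero.

```javascript
// Hypothetical helper: convert a 1-based page number into the
// offset/pageSize pair for that page.
// Assumption: offset is the zero-based number of hits to skip.
function pagingFor(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return { offset: (page - 1) * pageSize, pageSize };
}

console.log(pagingFor(1, 10)); // { offset: 0, pageSize: 10 }
console.log(pagingFor(3, 10)); // { offset: 20, pageSize: 10 }
```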
  18. Replication in Norch
      Make a snapshot of the mother index:
        curl http://localhost:3030/snapshot -o snapshot.gz
      Empty the target index (if necessary):
        curl http://localhost:3030/empty
      Replicate into a new or emptied index:
        curl -X POST http://localhost:3030/replicate \
          --data-binary @snapshot.gz \
          -H "Content-Type: application/gzip"
  19. Replication in search-index.js
      Make a snapshot of the mother index
      Empty the target index
      Replicate into a new or emptied index
  20. Strengths
      Super-portable
      Easy to install and use
      Performant for simple queries
      Runs on low-end hardware (server and browser)
      Replication

      Weaknesses
      Strictly for small data
      Limited feature set
      Relatively small community compared to other search technologies (Elastic, Solr)
  21. Richer query syntax
      Better docs, examples and tutorials
      Service for generating indexes
      Better compression
      Mad science
      Performance, bugfixes, etc.