An Introduction to Search-index.js

Fergie McDowall
May 23, 2015

WebRebels, Oslo, May 2015.

Transcript

  1. An Introduction to search-index.js Fergus McDowall Web Rebels Oslo 2015

  2. Deepish, but not too deep

  3. Fergus McDowall https://github.com/fergiemcdowall https://twitter.com/fergiemcdowall https://no.linkedin.com/in/fergusmcdowall

  4. search-index.js: a Node.js module on the LeBroN stack containing core search functionality (a bit like Lucene)
    Norch: an HTTP GET wrapper around search-index.js (a bit like Solr/Elastic)
  5. LeBroN (Level*, Browserify, Node.js) http://lebron.technology/

  6. Old ’n’ Busted: Search indexes on the server
    All magic happening on the server
    Clients sending search queries to the server and reading returned responses
  7. New Hotness: Search indexes in the browser
    Less magic happening on the server
    Clients replicating and building their own indexes; query<->result is local
  8. Make “Small Data” Search Apps

  9. The GitHubs you need https://github.com/fergiemcdowall/search-index https://github.com/fergiemcdowall/norch

  10. The Document Processing Pipeline

  11. Document processing

  12. Lodash is your friend!
    For example you might process:
    https://raw.githubusercontent.com/fergiemcdowall/world-bank-dataset/master/world-bank-projects.json
    With
    https://gist.github.com/fergiemcdowall/dceec9930327cb92467b
    EXPLORE!
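    As a rough illustration of the kind of reshaping this slide has in mind, the sketch below maps raw records into flat documents ready for indexing. It uses plain `Array.prototype.map`; Lodash's `_.map` would serve equally well. The function name `norchify`, the sample record, and the `project_name` field are assumptions made for illustration, while `mjtheme` and `totalamt` are the fields used as filters later in this talk. It is not the code from the gist.

```javascript
// Hypothetical raw record, loosely shaped like one entry in a JSON dataset.
const rawRecords = [
  { project_name: 'Example Project', totalamt: 50000, mjtheme: 'Human development' }
];

// Reshape one raw record into a flat document for indexing (illustrative only).
function norchify(record, index) {
  return {
    id: String(index + 1),
    title: record.project_name,
    // Store facet fields as arrays of strings, whether the input was
    // a single value or already an array.
    mjtheme: [].concat(record.mjtheme),
    // Numbers become zero-padded strings so that they sort correctly as text.
    totalamt: [String(record.totalamt).padStart(15, '0')]
  };
}

const batch = rawRecords.map(norchify);
console.log(JSON.stringify(batch, null, 2));
```

    The output of a step like this is what would be saved to a file and posted to the indexer.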
  13. Norch.js Apps

  14. Install norch.js:
    ➜ norchdir npm install norch
    Run norch.js:
    ➜ norchdir ./node_modules/norch/bin/norch
    Index some data into norch.js:
    ➜ datadir curl --form document=@node_modules/reuters-21578-json/data/full/reuters-000.json http://localhost:3030/indexer --form filterOn=places,topics,organisations
  15. An ng front end that talks to your norch server.
    git clone https://github.com/fergiemcdowall/norch-angular-app
    cd norch-angular-app
    curl --form document=@world-bank-projects-norchified.json http://localhost:3030/indexer --form filterOn=mjtheme,totalamt
    (for localhost development be aware of the Access-Control-Allow-Origin header)
    norch -c http://localhost:8000
  16. Node.js Apps

  17. Main.js
    Run main.js:
    ➜ examples node main.js

  18. Replicating an entire index over the net - MADNESS…?

  19. …or perhaps not?
    Runs on ALL browsers
    Persistent (because IndexedDB)
    Indexes are small
    Lower server costs
    Network caching is magical
    Net getting faster
    User experience
  20. Browser Apps (Data from Browser)

  21. index.html:
    main.js:
    Browserify that bad boy:
    ➜ dir browserify main.js -o bundle.js
    …and open index.html in a browser:
  22. Browser Apps (Replicate to Browser)

  23. Index some data and then use the replicate API to create a snapshot:
    Code snippet in main.js that handles replication:
    Run
    ➜ browserifydir node indexgenerator.js
    ➜ browserifydir gunzip backup.gz
    ➜ browserifydir browserify main.js -o bundle.js
  24. Browser Apps (data from PouchDB)

  25. Index an entire PouchDB instance:

  26. For source code, deeper explanation, and full demos check out
    https://github.com/fergiemcdowall/search-index/tree/master/examples
  27. Indexing Documents (getting stuff in)

  28. Document Format
    All fields are optional, but if id isn’t present, it will be autogenerated.
    {
      id: 'aTotallyOptionalID',
      title: 'A Really Cool Title',
      tags: ['coolness', 'awesomeness'],
      body: 'Bla bla bla bla, lots of text here…'
    }
  29. Batch Format
    Use batches to index lots of data. Bigger batches are faster if your hardware can cope.
    [
      {
        id: '1',
        title: 'A Really Cool Title',
        tags: ['coolness', 'awesomeness'],
        body: 'Sparkly w00p w00p, lots of text here…'
      },
      {
        id: 'two',
        title: 'A Really Boring Title',
        tags: ['dullness', 'boringness'],
        body: 'Bla bla bla bla, lots of text here…'
      }
    ]
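    Since batch size is a trade-off between speed and what the hardware can cope with, it can help to make it a tunable. The sketch below splits an array of documents into batches of a chosen size. The helper name `toBatches` and the sample documents are assumptions for illustration; how each batch is then handed to the indexer is left out.

```javascript
// Split an array of documents into batches of at most `size` items.
function toBatches(docs, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

const docs = [
  { id: '1', title: 'A Really Cool Title' },
  { id: 'two', title: 'A Really Boring Title' },
  { id: '3', title: 'Another Title' }
];

const batches = toBatches(docs, 2);
console.log(batches.length);   // 2
console.log(batches[1][0].id); // 3
```

    Starting with a modest size and increasing it while watching memory use is one way to find what a given machine can handle.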
  30. A word on numeric sorting
    search-index sorts alphabetically, so all numbers have to be stored as zero-padded strings of equal length.
    [
      {
        id: '1',
        name: 'Ruckus',
        price: ['000000000050000'],
        manufacturer: 'Honda'
      },
      {
        id: '2',
        name: 'Grom',
        price: ['000000000100000'],
        manufacturer: 'Honda'
      }
    ]
  31. A word on numeric sorting
    Here’s a nice number stringify-padding function:
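    The function shown on the slide did not survive in this transcript. As a stand-in, here is a minimal sketch of one way to do it, assuming non-negative integers and the 15-character width seen in the price examples. The name `padNumber` and the `WIDTH` constant are illustrative choices, not the speaker's code.

```javascript
// Width is an assumption taken from the 15-character examples in this talk.
const WIDTH = 15;

// Zero-pad a non-negative integer to a fixed width so that
// alphabetical order matches numeric order.
function padNumber(n, width = WIDTH) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative integer');
  }
  const digits = String(n);
  if (digits.length > width) {
    throw new RangeError('number does not fit in ' + width + ' digits');
  }
  return digits.padStart(width, '0');
}

console.log(padNumber(50000));  // 000000000050000
console.log(padNumber(100000)); // 000000000100000

// Unpadded strings sort incorrectly: '100000' comes before '50000'.
console.log(['50000', '100000'].sort());
// Padded strings sort in numeric order.
console.log([100000, 50000].map((n) => padNumber(n)).sort());
```

    Note the arrow function in the last line: passing `padNumber` directly to `map` would hand it the array index as the width.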
  32. Working with Queries (getting stuff out)

  33. Basic Queries
    Search all fields for “africa bank”
    {
      "query": {"*": ["africa", "bank"]}
    }
    Search title field for “africa bank”
    {
      "query": {"title": ["africa", "bank"]}
    }
    Search title field for “africa”, body for “bank”
    {
      "query": {"title": ["africa"], "body": ["bank"]}
    }
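    Query objects like these are easy to build from a search box. The sketch below turns free text into the query shape shown on this slide. The helper name `buildQuery` is hypothetical, and lower-casing the terms is an assumption about how the index normalises text.

```javascript
// Build a query object from free text.
// `field` defaults to '*', which the slides use to mean "all fields".
function buildQuery(text, field = '*') {
  const terms = text
    .toLowerCase() // assumption: indexed terms are lower-cased
    .split(/\s+/)
    .filter((term) => term.length > 0);
  return { query: { [field]: terms } };
}

console.log(JSON.stringify(buildQuery('africa bank')));
// {"query":{"*":["africa","bank"]}}
console.log(JSON.stringify(buildQuery('Africa Bank', 'title')));
// {"query":{"title":["africa","bank"]}}
```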
  34. Basic Queries
    Return everything in index
    {
      "query": {"*": ["*"]}
    }

  35. Basic Queries
    Return everything in index
    {
      "query": {"*": ["*"]}
    }

  36. Facets
    Simple facets
    {
      "query": {"*": ["africa", "bank"]},
      "facets": {"totalamt": {}}
    }
    Or define ranges of values
    {
      "query": {"*": ["africa", "bank"]},
      "facets": {
        "totalamt": {
          "ranges": [
            ["000000000000000", "000000050000000"],
            ["000000050000001", "100000000000000"]
          ]
        }
      }
    }
    You can also sort and limit your facets
  37. Filters
    Filters allow you to query on facets
    {
      "query": {"*": ["africa", "bank"]},
      "filter": {
        "totalamt": [["000000000000000", "000000050000000"]]
      }
    }
    You always specify a range, so to filter on one value use it as both ends of the range
    {
      "query": {"*": ["africa", "bank"]},
      "filter": {
        "totalamt": [["000000050000000", "000000050000000"]]
      }
    }
    You can filter on as many ranges as you want.
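    Because filters always take padded ranges, a small helper can hide the bookkeeping. The sketch below is illustrative only: the helper names are hypothetical, and the filter shape (a field name mapped to an array of `[from, to]` pairs) is an assumption inferred from this slide rather than a confirmed API contract.

```javascript
// Zero-pad to the 15-character width used in the examples above.
const pad = (n) => String(n).padStart(15, '0');

// Return a copy of query `q` with one more range filter on `field`.
// Assumed shape: { filter: { field: [[from, to], ...] } }.
function withRangeFilter(q, field, from, to) {
  const existing = (q.filter && q.filter[field]) || [];
  return {
    ...q,
    filter: { ...q.filter, [field]: [...existing, [pad(from), pad(to)]] }
  };
}

// Filtering on one value means using it as both ends of the range.
const withValueFilter = (q, field, value) => withRangeFilter(q, field, value, value);

const base = { query: { '*': ['africa', 'bank'] } };
console.log(JSON.stringify(withRangeFilter(base, 'totalamt', 0, 50000000)));
console.log(JSON.stringify(withValueFilter(base, 'totalamt', 50000000)));
```

    The helpers return new objects instead of mutating the query, so the same base query can be reused with different filters.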
  38. Other stuff
    pageSize: hits per page
    offset: used for paging
    teaser: creates a small text preview containing query terms in the document
    weight: used to create relevancy models
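    To show how pageSize and offset might fit together, here is a minimal sketch that adds paging options to a query. The helper name `withPaging` is hypothetical, and it assumes that `offset` is the zero-based index of the first hit to return; check the search-index documentation before relying on that.

```javascript
// Add paging options to a query object.
// Assumption: `offset` counts hits (not pages), starting from zero.
function withPaging(q, pageNumber, pageSize) {
  if (!Number.isInteger(pageNumber) || pageNumber < 0) {
    throw new RangeError('pageNumber must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  return { ...q, offset: pageNumber * pageSize, pageSize };
}

const everything = { query: { '*': ['*'] } };
// Third page (page numbers start at 0) with 20 hits per page.
console.log(JSON.stringify(withPaging(everything, 2, 20)));
// {"query":{"*":["*"]},"offset":40,"pageSize":20}
```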
  39. Results Example here

  40. Parsing Results
    Caveman JavaScript
    Angular
    …and anything else you can think of
  41. Replication

  42. Replication in Norch
    Make snapshot of mother index
    curl http://localhost:3030/snapshot -o snapshot.gz
    Empty target index (if necessary)
    curl http://localhost:3030/empty
    Replicate into new or emptied index
    curl -X POST http://localhost:3030/replicate --data-binary @snapshot.gz -H "Content-Type: application/gzip"
  43. Replication in search-index.js
    Make snapshot of mother index
    Empty target index
    Replicate into new or emptied index
  44. Strengths and Weaknesses

  45. Strengths
    Super-portable
    Easy to install and use
    Performant for simple queries
    Runs on low-end hardware (server and browser)
    Replication
    Weaknesses
    Strictly a small data tool
    Limited feature set
    Relatively small community compared to other search technologies (Elastic, Solr)
  46. Future Direction

  47. Richer query syntax
    Better docs, examples and tutorials
    Service for generating indexes
    Better compression
    Mad science
    Performance, bugfixes, etc
  48. Hey Browsers! Allow OpenSearch to talk to search indexes in the webpage
  49. Get Involved
    Submit a pull request
    Make cool stuff
    Anything on iOS or Android
  50. Thanks For Listening!
    @fergiemcdowall
    https://github.com/fergiemcdowall/norch
    https://github.com/fergiemcdowall/search-index