Scrutineer - Melbourne Search November 2013

Paul Smith
November 21, 2013

Presentation about Scrutineer, a product to verify your ElasticSearch index content.

Transcript

  1. whoami
    - Paul Smith - Engineering Manager @ Aconex
    - Worked with Lucene since 2005
    - Worked with ElasticSearch ~2 years
  2. The Why
    - You are indexing content from, say, a DB
    - It is a large index - it took a long time to build
    - You don’t really want to rebuild it “for kicks”
    - Days/Weeks pass – Lots of updates have happened
    - How do you know the index is still correct?
  3. The Why
    - Maybe you have a bug in your indexing code
    - We don’t have 2 Phase Commit
    - &^%$# Happens
  4. The Why
    - Filesystems have fsck to check consistency
    - What could we do to help validate the index and identify items for correction?
    - Identifying when things happen can help work out what to fix and how
    - Checking frequently helps identify and isolate any problems
  5. Versions
    - ElasticSearch can use an External version type
    - Fantastic for the Optimistic Locking pattern
    - Version could be a sequential version, or a timestamp
    - Allows verification of content against the source
  6. How Scrutineer works
    - Run SQL to retrieve {ID,Version}, store locally; presumed to be sorted (DBs do that well)
    - Execute ES Scan query to retrieve {ID,Version}, store locally
    - Sorts ES results by ID
    - Compares both streams to find ID ‘gaps’ & incorrect versions
  7. Example

    DB           Index
    {1,12345}    {1,12345}
    {2,23455}                 2 is missing
    {3,84757}    {3,84757}
    {4,98765}    {4,98765}
                 {5,38475}    5 was deleted but missed
    {6,34556}    {6,34666}    6 is there but wrong
  8. By the numbers
    - 61.5M item index – 5 Shards, 1 Replica, no compression
    - 3 ES Nodes, 4 GB Heap, Virtualized (SAN storage)
    - DB Query: 23ms
    - ES Scan & Download: 6m46s, ~95k items/second
    - Sorting: 59 seconds
    - Comparison: ~60 seconds
    - Total Verification Time: 8m45s
  9. Factors - ES Scan
    - Lots of data walked by ES == IO
    - May benefit from compression
    - Only hits Primary Shards
    - Host with lots of primaries gets hammered
    - Results deliberately not sorted by ES
  10. Factors - Sorting
    - Über Indexes – can’t hold all results in memory
    - Uses the java-merge-sort GitHub project
    - External multi-way sorting
    - Defaults to 256 MB RAM used for sorting
  11. Future Work – Incremental Verification
    - If Versions are timestamps...
    - Save time by only checking recent changes
    - DB must track deletes
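
Slide 5 relies on ElasticSearch's External version type, where a write is accepted only if the supplied version is higher than the one already stored. The sketch below is a toy in-memory model of that rule, not the ElasticSearch API; `ToyIndex` and `VersionConflict` are hypothetical names used only for illustration.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class ToyIndex:
    """Hypothetical in-memory stand-in for an index using external versions."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, version, body):
        current = self._docs.get(doc_id)
        # External versioning: only a strictly higher version may overwrite.
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"doc {doc_id}: stored version {current[0]}, got {version}"
            )
        self._docs[doc_id] = (version, body)

    def version_of(self, doc_id):
        current = self._docs.get(doc_id)
        return None if current is None else current[0]


toy = ToyIndex()
toy.index(1, 12345, {"title": "first"})
try:
    toy.index(1, 12000, {"title": "stale update"})  # older version is rejected
except VersionConflict:
    pass
toy.index(1, 12400, {"title": "newer update"})      # newer version wins
```

Because the version comes from the source system (a sequence number or a timestamp), the stored version doubles as a cheap fingerprint that can later be compared against the source.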
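
The comparison step from slides 6 and 7 can be sketched as a single pass over two streams of (ID, version) pairs, both already sorted by ID. This is a minimal illustration under that assumption, not Scrutineer's actual code (Scrutineer is a separate project); the function name `compare_streams` and the result labels are made up for this example.

```python
def compare_streams(source, index):
    """Walk two ID-sorted streams of (doc_id, version) and yield discrepancies."""
    src_iter, idx_iter = iter(source), iter(index)
    src, idx = next(src_iter, None), next(idx_iter, None)
    while src is not None or idx is not None:
        if idx is None or (src is not None and src[0] < idx[0]):
            # Present in the source but absent from the index.
            yield ("missing_in_index", src[0])
            src = next(src_iter, None)
        elif src is None or idx[0] < src[0]:
            # Present in the index but no longer in the source.
            yield ("not_in_source", idx[0])
            idx = next(idx_iter, None)
        else:
            # Same ID on both sides: versions must agree.
            if src[1] != idx[1]:
                yield ("version_mismatch", src[0])
            src = next(src_iter, None)
            idx = next(idx_iter, None)


# Data from slide 7, with the index side sorted by ID.
db_rows = [(1, 12345), (2, 23455), (3, 84757), (4, 98765), (6, 34556)]
index_rows = [(1, 12345), (3, 84757), (4, 98765), (5, 38475), (6, 34666)]
problems = list(compare_streams(db_rows, index_rows))
# problems == [("missing_in_index", 2), ("not_in_source", 5), ("version_mismatch", 6)]
```

Only one item from each stream is held in memory at a time, which is why both inputs need to be sorted first.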
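
Slide 10 mentions external multi-way sorting for result sets too large to hold in memory. The sketch below illustrates the general technique in Python: sort bounded chunks, spill each to a temporary file, then merge the sorted runs lazily. It is not the java-merge-sort project's implementation; the `max_items_in_memory` parameter is an assumption standing in for a byte-based memory limit, and integer IDs and versions are assumed.

```python
import heapq
import os
import tempfile


def _write_run(sorted_pairs):
    """Spill one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for doc_id, version in sorted_pairs:
            f.write(f"{doc_id}\t{version}\n")
    return path


def _read_run(path):
    """Stream (doc_id, version) pairs back from a sorted run file."""
    with open(path) as f:
        for line in f:
            doc_id, version = line.rstrip("\n").split("\t")
            yield int(doc_id), int(version)


def external_sort(pairs, max_items_in_memory=100_000):
    """Yield (doc_id, version) pairs in ID order using bounded memory."""
    run_paths, chunk = [], []
    try:
        for pair in pairs:
            chunk.append(pair)
            if len(chunk) >= max_items_in_memory:
                run_paths.append(_write_run(sorted(chunk)))
                chunk = []
        if chunk:
            run_paths.append(_write_run(sorted(chunk)))
        # Multi-way merge: heapq.merge keeps one item per run in memory.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


unsorted = [(6, 34666), (1, 12345), (5, 38475), (3, 84757), (4, 98765)]
in_order = list(external_sort(unsorted, max_items_in_memory=2))
```

With `max_items_in_memory=2` the five pairs are split into three run files, which shows the merge at work even on a tiny input.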
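
The incremental idea on slide 11 could look roughly like the following, assuming versions are timestamps and the source keeps a record of deleted IDs. Everything here is hypothetical: `incremental_check`, the `deleted_ids` list (standing in for a deletes table in the DB) and the `index_versions` dict (standing in for a lookup against the index) are illustrative names, not part of Scrutineer.

```python
def incremental_check(source_rows, deleted_ids, index_versions, since):
    """Check only rows changed at or after `since` (a timestamp version).

    source_rows:    iterable of (doc_id, version) from the source
    deleted_ids:    IDs the source recorded as deleted since the last check
    index_versions: dict doc_id -> version, standing in for an index lookup
    """
    problems = []
    for doc_id, version in source_rows:
        if version < since:
            continue  # unchanged since the last verification, skip it
        indexed = index_versions.get(doc_id)
        if indexed is None:
            problems.append(("missing_in_index", doc_id))
        elif indexed != version:
            problems.append(("version_mismatch", doc_id))
    # Deleted rows no longer appear in the source query, which is why
    # the DB has to track deletes separately.
    for doc_id in deleted_ids:
        if doc_id in index_versions:
            problems.append(("not_in_source", doc_id))
    return problems


recent = incremental_check(
    source_rows=[(1, 100), (2, 250), (3, 300)],
    deleted_ids=[4],
    index_versions={1: 90, 2: 250, 4: 120},
    since=200,
)
# recent == [("missing_in_index", 3), ("not_in_source", 4)]
```

Note that item 1 is skipped even though its indexed version differs, because it last changed before `since`; an incremental check only covers the recent window, so a periodic full comparison would still be needed to catch older drift.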