Updates from Lucene Land

Building Elasticsearch's powerful search and analytics functionality has required an equal amount of investment into its underlying technology, Apache Lucene.

In this talk, Robert Muir will tell you everything you ever wanted to know about the Lucene project, including the many contributions to the project made by Elasticsearch's developer team.

He'll cover:
New features in Lucene 5
Recent improvements that have hardened both Lucene and Elasticsearch
Future direction of Lucene development

Attendees will leave this presentation with a solid understanding of the interworkings of Elasticsearch and Lucene, and how future developments in Lucene will contribute to enhanced functionality in Elasticsearch 2.0 and beyond.

Elastic Co

March 10, 2015

Transcript

  1. Updates from Lucene Land (Robert Muir)

  2. OUTLINE
    • Introduction
    • Query Execution
    • Bitset Compression
    • Index Compression
    • Indexing Performance
    • Index Safety
    • Other Changes
    Footer: BEER-WARE r42
  3. Introduction to Lucene

  4. INTRODUCTION: LUCENE
    • Search engine library in Java
    • Produced by the Apache Software Foundation
    • 1999-Present
  5. FULL-TEXT SEARCH

  6. INVERTED INDEX

  7. RELEASE TIMELINE

  8. Query Execution

  9. TWO PHASE INTERSECTION
    [Bar chart. Wikipedia English: filtered phrase query, Before vs. After, for Dense, Medium, and Sparse. Values as extracted: 128.24, 67.53, 19.98, 93.39, 58.27, 21.19 QPS.]
  10. FASTER PROHIBITED CLAUSES
    [Bar chart. Wikipedia English: MUST_NOT, Before vs. After, for Dense, Medium, and Sparse. Values as extracted: 922.24, 183.47, 50.06, 62.74, 57.14, 33.71 QPS.]
  11. OPTIMIZE QUERY FOR FILTER CLAUSES
    [Bar chart. Wikipedia English: MUST_NOT, Before vs. After, for Dense, Medium, and Sparse. Values as extracted: 1,144.49, 205.45, 49.25, 959.96, 185.01, 49.49 QPS.]
  12. OPTIMIZE QUERY FOR FILTER CLAUSES
    [Bar chart. Wikipedia English: filtered sloppy phrase, Before vs. After, for Dense, Medium, and Sparse. Values as extracted: 59.38, 26.37, 5.14, 45.19, 21.3, 5.14 QPS.]
  13. QUERY EXECUTION
    • Merge Query and Filter
    • Automatic caching
    • Cost-based execution
    • Two-phase intersection
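
The cost-based execution and two-phase intersection items on slide 13 can be illustrated with a small, self-contained sketch. This is not Lucene code: `Clause`, `confirm`, and `two_phase_intersection` are hypothetical names, and the modulo test merely stands in for an expensive check such as verifying phrase positions. The idea shown is that clauses first agree cheaply on candidate documents, led by the clause with the fewest candidates, and the expensive check runs only on the survivors.

```python
from bisect import bisect_left


class Clause:
    """Hypothetical query clause: sorted candidate doc IDs plus an optional costly check."""

    def __init__(self, candidates, confirm=None):
        self.candidates = sorted(candidates)
        self.confirm = confirm        # None means every candidate is a true match
        self.confirm_calls = 0        # counts how often the costly check ran

    def cost(self):
        # Estimated cost: the number of candidates this clause would visit.
        return len(self.candidates)

    def contains(self, doc):
        # Cheap membership test via binary search.
        i = bisect_left(self.candidates, doc)
        return i < len(self.candidates) and self.candidates[i] == doc

    def matches(self, doc):
        # Costly confirmation, only meaningful for docs that are candidates.
        if self.confirm is None:
            return True
        self.confirm_calls += 1
        return self.confirm(doc)


def two_phase_intersection(clauses):
    # Cost-based execution: iterate over the clause with the fewest candidates.
    ordered = sorted(clauses, key=lambda c: c.cost())
    lead, others = ordered[0], ordered[1:]
    hits = []
    for doc in lead.candidates:
        # Phase 1: every other clause must list the doc as a candidate.
        if not all(c.contains(doc) for c in others):
            continue
        # Phase 2: run the costly checks only on agreed candidates.
        if all(c.matches(doc) for c in ordered):
            hits.append(doc)
    return hits


# A "phrase" clause with 1000 candidates and a sparse filter with 4 docs.
phrase = Clause(range(1000), confirm=lambda doc: doc % 3 == 0)
sparse_filter = Clause([3, 10, 300, 999])
hits = two_phase_intersection([phrase, sparse_filter])
print(hits)                  # [3, 300, 999]
print(phrase.confirm_calls)  # 4
```

In this toy run the costly check executes 4 times instead of once per phrase candidate, which is the kind of saving the filtered phrase query benchmarks are concerned with.
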
  14. Bitset Compression

  15. COMPRESSED BITSETS
    [Bar chart. Memory usage (0.1%) for Fixed, Sparse, and Roaring. Values as extracted: 2%, 12%, 100%.]
  16. COMPRESSED BITSETS
    [Bar chart. Iteration speed (0.1%) for Fixed, Sparse, and Roaring. Values as extracted: 3.9x, 2x, 1x.]
  17. COMPRESSED BITSETS
    • Cached Filters
    • Range, Prefix, Wildcard query execution
    • Nested Documents (join)
    • Scoring Factors (norms)
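
To see why a compressed bitset can use far less memory than a fixed one at low density, here is a rough size estimate in the spirit of a roaring-style layout. It is a sketch under simplifying assumptions, not Lucene's implementation: doc IDs are grouped into blocks of 65,536, each block is stored either as 16-bit entries or as a bitmap, whichever is smaller, and per-block bookkeeping overhead is ignored. The function names are hypothetical.

```python
BLOCK_BITS = 16  # assumed block size: 2**16 doc IDs per block


def fixed_bitset_bytes(max_doc):
    # One bit per document, regardless of how many are set.
    return (max_doc + 7) // 8


def blockwise_bytes(doc_ids):
    # Count set docs per block, then pick the smaller encoding for each block:
    # 2 bytes per doc (16-bit offsets) or a full bitmap for the block.
    counts = {}
    for doc in doc_ids:
        block = doc >> BLOCK_BITS
        counts[block] = counts.get(block, 0) + 1
    bitmap_bytes = (1 << BLOCK_BITS) // 8
    return sum(min(2 * n, bitmap_bytes) for n in counts.values())


max_doc = 1_000_000
sparse_docs = range(0, max_doc, 1000)    # 0.1% of documents set

print(fixed_bitset_bytes(max_doc))       # 125000
print(blockwise_bytes(sparse_docs))      # 2000
print(blockwise_bytes(range(max_doc)))   # 131072
```

At 0.1% density the blockwise estimate is a small fraction of the fixed bitset, while for a fully dense set it falls back to bitmaps and ends up about the same size.
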
  18. Index Compression

  19. INDEX COMPRESSION
    [Bar chart. Fields storage (_source), Apache logs, for Raw Data, Best Speed, and Best Size. Values as extracted: 2,322 MB, 4,691 MB, 14,641 MB.]
  20. INDEX COMPRESSION
    [Bar chart. RAM usage (all Lucene features), geonames.org, Clean vs. Dirty, for Lucene 4.8, Lucene 4.10, and Lucene 5. Values as extracted: 41 MB, 89 MB, 160 MB, 28 MB, 42 MB, 160 MB.]
  21. INDEX COMPRESSION
    • “best space” option (archive/cold storage)
    • optimized merge
    • sparse normalization factors, docvalues
    • patched compression for outliers, exceptions
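
The "patched compression for outliers, exceptions" item can be made concrete with a back-of-the-envelope comparison. The sketch below only estimates sizes; it does not reproduce any Lucene file format. Plain bit-packing sizes every value by the widest one in the block, while a patched scheme packs at a narrower width and stores the few values that do not fit separately. The 40-bit cost per exception and both function names are assumptions made for illustration.

```python
def plain_bits(values):
    # Every value is stored at the width of the largest value in the block.
    width = max(v.bit_length() for v in values)
    return width * len(values)


def patched_bits(values, exception_bits=40):
    # Try each narrower width; values that do not fit become exceptions
    # stored separately at an assumed fixed cost per exception.
    best = plain_bits(values)
    max_width = max(v.bit_length() for v in values)
    for width in range(max_width):
        exceptions = sum(1 for v in values if v.bit_length() > width)
        best = min(best, width * len(values) + exceptions * exception_bits)
    return best


# 128 small values with a single large outlier.
values = [1, 2, 3, 0] * 32
values[17] = 1_000_000

print(plain_bits(values))    # 2560
print(patched_bits(values))  # 296
```

One outlier forces plain packing to spend 20 bits on each of the 128 values, whereas the patched estimate keeps 2 bits per value and pays once for the exception.
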
  22. Indexing Performance

  23. INDEXING PERFORMANCE
    [Bar chart. Indexing rate in K docs/sec (Apache logs) for Lucene 4.10 and Lucene 5. Values as extracted: 18.7, 12.1.]
  24. INDEXING PERFORMANCE
    • Adaptive merge throttling
    • Reduced CPU usage (stored fields data)
    • Reduced memory usage
    • SSD auto-detection in merge scheduler
  25. Index Safety

  26. INDEX SAFETY
    • segment and commit identifiers
    • atomic commits
    • verify integrity at merge
    • test filesystems
    • faster CheckIndex
    • improved error messages
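
As a rough illustration of what "verify integrity at merge" involves, the sketch below appends a CRC32 footer to a block of bytes and checks it before the data is used again. The layout (a bare 4-byte big-endian checksum) and the function names are assumptions for this example and are much simpler than a real index file footer.

```python
import struct
import zlib


def write_with_footer(payload: bytes) -> bytes:
    # Append a 4-byte big-endian CRC32 of the payload.
    return payload + struct.pack(">I", zlib.crc32(payload))


def verify(data: bytes) -> bytes:
    # Recompute the checksum and compare it with the stored footer.
    if len(data) < 4:
        raise ValueError("data too short to contain a checksum footer")
    payload, footer = data[:-4], data[-4:]
    (expected,) = struct.unpack(">I", footer)
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: data is corrupt")
    return payload


stored = write_with_footer(b"segment data")
assert verify(stored) == b"segment data"

# Flip one bit to simulate corruption on disk.
corrupt = bytes([stored[0] ^ 0x01]) + stored[1:]
try:
    verify(corrupt)
    detected = False
except ValueError:
    detected = True
print(detected)  # True
```

Checking before a merge matters because a merge would otherwise copy corrupt bytes into a new segment and discard the original.
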
  27. Other Changes

  28. OTHER CHANGES
    • Verbose memory reporting
    • Improved parallel execution
    • Result diversification support
    • Faster index sorting
    • …
  29. Thank you. twitter.com/rcmuir

  30. /*
     * ---------------------------------------------------------------
     * "THE BEER-WARE LICENSE" (Revision 42):
     * <rmuir@apache.org> wrote this file. As long as you retain this notice you
     * can do whatever you want with this stuff. If we meet some day, and you
     * think this stuff is worth it, you can buy me a beer in return. Robert Muir
     * ---------------------------------------------------------------
     */