Exploiting Concurrency to Lucene Indexing

Simon Willnauer
May 17, 2011

A lightning talk about DocumentsWriterPerThread (DWPT), the new IndexWriter internals in Lucene 4, given at Lucene Revolution 2011.

Transcript

  1. Simon Willnauer @ Lucene Revolution 2011
    PMC Member & Core Committer, Apache Lucene
    Exploiting Concurrency to Lucene Indexing

  2. IndexWriter in 3.x
    [Diagram: documents (doc) flow into five Thread State boxes inside DocumentsWriter, which sits inside IndexWriter. Labels: "Multi-Threaded", "merge segments in memory", "Merge on flush", "Single-Threaded", "Flush to Disk", "Directory".]

  3. Influence on Indexing - Throughput

  4. Lucene 4 with DocumentsWriterPerThread
    [Diagram: documents (doc) flow into five DWPT boxes inside DocumentsWriter, which sits inside IndexWriter. Labels: "Multi-Threaded", "Flush to Disk", "Directory".]
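
The per-thread idea in this diagram can be illustrated with a toy model. The sketch below is not Lucene code: `ToyDirectory`, `ToyPerThreadWriter`, `indexConcurrently`, and the `maxBufferedDocs` threshold are hypothetical names invented for illustration. It only assumes what the diagram suggests, namely that each indexing thread owns a private buffer and flushes it as its own segment without waiting for the other threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of per-thread document writers. Hypothetical, not Lucene's implementation. */
class DwptSketch {

    /** Hypothetical stand-in for a Directory: collects flushed segments. */
    static final class ToyDirectory {
        final ConcurrentLinkedQueue<List<String>> segments = new ConcurrentLinkedQueue<>();
    }

    /** Hypothetical stand-in for one DWPT: a private buffer used by a single thread. */
    static final class ToyPerThreadWriter {
        private final ToyDirectory dir;
        private final int maxBufferedDocs;
        private List<String> buffer = new ArrayList<>();

        ToyPerThreadWriter(ToyDirectory dir, int maxBufferedDocs) {
            this.dir = dir;
            this.maxBufferedDocs = maxBufferedDocs;
        }

        /** Buffers a document; no lock is needed because the buffer is thread-private. */
        void addDocument(String doc) {
            buffer.add(doc);
            if (buffer.size() >= maxBufferedDocs) {
                flush();
            }
        }

        /** Publishes the private buffer as one segment, independently of other threads. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            dir.segments.add(buffer);
            buffer = new ArrayList<>();
        }
    }

    /** Runs several indexing threads, each with its own writer, and returns the directory. */
    static ToyDirectory indexConcurrently(int threads, int docsPerThread, int maxBufferedDocs) {
        ToyDirectory dir = new ToyDirectory();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                ToyPerThreadWriter writer = new ToyPerThreadWriter(dir, maxBufferedDocs);
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument("doc-" + id + "-" + i);
                }
                writer.flush(); // flush whatever is left in the private buffer
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dir;
    }

    public static void main(String[] args) {
        ToyDirectory dir = indexConcurrently(4, 1000, 256);
        int total = 0;
        for (List<String> segment : dir.segments) {
            total += segment.size();
        }
        System.out.println(dir.segments.size() + " segments, " + total + " docs");
    }
}
```

With 4 threads, 1000 documents per thread, and a threshold of 256, each thread flushes three full buffers plus one partial buffer, so the toy directory ends up with 16 segments holding 4000 documents. The only shared structure is the segment queue; the buffers themselves are never contended.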

  5. Indexing Throughput with DWPT

  6. Looking at nightly benchmarks - IMPRESSIVE!

  7. Looking at nightly benchmarks - IMPRESSIVE!

  8. Wanna know more?
    http://blog.jteam.nl/2011/05/03/lucene-indexing-gains-concurrency/
    http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

  9. Thank you!