
Introduction to Apache Lucene


Slides from an internal Lucene study group, where I presented an overview of Apache Lucene (https://lucene.apache.org/).
Sample code: https://github.com/takuyaa/hello-lucene

Takuya Asano

April 05, 2021

  1. Takuya Asano

    Introduction to Apache Lucene

    Lucene Reading

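  2. What is Apache Lucene?
    https://github.com/apache/lucene
    • A full-text search engine library written in Java
    • As of 2021, the de facto standard for full-text search
    • Implements nearly every feature that full-text search requires
    • Used as the search core of Elasticsearch and Solr
      • Elasticsearch and Solr expose Lucene to users through a REST API
      • They also implement features that Lucene itself lacks (e.g., distributed search), which are out of scope here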

  3. Lucene Core Components
    Component | Related classes
    Query parsing | QueryParser
    Analysis | Analyzer
    Search | IndexSearcher
    Queries | Query
    Indexing | IndexWriter
    Index access | IndexReader
    Storage access | Directory, DirectoryReader
    Document representation | Document, Field
    Codecs (index file formats) | Codec, PostingsFormat, DocValuesFormat, StoredFieldsFormat, FieldInfosFormat, SegmentInfoFormat, LiveDocsFormat, PointsFormat, …
    Algorithms / data structures | LZ4, LevenshteinAutomata, FST, BKDReader, BKDWriter, PackedInts, FixedBitSet, PriorityQueue, …

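  4. Full-text Search Basics on Lucene
    Inverted index: the core data structure of search engines
    • In full-text search, the content to be searched is modeled as documents
      • In e-commerce search, a document is a product
      • In web search, a document is a web page
      • In Lucene, a document corresponds to the Document class
    • Lucene indexes documents with an inverted index
      • Full-text search based on an inverted index is well suited to searching large document collections
    • In Lucene, each document is identified by a docID

    Term       Postings list (documents containing the term)
    action     "Lucene in Action"
    cookbook   "Lucene Cookbook"
    in         "Lucene in Action"
    lucene     "Lucene in Action", "Lucene Cookbook"

    An example of an inverted index for the documents "Lucene in Action" and "Lucene Cookbook". The set of all terms is often called a "term dictionary" or simply a "dictionary".
    ref: "Information Retrieval and Web Search まとめ: 転置インデックス" (notes on Information Retrieval and Web Search: the inverted index), stoptheworld (Hatena Blog)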
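The term-to-postings mapping in the figure can be sketched as a toy inverted index in plain Java. This is an illustrative model, not Lucene's implementation: the class name ToyInvertedIndex is hypothetical, a naive lowercase-and-whitespace split stands in for an Analyzer, and docIDs are assigned sequentially from 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy inverted index: term -> postings list (sorted docIDs).
// Illustrative only; not how Lucene stores or encodes postings.
class ToyInvertedIndex {
    // A TreeMap keeps the term dictionary sorted by term.
    private final SortedMap<String, List<Integer>> dictionary = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns its docID (assigned sequentially from 0).
    int addDocument(String text) {
        int docId = nextDocId++;
        // Naive "analysis": lowercase, then split on whitespace.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = dictionary.computeIfAbsent(term, t -> new ArrayList<>());
            // docIDs only grow, so appending keeps each list sorted;
            // the check avoids listing a document twice for a repeated term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    // Postings list for a term (empty if the term is not in the dictionary).
    List<Integer> postings(String term) {
        return Collections.unmodifiableList(dictionary.getOrDefault(term, Collections.emptyList()));
    }

    // Read-only view of the whole term dictionary.
    SortedMap<String, List<Integer>> termDictionary() {
        return Collections.unmodifiableSortedMap(dictionary);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Lucene in Action"); // docID 0
        index.addDocument("Lucene Cookbook");  // docID 1
        index.termDictionary().forEach((term, docs) -> System.out.println(term + " -> " + docs));
    }
}
```

Running main prints action -> [0], cookbook -> [1], in -> [0], and lucene -> [0, 1], one per line: the table above with docIDs in place of titles. A sorted map is used so the dictionary stays in term order, as a real term dictionary does.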

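  5. Lucene's Index
    Creating and searching an index
    • Creating an index
      • The index is stored on the file system
      • An index consists of multiple files
        • Term dictionary, postings lists, etc. (covered later)
    • Searching an index
      • Searches run against the index stored as files
      • Returns the set of documents that match the query
        • Each result includes the document ID, the document's contents, etc.
    [Figure: Documents → Lucene Index → SERP]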

  6. Indexing Process Overview
    Core components for indexing
    • Step 1: Construct a document object (Document)
    • Step 2: Analyze the text contents, i.e. preprocessing (Analyzer)
    • Step 3: Build a Lucene index (IndexWriter)
    • Step 4: Write the index to storage (Directory)

  9. Document Representation
    • A document is represented as an object of the Document class
    • A Document consists of multiple Fields
      • A data structure similar to a key-value map
    • A Field has a field name, field content, and a field type
    Document doc1 = new Document();
    doc1.add(new Field("title", "Lucene in Action", TextField.TYPE_STORED));

    Document doc2 = new Document();
    doc2.add(new Field("title", "Lucene Cookbook", TextField.TYPE_STORED));

    Callouts: "title" is the field name, "Lucene in Action" is the field content, and TextField.TYPE_STORED is the field type (stored or not stored).
    Stored fields are stored in the index. This lets you retrieve the field contents at search time.
    Document: models the object to be indexed.
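The stored / not-stored distinction can be modeled in a few lines. ToyStoredFields and ToyField are hypothetical names for this sketch: a boolean flag stands in for the field type, and get only loosely resembles Document#get (first stored value, or null).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: a document is a list of fields, each with a name, content,
// and a "stored" flag standing in for the field type. Only stored fields
// can be read back at search time. Hypothetical classes, not Lucene's API.
class ToyStoredFields {
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;

        ToyField(String name, String value, boolean stored) {
            this.name = name;
            this.value = value;
            this.stored = stored;
        }
    }

    // docID -> (field name -> stored values); a name may repeat within a document.
    private final List<Map<String, List<String>>> storedDocs = new ArrayList<>();

    // Returns the new document's docID.
    int addDocument(List<ToyField> fields) {
        Map<String, List<String>> stored = new LinkedHashMap<>();
        for (ToyField f : fields) {
            // A real engine would analyze and index f.value here in either case;
            // this toy keeps only what is needed to retrieve stored values.
            if (f.stored) {
                stored.computeIfAbsent(f.name, n -> new ArrayList<>()).add(f.value);
            }
        }
        storedDocs.add(stored);
        return storedDocs.size() - 1;
    }

    // First stored value of a field, or null if the field was not stored.
    String get(int docId, String fieldName) {
        List<String> values = storedDocs.get(docId).get(fieldName);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    public static void main(String[] args) {
        ToyStoredFields index = new ToyStoredFields();
        int doc = index.addDocument(List.of(
                new ToyField("title", "Lucene in Action", true),        // stored
                new ToyField("body", "full text of the book", false))); // not stored
        System.out.println(index.get(doc, "title")); // Lucene in Action
        System.out.println(index.get(doc, "body"));  // null
    }
}
```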

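  10. Analyzer
    Text preprocessors
    • Analyzer = Tokenizer + Filters
      • Tokenizer
        • Splits a text string into a sequence of tokens
      • Filter
        • Removes tokens according to some rule (e.g., StopFilter)
        • Rewrites token strings according to some rule (e.g., LowerCaseFilter)
    • Example Analyzer: StandardAnalyzer
      • StandardAnalyzer = StandardTokenizer + LowerCaseFilter + StopFilter, applied in that order
        • StandardTokenizer is a Tokenizer based on Unicode Text Segmentation (UAX #29)
    "Lucene in Action"
      → StandardTokenizer → "Lucene", "in", "Action"
      → LowerCaseFilter → "lucene", "in", "action"
      → StopFilter (*) → "lucene", "action"
    (*) If the StopFilter has "in" as a stop word. Since Lucene 8.0, the no-argument StandardAnalyzer uses an empty stop-word set, so "in" is kept unless you pass stop words explicitly.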
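A minimal sketch of the Tokenizer + Filter pipeline, assuming a naive tokenizer in place of StandardTokenizer's UAX #29 rules. ToyAnalyzer and its methods are hypothetical; the filters run lowercasing first and stop-word removal second, which is the order Lucene 8's StandardAnalyzer uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain: tokenizer -> lowercase filter -> stop filter.
// Hypothetical sketch: StandardTokenizer implements UAX #29 word
// boundaries, while tokenize() below just splits on runs of characters
// that are neither letters nor digits.
class ToyAnalyzer {
    // Tokenizer: text -> sequence of tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split() can yield a leading ""
        }
        return tokens;
    }

    // Filter that rewrites tokens (the role LowerCaseFilter plays).
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Filter that removes tokens (the role StopFilter plays).
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) out.add(t);
        }
        return out;
    }

    // The whole chain; lowercasing first lets a lowercase stop list also match "In".
    static List<String> analyze(String text, Set<String> stopWords) {
        return removeStopWords(lowerCase(tokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene in Action", Set.of("in"))); // [lucene, action]
        System.out.println(analyze("Lucene in Action", Set.of()));     // [lucene, in, action]
    }
}
```

The second call mirrors the Lucene 8 default described in the footnote: with an empty stop set, "in" survives analysis.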

  15. Basic Indexing API
    A code example to build a Lucene index
    • FSDirectory
      • One implementation of Directory (the storage access API)
      • Provides access to the file system
      • Other Directory implementations include RAMDirectory (deprecated in Lucene 8.x; ByteBuffersDirectory is the in-memory alternative)
    • IndexWriter
      • The class that orchestrates writing the index
      • Adds a Document with the addDocument() method
      • With the default settings, close() writes (commits) the index to the Directory

    // Create a directory for storing the Lucene index
    // (Files.createDirectory throws if ./index already exists)
    Path indexDirPath = Files.createDirectory(Path.of("index"));
    Directory directory = FSDirectory.open(indexDirPath);

    // Set up IndexWriter
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter indexWriter = new IndexWriter(directory, config);

    // Index a document: "Lucene in Action"
    Document doc1 = new Document();
    doc1.add(new Field("title", "Lucene in Action", TextField.TYPE_STORED));
    indexWriter.addDocument(doc1);

    // Index a document: "Lucene Cookbook"
    Document doc2 = new Document();
    doc2.add(new Field("title", "Lucene Cookbook", TextField.TYPE_STORED));
    indexWriter.addDocument(doc2);

    // Close the writer; with the default IndexWriterConfig this commits the index to the directory
    indexWriter.close();

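  16. Buffering and Flushing at Indexing
    When IndexWriter writes documents
    • Calling addDocument() on an IndexWriter does not immediately write the document to storage
      • It is written to a buffer in RAM (on the JVM heap)
    • IndexWriter flushes to the Directory when appropriate
      • Triggers include the amount of RAM used by the buffer and the number of buffered documents
      • You can flush explicitly by calling flush() (this does not commit, though)
      • Calling commit() flushes first and then performs the commit
    [Figure: addDocument() → IndexWriter (buffer in RAM) → flush → Directory]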
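The buffer, flush, and commit lifecycle can be illustrated with a toy writer. ToyBufferedWriter is hypothetical and models only a document-count trigger; it keeps everything in memory and writes nothing to disk, so "flushed" and "committed" are just two lists.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of IndexWriter-style buffering: addDocument() only appends to an
// in-memory buffer; a flush turns the buffer into a new (uncommitted) segment;
// commit() flushes and then makes all flushed segments part of the commit.
// Hypothetical class with a single doc-count trigger; Lucene's real flush
// policy is more involved.
class ToyBufferedWriter {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();   // written, not committed
    private final List<List<String>> committed = new ArrayList<>(); // referenced by the commit

    ToyBufferedWriter(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        buffer.add(doc);                        // RAM only; nothing written yet
        if (buffer.size() >= maxBufferedDocs) { // trigger fired
            flush();
        }
    }

    // Writes buffered docs out as one new segment, without committing.
    void flush() {
        if (buffer.isEmpty()) return;
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Flush first, then commit everything that has been flushed.
    void commit() {
        flush();
        committed.addAll(flushed);
        flushed.clear();
    }

    int bufferedDocs() { return buffer.size(); }
    int uncommittedSegments() { return flushed.size(); }
    int committedSegments() { return committed.size(); }

    public static void main(String[] args) {
        ToyBufferedWriter w = new ToyBufferedWriter(2);
        for (int i = 0; i < 5; i++) w.addDocument("doc" + i);
        // 2 auto-flushed segments, 1 doc still buffered, nothing committed yet
        System.out.println(w.uncommittedSegments() + " " + w.bufferedDocs() + " " + w.committedSegments()); // 2 1 0
        w.commit();
        System.out.println(w.committedSegments()); // 3
    }
}
```

In Lucene itself, the flush triggers are configured through IndexWriterConfig#setRAMBufferSizeMB and IndexWriterConfig#setMaxBufferedDocs.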

  18. Index Segments
    Lucene index files
    • An index consists of multiple segments
      • All of them are stored in the same directory
    • A segment is a sub-index
      • On its own, it works almost like a complete Lucene index
    • A segment consists of multiple files
      • File names have the form _gen.ext or _gen_Lucene84_0.ext
        • e.g., _0.fnm, _0_Lucene84_0.pos, …
        • gen: the segment generation (e.g., 0, 1, …)
        • ext: a per-format extension (e.g., fnm, pos)
    • Each time IndexWriter flushes, one segment is created
    • A segment is referenced from segments_N only once it has been committed
      • N: 1, 2, …

    Index after the 1st commit:
      Segment _0: _0.fdm, _0.fdt, _0.fdx, _0.fnm, _0.nvd, _0.nvm, _0.si, _0_Lucene84_0.doc, _0_Lucene84_0.pos, _0_Lucene84_0.tim, _0_Lucene84_0.tip, _0_Lucene84_0.tmd
      Segments file: segments_1
      Lock file: write.lock

    Index after the 2nd commit:
      Segment _0: _0.fdm, _0.fdt, _0.fdx, _0.fnm, _0.nvd, _0.nvm, _0.si, _0_Lucene84_0.doc, _0_Lucene84_0.pos, _0_Lucene84_0.tim, _0_Lucene84_0.tip, _0_Lucene84_0.tmd
      Segment _1: _1.fdm, _1.fdt, _1.fdx, _1.fnm, _1.nvd, _1.nvm, _1.si, _1_Lucene84_0.doc, _1_Lucene84_0.pos, _1_Lucene84_0.tim, _1_Lucene84_0.tip, _1_Lucene84_0.tmd
      Segments file: segments_2
      Lock file: write.lock
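The file names in these listings can be generated and parsed with a few string helpers. ToySegmentNames is a hypothetical class; the base-36 encoding of counters (so the eleventh segment is _a, not _10) reflects Lucene's naming to my knowledge, but treat it as an assumption.

```java
// Sketch of the file-naming scheme visible in the segment listings.
// Assumption: segment counters and the N in segments_N are written in
// base 36 (Character.MAX_RADIX), as Lucene does to my knowledge.
class ToySegmentNames {
    // Segment name for the n-th segment: "_0", "_1", ..., "_9", "_a", ...
    static String segmentName(long counter) {
        return "_" + Long.toString(counter, Character.MAX_RADIX);
    }

    // Commit-point file name, e.g. "segments_2".
    static String segmentsFileName(long generation) {
        return "segments_" + Long.toString(generation, Character.MAX_RADIX);
    }

    // Per-format file name: "_0.fnm" (no format suffix) or "_0_Lucene84_0.pos".
    static String fileName(String segment, String formatSuffix, String ext) {
        return formatSuffix == null
                ? segment + "." + ext
                : segment + "_" + formatSuffix + "." + ext;
    }

    // Recovers the segment name from a file name, e.g. "_0_Lucene84_0.pos" -> "_0".
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            throw new IllegalArgumentException("not a per-segment file: " + fileName);
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.', 1);
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) end = Math.min(end, dot);
        if (underscore >= 0) end = Math.min(end, underscore);
        return fileName.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(segmentName(1));                                 // _1
        System.out.println(segmentName(10));                                // _a
        System.out.println(fileName(segmentName(1), "Lucene84_0", "tim"));  // _1_Lucene84_0.tim
        System.out.println(segmentOf("_1_Lucene84_0.tim"));                 // _1
    }
}
```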

  21. Index Commits and Consistency
    How IndexWriter#commit() works
    • Step 1: Flush all buffered documents, then sync the Directory
      • With FSDirectory, this guarantees that the data has actually been written to the file system
    • Step 2: Write the segments_N file and sync it
      • The new segments become visible to IndexReader (an IndexReader opened on the Directory loads only committed segments)
    • Step 3: Delete the old commit (the old segments_N)
    [Figure: flush → sync file system → write and sync segments_2 → delete segments_1; IndexReaders see only committed segments]
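The three commit steps can be sketched with plain java.nio file operations. ToyCommit is hypothetical: segment files are fake one-line text files, N is written in decimal, and FileChannel#force stands in for Directory#sync. What it shows is the ordering: segment files become durable before the segments_N that references them, and the old segments_N is deleted last.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Toy commit protocol: (1) fsync flushed segment files, (2) write and fsync
// a new segments_N listing them, (3) delete the previous segments_N.
// Hypothetical class; Lucene's real commit has more steps and safeguards.
class ToyCommit {
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("toy-index");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stands in for a flush: writes a (fake) segment file, not yet durable.
    static void writeSegmentFile(Path dir, String name) {
        try {
            Files.write(dir.resolve(name), List.of("contents of " + name), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Forces a file's bytes to stable storage (fsync).
    private static void fsync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    static void commit(Path dir, long gen, List<String> segmentFiles) {
        try {
            // Step 1: make every segment file durable before anything references it.
            for (String f : segmentFiles) fsync(dir.resolve(f));
            // Step 2: write the new commit point and make it durable too.
            Path commitFile = dir.resolve("segments_" + gen);
            Files.write(commitFile, segmentFiles, StandardCharsets.UTF_8);
            fsync(commitFile);
            // Step 3: only now remove the previous commit point.
            Files.deleteIfExists(dir.resolve("segments_" + (gen - 1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What a newly opened reader sees: only the files listed in the newest segments_N.
    static List<String> visibleFiles(Path dir) {
        try {
            long newest = -1;
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "segments_*")) {
                for (Path p : ds) {
                    String name = p.getFileName().toString();
                    newest = Math.max(newest, Long.parseLong(name.substring("segments_".length())));
                }
            }
            if (newest < 0) return List.of();
            return Files.readAllLines(dir.resolve("segments_" + newest), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        writeSegmentFile(dir, "_0.si");
        commit(dir, 1, List.of("_0.si"));
        writeSegmentFile(dir, "_1.si");         // flushed but not committed
        System.out.println(visibleFiles(dir)); // [_0.si]
        commit(dir, 2, List.of("_0.si", "_1.si"));
        System.out.println(visibleFiles(dir)); // [_0.si, _1.si]
    }
}
```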

  22. Index Segment Merging
    IndexWriter and MergePolicy
    • IndexWriter merges segments into larger ones according to a MergePolicy
      • TieredMergePolicy (default), LogMergePolicy, etc.
    • The MergePolicy decides which segments to merge
    • Search slows down when there are many small segments
      • Merging them into larger segments improves performance
    • The MergePolicy lets you tune indexing throughput, overall load, and so on
    [Figure: several small segments are combined into a larger segment by merge()]
    ref: Changing Bits: Visualizing Lucene's segment merges
    http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
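The idea of a policy choosing which segments to combine can be illustrated with a deliberately simple rule. ToyMergePolicy and its parameters maxSegments and mergeFactor are hypothetical; "merge the smallest segments whenever there are too many" is a stand-in, not TieredMergePolicy's algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy merge policy: when there are more than maxSegments segments, merge
// the mergeFactor smallest ones into one larger segment, and repeat.
// Hypothetical and far simpler than TieredMergePolicy, which also weighs
// size skew, deleted documents, and a maximum merged segment size.
class ToyMergePolicy {
    // Decides which segments to merge: indices of the mergeFactor smallest,
    // or an empty list when no merge is needed.
    static List<Integer> findMerge(List<Long> sizes, int maxSegments, int mergeFactor) {
        if (sizes.size() <= maxSegments) return List.of();
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sizes.size(); i++) order.add(i);
        order.sort((a, b) -> Long.compare(sizes.get(a), sizes.get(b))); // smallest first
        List<Integer> picked = new ArrayList<>(order.subList(0, Math.min(mergeFactor, order.size())));
        Collections.sort(picked);
        return picked;
    }

    // Runs merges until the segment count is within the limit; returns final sizes.
    static List<Long> mergeUntilStable(List<Long> initialSizes, int maxSegments, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        if (maxSegments < 1) throw new IllegalArgumentException("maxSegments must be >= 1");
        List<Long> segments = new ArrayList<>(initialSizes);
        List<Integer> merge;
        while (!(merge = findMerge(segments, maxSegments, mergeFactor)).isEmpty()) {
            long mergedSize = 0;
            // Remove from the highest index down so lower indices stay valid.
            for (int j = merge.size() - 1; j >= 0; j--) {
                mergedSize += segments.remove((int) merge.get(j));
            }
            segments.add(mergedSize); // the new, larger segment
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(mergeUntilStable(List.of(10L, 1L, 1L, 1L, 1L), 3, 2)); // [10, 2, 2]
    }
}
```

In Lucene, a policy is installed with IndexWriterConfig#setMergePolicy, and TieredMergePolicy exposes knobs such as setSegmentsPerTier and setMaxMergedSegmentMB.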

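  23. Index File Formats
    Format name | Extension | Related class | Description
    Segments File | segments_N | SegmentInfos | Holds a commit point
    Lock File | write.lock | N/A | Write lock that prevents multiple IndexWriters from writing to the same index
    Segment Info | .si | SegmentInfoFormat | Segment metadata
    Fields | .fnm | FieldInfosFormat | Field information (names, etc.)
    Field Index | .fdx | StoredFieldsFormat | Pointers to field data
    Field Data | .fdt | StoredFieldsFormat | Stored field data of documents
    Term Dictionary | .tim | PostingsFormat | Term dictionary (holds term information)
    Term Dictionary Metadata | .tmd | PostingsFormat | Metadata of the term dictionary
    Term Index | .tip | PostingsFormat | Index into the term dictionary
    Frequencies | .doc | PostingsFormat | Postings lists (inverted index) and skip lists
    Positions | .pos | PostingsFormat | Positions at which terms occur in documents
    Payloads | .pay | PostingsFormat | Per-position metadata (character offsets, etc.)
    Live Documents | .liv | Lucene50LiveDocsFormat | Information about documents that are not deleted (live)
    Point Values | .dii, .dim | PointsFormat | Numeric (point) data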
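The role of the .liv file can be modeled with a bitmap over docIDs. ToyLiveDocs is hypothetical; it uses java.util.BitSet, and its numDocs, maxDoc, and numDeletedDocs names mirror the IndexReader methods of the same names. It reflects the general design: segment files are not rewritten on delete, so deleting only clears a bit, and the space is reclaimed when a merge rewrites the segment.

```java
import java.util.BitSet;

// Toy live-docs bitmap: deleting a document only clears its bit; the data
// stays in the segment until a later merge drops it.
// Hypothetical class; numDocs/maxDoc mirror the names IndexReader uses.
class ToyLiveDocs {
    private final int maxDoc;
    private final BitSet live;

    ToyLiveDocs(int maxDoc) {
        this.maxDoc = maxDoc;
        this.live = new BitSet(maxDoc);
        live.set(0, maxDoc); // every document starts out live
    }

    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) throw new IndexOutOfBoundsException("docId " + docId);
        live.clear(docId);
    }

    boolean isLive(int docId) { return live.get(docId); }

    int numDocs() { return live.cardinality(); } // live documents only
    int maxDoc() { return maxDoc; }              // includes deleted ones
    int numDeletedDocs() { return maxDoc - numDocs(); }

    public static void main(String[] args) {
        ToyLiveDocs liveDocs = new ToyLiveDocs(3);
        liveDocs.delete(1);
        for (int doc = 0; doc < liveDocs.maxDoc(); doc++) {
            if (liveDocs.isLive(doc)) System.out.println("search can return doc " + doc);
        }
        System.out.println(liveDocs.numDocs() + " live of " + liveDocs.maxDoc()); // 2 live of 3
    }
}
```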

  24. Browse Lucene Index using Luke
    Luke: the GUI toolbox for introspecting a Lucene index
    • Download Apache Lucene from https://lucene.apache.org/core/downloads.html
    • Extract the .tgz file
    • Run Luke
      • Linux/macOS: lucene-8.8.1/luke/luke.sh
      • Windows: lucene-8.8.1/luke/luke.bat
    ref: "Lucene 超入門 with Luke" (a gentle introduction to Lucene with Luke) by mocobeta, mocobeta.medium.com

  25. Browse Lucene Index using Luke
    Luke: the GUI toolbox for introspecting a Lucene index
    [Screenshot: enter the index directory path]

  26. Browse Lucene Index using Luke
    Luke: the GUI toolbox for introspecting a Lucene index
    Index statistics and metadata
    [Screenshot annotations: select a field name; term statistics]

  27. Query Processing Overview
    Using IndexSearcher with Query
    • Pass a Query object (TermQuery, PhraseQuery, etc.) to IndexSearcher
    • IndexSearcher reads the Lucene index through IndexReader
    • IndexReader reads the index files through Directory, which performs the actual storage access
    • IndexSearcher returns the top-k results as TopDocs
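Returning the top-k results is typically done with a bounded priority queue. ToyTopK (hypothetical) keeps the best k hits in a java.util.PriorityQueue whose head is the weakest hit kept so far, breaking score ties by smaller docID. Lucene uses its own org.apache.lucene.util.PriorityQueue, so treat this as an illustration of the idea rather than its code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy top-k collection over per-document scores.
// Hypothetical sketch; not Lucene's collector implementation.
class ToyTopK {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return "doc=" + doc + " score=" + score;
        }
    }

    // "Better" hits sort first: higher score, then smaller docID on ties.
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.doc);

    // scores[doc] is each document's score; returns the best k hits, best first.
    static List<Hit> topK(float[] scores, int k) {
        if (k <= 0) return List.of();
        // Ordered worst-first, so the head is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(k, BEST_FIRST.reversed());
        for (int doc = 0; doc < scores.length; doc++) {
            Hit hit = new Hit(doc, scores[doc]);
            if (heap.size() < k) {
                heap.add(hit);
            } else if (BEST_FIRST.compare(hit, heap.peek()) < 0) {
                heap.poll(); // evict the current worst
                heap.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.0f, 2.0f};
        System.out.println(topK(scores, 2)); // [doc=1 score=2.0, doc=3 score=2.0]
    }
}
```

The heap never holds more than k hits, so memory stays O(k) no matter how many documents match.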

  30. Basic Search API
    A code example to search a Lucene index
    • IndexSearcher
      • Accesses the index through an IndexReader
    • TermQuery
      • Searches by Term (the key in the inverted index)
      • Pass the text as it looks after analysis
        • Searching for "Lucene" gets no hits, but "lucene" does
    • Pass the query to IndexSearcher#search() to run the search
      • Returns TopDocs (the top-k results)

    // Open the directory that stores the index
    Directory directory = FSDirectory.open(Path.of("index"));

    // Create an IndexSearcher
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Create a Query object that searches for "lucene" in the "title" field
    Query query = new TermQuery(new Term("title", "lucene"));
    TopDocs results = indexSearcher.search(query, 10);
    ScoreDoc[] hits = results.scoreDocs;

    // Iterate through the results
    for (ScoreDoc hit : hits) {
        Document hitDoc = indexSearcher.doc(hit.doc);
        System.out.println("Hit: " + hitDoc.get("title"));
    }

    // Post-processing: release resources
    indexReader.close();
    directory.close();
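Why "Lucene" misses while "lucene" hits can be reproduced with a toy index. ToyTermLookup is hypothetical, and its analyzer just lowercases and splits on whitespace. termQuery mimics TermQuery by looking up the string verbatim, while analyzedQuery first runs the same analysis over the query text, which is roughly what a query parser does for plain words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration: terms are indexed exactly as the analyzer emits them
// (lowercased here), and a term lookup is an exact match on those strings.
// Hypothetical class; the "analyzer" is a naive lowercase + whitespace split.
class ToyTermLookup {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Like TermQuery: the given term is used as-is, with no analysis.
    Set<Integer> termQuery(String term) {
        return Collections.unmodifiableSet(index.getOrDefault(term, Collections.emptySet()));
    }

    // Analyze the query text first, then OR together the per-term matches.
    Set<Integer> analyzedQuery(String text) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(text)) hits.addAll(termQuery(term));
        return hits;
    }

    public static void main(String[] args) {
        ToyTermLookup idx = new ToyTermLookup();
        idx.add(0, "Lucene in Action");
        idx.add(1, "Lucene Cookbook");
        System.out.println(idx.termQuery("Lucene"));     // []
        System.out.println(idx.termQuery("lucene"));     // [0, 1]
        System.out.println(idx.analyzedQuery("Lucene")); // [0, 1]
    }
}
```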

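  31. Lucene Queries
    Class | Description
    Query | Abstract base class of all queries
    TermQuery | Search matching a Term
    PrefixQuery | Search by prefix
    PhraseQuery | Phrase search
    PhrasePrefixQuery | Phrase search with a prefix (old API, superseded by MultiPhraseQuery)
    SpanQuery, SpanTermQuery | Search that takes spans (ranges of positions) into account
    TermRangeQuery | Range search over terms in byte order (BytesRef)
    NumericRangeQuery | Range search over numeric values (not available in Lucene 8; use point queries such as IntPoint.newRangeQuery)
    FuzzyQuery | Fuzzy (approximate) search
    BooleanQuery | Boolean search (a compound query that combines other queries)
    FilteredQuery | Filtered search (not available in Lucene 8; use a BooleanQuery with FILTER clauses)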
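PrefixQuery and TermRangeQuery both rely on the term dictionary being sorted. A toy sorted dictionary makes this visible: in ToyTermRanges (hypothetical), backed by a TreeMap, a prefix or an inclusive range is a contiguous run of terms. Lucene's actual implementations operate on its own term dictionary structures and compare UTF-8 bytes, so this only illustrates the ordering argument.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy sorted term dictionary: prefix and range matching become scans over
// a contiguous run of sorted terms. Hypothetical sketch; String.compareTo
// orders UTF-16 code units, whereas Lucene orders terms as UTF-8 bytes
// (BytesRef). The two orders agree for ASCII terms like those below.
class ToyTermRanges {
    private final TreeMap<String, Set<Integer>> dict = new TreeMap<>();

    void add(String term, int docId) {
        dict.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    // Terms t with lower <= t <= upper (like an inclusive TermRangeQuery).
    SortedSet<String> range(String lower, String upper) {
        return new TreeSet<>(dict.subMap(lower, true, upper, true).keySet());
    }

    // Terms starting with prefix: start at the first term >= prefix and stop
    // at the first term that no longer has the prefix (like a PrefixQuery).
    SortedSet<String> prefix(String prefix) {
        SortedSet<String> out = new TreeSet<>();
        for (String term : dict.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break;
            out.add(term);
        }
        return out;
    }

    // Union of the postings of the matched terms.
    SortedSet<Integer> docsFor(Set<String> terms) {
        SortedSet<Integer> docs = new TreeSet<>();
        for (String t : terms) docs.addAll(dict.getOrDefault(t, Set.of()));
        return docs;
    }

    public static void main(String[] args) {
        ToyTermRanges d = new ToyTermRanges();
        d.add("action", 0);
        d.add("lucene", 0);
        d.add("lucene", 1);
        d.add("luke", 2);
        d.add("solr", 3);
        System.out.println(d.prefix("lu"));            // [lucene, luke]
        System.out.println(d.range("b", "m"));         // [lucene, luke]
        System.out.println(d.docsFor(d.prefix("lu"))); // [0, 1, 2]
    }
}
```

The early break in prefix is safe because, in sorted order, every term carrying the prefix comes before any larger term that does not.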

  32. Scoring by Similarities
    Customize ranking models
    • IndexSearcher has a Similarity
      • By default it uses BM25Similarity (based on the Okapi BM25 model)
      • Before Lucene 6.0, the default was a TFIDFSimilarity (based on the Vector Space Model)
    • You can set another similarity with IndexSearcher#setSimilarity()
      • DFRSimilarity, DFISimilarity, IBSimilarity, LMDirichletSimilarity, LMJelinekMercerSimilarity, and so on
    ref: org.apache.lucene.search.similarities (Lucene 8.8.1 API)
    https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/search/similarities/package-summary.html
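To make BM25Similarity less of a black box, here is the Okapi BM25 formula for a single term written out in Java, with k1 = 1.2 and b = 0.75 (BM25Similarity's defaults). ToyBM25 is hypothetical. Its idf uses the same ln(1 + (N - n + 0.5)/(n + 0.5)) form as Lucene, but absolute scores will not match Lucene's because of differences such as Lucene's lossy encoding of document length.

```java
// Okapi BM25 for one query term in one document, written out to show how
// term frequency, document length, and document frequency interact.
// Sketch only; not Lucene's BM25Similarity code.
class ToyBM25 {
    // idf = ln(1 + (N - n + 0.5) / (n + 0.5)); N = docs in the index, n = docs containing the term
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tf part: f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl)); saturates as f grows
    static double tf(double freq, double docLen, double avgDocLen, double k1, double b) {
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return freq * (k1 + 1.0) / (freq + lengthNorm);
    }

    static double score(double freq, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        return idf(docCount, docFreq) * tf(freq, docLen, avgDocLen, k1, b);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75; // BM25Similarity's defaults
        // Same term and document length, 1 vs 3 occurrences of the term
        System.out.println(score(1, 10, 10, 1000, 10, k1, b));
        System.out.println(score(3, 10, 10, 1000, 10, k1, b));
    }
}
```

The second score is higher than the first but well under three times as large: extra occurrences help with diminishing returns, which is the saturation that k1 controls, while b controls how strongly longer documents are penalized.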

  33. Other Topics
    And more…
    • Index File Binary Format
    • Deleting Documents (LiveDocs)
    • Optimizing Index
    • Working with Filters (FilterCache)
    • File System Cache (MMapDirectory)
    • Near Real-Time Search (NRT)

  34. References
    • Official documentation: https://lucene.apache.org/core/8_8_1/index.html
    • API Javadoc: https://lucene.apache.org/core/8_8_1/core/index.html
    • Changes: https://lucene.apache.org/core/8_8_1/changes/Changes.html
    • Issues: https://issues.apache.org/jira/projects/LUCENE/issues
    • Books
      • Lucene in Action, Second Edition: https://www.manning.com/books/lucene-in-action-second-edition
    • Videos
      • Lucidworks (YouTube channel, search for "Lucene")
      • Lucene/Solr Revolution (YouTube channel)