Introduction to Apache Lucene

Slides used at an in-house Lucene study group, giving an overview of Apache Lucene (https://lucene.apache.org/).
Sample code: https://github.com/takuyaa/hello-lucene

Takuya Asano

April 05, 2021
Transcript

  1. Takuya Asano (@takuya_b)
    Introduction to Apache Lucene
    Lucene Reading

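  2. What is Apache Lucene?
    https://github.com/apache/lucene
    • A full-text search engine library written in Java
    • As of 2021, the de facto standard for full-text search
    • Implements nearly every feature needed for full-text search
    • Used as the core search component of Elasticsearch and Solr
      • Elasticsearch and Solr expose Lucene to users through a REST API
      • They also implement features Lucene itself lacks (distributed search, for example), which are not covered here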
  3. Lucene Core Components
    Component | Related Classes
    Query parsing | QueryParser
    Analysis | Analyzer
    Search | IndexSearcher
    Queries | Query
    Indexing | IndexWriter
    Index access | IndexReader
    Storage access | Directory, DirectoryReader
    Document representation | Document, Field
    Codecs (index file formats) | Codec, PostingsFormat, DocValuesFormat, StoredFieldsFormat, FieldInfosFormat, SegmentInfoFormat, LiveDocsFormat, PointsFormat, …
    Algorithms / data structures | LZ4, LevenshteinAutomata, FST, BKDReader, BKDWriter, PackedInts, FixedBitSet, PriorityQueue, …
  4. Full-text Search Basics on Lucene
    Inverted index: the core data structure of search engines
    • In full-text search, the content to be searched is modeled as documents
      • For e-commerce search, a document is a product
      • For web search, a document is a web page
      • In Lucene, this corresponds to the Document class
    • Lucene indexes documents with an inverted index
      • Full-text search based on an inverted index is well suited to searching large document collections
      • In Lucene, each document is identified by a doc id
    Example: an inverted index for two documents, "Lucene in Action" and "Lucene Cookbook". Each term (action, cookbook, in, lucene) maps to a postings list of the doc ids that contain it.
    The set of all terms is often referred to as a "term dictionary" or simply "dictionary".
    ref. "Information Retrieval and Web Search" summary: inverted index, stop-the-world (Hatena Blog)
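
The term-to-postings mapping described above can be illustrated with a toy sketch. This is not how Lucene stores its index; the class name `InvertedIndexSketch`, the 0-based doc ids, and the "lowercase and split on whitespace" analysis are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index (hypothetical): maps each term to the doc ids that contain it. */
final class InvertedIndexSketch {

    /** Builds term -> postings list; a document's position in the input list is its doc id. */
    static Map<String, List<Integer>> build(List<String> docs) {
        // A sorted map stands in for the term dictionary.
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Simplistic analysis (an assumption): lowercase, then split on whitespace.
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Doc ids arrive in increasing order, so checking the tail avoids duplicates.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build(List.of("Lucene in Action", "Lucene Cookbook"));
        index.forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

With the two example titles, `main` prints `action -> [0]`, `cookbook -> [1]`, `in -> [0]`, and `lucene -> [0, 1]`, one per line.
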
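  5. Lucene's Index
    Creating / searching an index
    • Creating an index
      • The index is stored in the file system
      • The index consists of multiple files
      • Term dictionary, postings lists, and so on (described later)
    • Searching an index
      • Searches run against the index stored as files
      • A search returns the set of documents that match the query
      • Results include the document ID, document contents, and so on
    Diagram: Documents → Lucene Index → SERP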
  6. Indexing Process Overview
    Core components for indexing: Document, Analyzer, IndexWriter, Directory
    • Construct a document object
    • Analyze text contents (preprocessing)
    • Build a Lucene index
    • Write an index to a storage
  9. Document Representation
    Model object to be indexed
    • A document is represented as an object of the Document class
    • A Document consists of multiple Fields
      • The data structure resembles a key-value map
    • A Field has a field name, content, and a field type

    Document doc1 = new Document();
    doc1.add(new Field("title", "Lucene in Action", TextField.TYPE_STORED));
    Document doc2 = new Document();
    doc2.add(new Field("title", "Lucene Cookbook", TextField.TYPE_STORED));

    Field arguments: field name ("title"), field content ("Lucene in Action"), field type (stored or not stored)
    Stored fields are stored in an index. This allows you to retrieve the field contents at search time.
  10. Analyzer
    Text preprocessors
    • Analyzer = Tokenizer + Filters
    • Tokenizer
      • Splits a text string into a sequence of Tokens
    • Filter
      • Removes Tokens according to a rule (e.g. StopFilter)
      • Replaces the text of Tokens according to a rule (e.g. LowerCaseFilter)
    • Example Analyzer: StandardAnalyzer
      • StandardAnalyzer = StandardTokenizer + LowerCaseFilter + StopFilter
      • StandardTokenizer is a Tokenizer based on Unicode Text Segmentation
    Example: "Lucene in Action" → StandardTokenizer → "Lucene", "in", "Action" → LowerCaseFilter → "lucene", "in", "action" → StopFilter → "lucene", "action"
    (if the StopFilter has "in" as a stop word)
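
The tokenizer-plus-filters pipeline can be sketched without Lucene as a chain of plain functions. Everything here is hypothetical and far simpler than Lucene's real TokenStream API: the class `AnalyzerSketch`, the regex-based tokenizer, and the stop-word set passed by the caller are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analysis chain (hypothetical): tokenize, lowercase, then drop stop words. */
final class AnalyzerSketch {

    /** Tokenizer stand-in: splits on anything that is not a letter or a digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Filter stand-in that rewrites each token (compare LowerCaseFilter). */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Filter stand-in that removes tokens (compare StopFilter). */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    /** The "analyzer": one tokenizer followed by the filters, applied in order. */
    static List<String> analyze(String text, Set<String> stopWords) {
        return removeStopWords(lowerCase(tokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene in Action", Set.of("in")));
    }
}
```

Running `main` prints `[lucene, action]`. Lowercasing runs before stop-word removal in this sketch so that a lowercase stop-word list also catches capitalized input.
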
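  15. Basic Indexing API
    A code example to build a Lucene index
    • FSDirectory
      • One implementation of Directory (the storage access API)
      • Provides access to the file system
      • Other Directory implementations include RAMDirectory (deprecated in Lucene 8.x in favor of ByteBuffersDirectory)
    • IndexWriter
      • The class that orchestrates writing to the index
      • Add a Document with the addDocument() method
      • Calling close() writes to the Directory (with the default settings)

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Create a directory for storing Lucene index
    Path indexDirPath = Files.createDirectory(Path.of("index"));
    Directory directory = FSDirectory.open(indexDirPath);

    // Set up IndexWriter
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter indexWriter = new IndexWriter(directory, config);

    // Index a document: "Lucene in Action"
    Document doc1 = new Document();
    doc1.add(new Field("title", "Lucene in Action", TextField.TYPE_STORED));
    indexWriter.addDocument(doc1);

    // Index a document: "Lucene Cookbook"
    Document doc2 = new Document();
    doc2.add(new Field("title", "Lucene Cookbook", TextField.TYPE_STORED));
    indexWriter.addDocument(doc2);

    // Write index to the directory
    indexWriter.close();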
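  16. Buffering and Flushing at Indexing
    When IndexWriter writes documents
    • Calling addDocument() on an IndexWriter does not write to storage immediately
      • The document is written to a buffer in RAM (JVM heap)
    • IndexWriter flushes the buffer to the Directory at a suitable time
      • Triggers include the RAM used by the buffer and the number of buffered documents
      • Calling flush() explicitly forces a flush (but does not commit)
      • Calling commit() flushes and then performs the commit
    Diagram: addDocument() → IndexWriter (buffer in RAM) → flush → Directory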
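
The difference between buffering, flushing, and committing can be sketched with a toy writer. This is a simplified model, not IndexWriter: the class `BufferedWriterSketch` and its document-count trigger are assumptions, and "segments" are just in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer (hypothetical): buffers documents, flushes them as segments, commits on request. */
final class BufferedWriterSketch {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushedSegments = new ArrayList<>();
    private int committedSegmentCount = 0;

    BufferedWriterSketch(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Adds to the in-memory buffer; flushes automatically once the buffer is full. */
    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Turns the buffered documents into one new segment; does not commit. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedSegments.add(List.copyOf(buffer));
        buffer.clear();
    }

    /** Flushes whatever is buffered, then marks every flushed segment as committed. */
    void commit() {
        flush();
        committedSegmentCount = flushedSegments.size();
    }

    int bufferedDocs() { return buffer.size(); }
    int flushedSegmentCount() { return flushedSegments.size(); }
    int committedSegmentCount() { return committedSegmentCount; }

    public static void main(String[] args) {
        BufferedWriterSketch writer = new BufferedWriterSketch(2);
        writer.addDocument("Lucene in Action");
        writer.addDocument("Lucene Cookbook"); // buffer is full, so this triggers a flush
        System.out.println(writer.flushedSegmentCount() + " flushed, "
                + writer.committedSegmentCount() + " committed");
        writer.commit();
        System.out.println(writer.flushedSegmentCount() + " flushed, "
                + writer.committedSegmentCount() + " committed");
    }
}
```

The point of the sketch is that a flushed segment exists before it is committed, which mirrors the distinction between flush() and commit() above.
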
  18. Index Segments
    Lucene index files
    • An index consists of multiple segments
      • All of them are stored in the same directory
    • A segment is a sub-index
      • On its own, it functions almost as a complete Lucene index
    • A segment consists of multiple files
      • File names take the form _gen.ext or _gen_Lucene84_0.ext
      • E.g. _0.fnm, _0_Lucene84_0.pos, …
      • gen: the segment generation (e.g. 0, 1, …)
      • ext: the extension for each format (e.g. fnm, pos, …)
    • One segment is created each time IndexWriter flushes
      • It is referenced from segments_N only once it has been committed
      • N = 1, 2, …

    Index after 1st commit:
      Segment: _0.fdm _0.fdt _0.fdx _0.fnm _0.nvd _0.nvm _0.si _0_Lucene84_0.doc _0_Lucene84_0.pos _0_Lucene84_0.tim _0_Lucene84_0.tip _0_Lucene84_0.tmd
      Segments file: segments_1
      Lock file: write.lock

    Index after 2nd commit:
      Segment: _0.fdm _0.fdt _0.fdx _0.fnm _0.nvd _0.nvm _0.si _0_Lucene84_0.doc _0_Lucene84_0.pos _0_Lucene84_0.tim _0_Lucene84_0.tip _0_Lucene84_0.tmd
      Segment: _1.fdm _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1_Lucene84_0.doc _1_Lucene84_0.pos _1_Lucene84_0.tim _1_Lucene84_0.tip _1_Lucene84_0.tmd
      Segments file: segments_2
      Lock file: write.lock
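
To make the naming scheme concrete, here is a small sketch that groups a directory listing by segment. The class `SegmentFilesSketch` and its parsing rule (the segment name runs from the leading underscore up to the next underscore or dot) are assumptions inferred from the file names shown above, not Lucene's own parsing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy helper (hypothetical): groups index file names by the segment they belong to. */
final class SegmentFilesSketch {

    /** Returns the segment name (e.g. "_0") of a per-segment file, or null for other files. */
    static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // e.g. segments_1 or write.lock
        }
        int end = 1;
        while (end < fileName.length()
                && fileName.charAt(end) != '_'
                && fileName.charAt(end) != '.') {
            end++;
        }
        return fileName.substring(0, end);
    }

    /** Groups per-segment files under their segment name; other files are skipped. */
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String fileName : fileNames) {
            String segment = segmentName(fileName);
            if (segment != null) {
                grouped.computeIfAbsent(segment, s -> new ArrayList<>()).add(fileName);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                "_0.fnm", "_0_Lucene84_0.pos", "_1.fnm", "_1_Lucene84_0.pos",
                "segments_2", "write.lock");
        System.out.println(groupBySegment(listing));
    }
}
```

For the sample listing, the grouping contains two keys, `_0` and `_1`, each with its two files, while `segments_2` and `write.lock` are left out.
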
  21. Index Commits and Consistency
    How IndexWriter#commit() works
    1. Flush all buffered documents, then sync the Directory
      • With FSDirectory, this guarantees the data has been written to the file system
    2. Write the segments_N file and sync it
      • The new segments become visible to IndexReader (an IndexReader reads only committed segments)
    3. Delete old commits (old segments_N files)
    Diagram: flush segment and sync file system → write segments_N and sync file system → delete the old segments_N. An IndexReader opened at each stage sees only the segments referenced by the latest segments_N.
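
The three commit steps can be modeled with a toy "directory" that only tracks which commit file references which segments. This is a sketch under simplifying assumptions, not Lucene's implementation: the class `CommitPointSketch` is hypothetical, nothing is synced to disk, and old commits are deleted unconditionally. The base-36 generation suffix reflects my understanding of how segments_N is numbered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy commit model (hypothetical): a commit file lists the segments a reader may see. */
final class CommitPointSketch {
    // Commit file name -> names of the segments that commit references.
    private final Map<String, List<String>> commitFiles = new TreeMap<>();
    private final List<String> flushedSegments = new ArrayList<>();
    private long generation = 0;

    /** Step 1 stand-in: a flushed segment exists but is not yet referenced by any commit. */
    void flushSegment(String segmentName) {
        flushedSegments.add(segmentName);
    }

    /** Steps 2 and 3: write a new segments_N, then delete the older commit files. */
    String commit() {
        generation++;
        String name = "segments_" + Long.toString(generation, Character.MAX_RADIX);
        commitFiles.put(name, List.copyOf(flushedSegments));
        commitFiles.keySet().removeIf(fileName -> !fileName.equals(name));
        return name;
    }

    /** A reader sees a fixed snapshot: the segments referenced by the latest commit. */
    List<String> openReader() {
        if (commitFiles.isEmpty()) {
            return List.of();
        }
        return commitFiles.values().iterator().next(); // only the latest commit remains
    }

    Set<String> commitFileNames() {
        return Set.copyOf(commitFiles.keySet());
    }

    public static void main(String[] args) {
        CommitPointSketch index = new CommitPointSketch();
        index.flushSegment("_0");
        System.out.println("before commit: " + index.openReader());
        System.out.println(index.commit() + ": " + index.openReader());
    }
}
```

A reader opened before the first commit sees no segments even though `_0` has been flushed, which is the visibility rule stated in step 2.
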
  22. Index Segment Merging
    IndexWriter and MergePolicy
    • IndexWriter merges segments into larger segments according to its MergePolicy
      • TieredMergePolicy (default), LogMergePolicy, etc.
    • The MergePolicy decides which segments to merge
      • A large number of small segments slows down search
      • Merging them into larger segments improves performance
      • The MergePolicy lets you tune indexing throughput, overall load, and so on
    Diagram: several small segments → merge() → one larger segment
    ref. Changing Bits, "Visualizing Lucene's segment merges" (blog.mikemccandless.com)
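
As a rough illustration of what "deciding which segments to merge" means, the sketch below repeatedly merges the smallest segments until the segment count falls under a limit. It is deliberately naive and is neither TieredMergePolicy nor LogMergePolicy; the class `MergeSketch` and the parameters `maxSegments` and `mergeFactor` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy merge rule (hypothetical): merge the smallest segments until few enough remain. */
final class MergeSketch {

    /**
     * Takes segment sizes and returns the sizes left after merging.
     * While more than maxSegments remain, the mergeFactor smallest segments are
     * combined into one segment whose size is the sum of its inputs.
     */
    static List<Long> mergeSmallest(List<Long> segmentSizes, int maxSegments, int mergeFactor) {
        if (maxSegments < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxSegments >= 1 and mergeFactor >= 2");
        }
        List<Long> sizes = new ArrayList<>(segmentSizes);
        Collections.sort(sizes);
        while (sizes.size() > maxSegments) {
            int count = Math.min(mergeFactor, sizes.size());
            long merged = 0;
            for (int i = 0; i < count; i++) {
                merged += sizes.remove(0); // always take the current smallest
            }
            sizes.add(merged);
            Collections.sort(sizes);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Four tiny segments and one large one, merged pairwise until three remain.
        System.out.println(mergeSmallest(List.of(1L, 1L, 1L, 1L, 10L), 3, 2));
    }
}
```

Running `main` prints `[2, 2, 10]`: the small segments are merged first and the large one is left alone, which is the general behaviour a size-aware policy aims for.
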
  23. Index File Formats
    Format Name | Extension | Related Class | Description
    Segment File | segments_N | SegmentInfos | Holds the commit point
    Lock File | write.lock | N/A | Lock file that prevents concurrent IndexWriters on the same index
    Segment Info | .si | SegmentInfoFormat | Segment metadata
    Fields | .fnm | FieldInfosFormat | Field information (names, etc.)
    Field Index | .fdx | StoredFieldsFormat | Pointers to field data
    Field Data | .fdt | StoredFieldsFormat | Stored field data of documents
    Term Dictionary | .tim | PostingsFormat | Term dictionary (holds term information)
    Term Dictionary Metadata | .tmd | PostingsFormat | Metadata of the term dictionary
    Term Index | .tip | PostingsFormat | Pointers into the term dictionary
    Frequencies | .doc | PostingsFormat | Inverted index postings and skip lists
    Positions | .pos | PostingsFormat | Positions of terms within documents
    Payloads | .pay | PostingsFormat | Per-position metadata (character offsets, etc.)
    Live Documents | .liv | Lucene50LiveDocsFormat | Information on live (non-deleted) documents
    Point Values | .dii, .dim | PointsFormat | Numeric data
  24. Browse Lucene Index using Luke
    Luke: the GUI toolbox for introspecting a Lucene index
    1. Download Apache Lucene from https://lucene.apache.org/core/downloads.html
    2. Extract the .tgz file
    3. Execute
      • Linux / macOS: lucene-8.8.1/luke/luke.sh
      • Windows: lucene-8.8.1/luke/luke.bat
    ref. "Lucene 超入門 with Luke" (a beginner's introduction to Lucene with Luke) by mocobeta (mocobeta.medium.com)
  25. Browse Lucene Index using Luke
    Screenshot: input the index directory path

  26. Browse Lucene Index using Luke
    Screenshot: index statistics and metadata; select a field name to view its term statistics

  27. Query Processing Overview
    Using IndexSearcher with Query
    • Pass a Query object (TermQuery, PhraseQuery, etc.) to IndexSearcher
    • IndexSearcher reads the Lucene index through IndexReader
    • IndexReader reads files through Directory, which performs the actual storage access
    • IndexSearcher returns the top-k results as TopDocs
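
Returning only the top-k results is typically done with a bounded priority queue rather than by sorting every match. The sketch below shows that idea in isolation; the class `TopKSketch`, its `Hit` type, and the score array input are hypothetical stand-ins, and the tie-break (lower doc id wins) is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-k collector (hypothetical): keeps the k best-scoring docs in a min-heap. */
final class TopKSketch {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** scores[doc] is the score of that doc; returns up to k hits, best first. */
    static List<Hit> topK(float[] scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // The heap head is the weakest hit kept so far: lowest score, then highest doc id.
        Comparator<Hit> weakestFirst = Comparator
                .comparingDouble((Hit h) -> h.score)
                .thenComparing(Comparator.comparingInt((Hit h) -> h.doc).reversed());
        PriorityQueue<Hit> heap = new PriorityQueue<>(weakestFirst);
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(new Hit(doc, scores[doc]));
            if (heap.size() > k) {
                heap.poll(); // evict the weakest hit so at most k remain
            }
        }
        List<Hit> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result); // polled weakest-first, so reverse to get best-first
        return result;
    }

    public static void main(String[] args) {
        for (Hit hit : topK(new float[] {0.2f, 0.9f, 0.5f, 0.9f}, 2)) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

The heap never holds more than k + 1 entries, so memory stays bounded no matter how many documents match.
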
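  30. Basic Search API
    A code example to search a Lucene index
    • IndexSearcher
      • Accesses the index through an IndexReader
    • TermQuery
      • Searches by Term (a key in the inverted index)
      • Pass the text as it appears after analysis
      • Searching for "Lucene" finds nothing, while "lucene" matches
    • Pass the query to IndexSearcher#search() to run the search
      • The result is a TopDocs (the top-k results)

    import java.nio.file.Path;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Open a directory which stores index
    Directory directory = FSDirectory.open(Path.of("index"));

    // Create an IndexSearcher
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Create Query object that searches for "lucene" on "title" field
    Query query = new TermQuery(new Term("title", "lucene"));
    TopDocs results = indexSearcher.search(query, 10);
    ScoreDoc[] hits = results.scoreDocs;

    // Iterate through the results
    for (ScoreDoc hit : hits) {
        Document hitDoc = indexSearcher.doc(hit.doc);
        System.out.println("Hit: " + hitDoc.get("title"));
    }

    // Post-processing
    indexReader.close();
    directory.close();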
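  31. Lucene Queries
    Class | Description
    Query | Abstract base class of all queries
    TermQuery | Matches a Term
    PrefixQuery | Search by prefix
    PhraseQuery | Phrase search
    PhrasePrefixQuery | Phrase search with prefix matching (named MultiPhraseQuery in current Lucene)
    SpanQuery, SpanTermQuery | Search that takes position ranges (spans) into account
    TermRangeQuery | Range search over terms in byte order (BytesRef)
    NumericRangeQuery | Numeric range search (replaced by point range queries such as IntPoint.newRangeQuery in current Lucene)
    FuzzyQuery | Fuzzy search
    BooleanQuery | Boolean search (a compound query that combines other queries)
    FilteredQuery | Filtering search (replaced by BooleanQuery FILTER clauses in current Lucene)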
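
For a compound query such as a conjunction of two term queries, the underlying idea is to intersect the postings lists of the terms. The sketch below shows the classic two-pointer intersection of sorted doc id lists. It illustrates the principle only; the class `PostingsIntersectionSketch` is hypothetical and Lucene's actual iterators and skip lists are considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy AND of two terms (hypothetical): intersects two sorted postings lists. */
final class PostingsIntersectionSketch {

    /** Both inputs must be sorted ascending without duplicates; returns their common doc ids. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x); // doc contains both terms
                i++;
                j++;
            } else if (x < y) {
                i++; // advance the list that is behind
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> lucene = List.of(0, 1); // postings of "lucene"
        List<Integer> action = List.of(0);    // postings of "action"
        System.out.println(intersect(lucene, action));
    }
}
```

With the postings from the two example titles, `main` prints `[0]`: only "Lucene in Action" contains both terms.
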
  32. Scoring by Similarities
    Customize ranking models
    • IndexSearcher has a Similarity
      • It has a BM25Similarity (based on the Okapi BM25 model) as the default similarity
      • A while ago, it was a TFIDFSimilarity (based on the Vector Space Model)
    • You can set another similarity with IndexSearcher#setSimilarity()
      • DFRSimilarity, DFISimilarity, IBSimilarity, LMDirichletSimilarity, LMJelinekMercerSimilarity, and so on
    ref. org.apache.lucene.search.similarities package summary (Lucene API Javadoc)
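
For readers unfamiliar with BM25, here is a minimal sketch of the textbook Okapi BM25 score for a single term in a single document. It is not BM25Similarity itself, whose implementation differs in details such as how document lengths are encoded. The class `Bm25Sketch` is hypothetical, and k1 = 1.2 and b = 0.75 are simply commonly used parameter values.

```java
/** Toy BM25 scorer (hypothetical): textbook formula for one term in one document. */
final class Bm25Sketch {

    /** Inverse document frequency: rarer terms get a larger weight. */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * termFreq: occurrences of the term in the document.
     * docFreq: number of documents containing the term; docCount: total documents.
     * docLength / avgDocLength: length normalization (longer documents score lower).
     * k1 controls term frequency saturation; b controls the strength of length normalization.
     */
    static double score(double termFreq, long docFreq, long docCount,
                        double docLength, double avgDocLength, double k1, double b) {
        double norm = k1 * (1.0 - b + b * docLength / avgDocLength);
        return idf(docFreq, docCount) * termFreq * (k1 + 1.0) / (termFreq + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;
        // "cookbook" appears in 1 of 2 documents; the document has 2 terms, the average is 2.5.
        System.out.println(score(1, 1, 2, 2.0, 2.5, k1, b));
    }
}
```

Two properties are worth checking by hand: the score grows with term frequency but saturates, and for equal term frequency a longer document scores lower than a shorter one.
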
  33. Other Topics
    And more
    • Index File Binary Format
    • Deleting Documents (LiveDoc)
    • Optimizing Index
    • Working with Filters / Filter Cache
    • File System Cache (MMapDirectory)
    • Near Real-time Search (NRT)
  34. Reference
    • Official document: versioned documentation index on lucene.apache.org/core
    • API Javadoc: versioned core Javadoc on lucene.apache.org/core
    • Changes: versioned Changes.html on lucene.apache.org/core
    • Issues: https://issues.apache.org/jira/projects/LUCENE/issues
    • Books
      • Lucene in Action, Second Edition: https://www.manning.com/books/lucene-in-action-second-edition
    • Videos
      • Lucidworks (YouTube channel, search query "Lucene")
      • Lucene/Solr Revolution (YouTube channel)