
Trying Out Multilingual Search with Elasticsearch (Elasticsearchで多言語検索対応してみた話.pdf)

motsat
July 19, 2018



Transcript

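  1. APIs for translation
    ・Google Translation API: $20 per 1 million characters
      → Few unnatural spots when translating PubMed text
    ・Microsoft Translator API: $10 (¥1,120) per 1 million characters
      → Cheaper, but translating PubMed text produces a fair number of unnatural spots
    ・Amazon Translate
      → Japanese not yet supported (as of autumn 2017)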

  2. Google Translation API: $20 per 1 million characters
    ・Total cost: (22.1 billion characters / 1 million characters) × $20 = $442,000
      In Japanese yen: approx. ¥48.89 million (as of 2017/07/07)
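The arithmetic on this slide can be checked with a few lines of Python. This is a sketch: the per-character prices come from the slides, while the exchange rate of about 110.6 JPY/USD is an assumption back-calculated from the slide's yen figure, not a quoted 2017/07/07 rate.

```python
# Sketch: reproduce the slide's translation-cost arithmetic.
# Prices per 1 million characters are taken from the slides.
USD_PER_MILLION_CHARS = {
    "Google Translation API": 20,
    "Microsoft Translator API": 10,
}

# Assumption: ~110.6 JPY per USD, back-calculated from the slide
# (JPY 48.89M / USD 442,000); not an official exchange rate.
ASSUMED_JPY_PER_USD = 110.6


def translation_cost_usd(num_chars: int, usd_per_million: float) -> float:
    """Cost of translating num_chars at a flat per-million-character price."""
    return num_chars / 1_000_000 * usd_per_million


def usd_to_jpy(usd: float, rate: float = ASSUMED_JPY_PER_USD) -> float:
    """Convert USD to JPY at the given (assumed) exchange rate."""
    return usd * rate


CORPUS_CHARS = 22_100_000_000  # 22.1 billion characters, from the slide

for api, price in USD_PER_MILLION_CHARS.items():
    usd = translation_cost_usd(CORPUS_CHARS, price)
    print(f"{api}: USD {usd:,.0f} ~ JPY {usd_to_jpy(usd):,.0f}")
# Google Translation API: USD 442,000 ~ JPY 48,885,200
# Microsoft Translator API: USD 221,000 ~ JPY 24,442,600
```

At the assumed rate the Google figure rounds to about ¥48.89 million, matching the slide; the Microsoft line is not on the slide and is included only for comparison.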
  3. Elasticsearch settings: index properties (mapping)

    pubmed: {
      properties: {
        title_en: { type: "text", analyzer: "english_analyzer" },
        title_ja: { type: "text", analyzer: "ja_analyzer" },
        body_en: { type: "text", analyzer: "english_analyzer" },
        body_ja: { type: "text", analyzer: "ja_analyzer" },
      },
    },

    ・Prepare separate Japanese fields (title_ja/body_ja) and English fields (title_en/body_en) for the title and for the body
    ・Use different analyzers for the English fields and the Japanese fields
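A minimal Python sketch of the same field layout, assuming the field and analyzer names from the slide. It builds the `properties` block and a hypothetical search body that queries all four fields with `multi_match`; Elasticsearch analyzes the query text with each field's own analyzer, so a single query reaches both the English and the Japanese fields. The `pubmed` mapping type is left out because Elasticsearch 7 and later no longer use mapping types.

```python
import json

# Field layout from the slide: one English and one Japanese field for the
# title and for the body. Analyzer names must match the index's analysis settings.
LANG_ANALYZERS = {"en": "english_analyzer", "ja": "ja_analyzer"}


def build_mapping_properties() -> dict:
    """Return the `properties` block for title_{en,ja} and body_{en,ja}."""
    properties = {}
    for part in ("title", "body"):
        for lang, analyzer in LANG_ANALYZERS.items():
            properties[f"{part}_{lang}"] = {"type": "text", "analyzer": analyzer}
    return {"properties": properties}


def build_search_body(query_text: str) -> dict:
    """Hypothetical search request that queries every language field at once.

    multi_match runs the query against each listed field, analyzing the
    text with that field's analyzer.
    """
    fields = list(build_mapping_properties()["properties"])
    return {"query": {"multi_match": {"query": query_text, "fields": fields}}}


print(json.dumps(build_mapping_properties(), indent=2))
print(json.dumps(build_search_body("hypertension")))
```

Generating the four fields from one table keeps the English/Japanese pairs in sync if another language is added later.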
  4. Elasticsearch settings: index analysis configuration
    ・Add a filter with type: "synonym"
    ・Configure the "english_analyzer" used by the English fields to apply the synonym filter (the rest is based on the official English analyzer settings)

    {
      "index": {
        "analysis": {
          "filter": {
            "synonym": {
              "type": "synonym",
              "synonyms": ["高血圧 => hypertension", …]
            }
          },
          "analyzer": {
            "english_analyzer": {
              "tokenizer": "standard",
              "filter": ["synonym", "english_possessive_stemmer", "lowercase", …]
            },
            …
          }
        }
      }
    }
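Because the analyzer refers to filters by name, a misspelled name (the slide originally read "synoncym") only surfaces as an error when the index is created. A minimal Python sketch, assuming the settings shape above, that builds the analysis block and checks every referenced filter name before sending it. `BUILTIN_FILTERS` is a deliberately partial, illustrative list, and `english_possessive_stemmer` is defined here as a `stemmer` filter with `language: possessive_english`, as in the official English analyzer rebuild; the other elided filters are omitted.

```python
from __future__ import annotations

# Partial list of built-in token filter names, for illustration only;
# Elasticsearch ships many more.
BUILTIN_FILTERS = {"lowercase", "asciifolding", "stop", "porter_stem", "kstem"}


def build_analysis_settings(synonyms: list[str]) -> dict:
    """Index settings with a synonym filter wired into english_analyzer."""
    return {
        "index": {
            "analysis": {
                "filter": {
                    "synonym": {"type": "synonym", "synonyms": synonyms},
                    "english_possessive_stemmer": {
                        "type": "stemmer",
                        "language": "possessive_english",
                    },
                },
                "analyzer": {
                    "english_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["synonym", "english_possessive_stemmer", "lowercase"],
                    }
                },
            }
        }
    }


def undefined_filters(settings: dict) -> list[str]:
    """Filter names used by analyzers that are neither custom nor in BUILTIN_FILTERS."""
    analysis = settings["index"]["analysis"]
    known = set(analysis.get("filter", {})) | BUILTIN_FILTERS
    missing = []
    for analyzer in analysis.get("analyzer", {}).values():
        missing += [name for name in analyzer.get("filter", []) if name not in known]
    return missing


settings = build_analysis_settings(["高血圧 => hypertension"])
print(undefined_filters(settings))  # []
# Reintroduce the "synoncym" typo: it no longer matches any defined filter.
settings["index"]["analysis"]["analyzer"]["english_analyzer"]["filter"][0] = "synoncym"
print(undefined_filters(settings))  # ['synoncym']
```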
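  5. Costs
    ・Translation fee (fixed): covers translating only the titles in advance
      17 million records × title only (100 characters) = ¥3.76 million (Google Translate API)
    ・Translation fee (variable): covers translating at display time = α × ¥10,000 (we cannot disclose this because it depends on non-public figures such as page views, but since only a fraction of pages are ever searched and displayed, it is quite cheap)
    ・Dictionary fee
      Rosetta Inc. (a Japanese-English dictionary specialized for the medical field) = b × ¥10,000 (this figure is also undisclosed, but they provide it to us at an extremely low price)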
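The split between a fixed and a variable translation fee can be sketched in Python. This is an illustration under stated assumptions, not the author's implementation: `DisplayTimeTranslator` and the `translate` callable are hypothetical stand-ins for a real API client, the in-memory dict stands in for whatever persistent store a real deployment would use, and `"pmid-1"` is a made-up document ID. Bodies are sent to the paid API only the first time a page is shown, which is why the variable fee scales with viewed pages rather than with the corpus.

```python
from typing import Callable, Dict


def fixed_title_cost_usd(num_docs: int, chars_per_title: int, usd_per_million: float) -> float:
    """Up-front cost of translating every title once."""
    return num_docs * chars_per_title / 1_000_000 * usd_per_million


class DisplayTimeTranslator:
    """Hypothetical helper: translate a body on first display, then serve it from cache."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self._translate = translate
        self._cache: Dict[str, str] = {}
        self.billed_chars = 0  # characters actually sent to the paid API

    def body_ja(self, doc_id: str, body_en: str) -> str:
        if doc_id not in self._cache:
            self.billed_chars += len(body_en)
            self._cache[doc_id] = self._translate(body_en)
        return self._cache[doc_id]


# 17 million titles x 100 characters at $20 per million characters (from the slides).
print(fixed_title_cost_usd(17_000_000, 100, 20))  # 34000.0

translator = DisplayTimeTranslator(lambda text: f"[ja] {text}")  # fake translator
translator.body_ja("pmid-1", "Hypertension is common.")
translator.body_ja("pmid-1", "Hypertension is common.")  # cache hit, not billed again
print(translator.billed_chars)  # 23
```

At roughly 110.6 JPY/USD (an assumed rate), $34,000 comes to about ¥3.76 million, in line with the slide's fixed title cost.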