Elasticsearch 2.x
tsuyoshi nakamura
August 31, 2016
Technology
Trying Japanese search with Elasticsearch 2.x
Transcript
Elasticsearch (2.1.1) in Practice
2016-01-22, in-house study session
Tsuyoshi Nakamura
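Full-text search engines have gone through a long and varied history, but these days Elasticsearch seems to be the go-to full-text search engine, and it has even arrived on AWS.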
Agenda
• Installation and config setup
• Kuromoji
• Analysis modules
• Key modules
• Demo
• Open questions
Installation to config
• Install Java with yum
• Import the key for the official Elasticsearch repositories
• Configure yum; running yum install then installs the latest version (2.1.1)
• Install the plugin needed for Japanese full-text search
Kuromoji plugin install
bin/plugin install analysis-kuromoji
※https://www.elastic.co/guide/en/elasticsearch/plugins/master/analysis-kuromoji.html
※https://github.com/elastic/elasticsearch-analysis-kuromoji
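Kuromoji
• Where does the name come from (kuromoji?)? The kuromoji wood used for toothpicks?
• No idea, but it is extremely handy
• Back when I used Solr, I had to install a Japanese dictionary (MeCab, ChaSen, etc.) myself, which was a hassle in many ways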
Analysis module
Analyzer
• Several can be configured
• Tokenization (morphological analysis)
• Filtering
This processing runs both when an index is built and when a search is executed.
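As a mental model of the analyzer described above, here is a toy Python sketch (not Elasticsearch code) of its three stages: char filters, then a tokenizer, then token filters. Whitespace splitting stands in for Kuromoji and lowercasing for the token filters; both are assumptions made only to keep the example runnable. What it shows is that the same analyzer runs on documents at index time and on the query at search time, which is why the two sides can match.

```python
from dataclasses import dataclass, field
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


@dataclass
class Analyzer:
    """Toy model of an analyzer: char filters -> tokenizer -> token filters."""
    tokenizer: Tokenizer
    char_filters: List[CharFilter] = field(default_factory=list)
    token_filters: List[TokenFilter] = field(default_factory=list)

    def analyze(self, text: str) -> List[str]:
        for cf in self.char_filters:    # 1. rewrite the raw characters
            text = cf(text)
        tokens = self.tokenizer(text)   # 2. split into tokens
        for tf in self.token_filters:   # 3. rewrite the token stream
            tokens = tf(tokens)
        return tokens


# Hypothetical stand-ins: whitespace splitting instead of Kuromoji,
# lowercasing as the only token filter.
toy = Analyzer(
    tokenizer=str.split,
    token_filters=[lambda toks: [t.lower() for t in toks]],
)

# The same analyzer processes the document (index time) and the query (search time),
# so "Elasticsearch" in a query matches "ELASTICSEARCH" in a document.
indexed = set(toy.analyze("Trying ELASTICSEARCH 2.1.1"))
query = toy.analyze("Elasticsearch")
print(all(t in indexed for t in query))  # True
```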
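Analysis module
Tokenizer
• Sets how text is tokenized
• e.g. tokenizing with Kuromoji
• e.g. tokenizing into n-grams
Token Filters
• Transform the tokens produced by the tokenizer
• e.g. converting full-width Latin letters to half-width and half-width katakana to full-width
Char Filters
• Transform the characters before tokenization
• e.g. stripping symbols, or expanding iteration marks such as 「々」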
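The char filter and token filter ideas can be sketched in plain Python. These are simplified stand-ins, not the real html_strip, kuromoji_iteration_mark, or cjk_width implementations: the tag regex ignores edge cases such as comments and entities, only 「々」 is expanded, and width folding is approximated with Unicode NFKC, which also normalizes some other compatibility characters.

```python
import re
import unicodedata
from typing import List

TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Char filter (simplified html_strip): drop anything that looks like a tag."""
    return TAG.sub("", text)


def expand_kanji_iteration_mark(text: str) -> str:
    """Char filter (simplified kuromoji_iteration_mark): 々 -> the previous character.

    Only handles 々; the real filter also covers kana marks such as ゝ and ヽ.
    """
    out: List[str] = []
    for ch in text:
        if ch == "々" and out:
            out.append(out[-1])
        else:
            out.append(ch)
    return "".join(out)


def normalize_width(tokens: List[str]) -> List[str]:
    """Token filter approximating cjk_width via NFKC:
    full-width ASCII -> half-width, half-width katakana -> full-width."""
    return [unicodedata.normalize("NFKC", t) for t in tokens]


print(strip_html("<p>時々</p>"))                  # 時々
print(expand_kanji_iteration_mark("時々"))         # 時時
print(normalize_width(["ＥＬＡＳＴＩＣ", "ｶﾀｶﾅ"]))  # ['ELASTIC', 'カタカナ']
```

Note that char filters work on the raw string before tokenization, while token filters receive the token list afterwards, which is why the signatures differ.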
Key modules
Ngram Tokenizer
• Tokenizes into N-grams. Built into Elasticsearch
cjk_width Token Filter
• Normalizes half-width and full-width characters. Built into Elasticsearch
Lowercase Token Filter
• Normalizes upper- and lower-case Latin letters. Built into Elasticsearch
Synonym Token Filter
• Links synonyms together. Built into Elasticsearch
Stop Token Filter
• Removes a given list of words. Built into Elasticsearch
HTML Strip Char Filter
• Removes HTML tags. Built into Elasticsearch
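To see roughly what the Ngram tokenizer produces with the settings used in this deck's config (min_gram: 2, max_gram: 3, token_chars: [letter, digit]), here is a Python sketch. It treats anything Python does not consider a letter or digit as a separator and cuts each remaining run into 2- and 3-character grams; Elasticsearch's exact character classes and token order may differ slightly.

```python
from typing import List


def ngram_tokenize(text: str, min_gram: int = 2, max_gram: int = 3) -> List[str]:
    """Sketch of an nGram tokenizer with token_chars: [letter, digit].

    Non-letter, non-digit characters act as separators; each remaining run
    is cut into every substring of length min_gram..max_gram.
    """
    runs: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    grams: List[str] = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    grams.append(run[start:start + size])
    return grams


print(ngram_tokenize("東京都"))     # ['東京', '東京都', '京都']
print(ngram_tokenize("全文 検索"))  # ['全文', '検索']
```

The 東京都 example shows the trade-off: a search for 京都 (Kyoto) would hit a document about 東京都 (Tokyo Metropolis). N-grams buy recall at the cost of precision, which is presumably why the mapping keeps both a ja field and a ja_ngram field.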
The config used this time (elasticsearch.yml)
# ---------------------------------- Index -----------------------------------
index:
  analysis:
    analyzer:
      ja:
        type: custom
        tokenizer: ja_tokenizer
        char_filter: [ html_strip, kuromoji_iteration_mark ]
        filter: [ lowercase, cjk_width, katakana_stemmer, kuromoji_part_of_speech ]
      ja_ngram:
        type: custom
        tokenizer: ngram_ja_tokenizer
        char_filter: [ html_strip ]
        filter: [ cjk_width, lowercase ]
    tokenizer:
      ja_tokenizer:
        type: kuromoji_tokenizer
        mode: search
        user_dictionary: /etc/elasticsearch/userdict_ja.txt
      ngram_ja_tokenizer:
        type: nGram
        min_gram: 2
        max_gram: 3
        token_chars: [ letter, digit ]
    filter:
      katakana_stemmer:
        type: kuromoji_stemmer
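The same analysis settings can also be sent as JSON in a create-index or index template request, which matters if you move away from elasticsearch.yml (from Elasticsearch 5.0 onward, index-level settings are no longer accepted in the node config). The Python sketch below builds that body from a dict mirroring the YAML above and runs a cheap consistency check; the PROVIDED set is an assumed list of names supplied by Elasticsearch or the analysis-kuromoji plugin.

```python
import json

# Mirrors the index.analysis block of the elasticsearch.yml above.
analysis = {
    "analyzer": {
        "ja": {
            "type": "custom",
            "tokenizer": "ja_tokenizer",
            "char_filter": ["html_strip", "kuromoji_iteration_mark"],
            "filter": ["lowercase", "cjk_width", "katakana_stemmer", "kuromoji_part_of_speech"],
        },
        "ja_ngram": {
            "type": "custom",
            "tokenizer": "ngram_ja_tokenizer",
            "char_filter": ["html_strip"],
            "filter": ["cjk_width", "lowercase"],
        },
    },
    "tokenizer": {
        "ja_tokenizer": {
            "type": "kuromoji_tokenizer",
            "mode": "search",
            "user_dictionary": "/etc/elasticsearch/userdict_ja.txt",
        },
        "ngram_ja_tokenizer": {
            "type": "nGram",
            "min_gram": 2,
            "max_gram": 3,
            "token_chars": ["letter", "digit"],
        },
    },
    "filter": {"katakana_stemmer": {"type": "kuromoji_stemmer"}},
}

# Assumed: names provided by Elasticsearch core or the analysis-kuromoji plugin.
PROVIDED = {"html_strip", "kuromoji_iteration_mark", "lowercase", "cjk_width", "kuromoji_part_of_speech"}

# Every analyzer must reference a defined tokenizer and defined-or-provided filters.
for name, spec in analysis["analyzer"].items():
    assert spec["tokenizer"] in analysis["tokenizer"], name
    for f in spec["filter"] + spec["char_filter"]:
        assert f in analysis["filter"] or f in PROVIDED, (name, f)

body = {"settings": {"index": {"analysis": analysis}}}
print(json.dumps(body, ensure_ascii=False, indent=2))
```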
The index mapping used this time (an index template for projects01-*)
{
  "order": 0,
  "template": "projects01-*",
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "project": {
      "_source": { "enabled": false },
      "_all": { "analyzer": "ja", "enabled": true },
      "properties": {
        "update_time": { "format": "YYYY-MM-dd HH:mm:ss", "type": "date" },
        "project_id": { "index": "not_analyzed", "type": "string" },
        "detail": { "analyzer": "ja", "type": "string" },
        "suggest": { "search_analyzer": "ja", "analyzer": "ja", "type": "completion" },
        "detail_ngram": { "analyzer": "ja_ngram", "type": "string" },
        "title": { "analyzer": "ja", "type": "string" },
        "title_ngram": { "analyzer": "ja_ngram", "type": "string" }
      }
    }
  },
  "aliases": { }
}
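A document for the project type in this template has to carry title and detail twice, once for the ja field and once for the *_ngram field, because each copy is analyzed by a different analyzer. A hypothetical helper that builds such a document follows; the project ID and texts are made-up sample values.

```python
import json
from datetime import datetime


def make_project_doc(project_id: str, title: str, detail: str, updated: datetime) -> dict:
    """Build a document for the 'project' type (hypothetical helper).

    title/detail are copied verbatim into *_ngram fields so that each copy
    is analyzed separately (ja vs ja_ngram).
    """
    return {
        "project_id": project_id,
        # %Y-%m-%d %H:%M:%S yields text matching the mapping's YYYY-MM-dd HH:mm:ss pattern
        "update_time": updated.strftime("%Y-%m-%d %H:%M:%S"),
        "title": title,
        "title_ngram": title,
        "detail": detail,
        "detail_ngram": detail,
        # completion field: the inputs the suggester can complete from
        "suggest": {"input": [title]},
    }


doc = make_project_doc("p-001", "全文検索の導入", "Elasticsearch 2.1.1 を検証する",
                       datetime(2016, 1, 22, 9, 30, 0))
print(json.dumps(doc, ensure_ascii=False))
```

Two design points worth checking against the template: a multi-field (the fields parameter in a mapping) would let Elasticsearch index one source value with both analyzers and avoid the duplication, and with "_source" disabled, search hits will not return these field values and the index cannot be rebuilt from itself, so the original data needs to live elsewhere.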
Demo
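Demo
• Look at the Elasticsearch admin tool (kopf)
• It has lots of features
• The mappings and so on were registered from here
• Handy, and it looks cool
• Create an index
• Run some searches
• Try the suggest feature (completion)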
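For the demo's search and suggest steps, request bodies along these lines could be used against the template above. This is a sketch under assumptions: the boosts are illustrative, project_suggest is an arbitrary suggester name, and the suggest body targets the 2.x-era _suggest endpoint (POST /<index>/_suggest).

```python
import json


def search_body(keyword: str) -> dict:
    """Query both analyses of each field: ja (morphological) and ja_ngram (2-3 grams).

    Boost values are illustrative assumptions, not tuned numbers.
    """
    return {
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": ["title^3", "title_ngram", "detail^2", "detail_ngram"],
            }
        }
    }


def suggest_body(prefix: str, name: str = "project_suggest") -> dict:
    """Completion suggest request against the 'suggest' field (2.x _suggest style)."""
    return {name: {"text": prefix, "completion": {"field": "suggest"}}}


print(json.dumps(search_body("検索"), ensure_ascii=False))
print(json.dumps(suggest_body("全文"), ensure_ascii=False))
```

Boosting the ja fields above the ja_ngram fields is one way to let exact morphological matches outrank partial n-gram hits while still finding the latter.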
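Areas that still need investigation
• Index operations and the update flow
• Curator (Python)
• Score
• Requirements such as "for the same search, rank projects currently in progress near the top" look likely to come up
• Thresholds for slow queries and the like
• ES_HEAP_SIZE, swap
• cluster, shard, replica
• Index backup and restore
• Python tools, _snapshot, binary backups
• Facets? Can aggregations cover this? (Facets were removed in 2.0 in favor of aggregations.)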
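The ranking requirement and the facets question above could be approached with a function_score query and a terms aggregation. In the sketch below, the status field and the in_progress value are hypothetical (the mapping in this deck has no such field), and the weight of 2.0 is arbitrary.

```python
import json


def boosted_search(keyword: str, active_weight: float = 2.0) -> dict:
    """function_score sketch: multiply the score of in-progress projects.

    'status' and 'in_progress' are hypothetical; the mapping above has no such field.
    """
    return {
        "query": {
            "function_score": {
                "query": {"multi_match": {"query": keyword, "fields": ["title", "detail"]}},
                # documents matching the filter get their score multiplied by the weight;
                # non-matching documents keep their original score
                "functions": [
                    {"filter": {"term": {"status": "in_progress"}}, "weight": active_weight}
                ],
                "boost_mode": "multiply",
            }
        },
        # terms aggregation: the replacement for the terms facet removed in 2.0
        "aggs": {"by_status": {"terms": {"field": "status"}}},
    }


print(json.dumps(boosted_search("検索"), ensure_ascii=False, indent=2))
```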