Elasticsearch Architecture and Considerations When Planning Search Features
정한빈
March 08, 2024
Transcript
정한빈 / 23.12.6 Elasticsearch
Contents
• What is Elasticsearch
• Full Scan vs. Prefix vs. Full Text
• Tokenizer / Normalizer
• Search algorithm / Fuzziness
• What can we get out of this?
What is Elasticsearch?
Full Scan VS Prefix VS Full Text
Full Scan
Prefix
CREATE INDEX `IDX_content` ON document (content);
or
CREATE INDEX `IDX_content` ON document (content(n));
Creates an index on the content column.
Prefix
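The difference between the two approaches above can be sketched in a few lines of Python. This is only an illustration, not how a database engine is implemented: `full_scan` and `prefix_lookup` are hypothetical helper names, and a sorted list with binary search stands in for an ordered index on the content column.

```python
from bisect import bisect_left


def full_scan(rows, needle):
    """Look at every row for the substring; cost grows with the total amount of text."""
    return [row for row in rows if needle in row]


def prefix_lookup(sorted_rows, prefix):
    """Binary-search a sorted list, then read forward while the prefix still matches."""
    start = bisect_left(sorted_rows, prefix)
    matches = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break
        matches.append(row)
    return matches


rows = ["elastic search", "full text search", "search engine", "sql index"]
sorted_rows = sorted(rows)  # stand-in for an ordered index on the column

scan_hits = full_scan(rows, "search")             # finds "search" anywhere in the text
prefix_hits = prefix_lookup(sorted_rows, "search")  # finds only rows that start with it
```

The ordered structure answers "starts with" quickly, but it cannot help with a word in the middle of the text, which is the gap that full-text indexing fills.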
Full Text - Indexing
Full Text - Indexing
• Tokenizer - how to decide which words get indexed
• Sorting - how well does the query match the document?
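Full Text - Indexing
• Should a, an, and the be searchable?
• If someone searches for '[…]', '[…]', or '[…]스', should '[…]스' be returned?
• If someone searches for "[…]스가" (a name plus the particle 가), should only "[…]스가" match, or should "[…]스에게" (the same name plus the particle 에게) match too?
• How should URLs be searched? When looking for google, can "google.com", "https://google.com", and so on be found?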
Full Text - Indexing
Tokenizer: the component that splits a document into searchable words (tokens)
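As a rough illustration of what the tokenizer feeds into, the sketch below splits text on non-word characters and builds an inverted index (token -> document IDs). It is a toy under stated assumptions: `tokenize` and `build_inverted_index` are hypothetical names, the regular expression is far simpler than any real Elasticsearch tokenizer, and Korean text would need a morphological analyzer rather than this split.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split on runs of non-word characters; a stand-in for a real tokenizer."""
    return [token for token in re.split(r"\W+", text) if token]


def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


docs = {
    1: "Visit https://google.com today",
    2: "google search tips",
}
index = build_inverted_index(docs)
google_docs = index["google"]  # both documents, because the URL was split into parts
```

With this splitting rule, a search for google also reaches the document that only contains the URL, which is one possible answer to the URL question raised earlier in the deck.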
Normalizer
• The component that converts particular tokens into other tokens
• Ex) turns plurals into singulars, past tense into present tense, and uppercase into lowercase
• The foxes loved the little prince => [the, fox, love, little, prince]
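The slide's example can be reproduced with a very small sketch. Everything here is an assumption made for illustration: `STEMS` is a hand-written lookup table standing in for a real stemmer, and the de-duplication step is added only so the output matches the slide, which lists "the" once.

```python
# Hypothetical lookup table standing in for a real stemmer.
STEMS = {"foxes": "fox", "loved": "love"}


def normalize(tokens):
    """Lowercase each token, map it to its base form, and keep the first occurrence only."""
    seen = set()
    result = []
    for token in tokens:
        term = token.lower()          # uppercase -> lowercase
        term = STEMS.get(term, term)  # plural -> singular, past -> present
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


terms = normalize("The foxes loved the little prince".split())
# terms == ["the", "fox", "love", "little", "prince"]
```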
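Sorting
• Elasticsearch uses an algorithm called BM25 to express how related a query and a document are as a score
• A formula that fits well in general
• Tokens that are too common get a lower weight
• The score goes up when a token is mentioned several times in a document
• Shorter documents are favored
A query-document relevance scoring formula that fits well in general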
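The three properties listed above can be seen in a compact sketch of the BM25 formula. This is a plain re-implementation for illustration, not Elasticsearch's code: `bm25_score` is a hypothetical name, documents are assumed to be already tokenized and normalized lists, and `k1=1.2` and `b=0.75` are the commonly cited default parameters.

```python
import math


def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query over a small in-memory corpus."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        doc_freq = sum(1 for d in corpus if term in d)
        if doc_freq == 0:
            continue  # a term that appears nowhere contributes nothing
        # Common tokens get a small IDF, rare tokens a large one.
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        term_freq = doc.count(term)
        # Repeated mentions raise the score with diminishing returns;
        # documents longer than average are penalized through b.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * term_freq * (k1 + 1) / (term_freq + length_norm)
    return score


corpus = [
    ["the", "fox", "love", "prince"],
    ["the", "fox", "the", "fox", "the", "little", "fox", "and", "the", "prince"],
    ["the", "little", "prince"],
]
rare = bm25_score(["love"], corpus[0], corpus)
common = bm25_score(["the"], corpus[0], corpus)
```

Here `rare` comes out higher than `common` because "love" occurs in one document while "the" occurs in all three.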
BM25 algorithm https://www.elastic.co/kr/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
• "[…]스가 좋아하는 노래" ("a song that […]스 likes") -> ["[…]스", "좋아하", "노래"]
Search algorithm / Fuzziness Why ?????
Search algorithm / Fuzziness
- Tokenizer / Normalizer
- (Are particles split off correctly?)
- Fuzziness (auto)
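To make "Fuzziness (auto)" concrete, here is a minimal sketch of edit-distance matching with length-based thresholds. The thresholds follow Elasticsearch's documented `AUTO` behavior (terms of 1-2 characters must match exactly, 3-5 characters allow one edit, longer terms allow two). The function names are hypothetical, and this version uses plain Levenshtein distance, whereas Elasticsearch by default also counts a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: inserts, deletes, and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_max_edits(term):
    """Allowed edits by term length, mirroring the AUTO thresholds described above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """True when the indexed term is within the allowed number of edits of the query term."""
    return edit_distance(query_term, indexed_term) <= auto_max_edits(query_term)


typo_ok = fuzzy_match("serch", "search")  # one missing letter, five-character query
```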
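Non-technical considerations when developing a search feature
A common misconception among non-developers
Planning a search feature != the search results screen
Search scope: title? body? author? comments? other metadata?
Content sorting: newest first? by search relevance? by likes?
Other search trade-offs
• How much user typo tolerance should there be? <-> planning, performance
• Should synonyms and near-synonyms be handled? <-> implementation difficulty, indexing performance, search performance
• How long are search queries? <-> implementation difficulty, search performance
• Are search queries words or sentences? <-> implementation difficulty, search performance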
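The planning questions above (scope, sorting, typo tolerance) map fairly directly onto parts of a search request. The sketch below builds a request body as a plain Python dictionary using the Elasticsearch query DSL's `multi_match` query, `fuzziness`, and `sort`. The function name and the field names (`title`, `content`, `created_at`, `like_count`) are hypothetical and would depend on the actual index mapping.

```python
def build_search_body(text, fields, sort_by="relevance", fuzzy=True):
    """Assemble a search request body from the planning decisions."""
    multi_match = {"query": text, "fields": fields}  # search scope
    if fuzzy:
        multi_match["fuzziness"] = "AUTO"  # typo tolerance
    sort_options = {
        "relevance": ["_score"],
        "newest": [{"created_at": {"order": "desc"}}],  # hypothetical date field
        "likes": [{"like_count": {"order": "desc"}}],   # hypothetical counter field
    }
    return {"query": {"multi_match": multi_match}, "sort": sort_options[sort_by]}


body = build_search_body("elasticsearch", ["title", "content"], sort_by="newest")
```

Each extra field, sort option, or tolerance setting widens what the feature can do, and each one carries the implementation and performance costs listed in the trade-offs.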