Orthographic Variation in Japanese: Examining and Implementing a Solution
Takahiko Ito
November 18, 2017
Examines ways to resolve orthographic variation (表記ゆれ) in Japanese and introduces an implementation.
Transcript
Orthographic Variation in Japanese: Examining and Implementing a Solution. Takahiko Ito
About me: I develop RedPen, a tool for checking documents. Most of my work is in search and mining. I am currently raising a child.
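Background: orthographic variation. Anyone who writes documentation for an OSS project ends up struggling with orthographic variation (表記ゆれ). Examples: ヴェトナム vs ベトナム (variation in characters); Excel vs エクセル (variation in script); 行う vs 行なう (okurigana). This talk introduces an idea for dealing with orthographic variation and an implementation of it that can be used in RedPen.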
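The problem of orthographic variation: In most cases it is not critical. However, it is costly to detect. It is especially likely to arise when several people maintain documentation together.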
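Example: a document containing orthographic variation. "Much of the software in use today runs on multiple computers (distributed). By running on multiple computers, such distributed software can handle large volumes of data and cope with heavy load. In this article, each サーバー (server) running on the multiple computers (Cluster) is called an 「インスタンス」 (instance). For example, search engines and databases split the インデックス (index) across the instance of the クラスタ (cluster) and hold it there. In such cases, a mechanism is needed that merges the results from each インデクス and returns them to the client program." The variants in this passage are Cluster / クラスタ, インスタンス / instance, and インデックス / インデクス.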
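Approach so far: RedPen can already handle katakana words to some extent. It extracts the katakana words from a document, then extracts pairs of words whose spellings are close in edit distance. Problem: this does not work for words other than katakana.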
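The katakana check described on this slide can be sketched roughly as follows. This is a minimal illustration and not RedPen's actual code: the function names, the regular expression, and the distance threshold of 1 are all assumptions made for the example.

```python
import re
from itertools import combinations

# Runs of katakana (U+30A1-U+30FA) plus the prolonged sound mark (U+30FC).
KATAKANA_WORD = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similar_katakana_pairs(text: str, max_distance: int = 1):
    """Return pairs of distinct katakana words within max_distance edits."""
    words = sorted(set(KATAKANA_WORD.findall(text)))
    return [(a, b) for a, b in combinations(words, 2)
            if edit_distance(a, b) <= max_distance]


# Hypothetical input containing two katakana variant pairs.
text = "インデックスをサーバーで分割します。各インデクスの結果をサーバに返します。"
pairs = similar_katakana_pairs(text)
print(pairs)
```

Because the regular expression only matches katakana, a pair such as Cluster / クラスタ never becomes a candidate, which is the limitation the slide points out.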
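New approach: I came up with a method that focuses on how words are read. I have not surveyed prior work, so this may well be a common technique. Most orthographic variants share the same reading; for example, 「Cluster」 is read as 「クラスタ」. The approach: extract words that share a reading as candidate errors.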
Actual processing: Index each word in the document by its reading. Report as an error any pair of different words that share the same reading.
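Example: variation detection. Input text: "Each サーバー running on the multiple computers (Cluster) is called an 「インスタンス」. For example, a search engine or database server splits the インデックス across the instance of the クラスタ and holds it there." The words are then indexed by reading:
Reading | Word list
クラスタ | Cluster, クラスタ
フクスウ | 複数
サーバ | server, サーバー
… | …
A pair of words with the same reading is a possible orthographic variant.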
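The indexing step in this example can be sketched as below. The reading table is a hand-written stand-in for a real dictionary lookup, and the function and variable names are hypothetical; the entries mirror the table on the slide.

```python
from collections import defaultdict

# Hypothetical reading lookup; a real system would query a morphological
# analyzer dictionary instead of this hand-written table.
TOY_READINGS = {
    "Cluster": "クラスタ",
    "クラスタ": "クラスタ",
    "複数": "フクスウ",
    "server": "サーバ",
    "サーバー": "サーバ",
}


def find_variation_candidates(words, readings):
    """Index words by reading; keep readings shared by 2+ distinct words."""
    index = defaultdict(set)
    for word in words:
        reading = readings.get(word)
        if reading is not None:  # skip words the dictionary does not know
            index[reading].add(word)
    return {r: sorted(ws) for r, ws in index.items() if len(ws) > 1}


words = ["複数", "Cluster", "サーバー", "server", "クラスタ", "複数"]
candidates = find_variation_candidates(words, TOY_READINGS)
for reading, surfaces in sorted(candidates.items()):
    print(reading, surfaces)
```

Storing a set per reading means a word that merely repeats (複数 here) is not reported; only readings with at least two different spellings become candidates.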
How do we obtain the readings? Solution: rely on a dictionary! Several dictionaries are used by Japanese morphological analyzers: NAIST-JDIC, Ipadic, Unidic, NEologd.
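Dictionary chosen: NEologd. NEologd is a large-scale dictionary that provides reading information. Dictionary size: 3 million entries. Reading information is provided for every word.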
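As a rough sketch of how readings might be pulled out of such a dictionary, the snippet below assumes the MeCab IPAdic CSV layout, in which the reading is the twelfth comma-separated field. The sample rows are written for illustration only and are not copied from NEologd; the function name and the zeroed context IDs and costs are assumptions.

```python
import csv
import io

# Illustrative rows in the MeCab IPAdic CSV layout (not real NEologd data).
SAMPLE_CSV = """\
行う,0,0,0,動詞,自立,*,*,五段・ワ行促音便,基本形,行う,オコナウ,オコナウ
行なう,0,0,0,動詞,自立,*,*,五段・ワ行促音便,基本形,行なう,オコナウ,オコナウ
短い,0,0,0,形容詞,自立,*,*,形容詞・アウオ段,基本形,短い,ミジカイ,ミジカイ
"""

SURFACE_FIELD = 0
READING_FIELD = 11  # IPAdic layout: ..., base form, reading, pronunciation


def load_readings(lines):
    """Build a surface-form -> reading map from IPAdic-format CSV rows."""
    readings = {}
    for row in csv.reader(lines):
        if len(row) > READING_FIELD:
            # Keep the first reading seen for each surface form.
            readings.setdefault(row[SURFACE_FIELD], row[READING_FIELD])
    return readings


readings = load_readings(io.StringIO(SAMPLE_CSV))
print(readings)
```

Keeping only the first reading per surface form is a simplification; words with several readings would need a list per entry.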
Trying it out: the implementation has been added to RedPen v… as JapaneseExpressionVariation. http://redpen.herokuapp.com
Trying it out (on a fairly large book): I fed in the source of a book I wrote some time ago and checked what kind of results came back.
Results: what worked. The check found orthographic variants that are hard to notice by hand. Examples: 行なう / 行う; 短い / みじかい.
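Results: what did not work. Pairs that share a reading but are not orthographic variants are also picked up. Examples: 仕様 / 使用 (both nouns); 別記 (noun) / べき (auxiliary verb).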
Care is needed when using the orthographic variation extraction feature.
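How to use it. Problem: harmless cases are extracted as well. It seems best to leave this check out of routine runs and execute it only when needed. → Since RedPen v…, the error level can be selected.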
RedPen: specifying the error level
<redpen-conf lang="ja">
  <validators>
    <validator name="JapaneseExpressionVariation" level="Info"/>
    <validator name="DoubleNegative"/>
    <validator name="ParagraphNumber"/>
    <validator name="ListLevel"/>
  </validators>
</redpen-conf>
With the level set to Info, the errors are not output by default (to see them, pass the parameter "--threshold info").
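The effect of a level threshold can be illustrated with a small sketch. This is not RedPen's implementation: the class names, the three-level ordering, the default threshold of WARN, and the level given to DoubleNegative are all assumptions chosen to match the behavior described on the slide.

```python
from enum import IntEnum
from typing import NamedTuple


class Level(IntEnum):
    """Hypothetical severity levels, ordered from least to most severe."""
    INFO = 0
    WARN = 1
    ERROR = 2


class Finding(NamedTuple):
    validator: str
    level: Level
    message: str


def filter_by_threshold(findings, threshold=Level.WARN):
    """Keep only findings whose level is at or above the threshold."""
    return [f for f in findings if f.level >= threshold]


findings = [
    Finding("JapaneseExpressionVariation", Level.INFO, "Cluster / クラスタ"),
    Finding("DoubleNegative", Level.ERROR, "double negative found"),
]
default_view = filter_by_threshold(findings)              # routine run
verbose_view = filter_by_threshold(findings, Level.INFO)  # on-demand run
print(len(default_view), len(verbose_view))
```

Under these assumptions the noisy variation check stays configured all the time but only surfaces when the threshold is lowered explicitly.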
DEMO: redpen -t info -c conf/redpen-conf-ja.xml sample-doc/ja/sampledoc-ja.re
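Summary: This talk introduced a way to handle Japanese orthographic variation by using readings. Given a dictionary (NEologd), the implementation is not difficult. It may be handy to build one for your own use. For how to use the orthographic variation check in RedPen, see http://bit.ly/…zOWNgi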
Thank you for your attention.