Python, Community, and Korean Open Data
Lucy Park
August 17, 2019
Programming
Keynote slides from PyCon Korea 2019.
Transcript
Python, Community, and Korean Open Data, 2019-08-17, Lucy Park
Lucy Park, a.k.a. lucypark, echojuliett, e9t • Machine learning engineer working on machine translation • Someone who wants to lower the barriers to knowledge with data • Yak shaver
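It was May 2019 • There was something I had wanted to say if I ever got the chance! • "Thank you for the invitation! I am not yet sure what I should talk about, but I will prepare so that I can say something useful." • With three whole months until the talk, there is plenty of time to prepare!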
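A meticulous plan • I will survey Korean NLP open source and data, share the current state, and propose that we advance it together! • For a keynote, a talk that highlights values such as inclusion, diversity, and community would be better than one that just conveys information • But this is PyCon, so I should talk about Python too • Then how about updating konlpy for the first time in three years? It would be nice to include various datasets in konlpy.download so people can download them easily!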
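And then August 2019 • konlpy has too much to fix... let's build it together! • Ah, and I have no slides yet either... • Luckily it is an early-morning talk, so not many people will come...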
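What I will talk about today • Sharing an awareness of the problem • Sharing my personal experience of contributing, however slightly, to the community (old stories) • A little about Korean open data too (stories the people around me hear all the time) • Image source: Sandra and Woo, "Richard's guide to software development"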
Why do you program?
Why program? • As a hobby • To make a living • For personal growth • To contribute to my community • (These items are not mutually exclusive.)
Old story one: Team Popong, civic hackers who loved Python
It was 2011
Wait, you say there is not enough material?
But it is right here: http://likms.assembly.go.kr
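But it is too hard, it is no fun, and I cannot even tell who is who • (This is the screen as it looked back then. The bill information system has since become somewhat more convenient!)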
The documents are PDFs with poor accessibility that read like academic papers
Books that search engines cannot reach • But National Assembly materials are a public good. Citizens should be able to search them!
Let's use the technology we have to change the world!
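Team Popong http://popong.com • About ten students, including designers, developers, and planners • Goal: to shake up Korean politics through technology • Growing ourselves while building services is a bonus! (or perhaps the bigger goal) • POPONG: Public Open POlitical engineeriNG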
We prototyped the service • (The prototyping tool we tried: Balsamiq. If you are curious about the prototype, go here.)
We also analyzed the network of Assembly members • (Read more about this here.)
We met every Saturday for planning, discussion, and study (we knew nothing at all about legislation or politics) • After spending two Christmases together like that...
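The web service "Everything about Korean politics" is born • Thanks to it, we also did a lot of crawling • (Sadly, we judged that it cost too much for the benefit it gave, so it is now out of service. It may yet come back into the world some day.)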
The web service "Everything about Korean politics" is born • We also learned front-end development
The web service "Everything about Korean politics" is born • We also became interested in D3 and NLP
The web service "Everything about Korean politics" is born • We also did PDF parsing
Released the South Korea/Seoul Maps data • We also learned about map projections such as Mercator • (Just converting shp files to TopoJSON took us a long time!)
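REST API for Assembly member and bill data • The web service was really a showcase • This API was the project we originally set out to do • We hoped more people would analyze the data or use it to build services • (Sadly, this too is now out of service and only the code remains. The National Assembly now provides the data itself: minutes, bills, members.)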
Open source, open data: an effort to share everything
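Spirited young people who thought they would shake up Korean politics • On the one hand, we received recognition from the society around us • But we learned that the world does not change that easily (and found out that servers are expensive) • And, above all, we gained precious good friends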
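We learned a good development culture, more valuable than any other experience • "Please document it." How to communicate inside and outside the team • "This is said to be good Git practice." Good manners to keep when communicating inside and outside the team • "Shall we try a framework called Flask for the new project?" How to pick up new tools • "Show your idea as a working prototype." How to get work done instead of debating in the abstract • "What is the license of that code/data?" An attitude of respect for other people's work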
Changing the world means changing myself and what is around me
It is fine if it takes some time. What you do not know, you can learn.
Just as open source is meaningful, so is processing data and releasing it as open data
Old story two: KoNLPy, a Python library born of necessity
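It was 2014 • My major originally had nothing to do with text; it was data mining • I started this while working on a school project, to make Korean text analysis more convenient • To analyze Korean you first have to tokenize, but how? • They say morphological analysis does the job, but is there any open source for it? • There seem to be a few options, but how do they differ in performance?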
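Enter KoNLPy
>>> from konlpy.tag import Okt
>>> okt = Okt()
>>> okt.pos('만나서 반가워요!')
[('만나서', 'Verb'), ('반가워요', 'Adjective'), ('!', 'Punctuation')]
• Gathers various open source morphological analyzers in one place
• Unifies their interfaces so they can be used conveniently alongside other libraries such as nltk
• Adds various utils needed when handling Hangul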
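As an illustration of the kind of small utility that comes up when handling Hangul, here is a minimal sketch that splits a precomposed Hangul syllable into its jamo using Unicode code point arithmetic. This is not KoNLPy's own code; the function name decompose and the constants are hypothetical and chosen only for this example.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode.
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3

# Jamo tables in Unicode order (compatibility jamo, for readability).
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none


def decompose(ch):
    """Return (lead, vowel, tail) for a Hangul syllable, or (ch,) otherwise.

    Hypothetical helper for illustration only.
    """
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return (ch,)  # not a precomposed syllable: pass it through unchanged
    index = code - HANGUL_BASE
    lead = index // (len(VOWELS) * len(TAILS))               # 588 syllables per lead
    vowel = (index % (len(VOWELS) * len(TAILS))) // len(TAILS)
    tail = index % len(TAILS)
    return (LEADS[lead], VOWELS[vowel], TAILS[tail])


if __name__ == "__main__":
    for ch in "한글!":
        print(ch, decompose(ch))
```

Note that a helper like this works on the script (Hangul) only; it knows nothing about the Korean language itself, which is what morphological analyzers are for.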
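KoNLPy and the first PyCon KR • Python news at the time: "PyCon is going to be held in Korea!" • Me: "Surely there are many people who suffer from the same inconvenience as I do?" • Let's package what I have been building and release it!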
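Why did people use KoNLPy? • Beginners: because it is easy to use? (It would be nice if it were even easier) • Students: because there are examples to follow? • Practitioners: because it makes comparing the performance of different implementations easy? • Non-Koreans: because the documentation is also written in English? • Environment: because Python then became hugely popular in Korea and abroad?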
I build the tools I need and share them • Then I can give and receive help I never imagined!
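The following year, a second PyCon KR • Personal motivation: the first experience was so good • Representation learning was just taking off in academia at the time • Shall I introduce word2vec and doc2vec to the Korean Python community? • I released KoNLPy as open source the year before, but there is hardly any toy data to go with it. Let's start by making a dataset in Korean!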
Korean movie review data: nsmc
$ head ratings_train.txt
id document label
9976970 아 더빙.. 진짜 짜증나네요 목소리 0
3819312 흠...포스터보고 초딩영화줄....오버연기조차 가볍지 않구나 1
10265843 너무재밓었다그래서보는것을추천한다 0
9045019 교도소 이야기구먼 ..솔직히 재미는 없다..평점 조정 0
6483659 사이몬페그의 익살스런 연기가 돋보였던 영화!스파이더맨에서 늙어보이기만 했던 커스틴 던스트가 너무나도 이뻐보였다 1
5403919 막 걸음마 뗀 3세부터 초등학교 1학년생인 8살용영화.ㅋㅋㅋ...별반개도 아까움. 0
7797314 원작의 긴장감을 제대로 살려내지못했다. 0
9443947 별 반개도 아깝다 욕나온다 이응경 길용우 연기생활이몇년인지..정말 발로해도 그것보단 낫겟다 납치.감금만반복반복.. 0
7156791 액션이 없는데도 재미 있는 몇안되는 영화 1
• A dataset for classifying movie reviews as positive or negative
• Modeled on the IMDB dataset (English) of Maas et al. 2011
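A file laid out like ratings_train.txt can be read with Python's standard csv module. The sketch below assumes the three columns (id, document, label) are tab-separated, which the slide itself does not state, and it uses a three-row in-memory sample with the same layout instead of the real file. read_reviews is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Three sample rows in the id / document / label layout.
# Assumption: columns are separated by tabs.
SAMPLE = (
    "id\tdocument\tlabel\n"
    "9976970\t아 더빙.. 진짜 짜증나네요 목소리\t0\n"
    "7797314\t원작의 긴장감을 제대로 살려내지못했다.\t0\n"
    "7156791\t액션이 없는데도 재미 있는 몇안되는 영화\t1\n"
)


def read_reviews(fileobj):
    """Return a list of (document, label) pairs from a tab-separated file object."""
    # QUOTE_NONE: review text may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["document"], int(row["label"])) for row in reader]


if __name__ == "__main__":
    reviews = read_reviews(io.StringIO(SAMPLE))
    # Count how many reviews carry each label (0 = negative, 1 = positive).
    print(Counter(label for _, label in reviews))
```

To read the real file, the same function could be given `open("ratings_train.txt", encoding="utf-8", newline="")` instead of the in-memory sample.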
Even if the skills I have are nothing remarkable, I can contribute to the community • With anything!
Finally, about Korean open data these days
Before we begin, something I really wanted to say
The alphabet is not English! It is English NLP, not Alphabet NLP
Hangul is not Korean! It is Korean NLP, not Hangul NLP • Hangul is a script; Korean is a language
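Korean open data status: Sequence classification / labeling
Released | Name | Purpose | Size | License
2007 | Sejong corpus | POS tagging | 838k sentences | CC BY-NC-ND 4.0
2012 | KOSAC | Sentiment classification (a sentiment lexicon is also distributed) | 7.7k sentences | Custom
2015 | nsmc | Sentiment classification | 200k sentences (Train: 150k, Test: 50k) | CC0 (Public Domain)
2016 | KoreanNERCorpus | NER | 3.5k sentences | -
2018 | nlp-challenge | NER | 90k sentences | -
2018 | nlp-challenge | SRL | 35k sentences | -
2018 | Question pair | Paraphrase detection | 7k sentence pairs | -
A dash in the License column means that the license is not stated or that I could not find it.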
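Korean open data status: Sequence generation
Released | Name | Purpose | Size | License
2015 | JPO patent corpus | Machine translation (ja-ko, patent domain) | 257k sentence pairs | Custom (no commercial use)
2017 | Korean parallel corpora | Machine translation (ko-{en, fr, ja}) | 0.7k, 95k, 0.2k sentence pairs per language pair | CC BY-NC-ND 3.0 (no commercial use)
2018 | KSS | Speech synthesis/recognition | 12.9k pairs | CC BY-NC-SA 4.0 (no commercial use)
2018 | Chatbot data | Dialogue | 12k pairs | -
2018 | OpenSubtitles2018 | Machine translation (ko-*) | 1.3m sentence pairs (for ko-en only) | -
2019 | AIHub Korean-English translation parallel corpus | Machine translation (ko-en) | 16k sentence pairs (to be 1.6m within 2019) | -
A dash in the License column means that the license is not stated or that I could not find it.
For AIHub, no license is stated, but since it is a government project aimed at small businesses, ventures, startups, individual developers, researchers, and others, my guess is that it may be CC0 or CC BY.
I am also interested in machine translation, so I could probably list relatively more datasets here than for other areas. If you know a good dataset, please share it on a list such as awesome-korean-nlp.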
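Korean open data status: Others
Released | Name | Purpose | Size | License
2018 | KorQuAD | MRC | 66k Q/A pairs | -
2019? | AIHub machine reading comprehension | MRC | 450k Q/A pairs | -
2019 | cc-kedict | Korean-English dictionary | 13.7k entries | CC BY-SA 3.0
2019 | kosentences | Self-supervised learning | 31m sentences | MIT + GNU Free Documentation + CC BY-NC-SA
A dash in the License column means that the license is not stated or that I could not find it.
I have not been able to try some of these myself.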
More than I expected! Though it would be nice if there were even more...
Good example 1: KorQuAD https://korquad.github.io/ • Released a sufficient amount of data • Even shares a leaderboard!
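Good example 2: KSS https://kaggle.com/bryanpark/korean-single-speaker-speech-... • The first open speech dataset in Korea • A fairly large amount of data • The license is stated clearly, so it is obvious what you can and cannot do with it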
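Why do we need open data?
1. Because it can serve as a benchmark!
• ML models can only be compared with one another when the data and the evaluation are the same
• Once comparison is possible, technical progress follows
2. Because anyone can start analysis or modeling right away!
• Just as reinventing the wheel is often something to guard against in programming
• not everyone needs to collect and clean the data themselves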
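The benchmark point can be sketched in a few lines: two models become comparable only when they are scored on the same test data with the same metric. Everything in the sketch below is hypothetical and exists only for illustration: the toy English test set, the two stand-in "models", and the accuracy helper.

```python
# A shared, fixed test set: (text, label) pairs, 1 = positive, 0 = negative.
# Hypothetical toy data, not taken from any real dataset.
TEST_SET = [
    ("great fun", 1),
    ("boring and slow", 0),
    ("fun but slow", 1),
    ("not fun", 0),
]


def accuracy(predict, examples):
    """Fraction of examples for which predict(text) equals the gold label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def always_positive(text):
    """Baseline stand-in model: predicts positive for every input."""
    return 1


def keyword_model(text):
    """Slightly better stand-in model: positive if the word 'fun' appears."""
    return 1 if "fun" in text else 0


if __name__ == "__main__":
    # Same data, same metric: the two numbers are directly comparable.
    for model in (always_positive, keyword_model):
        print(model.__name__, accuracy(model, TEST_SET))
```

If each model were instead scored on its own private data, the two numbers would say little about which model is better, which is the role a shared open dataset plays.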
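Things to check when releasing data
1. Can users download it and use it right away?
• Where possible, let people use it immediately, without sign-up or a terms-of-use consent step
2. Are there any personal information or copyright problems in the source text?
• Open data matters a great deal for technical progress, but personal information and copyright must also be respected and protected!
3. Where possible, please be sure to state the license!
• This tells users what they can and cannot do without having to ask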
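Things to check when using open data
1. Is the data large enough? Is there enough of it to produce meaningful results?
2. What is the license? Is redistribution allowed? May it be used commercially?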
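Call for participation • Let's advance Korean NLP together! • You can contribute through open source and through open data • Open source in Python is great, but it does not have to be Python • For open data, let's have government, companies, and schools all take part, and individual developers too, of course • Let's also contribute to and share in fields other than Korean NLP!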
Thank you • @echojuliett • https://lucypark.kr • Thanks to @lovit and @shurain for looking over the slides and giving valuable feedback.