Python, Community, and Korean Open Data
Lucy Park
August 17, 2019
Slides from the PyCon Korea 2019 keynote.
Transcript
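Python, Community, and Korean Open Data 2019-08-17 Lucy Park
Lucy Park, a.k.a. lucypark, echojuliett, e9t
• A machine learning engineer working on machine translation
• Someone who wants to lower the barriers to knowledge with data
• Yak shaver 2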
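It was May 2019
• There was something I had wanted to say if I ever got the chance!
• "Thank you for the invitation! I'm not sure yet what I should talk about, but I'll prepare so that I can say something useful."
• With three months until the talk, there's plenty of time to prepare! 3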
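A meticulous plan
• I'll survey Korean NLP open source and open data, share the current state, and propose that we improve it together!
• Since it's a keynote, a talk that highlights values such as inclusion, diversity, and community would be better than one that just conveys information
• But it's PyCon, so I should talk about Python too
• In that case, why not update konlpy for the first time in three years? It would be nice to bundle various datasets into konlpy.download so people can download them easily! 4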
And then August 2019
• konlpy has too much to fix... let's build it together!
• Ah, I don't even have slides yet...
• Luckily it's an early-morning talk, so not many people will come... 5
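Today's topics
• Sharing the problem I care about
• Sharing my personal experience of contributing, however modestly, to the community (old stories)
• A little about Korean open data as well (stories the people around me hear all the time)
Image source: Sandra and Woo, "Richard's guide to software development" 6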
Why do you program? 7
Why program?
• As a hobby
• To make a living
• For personal growth
• To contribute to my community
(These items are not mutually exclusive.) 8
Old story one: Popong, civic hackers who loved Python 9
It was 2011 10
11
12
Wait, you say there isn't enough material? 13
It's right here: http://likms.assembly.go.kr 14
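But it's too difficult and dull, and I can't even tell who is who. (This is how the screen looked at the time. The bill information system has since become somewhat more convenient!) 15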
The documents are PDFs with poor accessibility that read very much like academic papers 16
A policy that keeps search engines out. But National Assembly materials are a public good. Citizens should be able to search them! 17
Let's use the technology we have to change the world! 18
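Popong http://popong.com
• About ten students, including designers, developers, and planners, none of them politics specialists
• The goal: to shake up Korean politics through technology
• Growing ourselves while building services was a bonus! (Or perhaps the bigger goal)
POPONG = Public Open POlitical engineeriNG 19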
We prototyped the service. (Our favorite prototyping tool at the time was Balsamiq! If you are curious about the prototyping process, go here.) 20
We also analyzed networks of legislators. (Read more about this here.) 21
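We met every Saturday for planning, discussion, and study. (Most of us knew nothing at all about the legislative process.) After spending two Christmases together that way... 22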
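The web service "Everything about Korean Politics" is born. Because of it, we did a lot of crawling. (Unfortunately, we judged that it cost too much relative to its usefulness, and it is now out of service. It could still return to the world someday.) 23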
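The web service "Everything about Korean Politics" is born. We learned front-end development too 24
The web service "Everything about Korean Politics" is born. We became interested in D3 and NLP 25
The web service "Everything about Korean Politics" is born. We parsed PDFs too 26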
South Korea/Seoul Maps data release. We learned about map projections such as Mercator. (Just converting shp files to TopoJSON took a very long time!) 27
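REST API for National Assembly member and bill data
• The web service was really a showcase
• This API was the project we originally set out to build
• We hoped more people would analyze the data or use it to build services
(Unfortunately this is also out of service now and only the code remains, but the National Assembly now provides the data itself: meeting minutes, bills, members.) 28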
Open source, open data: an effort to share everything 29
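Ambitious young people who thought they would shake up Korean politics
• On the one hand, we earned recognition from the people around us and from society
• But we learned that the world does not change that easily (and that servers are expensive)
• Above all, we gained precious good friends 30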
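I learned a good development culture, more valuable than any other experience
• "Please document it." How to communicate inside and outside the team
• "This is what good Git practice looks like." Good manners to keep when communicating inside and outside the team
• "Shall we try a framework called Flask for the new project?" How to pick up new tools
• "Show your idea as a working prototype." How to get past armchair debate and move the work forward
• "What is the license of that code/data?" An attitude of respecting other people's work 31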
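Changing the world means changing myself and the people around me 32
It's fine if it takes a while. What you don't know, you can learn 33
Just as open source has value, processing data and releasing it as open data has value too 34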
Old story two: KoNLPy, a Python library born of necessity 35
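It was 2014
• My field was originally data mining on numerical data, nothing to do with text
• It started while I was working on a school project, to make Korean text analysis more convenient
• To analyze Korean you first have to tokenize it, but how?
• They say morphological analysis does the job, but is there any open source for it?
• There seem to be a few options, but how do they differ in performance? 36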
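Enter KoNLPy
>>> from konlpy.tag import Okt
>>> okt = Okt()
>>> okt.pos('만나서 반가워요!')
[('만나서', 'Verb'), ('반가워요', 'Adjective'), ('!', 'Punctuation')]
• Gathers a range of open-source morphological analyzers in one place
• Unifies their interfaces so they can be used conveniently alongside other libraries such as nltk
• Adds various utils needed when handling Hangul 37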
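As an illustration of the kind of Hangul-handling util the slide mentions, here is a minimal sketch that splits a precomposed Hangul syllable into its initial, medial, and final jamo using the Unicode code-point arithmetic. This is not KoNLPy's own code; the function names `decompose` and `compose` are hypothetical and exist only for this example.

```python
import unicodedata

HANGUL_BASE = 0xAC00      # first precomposed syllable
HANGUL_LAST = 0xD7A3      # last precomposed syllable
N_JUNG, N_JONG = 21, 28   # medial vowels, and final slots including "no final"


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final).

    Hypothetical helper for illustration. Returns None for any input that
    is not a single precomposed Hangul syllable.
    """
    if len(syllable) != 1:
        return None
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None
    offset = code - HANGUL_BASE
    cho, rest = divmod(offset, N_JUNG * N_JONG)
    jung, jong = divmod(rest, N_JONG)
    initial = chr(0x1100 + cho)                 # conjoining initial consonant
    medial = chr(0x1161 + jung)                 # conjoining vowel
    final = chr(0x11A7 + jong) if jong else ""  # index 0 means no final
    return initial, medial, final


def compose(jamo):
    """Recombine conjoining jamo into a precomposed syllable via NFC."""
    return unicodedata.normalize("NFC", "".join(jamo))
```

The result uses conjoining jamo code points (U+1100 block), so `compose(decompose(ch))` round-trips any precomposed syllable, while non-Hangul input such as Latin letters yields `None`.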
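KoNLPy and the first PyCon KR
• My Python mentor at the time: "PyCon is being held in Korea!"
• Me: "Aren't there plenty of people running into the same inconvenience as me?"
• Let's package what I had been building properly and release it! 38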
39
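Why did people use KoNLPy?
• Beginners: because it is easy to use? (It would be nice if it were even easier)
• Students: because there are examples to follow?
• Practitioners: because it makes comparing the performance of different implementations convenient?
• Non-Korean speakers: because the documentation is also written in English?
• Environmental factor: because Python then surged in popularity in Korea and abroad? 40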
Build the tools you need yourself, and share them. Then you can give and receive help you never expected! 41
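The following year, the second PyCon KR
• Personal motivation: the first experience had been so good
• Representation learning was drawing attention in academia at the time
• Should I introduce word2vec and doc2vec to the Korean Python community?
• I released KoNLPy as open source last year, but there is hardly any toy data to go with it. Let's start by making a dataset in Korean! 42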
Korean movie review data: nsmc
$ head ratings_train.txt
id	document	label
9976970	아 더빙.. 진짜 짜증나네요 목소리	0
3819312	흠...포스터보고 초딩영화줄....오버연기조차 가볍지 않구나	1
10265843	너무재밓었다그래서보는것을추천한다	0
9045019	교도소 이야기구먼 ..솔직히 재미는 없다..평점 조정	0
6483659	사이몬페그의 익살스런 연기가 돋보였던 영화!스파이더맨에서 늙어보이기만 했던 커스틴 던스트가 너무나도 이뻐보였다	1
5403919	막 걸음마 뗀 3세부터 초등학교 1학년생인 8살용영화.ㅋㅋㅋ...별반개도 아까움.	0
7797314	원작의 긴장감을 제대로 살려내지못했다.	0
9443947	별 반개도 아깝다 욕나온다 이응경 길용우 연기생활이몇년인지..정말 발로해도 그것보단 낫겟다 납치.감금만반복반복..	0
7156791	액션이 없는데도 재미 있는 몇안되는 영화	1
• A dataset for classifying movie reviews as positive or negative
• Modeled on the IMDB dataset (English) of Maas et al. 2011 43
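To make the three-column layout in the `head` output concrete, here is a minimal sketch of reading it with Python's standard library. It assumes the file is tab-separated with an `id`, `document`, `label` header, and that label 1 means positive and 0 negative. The `read_ratings` helper is hypothetical, and the two sample rows stand in for the real `ratings_train.txt`.

```python
import csv
import io
from collections import Counter

# Two sample rows in the assumed tab-separated layout (id, document, label).
SAMPLE = (
    "id\tdocument\tlabel\n"
    "9976970\t아 더빙.. 진짜 짜증나네요 목소리\t0\n"
    "7156791\t액션이 없는데도 재미 있는 몇안되는 영화\t1\n"
)


def read_ratings(handle):
    """Parse rows into (id, document, label) tuples (hypothetical helper).

    QUOTE_NONE keeps quotation marks inside reviews as literal text
    instead of treating them as CSV quoting.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["id"], row["document"], int(row["label"])) for row in reader]


rows = read_ratings(io.StringIO(SAMPLE))
label_counts = Counter(label for _, _, label in rows)
```

With the real file, the same function could be called on `open("ratings_train.txt", encoding="utf-8", newline="")`, and `label_counts` would then show how balanced the two classes are.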
Even if my skills are nothing remarkable, I can contribute to the community. Anyone can, with anything! 44
Finally, Korean open data these days 45
Before we begin, something I really wanted to say 46
The alphabet is not English! It's not Alphabet NLP, it's English NLP 47
Hangul is not Korean! It's not Hangul NLP, it's Korean NLP. Hangul is a script; Korean is the language 48
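State of Korean open data: Sequence classification / labeling
Released | Name | Purpose | Size | License
2007 | Sejong Corpus | POS tagging | 838k sentences | CC BY-NC-ND 4.0
2012 | KOSAC | Sentiment classification (a sentiment lexicon is also distributed) | 7.7k sentences | Custom
2015 | nsmc | Sentiment classification | 200k sentences (Train: 150k, Test: 50k) | CC0 (Public Domain)
2016 | KoreanNERCorpus | NER | 3.5k sentences | -
2018 | nlp-challenge | NER | 90k sentences | -
2018 | nlp-challenge | SRL | 35k sentences | -
2018 | Question pair | Paraphrase detection | 7k sentence pairs | -
Where no license is listed (-), the license is not stated or I could not find it. 49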
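State of Korean open data: Sequence generation
Released | Name | Purpose | Size | License
2015 | JPO patent corpus | Machine translation ja-ko, patent domain | 257k sentence pairs | Custom (no commercial use)
2017 | Korean parallel corpora | Machine translation ko-{en, fr, ja} | 0.7k, 95k, 0.2k sentence pairs (per language pair) | CC BY-NC-ND 3.0 (no commercial use)
2018 | KSS | Speech synthesis/recognition | 12.9k pairs | CC BY-NC-SA 4.0 (no commercial use)
2018 | Chatbot data | Dialogue | 12k pairs | -
2018 | OpenSubtitles2018 | Machine translation ko-* | 1.3m sentence pairs (For ko-en only) | -
2019 | AIHub Korean-English translation parallel corpus | Machine translation ko-en | 16k sentence pairs (To be 1.6m within 2019) | -
Where no license is listed (-), the license is not stated or I could not find it.
AIHub does not state a license, but since it is a government project targeting small businesses, venture companies, startups, individual developers, researchers, and others, my guess is that it may be CC0 or CC BY.
Also, I could probably list relatively more here than in other areas because I am interested in machine translation. If you know of a good dataset, please share it on a list such as awesome-korean-nlp. 50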
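State of Korean open data: Others
Released | Name | Purpose | Size | License
2018 | KorQUAD | MRC | 66k Q/A pairs | -
2019? | AIHub machine reading comprehension | MRC | 450k Q/A pairs | -
2019 | cc-kedict | Korean-English dictionary | 13.7k entries | CC BY-SA 3.0
2019 | kosentences | Self-supervised learning | 31m sentences | MIT + GNU Free Documentation + CC BY-NC-SA
Where no license is listed (-), the license is not stated or I could not find it. I have not been able to try some of these myself. 51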
More than I expected! Though more would be even better... 52
Good example 1: KorQUAD https://korquad.github.io/
• Released a sufficient amount of data
• Even shares a leaderboard! 53
Good example 2: KSS https://kaggle.com/bryanpark/korean-single-speaker-speech-...
• The first open speech dataset in Korea
• A fairly large amount of data
• The license is stated clearly, so it is clear what you can and cannot do 54
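Why does open data matter?
1. Because it can serve as a benchmark!
• ML models can only be compared with one another when the data and the evaluation are the same
• Once comparison is possible, technical progress follows
2. Because anyone can start analysis or modeling right away!
• Just as reinventing the wheel is often warned against in programming
• Not everyone needs to collect and clean the data themselves 55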
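Things to check when releasing data
1. Can users download it and use it right away?
• Where possible, make it usable immediately, without sign-up or terms-of-use consent steps
2. Are there personal-information or copyright problems in the source text?
• Open data is very important for technical progress, but personal information and copyright must also be respected and protected!
3. Where possible, be sure to state the license!
• So that users know what is and is not allowed without having to ask 56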
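Things to check when using open data
1. Is the dataset large enough? Is it sufficient to produce meaningful results?
2. What is the license? Is redistribution allowed? May it be used commercially? 57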
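Call for participation
• Let's advance Korean NLP together!
• You can contribute through open source and through open data
• Open source in Python is great, but it does not have to be Python
• For open data, let's have government, companies, and schools all take part, and individual developers too, of course
• Beyond Korean NLP, let's contribute to other fields as well and share with each other! 58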
Thank you @echojuliett https://lucypark.kr
Thanks to @lovit and @shurain, who looked over the slides and gave valuable feedback. 59