Yustoris
March 29, 2019
Scene Text Detection and Recognition: The Deep Learning Era
Transcript
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
@yustoris on arXivTimes
Overview
• A survey of deep-learning-based methods for scene text recognition
• Looks back over the history of the field and covers everything from method trends to datasets
1. Introduction + 2. Methodology Before the Deep Learning Era
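Difficulties of scene text recognition
• Diversity: language, shape (typeface, glyph form, writing style), orientation, color, and aspect ratio all vary
• Background: when background shapes closely resemble characters, accuracy suffers badly
• Image quality: in poor-quality images the characters are more heavily crushed or blurred, and accuracy suffers badly
[1] Excerpted from https://www.morisawa.co.jp/culture/dictionary/…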
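Scene text recognition before deep learning
• Feature extraction → character-level extraction → line detection → transcription
• A pipeline combining a variety of models (figure from the paper)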
3. Methodology in the Deep Learning Era
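Trends in methods
• Step-wise: two stages, detection followed by recognition
  • Detection: extract the text regions
  • Recognition: transcribe the content of the extracted text regions
• End-to-end: run detection and recognition as a single pipeline
(figure from the paper)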
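Trends in methods by category (figure from the paper)
• Detection: builds on general object detection methods, extended to fit characteristics typical of text regions (e.g. varied orientations and aspect ratios)
• Recognition: dominated by two approaches, Connectionist Temporal Classification (CTC) and attention
• End-to-end: joins the detection and recognition models
• Auxiliary technologies, mainly:
  • Synthetic data generation
  • Semi-supervised learning for character- and word-region annotation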
3.1 Detection
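Overview
• The basic approach is to extend models built for general object detection; these fall broadly into anchor-based and region-proposal methods
• Detection granularity follows two broad patterns:
  • Detect the whole text with a single bounding box (BB)
  • Detect finer units (e.g. words) and join them afterwards
(image excerpted and partly modified from the SegLink paper [Shi et al.])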
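Basic strategies for region detection
• Anchor-based
  • Divide the input image into a fixed grid and, for each grid cell, estimate several BBs (anchors) centered on a point in that cell; BB candidates use fixed aspect ratios
  • Base models include YOLO [Redmon et al.] and SSD [Liu et al.]
• Region proposal
  • Estimate candidate text regions (region proposals) from the input image using features and similar cues, then decide for each candidate whether it is a text region
  • Base models include R-CNN [Girshick et al.]
• Flow: region detection with a neural network → post-processing
Anchor-based
• Example: TextBoxes [Liao et al.], based on SSD
• In later stages the grid becomes coarser and the anchors larger, which adjusts the granularity of BB detection
• BB information is obtained by pooling
• The loss combines the position error between the predicted and ground-truth BBs with the difference in class confidence
(figure labels: grid, bounding box (BB); image excerpted and partly modified from the YOLO paper)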
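The anchor layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the YOLO, SSD, or TextBoxes implementation: the function name `grid_anchors`, the use of cell centers as anchor centers, and the particular scale and aspect ratios are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def grid_anchors(image_w: int, image_h: int, grid: int,
                 scale: float, aspect_ratios: List[float]) -> List[Box]:
    """Place one anchor per aspect ratio at the center of every grid cell."""
    cell_w, cell_h = image_w / grid, image_h / grid
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h
            for ratio in aspect_ratios:  # ratio = width / height
                # Keep the anchor area at scale**2 while changing its shape.
                w = scale * ratio ** 0.5
                h = scale / ratio ** 0.5
                anchors.append((cx, cy, w, h))
    return anchors


# Wide ratios (assumed values) reflect the elongated shape of text lines.
anchors = grid_anchors(300, 300, grid=3, scale=100.0,
                       aspect_ratios=[1.0, 2.0, 5.0])
print(len(anchors))  # 3 x 3 cells x 3 ratios = 27
print(anchors[0])
```

A coarser grid with a larger `scale` would correspond to the later stages mentioned above, where fewer cells carry larger anchors.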
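Region proposal
• Example: [Ma et al.], based on Faster R-CNN
• Extends Faster R-CNN by also taking region rotation into account when extracting region proposals
• Extract region proposals, then classify each one as background or text region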
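Text-specific methods
• Instead of detecting the whole text in one pass, detect small units and join them afterwards
• Text regions vary more than general objects in orientation and other properties, so detecting a single BB outright can be a poor fit
• The units are either small parts of a text region (components) or pixels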
Component level
• Example: SegLink (figure from the SegLink paper)
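Pixel level
• Example: PixelLink [Deng et al.] (figures from the PixelLink paper)
• For each pixel, decide whether each of its neighboring pixels belongs to the same text region
• No prior BB estimation is needed, and text regions that lie close together are easy to pick out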
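The pixel-level grouping above can be illustrated with a small union-find sketch. It rests on simplifying assumptions that are not from the slides: the function name `group_pixels` is hypothetical, an 8-pixel neighborhood is assumed, and a link is treated as positive whenever both pixels are marked as text, whereas a trained model would predict each link separately.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column)

# Offsets of the 8 surrounding pixels (an assumption for this sketch).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]


def group_pixels(text_mask: List[List[int]]) -> List[List[Pixel]]:
    """Group text pixels into regions by joining linked neighbors."""
    parent: Dict[Pixel, Pixel] = {}

    def find(p: Pixel) -> Pixel:
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for r, row in enumerate(text_mask):
        for c, is_text in enumerate(row):
            if is_text:
                parent[(r, c)] = (r, c)

    for (r, c) in list(parent):
        for dr, dc in NEIGHBORS:
            q = (r + dr, c + dc)
            if q in parent:  # simplified "positive link" test
                parent[find((r, c))] = find(q)

    regions: Dict[Pixel, List[Pixel]] = {}
    for p in parent:
        regions.setdefault(find(p), []).append(p)
    return sorted(sorted(pixels) for pixels in regions.values())


mask = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
regions = group_pixels(mask)
print(len(regions))  # 2
print(regions)
```

Because regions emerge from the links alone, no bounding box has to be estimated first; a box can be fitted to each pixel group afterwards.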
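Specific targets
• Mainly addresses the extreme aspect ratios, distortion, curvature, and unusual fonts common on signboards and similar surfaces
• For curved text, for example, TextSnake [Long et al.] attempts region extraction based on circles rather than BBs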
3.2 Recognition
Overview
• Transcribe the text regions extracted by detection
• Most methods are RNN-based; among them, CTC (Connectionist Temporal Classification) and attention are widely used
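CTC [Graves et al.]
• A loss function for computing character-level generation probabilities, with a blank symbol "_" included among the characters
• Alignment between input and output is handled at the same time, so the mismatch between input and output (such as their differing lengths) need not be handled separately
• Input: HHHH_eell_lloo_ → Output: Hello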
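The mapping from the per-frame sequence to the final string in the example above follows the usual CTC collapsing rule: merge consecutive repeats, then drop blanks. The sketch below shows only this decoding step, not the CTC loss itself; the function name `ctc_collapse` is an assumption made for illustration.

```python
def ctc_collapse(path: str, blank: str = "_") -> str:
    """Collapse a frame-level path: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


print(ctc_collapse("HHHH_eell_lloo_"))  # Hello
```

The blank between the two runs of "l" is what keeps the double letter: without it, "llll" would collapse to a single "l".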
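CRNN [Shi et al.]
• Transcribes with a bi-LSTM + CTC, taking feature vectors as input (the name is easily confused with R-CNN)
• The feature vectors are extracted in the stage before the LSTM
Attention
• Borrows attention from machine translation [Bahdanau et al.; Luong et al.]
• An earlier stage, using convolution or similar, extracts from the input image the feature vectors that serve as the decoder's input [Arbitrarily-oriented text recognition, Cheng et al.]
• In some cases extra measures are taken, such as giving the decoder character-level BBs as an auxiliary input [Focusing attention: Towards accurate text recognition in natural images, Cheng et al.]
(figure label: input feature vectors; figure from the paper)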
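As a rough illustration of how a decoder can attend over the extracted feature vectors, the sketch below computes attention weights and a context vector for one decoding step. It assumes plain dot-product scoring, one of the scoring variants from the machine translation literature, with no learned parameters; the function name `attend` and the toy numbers are hypothetical and are not taken from the cited recognition models.

```python
import math
from typing import List, Tuple


def attend(query: List[float],
           features: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoder step."""
    # Score each feature vector by its dot product with the decoder state.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    # Softmax over the scores, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(dim)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. one vector per image column
weights, context = attend([2.0, 0.0], features)
print(weights)
print(context)
```

In a recognizer, the context vector would be fed to the decoder together with its previous state to predict the next character.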
3.3 End-to-end System
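Overview
• Join the detection and recognition models as they are: text regions found by the detection model become the input to the recognition model (figure from the paper)
• Pass only the feature maps to recognition, as in SEE [Bartz et al.] and others (figure from the paper)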
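3.4 Auxiliary Technologies
Auxiliary Technologies
• Synthetic data generation (Synthetic Data)
  • Most manually annotated datasets contain only on the order of thousands of samples
  • The aim is to overlay text regions on background images as naturally as possible
• Bootstrapping
  • Reduces annotation cost through semi-supervised learning
  • Extract regions with a model trained on a small amount of annotation → cut off by score → retrain with the extracted regions as labels → repeat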
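The bootstrapping loop above (predict, cut off by score, retrain, repeat) can be sketched on toy data. Everything concrete here is an assumption for illustration: a one-dimensional threshold classifier stands in for a text detector, the distance from the threshold stands in for a confidence score, and the names `fit_threshold` and `bootstrap` are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label 0 or 1)


def fit_threshold(labeled: List[Sample]) -> float:
    """'Train' the toy model: the midpoint between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def bootstrap(labeled: List[Sample], unlabeled: List[float],
              min_score: float, rounds: int):
    """Repeatedly pseudo-label confident samples and retrain on them."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        # Score = distance from the decision threshold; keep confident ones.
        confident = [x for x in pool if abs(x - threshold) >= min_score]
        if not confident:
            break
        labeled += [(x, int(x > threshold)) for x in confident]
        pool = [x for x in pool if abs(x - threshold) < min_score]
    return fit_threshold(labeled), labeled, pool


seed = [(0.0, 0), (10.0, 1)]  # the small annotated set
threshold, labeled, pool = bootstrap(seed, [1.0, 2.0, 4.0, 6.5, 8.0, 9.0],
                                     min_score=2.0, rounds=3)
print(threshold)     # 5.0
print(len(labeled))  # 6
print(pool)          # [4.0, 6.5]
```

The samples left in `pool` are those the model never scored confidently enough; the cutoff trades the amount of extra training data against the risk of learning from wrong pseudo-labels.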
4.1 Benchmark Datasets
Benchmark Dataset
• Synthetic Data
• Bootstrapping
Performance on Dataset (Detection)
Performance on Dataset (Recognition)
• According to the errata, … is apparently …
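Performance on Dataset (End-to-End)
• Word Spotting: transcription performance on the target vocabulary
• End-to-End: transcription performance on the whole text, including words outside the target vocabulary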
6. Conclusion
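Status Quo and Future Trends
• Robustness to diversity in datasets and models
  • Few datasets include special cases such as curved text
  • Models, too, are often evaluated after being tuned to a single dataset
• Multilingual support
  • Neither models nor datasets assume that several languages are handled at once
• Speed
  • People recognize text at a glance, whereas current methods are still far slower; in FPS terms the ceiling is around …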