Scene Text Detection and Recognition: The Deep Learning Era
Yustoris
March 29, 2019
Research
Transcript
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
@yustoris on arXivTimes
Overview
• A survey of deep-learning-based methods for scene text recognition.
• It reviews the field's history and covers everything from trends in methods to datasets.
1. Introduction + 2. Methodology Before the Deep Learning Era
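Difficulties of scene text recognition
• Diversity: language, form (typeface, glyph shape, writing style), orientation, color, and aspect ratio all vary widely.
• Background: when background shapes closely resemble characters, they cause serious errors.
• Image quality: in low-quality images, characters become crushed and blurred, which severely degrades recognition.
(Images: […] excerpted from https://www.morisawa.co.jp/culture/dictionary/…; […])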
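Scene text recognition before deep learning
• Feature extraction → character-level extraction → text-line detection → recognition
• A pipeline that combines many separate models (Fig. from the paper)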
3. Methodology in the Deep Learning Era
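Trends in methods
• Two steps: detection and recognition
  - Detection: extract text regions.
  - Recognition: transcribe the content of the extracted text regions (transcription).
• End-to-end: perform detection and recognition in a single pass.
(Fig. from the paper)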
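Trends in methods by category (Fig. from the paper)
• Detection: built on generic object detection methods and extended to handle properties typical of text regions, such as diverse orientations and aspect ratios.
• Recognition: Connectionist Temporal Classification (CTC) and attention are the leading approaches.
• End-to-end: combines the detection and recognition models.
• Auxiliary technologies, mainly:
  - generating synthetic data
  - supervised learning of character- and word-region annotations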
3.1 Detection
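Overview
• The basic approach is to extend models for generic object detection. Methods fall into two broad categories: anchor-based and region-proposal-based.
• Detection granularity follows two broad patterns:
  - detect the whole text as one bounding box (BB);
  - detect finer units (e.g., words) and merge them afterwards, as in SegLink [Shi].
(Image excerpted from the paper and partially modified.)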
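Basic approaches to region detection
• Anchor-based
  - Divide the input image into a fixed grid and predict several BBs (anchors) centered in each grid cell. The candidate BBs use fixed aspect ratios.
  - Base models include YOLO [Redmon] and SSD [Liu].
• Region proposal
  - Estimate candidate text regions (region proposals) from features of the input image, then decide for each candidate whether it is a text region.
  - Base models include R-CNN [Girshick].
(Figure: a neural network detects regions → post-processing)

Anchor-based
• Deeper layers use a coarser grid and larger anchors, which adjusts the granularity of BB detection.
• BB information is obtained through pooling.
• The loss combines two terms: the localization error between predicted and ground-truth BBs, and the error in class confidence.
• Example: TextBoxes [Liao] (SSD-based)
(Images excerpted from the YOLO paper and partially modified.)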
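The anchor-based recipe can be sketched in plain Python. The sketch covers anchors on a fixed grid with fixed aspect ratios, larger anchors on coarser grids, and a loss that combines confidence and localization error. This is our own minimal illustration, not code from YOLO, SSD, or TextBoxes. The names `make_anchors` and `detection_loss` are hypothetical, as are the grid sizes, scales, and aspect ratios. For simplicity, smooth L1 is applied directly to box coordinates. SSD-style detectors instead regress offsets encoded relative to each anchor.

```python
import math
from itertools import product

def make_anchors(img_w, img_h, grid, scale, aspect_ratios):
    """Return (cx, cy, w, h) anchors centred in every cell of a grid x grid layout.

    Each anchor has area scale**2 and width/height equal to one of aspect_ratios.
    """
    cell_w, cell_h = img_w / grid, img_h / grid
    anchors = []
    for row, col in product(range(grid), range(grid)):
        cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h  # centre of this cell
        for r in aspect_ratios:
            anchors.append((cx, cy, scale * math.sqrt(r), scale / math.sqrt(r)))
    return anchors

def smooth_l1(x):
    """Quadratic for small errors, linear for large ones."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def detection_loss(pred_boxes, gt_boxes, text_probs, labels, alpha=1.0):
    """Confidence loss over all anchors plus localisation loss over positive anchors.

    labels[i] is 1 if anchor i was matched to a ground-truth box, else 0;
    gt_boxes[i] is ignored for background anchors.
    """
    conf, loc = 0.0, 0.0
    for pred, gt, p, y in zip(pred_boxes, gt_boxes, text_probs, labels):
        conf -= math.log(p) if y == 1 else math.log(1.0 - p)  # binary cross-entropy
        if y == 1:
            loc += sum(smooth_l1(a - b) for a, b in zip(pred, gt))
    return (conf + alpha * loc) / max(1, sum(labels))

# Coarser grids get larger anchors, mimicking deeper layers (values are illustrative).
levels = [(8, 16), (4, 32), (2, 64)]  # (grid, scale) pairs
all_anchors = [a for g, s in levels for a in make_anchors(128, 128, g, s, [1, 2, 5, 10])]
```

Wide ratios such as 5 and 10 stand in for the long, thin shape of text lines. In a real detector, each anchor would first be matched to ground truth (typically by overlap) to produce `labels`.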
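Region proposal
• Example: [Ma] (Faster R-CNN-based)
• Beyond Faster R-CNN, it takes the rotation of each region into account when extracting region proposals.
• Extract region proposals → classify each one as background or text region.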
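To make rotation-aware proposals concrete, the sketch below converts a rotated box into its four corners. The box is parameterized as (cx, cy, w, h, θ). The five-parameter form, the function name, and the angles in the usage line are assumptions for illustration. The slide only states that [Ma] accounts for region rotation.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a w x h box centred at (cx, cy), rotated by theta radians.

    The rotation is counter-clockwise in an x-right, y-up frame, which appears
    clockwise on screen when the image y axis points down.
    """
    c, s = math.cos(theta), math.sin(theta)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

# Hypothetical candidates: one elongated box shape replicated at several angles.
candidates = [rotated_box_corners(64, 32, 80, 16, a) for a in (-math.pi / 6, 0.0, math.pi / 6)]
```

Each rotated candidate would then go to the classifier that labels it as text or background.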
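Text-specific methods
• Rather than detecting the whole text at once, detect small units and merge them afterwards.
• Text regions vary more than generic objects in orientation and other properties, so detecting a single BB directly can be unsuitable.
• The units are either small parts of a text region (components) or pixels.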
Component level
• Example: SegLink (Fig. from the paper)
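Pixel level
• Example: PixelLink [Deng] (Figs. from the paper)
• For each pixel, predict whether each neighboring pixel belongs to the same text region.
• No prior BB estimation is needed, and closely spaced text regions are easy to separate.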
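The pixel-level grouping step amounts to finding connected components over text pixels, which can be sketched with union-find. This is a simplified, hypothetical version. The input format is our assumption: a boolean text mask plus a dict of predicted links keyed by pixel and neighbor offset, where any of the eight neighbors may be linked. A real model would produce these as thresholded score maps.

```python
def group_pixels(text_mask, links):
    """Group text pixels into instances with union-find.

    text_mask[y][x] is True for pixels predicted as text.
    links[(y, x, dy, dx)] is True if pixel (y, x) is predicted to belong to the same
    instance as its neighbour (y + dy, x + dx). Links touching non-text pixels are ignored.
    Returns {(y, x): instance_id} for every text pixel.
    """
    parent = {(y, x): (y, x)
              for y, row in enumerate(text_mask)
              for x, is_text in enumerate(row) if is_text}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for (y, x, dy, dx), linked in links.items():
        q = (y + dy, x + dx)
        if linked and (y, x) in parent and q in parent:
            parent[find((y, x))] = find(q)  # merge the two instances

    ids = {}
    return {p: ids.setdefault(find(p), len(ids)) for p in parent}
```

Two adjacent text pixels with no positive link stay in separate instances. This is how closely spaced words can be separated without first estimating a BB.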
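Specific targets
• The main focus is handling properties common on signboards and similar scenes: extreme aspect ratios, distortion, curvature, and unusual fonts.
• For example, to handle curved text, TextSnake [Long] extracts regions based on circles (disks) rather than BBs.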
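A rough sketch of the disk-based idea follows. A curved text instance is represented as disks placed along its center line, and the region is the union of those disks. The arc-shaped center line, the constant radius, and the function names are assumptions for illustration. TextSnake itself associates each disk with its own geometry, such as radius and orientation.

```python
import math

def disks_on_arc(center, arc_radius, start, end, n, disk_radius):
    """n disks of equal radius centred on an arc from angle start to end (radians); n >= 2."""
    cx, cy = center
    step = (end - start) / (n - 1)
    return [(cx + arc_radius * math.cos(start + i * step),
             cy + arc_radius * math.sin(start + i * step),
             disk_radius) for i in range(n)]

def in_text_region(point, disks):
    """True if point lies in the union of disks given as (x, y, r)."""
    px, py = point
    return any(math.hypot(px - x, py - y) <= r for x, y, r in disks)

# A half-ring of text: centre line is a semicircle of radius 10, half-width 2.
arc = disks_on_arc((0.0, 0.0), 10.0, 0.0, math.pi, 19, 2.0)
```

No disk covers the center of the arc. An axis-aligned BB drawn around the same curved text would include it.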
3.2 Recognition
Overview
• Transcribe the text in the regions extracted by detection.
• Most methods are RNN-based. Among them, CTC (Connectionist Temporal Classification) and attention are widely used.
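CTC [Graves]
• A loss function computed from frame-wise character probabilities that include a blank symbol, _.
• Because it handles the alignment between input and output at the same time, there is no need to account for the difference in length between them.
Input: HHHH_eell_lloo_ → Output: Hello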
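Two pieces are sketched below. The first is the collapsing rule behind the HHHH_eell_lloo_ → Hello example: merge repeated labels, then delete blanks. The second is the CTC probability, which sums over every frame-wise path that collapses to the target. This is a plain-Python illustration of the standard forward recursion, with `_` as the blank as on the slide. The function names are our own. It works in probability space rather than log space, so it only suits short toy inputs.

```python
import math
from itertools import groupby

def ctc_collapse(path, blank="_"):
    """Map a frame-wise path to a label string: merge repeated labels, then drop blanks."""
    return "".join(label for label, _ in groupby(path) if label != blank)

def ctc_prob(probs, target, alphabet, blank="_"):
    """Sum of path probabilities over all paths that collapse to target (forward algorithm).

    probs[t][k] is the probability of alphabet[k] at frame t.
    """
    idx = {c: i for i, c in enumerate(alphabet)}
    ext = [blank]
    for c in target:
        ext += [c, blank]  # blank, l1, blank, l2, ..., lL, blank
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][idx[blank]]           # a path may start with a blank...
    if S > 1:
        alpha[1] = probs[0][idx[ext[1]]]      # ...or with the first label
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)  # stay, or advance by one
            # Skipping the blank in between is allowed only between different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][idx[ext[s]]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

def ctc_loss(probs, target, alphabet, blank="_"):
    return -math.log(ctc_prob(probs, target, alphabet, blank))
```

`ctc_collapse("HHHH_eell_lloo_")` returns `"Hello"`. The blank is what makes the "ll" in "Hello" expressible: without `_` between them, the two l's would merge. Training minimizes `ctc_loss`, and no frame-to-character alignment is ever supplied.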
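CRNN [Shi]
• Recognizes text with a Bi-LSTM + CTC that takes feature vectors as input.
• The name is easy to confuse with R-CNN.
(Fig. annotations: the feature vectors are extracted before the LSTM; uses CTC.)

Attention
• Adapts the attention mechanism from machine translation [Bahdanau, Luong].
• A front end (e.g., convolutions) first extracts, from the input image, the feature vectors that the decoder takes as input [Arbitrarily-oriented text recognition, Cheng].
• Some methods add refinements such as giving the decoder character-level BBs as an auxiliary input [Focusing attention: Towards accurate text recognition in natural images, Cheng].
(Fig. from the paper: input feature vectors)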
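CRNN-style and attention-based recognizers both consume a sequence of feature vectors read from a convolutional feature map. The sketch below shows two steps. First, map-to-sequence: each column of the map becomes one time step, which is the input an LSTM + CTC head would read. Second, a single attention step over that sequence. It is a toy under stated assumptions: nested lists stand in for tensors, the decoder state is made up, and the score is a plain dot product (one of Luong's scoring variants) rather than Bahdanau's additive score.

```python
import math

def map_to_sequence(feature_map):
    """feature_map[c][h][w] -> W frame vectors (left to right), each of length C*H."""
    C, H, W = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[c][h][w] for c in range(C) for h in range(H)] for w in range(W)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, frames):
    """One dot-product attention step: weights over frames and the weighted context vector."""
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(len(frames[0]))]
    return weights, context

# Toy 2-channel, height-1, width-3 feature map -> 3 frames of length 2.
frames = map_to_sequence([[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]])
weights, context = attend([5.0, 0.0], frames)  # hypothetical decoder state
```

At each output step the decoder would compute a new query from its state, so the attention weights move across the columns as characters are emitted.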
3.3 End-to-end System
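Overview
• Chain the detection and recognition models directly: the text regions found by the detection model become the input to the recognition model (Fig. from the paper).
• Alternatively, pass only feature maps to recognition, as in SEE [Bartz] (Fig. from the paper).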
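When recognition receives feature maps instead of image crops, each detected box has to be mapped onto the shared feature map. Below is a minimal sketch, assuming a backbone with a single uniform stride and a plain nearest-cell crop. Real systems typically use RoI pooling, RoIAlign, or a differentiable sampling grid. `crop_features` is a hypothetical name.

```python
import math

def crop_features(feature_map, box, stride):
    """Cells of feature_map[y][x] covered by an image-space box (x0, y0, x1, y1).

    stride is the number of image pixels per feature cell; the box is expanded
    outward to whole cells (floor for the start, ceil for the end).
    """
    x0, y0, x1, y1 = box
    fx0, fy0 = math.floor(x0 / stride), math.floor(y0 / stride)
    fx1, fy1 = math.ceil(x1 / stride), math.ceil(y1 / stride)
    return [row[fx0:fx1] for row in feature_map[fy0:fy1]]
```

The recognizer reuses features the detector already computed, so pixels are not re-encoded. Resizing each crop to a fixed height before recognition is omitted here.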
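3.4 Auxiliary Technologies
• Synthetic data generation (Synthetic Data)
  - Most human-annotated datasets contain only on the order of thousands of images.
  - The goal is to overlay text regions onto background images as naturally as possible.
• Bootstrapping
  - Reduces annotation cost through supervised learning.
  - Extract regions with a model trained on a small amount of annotation → filter them by score → retrain using the extracted regions as supervision → … and repeat.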
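The bootstrapping loop amounts to self-training with pseudo-labels. Below is a minimal sketch, assuming caller-supplied `train` and `predict` functions and a fixed score threshold. The names, the threshold, and the number of rounds are all hypothetical. A detection system would apply the score filter to extracted text regions rather than to whole inputs.

```python
def bootstrap(train, predict, labeled, unlabeled, threshold=0.9, rounds=3):
    """Self-training loop: retrain on confident predictions used as pseudo-labels.

    train(pairs) -> model; predict(model, x) -> (label, score).
    pairs are (input, label) tuples.
    """
    model = train(list(labeled))
    pseudo = []
    for _ in range(rounds):
        # Keep only predictions whose score clears the threshold
        pseudo = []
        for x in unlabeled:
            label, score = predict(model, x)
            if score >= threshold:
                pseudo.append((x, label))
        model = train(list(labeled) + pseudo)
    return model, pseudo

# Toy stand-ins: the "model" memorises labels; confidence is high only for small inputs.
def toy_train(pairs):
    return dict(pairs)

def toy_predict(model, x):
    return model.get(x, x % 2), (0.95 if x < 10 else 0.5)

model, pseudo = bootstrap(toy_train, toy_predict, [(0, 0)], [3, 4, 20])
```

The input 20 never clears the threshold, so it stays out of the training set. The threshold trades label noise against how much unlabeled data the loop can absorb.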
4.1 Benchmark Datasets
Benchmark Dataset
• Synthetic Data
• Bootstrapping
Performance on Dataset (Detection)
Performance on Dataset (Recognition)
Note (per the errata): …
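Performance on Dataset (End-to-End)
• Word spotting: recognition performance on words in the target vocabulary.
• End-to-end: recognition performance on all text, including words outside the target vocabulary.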
6. Conclusion
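Status Quo and Future Trends
• Robustness to the diversity of datasets and models
  - Few datasets include special cases such as curved text.
  - Many evaluations optimize a model for a single dataset only.
• Multilingual support: models and datasets are not designed to handle several languages at once.
• Speed: humans recognize text at a glance, but throughput is still limited to around … FPS.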