Scene Text Detection and Recognition: The Deep Learning Era
Yustoris
March 29, 2019
Transcript
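Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
@yustoris on arXivTimes

Overview
- A survey of deep-learning-based methods for scene text recognition.
- Looks back over the history of the field and gives comprehensive coverage, from trends in methods through to datasets.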
1. Introduction + 2. Methodology Before the Deep Learning Era
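Why scene text recognition is hard
- Diversity: language, shape (typeface, glyph form, writing style), orientation, color, and aspect ratio all vary widely.
- Background: when background shapes closely resemble characters, accuracy suffers badly.
- Image quality: in low-quality images, characters become crushed or smeared, which also hurts accuracy badly.
(Illustration excerpted from https://www.morisawa.co.jp/culture/dictionary/)

Scene text recognition before deep learning
- Feature extraction → character-level extraction → line detection → transcription
- A pipeline assembled from a variety of separate models (figure from the paper)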
3. Methodology in the Deep Learning Era
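Trends in methods (figure from the paper)
- Step-wise: detection followed by recognition
  - Detection: extract the text regions
  - Recognition: transcribe the content of each extracted text region
- End-to-end: run detection and recognition in a single pass

Trends in methods by category (figure from the paper)
- Detection: start from general object detection methods and extend them to handle traits typical of text regions (e.g. varied orientations and aspect ratios)
- Recognition: Connectionist Temporal Classification (CTC) and attention are the two dominant approaches
- End-to-end: join the detection and recognition models
- Auxiliary technologies, mainly:
  - synthetic data generation
  - semi-supervised learning for character- and word-region annotation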
3.1 Detection
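Overview
- The basic approach is to extend models built for general object detection; they fall broadly into anchor-based and region-proposal methods.
- Detection granularity follows two broad patterns:
  - detect the whole text with a single bounding box (BB)
  - detect finer units (words, for example) and merge them afterwards
(Images excerpted from the SegLink [Shi] paper and partly edited)

Basic strategies for region detection
- Anchor-based
  - Split the input image into a fixed number of grid cells and, for each cell, predict several BBs (anchors) centred on a point in the cell. Candidate BBs use fixed aspect ratios.
  - Base models include YOLO [Redmon] and SSD [Liu].
- Region proposal
  - Estimate candidate text regions (region proposals) from image features, then decide for each candidate whether it is a text region.
  - Base models include R-CNN [Girshick].
- Flow: region detection with a neural network → post-processing

Anchor-based
- Figure labels: grid; bounding boxes (BBs), a fixed number per cell in the illustration.
- Later layers split the image into fewer grid cells, so their anchors are larger; this adjusts the granularity of BB detection.
- BB information is obtained through pooling.
- The loss combines the position error between predicted and ground-truth BBs with the difference in class confidence.
- Example: TextBoxes [Liao], based on SSD.
(Images excerpted from the YOLO paper and partly edited)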
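
The anchor layout and the two-part loss described above can be sketched in plain Python. This is a simplified illustration, not the TextBoxes or SSD implementation: the function names, the grid size, the anchor scale, and the aspect ratios are all assumptions chosen for the example, and the localization term is applied to raw box coordinates rather than to encoded offsets.

```python
import math
from itertools import product

def make_anchors(img_w, img_h, grid, scale, aspect_ratios):
    """Return (x1, y1, x2, y2) anchors centred on each cell of a grid x grid layout."""
    cell_w, cell_h = img_w / grid, img_h / grid
    anchors = []
    for row, col in product(range(grid), repeat=2):
        cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h
        for ar in aspect_ratios:  # ar = width / height, area stays scale**2
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def detection_loss(pred_box, gt_box, pred_conf, is_text):
    """Position error plus confidence error (binary cross-entropy)."""
    loc = sum(smooth_l1(p - g) for p, g in zip(pred_box, gt_box))
    p = min(max(pred_conf, 1e-7), 1 - 1e-7)  # clip to keep log() finite
    cls = -math.log(p) if is_text else -math.log(1 - p)
    return loc + cls

# Hypothetical setup: a 320x320 image, a 4x4 grid, elongated ratios for text.
anchors = make_anchors(img_w=320, img_h=320, grid=4, scale=80,
                       aspect_ratios=(1, 2, 5, 10))
gt_box = (40, 100, 280, 140)  # a wide, short text line
best = max(anchors, key=lambda a: iou(a, gt_box))
```

With these assumed values, the anchor that best overlaps the wide text line is one of the elongated ones, which is the motivation for adding long aspect ratios when adapting a general object detector to text.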
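Region proposal
- Extract region proposals, then classify each as background or text region.
- Example: [Ma], based on Faster R-CNN. On top of Faster R-CNN, it takes the rotation of each region into account when extracting region proposals.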
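
To make "taking rotation into account" concrete, a rotated proposal can be described by a centre, a size, and an angle, and converted to corner points. The sketch below is only an illustration: the (cx, cy, w, h, theta) parameterization and the function name are assumptions for this example, not the exact encoding used in [Ma].

```python
import math

def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Corner points of a w x h box centred at (cx, cy), rotated by theta_deg."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2-D rotation of the offset, then translation to the centre.
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners

# A wide proposal for horizontal text, and the same proposal tilted by 30 degrees.
flat = rotated_box_corners(100, 50, 80, 20, 0)
tilted = rotated_box_corners(100, 50, 80, 20, 30)
```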
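Text-specific methods
- Rather than detecting the whole text in one shot, detect small units and then join them.
- Text regions vary more than general objects in orientation and other properties, so predicting a single BB directly is sometimes a poor fit.
- The unit is either a small part of a text region (component) or a pixel.

Component level
- Example: SegLink (figure from the SegLink paper)

Pixel level
- Example: PixelLink [Deng] (figures from the PixelLink paper)
- For every pixel, decide whether each adjacent pixel (eight neighbours in PixelLink) belongs to the same text region.
- No prior BB estimation is needed, and text regions that lie close together are easier to handle.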
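
The pixel-level grouping step can be sketched with a small union-find. This is a toy illustration under stated assumptions, not the PixelLink code: the input format (`text_mask`, `links`) and the function name are invented for the example, and two neighbouring text pixels are joined when either of them predicts a positive link to the other.

```python
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def group_pixels(text_mask, links):
    """Group text pixels into regions.

    text_mask[y][x] is True for pixels classified as text.
    links maps (y, x) to the set of (dy, dx) offsets with a positive link.
    """
    h, w = len(text_mask), len(text_mask[0])
    parent = {(y, x): (y, x) for y in range(h) for x in range(w) if text_mask[y][x]}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for (y, x) in list(parent):
        for dy, dx in NEIGHBOURS:
            q = (y + dy, x + dx)
            if q not in parent:
                continue  # outside the image or not a text pixel
            linked = ((dy, dx) in links.get((y, x), ())
                      or (-dy, -dx) in links.get(q, ()))
            if linked:
                parent[find((y, x))] = find(q)

    groups = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())

# Four text pixels in a row; the link between the 2nd and 3rd pixel is negative,
# so two touching words come out as two separate regions.
mask = [[True, True, True, True]]
links = {(0, 0): {(0, 1)}, (0, 2): {(0, 1)}}
regions = group_pixels(mask, links)
```

The example shows why no bounding-box estimate is needed beforehand: regions fall out of the link predictions, and a missing link is enough to split text that is physically adjacent.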
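Specific targets
- The main concern is handling the extreme aspect ratios, distortion, curvature, and unusual fonts that are common on signboards and similar surfaces.
- For curved text, for example, TextSnake [Long] attempts region extraction based on circles rather than BBs.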
3.2 Recognition
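Overview
- Transcribe the text regions extracted by detection.
- Most methods are RNN-based; among them, CTC (Connectionist Temporal Classification) and attention are the most widely used.

CTC [Graves]
- A loss function for obtaining per-character generation probabilities over an alphabet that includes the blank symbol "_".
- Alignment between input and output is handled at the same time, so the difference in length between input and output need not be considered.
- Example: input HHHH_eell_lloo_ → output Hello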
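
The mapping from the frame-level string to "Hello" follows the usual CTC collapse rule: merge consecutive repeats, then drop blanks. The sketch below shows that rule plus a greedy (best-path) decode; the function names and the tiny alphabet are assumptions for illustration, and this covers decoding only, not the CTC loss itself.

```python
from itertools import groupby

BLANK = "_"

def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeated symbols, then remove blanks."""
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(s for s in merged if s != blank)

def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable symbol per frame, then collapse the path."""
    path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(path, blank)

print(ctc_collapse("HHHH_eell_lloo_"))  # prints "Hello"

# Hypothetical per-frame probabilities over the alphabet "_ab".
probs = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
         [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
decoded = ctc_greedy_decode(probs, "_ab")
```

The blank between the two runs of "l" is what lets the double letter in "Hello" survive the merge step.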
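CRNN [Shi]
- Feeds feature vectors into a bi-LSTM and transcribes with CTC.
- The feature vectors are extracted in a stage ahead of the LSTM.
- The name is easy to confuse with R-CNN.

Attention
- Borrows the attention mechanism from machine translation [Bahdanau, Luong].
- An earlier stage (convolution, for example) extracts from the input image the feature vectors that become the decoder's input. (Figure: [Arbitrarily-oriented text recognition, Cheng])
- Some methods also give the decoder auxiliary input, such as character-level BBs. (Figure: [Focusing attention: Towards accurate text recognition in natural images, Cheng])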
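
One decoder step of attention over the extracted feature vectors can be sketched as follows. This is a minimal illustration that assumes dot-product scoring (in the style of Luong) rather than the additive form of Bahdanau; the function names and the toy vectors are invented for the example.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_features):
    """Return attention weights and the weighted sum of the feature vectors."""
    scores = [sum(q * k for q, k in zip(decoder_state, f)) for f in encoder_features]
    weights = softmax(scores)
    dim = len(encoder_features[0])
    context = [sum(w * f[i] for w, f in zip(weights, encoder_features))
               for i in range(dim)]
    return weights, context

# Three feature vectors (e.g. three horizontal positions in the text image).
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
state = [0.0, 5.0]  # hypothetical decoder state that matches the middle position
weights, context = attend(state, features)
```

At each output character the decoder recomputes the weights, so it can focus on a different part of the feature sequence without an explicit character segmentation.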
3.3 End-to-end System
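Overview
- Join the detection and recognition models directly: the text regions found by the detection model become the input of the recognition model. (figure from the paper)
- Some systems pass only the feature map on to recognition, e.g. SEE [Bartz]. (figure from the paper)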
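3.4 Auxiliary Technologies
Auxiliary technologies
- Synthetic data generation
  - Most manually annotated datasets contain only on the order of thousands of samples.
  - The goal is to overlay text regions on background images so that they look as natural as possible.
- Bootstrapping
  - Reduces annotation cost through semi-supervised learning.
  - Repeat: extract regions with a model trained on a small amount of annotation → discard low-scoring regions → retrain with the extracted regions as supervision → ...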
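
The bootstrapping loop can be sketched as a generic self-training routine. Everything here is a toy stand-in: the "detector" is a one-dimensional threshold, and `train`, `predict`, `min_score`, and `rounds` are hypothetical names chosen for the example, not part of any method cited in the survey.

```python
def train(samples):
    """Toy 'detector': midpoint between mean text and mean non-text feature.

    Assumes samples contains at least one example of each class.
    """
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Return (label, score); the score grows with distance from the threshold."""
    margin = x - threshold
    return (1 if margin > 0 else 0), min(1.0, abs(margin))

def bootstrap(labeled, unlabeled, min_score=0.5, rounds=3):
    labeled, pool = list(labeled), list(unlabeled)
    model = train(labeled)                      # start from the small annotated set
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            label, score = predict(model, x)    # "extract regions"
            (confident if score >= min_score else rest).append((x, label))
        if not confident:                       # nothing passed the score cut-off
            break
        labeled += confident                    # use predictions as supervision
        pool = [x for x, _ in rest]
        model = train(labeled)                  # retrain and repeat
    return model, labeled

model, used = bootstrap(labeled=[(0.0, 0), (2.0, 1)],
                        unlabeled=[-1.0, 0.2, 0.9, 1.8, 3.0])
```

The score cut-off is the important design choice: low-confidence predictions stay in the unlabeled pool instead of being fed back as noisy supervision.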
4.1 Benchmark Datasets
Benchmark Dataset
- Synthetic Data
- Bootstrapping
Performance on Dataset (Detection)
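Performance on Dataset (Recognition)
- According to the errata, some values in this table have apparently been corrected (the specific values are missing from the transcript).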
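Performance on Dataset (End-to-End)
- Word spotting: transcription performance on the target vocabulary
- End-to-end: transcription performance on the full text, including words outside the target vocabulary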
6. Conclusion
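Status Quo and Future Trends
- Robustness to diverse datasets and models
  - Few datasets include special cases such as curved text.
  - Models are often evaluated after being optimized for a single dataset only.
- Multilingual support
  - Neither models nor datasets are designed to handle several languages at once.
- Speed
  - A person can read text at a glance, whereas current models are still far slower; the achievable FPS still has a low ceiling.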