Scene Text Detection and Recognition: The Deep Learning Era
Yustoris
March 29, 2019
Research
Transcript
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
@yustoris on arXivTimes
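Overview
- A survey of deep-learning-based methods for scene text recognition (Scene Text Recognition)
- Looks back at the history of the field and comprehensively covers everything from method trends to datasets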
1. Introduction + 2. Methodology Before the Deep Learning Era
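Why scene text recognition is difficult
- Diversity: language, shape (character form, glyph shape, typeface), orientation, color, and aspect ratio all vary
- Presence of background: when background shapes are extremely similar to characters, the adverse effect is large
- Effect of image quality: when image quality is poor, characters become heavily crushed and blurred, and the adverse effect is large
Excerpted from https://www.morisawa.co.jp/culture/dictionary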
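Scene text recognition before deep learning
- Feature extraction → character-level extraction → text line detection → transcription
- A pipeline that combines a variety of models (figure from the paper)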
3. Methodology in the Deep Learning Era
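Method trends
- 2 steps: two stages, detection (Detection) followed by recognition (Recognition)
  - Detection: extraction of text regions
  - Recognition: transcription (Transcription) of the content contained in the extracted text regions
- End-to-end: detection and recognition are performed in a single pass
(figure from the paper)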
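Method trends by category (figure from the paper)
- Detection: based on general object detection methods, extended to suit characteristics typical of text regions (e.g. diversity of orientation and aspect ratio)
- Recognition: Connectionist Temporal Classification (CTC) and Attention are the two dominant approaches
- End-to-End: combines the detection and recognition models
- Auxiliary Technologies, mainly:
  - Generation of synthetic data
  - Semi-supervised learning of character- and word-region annotations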
3.1 Detection
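Overview
- The basic approach is to extend models built for general object detection; these fall broadly into Anchor-based and Region proposal methods
- Detection granularity falls broadly into two patterns:
  - Detect the whole text with a single Bounding Box (BB)
  - Detect in finer units (such as words) and combine them afterwards
(image excerpted and partly modified from the SegLink [Shi] paper)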
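Basic approaches to region detection
- Anchor-based
  - The input image is divided into a fixed grid, and multiple BBs (Anchors) centered on a point in each grid cell are estimated; BB candidates use fixed aspect ratios
  - Base models include YOLO [Redmon] and SSD [Liu]
- Region proposal
  - Text region candidates (Region proposals) are estimated from the input image using features and the like, and each candidate is then judged to be a text region or not
  - Base models include R-CNN [Girshick]
- Flow: region detection with a neural network → post-processing

Anchor-based
- Example: TextBoxes [Liao], based on SSD
- Figure labels: Grid; Bounding Box (BB)
- In later stages the grid has fewer cells and the anchors become larger, which adjusts the BB detection granularity
- BB information is obtained by pooling
- The loss is the position error between the estimated BB and the ground-truth BB, plus the difference in class confidence
(image excerpted and partly modified from the YOLO paper)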
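The anchor-based idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the grid sizes, the scale, and the aspect ratios are hypothetical, and smooth L1 plus binary cross-entropy are swapped in as common choices for "position error" and "class confidence difference". None of this is taken from YOLO, SSD, TextBoxes, or the survey itself.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h) in pixels


def make_anchors(img_w: int, img_h: int, grid: int,
                 scale: float, ratios: Sequence[float]) -> List[Box]:
    """Place one anchor per aspect ratio at the center of every grid cell."""
    cell_w, cell_h = img_w / grid, img_h / grid
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h
            for r in ratios:  # r = width / height; the area stays scale**2
                anchors.append((cx, cy, scale * math.sqrt(r), scale / math.sqrt(r)))
    return anchors


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Position error: quadratic near zero, linear for large differences."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def anchor_loss(pred_box: Sequence[float], gt_box: Sequence[float],
                p_text: float, is_text: bool, alpha: float = 1.0) -> float:
    """Confidence term for every anchor, position term only for text anchors."""
    p = min(max(p_text, 1e-7), 1.0 - 1e-7)
    conf = -math.log(p) if is_text else -math.log(1.0 - p)
    loc = smooth_l1(pred_box, gt_box) if is_text else 0.0
    return conf + alpha * loc


# A coarser grid paired with larger anchors, and a finer grid with smaller ones.
coarse = make_anchors(320, 320, grid=4, scale=80.0, ratios=[1.0, 2.0, 5.0])
fine = make_anchors(320, 320, grid=8, scale=40.0, ratios=[1.0, 2.0, 5.0])
print(len(coarse), len(fine))  # 48 192
```

Wide ratios such as 5.0 are included here because text lines tend to be much wider than they are tall; the exact values are illustrative.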
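Region Proposal
- Example: [Ma], based on Faster R-CNN
- In addition to Faster R-CNN, the rotation of each region is taken into account when extracting Region proposals
- Figure labels: Region proposal extraction; classification as background or text region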
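To make "taking rotation into account" concrete, here is a minimal Python sketch that turns a rotated box into its four corner points. The (cx, cy, w, h, angle) parameterization and the function name are assumptions chosen for illustration; the slide does not state how [Ma] encodes rotation.

```python
import math
from typing import List, Tuple


def rotated_box_corners(cx: float, cy: float, w: float, h: float,
                        angle_deg: float) -> List[Tuple[float, float]]:
    """Return the four corners of a w-by-h box rotated by angle_deg around (cx, cy)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    # Corner offsets of the unrotated box, relative to its center.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


# An axis-aligned proposal and the same proposal rotated by 30 degrees.
print(rotated_box_corners(50.0, 20.0, 40.0, 10.0, 0.0))
print(rotated_box_corners(50.0, 20.0, 40.0, 10.0, 30.0))
```

With an angle of zero this reduces to the ordinary axis-aligned box, which is why a rotated proposal can be seen as an extension of the usual one.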
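Text-specific Methods
- Rather than detecting the whole text in one pass, detect small units and then combine them
- Text regions vary more than general objects in orientation and other properties, so detecting a single BB outright is sometimes unsuitable
- The units are either small parts of a text region (Components) or pixels (Pixel)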
Components-Level
- Example: SegLink (figure from the SegLink paper)
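Pixel-Level
- Example: PixelLink [Deng] (figures from the PixelLink paper)
- For each pixel, judge whether each of its eight neighboring pixels belongs to the same text region
- No prior BB estimation is needed, and text regions that lie close together are easy to handle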
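The pixel-linking step can be sketched as a union-find over text pixels. This is a simplified illustration, not PixelLink's actual post-processing: the `group_pixels` name, the binary `mask`, and the symmetric `linked` predicate standing in for the predicted links are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)

# Offsets of the eight neighbors of a pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def group_pixels(mask: List[List[int]],
                 linked: Callable[[Pixel, Pixel], bool]) -> List[List[Pixel]]:
    """Merge text pixels into regions wherever a link to a neighbor is positive."""
    parent: Dict[Pixel, Pixel] = {}
    for r, row in enumerate(mask):
        for c, is_text in enumerate(row):
            if is_text:
                parent[(r, c)] = (r, c)

    def find(p: Pixel) -> Pixel:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p in list(parent):
        for dr, dc in NEIGHBORS:
            q = (p[0] + dr, p[1] + dc)
            if q in parent and linked(p, q):
                root_p, root_q = find(p), find(q)
                parent[root_p] = root_q

    groups: Dict[Pixel, List[Pixel]] = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Four touching text pixels; the (hypothetical) links are negative between columns 1 and 2.
mask = [[1, 1, 1, 1]]
regions = group_pixels(mask, lambda p, q: (p[1] < 2) == (q[1] < 2))
print(regions)  # [[(0, 0), (0, 1)], [(0, 2), (0, 3)]]
```

The example shows why adjacent text regions are easy to separate: the pixels touch, but a negative link keeps them in different groups, with no bounding box estimated beforehand.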
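Specific Targets
- The main focus is handling the extreme aspect ratios, distortion, curvature, and special fonts that are common on signboards and the like
- For example, for curved text, TextSnake [Long] attempts region extraction based on circles rather than BBs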
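A circle-based region can be illustrated with a short Python sketch in which a curved text region is the union of overlapping disks placed along a center line. The data layout, the function names, and the example coordinates are assumptions for illustration and are not taken from the TextSnake paper.

```python
from typing import List, Sequence, Tuple

Disk = Tuple[float, float, float]  # (center_x, center_y, radius)


def in_text_region(x: float, y: float, disks: Sequence[Disk]) -> bool:
    """A point belongs to the region if it falls inside any disk."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in disks)


def rasterize(disks: Sequence[Disk], width: int, height: int) -> List[str]:
    """Draw the region on a character grid, sampling each cell at its center."""
    return [
        "".join("#" if in_text_region(c + 0.5, r + 0.5, disks) else "."
                for c in range(width))
        for r in range(height)
    ]


# Five disks following an arch-shaped center line (hypothetical coordinates).
arch = [(2.0, 6.0, 1.5), (5.0, 3.5, 1.5), (8.0, 2.5, 1.5), (11.0, 3.5, 1.5), (14.0, 6.0, 1.5)]
for line in rasterize(arch, width=16, height=8):
    print(line)
```

A single axis-aligned box around the same arch would also cover the empty area underneath it, which is the case the circle-based representation is meant to avoid.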
3.2 Recognition
Overview
- Transcription is performed on the text regions extracted by detection
- Most methods are RNN-based, and among them CTC (Connectionist Temporal Classification) and Attention are widely used
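CTC [Graves]
- A loss function for obtaining per-character generation probabilities, with _ (blank) included as a symbol
- Alignment between input and output is handled at the same time, so the difference between input and output lengths need not be considered
- Example: input HHHH_eell_lloo_ → output Hello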
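The mapping in the example above, from a frame-by-frame sequence to the final string, can be written as a short Python function: merge consecutive repeats, then drop the blank symbol. This sketch covers only that collapsing rule, not the CTC loss itself, which sums probabilities over all frame sequences that collapse to the target; the function name is hypothetical.

```python
from itertools import groupby


def ctc_collapse(path: str, blank: str = "_") -> str:
    """Merge runs of identical symbols, then remove blanks."""
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("HHHH_eell_lloo_"))  # Hello
```

The blank between the two runs of `l` is what allows a doubled letter to survive: without it, `llll` would collapse to a single `l`.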
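CRNN [Shi]
- Transcription is performed by a bi-LSTM + CTC that takes feature vectors as input
- The name is easy to confuse with R-CNN
- Figure labels: uses CTC; feature vectors are extracted in the stage before the LSTM

Attention
- Borrows Attention from machine translation [Bahdanau; Luong]
- Feature vectors that serve as the Decoder input are extracted from the input image beforehand, for example by convolution [Arbitrarily-oriented text recognition, Cheng]
- In some cases extra measures are taken, such as giving character-level BBs as an auxiliary input to the Decoder [Focusing attention: Towards accurate text recognition in natural images, Cheng]
- Figure label: input feature vectors (figure from the paper)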
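As a rough picture of what the Decoder does with the extracted feature vectors at each step, the Python sketch below scores every feature vector against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum. A plain dot-product score is assumed here for brevity; the cited recognition models may use a different scoring function, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attention_context(query: Sequence[float],
                      features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Score each feature vector against the decoder state (dot product).
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return weights, context


# Three feature vectors, e.g. one per image column; the query matches the second best.
features = [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
weights, context = attention_context([0.0, 2.0], features)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

In a recognizer, the context vector would be fed to the Decoder together with the previous character to predict the next one.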
3.3 End-to-end System
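Overview
- The detection and recognition models are combined as they are: text regions found by the detection model become the input to the recognition model (figure from the paper)
- Only feature maps are passed to recognition, as in SEE [Bartz] and others (figure from the paper)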
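3.4 Auxiliary Technologies
Auxiliary Technologies
- Synthetic data generation (Synthetic Data)
  - Most manually annotated datasets contain only on the order of thousands of samples
  - The goal is to overlay text regions on background images in a more natural way
- Bootstrapping
  - Reduces annotation cost through semi-supervised learning
  - Extract regions with a model trained on a small amount of annotation → filter by score → retrain using the extracted regions as supervision → …, repeated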
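The bootstrapping loop described above (train on a few annotations, predict, keep only high-scoring predictions, retrain, repeat) can be sketched generically in Python. Everything here is an assumption made for illustration: the `bootstrap` function, its parameters, and especially the toy one-dimensional nearest-mean classifier, which stands in for a real text-region detector.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, str]  # (feature, label)


def bootstrap(labeled: Sequence[Sample], unlabeled: Sequence[float],
              train: Callable, predict: Callable,
              min_score: float, rounds: int):
    """Grow the training set with confident predictions, retraining each round."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in unlabeled:
            label, score = predict(model, x)
            if score >= min_score:
                confident.append((x, label))  # keep as a pseudo-label
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        unlabeled = remaining
        model = train(labeled)
    return model, labeled


def train_means(data: Sequence[Sample]) -> Dict[str, float]:
    """Toy model: the mean feature value of each class."""
    means = {}
    for label in {y for _, y in data}:
        values = [x for x, y in data if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest(means: Dict[str, float], x: float) -> Tuple[str, float]:
    """Toy prediction: nearest class mean, scored by relative distance (two or more classes)."""
    dists = sorted((abs(x - m), label) for label, m in means.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    score = 1.0 - d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return label, score


seed = [(0.0, "background"), (10.0, "text")]
model, grown = bootstrap(seed, [1.0, 9.0, 5.2], train_means, predict_nearest,
                         min_score=0.8, rounds=3)
print(len(grown))  # 4
```

The ambiguous sample (5.2) never passes the score threshold and is left unlabeled, which is the intended effect of the filtering step: only predictions the model is confident about are reused as supervision.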
4.1 Benchmark Datasets
Benchmark Dataset
- Synthetic Data
- Bootstrapping
Performance on Dataset (Detection)
Performance on Dataset (Recognition)
According to the Errata, the correct value is apparently …
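Performance on Dataset (End-to-End)
- Word Spotting: transcription performance on the target vocabulary
- End-to-End: transcription performance on all text, including words outside the target vocabulary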
6. Conclusion
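Status Quo and Future Trends
- Robustness to the diversity of datasets and models
  - Few datasets include special cases such as curved text
  - Models are often evaluated after being optimized for a single dataset only
- Multilingual support
  - Neither models nor datasets assume that multiple languages are handled at the same time
- Speed
  - Humans can recognize text at a glance, whereas models are still slow; in terms of FPS, the upper limit is around …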