Scene Text Detection and Recognition: The Deep Learning Era
Yustoris
March 29, 2019
Transcript
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
@yustoris on arXivTimes
Overview
• A survey of deep-learning-based methods for scene text recognition
• Looks back over the history and comprehensively covers everything from method trends to datasets
1. Introduction + 2. Methodology Before the Deep Learning Era
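Why scene text recognition is difficult
• Diversity: language, form (typeface, glyph shape, writing style), orientation, color, and aspect ratio all vary
• Presence of background: when background shapes closely resemble characters, accuracy suffers badly
• Image quality: with poor image quality, parts of characters become heavily crushed or blurred, which also hurts accuracy badly
(Examples excerpted from www.morisawa.co.jp/culture/dictionary)

Scene text recognition before deep learning
• Feature extraction → character-level extraction → line detection → transcription
• Pipelines built by combining various models
(Figure from the paper)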
3. Methodology in the Deep Learning Era
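Method trends
• Steps: two stages, detection and recognition
  • Detection: extracting text regions
  • Recognition: transcribing the content of the extracted text regions
• End-to-end: detection and recognition carried out in one pass
(Figure from the paper)

Method trends by category (figure from the paper)
• Detection: built on general object detection methods and extended to handle traits typical of text regions (e.g. diverse orientations and aspect ratios)
• Recognition: dominated by Connectionist Temporal Classification (CTC) and attention
• End-to-end: the detection and recognition models are joined together
• Auxiliary technologies: mainly
  • synthetic data generation
  • semi-/weakly supervised learning for character- and word-region annotation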
3.1 Detection
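Overview
• The basic approach is to extend models built for general object detection; these fall broadly into anchor-based and region-proposal methods
• Detection granularity comes in two broad patterns:
  • detect the whole text with a single bounding box (BB)
  • detect at a finer granularity (e.g. words) and join the pieces afterwards, as in SegLink [Shi et al.]
(Images excerpted from the SegLink paper, with some annotations added)

Basic strategies for region detection
• Anchor-based
  • Split the input image into a fixed grid and estimate multiple BBs (anchors) centered on a point in each grid cell; the candidate BBs use fixed aspect ratios
  • Base models include YOLO [Redmon et al.] and SSD [Liu et al.]
• Region proposal
  • Estimate candidate text regions (region proposals) from the input image's features, then decide for each candidate whether it is a text region
  • Base models include R-CNN [Girshick et al.]
• In both cases: region detection with a neural network → post-processing

Anchor-based
• The image is divided into a grid, and each grid cell carries several bounding boxes (BBs)
• Later stages have fewer grid divisions and larger anchors, which adjusts the BB detection granularity
• BB information is obtained by pooling
• The loss combines the position error between the estimated and ground-truth BBs with the difference in class confidence
• Example: TextBoxes [Liao et al.], based on SSD
(Images excerpted from the YOLO paper, with some annotations added)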
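The grid-and-anchor idea on the slides above can be sketched roughly as follows. This is a minimal illustration, not code from TextBoxes, SSD, or YOLO: the function name `make_anchors`, the normalized box format, and the particular grid sizes, scales, and aspect ratios are all assumptions chosen for the example.

```python
from typing import List, Tuple

# (center_x, center_y, width, height), all in normalized [0, 1] image units.
Box = Tuple[float, float, float, float]


def make_anchors(grid_size: int, aspect_ratios: List[float], scale: float) -> List[Box]:
    """Return one anchor per (grid cell, aspect ratio), centered in each cell.

    Hypothetical helper: `scale` is the side of a square anchor, and each
    aspect ratio (width / height) reshapes it while keeping the same area.
    """
    anchors: List[Box] = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ratio in aspect_ratios:
                width = scale * ratio ** 0.5
                height = scale / ratio ** 0.5
                anchors.append((cx, cy, width, height))
    return anchors


if __name__ == "__main__":
    # Assumed values: wide ratios stand in for the elongated shape of text.
    ratios = [1.0, 2.0, 5.0]
    # Later stages use a coarser grid and larger anchors.
    for grid_size, scale in [(8, 0.1), (4, 0.3), (2, 0.6)]:
        anchors = make_anchors(grid_size, ratios, scale)
        print(f"grid {grid_size}x{grid_size}: {len(anchors)} anchors of scale {scale}")
```

A detector built this way would then predict, for every anchor, an offset to the ground-truth box and a class confidence, which is where the two loss terms mentioned on the slide come in.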
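Region Proposal
• Example: [Ma et al.], based on Faster R-CNN
• Extends Faster R-CNN so that region rotation is also taken into account when extracting region proposals
• Steps shown in the figure: extract region proposals, then classify each as background or text region

Text-specific Methods
• Rather than detecting the whole text in one shot, detect small units and then join them
• Text regions vary in orientation and other properties more than general objects do, so detecting a single BB outright is sometimes unsuitable
• The units are either small parts of a text region (components) or pixels

Components-Level
• Example: SegLink (figure from the SegLink paper)

Pixel-Level
• Example: PixelLink [Deng et al.] (figures from the PixelLink paper)
• For each pixel, decide whether its neighboring pixels belong to the same text region
• No prior BB estimation is needed, and text regions lying close together are easy to pick out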
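The pixel-level grouping step can be illustrated with a small union-find sketch. It is not the PixelLink implementation: the function `group_pixels`, the `linked` callback standing in for the network's link predictions, and the use of an 8-neighborhood are assumptions made for the example (the neighbor count is not legible in the transcript).

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)

# Assumed 8-neighborhood around each pixel.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def group_pixels(
    text_mask: List[List[bool]],
    linked: Callable[[Pixel, Pixel], bool],
) -> List[List[Pixel]]:
    """Group text pixels into regions by merging neighbors joined by a positive link."""
    parent: Dict[Pixel, Pixel] = {}
    for r, row in enumerate(text_mask):
        for c, is_text in enumerate(row):
            if is_text:
                parent[(r, c)] = (r, c)

    def find(p: Pixel) -> Pixel:
        # Follow parents to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in list(parent):
        for dr, dc in NEIGHBOR_OFFSETS:
            q = (p[0] + dr, p[1] + dc)
            if q in parent and linked(p, q):
                parent[find(p)] = find(q)

    groups: Dict[Pixel, List[Pixel]] = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Four adjacent text pixels; the (hypothetical) link between columns 1 and 2
    # is negative, so two touching words come out as separate regions.
    mask = [[True, True, True, True]]
    regions = group_pixels(mask, lambda p, q: {p[1], q[1]} != {1, 2})
    print(regions)
```

Because regions are formed from links rather than from boxes, two text instances that touch in the mask can still be split wherever the predicted links are negative.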
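Specific Targets
• Mainly about handling the extreme aspect ratios, distortion, curvature, and unusual fonts typical of signboards and the like
• For curved text, for example, TextSnake [Long et al.] attempts region extraction based on circles rather than BBs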
3.2 Recognition
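Overview
• Transcribe the text regions extracted by detection
• Most methods are RNN-based; among them, CTC (Connectionist Temporal Classification) and attention are the most widely used

CTC [Graves et al.]
• A loss function for obtaining per-character generation probabilities over an alphabet that includes _ (blank)
• Because it aligns input and output at the same time, differences in length between input and output need not be handled explicitly
• Input: HHHH_eell_lloo_ → Output: Hello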
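The mapping from the per-frame string to the final word in the example above follows the usual CTC collapsing rule: merge consecutive repeats, then drop blanks. A minimal sketch, with `ctc_collapse` as a hypothetical helper name and `_` assumed as the blank symbol as on the slide:

```python
def ctc_collapse(path: str, blank: str = "_") -> str:
    """Collapse a frame-level CTC path: merge repeated symbols, then remove blanks."""
    output = []
    previous = None
    for symbol in path:
        # Keep a symbol only when it starts a new run and is not the blank.
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return "".join(output)


if __name__ == "__main__":
    print(ctc_collapse("HHHH_eell_lloo_"))  # Hello
```

The blank between the two runs of `l` is what lets the double letter in "Hello" survive; without it the repeats would merge into a single `l`.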
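CRNN [Shi et al.]
• Transcribes with a biLSTM and CTC, taking feature vectors as input
• The feature vectors are extracted in a stage before the LSTM
• The name is easy to confuse with R-CNN...

Attention
• Borrows the attention mechanism from machine translation [Bahdanau et al., Luong et al.]
• Feature vectors that serve as the decoder's input are extracted from the input image beforehand, e.g. by convolution
  [Arbitrarily-oriented text recognition, Cheng et al.] (figure from the paper: the input feature vectors)
• In some cases extra measures are taken, such as giving character-level BBs to the decoder as auxiliary input
  [Focusing attention: Towards accurate text recognition in natural images, Cheng et al.]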
3.3 End-to-end System
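Overview
• Join the detection and recognition models as they are: the text regions found by the detection model become the input to the recognition model (figure from the paper)
• Pass only feature maps to recognition (figure from the paper)
• Examples include SEE [Bartz et al.]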
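3.4 Auxiliary Technologies
• Synthetic data generation
  • Most manually annotated datasets contain only a few thousand samples
  • The goal is to overlay text regions on background images as naturally as possible
• Bootstrapping
  • Reduces annotation cost through semi-/weakly supervised learning
  • Repeat: extract regions with a model trained on a small amount of annotation → cut off by score → retrain using the extracted regions as supervision → ...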
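The bootstrapping loop described above can be sketched as a generic self-training routine. Everything named here is hypothetical: `bootstrap`, the `train` and `predict` callables, the tuple shapes, and the fixed number of rounds are placeholders for illustration and do not come from the survey or the slides.

```python
from typing import Any, Callable, List, Tuple

Sample = Tuple[str, str]             # (image id, region annotation), placeholder types
Prediction = Tuple[str, str, float]  # (image id, predicted region, confidence score)


def bootstrap(
    labeled: List[Sample],
    unlabeled: List[str],
    train: Callable[[List[Sample]], Any],
    predict: Callable[[Any, List[str]], List[Prediction]],
    threshold: float,
    rounds: int,
) -> Any:
    """Train on a small labeled set, then repeatedly retrain on confident predictions."""
    model = train(labeled)
    for _ in range(rounds):
        # Extract regions from unlabeled images and cut off low-scoring ones.
        pseudo = [
            (image, region)
            for image, region, score in predict(model, unlabeled)
            if score >= threshold
        ]
        if not pseudo:
            break
        # Use the surviving regions as extra supervision for the next round.
        model = train(labeled + pseudo)
    return model


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just the list of samples it was trained on.
    toy_train = lambda data: list(data)
    toy_predict = lambda model, images: [
        (image, "region", 0.9 if image == "a" else 0.1) for image in images
    ]
    print(bootstrap([("x", "region")], ["a", "b"], toy_train, toy_predict, 0.5, 2))
```

The score threshold is the key design choice: set too low, wrong regions feed back into training; set too high, few new examples are ever added.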
4.1 Benchmark Datasets
Benchmark Dataset
• Synthetic Data
• Bootstrapping
Performance on Dataset (Detection)
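Performance on Dataset (Recognition)
• According to the errata, the values are apparently ... and ...

Performance on Dataset (End-to-End)
• Word Spotting: transcription performance on the target vocabulary
• End-to-End: transcription performance on the whole text, including words outside the target vocabulary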
6. Conclusion
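Status Quo and Future Trends
• Robustness to the diversity of datasets and models
  • Few datasets include special cases such as curved text
  • Models, too, are often evaluated after being tuned to a single dataset only
• Multilingual support
  • Neither models nor datasets are designed to handle multiple languages at once
• Speed
  • Humans can recognize text at a glance, whereas current methods are still slow; in terms of FPS, the upper limit is around ...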