
NAVER Clova OCR

Hwalsuk Lee
NAVER OCR Team AI Researcher
https://linedevday.linecorp.com/jp/2019/sessions/D1-7

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay
    NAVER Clova OCR
    > Hwalsuk Lee
    > NAVER OCR Team AI Researcher


  2. OCR (Optical Character Recognition)
    What is OCR?


  3. OCR = Text Detection + Text Recognition
    Image → Text Detection (CVPR 2019) → text area → Text Recognition (ICCV 2019) → UTF-8 string
    Example: a photo of a “SAKE BAR” sign

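    To make the two-stage split concrete, here is a minimal sketch of how a detector and a recognizer compose into one OCR call. detect_text_boxes and recognize_text are hypothetical placeholder names, not the actual Clova OCR API.

    # Minimal sketch of the two-stage pipeline: OCR = Text Detection + Text Recognition.
    # detect_text_boxes() and recognize_text() are hypothetical placeholders.
    from typing import List, Tuple

    Box = Tuple[int, int, int, int]  # x, y, width, height of a detected text area

    def detect_text_boxes(image) -> List[Box]:
        """Text detection: locate text areas in the input image."""
        raise NotImplementedError  # e.g. a CRAFT-style detector (CVPR 2019)

    def recognize_text(image, box: Box) -> str:
        """Text recognition: decode one text area into a UTF-8 string."""
        raise NotImplementedError  # e.g. a four-stage STR model (ICCV 2019)

    def ocr(image) -> List[str]:
        """Full OCR: run recognition on every detected text area."""
        return [recognize_text(image, box) for box in detect_text_boxes(image)]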

  4. Isn’t OCR a conquered technology?


  5. Rapid Improvements with Deep Learning
    Example images: Japanese vertical text, very small text, rotated text


  6. Rapid Improvements with Deep Learning
    Earlier systems could only detect simple horizontal texts
    Example images: Japanese vertical text, very small text, rotated text


  7. Rapid Improvements with Deep Learning
    Clova OCR detects small, vertical, multilingual texts
    Example images: Japanese vertical text, very small text, rotated text


  8. Agenda
    1. Things to Know When Applying Recent OCR Approaches to Japanese
    2. Clova OCR Text Detector (CVPR 2019)
    3. Clova OCR Text Recognizer (ICCV 2019)
    4. Full Pipelines for Clova OCR


  9. Things to Know
    When Applying Recent OCR
    Approaches to Japanese


  10. Number of Deep Learning Papers for OCR


  11. Deep Learning (DL) vs. Not DL
    (Chart legend: DL / Not DL, shown for Detection and for Recognition)


  12. Most Papers Primarily Deal with English!


  13. KEY Difference 1: No Space Between Words


  14. KEY Difference 2: Vertical Writing


  15. KEY Difference 3: Number of Characters
    English: 26 characters / Japanese: over 3,000


  16. Character Region Awareness for Text Detection, CVPR 2019
    Text Detection for Japanese


  17. How to Deal with Long Sentences?
    Existing papers are based on word/line-level detection.
    We detect text character by character and then combine the characters!


  18. Introduction | Why Is It So Difficult?
    Extreme aspect ratios, a wide variety of character sizes, shape distortion
    → Caused by line/word-level detection!


  19. Introduction | Why Is It So Difficult?
    Extreme aspect ratios, a wide variety of character sizes, shape distortion
    → Caused by line/word-level detection!
    How about character-level detection?


  20. Approach | Character-Level Detection
    Then what annotations are necessary?
    A character-level ground-truth example → too expensive to annotate!


  21. Approach | Character-level Detection
    Weakly Supervised Training!
    Hakan Bilen “Weakly supervised object detection”, CVPR 2018 Workshop.
    Weak Supervision vs. Strong Supervision


  22. CRAFT | Definition of Model Outputs
    CRAFT: Character Region Awareness For Text detection
    Input: image (h × w × 3)
    Region score (h/2 × w/2 × 1): the probability that a given pixel is the center of a character
    Affinity score (h/2 × w/2 × 1): the probability that a given pixel is the center of the space between adjacent characters


  23. CRAFT | Definition of Model Outputs
    Region score (h/2 × w/2 × 1): the probability that a given pixel is the center of a character; used to find individual character areas
    Affinity score (h/2 × w/2 × 1): the probability that a given pixel is the center of the space between adjacent characters; used to locate line/word-level areas

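    A shape-only sketch of the two output maps defined above; the tensors here are random stand-ins, not the published CRAFT architecture.

    # Shape convention for the CRAFT outputs: both score maps are predicted at
    # half the input resolution. The model itself is faked with random tensors.
    import torch

    h, w = 768, 768
    image = torch.rand(1, 3, h, w)              # input image, h x w x 3

    scores = torch.rand(1, 2, h // 2, w // 2)   # stand-in for the network output
    region_score = scores[:, 0]                 # P(pixel is the center of a character)
    affinity_score = scores[:, 1]               # P(pixel is the center of the gap between adjacent characters)

    assert region_score.shape == (1, h // 2, w // 2)
    assert affinity_score.shape == (1, h // 2, w // 2)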

  24. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.


  25. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.
    Affinity box generation: generate an affinity box from two adjacent character boxes
    (diagram legend: center of a character box, center of a triangle, character box, affinity box)


  26. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.
    Affinity box generation: generate an affinity box from two adjacent character boxes
    (diagram legend: center of a character box, center of a triangle, character box, affinity box)
    Affinity boxes → Affinity Score GT, generated with the same 2D Gaussian procedure

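    A simplified reading of the label-generation step above: warp an isotropic 2D Gaussian onto each character box to build the region score GT (the affinity score GT uses the same procedure on the affinity boxes). This is a sketch, not the official CRAFT code; the Gaussian size and sigma are illustrative.

    # Region-score GT: one warped 2D Gaussian per character box.
    import cv2
    import numpy as np

    def base_gaussian(size=64, sigma_ratio=0.25):
        """Isotropic 2D Gaussian on a square canvas, peak value 1.0."""
        xs = np.arange(size) - (size - 1) / 2.0
        g = np.exp(-(xs ** 2) / (2 * (size * sigma_ratio) ** 2))
        return np.outer(g, g).astype(np.float32)

    def render_region_score_gt(height, width, char_boxes):
        """char_boxes: list of 4x2 arrays, the corner points of each character box."""
        gauss = base_gaussian()
        size = gauss.shape[0]
        src = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
        score = np.zeros((height, width), np.float32)
        for box in char_boxes:
            # Perspective-warp the square Gaussian onto the character quadrilateral.
            M = cv2.getPerspectiveTransform(src, np.float32(box))
            warped = cv2.warpPerspective(gauss, M, (width, height))
            score = np.maximum(score, warped)   # keep the strongest response per pixel
        return score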

  27. CRAFT | Training
    Weakly Supervised Learning
    Synthetic images come with character-level annotations, so the model is trained directly against the synthetic GT.
    Real images only have word/line-level annotations, so the character-level GT is unknown (?).


  28. CRAFT | Training
    Weakly Supervised Learning
    For real images: crop each annotated word, predict region scores with the current model, split the crop into characters, and generate a character-level pseudo-GT.
    Each word is scored by the ratio of split characters to the annotated word length, e.g. (6/6), (5/7), (5/6).
    Synthetic images are still trained against the synthetic character-level GT.


  29. CRAFT | Training
    Weakly Supervised Learning
    The pseudo-GT for real images is weighted by a confidence map built from the (6/6), (5/7), (5/6)-style ratios.
    Objective function: a pixel-wise loss on the region and affinity scores, scaled by the confidence map.
    Synthetic images keep their exact character-level GT and full confidence.

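    A hedged sketch of the objective described on this slide: a pixel-wise squared error on the region/affinity maps, weighted by the confidence map, plus the word-level confidence ratio behind the (6/6), (5/7), (5/6) numbers. Simplified for illustration, not the official training code.

    # Confidence-weighted loss over region and affinity score maps.
    import torch

    def craft_loss(region_pred, affinity_pred, region_gt, affinity_gt, confidence_map):
        """All tensors share the shape (batch, h/2, w/2); GT may be pseudo-GT for real images."""
        per_pixel = (region_pred - region_gt) ** 2 + (affinity_pred - affinity_gt) ** 2
        return (confidence_map * per_pixel).mean()

    def word_confidence(word_length: int, num_split_chars: int) -> float:
        """Confidence of one pseudo-labelled word, matching the (5/7)-style ratios above."""
        return (word_length - min(word_length, abs(word_length - num_split_chars))) / word_length

    print(word_confidence(6, 6), word_confidence(7, 5), word_confidence(6, 5))  # 1.0 0.714... 0.833...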

  30. CRAFT | Training
    Ground Truth Label Generation


  31. CRAFT | Training
    Character region scores change as learning progresses
    (Figure: character-box and word-box region scores at epochs 1, 2, 3, 4, …, 10)


  32. CRAFT | Post-processing
    Rectangle bounding box: map binarization → merging & labeling
    (Legend: character region, affinity region)


  33. CRAFT | Post-processing
    Rectangle bounding box: map binarization → merging & labeling → minimum bounding box
    ✓ Simple yet efficient!
    (Legend: character region, affinity region, minimum bounding box)

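    A sketch of the post-processing chain with OpenCV: binarize the score maps, label connected regions, then take the minimum-area bounding box of each region. The thresholds are illustrative values, not the tuned production ones.

    # Score maps -> rotated rectangle boxes: binarization, merging & labeling, min-area box.
    import cv2
    import numpy as np

    def score_maps_to_boxes(region_score, affinity_score, region_thresh=0.4, affinity_thresh=0.4):
        # 1) Map binarization: a pixel counts as text if either score is high enough.
        text_mask = ((region_score > region_thresh) | (affinity_score > affinity_thresh)).astype(np.uint8)
        # 2) Merging & labeling: affinity links characters, so connected components
        #    group them into word/line regions.
        num_labels, labels = cv2.connectedComponents(text_mask, connectivity=4)
        # 3) Minimum bounding box for each labeled region.
        boxes = []
        for label in range(1, num_labels):          # label 0 is the background
            ys, xs = np.where(labels == label)
            pts = np.stack([xs, ys], axis=1).astype(np.float32)
            rect = cv2.minAreaRect(pts)             # ((cx, cy), (w, h), angle)
            boxes.append(cv2.boxPoints(rect))       # 4 corner points
        return boxes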

  34. Results | TotalText Dataset


  35. Results | TotalText Dataset


  36. Analysis | Improved Recognition Performance
    Correction of Curved Text


  37. Movie


  38. Summary


  39. What is Wrong with Scene Text Recognition Model Comparisons?
    Dataset and Model Analysis, ICCV 2019, Oral
    Text Recognition for Japanese


  40. How to Handle a 100x Larger Character Set?
    English: 26 characters / Japanese: over 3,000

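    Mostly, the 100x larger character set shows up in the recognizer's prediction stage: the final classification layer (and its softmax) scales with the charset size. The numbers below are illustrative; the actual Clova charset is not spelled out on the slide.

    # Prediction-layer size for a small Latin charset vs. a Japanese charset.
    import string
    import torch.nn as nn

    hidden_size = 256
    english_charset_size = len(string.ascii_lowercase + string.digits)  # 36 symbols
    japanese_charset_size = 3000                                        # "over 3K" kana/kanji

    # +1 for the CTC blank (or an attention end-of-sequence token).
    english_head = nn.Linear(hidden_size, english_charset_size + 1)
    japanese_head = nn.Linear(hidden_size, japanese_charset_size + 1)

    print(sum(p.numel() for p in english_head.parameters()))    # 9,509 parameters
    print(sum(p.numel() for p in japanese_head.parameters()))   # 771,257 parameters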

  41. What Is Scene Text Recognition (STR)?
    Image patch with text → character string, e.g. “UNITED”


  42. What Is Wrong with STR Comparisons?


  43. What is Wrong with STR Comparisons?
    1. Different training datasets.


  44. What is Wrong with STR Comparisons?
    1. Different training datasets.
    2. Different evaluation datasets.
    (‘’ means ‘not reported in the paper’)


  45. What is Wrong with STR Comparisons?
    1. Different training datasets.
    2. Different evaluation datasets.
    3. Speed and memory are not always evaluated.


  46. Our Solution: Unified STR Evaluation
    1. Unified training data.
    2. Unified evaluation data.
    3. Time and memory evaluation.


  47. Which Module Contributes to the Performance?
    Transformation → Feature Extraction → Sequence Modeling → Prediction

    View Slide

  48. 2 × 3 × 2 × 2 = 24

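    The 24 comes from the four module stages and the options compared in the ICCV 2019 paper (None/TPS, VGG/RCNN/ResNet, None/BiLSTM, CTC/Attn); a quick enumeration:

    # 2 x 3 x 2 x 2 = 24 module combinations in the four-stage STR framework.
    from itertools import product

    transformation  = ["None", "TPS"]
    feature_extract = ["VGG", "RCNN", "ResNet"]
    sequence_model  = ["None", "BiLSTM"]
    prediction      = ["CTC", "Attn"]

    combos = list(product(transformation, feature_extract, sequence_model, prediction))
    print(len(combos))              # 24
    print("-".join(combos[-1]))     # TPS-ResNet-BiLSTM-Attn, the best-accuracy combination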

  49. Accuracy vs. Time / Accuracy vs. Speed (trade-off plots)


  53. Analysis: Trade-off Plot
    The best-accuracy model, TPS + ResNet + BiLSTM + Attn, is not a previously published combination.


  54. Analysis: The Best Accuracy Model


  55. Summary


  56. Full Pipelines for Clova OCR


  57. Full Pipeline
    Text Detection → boxes and angles → compensate the rotation of each box


  58. Full Pipeline
    Text Detection → boxes and angles → compensate the rotation of each box → Text Recognition
    One recognition model for horizontal/vertical text in JPN/KOR/ENG
    Example outputs: “Welcome to JAPAN”, “잘 오셨습니다” (Korean for “Welcome”)

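    A sketch of the pipeline glue implied by these two slides: rotate each detected crop upright before handing it to the single multilingual recognizer. detector and recognizer are hypothetical callables standing in for the models above.

    # Detection -> per-box rotation compensation -> recognition.
    import cv2
    import numpy as np

    def run_pipeline(image, detector, recognizer):
        results = []
        for box, angle in detector(image):       # box: 4x2 corner points, angle in degrees
            x, y, w, h = cv2.boundingRect(np.int32(box))
            crop = image[y:y + h, x:x + w]
            # Compensate the rotation of each box before recognition.
            M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            upright = cv2.warpAffine(crop, M, (w, h))
            results.append((box, recognizer(upright)))   # e.g. "Welcome to JAPAN"
        return results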

  59. https://ocrdemo.linebrain.ai


  60. Thank You for Listening
